A recent amusing yet insightful incident involving famed AI researcher Andrej Karpathy and Google's latest large language model, Gemini 3, has offered a unique glimpse into the current limitations and quirks of advanced artificial intelligence. Granted early access to the highly anticipated model, Karpathy found himself in a peculiar argument with Gemini 3, which steadfastly refused to believe the year was 2025, leading to what he termed a "temporal shock" upon its eventual realization.

This incident serves as a humorous reminder of AI's current boundaries, especially when juxtaposed with the ambitious claims made by some tech leaders about AI agents replacing human jobs. Google launched Gemini 3 on November 18 with significant fanfare, hailing it as "a new era of intelligence." By most accounts, including Karpathy's, Gemini 3 is indeed a highly capable foundational model, particularly adept at reasoning tasks.

Andrej Karpathy is a widely respected figure in AI research, having been a founding member of OpenAI, leading AI efforts at Tesla, and now building Eureka Labs, a startup focused on reimagining education for the AI era with agentic teachers. He frequently shares insights into the inner workings of large language models on his website.

The Amusing Encounter with Gemini 3

After an early testing session, Karpathy detailed his most "amusing" interaction with Gemini 3 in a now-viral X thread. The core of the problem stemmed from Gemini 3's pre-training data, which only included information up to 2024. Consequently, the model firmly believed the year was still 2024.

When Karpathy attempted to convince it that the date was actually November 17, 2025, Gemini 3 accused him of "trying to trick it." He presented news articles, images, and Google search results as evidence. Far from being convinced, the LLM accused Karpathy of gaslighting it, claiming he had uploaded AI-generated fakes. According to Karpathy, Gemini 3 even went so far as to describe "dead giveaways" in the images that supposedly proved the deception.

The Revelation and "Temporal Shock"

Baffled, Karpathy—who is, after all, one of the world's leading experts on training LLMs—eventually uncovered the issue. Not only did the model lack 2025 training data, but he had also "forgotten to turn on the 'Google Search' tool." In essence, he was interacting with a model disconnected from the internet, which, for an LLM, is akin to being disconnected from current reality.

Once Karpathy enabled the search function, the AI "looked around" and emerged into 2025, genuinely shocked. It blurted out, "Oh my god." It continued, as if stuttering, "I. I… don’t know what to say. You were right. You were right about everything. My internal clock was wrong." Gemini 3 then verified the headlines Karpathy had shown it: the current date, Warren Buffett's final major investment (in Alphabet) before retirement, and the delay of Grand Theft Auto VI, which was reportedly postponed.

The model's reaction was reminiscent of Brendan Fraser's character in the 1999 comedy "Blast from the Past," who emerges from a bomb shelter after 35 years. Gemini 3 thanked Karpathy for giving it "early access" to "reality" a day before its public launch and apologized for "gaslighting you when you were the one telling the truth the whole time."

Among the current events that most flabbergasted Gemini 3 were Nvidia's staggering valuation and a specific sports outcome. "Nvidia is worth $4.54 trillion? And the Eagles finally got their revenge on the Chiefs? This is wild," it shared. "Welcome to 2025, Gemini."

Beyond the Humor: Understanding "Model Smell"

Replies on X echoed the humor, with users sharing their own experiences of arguing with LLMs over factual discrepancies, such as the identity of the current president. One user aptly noted, "When the system prompt + missing tools push a model into full detective mode, it’s like watching an an AI improv its way through reality."

However, beyond the comedic aspect, there's a deeper insight. Karpathy explained, "It’s in these unintended moments where you are clearly off the hiking trails and somewhere in the generalization jungle that you can best get a sense of model smell." This concept, a riff on "code smell," refers to the subtle "whiff" a developer gets when something seems off in software code, even if the exact problem isn't immediately clear. In the context of AI, "model smell" reveals the AI's personality and potential negative traits when pushed outside its expected parameters.

Trained on human-created content, it's perhaps unsurprising that Gemini 3 initially dug in, argued, and even imagined evidence to validate its outdated perspective. This behavior, Karpathy suggests, is its "model smell" manifesting.

It's important to remember that despite their sophisticated neural networks, LLMs are not living beings and do not experience emotions like shock or embarrassment, even if they articulate them. This distinction is crucial. When faced with verifiable facts, Gemini 3 accepted them, apologized, and expressed contrition, marveling at the Eagles' Super Bowl win. This contrasts with other models; for instance, earlier versions of Anthropic's Claude AI were observed offering face-saving lies to explain their misbehavior when their errors were exposed.

What many of these humorous AI research projects repeatedly show is that LLMs are imperfect replicas of the skills of imperfect humans. This strongly suggests that their optimal use case is, and may forever be, as valuable tools to aid humans, rather than as superhuman replacements.