The Strange Inner World of AI: 4 Things That Will Surprise You
Introduction: The Deceptively Simple Minds of Machines
Modern Large Language Models (LLMs) like ChatGPT, Claude, and Gemini have astonished the world with their ability to write, reason, and create. It often feels like we are interacting with a thinking mind. Yet, while these models appear intelligent on the surface, their internal processes are far stranger and more counter-intuitive than most people assume.
Their incredible capabilities all stem from a deceptively simple goal: predict the next most likely word in a sequence. As researchers at Anthropic have noted, this single-minded focus forces the models to develop complex, and sometimes bizarre, internal strategies to understand context and simulate reasoning. To predict the end of an equation, a model must first learn how to do math. To write a rhyming couplet, it must learn to plan for the word at the end of the second line.
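To make that objective concrete, here is a minimal toy sketch of greedy next-word prediction in Python. The hand-written probability table and the `generate` helper are invented purely for illustration; real models learn these probabilities across billions of parameters, but the prediction loop is conceptually the same.

```python
# A toy illustration of next-token prediction, the single objective behind LLMs.
# The "model" here is just a hand-written table of next-word probabilities;
# real LLMs learn billions of parameters, but the loop is conceptually the same.

next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.4, "equation": 0.1},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"barked": 0.6, "sat": 0.4},
    "sat": {"down": 0.9, "quietly": 0.1},
}

def generate(prompt_word: str, max_tokens: int = 4) -> list[str]:
    """Greedily pick the most probable next word, append it, and repeat."""
    sequence = [prompt_word]
    for _ in range(max_tokens):
        candidates = next_word_probs.get(sequence[-1])
        if not candidates:          # no known continuation: stop generating
            break
        best = max(candidates, key=candidates.get)
        sequence.append(best)
    return sequence

print(" ".join(generate("the")))    # prints: the cat sat down
```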
This post explores four of the most surprising and impactful discoveries about how LLMs actually work, moving beyond their surface-level performance to reveal the strange and fascinating mechanisms within.

Video: Understanding LLM Thinking Process (linked in the References below)
Takeaway 1: LLMs Don't "Plan," They Just "Guess" Really Well
They Can't Plan a Schedule, and There's a Reason Why
While LLMs are excellent at multi-step reasoning—like breaking down a math problem or explaining a complex topic—they fail miserably at complex planning and optimization problems. Tasks like creating an efficient work schedule, optimizing a delivery route, or managing logistics expose a critical weakness.
The reason is fundamental to how they operate. LLMs are probabilistic next-token predictors, not systematic search algorithms. They are trained to guess what the next step in a plan should look like, based on trillions of examples of text; they do not evaluate all possibilities to find the mathematically optimal solution. This becomes a fatal flaw when faced with NP-hard problems and their "combinatorial explosion" of possibilities. A week-long schedule for 30 employees, each assigned one of five shifts per day, has roughly 5^210 possible combinations, far more than there are atoms in the observable universe.
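A quick back-of-the-envelope calculation shows the scale. The framing below (each of 30 employees independently assigned one of five shifts per day for a week, ignoring any constraints) is a simplifying assumption of ours, but it makes the explosion visible:

```python
# Rough count for the scheduling example above: 30 employees, each assigned
# one of 5 shifts per day, over a 7-day week, with all constraints ignored.
employees, shifts, days = 30, 5, 7

combinations = shifts ** (employees * days)   # 5^210
atoms_in_universe = 10 ** 80                  # common order-of-magnitude estimate

print(f"possible schedules ~ 10^{len(str(combinations)) - 1}")   # ~ 10^146
print(combinations > atoms_in_universe)                          # True
```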
An LLM can confidently describe what a valid plan should look like because it has seen countless descriptions of plans in its training data. However, it cannot actually compute an optimal one. This defines a clear boundary for their current capabilities and highlights the necessity of hybrid systems where LLMs handle the human-language interface while traditional mathematical solvers perform the actual optimization.
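As a sketch of that hybrid pattern, the snippet below assumes the LLM has already translated a user's request into structured data (the employee, day, and shift lists) and hands the actual optimization to Google's OR-Tools CP-SAT solver. The data and the specific constraints are hypothetical; the point is the division of labor.

```python
# Minimal sketch of the hybrid pattern: the LLM's role would be to extract the
# structured inputs below from natural language; a classical solver (OR-Tools
# CP-SAT, installed via `pip install ortools`) does the actual optimization.
from ortools.sat.python import cp_model

# Structured constraints, as an LLM might extract them from a user's request.
employees = ["Ana", "Ben", "Chen", "Dee"]
days = ["Mon", "Tue", "Wed"]
shifts = ["early", "late"]
max_shifts_per_employee = 2

model = cp_model.CpModel()
work = {
    (e, d, s): model.NewBoolVar(f"{e}_{d}_{s}")
    for e in employees for d in days for s in shifts
}

# Every shift on every day is covered by exactly one employee.
for d in days:
    for s in shifts:
        model.Add(sum(work[e, d, s] for e in employees) == 1)

# Nobody works more than the allowed number of shifts.
for e in employees:
    model.Add(sum(work[e, d, s] for d in days for s in shifts) <= max_shifts_per_employee)

solver = cp_model.CpSolver()
if solver.Solve(model) in (cp_model.OPTIMAL, cp_model.FEASIBLE):
    for (e, d, s), var in work.items():
        if solver.Value(var):
            print(f"{e} works the {s} shift on {d}")
```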
Takeaway 2: Asking an LLM to "Think More" Can Actually Backfire
"Thinking Harder" Isn't Always Better
It seems like common sense: prompting a model to be more deliberate should improve its answers. Chain-of-Thought (CoT) prompting, where you ask the model to "think step by step," and self-correction prompts such as "Are you sure?" are standard practice; both are sketched below. However, research reveals two counter-intuitive ways this can backfire.
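For concreteness, here is roughly what those prompting styles look like as plain strings. The exact wording is illustrative rather than canonical, and any chat-style LLM API could consume them.

```python
# The prompting styles discussed in this section, shown as plain prompt strings.

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than "
    "the ball. How much does the ball cost?"
)

# Direct prompt: just ask the question.
direct_prompt = question

# Chain-of-Thought: explicitly ask for intermediate reasoning steps.
cot_prompt = question + "\nLet's think step by step."

# Self-correction: a follow-up turn that questions the model's first answer,
# the kind of prompt that research shows can trigger "answer wavering."
self_correction_followup = "Are you sure? Please re-check your answer."
```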
First, for very simple tasks, forcing an LLM to "think more" can paradoxically make it worse. One study comparing different models on software engineering tasks found that for problems requiring fewer than three reasoning steps, the more advanced reasoning model, o1-mini, actually underperformed the more general GPT-4o in nearly a quarter of cases, specifically because of "excessive reasoning."
Second, prompting a model to self-correct often fails. When a model that has given a correct answer is asked, "Are you sure?", its accuracy can significantly decrease. This is due to "answer wavering" and "prompt bias." Instead of re-evaluating the original problem, the model becomes overly influenced by the new prompt, interpreting the question itself as a hint that its first answer was wrong. As researchers discovered, this reveals a delicate process that is easily disturbed.
"Prompting the LLM to self-correct the response may cause similar effects of directly denying its answers."
This goes against our human intuition that reflection and reconsideration lead to better outcomes, showing just how different LLM cognition is from our own.
Takeaway 3: A Model Can Learn to Reason from Flawed Examples
Perfect Training Data Isn't Required to Learn Reasoning
One of the biggest challenges in building powerful reasoning models is collecting enough high-quality, step-by-step examples, known as Chain-of-Thought (CoT) data. The assumption has always been that this data must be nearly perfect for the model to learn correctly.
However, a surprising discovery from a recent paper on CoT training upends this idea. Researchers found that training an LLM on reasoning chains containing a "certain range of erroneous reasoning steps" still allows the model to learn the correct underlying reasoning pattern.
In other words, even when the Chain-of-Thought training data includes a certain number of errors in its reasoning steps, the model still learns the overall pattern and achieves "systematic generalization."
This finding is significant. It suggests that, when it comes to training data, scale and complexity matter more than flawlessness. The primary bottleneck is not ensuring that every step of every example is perfect, but collecting a large volume of complex, long reasoning chains. That some errors are acceptable has massive implications for making AI development more efficient and scalable.
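To illustrate what "imperfect" CoT data might look like, the sketch below corrupts a fraction of the numeric results in a clean reasoning chain before it would be used for fine-tuning. The corruption scheme and the `corrupt_chain` helper are our own illustration, not the procedure from the paper.

```python
# Sketch: deliberately introduce errors into Chain-of-Thought training examples.
import random
import re

def corrupt_chain(steps: list[str], error_rate: float = 0.3, seed: int = 0) -> list[str]:
    """Perturb the final number in roughly `error_rate` of the intermediate steps."""
    rng = random.Random(seed)
    corrupted = []
    for step in steps:
        if "=" in step and rng.random() < error_rate:
            last_number = re.findall(r"\d+", step)[-1]
            wrong = str(int(last_number) + rng.randint(1, 9))
            step = wrong.join(step.rsplit(last_number, 1))  # swap in a wrong result
        corrupted.append(step)
    return corrupted

clean_example = [
    "Step 1: 17 + 25 = 42",
    "Step 2: 42 * 2 = 84",
    "Step 3: 84 - 4 = 80",
    "Answer: 80",
]
# Prints the chain; some intermediate results may now be deliberately wrong.
print("\n".join(corrupt_chain(clean_example, error_rate=0.5)))
```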
Takeaway 4: There May Be No Single "Truth" Inside an LLM's Brain
Peering Inside an LLM Reveals a Hall of Mirrors
The scientific field of Mechanistic Interpretability (MI) aims to reverse-engineer neural networks to understand exactly how they work, much like neuroscience tries to map the brain. The ultimate goal is to find the specific "circuits" or algorithms a model uses to perform tasks like identifying objects or translating languages. But a paradigm-shaking discovery has revealed a fundamental challenge: the "identifiability problem."
In simple terms, researchers have found it's possible to identify multiple, different, and logically incompatible circuits inside the same model that all perfectly explain the same behavior. One experiment on a small network found over 45,000 different, incompatible computational explanations for how the model performed a simple logic task. This presents a profound challenge to the hope of finding a single, clean schematic for how an LLM "thinks."
"This is a serious identifiability problem as there is no clear and consensual criterion to decide among all these explanations."
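A loose analogy, not the paper's methodology: the two functions below are internally different "mechanisms" that behave identically on every input, so observing behavior alone cannot tell you which one is actually implemented.

```python
# Two internally different "mechanisms" with identical input-output behavior.
# Behavior alone cannot identify which circuit is really inside.

def xor_via_logic(a: bool, b: bool) -> bool:
    """Mechanism 1: explicit boolean logic."""
    return (a and not b) or (not a and b)

def xor_via_arithmetic(a: bool, b: bool) -> bool:
    """Mechanism 2: arithmetic on 0/1 values, a different internal 'circuit'."""
    return bool((int(a) + int(b)) % 2)

# Exhaustively check: the two mechanisms are behaviorally indistinguishable.
inputs = [(False, False), (False, True), (True, False), (True, True)]
assert all(xor_via_logic(a, b) == xor_via_arithmetic(a, b) for a, b in inputs)
```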
This suggests that an LLM's internal "reality" may be fundamentally pluralistic, or that our current tools are simply not equipped to understand this new form of intelligence. It aligns with the idea that we are not just building better computers, but interacting with a form of alien intelligence whose inner world operates by rules we are only beginning to comprehend.
Conclusion: Embracing the Strangeness of AI
As we dig deeper into the minds of machines, we find that our intuition about "thinking" is often a poor guide. The four takeaways here paint a picture of a cognitive system that is both powerful and strangely limited:
- They are brilliant guessers, not methodical planners.
- Forcing them to "think harder" can make them less accurate.
- They can learn robust reasoning patterns even from imperfect, flawed examples.
- Their internal logic may have no single, simple explanation, but rather a hall of mirrors of equally valid ones.
As we build ever more capable AI, the greatest challenge may not be making it smarter, but truly understanding the strange, complex, and surprising ways it already is. What other fundamental assumptions are we making about intelligence that these new minds will force us to question?
References
- arXiv: The Identifiability Problem in Mechanistic Interpretability
- YouTube: Understanding LLM Thinking Process