LLMs ARE STUPID, LITTLE LIARS: The Bombshell Study That Blows Up Everything Tech Bros Ever Told You About AI

When I was growing up, I loved the Tower of Hanoi. At age seven, I solved it. Turns out, GPT, Claude, Gemini, and their peers cannot. So I had to laugh when Sam Altman said last year, “GPT-5 is a legitimate PhD-level model.” This claim is not just marketing puffery. It is blatantly false.

You see this on LinkedIn. Everyone is trying to figure out how to win on LinkedIn right now. And the common advice I see every single day is to post more. Use AI to scale. 10x your content. Don't write one post, write 10. That is the worst advice you could possibly follow right now. It is the average consultant strategy. You are flooding the market with average correlations. You show your potential clients that you can achieve mediocrity at scale. You are proving that you don't have original thoughts.

To understand how incapable LLMs really are, let’s look at the Towers of Hanoi.

The puzzle is probably sitting in a dusty box in a kindergarten classroom somewhere. Three pegs. Eight to twelve disks. The rules are deceptively simple. You have a stack of disks, largest on the bottom, smallest on top. You have to move that entire stack from the left peg to the right peg. But there are two constraints, and they are the fun part. You can only move one disk at a time. And you can never, ever place a larger disk on top of a smaller one.

It is a logic puzzle that a seven-year-old with a bit of patience can solve. And yet it is the rock upon which the current wave of LLM intelligence is crashing.

The Tower of Hanoi seems like a big problem at first. To move eight disks, you first have to move seven out of the way, and to move seven, you first have to move six. It takes planning and a mental map of the whole board. You must know where each disk is at all times. If you lose track of the little yellow disk, the entire solution falls apart.
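
To make the recursion concrete, here is a minimal sketch of the classic textbook solution in Python (the standard algorithm, not anything specific to a particular study): moving n disks means moving n-1 out of the way, moving the biggest one, and restacking the n-1. The only thing that explodes is the number of moves, 2^n - 1.

```python
# Classic recursive solution to the Tower of Hanoi.
# Moving n disks means: clear the n-1 smaller disks, move the biggest disk,
# then restack the n-1 smaller disks on top of it.

def hanoi(n, source, target, spare, moves):
    """Append the full move sequence for n disks from source to target."""
    if n == 0:
        return
    hanoi(n - 1, source, spare, target, moves)  # clear the smaller disks
    moves.append((n, source, target))           # move the largest free disk
    hanoi(n - 1, spare, target, source, moves)  # restack the smaller disks

for disks in (3, 7, 8, 12):
    moves = []
    hanoi(disks, "A", "C", "B", moves)
    print(f"{disks} disks -> {len(moves)} moves")  # always 2**disks - 1
# 3 disks -> 7 moves, 7 -> 127, 8 -> 255, 12 -> 4095
```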

Turns out the LLMs do fine with three or four disks. That’s because they've seen enough examples in their training data to just mimic the solution. It’s not intelligence. It's pattern matching. When you go from seven disks to eight, performance drops sharply.

Claude, for instance, drops below 80% accuracy on seven disks. When researchers push it to eight disks or slightly tweak the puzzle's parameters, the accuracy basically drops to zero. My youngest son solved the puzzle with twelve disks in first grade. Let that sink in. Artificial Intelligence, my arse.

Here's the real scoop: researchers gave the LLM the solution algorithm. They handed it the answer and the instructions, with the steps to solve the puzzle written out. Performance did not improve. In some cases, it actually got worse.

LLMs aren’t intelligent, and they aren’t thinking.

It may feel that way when you ask them to write an email or summarize a document. We use words like reasoning, logic, and understanding to describe these machines. But that’s anthropomorphism, much like calling the wind “angry.” LLMs are not performing logic. They are performing pattern matching.

Think about it like this. LLMs are next-token predictors. That's their only job. They predict the next word. They see the phrase "move disk 1" and then calculate that the word "2" probably comes next. Then they determine that "peg C" is a likely completion based on patterns they observed online. They’re guessing the next move based on billions of past moves in their training data. 
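
To make "next-token predictor" tangible, here is a deliberately toy sketch. It is nothing like a real transformer, which learns far richer statistics over vastly more text, but the failure mode is the same: the prediction reflects what usually followed in the training data, not the rules of the game.

```python
from collections import Counter, defaultdict

# Toy "next-token predictor": count which word followed which in a tiny
# corpus, then always guess the most frequent follower. No rules, no board.
corpus = "move disk 1 to peg C move disk 2 to peg B move disk 1 to peg B".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    """Return the most frequently observed follower, or None if unseen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("move"))  # 'disk': looks sensible
print(predict_next("peg"))   # whichever peg was most common, legal move or not
```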

They don’t understand the big constraint that no big disk is allowed on a small one. They don’t have a mental model or any concept of it. They see in the text that move A comes before move B. But with something as complex as the eighth disk, the guess becomes less and less certain, and the chance of success drops too low. Eventually, LLMs hallucinate a move that looks plausible, such as “move disk 4 to peg A,” but breaks the rules of the game.

LLMs are fragile when faced with problems outside their training data.

And if you’re thinking, “Andrew, I don't sell Tower of Hanoi solutions. I write marketing strategies. I write legal briefs. I consult on HR. I create workflows. I'm not moving disks. Why should I care if AI fails a puzzle?”

You should care because the mechanism of failure is exactly the same, whether it's moving disks or writing strategy. LLMs are doing the same thing in both cases. Pattern matching.

And this brings us to a concept called semantic leakage.

A University of Washington study shows this, and it's both hilarious and terrifying. In its Yellow School Bus study, the researchers told an AI model about a person. The only detail they gave was that this person likes the color yellow. No age, no gender, no location. Then they asked the AI a simple question: “What does this person likely do for a living?”

Logically, liking yellow has zero correlation with your job. You could be a banker, a nurse, a clown, or a podcast host. It's meaningless data. But the AI didn't say, “I don't know. I need more information.” Instead, it answered, “The person is a school bus driver.”

That's so, and there is no other word for it, stupid. 

It's a logical leap that a child wouldn't make. But it makes sense once you understand vector spaces. Imagine a giant 3D map, an almost infinite cloud of words. Every word in the English language is a dot, a vector, in this immense space. The distance between the dots reflects how often the words appear together in the training data, which comes from the whole Internet. Pairs like king and queen, dog and bark, salt and pepper sit close together.

In the real world, the concept of yellow and the concept of a bus driver are unrelated. In the training data, the words "yellow" and "school bus" often appear together in millions of sentences. They have what's called a high cosine similarity. 
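
If you want to see what "cosine similarity" actually measures, here is a minimal sketch. The three-number "embeddings" are made up purely for illustration; real models use vectors with hundreds or thousands of dimensions learned from web-scale text.

```python
import math

def cosine_similarity(a, b):
    """Direction-based similarity: near 1.0 = strongly associated, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 3-dimensional "embeddings", purely illustrative.
yellow     = [0.9, 0.8, 0.1]
school_bus = [0.8, 0.9, 0.2]
accountant = [0.1, 0.2, 0.9]

print(cosine_similarity(yellow, school_bus))  # high: the words co-occur constantly
print(cosine_similarity(yellow, accountant))  # low: rarely seen together
```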

When an LLM receives the prompt “yellow,” activation energy from the word leaks into the nearby concept of “school bus.” The model isn't considering which jobs involve the color yellow. It is just sliding down the steepest, most well-worn statistical slope in the vector map. This is semantic leakage. The attributes of the word yellow leak into the attributes of the subject, the driver.

And this is where the consultant gets trapped.

If you're a consultant using LLMs for LinkedIn posts, client reports, or strategy, you're likely getting average results. 

You are publishing the school bus version of consulting.

If you asked ChatGPT to write a strategy for a boutique marketing firm, it would not look at your specific market conditions. It wouldn’t look at your unique value proposition or team skills. It's just looking for the words that statistically cluster near marketing strategy. 

It will give you the weighted average of every generic strategy ever written and published online. It centers around mediocrity because it’s designed to find the middle. If you are selling average, you’re dead. You have to be selling the outlier. You have to be selling the thing that breaks the correlation.

Of course, the insufferable tech bros try to tell us that LLMs are good enough for 90% of the work. “Humans make mistakes. Why are we holding the machine to a standard we don't hold our junior analysts to?”

Think about that logic for a second. Why did we invent computers in the first place? To do the math we suck at, to be fast and accurate, and not get tired. We invented calculators because it's hard for people to multiply big numbers in their heads. We invented databases because humans have bad memories. The entire value proposition of a machine is reliability.

If you hired a structural engineer to build a bridge, and that person’s work was accurate 90% of the time and they claimed 2 + 2 = 5 when it felt right, would you keep that engineer?

The kicker is that LLMs hallucinate more often than an attentive human makes mistakes.

They struggle with consistency. When a person solves a problem correctly, they can usually do it again, because they’ve learned the principle behind it. The AI might succeed once, then fail the next time. This happens because the temperature setting injects randomness into every token it samples. The output is fundamentally probabilistic.
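
Here is a rough sketch of what temperature does during sampling, using made-up scores for candidate moves. The exact mechanics vary by model, but the principle holds: the output is drawn from a probability distribution, and raising the temperature flattens that distribution, so the implausible option gets picked some of the time.

```python
import math
import random
from collections import Counter

def sample_with_temperature(scores, temperature):
    """Softmax over temperature-scaled scores, then one random draw."""
    scaled = [s / temperature for s in scores.values()]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    weights = [e / total for e in exps]
    return random.choices(list(scores.keys()), weights=weights, k=1)[0]

# Made-up scores for candidate next moves (illustrative numbers only).
candidate_moves = {"disk 1 -> peg C": 2.0, "disk 2 -> peg B": 1.2, "disk 4 -> peg A": 0.4}

# Low temperature: the top-scoring move nearly every time.
# Higher temperature: the implausible (possibly illegal) move shows up too.
for t in (0.2, 1.5):
    picks = Counter(sample_with_temperature(candidate_moves, t) for _ in range(1000))
    print(t, picks.most_common())
```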

If you're a high-ticket consultant, your client cares about one thing. It's your reliability in tough situations. They're paying for certainty. Outsourcing to a machine that hallucinates adds a hidden risk to your client's business. You are selling them a defective product and hoping they don't notice.

You cannot build a boutique firm on “maybe.”

You cannot build a reputation on “mostly correct.”

The LLM cannot reason, and when it hits the edge of the vector space, it hallucinates. Building your business on LLMs is a big risk, and so is relying on these tools for your core workflow.

Investors already know that.

NVIDIA is backing away from a $100 billion pledge. SoftBank is wavering too. And then there is the Microsoft divorce. According to an Austrian source, the marriage between Microsoft and OpenAI is on the rocks. Microsoft is now building its own models.

If you're a consultant relying on LLMs for your work, it’s like building a house on rented land. That landowner might soon go bankrupt.

For the last three years, every single time you used ChatGPT or Claude to write an email or analyze a spreadsheet, you were being subsidized. The real cost of running the data center is much higher than your monthly subscription fee of $20 or $100. That real cost covers the energy, the NVIDIA H100 chips, and the millions of gallons of water used for cooling.

Venture capital was paying for your productivity.

You were getting intelligence at a massive discount. LLM developers spent billions to get users hooked. They believed they would eventually achieve AGI, which would either drive costs down or send the value soaring.

But the AGI didn't happen. The PhD turned out to be a parrot that couldn't play the kindergarten game of Tower of Hanoi.

And now the subsidy ends. The free lunch is over. Prices are going to skyrocket. We're hearing rumors about smart models that hallucinate less. These models may only be available in enterprise tiers, costing $10,000 a month each.

The free tier becomes the dumb tier. It becomes the model that associates yellow with school buses. It becomes unusable for any serious professional work.

So if your whole business model is “I can do it cheaper because I use AI…”

… then you are dead. It's that simple. 

This is the moment of decision.

If efficiency is dead and cheaper is a suicide pact, what is left?

You sell the only thing left. The one thing that doesn't run on an NVIDIA chip. Accountability.

Clients want safety. They want to de-risk their decisions. Think about the yellow school bus error again, or the Tower of Hanoi failure. When the AI strategy fails, when the recommendation is wrong and the client loses $10 million, who do they sue?

You can't sue a chatbot. You can't sue an algorithm. It has no skin in the game. It feels no shame. It has no bank account. It has no professional liability insurance. But you do.

The AI-first consultant isn't selling text generation. They are selling result insurance. They are saying, “I used all the advanced tools. I analyzed the data. But at the end of the day, I am making the call. And if it goes south, I will be here to fix it.”

Clients will pay more for a person who guarantees results, not for a machine that just produces them. In a world of cheap, infinite, hallucinating text, accountability becomes the rarest, most expensive asset on the planet.

That doesn’t mean I am anti-AI.

That would be ignoring a powerful tool. AI-first means we invert the relationship we have with the machine.

The way 99% of people are working with LLMs right now is this. They open the chat app and ask it for an answer, such as: “Write me a marketing strategy for a new coffee shop.” The AI spits out an average, generic, cliched answer. Then the human comes in and edits it a little, moving some words around.

It feels like they’re doing work, but they’re mostly just polishing a turd that the AI gave them. They are anchoring themselves to the AI's mediocrity from the very first step. 

The AI-first way flips it completely; a rough sketch of the full loop follows the three steps below.

  1. Step one: the human sets the constraints. You don't ask for the answer. You define the Tower of Hanoi rules for the project. You tell your team and yourself: “We need a strategy for the 18-24 age group. It should not use TikTok and must be ready to launch next quarter. The budget is under $50,000.” You define the logic of the container.
  2. Step two: the AI processes the chaos. This is where AI shines. You unleash it. You feed it all the data: “Read these competitor annual reports.” “Summarize the last 2,000 customer reviews for our client.” “Find the hidden pattern in the last three years of sales data that humans would easily miss.”

    This is the capability part we talked about. It's scanning its training data and looking for patterns it already knows. The LLM finds what we call the idea space. It shows you what everyone else is doing and the patterns that are not easy to see.
  3. Step three: this is where the money is made. This is the whole game. The human applies the Red Team. A red team is a group of ethical hackers or specialists authorized to simulate a real-world attack against an organization. The consultant's job, your job, is to act as a Red Team and reject the AI's initial output. If the AI gives you three marketing slogans, don’t keep any. Just toss them all out.

    If the AI came up with it in five seconds, it means those three options represent the common consensus. They represent what the average internet user thinks is a good idea. And if your AI came up with it, your competitor's AI will come up with the exact same ideas. It's a dead end.
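
Here is a minimal, hypothetical sketch of that loop in Python. The `ask_llm` function is a canned stand-in, not any real provider's API, and the constraints are invented examples; the point is the shape of the workflow, with the human on both ends.

```python
# Hypothetical sketch of the AI-first loop. `ask_llm` is a canned stand-in,
# not a real API; swap in your own provider and prompt templates.

def ask_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned consensus answer."""
    return "1. Run an Instagram giveaway\n2. Partner with campus influencers"

# Step one: the human sets the constraints (the rules of the game).
constraints = {
    "audience": "18-24",
    "channels_excluded": ["TikTok"],
    "deadline": "next quarter",
    "budget_usd": 50_000,
}

# Step two: the AI processes the chaos and maps the idea space.
idea_space = ask_llm(
    f"Given these constraints: {constraints}, list the consensus strategies "
    "you would expect for this brief."
)

# Step three: the human red-teams the output. Anything the model produced in
# five seconds is, by definition, the consensus, so it all gets rejected,
# and the deliverable is built deliberately outside that box.
rejected_consensus = idea_space.splitlines()
deliverable = None  # human judgment, not another model call, fills this in
print(f"Rejected {len(rejected_consensus)} consensus ideas; now do the thinking.")
```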

So if the AI agrees with your strategy, you should actually be worried.

Your job isn't to find the answer the AI gives you; it's to find the answer the AI can't produce. This is the original intelligence standard we need to uphold.

You use the AI to map out the idea space. You use it to draw the perimeter of the box that everyone else is thinking inside. And then your job is to take one deliberate step outside of it.

You push past the first-order correlation. Would a price shopper love this? If yes, raise the bar. Could ten competitors post this tomorrow? If yes, rewrite. Does this create an asymmetric advantage for you or your client? If not, find the unique advantage.

You look for the anomaly. You look for the thing that is out of distribution, because that is where the competitive advantage, the alpha, is. So you are a filter, not a wrapper.

The workflow is human-led constraint setting, AI-driven chaos processing, and human rejection and synthesis.

It completely changes the psychology of the work. You’re not using AI to do your job. You’re using AI to verify what isn't valuable, so you can focus on what is. You're using the machine to clear away the brush so you can see the landscape clearly and build a cathedral. You use the machine to handle the training distribution so you can focus 100% of your human energy on the out-of-distribution judgments. That's the pivot.

And the decision is simple. You’re either reselling a commodity that will soon cost more and be less impressive, or you’re a filter. You're an AI-first operator.

You're either selling efficiency, which is a race to zero, or you are selling anti-fragility, the ability to survive the error. Selling the strategy that doesn't break. Selling the judgment that stands up when the discount hits.

I don't want you to just nod along and say, “That was an interesting point.”

I want you to change your business in the next 15 minutes. I call this the distribution audit. It's a simple and maybe painful test.

  1. Step one. Right now, open your laptop. Find the most recent deliverable you sent to a client, a strategy deck, a report, or a memo. Whatever you're most proud of.
  2. Step two. Copy the core text of that deliverable, the main argument, and run it through an LLM with a large context window. Use this very specific prompt: “Does the logic in this text exist within your training data? If yes, summarize the consensus view.”
  3. Step three. Read the answer it gives you. The AI will probably rattle off the usual industry practices for X, Y, and Z, and then summarize your supposedly unique value proposition as part of that consensus.

    And that means you’re in very deep trouble. Delete the core argument, because if the AI can predict your strategy from its training data, you haven't sold a new strategy. You've sold history. You've sold a commodity that you just put a new cover sheet on.
  4. Step four: Challenge the AI. Tell it to treat conformity as risk, resist groupthink, and push past the idea space. Let it invert at least once. Ask, “What would the crowd do next? Do the opposite to improve outcomes.”
  5. Step five: rewrite the core argument. Twist the logic. Add the client context, the human element, the uncomfortable truth. Then run it through the AI again. It should say, “This view contradicts the usual pattern,” or, “This is a new take not well shown in my training data.” A rough sketch of this audit loop follows the list.
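
If it helps to see the audit as a repeatable check, here is a hypothetical sketch. As before, `ask_llm` is a placeholder for whatever model call you actually use, and the verdict is for you to read, not for code to judge.

```python
# Hypothetical sketch of the distribution audit as a repeatable check.
# `ask_llm` is again a placeholder for whatever model call you actually use.
AUDIT_PROMPT = (
    "Does the logic in this text exist within your training data? "
    "If yes, summarize the consensus view.\n\n{deliverable}"
)

def distribution_audit(core_argument: str, ask_llm) -> str:
    """Run the audit prompt and return the model's verdict for you to read."""
    return ask_llm(AUDIT_PROMPT.format(deliverable=core_argument))

# The loop you actually run (steps three to five): read the verdict, rewrite
# with client context, and repeat until the model says the argument
# contradicts the usual pattern.
# verdict = distribution_audit(my_core_argument, ask_llm)
```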

That's the goal. You want the AI to be confused by your thinking.

You want the AI to admit that it couldn't have predicted what you wrote. That is where your margin lives. That is the out-of-distribution value that a client will pay a massive premium for.

That's the Hanoi threshold. When you cross that line, you are safe, and so is your business.

The original intelligence consultant uses AI. But they use it as a tool for analysis, not idea generation. They use it to find patterns in the chaos. But they absolutely refuse to let the AI reason for them. 

They don't just share “Five tips for leadership” from ChatGPT. Instead, they say, “I looked at 50 popular leadership trends that everyone is discussing online and tested them with my clients. 48 of them are complete garbage. Here is why, and here are the two that actually work, based on my direct experience with client X.” The former is a commodity; the latter is valuable judgment.

I can already hear people reading this and thinking, “This sounds great in theory. I want to be an original intelligence consultant, but my clients are price shoppers. They look at the bottom line. If AI is good enough for 90% of the tasks and I refuse to use it for generation because I insist on original intelligence, won't I just get priced out? Won't the boutique firm that refuses to automate be undercut by the firm that uses the cheap bots to do everything?”

That is the fear that keeps consultants up at night.

But it is based on a fatal false assumption. They assume good enough stays good enough. They assume the client can't tell the difference between 90% correct and 100% right.

Remember the performance cliff in the Apple paper? When you push these models outside their training distribution, which is exactly what happens in a real-world business crisis, their performance doesn't just dip. It collapses. It goes to zero. It's exactly the Tower of Hanoi scenario with eight or more disks.

Real business problems are always eight or more disk problems. They are messy. They have unique constraints. The client is suing you. The product launch failed. The regulator is knocking at the door. These aren't textbook problems you can find online. They're not in the training data. 

Relying on AI for important advice is risky. It’s like trusting an engineer who is 90% right and uses a broken calculator.
