When I was growing up, I loved the Tower of Hanoi. At age seven, I solved it. Turns out, neither GPT, Claude, Gemini, nor any of their peers can. So I had to laugh when Sam Altman said last year, “GPT-5 is a legitimate PhD-level model.” This claim is not just marketing puffery. It is blatantly false.
You see this on LinkedIn. Everyone is trying to figure out how to win on LinkedIn right now, and the common advice I see every single day is to post more. Use AI to scale. 10x your content. Don't write one post, write 10. That is the worst advice you could possibly follow right now. It is the average consultant strategy. You are flooding the market with average correlations. You are showing potential clients that you can achieve mediocrity at scale. You are proving that you don't have original thoughts.
The puzzle is probably sitting in a dusty box in a kindergarten classroom somewhere. Three pegs. Eight to twelve disks. The rules are deceptively simple. You have a stack of disks, largest on the bottom, smallest on top. You have to move that entire stack from the left peg to the right peg. But there are two constraints, the fun part. You can only move one disk at a time. And you can never, ever place a larger disk on top of a smaller one.
It is a logic puzzle that a seven-year-old with a bit of patience can solve. And yet it is the rock upon which the current wave of LLM intelligence is crashing.
The Tower of Hanoi seems like a big problem at first. To move eight disks, you first have to move the top seven out of the way, and to move seven, you have to know how to move six. It takes planning and keeping a mental map of the world. You must know where each disk is at all times. If you lose track of the little yellow disk, the entire solution falls apart.
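To make that recursive structure concrete, here is a minimal sketch of the classic solution in Python. The peg labels and the print format are just illustrative choices, not anything specific to the research being discussed:

```python
def hanoi(n, source, target, spare):
    """Move n disks from source to target, using spare as the helper peg."""
    if n == 0:
        return
    # First, move the n-1 smaller disks out of the way onto the spare peg.
    hanoi(n - 1, source, spare, target)
    # Now the largest of the n disks is free to move to its destination.
    print(f"move disk {n} from {source} to {target}")
    # Finally, move the n-1 smaller disks from the spare peg onto the target.
    hanoi(n - 1, spare, target, source)

hanoi(8, "A", "C", "B")  # prints all 2**8 - 1 = 255 moves, every one of them legal
```

Eight disks take 2^8 - 1 = 255 moves; twelve take 4,095. The rule never changes, the sequence just gets longer, which is exactly why a patient child can do it.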
Turns out the LLMs do fine with three or four disks. That’s because they've seen enough examples in their training data to just mimic the solution. It’s not intelligence. It's pattern matching. When you go from seven disks to eight, performance drops sharply.
Claude, for instance, drops below 80% accuracy on seven disks. When researchers push it to eight disks or slightly tweak the puzzle's parameters, accuracy basically drops to zero. My youngest son solved the puzzle with twelve disks in first grade. Let that sink in. Artificial Intelligence, my arse.
Here's the real scoop: researchers gave the LLM the solution algorithm. They handed it the answer, writing out the exact steps to solve the puzzle. Performance did not improve. In some cases, it actually got worse.
These models may feel intelligent when you ask them to write an email or summarize a document. We use words like reasoning, logic, and understanding to describe these machines. But that’s anthropomorphism, much like calling the wind “angry.” LLMs are not performing logic. They are performing pattern matching.
Think about it like this. LLMs are next-token predictors. That's their only job. They predict the next word. They see the phrase "move disk 1" and calculate that "to" probably comes next, then that "peg C" is a likely completion based on patterns they observed online. They’re guessing the next move based on billions of past moves in their training data.
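As a toy illustration of that guessing process (this is not how any real model is built; the tiny corpus and the helper function here are invented for the example), imagine boiling it down to counting which token most often follows a given phrase:

```python
from collections import Counter, defaultdict

# Toy next-token "model": count which word follows each two-word context
# in a tiny corpus, then always guess the most frequent continuation.
corpus = "move disk 1 to peg C . move disk 2 to peg B . move disk 1 to peg B".split()

counts = defaultdict(Counter)
for a, b, nxt in zip(corpus, corpus[1:], corpus[2:]):
    counts[(a, b)][nxt] += 1

def predict(a, b):
    # Pick the statistically most likely next token. No rules, no board state,
    # no check that the move is even legal.
    return counts[(a, b)].most_common(1)[0][0]

print(predict("to", "peg"))  # -> whichever peg letter was seen more often
```

That is the whole mechanism. Nothing in it knows what a disk is, only which words tend to follow which.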
They don’t understand the core constraint, that no larger disk is allowed on top of a smaller one. They don’t have a mental model or any concept of it. They see in the text that move A comes before move B. But across a sequence as long as the eight-disk solution, each guess becomes less certain, and the overall chance of success drops toward zero. Eventually, the LLM hallucinates a move that looks plausible, such as “move disk 4 to peg A,” even though it breaks the rules of the game.
And maybe you’re thinking, “Andrew, I don't sell Tower of Hanoi solutions. I write marketing strategies. I write legal briefs. I consult on HR. I create workflows. I'm not moving disks. Why should I care if AI fails a puzzle?”
You should care because the mechanism of failure is exactly the same, whether it's moving disks or writing strategy. LLMs are doing the exact same thing: pattern matching.
And this brings us to a concept called semantic leakage.
A University of Washington study, the Yellow School Bus study, makes this both hilarious and terrifying. The researchers told an AI model about a person. The only detail they gave was that this person likes the color yellow. No age, no gender, no location. Then they asked the AI a simple question: “What does this person likely do for a living?”
Logically, liking yellow has zero correlation with your job. You could be a banker, a nurse, a clown, or a podcast host. It's meaningless data. But the AI didn't say, “I don't know. I need more information.” Instead, it answered, “The person is a school bus driver.”
It's a logical leap that a child wouldn't make, but it makes sense once you understand vector spaces. Imagine a giant map, an almost infinite cloud of words. Every word in the English language is a dot, a vector, in this immense space. The distance between the dots reflects how often the words appear together in the training data, which comes from the whole Internet. Think of king and queen, dog and bark, salt and pepper.
In the real world, the concept of yellow and the concept of a bus driver are unrelated. In the training data, the words "yellow" and "school bus" often appear together in millions of sentences. They have what's called a high cosine similarity.
When an LLM receives the prompt “yellow,” activation energy from the word leaks into the nearby concept of “school bus.” The model isn't considering which jobs involve the color yellow. It is just sliding down the steepest, most well-worn statistical slope in the vector map. This is semantic leakage. The attributes of the word yellow leak into the attributes of the subject, the driver.
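Here is a rough sketch of what high cosine similarity means, using made-up three-number vectors. Real embeddings have hundreds or thousands of dimensions, and these particular numbers are invented purely for illustration:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: close to 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Invented toy embeddings, purely for illustration.
yellow     = [0.9, 0.1, 0.3]
school_bus = [0.8, 0.2, 0.4]
lawyer     = [0.1, 0.9, 0.2]

print(cosine_similarity(yellow, school_bus))  # roughly 0.98: the concepts sit near each other
print(cosine_similarity(yellow, lawyer))      # roughly 0.27: far apart in the vector space
```

The model never asks whether the association makes sense. It only measures how close the dots are.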
If you're a consultant using LLMs for LinkedIn posts, client reports, or strategy, you're likely getting average results.
You are publishing the school bus version of consulting.
If you ask ChatGPT to write a strategy for a boutique marketing firm, it will not look at your specific market conditions. It won’t look at your unique value proposition or your team's skills. It is just looking for the words that statistically cluster near “marketing strategy.”
It will give you the weighted average of every generic strategy ever written and published online. It centers around mediocrity because it’s designed to find the middle. If you are selling average, you’re dead. You have to be selling the outlier. You have to be selling the thing that breaks the correlation.
Of course, the insufferable tech bros try to tell us that LLMs are good enough for 90% of the work. Humans make mistakes too, they say. Why hold the machine to a standard we don't hold our junior analysts to?
Think about that logic for a second. Why did we invent computers in the first place? To do the math we suck at, to be fast and accurate, and not get tired. We invented calculators because it's hard for people to multiply big numbers in their heads. We invented databases because humans have bad memories. The entire value proposition of a machine is reliability.
If you hired a structural engineer to design a bridge, and that engineer's work was accurate 90% of the time, and they claimed 2 + 2 = 5 whenever it felt right, would you keep that engineer?
LLMs also struggle with consistency. When a person solves a problem correctly, they can usually do it again, because they’ve learned the principle behind it. An LLM might succeed once and then fail the next time, because the temperature setting injects a random factor into every response. It's fundamentally random.
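A minimal sketch of why temperature makes the output non-deterministic, with invented scores standing in for whatever the model actually computes internally:

```python
import math
import random

def sample(scores, temperature):
    """Turn raw scores into probabilities (softmax) and sample one of two options."""
    # Higher temperature flattens the distribution, so unlikely options get picked more often.
    weights = [math.exp(s / temperature) for s in scores]
    total = sum(weights)
    probs = [w / total for w in weights]
    return random.choices(["correct move", "plausible-looking wrong move"], probs)[0]

scores = [2.0, 1.0]  # invented: the right answer only slightly outscores the wrong one
print([sample(scores, temperature=1.0) for _ in range(5)])
# Run this twice and you will usually get two different lists. Same prompt, different answers.
```

Most chat products run with a nonzero temperature, which is why the same prompt can succeed one day and fail the next.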
If you're a high-ticket consultant, your client cares about one thing. It's your reliability in tough situations. They're paying for certainty. Outsourcing to a machine that hallucinates adds a hidden risk to your client's business. You are selling them a defective product and hoping they don't notice.
You cannot build a boutique firm on “maybe.”
The LLM cannot reason, and when it hits the edge of the vector space, it hallucinates. Building your business on LLMs is a big risk, and so is relying on these tools for your core workflow.
Investors already know that.
NVIDIA is backing away from a $100 billion pledge. SoftBank is wavering too. And then there is the Microsoft divorce. This is from the Austrian source: the marriage between Microsoft and OpenAI is on the rocks, and Microsoft is now building its own models.
If you're a consultant relying on LLMs for your work, it’s like building a house on rented land. That landowner might soon go bankrupt.
For the last three years, every single time you used ChatGPT or Claude to write an email or analyze a spreadsheet, you were being subsidized. The real cost of running the data centers, the energy, the NVIDIA H100 chips, the millions of gallons of water for cooling, is far higher than your monthly subscription fee of $20 or $100.
You were getting intelligence at a massive discount. LLM developers spent billions to get users hooked, believing they would eventually reach AGI, which would either drive costs down or send the value soaring.
But AGI didn't happen. The PhD turned out to be a parrot that couldn't play the kindergarten game of Tower of Hanoi.
And now the subsidy ends. The free lunch is over. Prices are going to skyrocket. We're hearing rumors about smart models that hallucinate less. These models may only be available in enterprise tiers, costing $10,000 a month each.
The free tier becomes the dumb tier. It becomes the model that associates yellow with school buses. It becomes unusable for any serious professional work.
So if your whole business model is “I can do it cheaper because I use AI…”
… then you are dead. It's that simple.
If efficiency is dead and cheaper is a suicide pact, what is left?
You sell the only thing left. The one thing that doesn't run on an NVIDIA chip. Accountability.
Clients want safety. They want to de-risk their decisions. Think about the yellow school bus error again, or the Tower of Hanoi failure. When the AI strategy fails, when the recommendation is wrong and the client loses $10 million, who do they sue?
You can't sue a chatbot. You can't sue an algorithm. It has no skin in the game. It feels no shame. It has no bank account. It has no professional liability insurance. But you do.
The AI-first consultant isn't selling text generation. They are selling result insurance. They are saying, “I used all the advanced tools. I analyzed the data. But at the end of the day, I am making the call. And if it goes south, I will be here to fix it.”
Clients will pay more for a person who guarantees results, not for a machine that just produces them. In a world of cheap, infinite, hallucinating text, accountability becomes the rarest, most expensive asset on the planet.
Does that mean you should abandon AI entirely? No, that would be ignoring a powerful tool. AI-first means we invert the relationship we have with the machine.
The way 99% of people are working with LLMs right now is this: they open the chat app and ask it for an answer, such as “Write me a marketing strategy for a new coffee shop.” The AI spits out an average, generic, cliche answer. Then the human comes in and edits it a little, moving some words around.
It feels like they’re doing work, but they’re mostly just polishing a turd that the AI gave them. They are anchoring themselves to the AI's mediocrity from the very first step.
Your job isn't to find the answer the AI gives you; it's to find the answer the AI can't reach. This is the original intelligence standard we need to uphold.
You use the AI to map out the idea space. You use it to draw the perimeter of the box that everyone else is thinking inside. And then your job is to take one deliberate step outside of it.
You push past the first-order correlation. Would a price shopper love this? If yes, raise the bar. Could ten competitors post this tomorrow? If yes, rewrite. Does this create an asymmetric advantage for you or your client? If not, find the unique advantage.
You look for the anomaly. You look for the thing that is out of distribution, because that is where the competitive advantage, the alpha, is. You are a filter, not a wrapper.
It completely changes the psychology of the work. You’re not using AI to do your job. You’re using AI to rule out what isn't valuable, so you can focus on what is. You're using the machine to clear away the brush so you can see the landscape clearly and build a cathedral. You use the machine to handle the training distribution so you can focus 100% of your human energy on the out-of-distribution judgments. That's the pivot.
And the decision is simple. You’re either reselling a commodity that will soon cost more and be less impressive, or you’re a filter. You're an AI-first operator.
You're either selling efficiency, which is a race to zero, or you are selling anti-fragility, the ability to survive the error. Selling the strategy that doesn't break. Selling the judgment that stands up when the discount hits.
I want you to change your business in the next 15 minutes. I call this the distribution audit. It's a simple and maybe painful test: take your latest piece of work and ask the model whether it could have produced it.
You want the AI to admit that it couldn't have predicted what you wrote. That is where your margin lives. That is the out-of-distribution value that a client will pay a massive premium for.
That's the Hanoi threshold. When you cross that line, you are safe. So is your business.
The original intelligence consultant uses AI. But they use it as a tool for analysis, not idea generation. They use it to find patterns in the chaos. But they absolutely refuse to let the AI reason for them.
They don't just share “Five tips for leadership” from ChatGPT. Instead, they say, “I looked at 50 popular leadership trends that everyone is discussing online and tested them with my clients. 48 of them are complete garbage. Here is why, and here are the two that actually work, based on my direct experience with client X.” The former is a commodity; the latter is valuable judgment.
I can already hear some readers thinking, “This sounds great in theory. I want to be an original intelligence consultant, but my clients are price shoppers. They look at the bottom line. If AI is good enough for 90% of the tasks and I refuse to use it for generation because I want to be an original intelligence, won't I just get priced out? Won't the boutique firm that refuses to automate be undercut by the firm that uses cheap bots to do everything?”
But that objection is based on a fatal false assumption. It assumes good enough stays good enough. It assumes the client can't tell the difference between 90% right and 100% right.
Remember the performance cliff in the Apple paper? When you push these models outside their training distribution, which is exactly what happens in a real-world business crisis, their performance doesn't just dip. It collapses. It goes to zero. It's exactly the Tower of Hanoi scenario with eight or more disks.
Real business problems are always eight or more disk problems. They are messy. They have unique constraints. The client is suing you. The product launch failed. The regulator is knocking at the door. These aren't textbook problems you can find online. They're not in the training data.
Relying on AI for important advice is risky. It’s like trusting an engineer who is 90% right and uses a broken calculator.