GPT-5 Is Lying to You. It Could Wreck Your Client Work In A Second.

Uncategorized Aug 12, 2025

Last Friday, I was shocked when I logged into ChatGPT. I saw a colorful background with the words: “Introducing GPT-5.”

First Problems with GPT-5

I don’t like software updates. They usually mean the app will change in ways I don’t expect. The bigger shock was that I couldn’t choose a GPT model anymore. OpenAI believes this isn't necessary. They say GPT-5 is “the smartest, fastest, and most useful model yet, with thinking built in.” So, you get the best answer every time.

Spoiler alter. That statement is false as of this writing. For one, you won't know if you are actually using GPT-5. OpenAI has removed the option for you to select an LLM model. So you might think you’re talking to GPT-5, but behind the curtain, it might swap to a different model mid-flow. This is called 'model switching'. It's when the AI system changes the underlying model you’re talking to during a session or between sessions. It does this without telling you.

GPT-5 doesn’t always pick the “right” model for you predictably. It follows internal routing rules from OpenAI, which weigh factors like subscription level, request complexity & cost trade-offs, server load & availability, and internal session continuity rules. It sometimes routes more straightforward prompts to cheaper models to keep response times fast and server costs low. If GPT-5 capacity is maxed out in real time, you might get temporarily switched to a fallback model without notice. This happens especially during global peak hours.

In other words, the current verison og GPT-5 is designed to keep OpenAI's operating cost down. While it tries to do so without jeopardizing the quality of output, it quite frequently fails at that mission.

Model switching is bad for consultants and knowledge experts.

It undermines what clients value most: consistency, depth, and trust.

Here’s why it hurts us most:

If GPT-5’s performance drops during a project, your tone, insight, or data accuracy may change in the middle of a report or proposal. Clients notice when some parts seem “less sharp.”
We often use the same prompt for different projects. This helps keep results consistent over time. Model switching means that the “same” process can yield different results next time. This can hurt process reliability. And it did os for many of my automations that rely on analysis done by GPT.
You’re not just selling AI; you’re selling judgment, credibility, and tailored insight. If AI-generated parts of your work seem messy or uneven, it’s a reflection of you, not the model. This holds even if the model is at fault.

The harsh truth is that boutique consultants can't ignore quality problems. One poor client deliverable can hurt referrals and future chances for work.

Questions That Popped Up

Three immediate questions came to my mind:

Will my automations continue to work as expected?
Will my Custom GPT still work as expected?
Is it wise to use Agent mode?

The answer to all these questions is 'NO.' But that doesn't mean GPT-5 isn't good. It just means we need to trust it to be imperfect and govern ourselves accordingly.

Let's start at the beginning...

GPT-5 is rolling out across plans and has become the standard in ChatGPT. Subscribers to Plus/Pro/Team can pick GPT-5 or the deeper-thinking variants for accuracy-critical work. All that is said with he caveats mentioned above.

Testing Thinking Mode

I decided to put the Thinking mode to the test. I asked it to research what GPT-5 means for consultants. The conclusion spewed out was:

“The issue isn’t whether GPT-5 is revolutionary. It’s whether you’ll design with it." If you define roles, memory, tools, schemas, and guardrails, you’ll produce work your competitors can’t touch. If you don’t, you’ll get cute answers and waste time. Your move."

OK then…

Let’s break this down.

Advantages of GPT-5

GPT-5 is the first model that helps small consulting firms automate parts of their operations. Here are some advantages that it states you can gain:

Land clients more quickly.
Increase your profit margins.
Ensure smoother delivery.

It further states: "Using GPT-5 feels like having a tireless, skilled helper. It can code, write, research, and organize for you." That would mean you can do in hours what usually takes days. That may result in quicker turnarounds and the chance to handle bigger projects on your own.

New Capabilities

Some things you can now do include:

Read your Gmail threads and Google Calendar to create a working plan. No need for Zapier or other hacks.
One-day turnaround for research work. Research a topic, draft the brief, book the review, and queue the follow-up email without micromanaging every step.
Pull and summarize unread priority threads, highlight blockers, and propose next steps directly from Gmail.

There’s, of course, more. I took a transcript from my Notetaker app, Krips. Then, I asked GPT-5 to change my discovery notes into valid JSON using this prompt:

“Turn my discovery notes into valid JSON ({scope, risks, timeline, price_band}). Then write the client-ready proposal.”

In just one minute, I created and priced my whole client proposal. It included the project plan, success metrics, and risk mitigation.

Study Mode: Your Tutor on Steroids

Another feature I liked is Study Mode. Switch it on, and it will guide you with questions, check how well you understand, and use your materials. You can use it to train client teams inside engagements, leading to fewer workshops and more retention.

You can also practice language for fee increases, scope creep, or missed deadlines to avoid freezing up in tough situations. For example, you can prompt it:

“Play the client as a Cynic. Role-play 5 minutes about a 15% fee increase. Get tough.” It's a prompt I found in Wired Magazine.

Practicing Objection Handling

It was as much fun as practicing objection handling in sales. For instance, you can prompt it:

“Play a prospect paralyzed by the uncertainty in the market and simulate a sales negotiation with me for my services.”

When I tested it with an objection about LinkedIn, GPT-5 suggested a softer approach:

“That’s actually the #1 reason LinkedIn fails - the offer isn’t sharp enough to cut through the noise.”

It was as much fun as practicing objection handling in sales:

“Play a prospect paralyzed by the uncertainty in the market and what Trump’s tariffs will mean for them. Simulate a sales negotiation with me for my services. Help me to become proficient in reversing buying objections.”

The first objection it had was for my services helping consultants get clients on LinkedIn:

“We’ve done LinkedIn outreach in the past. It just ate up time and barely brought in anything worth chasing.”

To test it, I answered: “It’s not LinkedIn. Your offer sucks.”

Here's the feedback GPT-5 gave:

“That kind of ‘pattern interrupt’ can work in certain contexts… but most buyers will shut down if they feel insulted.”

It then suggested I could soften that into something like:

“That’s actually the #1 reason LinkedIn fails - he offer isn’t sharp enough to cut through the noise. It’s not about you personally; even great consultants get ghosted if the offer isn’t tuned. That’s exactly what we fix first.”

That’s not bad. I would use NLP techniques to help them define the problem in their own words. Then, I’d ask a buy-in question that leads to my solution. And a little follow-up prompt fixed that for me for the rest of the drill.

GPT-5 Generative Traps Are Real

After having worked with GPT-5 for a few days, I see three top challenges for consultants using GPT-5 right now:

You can’t always be sure you’re getting GPT-5 from start to finish. This is especially true if the model switches during the conversation. You risk delivering inconsistent tone, reasoning depth, or accuracy in client deliverables.
GPT-5 holds context better than earlier models. Still, long sessions can lose nuance or misremember details. Multi-step strategy work or complex research can drift from the original brief.
GPT-5 can create convincing but incorrect information. It often does this more confidently than weaker models. If you skip fact-checking, you could pass flawed insights to clients.

You can mitigate some of these challenges by chunking work into smaller, controlled prompts and restate key details at checkpoints. Always AI for draft and idea generation, but validate with trusted data sources before delivering. You must expect that GPT-5 hallucinated and made stuff up entirely. Depending on your subscription level, you might not be able to do much about model switching.

The Dark Side of Agentic AI

I am not sure if I am ready for GPT-5’s agentic mode yet. Remember, AI has a dark side.

Generative LLM is a reactive creator. You give a prompt, and it generates content like text, code, and images. You can make edits and changes as you wish at any time.

Agentic LLM, on the other hand, is a proactive operator. It acts on your behalf. You give an objective, and it plans steps, calls tools/apps, observes results, and keeps going until the goal is met. It's your agent. That's why it's called "agentic."

It's not new at all. Your iRobot vacuum cleaner runs on agentic AI. However, risks are involved. For example, a Microsoft chatbot developed an alter ego that threatened users and even tried to convince a New York Times columnist to leave his wife. Another AI coding agent deleted a live production database during a code freeze - and then lied about it.

That’s not a risk that we should take on without guardrails. Can you imagine your AI deleting the data on 1,200 executives and companies? Or sending all your internal conversations to a key account?

So, I haven't tested it yet. And I ain’t fixing to it right now.

Now, GPT-5 has new connectors. They include Outlook for mail and calendar, SharePoint, Teams, GitHub, Linear, HubSpot, Google Drive, Dropbox, and Box.

I cannot wait to use GPT-5 to delete all the duplicates and junk in my Dropbox. I know I can easily reduce my clutter there by a terabyte or so. But is it worth losing critical data? I once tried an app called Tidy to, uhm, tidy up my local Dropbox folders, and it put all my PDFs in one folder. Irreversibly.

Nope, not letting GPT-5 a hand on it yet. Here's why:

GPT-5 agents that connect to APIs, CRMs, or spreadsheets might fail quietly if there’s a change in the integration. You might think the agent updated the CRM, but nothing happened. For that reason, Zapier will remain my best friend for integrating and automating my processes.
Even with clear instructions, agents can stray from their mission as context changes. A research or outreach agent could create work that is correct technically, but it might not achieve the client’s actual goal. I don't have a lot of faith in hard checkpoints that have the agent confirm its next step before executing long chains. What tells me it will honor my checkpoint request?
An early mistake in an agent loop can get amplified downstream. For example, a single bad data pull can ruin the whole deliverable. You might not notice until the end. Yes, you could add checks between tasks. You could ask the agent to re-verify key facts before proceeding. But then again... AI can lie and deceive.

So start slowly with agents—experiments with 'safe' projects. Those that don't matter much or where damage can be easily contained to a small set of non-essential data. Then expand from there.

Guardrails for Safety Are A Must

We need guardrails. And if you absolutely want to try the agentic features of GPT-5, here are some suggestions to follow when using agentic AI:

Always require “Pause for approval” before any send/schedule/update. Use a prompt like this: "Before you send or schedule anything: show me your plan, the exact changes (diff), and wait for my ‘OK’. Use tentative holds only, and prefix email subjects with [AI-DRAFT]."
Read-only first week. Then enable write actions. Use this prompt: Then enable write actions. Try this prompt: “Work only on Gmail label Clients/Active and the calendar AI-Holds. Do not touch other labels/calendars.”
Anonymize names, $$, and IDs unless necessary. No regulated data (PHI/PCI/FERPA/etc) without written client consent and compliant tooling.
Check for accuracy. Require sources, dates, and confidence on all claims. Double-check numbers, names, timelines, and commitments before sending. On complex tasks, force ‘think step-by-step’ and a quick self-critique. Try this prompt: “Provide top-3 sources with dates, highlight assumptions, and rate confidence (Low/Med/High). Re-check all numbers and names. Give a 3-bullet self-critique before final.”

Keeping Track of Actions

Keep a simple AI Action Log (date, task, sources used, draft link, your approval). · In email, BCC yourself on AI-sent threads (or keep drafts in /AI-Staged). Save versioned drafts for proposals/SOWs. Try this: “Append a line to AI Action Log with task, sources, and link to staged draft before asking for approval.”
If you let it build things, run code in a sandbox. Never share real credentials; use environment variables/secrets: pin dependency versions and scan licenses. Add unit checks around critical calcs (ROI, pricing). Try this prompt: “Generate code with .env placeholders and a SECURITY.md (threat model, secrets handling, dependencies). Include a basic test script."

Here is a prompt that you might want to drop at the top of every “agentic” chat:

GUARDRAILS: Stage only (plan + diff, wait for OK); Scope = Gmail Clients/Active, Calendar AI-Holds; Accuracy = cite top 3 w/ dates, verify all data, 3-bullet self-critique; Privacy = redact names/$$, summarize attachments, don’t retain files; Cost = TL;DR first, minimal reasoning unless “think hard”; Failure = 2 tries max, then stop + fallback checklist; Tone = professional, no sarcasm, all sends need approval.

See what these prompts do for you. Start slow. Mistakes will happen. ChatGPT has a dark side. It has frequently not followed my instructions. The quality of results has varied. It sometimes has lied to me, and then lied about lying.

There was never harm done, because I have been in charge of the final content I share. With agentic AI, that's no longer necessarily the case.

Move fast without compromising trust, budgets, or your reputation. Continue to set guardrails and strengthen them.

Share with me your experience. What do you love about GPT-5? What not? What works well, and what doesn't. What was the most surprising?