Claude is confident when it shouldn't be.
I have a long list of rules about how Claude and I work together. Don't do this. Always do that. Check this before that.
The rules aren't policy. They're not best practices. They're not the result of careful design.
They're scar tissue.
I work with Claude in production. Real systems, real data, real customers. The bar isn't "plausible answer." The bar is "right, or we have a problem."
Every rule on the list exists because something specific broke. Code that wiped data because Claude didn't understand the API. An audit that reported clean on records it never actually checked because Claude didn't notice the API response was paginated. Each time, I had to clean up the mess. Each time, I added a rule to make sure it didn't happen again.
The rules sometimes work. When Claude reads them and applies them, they prevent the failure. The system from my last post exists in part to keep these rules in front of Claude, session after session.
The rules also get ignored. Daily. I have to pause Claude and tell it to check its memory and its rules at least once a session, often more. The rules being in the system isn't the same as the rules being followed. What it does mean is that when Claude misses a rule, I have something to point at to get back on track.
I started writing rules because I almost got fooled. Claude's confidence read as competence, and I caught myself buying it. The first time I lost data because I trusted output that sounded right, I realized confidence wasn't a signal I could rely on.
AI is confidently wrong
Here's the one thing about AI that nobody warns you about loudly enough: AI doesn't know what it doesn't know. And it sounds exactly the same when it's right as when it's wrong.
A traditional system fails loudly. You write bad code, the compiler tells you. You query a missing column, the database tells you. You misspell a function name in Excel and the cell shows #NAME? the moment you hit enter. The system has opinions about correctness and it expresses them.
"The output is presented with the same calm confidence whether it's right or wrong, and Claude doesn't know which."
Claude has no such mechanism. Claude produces output. The output looks fine. The output sounds fine. The output is presented with the same calm confidence whether it's right or wrong, and Claude doesn't know which.
My rules exist so I don't find out Claude was wrong three hours later. Or three days later.
A specific example
I asked Claude to add a pipeline filter to one of our HubSpot workflows. The kind of edit that should take a few seconds. Claude wrote the code. Told me what the call would do. Looked right. I gave the green light. Claude ran it. The response came back clean.
I tested it immediately. Created a test company. Moved it through the stages. Nothing happened. The lifecycle status writes that were supposed to fire never did.
I checked the workflow. It still existed. Enrollment criteria looked fine. But the actions were gone. All twenty-eight of them. Across all four workflows we had just modified. The action graphs that did the actual work had been wiped to zero.
What had happened: Claude had used a PUT request to update the workflow with a partial payload. PUT replaces. The payload didn't include all the existing fields. HubSpot interpreted the missing fields as "set these to empty," not "leave them alone." Every workflow we touched had been reduced to an empty shell that would enroll records and do nothing to them.
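For anyone who wants the mechanics, here's a minimal sketch of the failure mode. The endpoint and field names are made up for illustration, not HubSpot's actual API; the point is what PUT does with a partial payload.

```typescript
// Illustrative sketch only: the URL, payload shape, and field names are
// hypothetical. The PUT semantics are the point.

// What the workflow looked like before the edit (simplified).
const existing = {
  name: "Lifecycle sync",
  enabled: true,
  enrollmentCriteria: { pipeline: "sales" },
  actions: [/* the 28 actions that do the real work */],
};

// The "small edit": add a pipeline filter. The payload only carries
// the fields being changed.
const partialUpdate = {
  enrollmentCriteria: { pipeline: "sales", stage: "qualified" },
};

// PUT replaces the whole resource with the payload. Fields that aren't
// in the payload -- like `actions` -- aren't left alone; they're gone.
await fetch("https://api.example.com/workflows/123", {
  method: "PUT",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(partialUpdate),
});

// The server now holds a workflow with enrollment criteria and no actions:
// an empty shell that enrolls records and does nothing to them.
```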
Recovery was a four-hour ordeal.
The rule that came out of this is short. Never PUT an existing workflow. Not to update, not to enable, not for anything. Build new, test, delete old. The rule is now in the knowledge system. Claude follows it most of the time. The mistake hasn't happened again yet.
But notice what the rule isn't. It isn't insight. It isn't expertise. It's a specific catch for a specific failure that should never have happened in the first place. I had to lose work to learn something Claude should have known.
Every rule has a scar behind it
I could walk through dozens more of these. An audit comparing records between systems that reported everything in sync because Claude only checked the first 20 records and didn't notice the API was paginated. API calls that retried in tight loops and locked us out because Claude didn't know about the rate limit. JavaScript that ran fine in Node but broke in the browser because Claude didn't think to check the runtime.
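The pagination one shows how small these catches are. Here's a minimal sketch of the check that prevents it, with a made-up cursor-paging shape standing in for whatever the real API returns:

```typescript
// Illustrative sketch: the cursor-paging shape is hypothetical, standing in
// for whichever pagination scheme the real API uses.
interface Page<T> {
  results: T[];
  nextCursor?: string; // present when there are more pages to fetch
}

// The failure mode: take the first page and call it the whole dataset.
// The rule: keep following the cursor until the API says there's nothing left.
async function fetchAll<T>(
  fetchPage: (cursor?: string) => Promise<Page<T>>,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage(cursor);
    all.push(...page.results);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}
```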
Every single one followed the same pattern. Confident output. Plausible-looking. Subtly wrong in a way that took someone with hands-on experience to notice. Every single one became a rule.
"The most dangerous failures don't look like failures."
The most dangerous failures don't look like failures. The rules are guardrails. They also tell you something about the gap between what AI can do and what it understands.
What kind of knowledge this actually is
The rules aren't a list of things AI got wrong. They're something else.
They're the kind of knowledge you can only get by doing the work and watching where it breaks. Not theory. Not documentation. Not best practices someone wrote up after the fact. The residue of failure. The catches I now make automatically because I've seen what happens when nobody catches them.
That kind of learning doesn't come from a manual. It can't be planned for. It can only be earned, and the cost of earning it is the failure itself.
Writing it down as rules turns that hard-won knowledge into guardrails. Every guardrail is a lesson I paid for, written in a form that aims to prevent, or at least catch, the next version of the same failure. As long as Claude actually reads it.
What it actually takes
If you're working with AI on real production work, you're going to build guardrails whether you mean to or not. The choice isn't whether to have them. The choice is whether to build them deliberately or rebuild them every time you change tools, projects, or memory.
I build them deliberately. The guardrails live in the knowledge system. They get delivered back to Claude in every session that matters. New ones get added when new failures happen. Old ones get refined when I learn the failure mode was actually different from what I first thought.
The work is unglamorous. It's also the thing that's gotten me from "AI is interesting" to "AI is producing real value in production." Without the guardrails, every session is a coin flip. That's fine in a sandbox. It's not fine when the work is going to production.
Where I landed
When I started, I trusted Claude's confidence. The first failure changed that. The rules came after. They don't make Claude more careful and they don't catch every failure. But every rule I write reduces the chances of Claude making the same mistake again. And writing them keeps me alert to where the next failure might appear.
It's not enough. The gap is still there.
But fewer get past me now.