
The Flowchart

If Opus 4.7 needs instructions that explicit, it needs its own programming language. Because it's no longer willing to infer what we mean.


The Brief

Anthropic shipped Opus 4.7 with "more literal instruction following," which broke the skill systems Claude Code users spent months building. The model scores higher on benchmarks but can no longer infer what a conversational instruction means. For practitioners who built entire workflows on the assumption that AI understands intent, the upgrade feels like a regression to programming.


Why do Claude Code skills break on Opus 4.7?

Opus 4.7 interprets instructions more literally than 4.6, especially at lower effort levels. Skills written as conversational directives, the way you'd brief a colleague, no longer trigger the inferred behavior they relied on. Honor-system gates, narrative concepts, and soft directive formats that 4.6 understood contextually now require explicit, mechanical phrasing.

What is adaptive thinking in Opus 4.7?

Adaptive thinking replaced extended thinking as the only reasoning mode in Opus 4.7. The model decides when and how deeply to think, and users cannot override it. The CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING flag that worked on 4.6 is explicitly ignored, and the budget_tokens parameter returns a 400 error. The practical result is that the model often chooses minimal reasoning unless effort is set to xhigh or max.

How do you fix Claude Code skills for Opus 4.7?

Set effort to xhigh as the baseline for non-trivial work. Replace soft directives with explicit rules and hard checkpoints. Convert narrative instructions like "organize around the discovery arc" into concrete structural templates. Raise max_tokens to 64k or higher. Remove any temperature, top_p, or top_k parameters, which now return 400 errors.

Does Opus 4.7 cost more than Opus 4.6?

The per-token price is identical: $5 per million input tokens and $25 per million output tokens. But a new tokenizer charges 1.0 to 1.35 times as many tokens for the same input text. Long-context retrieval scores dropped from 91.9% to 59.2%, which means more re-prompting. Claude Pro subscribers reported hitting their limits after roughly three questions.

I've spent the last few months building skills. Not the kind you put on a resume. The kind you write in markdown files and store in a .claude/skills/ directory, where they tell Claude Code how to do specific jobs. Voice profiles. Content pipelines. Validation chains. Image generation workflows. A skill that builds other skills. About thirty of them for this project alone, over two hundred across all my client work.

They worked beautifully on Opus 4.6. On April 17, Anthropic shipped 4.7. By the next morning, I was explaining to my AI assistant what the word "organize" means.

The Contract Nobody Signed

Claude Code isn't a chatbot. Anthropic designed it with a programming layer. Skills, hooks, agents, CLAUDE.md files. An architecture for building sophisticated workflows in natural language. They encouraged people to build on it. The documentation walks you through it. The community built entire ecosystems.

That architecture creates an implicit contract. Not a legal one. A practical one. If you're going to sell me a platform and teach me to build on it, I need to know when the foundation is about to shift underneath me.

Opus 4.7 shifted the foundation. Anthropic published a migration guide. It was thorough. It was also buried in the docs, and it didn't mention skills at all.

[Illustration: one person speaking naturally with hand gestures while another holds up a long checklist and shakes their head, red pen in hand.] Thirty skills. Zero appeared in the migration guide.

What "More Literal" Actually Means

Anthropic's official line is that Opus 4.7 follows instructions "more faithfully" than 4.6. They frame this as an improvement. In isolation, it is. Nobody wants a model that ignores what you said.

But "more literal" has a cost that Anthropic isn't accounting for.

On 4.6, I could write a skill directive like this: "Do not begin drafting unless at least 3 independent, accessible, on-topic source URLs have been verified." A sharp colleague reads that and understands the intent. Check the sources. Make sure they're real, that they're relevant, that they say what you think they say. Don't move until the ground is solid.

4.7 reads the same directive and... does exactly what it says. The problem is what it doesn't do. It doesn't infer that "verified" means actually visiting the URL and reading the content. It doesn't carry the spirit of the rule into adjacent situations. It treats the instruction as a literal gate with a literal checklist, and if the checklist passes on a technicality, it moves on.
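The gap is easy to reproduce in miniature. Here's a sketch of the difference, using a hypothetical `Source` record of my own invention: the checklist a literal reading permits versus the verification the directive actually meant.

```python
from dataclasses import dataclass

@dataclass
class Source:
    url: str
    fetched: bool = False   # did anyone actually visit the URL?
    on_topic: bool = False  # does the content support the claim?

def checklist_gate(sources):
    """What a literal reading permits: three URLs exist, so the gate passes."""
    return len(sources) >= 3

def intent_gate(sources):
    """What the directive meant: three sources, each visited and relevant."""
    verified = [s for s in sources if s.fetched and s.on_topic]
    return len(verified) >= 3

urls = [Source("https://example.com/a"),
        Source("https://example.com/b"),
        Source("https://example.com/c")]

checklist_gate(urls)  # True  -- passes on a technicality
intent_gate(urls)     # False -- nothing was actually verified
```

Same input, opposite answers. 4.6 behaved like `intent_gate` without being told; 4.7 ships `checklist_gate` unless you spell out the rest.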

This is the fundamental tension. The whole promise of AI-assisted development is that you describe intent and the model figures out execution. You talk to it like a colleague, not a compiler. Skills written for 4.6 leaned into that promise. They read like instructions you'd give someone you trust.

4.7 needs a flowchart.

What They Took Away

Extended Thinking used to be a toggle. You turned it on for hard problems, off for quick lookups. If you left it on, every question paid the thinking tax. That was expensive, but you controlled it.

Anthropic renamed the toggle. "Extended Thinking" became "Adaptive Thinking." Same button in the interface. Completely different behavior underneath. Now the model decides when to think. And it mostly decides not to.[1]

On 4.6, you could set CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING=1 to force deep reasoning. Users who discovered this reported significantly better results. So what did Anthropic do with 4.7? They made the flag do nothing. "Opus 4.7 always uses adaptive reasoning. The fixed thinking budget mode and CLAUDE_CODE_DISABLE_ADAPTIVE_THINKING do not apply to it."[2]

The budget_tokens parameter, which let API users set a fixed thinking budget? Returns a 400 error.[1]

The temperature, top_p, and top_k parameters? Also return 400 errors.[3]

One user on Hacker News, running the model on max effort, caught it confessing: "I was pattern-matching on 'mutation plus capture equals scary' without actually reading the capture code."[4] Max effort, it turns out, is aspirational.

The per-token price stayed the same. But a new tokenizer charges 1.0 to 1.35 times as many tokens for identical input. Same restaurant, same menu, smaller plates. Claude Pro subscribers reported hitting their limits after roughly three questions.[5] GitHub Copilot priced Opus 4.7 at a 7.5x premium until the end of April.[5]

Anthropic's response to the backlash? Boris Cherny, the creator of Claude Code, pointed users to the effort settings. An Anthropic PM said the team was "sprinting on tuning." Alex Albert wrote on April 18 that "a lot of bugs" from the launch were "now fixed."[5] One migration deep-dive put the situation bluntly: "re-baseline the harness, not just the prompt."[3]

Re-baseline the harness. That's a polite way of saying rebuild everything.

[Illustration: nested dolls, each holding a wrench and working on the next smaller one.] I built a tool that builds tools. Now the tools need to be rebuilt so the tool-builder's tools work on the model that can't read the tools.

The Paradox

Here's where it gets recursive. I built a skill system called Claude Enforcer. It lets me describe what I want a skill to do, and Claude creates the skill files. That's AI-assisted development working as intended. Intent in, implementation out.

Now I might need to rebuild that tool so it outputs skills in a language explicit enough for 4.7 to follow. Hard checkpoints instead of honor-system gates. Numbered flowcharts instead of narrative instructions. Mechanical validation steps instead of "you'll know what this means."

But if I have to learn that explicit language myself to verify the output, the abstraction layer collapses. I'm back to programming. The whole point of building skills in natural language was that I could read them and recognize my own intent. If the skills become flowcharts, they stop being mine. They become code that happens to be written in English.

Nobody wants to program in a programming language when they're using AI. They're using AI to do the programming. That's the deal. 4.7 renegotiated the deal without telling anyone what changed.

The Cheat Sheet They Should Have Shipped

None of this means 4.7 is useless. The benchmarks aren't fabricated. For structured, explicit, carefully prompted work, it's measurably better.[3] The problem is that "carefully prompted" now means something different than it used to, and nobody got the memo.

So here's the migration guide for skill builders. The one Anthropic should have included.

Set effort to xhigh as your baseline. Not high, which is the default. Anthropic's own documentation recommends xhigh for non-trivial agentic work. This is the single biggest lever you have. The difference between a model that thinks and a model that pattern-matches on vibes.

Replace soft directives with hard checkpoints. "Do not begin drafting unless 3 sources are verified" becomes "CHECKPOINT: Count verified sources. If count is less than 3, STOP. Report to user. Do not proceed." The model needs to see the gate as a gate, not a suggestion.
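Rendered as executable logic (my own sketch; the skill itself stays markdown), the checkpoint is nothing more than:

```python
def source_checkpoint(verified: list[str], minimum: int = 3) -> bool:
    """The gate as a gate: count, compare, stop. No inference required."""
    if len(verified) < minimum:
        print(f"CHECKPOINT: {len(verified)}/{minimum} verified sources. STOP.")
        return False  # report to the user; do not proceed to drafting
    return True

source_checkpoint(["https://a.example/post"])  # False -- drafting is blocked
```

Nothing in there rewards judgment, which is exactly the point: there's no technicality left to pass on.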

Convert narrative concepts to structural templates. "Organize around the discovery arc" becomes "Paragraph 1 opens with the moment of surprise. Paragraph 2 develops the complication. Paragraph 3 reveals the implication." You lose elegance. You gain compliance.

Raise your token headroom. 64k minimum. The model needs room to think, and if you're running complex skills, that budget is load-bearing.

Remove sampling parameters entirely. temperature, top_p, and top_k all return 400 errors on 4.7. I learned this one the hard way.
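The last two fixes can live in one helper. A minimal sketch, under my own naming: take request kwargs written for 4.6 and make them 4.7-safe. The 64k floor and the rejected-parameter list come from the cheat sheet above; everything else here is an assumption.

```python
# Parameters 4.7 rejects with a 400, per the migration notes above.
REJECTED_ON_47 = ("temperature", "top_p", "top_k", "budget_tokens")
MIN_MAX_TOKENS = 64_000  # token-headroom floor from the cheat sheet

def migrate_request_kwargs(kwargs: dict) -> dict:
    """Strip rejected sampling parameters and raise token headroom."""
    safe = {k: v for k, v in kwargs.items() if k not in REJECTED_ON_47}
    safe["max_tokens"] = max(safe.get("max_tokens", 0), MIN_MAX_TOKENS)
    return safe

old = {"model": "claude-opus-4-6", "temperature": 0.7,
       "top_p": 0.9, "max_tokens": 8192}
migrate_request_kwargs(old)
# {'model': 'claude-opus-4-6', 'max_tokens': 64000}
```

Running every request through a shim like this is cheaper than finding each 400 error one deploy at a time.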

Document the minimum effort level in every skill. If a skill does anything beyond a simple lookup, it should state what effort level it requires. I didn't do this for any of my thirty skills. I'm doing it now for all of them. Your future self will thank you.
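Concretely, that can be one line of frontmatter. The min-effort field below is my own documentation convention, not an official Claude Code key; the model won't enforce it, but whoever invokes the skill can.

```markdown
---
name: content-pipeline
description: Drafts posts from verified sources through the validation chain.
# min-effort is a documentation convention, not an official field.
# This skill assumes xhigh; below that, expect pattern-matching, not thinking.
min-effort: xhigh
---
```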

I'll make these changes to my own skills. I'll probably build a migration utility into /skill-builder so it can audit existing skills and flag the patterns that 4.7 won't tolerate. It's the responsible thing to do.
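The audit pass can start as something this small: scan every skill file for the soft phrasings 4.7 no longer honors. The pattern list and the directory layout are illustrative, a sketch of what a /skill-builder migration check might look for, not its actual implementation.

```python
import re
from pathlib import Path

# Soft phrasings 4.6 inferred and 4.7 takes literally -- illustrative list.
SOFT_PATTERNS = {
    "honor-system gate": re.compile(r"do not .* unless", re.IGNORECASE),
    "narrative concept": re.compile(r"organize around|discovery arc", re.IGNORECASE),
    "trust directive": re.compile(r"you'll know|use your judgment", re.IGNORECASE),
}

def audit_skill(text: str) -> list[str]:
    """Return the names of soft-directive patterns found in one skill file."""
    return [name for name, pat in SOFT_PATTERNS.items() if pat.search(text)]

def audit_directory(root: str = ".claude/skills") -> dict[str, list[str]]:
    """Map each skill file to the patterns it needs rewritten for 4.7."""
    return {
        str(p): flags
        for p in Path(root).rglob("*.md")
        if (flags := audit_skill(p.read_text(encoding="utf-8")))
    }

audit_skill("Do not begin drafting unless 3 sources are verified.")
# ['honor-system gate']
```

It won't rewrite anything for you, but it turns "reread thirty skills" into "fix the flagged lines."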

But I want to be honest about what's being lost. The skills I built for 4.6 read like instructions I'd give a trusted colleague. The skills I'll build for 4.7 will read like compliance documents. They'll work better. They'll sound worse. And I'll spend more time wondering whether the AI is following my intent or just passing my checkpoints.

That's the flowchart. It's not a bad tool. It's just not the conversation we were promised.


References

  1. RogerSterling7thAve. (2026). "The real downgrade in Opus 4.7 that nobody's talking about: extended thinking is effectively gone." Reddit, r/ClaudeAI.

  2. Anthropic. (2026). "Model Configuration." Claude Code Documentation.

  3. Caylent. (2026). "Claude Opus 4.7 Deep Dive: Capabilities, Migration, and the New Economics of Long-Running Agents." Caylent Blog.

  4. matheusmoreira. (2026). Comment on "Anonymous request-token comparisons from Opus 4.6 and Opus 4.7." Hacker News.

  5. Chandonnet, H. (2026). "The Claude-lash is here: Opus 4.7 is burning through tokens — and some people's patience." Business Insider.
