Category: Dev Notes

Claude Fable 5 Explained — vs Opus and Sonnet, and What Changed in Claude Code

Summary — the conclusion first

On June 9, 2026, Anthropic shipped Claude Fable 5. Three things matter.

A new Mythos class now sits above the Opus tier, and Fable 5 is its first generally available model
On risky questions it doesn’t refuse — it hands the request over to Opus 4.8. Substitution, not a wall
Claude Code gained a /goal command: “don’t stop until this condition is true” is now a thing

It costs twice as much as Opus 4.8 — and the real gap is bigger than 2x. Explained below.

What shipped — Fable and Mythos

Start with the naming. Mythos 5 is the base model. Fable 5 is the same model wearing safety classifiers, released to the public.

Mythos 5 itself is restricted to vetted security researchers under a government-partnered program (Project Glasswing). What users and companies actually get is Fable 5.

On benchmarks, Anthropic reports state-of-the-art results in coding (Cognition’s FrontierCode), financial reasoning (Hebbia), and vision. Stripe said that a 50-million-line codebase migration “compressed months of engineering into days.”

The numbers — against the existing lineup

Here are the official specs of the four current models in one table.

	Fable 5	Opus 4.8	Sonnet 4.6	Haiku 4.5
Position	top tier (Mythos class)	complex reasoning, agents	speed-intelligence balance	fastest, cheapest
Context	1M tokens	1M	1M	200K
Max output	128K	128K	64K	64K
Price (in/out, per MTok)	$10 / $50	$5 / $25	$3 / $15	$1 / $5

One trap hides here. Fable 5 uses a new tokenizer, and the same text counts as roughly 30% more tokens than on older models (stated in the official docs). Double the unit price, more tokens per text — the effective cost gap is wider than the table suggests.
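
As a rough worked example of that gap, the sketch below multiplies the prices from the table by the token inflation. It treats "roughly 30%" as exactly 1.30 and applies it equally to input and output tokens. Both are assumptions, so read the result as an estimate.

```python
# Effective cost of the same text on Fable 5 vs Opus 4.8.
# Prices come from the table above (USD per million tokens).
# TOKEN_INFLATION treats "roughly 30% more tokens" as exactly 1.30,
# which is an assumption; real inflation varies with the text.

FABLE_PRICE = {"input": 10.0, "output": 50.0}
OPUS_PRICE = {"input": 5.0, "output": 25.0}
TOKEN_INFLATION = 1.30


def cost_usd(price, input_tokens, output_tokens):
    """Cost in USD for a given number of input and output tokens."""
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000


def effective_ratio(input_tokens, output_tokens):
    """How many times more the same text costs on Fable 5 than on Opus 4.8."""
    opus = cost_usd(OPUS_PRICE, input_tokens, output_tokens)
    fable = cost_usd(
        FABLE_PRICE,
        input_tokens * TOKEN_INFLATION,
        output_tokens * TOKEN_INFLATION,
    )
    return fable / opus


if __name__ == "__main__":
    # 100K input tokens and 10K output tokens, counted with the older tokenizer.
    print(f"{effective_ratio(100_000, 10_000):.2f}x")
```

Under these assumptions the ratio is 2 x 1.3 = 2.6, whatever the mix of input and output tokens, because both prices double and both counts inflate by the same factor.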

Also: there is no off switch for thinking. Adaptive thinking is always on; you only control its depth.

The safety design — substitution, not a wall

This is the most interesting design choice. When Fable 5 detects high-risk topics — cybersecurity, biology, model distillation — it doesn’t refuse. It passes the request to Opus 4.8 to answer safely.

Even in the API, a refusal isn’t an error: you get HTTP 200 with stop_reason: "refusal", plus a fallback parameter that retries on another model for you. Per the announcement, over 95% of sessions never trigger a fallback.
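
A minimal sketch of handling this on the client side. Only HTTP 200 and `stop_reason: "refusal"` come from the description above. The `send` callable, the model names, and the response dict shape are hypothetical stand-ins, and the retry is done by hand here rather than through the built-in fallback parameter, whose exact name is not given.

```python
# Sketch: treat a refusal as a normal response, not a transport error.
# Only `stop_reason == "refusal"` and HTTP 200 come from the text above;
# the `send` callable, the model names, and the dict shape are hypothetical.

def is_refusal(status_code, body):
    """True when the request succeeded at the HTTP level but the model refused."""
    return status_code == 200 and body.get("stop_reason") == "refusal"


def ask_with_fallback(send, prompt, primary, fallback):
    """Call `primary`; on a refusal, retry once on `fallback`.

    `send(model, prompt)` must return a (status_code, body) tuple.
    Returns (model_that_answered, body).
    """
    status, body = send(primary, prompt)
    if status != 200:
        raise RuntimeError(f"request failed with HTTP {status}")
    if is_refusal(status, body):
        status, body = send(fallback, prompt)
        if status != 200:
            raise RuntimeError(f"fallback failed with HTTP {status}")
        return fallback, body
    return primary, body


if __name__ == "__main__":
    # Fake transport standing in for a real client.
    def fake_send(model, prompt):
        if model == "primary-model":
            return 200, {"stop_reason": "refusal", "text": ""}
        return 200, {"stop_reason": "end_turn", "text": "answer"}

    print(ask_with_fallback(fake_send, "hello", "primary-model", "fallback-model"))
```

The point of the shape: error handling keyed only on the HTTP status would miss a refusal entirely, so the check has to look inside the body.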

I hit it firsthand while writing this post. My session was full of server-maintenance scripts and security gates; minutes after switching to Fable 5, the classifier flagged the session as “cybersecurity” and quietly swapped in Opus 4.8. Work continued — less like being blocked, more like a shift change.

What changed in Claude Code

Claude Code (the terminal AI coding tool) gained /goal. Set a completion condition — “all tests pass and lint is clean” — and the session keeps working until the condition is true.

The design detail worth noticing: whether the condition is met is checked every turn by a separate evaluator, not by the model doing the work. The worker doesn’t grade its own homework.
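
The worker/evaluator split can be sketched as a small loop. This is an illustration of the idea only, not how Claude Code implements /goal; `worker`, `evaluator`, and `max_turns` are hypothetical names.

```python
# Sketch of a "keep going until the condition holds" loop in which the
# check is done by a separate evaluator, not by the worker itself.
# `worker`, `evaluator`, and `max_turns` are hypothetical names; this is
# not how Claude Code implements /goal.

def run_until_goal(worker, evaluator, state, max_turns=20):
    """Run `worker` turn by turn until `evaluator` accepts the state.

    worker(state) -> new state      (does the work)
    evaluator(state) -> bool        (judges the result independently)
    Returns (final_state, turns_used, goal_met).
    """
    for turn in range(1, max_turns + 1):
        state = worker(state)
        if evaluator(state):
            return state, turn, True
    return state, max_turns, False


if __name__ == "__main__":
    # Toy example: each turn fixes one failing test.
    fix_one = lambda s: {**s, "failing": max(0, s["failing"] - 1)}
    all_green = lambda s: s["failing"] == 0
    print(run_until_goal(fix_one, all_green, {"failing": 3}))
```

The turn cap is there so a condition that can never become true does not run forever.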

Boris Cherny, Claude Code’s creator, wrote that Fable is “the best model I have used for coding, by a wide margin” — fewer prompts and steers, better token efficiency, code quality, and self-verification, longer autonomous runs. A teammate compressed the shift into one line: “We used to verify that Claude did the work right. Now we verify that it’s doing the right work.”

The context — a warning days before launch

The timing is striking. Days before this release, Anthropic publicly urged major AI labs to build “a coordinated brake pedal” for frontier AI development. The strongest model and the strongest warning shipped in the same week.

There’s criticism too: Mythos-class traffic carries mandatory 30-day data retention (no zero-retention option), and enterprises are already pushing back on AI costs in general.

One line

Fable 5 is less about raw power and more about structure. “Swap instead of refuse,” “graded by someone else instead of trusting itself” — the strongest model yet, shipped inside the strictest frame yet.

Sources

All numbers and quotes verified against these (as of 2026-06-10).

2026-06-10

What Is a Schema — Why Databases and Your Brain Use the Same Word

Summary — the conclusion first

A schema is a frame that defines, in advance, what shape data should take. In databases it’s the table blueprint; in psychology it’s the structure of knowledge in your head.

Sharing the word is no accident. Both rest on the same idea — the frame must exist first, so that new things have a place to land.

In development, a schema blocks garbage data. In learning, a schema catches new knowledge.

Think of a bank form

A bank form has rules per field. Letters in the name field, eight digits for the birth date, signature required.

That agreement — field types, formats, what’s required, decided up front — is a schema. It’s the frame, not the content. Whoever fills it in, the form stays the same.

In development — a blueprint for data

In dev, a schema declares “this data must have this shape.” That meaning is the same in every context where the word appears.

Context	What the schema defines
DB schema	which tables hold which columns (name, type, required)
API schema	which fields a request or response must carry
Form/config schema	what format an input must follow

A schema’s power is that it filters at the door. Send “abc” into an age field and it’s rejected before it’s stored. Without a schema, bad data piles up quietly and detonates months later.
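
"Filtering at the door" can be sketched in a few lines. The field names and rules below are made up for illustration; a real project would more likely use database constraints or a validation library.

```python
# Sketch: a tiny schema that rejects bad data before it is stored.
# The field names and rules are made up for illustration.

SCHEMA = {
    "name": {"type": str, "required": True},
    "age": {"type": int, "required": True},
    "email": {"type": str, "required": False},
}


def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record fits."""
    problems = []
    for field, rule in schema.items():
        if field not in record:
            if rule["required"]:
                problems.append(f"{field}: missing")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, rule["type"]):
            problems.append(f"{field}: expected {rule['type'].__name__}")
    for field in record:
        if field not in schema:
            problems.append(f"{field}: unknown field")
    return problems


def insert(table, record):
    """Store the record only if it passes the schema."""
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    table.append(record)
```

Calling `insert(table, {"name": "Kim", "age": "abc"})` raises `ValueError` and leaves `table` untouched, which is the whole point: the bad row never gets in.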

In psychology — the frame of knowledge in your head

In 1932, psychologist Frederic Bartlett published a famous experiment. He had English students read an unfamiliar Native American folk tale, “The War of the Ghosts,” then retell it later.

They couldn’t recall it as it was. They reshaped it to fit frames they already had. Unfamiliar canoes became familiar boats; a strange ritual became a hunt.

That pre-existing frame of knowledge is psychology’s schema. We don’t store new information verbatim — we file it into existing frames. It’s like having a “restaurant schema”: you’ve never been to this place, yet you walk in, sit, order, pay, without a manual.

Why the same word

The root is the Greek skhēma (form, shape). The two fields picked the same word because they do the same job.

	DB schema	Mental schema
The frame	tables, columns	existing knowledge structure
New data	inserted as rows	attached to existing frames
When it doesn’t fit	rejected (an error)	quietly distorted or forgotten

The last row is the scary one. A computer at least throws an error. The brain distorts or drops it silently. Bartlett’s students never noticed their memories had changed.

In practice — for studying and for building

The concept works on both sides of the desk.

Studying: when something won’t stick, it’s often not your memory — it’s that there’s no frame to hang it on. Meet a new concept and first ask, “what do I already know that this resembles?” One analogy outlasts ten repetitions.

Building: define the schema before accepting data. “Store it now, clean it later” almost always ends in a garbage pile. And changing a schema is surgery on everything stored on top of it — back up first.

One line

Schema = frame. The frame must exist first for new things to find their place — in a database, and in your head.

Sources

These are the experiments and terms cited above.

Frederic Bartlett, Remembering (1932) — “The War of the Ghosts,” the start of schema theory
Jean Piaget — children grow schemas through assimilation and accommodation
JSON Schema — a standard example of schemas in dev

2026-06-10

How to Pick Open-Source Tools by Fit — I Skipped an 84k-Star One

Summary — the conclusion first

I weighed four similar tools on GitHub and picked one.

The interesting part is why I skipped the others. I dropped one with 84k stars, and I misjudged another because I trusted a license badge.

Bottom line: stars are popularity, not fit. And don’t take a badge at face value — I did, and I was wrong.

Stars are popularity, not “fit”

Did I pick the one with the most stars? No. I skipped a tool with over 84k stars.

It was a multi-agent trading framework (TradingAgents, 84k stars). Powerful. But live trading means real money, and it wasn’t what I needed right now.

A restaurant with 50,000 reviews is like that — not proof it fits my taste. Stars signal “many people look,” not “this fits me.”

I trusted a badge and got it wrong — the license story

Here’s where I was wrong. I’ll say it plainly.

I’d noted one finance terminal (FinceptTerminal) as an “AGPL license trap.” But when I checked, it wasn’t AGPL. It was NOASSERTION — GitHub couldn’t match it to a standard license.

NOASSERTION doesn’t mean “bad.” It means “the label isn’t standard, so read it yourself.” In fact, Anthropic’s own security tool (defending-harness) was also NOASSERTION.

Judge by the badge alone and you’ll be wrong. I nearly called a fine project a trap based on the wrapper. A badge is a hypothesis until you read it.
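
As a sketch of "read it yourself" in code form: the function below takes the `license` object from a GitHub repository API response and maps it to a next step. It works on an already-parsed dict and makes no network call. The function name and the wording of the actions are illustrative assumptions.

```python
# Sketch: turn the `license` object from a GitHub repository API response
# into a next action. It takes an already-parsed dict and does no network
# calls. The function name and action wording are illustrative.

def license_action(repo):
    """Map a repo payload's license info to what to do next."""
    info = repo.get("license")
    if info is None:
        return "no license detected: check the repo for license terms"
    spdx = info.get("spdx_id")
    if spdx is None or spdx == "NOASSERTION":
        return "non-standard license: read the LICENSE file yourself"
    return f"standard license {spdx}: still skim the LICENSE file"
```

Note that no branch returns "bad." NOASSERTION only tells you the label could not be matched, so the action is to read, not to reject.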

So the test isn’t “is it good?” but “does it fit?”

I’m not looking for a good tool. I’m looking for one that fits my stack, my needs, my risk.

Tool	Stars	License	Why I picked / skipped
HyperFrames	26k	Apache-2.0	✅ picked — blog to video, used it right away
TradingAgents	84k	Apache-2.0	live-trading risk + not my need now → parked
FinceptTerminal	26k	NOASSERTION	read the license yourself + C++ (not my stack) + whole app when I needed parts
defending-harness	5k	NOASSERTION	good, but a “customize-it-yourself reference,” not plug-and-play

They’re all good tools. Just one fit me right now.

When someone says “they’re all great,” be suspicious

I first handed this evaluation to an AI assistant. It said all four were great.

I asked: “You’re saying all four are worth adopting?” Only then did an honest ranking appear.

Anyone — an AI or a reviewer — leans toward praise when you hand them an evaluation. So change the question. Not “what’s good?” but “what should I drop?”

One line

Stars, badges, an AI’s first answer: all just signals. Check them yourself, and pick by “does it fit?” not “is it good?”

What I checked

The numbers here aren’t guesses; I confirmed them directly.

Stars, license, and language for each repo confirmed via the GitHub API (2026-06)
NOASSERTION = GitHub couldn’t identify a standard SPDX license (non-standard/custom) → read the LICENSE file yourself

2026-06-09

I Built a Blog Publishing Skill — and This Post Is Its First Output
Summary — the conclusion first

Publishing one blog post took too many manual steps. Title, summary, category, translation, cache — eleven of them.

So I built a tool to do it in one shot. The point wasn’t “automation.” It was making the writing actually get read.

And building it taught me one thing. Building is easy; shipping is hard. This post is my first act of shipping.

Why I built it — the same eleven chores

Every time I published, I repeated the same manual steps. I counted eleven.

1. Write the title
2. Write the summary
3. Add a category
4. Add tags
5. Post the Korean version
6. Translate to English
7. Link the two as a translation pair
8. Convert to web format
9. Publish
10. Purge the cache
11. Check the URL is alive

Once is fun. By the tenth time it’s a chore. I’d forget to link the English version, or skip the cache purge and see the old post.

It’s chopping the same vegetables in the same order every time. You don’t memorize the recipe — you build a meal kit.

The real problem wasn’t automation — it was “does anyone read it?”

Automation was half of it. The bigger half was whether the published post actually gets read.

People don’t read. They scan. Nielsen Norman Group found people read only 20–28% of the words on a page.

“Made to Stick” has a scarier experiment. People tapping out a song’s rhythm guessed listeners would name it half the time. The real rate was 1 in 40.

The writer hears the melody; the reader hears only taps. That gap is the “curse of knowledge” — assuming others know what you know.

So I made readability a “security gate”

I decided to force readable writing: a checker that blocks publishing if the post fails.

Think of a password check. Too short, you can’t sign up. Same here — miss the bar, the post doesn’t go up.

The checks came straight from the research.
- Is the conclusion at the top — the first sentence of each section is the answer
- Is any paragraph too long — long ones get skipped
- Are there enough subheads — signposts for scanners
- Is any sentence too long — out of breath, can’t read
- Are there analogies and examples — abstract doesn’t stick
This post got caught too. My draft ended with a clever little flourish I thought was modest. The checker called it a common cliché. I cut it.

To me it was style; to the reader, filler. A small case of the curse of knowledge — the checker caught the gap for me.
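
A minimal sketch of what such a gate could look like. It covers only three of the checks listed above (subheads, paragraph length, sentence length), and the thresholds and function names are illustrative assumptions, not the actual checker.

```python
import re

# Assumed thresholds, chosen only for illustration.
MAX_PARAGRAPH_WORDS = 80
MAX_SENTENCE_WORDS = 30
MIN_SUBHEADS = 3


def check_post(paragraphs, subheads):
    """Return a list of failures; an empty list means the post may publish.

    paragraphs: list of paragraph strings
    subheads:   list of subheading strings
    """
    failures = []
    if len(subheads) < MIN_SUBHEADS:
        failures.append(f"only {len(subheads)} subheads, need {MIN_SUBHEADS}")
    for i, paragraph in enumerate(paragraphs, start=1):
        if len(paragraph.split()) > MAX_PARAGRAPH_WORDS:
            failures.append(f"paragraph {i} is too long")
        # Naive sentence split: ., ! or ? followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph.strip()):
            if len(sentence.split()) > MAX_SENTENCE_WORDS:
                failures.append(f"paragraph {i} has a sentence that is too long")
                break
    return failures


def can_publish(paragraphs, subheads):
    """The gate: publish only when every check passes."""
    return not check_post(paragraphs, subheads)
```

Returning the list of failures, not just a yes or no, matters for the writer: a gate that only says "blocked" gives you nothing to fix.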

An unexpected win — write it well once, gain in three places

Tidying for readability paid off in more than one place. Three, actually.

A post that’s easy for humans to scan is easy for search-and-summarize AI to quote. Search now shows AI-assembled answers, and those answers tend to cite posts that put the conclusion up top and keep paragraphs short.

A well-organized post is also easy to share — you can lift the three key lines as they are.

One lever, three doors: humans, AI, and sharing all point the same way.

Failure log — raw

Nothing went smoothly. Here are the snags, as they were.

The tool rejected my text. One way of moving the text tripped a security filter, so I routed around it.

It didn’t show in the menu. I built the skill but it wasn’t listed. A single config line at the top of the file was missing. I only found it by comparing with a tool that worked.

The English URL collided. Using the same address as the Korean version made the system append a number. I gave the English version its own address.

Behind every clean result there’s this kind of small breakage.

The stack — infrastructure included

I won’t hide what runs it.

The blog runs on one small PC at home. On it, WordPress in Docker, with a multilingual plugin pairing Korean and English.

External access goes through a Cloudflare tunnel. Drafts are handled by free gpt-4o, and I wrote the readability checker in Python myself.

What I actually learned — building is easy, shipping is hard

This is the one thing that stuck. Building a tool is comfortable; shipping is not.

	Building	Shipping
Control	in my hands	others’ reaction
If you slip	just fix it	people see it
Feeling	comfortable	uncomfortable
Growth	small	large

So I keep wanting to build more tools. It’s the comfortable side. But growth is on the uncomfortable side.

So I write this. The point is not to leave the finished thing on my desk, but to put it out. This post itself is that practice.

One line

I built the tool to force readable writing, but the real lesson was elsewhere: building is easy, shipping is hard — so ship.

Sources
- Nielsen Norman Group, “How Users Read on the Web” — only 20–28% of words read
- Chip & Dan Heath, “Made to Stick” — the curse of knowledge, the tappers experiment
- Diátaxis — four documentation types (tutorial, how-to, reference, explanation)
2026-06-09

Yak Shaving, Idempotency, Dogfooding, Dead Man’s Switch — 4 Dev Terms That Work Outside Code

A lot of developer slang points at how you work, not at code. These four do — getting lured down side-quests, repeating an action safely, using your own thing before you ship it, and stopping safely when you stop. Worth knowing whether you collaborate or work alone.

TL;DR

Yak shaving: starting your real task, then drowning in a chain of side-tasks far from the point
Idempotency: running the same action many times gives the same result as running it once (safe to retry)
Dogfooding: using your own product as a user before handing it to anyone else
Dead man’s switch: when you stop sending the “I’m alive” signal, the system stops itself, safely
What they share: they look like code terms, but they’re really attitudes — focus, safety, honesty, failsafe

Term	One line	At work
Yak shaving	side-quests bury the goal	focus
Idempotency	many times = same as once	safe repetition
Dogfooding	use it yourself first	honest verification
Dead Man’s Switch	you stop → system stops safely	failsafe

1. Yak Shaving

You want to wax the car, but the hose is missing. To buy one you need your store card, which expired; to renew it you have to return the neighbor’s book… and somehow you end up at the zoo, shaving a yak. The car is still dirty.

Yak shaving is exactly this — a side-task started for a real goal spawns another, then another, until you spend all your time somewhere that looks unrelated to the original work. The term is said to have come out of MIT.

It’s common in dev: “just fix one bug” → “need to update a library” → “need to swap the build tool” → half a day in config. Daily life too — sitting down to write one post and ending up redesigning the blog theme.

Getting out: ask once, “is this actually needed for what I set out to do?” If not, leave a note about the side-task and return to the main one.

2. Idempotency

Press the elevator button ten times and it still arrives the same as one press. That’s idempotency — doing the same action several times yields the same result as doing it once.

It’s originally a math term (f(f(x)) = f(x)), but in dev it matters most for safe retries. You send a payment request, get no response, and send it again. If it isn’t idempotent, you get charged twice. If it is, the same request is processed once — distinguished by an “idempotency key.”

Telling them apart is easy. “Delete” is idempotent (deleting an already-deleted thing leaves it deleted). “Add 1” is not (each press increments).
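
The payment case can be sketched with an in-memory store. `PaymentService` and its storage are hypothetical; a real system would persist the keys and expire them.

```python
# Sketch: an idempotency key makes a retried payment safe.
# `PaymentService` and its in-memory storage are hypothetical.

class PaymentService:
    def __init__(self):
        self.total_charged = 0
        self._seen = {}  # idempotency key -> result of the first attempt

    def charge(self, idempotency_key, amount):
        """Charge once per key; a repeat returns the stored result."""
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]
        self.total_charged += amount
        result = {"status": "charged", "amount": amount}
        self._seen[idempotency_key] = result
        return result
```

Calling `charge("order-1", 100)` twice leaves `total_charged` at 100. Without the key lookup, the second call would be an "add 1" style operation and charge again.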

Why it helps: anywhere retries and automation live (payments, notifications, sync), it’s the lens for “if this runs twice, does something break?”

3. Dogfooding

“Eat your own dog food.” It’s said to come from a dog-food company proving quality by feeding it to its own dogs, and Microsoft spread it in 1988 to mean “let’s use our own software first.”

The idea is simple — use it yourself, as a user, before selling or shipping it to anyone. You only catch the bugs and feel the friction by actually using it.

The core is honesty. It’s hard to recommend a tool you don’t use yourself. Conversely, a tool you use daily gets better fast — because you’re the first to hurt when it’s clunky.

One more step: dogfooding only goes as far as “I used it.” The next step is a stranger using it, and the real test starts there.

4. Dead Man’s Switch

What happens if a train driver collapses at the controls? Old trains had a pedal. The train runs only while the driver keeps the pedal pressed; lift your foot (collapse) and the train stops itself. That device is the dead man’s switch.

The name is grim, but the idea is simple — when you stop, the system stops itself, safely. Normally you keep sending an “I’m alive” signal; when that signal drops, it assumes the worst and fails toward safety.

Dev and automation use it the same way. If a trading bot stops responding, it closes positions automatically (otherwise a market swing catches you defenseless). If a server misses its periodic “heartbeat,” another server takes over.
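
The heartbeat version can be sketched as a small class. The class name, the `on_silence` callback, and the injectable `clock` are illustrative assumptions; something outside this code still has to call `check()` on a schedule.

```python
import time


class DeadMansSwitch:
    """Calls `on_silence` once if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, on_silence, clock=time.monotonic):
        self.timeout = timeout
        self.on_silence = on_silence
        self.clock = clock  # injectable so the switch can be tested without waiting
        self.last_beat = clock()
        self.tripped = False

    def heartbeat(self):
        """The "I'm alive" signal. Ignored once the switch has tripped."""
        if not self.tripped:
            self.last_beat = self.clock()

    def check(self):
        """Call periodically; trips the switch when the signal has gone quiet."""
        if not self.tripped and self.clock() - self.last_beat > self.timeout:
            self.tripped = True
            self.on_silence()
        return self.tripped
```

For a trading bot, `on_silence` would be the "close positions" routine. The switch stays tripped on purpose: after a silent failure, restarting should be a deliberate human decision.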

Why it helps: for anything that runs unattended (bots, cron jobs, automation), it’s the habit of planting “if this dies, how does it stop safely?” in advance. Real safety comes from designing how it dies, not just from switching it on.

In One Line

The four point at attitude, not code.

Yak shaving → don’t get lured by side-quests; focus on the point
Idempotency → design so repeats and retries are safe
Dogfooding → use it yourself before shipping
Dead Man’s Switch → design for how it dies, not just how it runs

Name them and you can point at where work leaks — in conversations with developers, and when you work alone.

2026-06-07
