Category: Dev Notes

  • Claude Fable 5 Explained — vs Opus and Sonnet, and What Changed in Claude Code

    Summary — the conclusion first

    On June 9, 2026, Anthropic shipped Claude Fable 5. Three things matter.

    • A new Mythos class now sits above the Opus tier, and Fable 5 is its first generally available model
    • On risky questions it doesn’t refuse — it hands the request over to Opus 4.8. Substitution, not a wall
    • Claude Code gained a /goal command: “don’t stop until this condition is true” is now a thing

    It costs twice as much as Opus 4.8 — and the real gap is bigger than 2x. Explained below.

    What shipped — Fable and Mythos

    Start with the naming. Mythos 5 is the base model. Fable 5 is the same model wearing safety classifiers, released to the public.

    Mythos 5 itself is restricted to vetted security researchers under a government-partnered program (Project Glasswing). What users and companies actually get is Fable 5.

    On benchmarks, Anthropic reports state-of-the-art results in coding (Cognition’s FrontierCode), financial reasoning (Hebbia), and vision. Stripe said that a 50-million-line codebase migration “compressed months of engineering into days.”

    The numbers — against the existing lineup

    Here are the official specs of the four current models in one table.

    Model | Fable 5 | Opus 4.8 | Sonnet 4.6 | Haiku 4.5
    Position | top tier (Mythos class) | complex reasoning, agents | speed-intelligence balance | fastest, cheapest
    Context | 1M tokens | 1M | 1M | 200K
    Max output | 128K | 128K | 64K | 64K
    Price (in/out, per MTok) | $10 / $50 | $5 / $25 | $3 / $15 | $1 / $5

    One trap hides here. Fable 5 uses a new tokenizer, and the same text counts as roughly 30% more tokens than on older models (stated in the official docs). Double the unit price, more tokens per text — the effective cost gap is wider than the table suggests.
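
    As a back-of-the-envelope check, here is the arithmetic behind that gap. It assumes the roughly 30% token inflation applies equally to input and output, which the text above does not break down; the function name is made up for illustration.

```python
# Prices from the table above, in dollars per million tokens (input, output).
FABLE_5 = (10.0, 50.0)
OPUS_4_8 = (5.0, 25.0)

# Assumption: the ~30% tokenizer inflation applies uniformly to input and output.
TOKEN_INFLATION = 1.30


def effective_cost_ratio(new_price: float, old_price: float,
                         inflation: float = TOKEN_INFLATION) -> float:
    """Cost of the same text on the new model relative to the old one."""
    return (new_price * inflation) / old_price


input_ratio = effective_cost_ratio(FABLE_5[0], OPUS_4_8[0])
output_ratio = effective_cost_ratio(FABLE_5[1], OPUS_4_8[1])
print(f"input: {input_ratio:.1f}x, output: {output_ratio:.1f}x")
```

    Under that assumption the same text costs about 2.6 times as much, not 2 times.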

    Also: there is no off switch for thinking. Adaptive thinking is always on; you only control its depth.

    The safety design — substitution, not a wall

    This is the most interesting design choice. When Fable 5 detects high-risk topics — cybersecurity, biology, model distillation — it doesn’t refuse. It passes the request to Opus 4.8 to answer safely.

    Even in the API, a refusal isn’t an error: you get HTTP 200 with stop_reason: "refusal", plus a fallback parameter that retries on another model for you. Per the announcement, over 95% of sessions never trigger a fallback.
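
    In client code, that means a status-code check alone won’t catch a refusal. A minimal sketch, assuming the parsed response is a plain dict: only the stop_reason value comes from the description above. The dict shape and function names are stand-ins, and the fallback parameter is left out because its exact form isn’t given here.

```python
def is_refusal(response: dict) -> bool:
    """True when a successful (HTTP 200) response still declined to answer."""
    return response.get("stop_reason") == "refusal"


def handle(response: dict) -> str:
    # A refusal arrives as a normal response, so it has to be detected
    # from the body rather than from the HTTP status.
    if is_refusal(response):
        return "refused"
    return "answered"


print(handle({"stop_reason": "refusal"}))
print(handle({"stop_reason": "end_turn"}))
```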

    I hit it firsthand while writing this post. My session was full of server-maintenance scripts and security gates; minutes after switching to Fable 5, the classifier flagged the session as “cybersecurity” and quietly swapped in Opus 4.8. Work continued — less like being blocked, more like a shift change.

    What changed in Claude Code

    Claude Code (the terminal AI coding tool) gained /goal. Set a completion condition — “all tests pass and lint is clean” — and the session keeps working until the condition is true.

    The design detail worth noticing: whether the condition is met is checked every turn by a separate evaluator, not by the model doing the work. The worker doesn’t grade its own homework.
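
    The shape of that loop can be sketched in a few lines. This is not Claude Code’s implementation; the function names, the state dict, and the turn limit are all hypothetical. The one point it illustrates is that the check is a separate function from the one doing the work.

```python
from typing import Callable


def run_until_goal(work_step: Callable[[dict], None],
                   goal_met: Callable[[dict], bool],
                   state: dict,
                   max_turns: int = 20) -> int:
    """Run work_step until the independent goal_met check passes.

    Returns the number of turns used.
    """
    for turn in range(1, max_turns + 1):
        work_step(state)
        # The verdict comes from goal_met, never from work_step itself.
        if goal_met(state):
            return turn
    raise RuntimeError("goal not reached within max_turns")


# Toy worker: fixes one failing test per turn.
def fix_one_test(state: dict) -> None:
    if state["failing_tests"] > 0:
        state["failing_tests"] -= 1


# Toy evaluator: the completion condition.
def all_tests_pass(state: dict) -> bool:
    return state["failing_tests"] == 0


state = {"failing_tests": 3}
turns = run_until_goal(fix_one_test, all_tests_pass, state)
print(f"goal met after {turns} turns")
```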

    Boris Cherny, Claude Code’s creator, wrote that Fable is “the best model I have used for coding, by a wide margin” — fewer prompts and steers, better token efficiency, code quality, and self-verification, longer autonomous runs. A teammate compressed the shift into one line: “We used to verify that Claude did the work right. Now we verify that it’s doing the right work.”

    The context — a warning days before launch

    The timing is striking. Days before this release, Anthropic publicly urged major AI labs to build “a coordinated brake pedal” for frontier AI development. The strongest model and the strongest warning shipped in the same week.

    There’s criticism too: Mythos-class traffic carries mandatory 30-day data retention (no zero-retention option), and enterprises are already pushing back on AI costs in general.

    One line

    Fable 5 is less about raw power and more about structure. “Swap instead of refuse,” “graded by someone else instead of trusting itself” — the strongest model yet, shipped inside the strictest frame yet.

    Sources

    All numbers and quotes verified against these (as of 2026-06-10).

  • What Is a Schema — Why Databases and Your Brain Use the Same Word

    Summary — the conclusion first

    A schema is a frame that defines, in advance, what shape data should take. In databases it’s the table blueprint; in psychology it’s the structure of knowledge in your head.

    Sharing the word is no accident. Both rest on the same idea — the frame must exist first, so that new things have a place to land.

    In development, a schema blocks garbage data. In learning, a schema catches new knowledge.

    Think of a bank form

    A bank form has rules per field. Letters in the name field, eight digits for the birth date, signature required.

    That agreement — field types, formats, what’s required, decided up front — is a schema. It’s the frame, not the content. Whoever fills it in, the form stays the same.

    In development — a blueprint for data

    In dev, a schema declares “this data must have this shape.” The meaning holds everywhere it appears.

    Context | What the schema defines
    DB schema | which tables hold which columns (name, type, required)
    API schema | which fields a request or response must carry
    Form/config schema | what format an input must follow

    A schema’s power is that it filters at the door. Send “abc” into an age field and it’s rejected before it’s stored. Without a schema, bad data piles up quietly and detonates months later.
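
    Filtering at the door can be sketched in plain Python. The schema, field names, and the store function below are hypothetical; a real project would more likely use a validation library or database constraints.

```python
# Hypothetical schema: field name -> required type.
SCHEMA = {"name": str, "age": int}


def validate(record: dict, schema: dict = SCHEMA) -> list[str]:
    """Return a list of problems; an empty list means the record fits."""
    errors = []
    for field, expected_type in schema.items():
        if field not in record:
            errors.append(f"{field}: missing")
        elif type(record[field]) is not expected_type:
            got = type(record[field]).__name__
            errors.append(f"{field}: expected {expected_type.__name__}, got {got}")
    return errors


def store(record: dict, table: list) -> None:
    """Reject the record before it is stored if it breaks the schema."""
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    table.append(record)


people = []
store({"name": "Kim", "age": 30}, people)
print(validate({"name": "Lee", "age": "abc"}))
```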

    In psychology — the frame of knowledge in your head

    In 1932, psychologist Frederic Bartlett ran a famous experiment. He had English students read an unfamiliar Native American folk tale, “The War of the Ghosts,” then retell it later.

    They couldn’t recall it as it was. They reshaped it to fit frames they already had. Unfamiliar canoes became familiar boats; a strange ritual became a hunt.

    That pre-existing frame of knowledge is psychology’s schema. We don’t store new information verbatim — we file it into existing frames. It’s like having a “restaurant schema”: you’ve never been to this place, yet you walk in, sit, order, pay, without a manual.

    Why the same word

    The root is the Greek skhēma (form, shape). The two fields picked the same word because they do the same job.

    Aspect | DB schema | Mental schema
    The frame | tables, columns | existing knowledge structure
    New data | inserted as rows | attached to existing frames
    When it doesn’t fit | rejected (an error) | quietly distorted or forgotten

    The last row is the scary one. A computer at least throws an error. The brain distorts or drops it silently. Bartlett’s students never noticed their memories had changed.
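
    To see the “computer throws an error” side concretely, here is a small sketch using Python’s built-in sqlite3 module. The table and column names are made up, and the CHECK constraint is one way among several to make SQLite enforce the column type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    " name TEXT NOT NULL,"
    " age INTEGER NOT NULL CHECK (typeof(age) = 'integer')"
    ")"
)

# A row that fits the frame is inserted normally.
conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("Kim", 30))

# A row that does not fit is rejected loudly instead of being stored.
try:
    conn.execute("INSERT INTO people (name, age) VALUES (?, ?)", ("Lee", "abc"))
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM people").fetchone()[0]
print("rows stored:", row_count)
```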

    In practice — for studying and for building

    The concept works on both sides of the desk.

    Studying: when something won’t stick, it’s often not your memory — it’s that there’s no frame to hang it on. Meet a new concept and first ask, “what do I already know that this resembles?” One analogy outlasts ten repetitions.

    Building: define the schema before accepting data. “Store it now, clean it later” almost always ends in a garbage pile. And changing a schema is surgery on everything stored on top of it — back up first.

    One line

    Schema = frame. The frame must exist first for new things to find their place — in a database, and in your head.

    Sources

    These are the experiments and terms cited above.

    • Frederic Bartlett, Remembering (1932) — “The War of the Ghosts,” the start of schema theory
    • Jean Piaget — children grow schemas through assimilation and accommodation
    • JSON Schema — a standard example of schemas in dev
  • How to Pick Open-Source Tools by Fit — I Skipped an 84k-Star One

    Summary — the conclusion first

    I weighed four similar tools on GitHub and picked one.

    The interesting part is why I skipped the others. I dropped one with 84k stars, and I trusted a license badge and got it wrong.

    Bottom line: stars are popularity, not fit. And don’t take a badge at face value — I did, and I was wrong.

    Stars are popularity, not “fit”

    Did I pick the one with the most stars? No. I skipped a tool with over 84k stars.

    It was a multi-agent trading framework (TradingAgents, 84k stars). Powerful. But live trading means real money, and it wasn’t what I needed right now.

    A restaurant with 50,000 reviews is like that — not proof it fits my taste. Stars signal “many people look,” not “this fits me.”

    I trusted a badge and got it wrong — the license story

    Here’s where I was wrong. I’ll say it plainly.

    I’d noted one finance terminal (FinceptTerminal) as an “AGPL license trap.” But when I checked, it wasn’t AGPL. It was NOASSERTION — GitHub couldn’t match it to a standard license.

    NOASSERTION doesn’t mean “bad.” It means “the label isn’t standard, so read it yourself.” In fact, Anthropic’s own security tool (defending-harness) was also NOASSERTION.

    Judge by the badge alone and you’ll be wrong. I nearly called a fine project a trap based on the wrapper. A badge is a hypothesis until you read it.

    So the test isn’t “is it good?” but “does it fit?”

    I’m not looking for a good tool. I’m looking for one that fits my stack, my needs, my risk.

    Tool | Stars | License | Why I picked / skipped
    HyperFrames | 26k | Apache-2.0 | ✅ picked: blog to video, used it right away
    TradingAgents | 84k | Apache-2.0 | live-trading risk + not my need now → parked
    FinceptTerminal | 26k | NOASSERTION | read the license yourself + C++ (not my stack) + whole app when I needed parts
    defending-harness | 5k | NOASSERTION | good, but a “customize-it-yourself reference,” not plug-and-play

    They’re all good tools. Just one fit me right now.

    When someone says “they’re all great,” be suspicious

    I first handed this evaluation to an AI assistant. It said all four were great.

    I asked: “You’re saying all four are worth adopting?” Only then did an honest ranking appear.

    Anyone — an AI or a reviewer — leans toward praise when you hand them an evaluation. So change the question. Not “what’s good?” but “what should I drop?”

    One line

    Stars, badges, an AI’s first answer — all just signals. Check it yourself, and pick by “does it fit?” not “is it good?”

    What I checked

    The numbers here aren’t guesses; I confirmed them directly.

    • Stars, license, and language for each repo confirmed via the GitHub API (2026-06)
    • NOASSERTION = GitHub couldn’t identify a standard SPDX license (non-standard/custom) → read the LICENSE file yourself
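
    That rule is easy to turn into a check. A minimal sketch, assuming you already have the repository JSON from the GitHub REST API as a dict: the license object and its spdx_id field are part of that response, while the function name and the trimmed sample payloads are made up for illustration.

```python
def needs_manual_license_review(repo: dict) -> bool:
    """True when GitHub's metadata does not name a standard license."""
    license_info = repo.get("license")
    if license_info is None:
        # GitHub detected no license at all.
        return True
    # NOASSERTION means GitHub could not match a standard SPDX license.
    return license_info.get("spdx_id") in (None, "NOASSERTION")


# Trimmed, hypothetical payloads in the shape of the API response.
standard = {"license": {"key": "apache-2.0", "spdx_id": "Apache-2.0"}}
custom = {"license": {"key": "other", "spdx_id": "NOASSERTION"}}
missing = {"license": None}

for repo in (standard, custom, missing):
    print(needs_manual_license_review(repo))
```

    A True result is not a verdict on the project. It only means the LICENSE file has to be read by a person.
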
  • I Built a Blog Publishing Skill — and This Post Is Its First Output

    Summary — the conclusion first

    Publishing one blog post took too many manual steps. Title, summary, category, translation, cache — eleven of them.

    So I built a tool to do it in one shot. The point wasn’t “automation.” It was making the writing actually get read.

    And building it taught me one thing. Building is easy; shipping is hard. This post is my first act of shipping.

    Why I built it — the same eleven chores

    Every time I published, I repeated the same manual steps. I counted eleven.

    1. Write the title
    2. Write the summary
    3. Add a category
    4. Add tags
    5. Post the Korean version
    6. Translate to English
    7. Link the two as a translation pair
    8. Convert to web format
    9. Publish
    10. Purge the cache
    11. Check the URL is alive

    Once is fun. By the tenth time it’s a chore. I’d forget to link the English version, or skip the cache purge and see the old post.

    It’s chopping the same vegetables in the same order every time. You don’t memorize the recipe — you build a meal kit.

    The real problem wasn’t automation — it was “does anyone read it?”

    Automation was half of it. The bigger half was whether the published post actually gets read.

    People don’t read. They scan. Nielsen Norman Group found people read only 20–28% of the words on a page.

    “Made to Stick” has a scarier experiment. People tapping out a song’s rhythm guessed listeners would name it half the time. The real rate was 1 in 40.

    The writer hears the melody; the reader hears only taps. That gap is the “curse of knowledge” — assuming others know what you know.

    So I made readability a “security gate”

    I decided to force readable writing: a checker that blocks publishing if the post fails.

    Think of a password check. Too short, you can’t sign up. Same here — miss the bar, the post doesn’t go up.

    The checks came straight from the research.

    • Is the conclusion at the top — the first sentence of each section is the answer
    • Is any paragraph too long — long ones get skipped
    • Are there enough subheads — signposts for scanners
    • Is any sentence too long — out of breath, can’t read
    • Are there analogies and examples — abstract doesn’t stick
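
    A gate like this can be small. The sketch below is not the actual checker; it covers three of the five checks, and the thresholds, function names, and input shape (a list of paragraphs plus a subhead count) are assumptions chosen for illustration.

```python
import re

# Assumed thresholds; tune them to your own writing.
MAX_PARAGRAPH_WORDS = 60
MAX_SENTENCE_WORDS = 25
MIN_SUBHEADS = 3


def check_post(paragraphs: list[str], subhead_count: int) -> list[str]:
    """Return readability problems; an empty list means the post passes."""
    problems = []
    if subhead_count < MIN_SUBHEADS:
        problems.append(f"only {subhead_count} subheads (need {MIN_SUBHEADS})")
    for i, paragraph in enumerate(paragraphs, start=1):
        if len(paragraph.split()) > MAX_PARAGRAPH_WORDS:
            problems.append(f"paragraph {i} is too long")
        # Split after sentence-ending punctuation followed by whitespace.
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if len(sentence.split()) > MAX_SENTENCE_WORDS:
                problems.append(f"paragraph {i} has a sentence over "
                                f"{MAX_SENTENCE_WORDS} words")
                break
    return problems


def can_publish(paragraphs: list[str], subhead_count: int) -> bool:
    """The gate: publishing is blocked unless every check passes."""
    return not check_post(paragraphs, subhead_count)


draft = ["People do not read. They scan."]
print(can_publish(draft, subhead_count=3))
```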

    This post got caught too. My draft ended with a clever little flourish I thought was modest. The checker called it a common cliché. I cut it.

    To me it was style; to the reader, filler. A small case of the curse of knowledge — the checker caught the gap for me.

    An unexpected win — write it well once, gain in three places

    Tidying for readability paid off in more than one place. Three, actually.

    A post that’s easy for humans to scan is also easy for search-and-summarize AI to quote. Search now shows AI-assembled answers, and those answers tend to cite posts that put the conclusion up top and keep paragraphs short.

    A well-organized post is also easy to share — you can lift the three key lines as they are.

    One lever, three doors: humans, AI, and sharing all point the same way.

    Failure log — raw

    Nothing went smoothly. Here are the snags, as they were.

    The tool rejected my text. One way of moving the text tripped a security filter, so I routed around it.

    It didn’t show in the menu. I built the skill but it wasn’t listed. A single config line at the top of the file was missing. I only found it by comparing with a tool that worked.

    The English URL collided. Using the same address as the Korean version made the system append a number. I gave the English version its own address.

    Behind every clean result there’s this kind of small breakage.

    The stack — infrastructure included

    I won’t hide what runs it.

    The blog runs on one small PC at home. On it, WordPress in Docker, with a multilingual plugin pairing Korean and English.

    External access goes through a Cloudflare tunnel. Drafts are handled by free gpt-4o, and I wrote the readability checker in Python myself.

    What I actually learned — building is easy, shipping is hard

    This is the one thing that stuck. Building a tool is comfortable; shipping is not.

    Aspect | Building | Shipping
    Control | in my hands | others’ reaction
    If you slip | just fix it | people see it
    Feeling | comfortable | uncomfortable
    Growth | small | large

    So I keep wanting to build more tools. It’s the comfortable side. But growth is on the uncomfortable side.

    So I’m writing this. Instead of leaving the thing I made sitting where I made it, I’m putting it out. This post itself is that practice.

    One line

    I built the tool to force readable writing, but the real lesson was elsewhere: building is easy, shipping is hard — so ship.

    Sources

    • Nielsen Norman Group, “How Users Read on the Web” — only 20–28% of words read
    • Chip & Dan Heath, “Made to Stick” — the curse of knowledge, the tappers experiment
    • Diátaxis — four documentation types (tutorial, how-to, reference, explanation)
  • Yak Shaving, Idempotency, Dogfooding, Dead Man’s Switch — 4 Dev Terms That Work Outside Code

    A lot of developer slang points at how you work, not at code. These four do — getting lured down side-quests, repeating an action safely, using your own thing before you ship it, and stopping safely when you stop. Worth knowing whether you collaborate or work alone.

    TL;DR

    • Yak shaving: starting your real task, then drowning in a chain of side-tasks far from the point
    • Idempotency: running the same action many times gives the same result as running it once (safe to retry)
    • Dogfooding: using your own product as a user before handing it to anyone else
    • Dead man’s switch: when you stop sending the “I’m alive” signal, the system stops itself, safely
    • What they share: they look like code terms, but they’re really attitudes (focus, safety, honesty, failsafe)

    Term | One line | At work
    Yak shaving | side-quests bury the goal | focus
    Idempotency | many times = same as once | safe repetition
    Dogfooding | use it yourself first | honest verification
    Dead Man’s Switch | you stop → system stops safely | failsafe

    1. Yak Shaving

    You want to wax the car, but the hose is missing. To buy one you need your store card, which expired; to renew it you have to return the neighbor’s book… and somehow you end up at the zoo, shaving a yak. The car is still dirty.

    Yak shaving is exactly this — a side-task started for a real goal spawns another, then another, until you spend all your time somewhere that looks unrelated to the original work. The term is said to have come out of MIT.

    It’s common in dev: “just fix one bug” → “need to update a library” → “need to swap the build tool” → half a day in config. Daily life too — sitting down to write one post and ending up redesigning the blog theme.

    Getting out: ask once, “is this actually needed for what I set out to do?” If not, leave a note about the side-task and return to the main one.

    2. Idempotency

    Press the elevator button ten times and it still arrives the same as one press. That’s idempotency: doing the same action several times yields the same result as doing it once.

    It’s originally a math term (f(f(x)) = f(x)), but in dev it matters most for safe retries. You send a payment request, get no response, and send it again. If it isn’t idempotent, you get charged twice. If it is, the server recognizes the retry by its “idempotency key” and processes the request only once.
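
    A minimal sketch of the idempotency-key idea, kept in memory for illustration. The class, method, and key format are hypothetical and not any payment provider’s API; a real service would persist the keys and handle concurrent requests.

```python
class PaymentProcessor:
    """Toy processor that charges at most once per idempotency key."""

    def __init__(self) -> None:
        self._receipts = {}      # idempotency key -> receipt already issued
        self.total_charged = 0

    def charge(self, idempotency_key: str, amount: int) -> dict:
        # A retry with a known key returns the stored receipt
        # instead of charging again.
        if idempotency_key in self._receipts:
            return self._receipts[idempotency_key]
        self.total_charged += amount
        receipt = {"key": idempotency_key, "amount": amount, "status": "charged"}
        self._receipts[idempotency_key] = receipt
        return receipt


processor = PaymentProcessor()
first = processor.charge("order-1001", 5000)
retry = processor.charge("order-1001", 5000)   # same request sent again
print(processor.total_charged)
```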

    Telling them apart is easy. “Delete” is idempotent (deleting an already-deleted thing leaves it deleted). “Add 1” is not (each press increments).
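
    The same comparison in code, as a small sketch with made-up function names: applying delete twice matches applying it once, while applying add-one twice does not.

```python
def delete(items: set, item: str) -> set:
    """Remove item if present; removing a missing item changes nothing."""
    return items - {item}


def add_one(counter: int) -> int:
    """Increment the counter; every call changes the result."""
    return counter + 1


once = delete({"a", "b"}, "a")
twice = delete(once, "a")
print(once == twice)                          # delete is idempotent

print(add_one(add_one(0)) == add_one(0))      # add_one is not
```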

    Why it helps: anywhere retries and automation live (payments, notifications, sync), it’s the lens for “if this runs twice, does something break?”

    3. Dogfooding

    “Eat your own dog food.” It’s said to come from a dog-food company proving quality by feeding it to its own dogs, and Microsoft spread it in 1988 to mean “let’s use our own software first.”

    The idea is simple — use it yourself, as a user, before selling or shipping it to anyone. You only catch the bugs and feel the friction by actually using it.

    The core is honesty. It’s hard to recommend a tool you don’t use yourself. Conversely, a tool you use daily gets better fast — because you’re the first to hurt when it’s clunky.

    One more step — dogfooding goes as far as “I used it.” The next step is a stranger using it, and the real test starts there.

    4. Dead Man’s Switch

    What happens if a train driver collapses at the controls? Old trains had a pedal. The train runs only while the driver keeps the pedal pressed; lift your foot (collapse) and the train stops itself. That device is the dead man’s switch.

    The name is grim, but the idea is simple — when you stop, the system stops itself, safely. Normally you keep sending an “I’m alive” signal; when that signal drops, it assumes the worst and fails toward safety.

    Dev and automation use it the same way. If a trading bot stops responding, it closes positions automatically (otherwise a market swing catches you defenseless). If a server misses its periodic “heartbeat,” another server takes over.
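
    The mechanism fits in one small class. This is a sketch with hypothetical names, not a production watchdog; the clock is injectable so the demo below can simulate time passing instead of sleeping.

```python
import time
from typing import Callable


class DeadMansSwitch:
    """Runs on_silence once if no heartbeat arrives within the timeout."""

    def __init__(self, timeout_seconds: float,
                 on_silence: Callable[[], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._on_silence = on_silence
        self._clock = clock
        self._last_beat = clock()
        self.tripped = False

    def heartbeat(self) -> None:
        """The 'I'm alive' signal."""
        self._last_beat = self._clock()

    def check(self) -> bool:
        """Trip the switch if the signal has been silent for too long."""
        silent_for = self._clock() - self._last_beat
        if not self.tripped and silent_for > self._timeout:
            self.tripped = True
            self._on_silence()
        return self.tripped


# Demo with a fake clock so no real waiting is needed.
fake_now = [0.0]
actions = []
switch = DeadMansSwitch(
    timeout_seconds=30,
    on_silence=lambda: actions.append("close positions"),
    clock=lambda: fake_now[0],
)

fake_now[0] = 10.0
switch.heartbeat()                  # the bot checks in at t=10
still_alive = not switch.check()

fake_now[0] = 50.0                  # 40 seconds of silence, past the limit
tripped = switch.check()
print(still_alive, tripped, actions)
```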

    Why it helps: for anything that runs unattended (bots, cron jobs, automation), it’s the habit of planting “if this dies, how does it stop safely?” in advance. Real safety comes from designing how it dies, not just from switching it on.

    In One Line

    The four point at attitude, not code.

    • Yak shaving → don’t get lured by side-quests; focus on the point
    • Idempotency → design so repeats and retries are safe
    • Dogfooding → use it yourself before shipping
    • Dead Man’s Switch → design for how it dies, not just how it runs

    Name them and you can point at where work leaks — in conversations with developers, and when you work alone.