Claude for Research — Learn to use the Stick

The most capable chat tool is the one that asks you to drive
Hugh McCutchen · v2.2 · June 2026 · ~2,000 words
Part One

The car nobody explained

Most chat tools are automatics. They do the ordinary jobs well, keep the risky controls out of reach, and they won’t let you stall. For most people, most of the time, that’s the right car.

Long-running work is where this gets unglued.

The kind of work I’m talking about is research and writing that builds over days or weeks — accumulating sources, developing ideas, moving from rough thinking to something you’d actually publish. Not a single task you hand off and collect. A project.

In most tools, that project ends the same way. The wheels start to slip in the turns. The engine misfires. I find myself covering ground I already covered — same problem, same dead ends. So I pull the draft to a separate document and finish it by hand. Claude is more resilient, but it can end up in the same pattern if you haven’t learned to drive it.

Most of what gets written about Claude is aimed at developers — token budgets, prompt caching, agent orchestration. Not much help if you’re doing research and writing.

What drew me to Claude was simple: it reasons better. It produces a clean diagram. It formats a document well enough that I’m not spending hours cleaning up drafts before I can share them. Real advantages, and they come with a real catch.

The capability is metered. Every plan runs on a usage window, and the strongest setup drains it fastest. At first that felt like a cost problem. It isn’t. It’s a design constraint that turns out to be a feature — it forces you to make choices about your work that make you a better driver.

Claude was demanding I learn to drive. Once I did, everything changed.

If you work the way I do — long research, multi-session writing, an LLM alongside you the whole way — this paper is the foundation. It covers the ideas you need to understand. Paper Two is the framework I built to put them to work.

Illustrated car dashboard: model gear shift, effort dial, feature toggles, and context folders up front; the fuel gauges inside an open glovebox.
FIG. A — THE DASHBOARD. POWER CONTROLS UP FRONT; THE FUEL GAUGE IN THE GLOVEBOX.
Part Two

Choosing your gears and Loading the vehicle

The first move isn’t picking a model. It’s assessing the work in front of you — how complex is it, how much genuine uncertainty does it carry, does it need open-ended judgment or defined execution against a settled plan — and matching that to what you can afford. Model, effort, and context are how you spend.

What can you afford in your time window

Working in Claude for hours at a stretch means working within a budget. You’re not billed per token unless you buy additional credits, but you do have a rolling session window and a weekly limit, and both draw down as you work. Burning through either means stopping early.

The right model depends on the work. Deep reasoning and open-ended judgment go to the strongest model. Defined execution — drafting from a settled outline, producing from a spec — goes to the fastest capable one. Right now that means Opus to frame and synthesize, Sonnet to draft and produce, Haiku for the lightweight tasks.

My own path wasn’t clean. I defaulted to Opus because it handled everything. Then Anthropic released a new version and the token usage doubled overnight. I was burning through my session window in two hours instead of five, and just staying on the latest model was no longer a cost I could absorb. When I shifted to Opus for the framing work and Sonnet for execution, I got the same quality at half the cost — and kept working longer on the same plan.

As I came to the final draft of this paper, I shifted from Opus to Sonnet Max and turned off Claude’s extended thinking mode. The ideas were settled — I didn’t need deep reasoning pulling in bigger questions from the project. I needed clean, fast prose work, section by section. Sonnet is better at that. Turning off thinking kept it focused on the writing in front of it rather than re-examining the architecture behind it. And it iterates faster, which is exactly what you want in a drafting pass.

One thing I almost never do: switch models mid-thread. Bringing a new model into a long thread is like bringing a new person into a meeting that’s been running for two hours. They need to catch up, and the thinking gets inconsistent while they do. If I need to switch, I’d rather start a fresh thread than try to hand off mid-stream.

The match Weigh the work on two axes. The setup falls out. bounded by what you can afford: the usage window WORK COMPLEXITY KIND OF THINKING High Low Defined execution Open-ended judgment Drafting from a settled plan Sonnet · High–Max large, complex builds: run it up Sonnet · Medium routine drafting and production Synthesis and real uncertainty Opus · High–Extra the deepest uncertainty earns it Sonnet · High lower-stakes synthesis spends fastest Lookups and mechanical tasks Haiku or Sonnet · Low the fast, cheap path cheapest Light exploration and brainstorm Sonnet · Medium escalate if it deepens
FIG. B — THE MATCH. SPEND RISES TOWARD THE TOP RIGHT.

Context is what you’re carrying in the vehicle

Context is the other half of the equation. It determines what the model is actually working with — what facts it holds, what decisions it remembers, how much of your work is live in the thread.

Something shifted here while I was writing this paper. Claude used to hold your entire thread, word for word, until you hit a hard stop. Now the stop is softer. When a thread gets near the context limit, Claude summarizes the early material and keeps going — and it marks the spot when it does. Everything below that line is still held in full.

The other tools I’ve used do something different. They trim and compress continuously as the thread grows, and you find out what fell off when the model forgets it. With Claude you can see what the model is holding, because below the compression line, it’s holding all of it.

That difference matters because it means you can choose where the break happens and what survives it. A break you cut deliberately — carrying forward what you decided matters — produces better work than a break the system cut for you.

Habit The Hazard The Protection What it costs
Cut a clean break at every deliverable You can’t see what’s been compressed or what the model is quietly working without. You chose the break point and what carries forward. Fresh context, full picture. Noticing when the thread is running long. Save the draft, open a fresh thread, carry the file forward.
Close with a handoff note The model drifts — logic gets fuzzy, focus slips, earlier decisions lose weight. The next session opens on a few pages of full continuity instead of a hundred-thousand-token thread. Ten minutes at session close. Let Claude draft it from your open items and next step.
Load only what the thread needs More context feels like richer work. It isn’t — it’s a slower, shorter thread on diluted material. The model holds exactly what you want it thinking about. The thread runs longer and the work stays sharper. Deciding what context each task actually needs before you load it.

These habits pay off on work that builds over time. For a one-off question, skip them and just pick the right model for the task.

Two roads compared: on Claude's road, three labeled breaks the driver cut; on the other road, an unmarked compaction zone consuming the road behind the car.
FIG. C — WHO COMPRESSES, AND WHEN. A BREAK YOU CUT IS A SUMMARY YOU AUTHORED.

Is the process overhead worth it?

Managing context, breaking work across threads, writing a handoff note at the end of a session — that’s real work. Worth asking honestly whether it pays.

Working through multiple drafts and iterations with a model that isn’t losing the thread — isn’t quietly forgetting earlier decisions or drifting from the original problem — produces better work. That part is straightforward.

The thing that’s hard to see until you’ve lived it: when I write two thousand words the usual way, I get attached to the sentences. Tearing the structure down means rewriting all of them, so I defend what I have instead of improving it. When Claude holds the draft, the sentences stop being the investment. The ideas are the investment — what the piece argues, how it’s ordered, what it’s built to do. Those survive any rewrite. The prose is just the cheap layer, regenerated from the thinking you already banked. You can reshape the whole thing with a few prompts and feel nothing, because nothing you actually valued got destroyed.

Claude is a raw tool. Powerful, but raw. A software engineer has an IDE — one place that holds everything, tracks every change, keeps the project coherent across sessions. Nothing like that exists for knowledge work. So you assemble it yourself.

That assembly is what Paper Two is about.

Part Three

What does this look like exactly?

For each paper I create a separate Claude Project. The project instructions guide and track the work — what I’m building, what’s been decided, where I am in the process. I maintain a set of reference documents in Google Drive that travel with the project: research I’ve accumulated, frameworks I’ve developed, decisions I’ve locked. And I use the three habits in the table above consistently. Every paper I’ve written in the past several months was built this way and the framework has evolved throughout.

It’s built entirely on native Claude features. Nothing exotic — just Project instructions, context discipline, and a handful of habits applied consistently. Together they get closer to a knowledge worker’s IDE than anything else I’ve found.

Paper Two covers the framework in full:

If you’ve read this far, I assume you’re trying to solve the same problems I was.

Get in the driver’s seat — the next paper takes you much further down far trickier roads.

Provenance

ModelClaude Opus 4.8 (S01, S03, S04, S06 · Extra, S07 · High/Extra, S10 prose forks) · Claude Sonnet 4.6 (S03b · Medium, S05 · Max, S11 build) · Claude Sonnet Max · Thinking off (S11 writing refinement pass) · Claude Opus 4.8 · Medium · Thinking off (S11 final in-project review) · Claude Fable 5 (S08–S10 production and HTML merge) · 2026-06-05/12
Versionv2.2 · HTML · June 2026
SessionWPP-S01 through WPP-S11 · 2026-06-05 through 2026-06-12 · 11 sessions across 8 days
Audienceclaude.ai Pro researcher / analyst / writer (primary); LinkedIn reader; industry reader / product leader (secondary)
SourcesWPP reference manual v0.4 (Opus 4.8, S03 — foundational claim inventory); white paper v0.6–v0.11 markdown (Opus 4.8, Fable 5, successive drafts); author editorial pass (Google Doc, June 12, 2026); Gemini external review (15 findings, June 11, 2026); Sonnet 4.6 Max verification against Anthropic primary sources: support.claude.com/en/articles/11647753, /12111783, /11473015; GitHub issue #25759 (all accessed June 11, 2026); Charter v0.1 → v0.2 → v0.3; S01–S10 session logs and handoffs; output-formatting, svg-diagrams, provenance, session-handoff, my-voice skills
Decision / ActionS01–S02 (Jun 5–6): Project scoped. Terminal question locked: how should a knowledge worker manage model, effort, and features to optimize quality and cost. Charter v0.1. Reference manual established as claim foundation. S03/S03b (Jun 7): Parallel sessions — reference manual v0.4 (Opus 4.8); story-framing pass (Sonnet 4.6 · Medium). S04–S05 (Jun 8): Charter v0.2. Working title: “A Fresh Look at Claude for Deep Knowledge Work.” HTML/GitHub target confirmed. SVG diagram approach established. S06 (Jun 9): Title locked as “Driving with a Stick” (Opus 4.8 · Extra). Stick shift metaphor consolidated. S07 (Jun 9, Opus 4.8 · High/Extra): Structural inflection point. Read the actual paper; found two documents stapled together — essay wrapped around a reworded manual. Split executed: essay stripped to argument only (v0.5), companion manual scoped as Paper 2. Thesis locked: reading the work (complexity + kind of thinking) and matching to what you can afford — model/effort/context are instruments of that match. Context persistence named as Claude’s differentiator. Practice observations moved to data table. v0.6 produced (~1,700 words). S08 (Jun 9–10): Charter v0.3 installed. Blind-review results (queued from S07) ingested. Gemini 15-finding triage locked (Fable 5); 2 genuine items, 13 rejected. S09 (Jun 10): Spine rebuilt around capability-then-meter argument (Fable 5). v0.9 — 33-item batch pass. S10 (Jun 11): Opus 4.8 five-fork rewrite → v0.10 (evergreen, “weigh the work” frame) → v0.11 (rhythm/flow pass). Fable 5 merged to HTML (v0.12). Standalone verification thread (Sonnet 4.6 Max) confirmed all product mechanics against Anthropic primary sources; Phase 1 gate satisfied. S11 (Jun 12): Gemini reconciliation completed. Author editorial pass: title changed, Part One rewritten in direct personal voice, Weigh it first merged into Part Two intro, Opus token-doubling story added, Sonnet Max example placed in budget section, overhead scaffolding removed, table rebuilt 3→4 columns, Part Three rewritten as Paper 2 teaser. Sonnet Max · Thinking off used for section-by-section writing refinement. HTML rebuilt via Python from settled prose. v2.0 delivered. v2.1: provenance rebuilt from session logs. v2.2 (final pass, in-project, Opus 4.8 · Medium · Thinking off): thread reviewed against Charter goals and verification record. Two accuracy corrections applied — “weekly cap” → “weekly limit” (vocabulary lock), and compaction signal softened from “it tells you when it happens” to “it marks the spot when it does” (accurate to the quiet indicator). Two Charter deltas named for close: title change from “Driving with a Stick” and LinkedIn-native register now canonical for the deliverable.
Iteration notesv2.2: Final in-project review — two accuracy corrections, two Charter deltas named (Opus 4.8 · Medium · Thinking off, S11). v2.1: Provenance rebuild from session logs (Sonnet 4.6, S11). v2.0: Major editorial pass — title, voice, structure (Sonnet Max · Thinking off, S11). v0.12: Four post-merge calls (Fable 5, Hugh-directed). v0.11: Rhythm/flow pass (Opus 4.8, S10). v0.10: Five-fork rewrite — evergreen, manual removed, “weigh the work,” Part Three cut (Opus 4.8, S10). v0.9: Spine rebuild + 33-item batch (Fable 5, S09). v0.6: Structural split — essay only, thesis locked, context persistence named differentiator (Opus 4.8, S07). v0.1–v0.5: Reference manual → argument essay evolution, S01–S07.
AssumptionsProduct mechanics verified against Anthropic primary sources (Sonnet 4.6 Max verification thread, June 11, 2026): compaction behavior, Settings usage page, Project retrieval, dual clocks (session window + weekly limit). Phase 1 claim-verification complete. HTML publishes from claude-guide/.
Scope exclusionsSettings/how-to manual (Paper 2); v0.1–v0.5 draft series (reference manual phase); coding, agentic work, Cowork, speed; the developer API.
Tool chainGoogle Drive fetch (session logs and handoffs S01–S10, author edited source, Charter v0.3); Python rebuild script (HTML production); CSS + SVG extracted from prior build; web_fetch (GitHub Pages); Anthropic source verification (Sonnet 4.6 Max, S10); output-formatting, provenance, svg-diagrams skills
Review statusv2.2 — final in-project review complete (Opus 4.8 · Medium); accuracy corrections applied; Charter deltas named; ready to publish