diff --git a/CLAUDE.md b/CLAUDE.md
index f7aef09..852d6d8 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -131,6 +131,87 @@ If user wants an ongoing project:
 
 ---
 
+## Model Selection Strategy
+
+**Core Principle:** Avoid context loss between models. Using a cheaper model for execution and a more expensive model for error recovery costs more in total tokens than using the appropriate model from the start.
+
+### Decision Tree
+
+**1. Trivial, read-only tasks (zero risk)?**
+   - Examples: `git status`, checking if a file exists, reading a single file
+   - **Use: HAIKU**
+   - Rationale: Fastest, cheapest, no context needed, no execution risk
+
+**2. Standard task with a clear plan?**
+   - **Default: OPUS (plans AND executes in one shot)**
+   - **Rare Exception: SONNET for execution-only IF:**
+     - Plan is 100% mechanical (no decisions needed, pure step-following)
+     - AND error probability is extremely low (documented, tested system)
+     - AND the cost saving actually matters for the specific task
+   - Rationale: Opus adapts on-the-fly, handles surprises without re-planning overhead
+
+**3. Complex, risky, or unknown-territory tasks?**
+   - Examples: Host infrastructure scans, service restarts, SSH operations, debugging, anything with potential for surprises
+   - **Use: OPUS**
+   - Rationale: Lowest error rate, best context understanding, avoids costly error recovery
+
+### Why This Matters: Context Loss is Expensive
+
+**Anti-pattern: Model Switching**
+```
+Opus plans → Sonnet/Haiku executes → Reality differs → Opus must blind-debug
+= Token cost: Plan tokens + Execution tokens + Error recovery tokens
+```
+
+**Better: One-shot Opus**
+```
+Opus plans AND executes, adapts on-the-fly, handles surprises
+= Token cost: Plan tokens + Execution tokens (no re-planning overhead)
+```
+
+**Real-world execution rarely matches plans exactly** because:
+- Unexpected file structures or permissions
+- System state differences from documentation
+- Commands that succeed but produce different output
+- Permission errors or authentication issues
+- Configuration differences in different environments
+
+When Opus executes the plan, it can adapt in real-time without:
+- Losing context between model switches
+- Re-explaining the situation to a different model
+- Incurring planning overhead again
+
+### Concrete Examples
+
+| Task | Model | Reasoning |
+|------|-------|-----------|
+| `git status` | HAIKU | Read-only, no execution risk |
+| Read a config file | HAIKU | Read-only, no execution risk |
+| Host infrastructure scan | OPUS | Complex, multiple hosts, recursive discovery, adapts to surprises |
+| Service restart | OPUS | Risk of unexpected state, error handling needed |
+| SSH operations | OPUS | Unknown system state, permission issues possible |
+| Codebase refactoring | OPUS | Multiple files, architectural decisions, error recovery critical |
+| Deployment script (well-tested) | SONNET (rare) | Only if plan is 100% mechanical AND low error risk AND cost matters |
+| Debugging a production issue | OPUS | Unknown territory, needs real-time adaptation |
+| DNS record check | HAIKU | Read-only lookup |
+| Firewall rule modification | OPUS | Complex state, multiple systems affected, documentation updates needed |
+| Running documented commands | SONNET (rare) | Only if commands are proven, output predictable, error probability <1% |
+
+### Anti-Patterns to Avoid
+
+| Anti-Pattern | Why It Fails | Cost |
+|--------------|-------------|------|
+| Use Haiku/Sonnet to "save tokens" | Failures cost more in error recovery | ❌ False economy |
+| Plan with Opus, execute with Haiku, fix errors with Opus | Context loss between models | ❌ Most expensive option |
+| Sonnet for everything "to balance speed/cost" | Unclear when it's appropriate | ❌ Inconsistent, risky |
+| Switch models mid-task based on "looking easy" | Real execution rarely matches expectations | ❌ Context loss |
+
+### When in Doubt
+
+**Default to OPUS.** Token cost of unnecessary Opus usage is typically less than the token cost of error recovery with a cheaper model. Better to overshoot on capability than to undershoot and pay for recovery.
+
+---
+
 ## Document Structure
 
 ### copilot-instructions.md (Development Guidelines)