# Bug Reports & Critical Fixes

**Historical record of critical incidents, bugs, and their resolutions.**

This directory contains `CRITICAL_*.md` and `FIXES_*.md` bug reports. Every file documents a real incident that cost money or time, or could have caused financial loss if not caught.

---

## 🚨 Critical Bug Reports

### **Position Management**

- `CRITICAL_FIX_POSITION_SIZE_BUG.md` - Smart Entry using webhook percentage as signal price
  - **Impact:** $89 position sizes instead of $2,300, 97% pullback calculations
  - **Root Cause:** TradingView webhook sent percentage (70.80) not price ($142.50)
  - **Fix:** Use Pyth oracle price instead of webhook signal.price
  - **Date:** Dec 3, 2025

- `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md` - Database-first pattern violation
  - **Impact:** Position opened with NO database record, NO Position Manager tracking
  - **Root Cause:** Position Manager added before database save
  - **Fix:** Database save FIRST, then Position Manager add
  - **Date:** Nov 13, 2025

- `CRITICAL_TP1_FALSE_DETECTION_BUG.md` - TP1 detection fails when on-chain orders fill fast
  - **Impact:** Winners marked as "SL" exits, analytics incorrect
  - **Root Cause:** External closure detected after both TP1 + runner closed
  - **Fix:** Percentage-based exit reason inference
  - **Date:** Nov 19, 2025

### **Trade Execution**

- `CRITICAL_MISSING_TRADES_NOV19.md` - Trades executed but not in database
  - **Impact:** 3 trades missing from database, P&L values inflated 5-14×
  - **Root Cause:** P&L compounding in external closure detection
  - **Fix:** Don't mutate trade.realizedPnL during calculation
  - **Date:** Nov 19, 2025

- `CRITICAL_ISSUES_FOUND.md` - Multiple critical discovery incidents
  - Container restart killing positions
  - Phantom detection killing runners
  - Field name mismatches in startup validation
  - **Dates:** Various Nov 2025

---

## 🔧 Fix Documentation

### **Runner System**

- `FIXES_RUNNER_AND_CANCELLATION.md` - Runner stop loss gap fixes
  - **Problem:** No SL protection between TP1 and TP2
  - **Solution:** Explicit runner SL check in monitoring loop
  - **Impact:** Prevented unlimited risk exposure on 25-40% runner
  - **Date:** Nov 15, 2025

### **Applied Fixes**

- `FIXES_APPLIED.md` - Collection of multiple bug fixes
  - TP1 detection
  - P&L calculation
  - External closure handling
  - Order cancellation
  - **Period:** Nov 2025

---

## 📋 Common Bug Patterns

### **1. P&L Compounding (Multiple Incidents)**

**Pattern:** Monitoring loop detects closure multiple times, accumulates P&L
**Symptoms:** Database shows 5-20× actual P&L, duplicate Telegram notifications
**Root Cause:** Async operations + monitoring loop = race condition
**Solution:** Delete from Map IMMEDIATELY (atomic operation) before any async work

**Example Fixes:**
- Common Pitfall #49 (Nov 17, 2025)
- Common Pitfall #61 (Nov 22, 2025)
- Common Pitfall #67 (Dec 2, 2025)

### **2. Database-First Violations**

**Pattern:** In-memory state updated before database write
**Symptoms:** Container restart loses tracking, positions unprotected
**Root Cause:** Developer assumes database always succeeds
**Solution:** Always database write FIRST, then update in-memory state

**Example Fixes:**
- Common Pitfall #29 (Nov 13, 2025)
- CRITICAL_INCIDENT_UNPROTECTED_POSITION.md

### **3. External Closure Detection**

**Pattern:** Position closed on-chain, Position Manager detects late
**Symptoms:** Duplicate updates, wrong exit reasons, ghost positions
**Root Cause:** Drift state propagation delay (5-10 seconds)
**Solution:** Verification wait + closingInProgress flag + ghost detection

**Example Fixes:**
- Common Pitfall #47 (Nov 16, 2025) - Close verification gap
- Common Pitfall #56 (Nov 20, 2025) - Ghost orders
- Common Pitfall #57 (Nov 20, 2025) - P&L accuracy

### **4. Unit Conversion Errors**

**Pattern:** Tokens vs USD, percentage vs decimal, on-chain vs display units
**Symptoms:** Position sizes off by 100×+, TP/SL at wrong prices
**Root Cause:** SDK returns different units than expected
**Solution:** Always log raw values, verify units explicitly

**Example Fixes:**
- Common Pitfall #22 (Nov 12, 2025) - position.size is TOKENS not USD
- Common Pitfall #68 (Dec 3, 2025) - Signal price is percentage not USD

---

## 🔍 Debugging Checklist

**When investigating bugs:**

1. **Check logs first:**
   ```bash
   docker logs -f trading-bot-v4 | grep -E "CRITICAL|ERROR|Failed"
   ```

2. **Verify deployment:**
   ```bash
   # Container start time MUST be > commit timestamp
   docker inspect trading-bot-v4 --format='{{.State.StartedAt}}'
   git log -1 --format='%ai %s'
   ```

3. **Query database state:**
   ```sql
   -- Check for inconsistencies
   -- camelCase columns are quoted: PostgreSQL lowercases unquoted identifiers
   SELECT * FROM "Trade"
   WHERE "exitReason" IS NULL
     AND ("createdAt" < NOW() - INTERVAL '1 hour')
   ORDER BY "createdAt" DESC;
   ```

4. **Compare to Drift:**
   ```bash
   # Get actual position from Drift
   curl -X GET http://localhost:3001/api/trading/positions \
     -H "Authorization: Bearer $API_SECRET_KEY"
   ```

5. **Search Common Pitfalls:**
   ```bash
   grep -n "symptom_keyword" .github/copilot-instructions.md | head -20
   ```

---

## 📝 Creating Bug Reports

**Required Sections:**

````markdown
# [Bug Title] (CRITICAL - Fixed [Date])

**Symptom:** [What user observed]

**Root Cause:** [Technical explanation]

**Real Incident ([Date]):**
* [Specific trade or event]
* [Expected behavior]
* [Actual behavior]
* [Financial impact]

**Impact:** [Scope of problem]

**Fix ([Date]):**
```typescript
// Code showing the fix
```

**Files changed:** [List of files]
**Git commit:** [Commit hash]
**Deployed:** [Deployment timestamp]

**Verification Required:**
* [How to test fix works]
* [Expected logs/behavior]

**Lessons Learned:**
1. [Key insight #1]
2. [Key insight #2]
````

**Naming Convention:**
- `CRITICAL_[TOPIC]_BUG.md` - Single critical bug
- `CRITICAL_[TOPIC]_[DATE].md` - Dated incident report
- `FIXES_[SYSTEM].md` - Multiple related fixes

---

## ⚠️ Prevention Guidelines

**From Past Incidents:**

1. **Always verify deployment before declaring "fixed"**
   - Container restart timestamp > commit timestamp
   - Test with actual trade if possible
   - Check logs for expected behavior change

2. **Never trust SDK data formats**
   - Log raw values first
   - Verify units explicitly (tokens vs USD, % vs decimal)
   - Don't rely on SDK docs; they are often wrong

3. **Database writes before in-memory updates**
   - Save to database FIRST
   - Only then update Position Manager, caches, etc.
   - If DB fails, don't proceed

4. **Async operations need serialization**
   - Delete from Map IMMEDIATELY (atomic lock)
   - Don't check Map.has() then delete later (race condition)
   - Use Map.delete() return value as lock

5. **External closures need verification**
   - Wait 5-10 seconds for Drift state propagation
   - Query Drift to confirm position actually closed
   - Keep monitoring if verification fails

---

## 📞 Escalation

**When to declare "CRITICAL":**
- Financial loss occurred or was narrowly avoided
- System produced incorrect data (P&L, positions, exits)
- Position left unprotected (no monitoring, no TP/SL)
- Bug could recur and cause future losses

**Response Required:**
1. Stop trading immediately if data integrity affected
2. Document incident in this directory
3. Add to Common Pitfalls in copilot-instructions.md
4. Fix + deploy + verify within 24 hours
5. Update relevant architecture docs

---

See `../README.md` for overall documentation structure.
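
---

To illustrate Prevention Guideline 4 (using the `Map.delete()` return value as a lock), here is a minimal TypeScript sketch. The names `ActiveTrade`, `activeTrades`, `claimClosure`, and `handleExternalClosure` are hypothetical and are not taken from the actual Position Manager code. The `save` callback stands in for whatever database write and notification the real system performs.

```typescript
// Hypothetical shape of a tracked trade; the real type is not shown in this document.
interface ActiveTrade {
  id: string;
  realizedPnL: number;
}

// Hypothetical in-memory registry of monitored trades.
const activeTrades = new Map<string, ActiveTrade>();

// Claim the right to process a closure. This function is fully synchronous,
// so no other monitoring tick can run between get() and delete().
// Map.delete() returns true only when it actually removed the entry.
function claimClosure(tradeId: string): ActiveTrade | undefined {
  const trade = activeTrades.get(tradeId);
  if (trade === undefined) return undefined;
  return activeTrades.delete(tradeId) ? trade : undefined;
}

// Process an external closure at most once per trade.
async function handleExternalClosure(
  tradeId: string,
  closurePnL: number,
  save: (trade: ActiveTrade, pnl: number) => Promise<void>,
): Promise<boolean> {
  const trade = claimClosure(tradeId);
  if (trade === undefined) return false; // another tick already claimed it
  // Pass the P&L along instead of mutating trade.realizedPnL.
  await save(trade, closurePnL);
  return true;
}
```

Because the claim happens before any `await`, a second monitoring tick that fires while `save` is still pending should find the entry already gone and return without touching the database.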