diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index 57e9572..d7f2ad4 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -3099,49 +3099,120 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent - **Status:** ✅ DEPLOYED Dec 7, 2025 02:47 UTC (commit ed9e4d5) - **See:** `docs/PM_MONITORING_STOP_ROOT_CAUSE_DEC7_2025.md` for complete analysis -**2. Drift SDK Memory Leak (#1)** - JavaScript heap OOM after 10+ hours +**2. Position Manager Never Actually Checks Prices - handlePriceUpdate() Early Return (#77 RECURRENCE - CRITICAL - Dec 13, 2025)** +- **Symptom:** Telegram trades "added to Position Manager" but never monitored - zero price logs, zero condition checks, silent failure for 29+ minutes +- **User Report:** "so the whole time all the development we did was not working and therefore we have lost 1000$" +- **Financial Impact:** $1,000+ losses from positions completely unmonitored despite logs saying "monitoring started" +- **Real Incident (Dec 13, 2025 20:47-21:17):** + * SOL-PERP SHORT opened via Telegram at 20:47:23 + * Position Manager logs: "✅ Trade added to position manager for monitoring" + * Reality: Zero price checks, zero condition checks, zero monitoring + * User forced manual close at 21:17:10 (+$2.17 profit) + * Container restart 21:20:36 lost all evidence logs +- **Root Cause:** + * File: `lib/trading/position-manager.ts` lines 692-696 (before fix) + * Code: `if (!driftService?.isInitialized) { return }` - EARLY RETURN in handlePriceUpdate() + * Drift initializes LAZILY (only on first API call) + * Position Manager starts monitoring IMMEDIATELY after addTrade() + * Result: Pyth price monitor called handlePriceUpdate() every 2 seconds + * But function returned early EVERY TIME → monitoring loop did NOTHING + * **Silent failure:** No errors thrown, isMonitoring=true, but zero actual checks +- **Why Defensive Programming Backfired:** + * Intent: Avoid Drift errors when service not ready + * Reality: Created 29+ minute gap with zero position protection + * Monitoring loop ran but skipped ALL price/condition logic + * Health monitor showed "monitoring active" (flag-based, not functional check) +- **THE FIX (Dec 13, 2025 21:46 UTC - DEPLOYED):** + ```typescript + // BEFORE (BROKEN - lines 692-696): + const driftService = getDriftService() + if (!driftService?.isInitialized) { + return // ← SILENT FAILURE - monitoring does nothing! + } + + // AFTER (FIXED): + // BUG #77 FIX: Don't skip price checks if Drift not initialized + // Position Manager price checking must run even if external closure detection unavailable + const driftService = getDriftService() + // Continue to price checking logic regardless of Drift state + // External closure detection will fail gracefully if Drift unavailable + ``` +- **Added Observability:** + * Line 651-658: `console.log('🔍 Price check: ${symbol} @ $${price} (${count} trades)')` + * Shows monitoring loop actually running (not just flag set) + * Logs every 2 seconds when positions active +- **Verification (CONFIRMED WORKING):** + * Test position: SOL-PERP LONG opened via `/api/trading/test` + * Logs: `✅ STARTMON: Position monitoring active, isMonitoring: true` + * Logs: `🔍 Price check: SOL-PERP @ $132.29 (2 trades)` every 2 seconds + * Diagnostic endpoint: `isMonitoring: true, activeTradesCount: 2` + * **FIX VERIFIED** - Price checks running continuously +- **Code Locations:** + * handlePriceUpdate(): `lib/trading/position-manager.ts` line 651-658 (logging) + * Early return REMOVED: `lib/trading/position-manager.ts` line 685-695 (THE FIX) + * addTrade() caller: `lib/trading/position-manager.ts` line 262-312 +- **Prevention Rules:** + 1. NEVER add early returns in monitoring loops that silently skip critical operations + 2. Defensive checks must fail LOUDLY, not silently + 3. If Drift unavailable, log warning but continue price checking + 4. External closure detection is SUPPLEMENTARY - price checks are PRIMARY + 5. Test with Telegram trades specifically (different code path than TradingView) + 6. Verify monitoring with runtime logs, not just isMonitoring flag +- **Red Flags Indicating This Bug:** + * Position opened successfully + * Logs say "added to position manager for monitoring" + * Zero `🔍 Price check` logs appearing + * Zero condition check logs (TP1/TP2/SL/trailing) + * Database configSnapshot is NULL (PM state not saved) + * User forced to manually close despite "monitoring active" +- **Git Commit:** 01bd730 "critical: FIX Bug #77 - Position Manager monitoring stopped by Drift init check" (Dec 13, 2025) +- **Deployment:** Dec 13, 2025 21:46 UTC (container trading-bot-v4) +- **Status:** ✅ FIXED AND VERIFIED - Monitoring loop now runs correctly, test position shows price checks every 2s +- **Lesson Learned:** Defensive programming that silently skips critical operations is MORE dangerous than failing loudly. The early return was added to "protect" against Drift errors but instead created a 29-minute unmonitored gap that cost $1,000+. Always fail loudly, never silently skip monitoring. + +**3. Drift SDK Memory Leak (#1)** - JavaScript heap OOM after 10+ hours - **Solution:** Smart error-based health monitoring (`lib/monitoring/drift-health-monitor.ts`) - **Detection:** `interceptWebSocketErrors()` patches console.error - **Action:** Restarts if 50+ errors in 30-second window - **Status:** Fixed Nov 15, 2025, Enhanced Nov 24, 2025 -**3. Wrong RPC Provider (#2)** - Alchemy breaks Drift SDK subscriptions +**4. Wrong RPC Provider (#2)** - Alchemy breaks Drift SDK subscriptions - **FINAL CONCLUSION:** Use Helius RPC, NEVER use Alchemy - **Root Cause:** Alchemy rate limits break Drift's burst subscription pattern - **Evidence:** 17-71 subscription errors with Alchemy vs 0 with Helius - **Status:** Investigation Complete Nov 14, 2025 -**4. P&L Compounding Race Condition (#48, #49, #59, #60, #61, #67)** +**5. P&L Compounding Race Condition (#48, #49, #59, #60, #61, #67)** - **Pattern:** Multiple monitoring loops detect same closure → each adds P&L - **Result:** $6 real → $92 recorded (15x inflation) - **Fix:** Use `Map.delete()` atomic return as deduplication lock (Dec 2, 2025) - **Code:** `if (!this.activeTrades.delete(tradeId)) return` - first caller wins -**5. Database-First Pattern (#29)** - Save DB before Position Manager +**6. Database-First Pattern (#29)** - Save DB before Position Manager - **Rule:** `createTrade()` MUST succeed before `positionManager.addTrade()` - **Why:** If DB fails, API returns 500 with "CLOSE POSITION MANUALLY" - **Impact:** Without this, positions become untracked on container restart - **Status:** Fixed Nov 13, 2025 -**6. Container Deployment Verification (#31)** +**7. Container Deployment Verification (#31)** - **Rule:** NEVER say "fixed" without checking container timestamp - **Verification:** `docker logs trading-bot-v4 | grep "Server starting"` vs `git log -1 --format='%ai'` - **If container older than commit:** CODE NOT DEPLOYED, FIX NOT ACTIVE - **Status:** Critical lesson from Nov 13, 2025 incident -**7. Position.size Tokens vs USD (#24)** - SDK returns tokens, not USD +**8. Position.size Tokens vs USD (#24)** - SDK returns tokens, not USD - **Bug:** Comparing 12.28 tokens to $1,950 → "99.4% reduction" → false TP1 - **Fix:** `positionSizeUSD = Math.abs(position.size) * currentPrice` - **Impact:** Without fix, TP1 never triggers correctly - **Status:** Fixed Nov 12, 2025 -**8. Ghost Detection Atomic Lock (#67)** - Map.delete() as deduplication +**9. Ghost Detection Atomic Lock (#67)** - Map.delete() as deduplication - **Pattern:** Async handlers called by multiple code paths simultaneously - **Solution:** `if (!this.activeTrades.delete(tradeId)) { return }` - atomic lock - **Why:** JavaScript Map.delete() returns true only for first caller - **Status:** Fixed Dec 2, 2025 -**9. Wrong Price in usdToBase() (#81 - ROOT CAUSE - Dec 10, 2025)** - CAUSED $1,000+ LOSSES +**10. Wrong Price in usdToBase() (#81 - ROOT CAUSE - Dec 10, 2025)** - CAUSED $1,000+ LOSSES - **Bug:** Changed from `usdToBase(usd, price)` to `usdToBase(usd)` using entryPrice for ALL orders - **Impact:** Wrong token quantities = Drift rejects orders = NULL database signatures = NO risk management - **Example:** $8,000 at TP1 $141.20 needs 56.66 SOL, but code calculated 57.14 SOL (used $140 entry price) @@ -3149,7 +3220,7 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent - **Original commit:** 4cc294b (Oct 26, 2025) - "All 3 exit orders placed successfully on-chain" (100% success) - **Status:** ✅ FIXED Dec 10, 2025 14:31 CET (commit 55d780c) -**10. Drift State Verifier Kills Active Positions (#82 - CRITICAL - Dec 10, 2025)** - Automatic retry close on wrong positions +**11. Drift State Verifier Kills Active Positions (#82 - CRITICAL - Dec 10, 2025)** - Automatic retry close on wrong positions - **Bug:** Verifier detected 6 old closed positions (150-1064 min ago), all showed "15.45 tokens" (user's CURRENT trade!), automatically called closePosition() - **Impact:** User's manual trade HAD working SL, then Telegram alert "⚠️ Retry close attempted automatically", SL orders immediately disappeared - **Root Cause:** Lines 279-283 call closePosition() for every mismatch, no verification if Drift position is OLD (should close) vs NEW (active trade)