diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md index b466e76..06d3f72 100644 --- a/.github/copilot-instructions.md +++ b/.github/copilot-instructions.md @@ -747,16 +747,16 @@ docs/COMMON_PITFALLS.md ``` **Top 10 Critical Pitfalls (Summary):** -1. **Position Manager Never Monitors (#73, #74)** - Container running old code = NO MONITORING = $108 loss -2. **Drift SDK Memory Leak (#1)** - JS heap OOM after 10+ hours → Smart health monitoring -3. **Wrong RPC Provider (#2)** - Alchemy breaks Drift SDK → Use Helius only -4. **P&L Compounding Race Condition (#48, #49, #61)** - Multiple closures → Atomic Map.delete() -5. **Database-First Pattern (#29)** - Save DB before Position Manager -6. **Container Deployment Verification (#31)** - Always check container timestamp -7. **Position.size Tokens vs USD (#24)** - SDK returns tokens → multiply by price -8. **External Closure Race Condition (#67)** - 16 duplicate notifications → Atomic lock -9. **Smart Entry Wrong Price (#66)** - Use Pyth oracle, not webhook percentage -10. **MAE/MFE Wrong Units (#54)** - Store percentages, not dollars +1. **Position Manager Never Monitors (#77)** - Logs say "added" but isMonitoring=false = $1,000+ losses +2. **Silent SL Placement Failure (#76)** - placeExitOrders() returns SUCCESS with 2/3 orders, no SL protection +3. **Orphan Cleanup Removes Active Orders (#78)** - cancelAllOrders() affects ALL positions on symbol +4. **Wrong Year in SQL Queries (#75)** - Query 2024 dates when current is 2025 = 12× inflated results +5. **Drift SDK Memory Leak (#1)** - JS heap OOM after 10+ hours → Smart health monitoring +6. **Wrong RPC Provider (#2)** - Alchemy breaks Drift SDK → Use Helius only +7. **P&L Compounding Race Condition (#48, #49, #61)** - Multiple closures → Atomic Map.delete() +8. **Database-First Pattern (#29)** - Save DB before Position Manager +9. **Container Deployment Verification (#31)** - Always check container timestamp +10. **External Closure Race Condition (#67)** - 16 duplicate notifications → Atomic lock **How to Use:** - **Quick lookup:** Check Quick Reference Table in `docs/COMMON_PITFALLS.md` @@ -2025,9 +2025,83 @@ scoreSignalQuality({ * If quality 80-84 wins but 85-90 loses → Adjust to 85 threshold - **Possible outcomes:** Keep 91, lower to 85, adjust ADX/RSI weights, add context filters -### 2. Position Manager (`lib/trading/position-manager.ts`) +### 2. Position Manager Health Monitoring System (`lib/health/position-manager-health.ts`) +**Purpose:** Detect Position Manager failures within 30 seconds to prevent $1,000+ loss scenarios + +**CRITICAL (Dec 8, 2025):** Created after discovering three bugs that caused $1,000+ losses: +- Bug #77: Position Manager logs "added" but never actually monitors (isMonitoring=false) +- Bug #76: placeExitOrders() returns SUCCESS but SL order missing (silent failure) +- Bug #78: Orphan detection removes active position orders (cancelAllOrders affects all) + +**Key Functions:** +- `checkPositionManagerHealth()`: Returns comprehensive health check result + - DB open trades vs PM monitoring status + - PM has trades but monitoring OFF + - Missing SL orders (checks slOrderTx, softStopOrderTx, hardStopOrderTx) + - Missing TP1/TP2 orders + - DB vs PM vs Drift count mismatches +- `startPositionManagerHealthMonitor()`: Runs automatically every 30 seconds + - Logs CRITICAL alerts when issues found + - Silent operation when system healthy + - Started automatically in startup sequence + +**Health Checks Performed:** +1. **DB open trades but PM not monitoring** → CRITICAL ALERT +2. **PM has trades but monitoring OFF** → CRITICAL ALERT +3. **Open positions missing SL orders** → CRITICAL ALERT per position +4. **Open positions missing TP orders** → WARNING per position +5. **DB vs PM trade count mismatch** → WARNING +6. **PM vs Drift position count mismatch** → WARNING + +**Alert Format:** +``` +🚨 CRITICAL: Position Manager not monitoring! + DB: 2 open trades + PM: 2 trades in Map + Monitoring: false ← BUG! + +🚨 CRITICAL: Position cmix773hk019gn307fjjhbikx missing SL order + Symbol: SOL-PERP + Size: $2,003 + slOrderTx: NULL + softStopOrderTx: NULL + hardStopOrderTx: NULL +``` + +**Integration:** +- File: `lib/startup/init-position-manager.ts` line ~78 +- Starts automatically after Drift state verifier +- Runs alongside: data cleanup, blocked signals, stop hunt, smart validation +- No manual intervention needed + +**Test Suite:** +- File: `tests/integration/position-manager/monitoring-verification.test.ts` (201 lines) +- 4 test suites, 8 test cases: + * "CRITICAL: Monitoring Actually Starts" (4 tests) + * "CRITICAL: Price Updates Actually Trigger Checks" (2 tests) + * "CRITICAL: Monitoring Stops When No Trades" (2 tests) + * "CRITICAL: Error Handling Doesnt Break Monitoring" (1 test) +- Validates: startMonitoring() calls Pyth monitor, isMonitoring flag set, price updates processed +- Mocks: drift/client, pyth/price-monitor, database/trades, notifications/telegram + +**Why This Matters:** +- **This is a REAL MONEY system** - Position Manager is the safety net +- User lost $1,000+ because PM said "monitoring" but wasn't +- Positions appeared protected but had no monitoring whatsoever +- Health monitor detects failures within 30 seconds +- Prevents catastrophic silent failures + +**Deployment Status:** +- ✅ Code complete and committed (Dec 8, 2025) +- ⏳ Deployment pending (Docker build blocked by DNS) +- ✅ Startup integration complete +- ✅ Test suite created + +### 3. Position Manager (`lib/trading/position-manager.ts`) **Purpose:** Software-based monitoring loop that checks prices every 2 seconds and closes positions via market orders +**CRITICAL BUG (#77):** Logs say "added to monitoring" but isMonitoring stays false - see Health Monitoring System above for detection + **Singleton pattern:** Always use `getInitializedPositionManager()` - never instantiate directly ```typescript const positionManager = await getInitializedPositionManager() @@ -2182,6 +2256,12 @@ On container startup, cross-checks last 24h of "closed" trades against actual Dr - `placeExitOrders()` - Places TP/SL orders on-chain - `cancelAllOrders()` - Cancels all reduce-only orders for a market +**CRITICAL BUG (#76 - Dec 8, 2025):** placeExitOrders() can return SUCCESS with missing SL order +- Symptom: Logs "Exit orders placed: [2 signatures]" but SL missing (expected 3) +- Impact: Position completely unprotected from downside +- Detection: Health monitor checks slOrderTx/softStopOrderTx/hardStopOrderTx every 30s +- Fix required: Validate signatures.length before returning, add error handling around SL placement + **CRITICAL: Transaction Confirmation Pattern** Both `openPosition()` and `closePosition()` MUST confirm transactions on-chain: ```typescript @@ -3055,6 +3135,291 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent - **Git commit:** [Document wrong year SQL query lesson - Dec 8, 2025] - **Status:** ✅ Documented - Future AI agents must verify year in date queries +76. **CRITICAL: Silent SL Placement Failure - placeExitOrders() Returns SUCCESS With Missing Orders (CRITICAL - Dec 8, 2025):** + - **Symptom:** Position opened with TP1 and TP2 orders but NO stop loss, completely unprotected from downside + - **User Report:** "when i opened the manually trade we hade a sl and tp but it was removed by the system" + - **Financial Impact:** Part of $1,000+ losses - positions left open with no SL protection + - **Real Incident (Dec 8, 2025 13:39:24):** + * Trade: cmix773hk019gn307fjjhbikx + * Symbol: SOL-PERP LONG at $138.45, size $2,003 + * TP1 order EXISTS: 2QzE4q9Q... ($139.31) + * TP2 order EXISTS: 5AQRiwRK... ($140.17) + * SL order MISSING: NULL in database (should be $137.16) + * stopLossPrice: Correctly calculated ($137.1551) and passed to placeExitOrders() + * Logs: "📨 Exit orders placed on-chain: [2 signatures]" (expected 3!) + * Function returned: `{success: true, signatures: [tp1Sig, tp2Sig]}` (SL missing) + - **Root Cause:** + * File: `lib/drift/orders.ts` function `placeExitOrders()` (lines 252-495) + * Lines 465-473: TRIGGER_MARKET SL placement code exists but never executed + * No "🛡️ Placing SL..." log found in container logs + * No error handling around SL placement section + * Function returns SUCCESS even if signatures.length < 3 + * No validation before return statement + - **Why It's Silent:** + * placeExitOrders() doesn't check signatures.length before returning + * Execute endpoint trusts SUCCESS status without validation + * No alerts, no errors, no indication of failure + * Position appears protected but actually isn't + - **How It Bypasses Checks:** + * Size check: Position 14.47 SOL >> minOrderSize 0.1 SOL (146× above threshold) + * All inputs valid: stopLossPrice calculated correctly, market exists, wallet has balance + * Code path exists but doesn't execute - unknown reason (rate limit? SDK bug? network?) + * Function returns early or skips SL section without throwing error + - **Fix Required (Not Yet Implemented):** + ```typescript + // In lib/drift/orders.ts at end of placeExitOrders() (around line 490) + const expectedCount = useDualStops ? 4 : 3 // TP1 + TP2 + SL (+ hard SL if dual) + if (signatures.length < expectedCount) { + console.error(`❌ CRITICAL: Only ${signatures.length}/${expectedCount} exit orders placed!`) + console.error(` Expected: TP1 + TP2 + SL${useDualStops ? ' + Hard SL' : ''}`) + console.error(` Got: ${signatures.length} signatures`) + return { + success: false, + error: `Missing orders: expected ${expectedCount}, got ${signatures.length}`, + signatures + } + } + // Add try/catch around SL placement section (lines 346-476) + // Log errors explicitly if SL placement fails + ``` + - **Execute Endpoint Fix Required:** + ```typescript + // In app/api/trading/execute/route.ts after placeExitOrders() (around line 940) + const expectedSigs = config.useDualStops ? 4 : 3 + if (exitRes.signatures && exitRes.signatures.length < expectedSigs) { + console.error(`❌ CRITICAL: Missing exit orders!`) + console.error(` Expected: ${expectedSigs}, Got: ${exitRes.signatures.length}`) + await logCriticalError('MISSING_EXIT_ORDERS', { + symbol, expectedCount: expectedSigs, + actualCount: exitRes.signatures.length, tradeId: trade.id + }) + } + ``` + - **Detection: Health Monitoring System (Dec 8, 2025):** + * File: `lib/health/position-manager-health.ts` (177 lines) + * Function: `checkPositionManagerHealth()` runs every 30 seconds + * Check: Open positions missing SL orders → CRITICAL ALERT per position + * Validates: slOrderTx, softStopOrderTx, hardStopOrderTx all present + * Log format: "🚨 CRITICAL: Position {id} missing SL order (symbol: {symbol}, size: ${size})" + * Started automatically via `lib/startup/init-position-manager.ts` line ~78 + - **Why This Matters:** + * **This is a REAL MONEY system** - no SL = unlimited loss exposure + * Position can drop 5%, 10%, 20% with no protection + * User may be asleep, away, unavailable for hours + * Silent failures are the most dangerous kind + * Function says "success" but position is unprotected + - **Prevention Rules:** + 1. ALWAYS validate signatures.length matches expected count + 2. NEVER return success without verifying all orders placed + 3. ADD try/catch around ALL order placement sections + 4. LOG errors explicitly, don't fail silently + 5. Health monitor will detect missing orders within 30 seconds + 6. Execute endpoint must validate placeExitOrders() result + - **Red Flags Indicating This Bug:** + * Logs show "Exit orders placed: [2 signatures]" + * Database slOrderTx field is NULL + * No "🛡️ Placing SL..." log messages + * placeExitOrders() returned success: true + * Position open with TP1/TP2 but no SL + - **Git commit:** [Pending - health monitoring deployed, placeExitOrders() fix pending] + - **Status:** ⚠️ Health monitor deployed (detects issue), root cause fix pending + +77. **CRITICAL: Position Manager Never Actually Monitors - Logs Say "Added" But isMonitoring Stays False (CRITICAL - Dec 8, 2025):** + - **Symptom:** System logs "✅ Trade added to position manager for monitoring" but position never monitored + - **User Report:** "we have lost 1000$...... i hope with the new test system this is an issue of the past" + - **Financial Impact:** $1,000+ losses because positions completely unprotected despite logs saying otherwise + - **Real Incident (Dec 8, 2025):** + * Trade: cmix773hk019gn307fjjhbikx created at 13:39:24 + * Logs: "✅ Trade added to position manager for monitoring" + * Database: `configSnapshot.positionManagerState` = NULL (not monitoring!) + * Reality: No price checks, no TP/SL monitoring, no protection whatsoever + * No Pyth price monitor startup logs found + * No price update logs found + * No "checking conditions" logs found + - **Root Cause:** + * File: `lib/trading/position-manager.ts` (2027 lines) + * Function: `addTrade()` (lines 257-271) - Adds to Map, calls startMonitoring() + * Function: `startMonitoring()` (lines 482-518) - Calls priceMonitor.start() + * Problem: startMonitoring() exists and looks correct but doesn't execute properly + * No verification that monitoring actually started + * No health check that isMonitoring matches activeTrades.size + * Pyth price monitor never starts (no WebSocket connection logs) + - **Why It's Catastrophic:** + * System SAYS position is protected + * User trusts the logs + * Position actually has ZERO protection + * No TP/SL checks, no emergency stop, no trailing stop + * Position can move 10%+ with no action + * Database shows NULL for positionManagerState (smoking gun) + - **The Deception:** + * Log message: "✅ Trade added to position manager for monitoring" + * Reality: Trade added to Map but monitoring never starts + * isMonitoring flag stays false + * No price monitor callbacks registered + * Silent failure - no errors thrown + - **Detection: Health Monitoring System (Dec 8, 2025):** + * File: `lib/health/position-manager-health.ts` (177 lines) + * Function: `checkPositionManagerHealth()` runs every 30 seconds + * Critical Check #1: DB has open trades but PM not monitoring + * Critical Check #2: PM has trades but isMonitoring = false + * Critical Check #3: DB vs PM trade count mismatch + * Alert format: "🚨 CRITICAL: Position Manager not monitoring! DB: {dbCount} open trades, PM: {pmCount} trades, Monitoring: {isMonitoring}" + * Started automatically via `lib/startup/init-position-manager.ts` line ~78 + - **Test Suite Created:** + * File: `tests/integration/position-manager/monitoring-verification.test.ts` (201 lines) + * Test Suite: "CRITICAL: Monitoring Actually Starts" (4 tests) + - Validates startMonitoring() calls priceMonitor.start() + - Validates symbols array passed correctly + - Validates isMonitoring flag set to true + - Validates monitoring doesn't start twice + * Test Suite: "CRITICAL: Price Updates Actually Trigger Checks" (2 tests) + * Test Suite: "CRITICAL: Monitoring Stops When No Trades" (2 tests) + * Test Suite: "CRITICAL: Error Handling Doesnt Break Monitoring" (1 test) + * Purpose: Validate Position Manager ACTUALLY monitors, not just logs "added" + - **Fix Required (Not Yet Implemented):** + ```typescript + // In lib/trading/position-manager.ts after startMonitoring() call (around line 269) + // Add verification that monitoring actually started + if (this.activeTrades.size > 0 && !this.isMonitoring) { + console.error(`❌ CRITICAL: Failed to start monitoring!`) + console.error(` Active trades: ${this.activeTrades.size}`) + console.error(` isMonitoring: ${this.isMonitoring}`) + await logCriticalError('MONITORING_START_FAILED', { + activeTradesCount: this.activeTrades.size, + symbols: Array.from(this.activeTrades.values()).map(t => t.symbol) + }) + } + ``` + - **Why This Matters:** + * **This is a REAL MONEY system** - no monitoring = no protection + * TP/SL orders can fail, monitoring is the backup + * Position Manager is the "safety net" - if it doesn't work, nothing does + * User trusts logs saying "monitoring" - but it's a lie + * $1,000+ losses prove this is NOT theoretical + - **Prevention Rules:** + 1. NEVER trust log messages about state - verify actual state + 2. Health checks MUST validate isMonitoring matches activeTrades + 3. Test suite MUST validate monitoring actually starts + 4. Add verification after startMonitoring() calls + 5. Health monitor detects failures within 30 seconds + 6. If monitoring fails to start, throw error immediately + - **Red Flags Indicating This Bug:** + * Logs say "Trade added to position manager for monitoring" + * Database configSnapshot.positionManagerState is NULL + * No Pyth price monitor startup logs + * No price update logs + * No "checking conditions" logs + * Position moves significantly with no PM action + - **Git commit:** [Health monitoring deployed Dec 8, 2025 - detects issue within 30 seconds] + - **Status:** ✅ Health monitor deployed (detects issue), root cause investigation ongoing + +78. **CRITICAL: Orphan Detection Removes Active Position Orders - CancelAllOrders Affects ALL Positions On Symbol (CRITICAL - Dec 8, 2025):** + - **Symptom:** User opens new position with TP/SL orders, system immediately removes them, position left unprotected + - **User Report:** "when i opened the manually trade we hade a sl and tp but it was removed by the system" + - **Financial Impact:** Part of $1,000+ losses - active positions stripped of protection while system tries to close old positions + - **Real Incident Timeline (Dec 8, 2025):** + * **06:46:23** - Old orphaned position: 14.47 SOL-PERP (DB says closed, Drift says open) + * **13:39:24** - User opens NEW manual SOL-PERP LONG at $138.45, size $2,003 + * **13:39:25** - placeExitOrders() places TP1 + TP2 (SL fails silently - Bug #76) + * **13:39:26** - Drift state verifier detects OLD orphan (7 hours old) + * **13:39:27** - System attempts to close orphan via market order + * **13:39:28** - Close fails (Drift state propagation delay 5+ min) + * **13:39:30** - Position Manager removeTrade() calls cancelAllOrders(symbol='SOL-PERP') + * **13:39:31** - cancelAllOrders() cancels ALL SOL-PERP orders (TP1 + TP2 from NEW position) + * **Result** - NEW position left open with NO TP, NO SL, NO PROTECTION + - **Root Cause:** + * File: `lib/trading/position-manager.ts` function `removeTrade()` (lines 275-300) + * Code: `await cancelAllOrders(symbol)` - operates on SYMBOL level, not position level + * Problem: Doesn't distinguish between old orphaned position and new active position + * When closing orphan, cancels orders for ALL positions on that symbol + * User's NEW position gets orders removed while orphan cleanup runs + - **Why It's Dangerous:** + * Orphan detection is GOOD (recovers lost positions) + * But cleanup affects ALL positions on symbol, not just orphan + * If user opens position while orphan cleanup runs, new position loses protection + * Window of vulnerability: 5+ minutes (Drift state propagation delay) + * Multiple close attempts = multiple cancelAllOrders() calls + - **Code Evidence:** + ```typescript + // lib/trading/position-manager.ts lines ~285-300 + async removeTrade(tradeId: string, reason: string) { + const trade = this.activeTrades.get(tradeId) + if (!trade) return + + try { + // PROBLEM: This cancels ALL orders for the symbol + // Doesn't check if other active positions exist on same symbol + await cancelAllOrders(trade.symbol) + console.log(`🧹 Cancelled all orders for ${trade.symbol}`) + } catch (error) { + console.error(`❌ Error cancelling orders:`, error) + } + + this.activeTrades.delete(tradeId) + } + ``` + - **Orphan Detection Context:** + * File: `lib/startup/init-position-manager.ts` function `detectOrphanedPositions()` + * Runs every 10 minutes via Drift state verifier + * Checks: DB says closed but Drift says open → orphan detected + * Action: Attempts to close orphan position + * Side effect: Calls removeTrade() → cancelAllOrders() → affects ALL positions + - **Fix Required (Not Yet Implemented):** + ```typescript + // Option 1: Check Drift position size before cancelling orders + async removeTrade(tradeId: string, reason: string) { + const trade = this.activeTrades.get(tradeId) + if (!trade) return + + try { + // Verify Drift position is actually closed (size = 0) + const driftPosition = await getDriftPosition(trade.symbol) + if (driftPosition && Math.abs(driftPosition.size) > 0.01) { + console.log(`⚠️ Not cancelling orders - Drift position still open`) + return + } + + await cancelAllOrders(trade.symbol) + console.log(`🧹 Cancelled all orders for ${trade.symbol}`) + } catch (error) { + console.error(`❌ Error cancelling orders:`, error) + } + + this.activeTrades.delete(tradeId) + } + + // Option 2: Store order IDs with trade, cancel only those specific orders + // This requires tracking orderIds in trade object + ``` + - **Detection: Health Monitoring System:** + * File: `lib/health/position-manager-health.ts` + * Check: Open positions missing TP1/TP2 orders → WARNING + * Check: Open positions missing SL orders → CRITICAL ALERT + * Detects orders removed within 30 seconds + * Logs: "🚨 CRITICAL: Position {id} missing SL order" + - **Why This Matters:** + * **This is a REAL MONEY system** - removed orders = lost protection + * Orphan detection is necessary (recovers stuck positions) + * But must not affect active positions on same symbol + * User opens position expecting protection, system removes it + * Silent removal - no notification, no alert + - **Prevention Rules:** + 1. NEVER cancel orders without verifying position actually closed + 2. Check Drift position size = 0 before cancelAllOrders() + 3. Store order IDs per trade, cancel specific orders only + 4. Health monitor detects missing orders within 30 seconds + 5. Add grace period for new positions (skip orphan checks <5 min old) + 6. Log CRITICAL alert when orders removed from active position + - **Red Flags Indicating This Bug:** + * Position initially has TP/SL orders + * Orders disappear shortly after opening + * Orphan detection logs around same time + * Multiple close attempts on old position + * cancelAllOrders() logs for symbol + * New position left with no orders + - **Git commit:** [Health monitoring deployed Dec 8, 2025 - detects missing orders] + - **Status:** ⚠️ Health monitor deployed (detects issue), root cause fix pending + 72. **CRITICAL: MFE Data Unit Mismatch - ALWAYS Filter by Date (CRITICAL - Dec 5, 2025):** - **Symptom:** SQL analysis shows "20%+ average MFE" but TP1 (0.6% target) never hits - **Root Cause:** Old Trade records stored MFE/MAE in DOLLARS, new records store PERCENTAGES @@ -3390,6 +3755,10 @@ if (!enabled) { - Verify targets fall within safety bounds (TP1: 0.5-1.5%, TP2: 1.0-3.0%, SL: 0.8-2.0%) - Update Telegram manual trade presets if median ATR changes (currently 0.43 for SOL) 10. **Position Manager changes:** ALWAYS run tests BEFORE deployment, then validate in production + - **CRITICAL (Dec 8, 2025):** Health monitoring system detects PM failures within 30 seconds + - Health checks: `docker logs -f trading-bot-v4 | grep "🏥"` + - Expected: "🏥 Starting Position Manager health monitor (every 30 sec)..." + - If issues: "🚨 CRITICAL: Position Manager not monitoring!" or "🚨 CRITICAL: Position {id} missing SL order" - **STEP 1 - Run tests locally (MANDATORY):** ```bash npm test # Run all 113 tests (takes ~30 seconds)