# Common Pitfalls Reference Documentation

> **Last Updated:** December 4, 2025
> **Total Documented:** 72 Pitfalls
> **Primary Source:** `.github/copilot-instructions.md`

## Purpose

This document is the **comprehensive reference** for all documented pitfalls, bugs, and lessons learned from the Trading Bot v4 project. Each entry represents a real incident that caused financial loss, system instability, or operational issues.

**How to Use This Document:**

1. **Before making changes:** Search for related pitfalls to avoid repeating mistakes
2. **When debugging:** Look for symptoms matching your issue
3. **After fixing bugs:** Add new entries to preserve institutional knowledge
4. **Code review:** Verify changes don't reintroduce known issues

**Severity Levels:**

- 🔴 **CRITICAL** - Financial loss, data corruption, or system failure
- ⚠️ **HIGH** - System stability or significant operational impact
- 🟡 **MEDIUM** - Performance degradation or UX issues
- 🔵 **LOW** - Code quality or minor improvements

---

## Quick Reference Table

| # | Severity | Category | Date | Summary |
|---|----------|----------|------|---------|
| 1 | 🔴 CRITICAL | SDK/Memory | Nov 15, 2025 | Drift SDK memory leak - heap OOM after 10+ hours |
| 2 | 🔴 CRITICAL | RPC/Infrastructure | Nov 14, 2025 | Wrong RPC provider (Alchemy) breaks Drift SDK |
| 3 | 🟡 MEDIUM | Build/Docker | - | Prisma not generated in Docker |
| 4 | 🟡 MEDIUM | Configuration | - | Wrong DATABASE_URL for container vs host |
| 5 | 🟡 MEDIUM | Data/Symbols | - | Symbol format mismatch (TradingView → Drift) |
| 6 | ⚠️ HIGH | Orders | - | Missing reduce-only flag on exit orders |
| 7 | 🟡 MEDIUM | Architecture | - | Singleton violations (DriftClient, Position Manager) |
| 8 | 🟡 MEDIUM | Types/Prisma | - | Type errors with Prisma after generate |
| 9 | 🟡 MEDIUM | Code Quality | - | Quality score duplication in check-risk and execute |
| 10 | ⚠️ HIGH | Configuration | - | TP2-as-Runner configuration confusion |
| 11 | 🔴 CRITICAL | P&L Calculation | - | P&L calculation using SDK values incorrectly |
| 12 | 🔴 CRITICAL | Transactions | - | Transaction confirmation missing (phantom trades) |
| 13 | ⚠️ HIGH | Execution Order | - | Execution order matters (Position Manager before DB) |
| 14 | ⚠️ HIGH | Timing | - | New trade grace period (30s for Drift propagation) |
| 15 | 🟡 MEDIUM | SDK/Drift | - | Drift minimum position sizes differ from docs |
| 16 | 🔴 CRITICAL | Exit Logic | - | Exit reason detection bug (using current price) |
| 17 | 🟡 MEDIUM | Cooldown | - | Per-symbol cooldown, not global |
| 18 | ⚠️ HIGH | Quality Scoring | - | Timeframe-aware scoring crucial for 5min |
| 19 | 🔴 CRITICAL | Trading Logic | - | Price position chasing causes flip-flops |
| 20 | 🟡 MEDIUM | TradingView | - | TradingView ADX minimum for 5min charts |
| 21 | 🟡 MEDIUM | Types/Prisma | - | Prisma Decimal type handling in raw SQL |
| 22 | 🔴 CRITICAL | Trailing Stop | Nov 11, 2025 | ATR-based trailing stop implementation bug |
| 23 | 🟡 MEDIUM | Database Schema | - | CreateTradeParams interface sync required |
| 24 | 🔴 CRITICAL | SDK/Units | Nov 12, 2025 | Position.size returns tokens not USD |
| 25 | 🟡 MEDIUM | Display | Nov 12, 2025 | Leverage display showing global instead of symbol-specific |
| 26 | 🟡 MEDIUM | Tracking | Nov 12, 2025 | Indicator version tracking (v5→v6→v7→v8) |
| 27 | 🔴 CRITICAL | Race Condition | Nov 15, 2025 | Runner stop loss gap - no protection between TP1 and TP2 |
| 28 | 🔴 CRITICAL | Race Condition | Nov 12, 2025 | External closure duplicate updates bug |
| 29 | 🔴 CRITICAL | Database | Nov 13, 2025 | Database-First Pattern required |
| 30 | ⚠️ HIGH | Network | Nov 13, 2025 | DNS retry logic needed |
| 31 | 🔴 CRITICAL | Deployment | Nov 13, 2025 | Declaring fixes "working" before deployment |
| 32 | 🔴 CRITICAL | Workflow | Nov 14, 2025 | Phantom trade notification workflow breaks |
| 33 | 🔴 CRITICAL | Data Integrity | Nov 15, 2025 | Wrong entry price after orphaned position restoration |
| 34 | 🔴 CRITICAL | Monitoring | Nov 15, 2025 | Runner stop loss gap (duplicate of #27) |
| 35 | 🔴 CRITICAL | Database | Nov 15, 2025 | Phantom trades need exitReason for cleanup |
| 36 | 🔴 CRITICAL | Rate Limits | Nov 15, 2025 | closePosition() missing retry logic causes rate limit storm |
| 37 | 🔴 CRITICAL | Ghost Positions | Nov 15, 2025 | Ghost position accumulation from failed DB updates |
| 38 | 🟡 MEDIUM | Display | Nov 15, 2025 | Analytics dashboard showing original position size |
| 39 | 🔴 CRITICAL | Permissions | Nov 15, 2025 | Settings UI permission error (.env not writable) |
| 40 | 🔴 CRITICAL | Ghost Positions | Nov 15-16, 2025 | Ghost position death spiral from skipped validation |
| 41 | 🔴 CRITICAL | P&L Calculation | Nov 19, 2025 | Stats API recalculating P&L incorrectly for TP1+runner |
| 42 | 🟡 MEDIUM | Notifications | Nov 16, 2025 | Missing Telegram notifications for position closures |
| 43 | 🔴 CRITICAL | Trailing Stop | Nov 20, 2025 | Runner trailing stop never activates after TP1 |
| 44 | ⚠️ HIGH | DNS | Nov 16, 2025 | Telegram bot DNS resolution failures |
| 45 | 🔴 CRITICAL | SDK/Drift | Nov 16, 2025 | Drift SDK position.entryPrice recalculates after partial closes |
| 46 | 🔴 CRITICAL | Leverage | Nov 16, 2025 | Drift account leverage must be set in UI, not API |
| 47 | 🔴 CRITICAL | Verification | Nov 16, 2025 | Position close verification gap - 6 hours unmonitored |
| 48 | 🔴 CRITICAL | P&L Compounding | Nov 16, 2025 | P&L compounding during close verification |
| 49 | 🔴 CRITICAL | P&L Compounding | Nov 17, 2025 | P&L exponential compounding in external closure detection |
| 50 | 🔴 CRITICAL | Database | Nov 19, 2025 | Database not tracking trades despite successful Drift executions |
| 51 | 🔴 CRITICAL | Detection | Nov 19, 2025 | TP1 detection fails when on-chain orders fill fast |
| 52 | 🔴 CRITICAL | Exit Logic | Nov 19, 2025 | ADX-based runner SL only applied in one code path |
| 53 | 🔴 CRITICAL | Container | Nov 19, 2025 | Container restart kills positions + phantom detection bug |
| 54 | 🔴 CRITICAL | Data Integrity | Nov 23, 2025 | MFE/MAE storing dollars instead of percentages |
| 55 | 🔴 CRITICAL | Configuration | Nov 19-20, 2025 | Settings UI quality score variable name mismatch / BlockedSignalTracker using wrong price source |
| 56 | 🔴 CRITICAL | Ghost Orders | Nov 20-21, 2025 | Ghost orders after external closures + false order count bug |
| 57 | 🔴 CRITICAL | P&L Calculation | Nov 20, 2025 | P&L calculation inaccuracy for external closures |
| 58 | ⚠️ HIGH | Database | Nov 21, 2025 | 5-Layer Database Protection System implemented |
| 59 | 🔴 CRITICAL | Duplicates | Nov 22, 2025 | Layer 2 ghost detection causing duplicate Telegram notifications |
| 60 | 🔴 CRITICAL | Race Condition | Nov 23, 2025 | Stale array snapshot in monitoring loop causes duplicate processing |
| 61 | 🔴 CRITICAL | P&L Compounding | Nov 24, 2025 | P&L compounding STILL happening despite all guards |
| 62 | 🔴 CRITICAL | Quality Check | Nov 24-27, 2025 | Adaptive leverage not working / Execute endpoint bypassing quality threshold |
| 63 | ⚠️ HIGH | Feature | Nov 30, 2025 | Smart Entry Validation System - Block & Watch deployed |
| 64 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | EPYC Cluster SSH Timeout - nested hop requires longer timeouts |
| 65 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | Distributed Worker Quality Filter - dict vs callable |
| 66 | 🔴 CRITICAL | Smart Entry | Dec 1, 2025 | Smart Entry Validation Queue wrong price display |
| 67 | 🔴 CRITICAL | Race Condition | Dec 2, 2025 | Ghost detection race condition causing duplicate notifications with P&L compounding |
| 68 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Entry using webhook percentage as signal price |
| 69 | 🟡 MEDIUM | Configuration | Dec 3, 2025 | Direction-specific leverage thresholds not explicit in code |
| 70 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Validation Queue rejected by execute endpoint |
| 71 | 🔴 CRITICAL | Revenge System | Dec 3, 2025 | Revenge system missing external closure integration |
| 72 | 🔴 CRITICAL | Telegram | Dec 4, 2025 | Telegram webhook conflicts with polling bot |

---

## Category Index

### 🔴 P&L Calculation Errors

- [#11](#pitfall-11-pl-calculation-critical) - P&L calculation using SDK values incorrectly
- [#41](#pitfall-41-stats-api-recalculating-pl-incorrectly-critical---fixed-nov-19-2025) - Stats API recalculating P&L incorrectly
- [#48](#pitfall-48-pl-compounding-during-close-verification-critical---fixed-nov-16-2025) - P&L compounding during close verification
- [#49](#pitfall-49-pl-exponential-compounding-in-external-closure-detection-critical---fixed-nov-17-2025) - P&L exponential compounding
- [#54](#pitfall-54-mfemae-storing-dollars-instead-of-percentages-critical---fixed-nov-23-2025) - MFE/MAE storing dollars instead of percentages
- [#57](#pitfall-57-pl-calculation-inaccuracy-for-external-closures-critical---fixed-nov-20-2025) - P&L calculation inaccuracy for external closures
- [#61](#pitfall-61-pl-compounding-still-happening-despite-all-guards-critical---under-investigation-nov-24-2025) - P&L compounding STILL happening

### 🔴 Race Conditions & Duplicates

- [#27](#pitfall-27-runner-stop-loss-gap---no-protection-between-tp1-and-tp2-critical---fixed-nov-15-2025) - Runner stop loss gap - no protection between TP1 and TP2
- [#28](#pitfall-28-external-closure-duplicate-updates-bug-critical---fixed-nov-12-2025) - External closure duplicate updates
- [#59](#pitfall-59-layer-2-ghost-detection-causing-duplicate-telegram-notifications-critical---fixed-nov-22-2025) - Layer 2 ghost detection duplicates
- [#60](#pitfall-60-stale-array-snapshot-in-monitoring-loop-critical---fixed-nov-23-2025) - Stale array snapshot duplicates
- [#67](#pitfall-67-ghost-detection-race-condition-critical---fixed-dec-2-2025) - Ghost detection race condition

### 🔴 SDK/API Integration

- [#1](#pitfall-1-drift-sdk-memory-leak-critical---fixed-nov-15-2025) - Drift SDK memory leak
- [#2](#pitfall-2-wrong-rpc-provider-critical---investigation-complete-nov-14-2025) - Wrong RPC provider (Alchemy)
- [#12](#pitfall-12-transaction-confirmation-critical) - Transaction confirmation missing
- [#24](#pitfall-24-positionsize-tokens-vs-usd-bug-critical---fixed-nov-12-2025) - Position.size tokens vs USD
- [#36](#pitfall-36-closeposition-missing-retry-logic-critical---fixed-nov-15-2025) - closePosition() missing retry logic
- [#45](#pitfall-45-drift-sdk-positionentryprice-recalculates-critical---fixed-nov-16-2025) - position.entryPrice recalculates after partial closes

### 🔴 Database Operations

- [#29](#pitfall-29-database-first-pattern-critical---fixed-nov-13-2025) - Database-First Pattern required
- [#35](#pitfall-35-phantom-trades-need-exitreason-critical---fixed-nov-15-2025) - Phantom trades need exitReason
- [#37](#pitfall-37-ghost-position-accumulation-critical---fixed-nov-15-2025) - Ghost position accumulation
- [#50](#pitfall-50-database-not-tracking-trades-resolved---nov-19-2025) - Database not tracking trades
- [#58](#pitfall-58-5-layer-database-protection-system-implemented---nov-21-2025) - 5-Layer Database Protection System

### 🔴 Configuration & Settings

- [#55](#pitfall-55-configuration-issues-critical---fixed-nov-19-20-2025) - Settings UI quality score variable name mismatch
- [#62](#pitfall-62-adaptive-leverage-and-quality-bypass-critical---fixed-nov-24-27-2025) - Adaptive leverage / Execute endpoint bypassing quality threshold

### 🔴 Deployment & Verification

- [#31](#pitfall-31-declaring-fixes-working-before-deployment-critical---nov-13-2025) - Declaring fixes "working" before deployment
- [#47](#pitfall-47-position-close-verification-gap-critical---fixed-nov-16-2025) - Position close verification gap - 6 hours unmonitored

### 🔴 Smart Entry & Validation

- [#63](#pitfall-63-smart-entry-validation-system-deployed---nov-30-2025) - Smart Entry Validation System
- [#66](#pitfall-66-smart-entry-wrong-price-display-critical---fixed-dec-1-2025) - 
Smart Entry wrong price display
- [#68](#pitfall-68-smart-entry-using-webhook-percentage-critical---fixed-dec-3-2025) - Smart Entry using webhook percentage
- [#70](#pitfall-70-smart-validation-queue-rejected-critical---fixed-dec-3-2025) - Smart Validation Queue rejected by execute

### ⚠️ Ghost Positions & Orders

- [#40](#pitfall-40-ghost-position-death-spiral-critical---fixed-nov-15-16-2025) - Ghost position death spiral
- [#56](#pitfall-56-ghost-orders-after-external-closures-critical---fixed-nov-20-21-2025) - Ghost orders after external closures

### ⚠️ Network & Infrastructure

- [#30](#pitfall-30-dns-retry-logic-high---nov-13-2025) - DNS retry logic
- [#44](#pitfall-44-telegram-bot-dns-resolution-high---fixed-nov-16-2025) - Telegram bot DNS resolution
- [#64](#pitfall-64-epyc-cluster-ssh-timeout-critical---fixed-dec-1-2025) - EPYC Cluster SSH timeout
- [#65](#pitfall-65-distributed-worker-quality-filter-critical---fixed-dec-1-2025) - Distributed Worker dict vs callable

### ⚠️ Trailing Stop & Exit Logic

- [#22](#pitfall-22-atr-based-trailing-stop-implementation-critical---nov-11-2025) - ATR-based trailing stop implementation
- [#43](#pitfall-43-runner-trailing-stop-never-activates-critical---fixed-nov-20-2025) - Runner trailing stop never activates
- [#51](#pitfall-51-tp1-detection-fails-critical---fixed-nov-19-2025) - TP1 detection fails on-chain
- [#52](#pitfall-52-adx-based-runner-sl-critical---fixed-nov-19-2025) - ADX-based runner SL one code path

---

## Detailed Pitfall Entries

### Pitfall #1: Drift SDK Memory Leak (🔴 CRITICAL - Fixed Nov 15, 2025, Enhanced Nov 24, 2025)

**Symptom:** JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s)

**Root Cause:** Drift SDK accumulates WebSocket subscriptions over time without cleanup

**Real Incident:**

- Thousands of `accountUnsubscribe error: readyState was 2 (CLOSING)` in logs
- Heap growth: Normal ~200MB → 4GB+ after 10 hours → OOM crash

**Impact:** System crashes after extended uptime, 
requires manual container restart

**Fix Applied:**

- **File:** `lib/monitoring/drift-health-monitor.ts`
- **Implementation:** Smart error-based health monitoring replaces blind timer
  - `interceptWebSocketErrors()` patches console.error to catch SDK WebSocket errors
  - 30-second sliding window: Only restarts if 50+ errors in 30 seconds
  - Container restart via flag: Writes `/tmp/trading-bot-restart.flag` for watch-restart.sh
- **API:** `GET /api/drift/health` - Check error count and health status
- **Commit:** Enhanced Nov 24, 2025

**Code Reference:**

```typescript
// lib/monitoring/drift-health-monitor.ts
interceptWebSocketErrors() // Patches console.error
if (errorsInWindow > 50) {
  writeRestartFlag() // Triggers container restart
}
```

**Prevention:** Monitor for `🏥 Drift health monitor started` and error threshold logs

**Lesson Learned:** Smart, reactive monitoring is better than blind timers. Only restart when actual problems occur, not on a schedule.

---

### Pitfall #2: Wrong RPC Provider (🔴 CRITICAL - Investigation Complete Nov 14, 2025)

**Symptom:** Trades fail, duplicate closes, Position Manager loses tracking, database save failures

**Root Cause:** Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization

**Real Incident (Nov 14, 21:14 CET):**

- Created diagnostic endpoint `/api/testing/drift-init`
- Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time
- Helius: 0 subscription errors EVERY init, 800ms avg init time

**Impact:** Complete system failure when using wrong RPC provider

**Why Alchemy Fails:**

- Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern)
- Alchemy's CUPS enforcement rate limits these burst requests
- Drift SDK does NOT retry failed subscriptions
- SDK reports "initialized successfully" but with incomplete subscription set
- Error: `"Received JSON-RPC error calling accountSubscribe"`

**Fix Applied:**

- **Use Helius RPC** 
(https://mainnet.helius-rpc.com/?api-key=...)
- Retry logic: 5s exponential backoff for rate limits
- **Documentation:** `docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md`

**Code Reference:**

```bash
# Test yourself
curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'
```

**Prevention:** ALWAYS use Helius RPC. Do not use Alchemy for Drift SDK.

**Lesson Learned:** Documentation doesn't always reflect reality. Test with real infrastructure before trusting provider claims.

---

### Pitfall #3: Prisma Not Generated in Docker (🟡 MEDIUM)

**Symptom:** Build fails with Prisma client errors

**Root Cause:** Must run `npx prisma generate` in Dockerfile BEFORE `npm run build`

**Fix Applied:** Add `RUN npx prisma generate` before build step in Dockerfile

---

### Pitfall #4: Wrong DATABASE_URL (🟡 MEDIUM)

**Symptom:** Database connection failures

**Root Cause:** Container runtime needs `trading-bot-postgres` (container name), Prisma CLI from host needs `localhost:5432`

**Fix Applied:** Use correct hostname based on context:

- Container: `postgresql://postgres:password@trading-bot-postgres:5432/trading_bot_v4`
- Host CLI: `postgresql://postgres:password@localhost:5432/trading_bot_v4`

---

### Pitfall #5: Symbol Format Mismatch (🟡 MEDIUM)

**Symptom:** Drift API rejects orders, symbol not found errors

**Root Cause:** TradingView sends "SOLUSDT" but Drift requires "SOL-PERP"

**Fix Applied:** Always normalize with `normalizeTradingViewSymbol()` before calling Drift

- **File:** `config/trading.ts`
- Applies to ALL endpoints including `/api/trading/close`

---

### Pitfall #6: Missing Reduce-Only Flag (⚠️ HIGH)

**Symptom:** Exit orders accidentally open new positions instead of closing

**Root Cause:** Exit orders without `reduceOnly: true` can open new positions

**Fix Applied:** All TP/SL orders MUST include `reduceOnly: true`

```typescript
const orderParams = {
  reduceOnly: true, // CRITICAL for TP/SL orders
  // ... other params
}
```

---

### Pitfall #7: Singleton Violations (🟡 MEDIUM)

**Symptom:** Connection issues, state inconsistencies, multiple WebSocket connections

**Root Cause:** Creating multiple DriftClient or Position Manager instances

**Fix Applied:** Always use getter functions:

```typescript
const driftService = await initializeDriftService() // NOT: new DriftService()
const positionManager = getPositionManager()        // NOT: new PositionManager()
const prisma = getPrismaClient()                    // NOT: new PrismaClient()
```

---

### Pitfall #8: Prisma Type Errors (🟡 MEDIUM)

**Symptom:** TypeScript compilation fails with Prisma types

**Root Cause:** Trade type from Prisma only available AFTER `npx prisma generate`

**Fix Applied:** Run `npx prisma generate` after any schema changes

---

### Pitfall #9: Quality Score Duplication (🟡 MEDIUM)

**Symptom:** Inconsistent quality scoring between endpoints

**Root Cause:** Signal quality calculation exists in BOTH `check-risk` and `execute` endpoints

**Fix Applied:** Keep logic synchronized between both endpoints when making changes

---

### Pitfall #10: TP2-as-Runner Configuration (⚠️ HIGH)

**Symptom:** Confusion about runner size and TP2 behavior

**Root Cause:** `takeProfit2SizePercent: 0` means "TP2 activates trailing stop, no position close"

**Fix Applied:**

- `TAKE_PROFIT_2_PERCENT=0.7` sets TP2 trigger price
- `TAKE_PROFIT_2_SIZE_PERCENT` should be 0 for runner system
- Runner = 100% - TAKE_PROFIT_1_SIZE_PERCENT (default 40%)

---

### Pitfall #11: P&L Calculation Critical (🔴 CRITICAL)

**Symptom:** Incorrect P&L values in database and analytics

**Root Cause:** Using SDK values instead of actual entry vs exit price calculation

**Fix Applied:**

```typescript
const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction)
const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100
trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
```

---

### Pitfall #12: Transaction Confirmation Critical (🔴 CRITICAL)

**Symptom:** "Phantom trades" - SDK returns signatures for transactions that never execute

**Root Cause:** Both `openPosition()` AND `closePosition()` must call `connection.confirmTransaction()`

**Fix Applied:**

```typescript
const txSig = await driftClient.placePerpOrder(orderParams)
console.log('⏳ Confirming transaction on-chain...')
const connection = driftService.getConnection()
const confirmation = await connection.confirmTransaction(txSig, 'confirmed')
if (confirmation.value.err) {
  throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`)
}
console.log('✅ Transaction confirmed on-chain')
```

---

### Pitfall #13: Execution Order Matters (⚠️ HIGH)

**Symptom:** Race conditions where monitoring starts before trade exists in database

**Root Cause:** Position Manager added before database save

**Fix Applied:** Order MUST be:

1. Open position + place exit orders
2. Save to database (`createTrade()`)
3. Add to Position Manager (`positionManager.addTrade()`)

---

### Pitfall #14: New Trade Grace Period (⚠️ HIGH)

**Symptom:** New positions immediately detected as "closed externally" and cancelled

**Root Cause:** Drift positions take 5-10 seconds to propagate after opening

**Fix Applied:** Position Manager skips "external closure" detection for trades <30 seconds old

---

### Pitfall #15: Drift Minimum Position Sizes (🟡 MEDIUM)

**Symptom:** Orders rejected for being too small

**Root Cause:** Actual minimums differ from documentation:

- SOL-PERP: 0.1 SOL (~$5-15)
- ETH-PERP: 0.01 ETH (~$38-40)
- BTC-PERP: 0.0001 BTC (~$10-12)

**Fix Applied:** Verify that `minOrderSize × currentPrice` exceeds Drift's $4 minimum, and add a buffer on top.
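
The minimum-size check described in Pitfall #15 can be sketched roughly as follows. This is a minimal illustration, not the project's code: `minimumOrderUSD`, `meetsMinimumSize`, and the 10% `SAFETY_BUFFER` are hypothetical names and values. The per-symbol token minimums and the $4 floor are the figures quoted in the entry.

```typescript
// Hypothetical helper: names and the 10% buffer are illustrative assumptions.
type PerpSymbol = 'SOL-PERP' | 'ETH-PERP' | 'BTC-PERP'

// Minimum order sizes in base-asset tokens, as listed in Pitfall #15.
const MIN_ORDER_SIZE_TOKENS: Record<PerpSymbol, number> = {
  'SOL-PERP': 0.1,
  'ETH-PERP': 0.01,
  'BTC-PERP': 0.0001,
}

const DRIFT_MIN_NOTIONAL_USD = 4 // the $4 minimum cited in Pitfall #15
const SAFETY_BUFFER = 1.1        // assumed 10% headroom, not a project value

// Smallest USD order size considered safe for a symbol at the current price.
function minimumOrderUSD(symbol: PerpSymbol, currentPrice: number): number {
  const tokenMinimumUSD = MIN_ORDER_SIZE_TOKENS[symbol] * currentPrice
  return Math.max(tokenMinimumUSD, DRIFT_MIN_NOTIONAL_USD) * SAFETY_BUFFER
}

// True when a requested USD size clears both the token minimum and the USD floor.
function meetsMinimumSize(symbol: PerpSymbol, sizeUSD: number, currentPrice: number): boolean {
  return sizeUSD >= minimumOrderUSD(symbol, currentPrice)
}
```

Under these assumptions, a $10 SOL-PERP order at a price of $150 would be rejected (0.1 SOL is worth $15 before the buffer), while a $20 order would pass.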

---

### Pitfall #16: Exit Reason Detection Bug (🔴 CRITICAL)

**Symptom:** Profitable trades mislabeled as "SL" exits

**Root Cause:** Position Manager using current price to determine exit reason, but on-chain orders filled at different price

**Fix Applied:** Use `trade.tp1Hit` / `trade.tp2Hit` flags and realized P&L to correctly identify exit trigger

---

### Pitfall #17: Per-Symbol Cooldown (🟡 MEDIUM)

**Symptom:** ETH trade incorrectly blocking SOL trade

**Root Cause:** Cooldown was global, not per-symbol

**Fix Applied:** Each coin (SOL/ETH/BTC) has independent cooldown timer via `getLastTradeTimeForSymbol(symbol)`

---

### Pitfall #18: Timeframe-Aware Scoring Crucial (⚠️ HIGH)

**Symptom:** Valid 5min breakouts blocked as "low quality"

**Root Cause:** Signal quality thresholds not adjusted for 5min vs higher timeframes

- 5min: ADX 12-22 healthy, ATR 0.2-0.7%
- Daily: ADX 18-30 healthy, ATR 0.4%+

**Fix Applied:** Always pass `timeframe` parameter from TradingView alerts to `scoreSignalQuality()`

---

### Pitfall #19: Price Position Chasing (🔴 CRITICAL)

**Symptom:** Rapid flip-flop losses

**Root Cause:** Opening longs at 90%+ range or shorts at <10% range

**Real Incident:** Overnight flip-flop losses all had price position 9-94%

**Fix Applied:** Quality scoring now penalizes -15 to -30 points for range extremes

---

### Pitfall #20: TradingView ADX Minimum (🟡 MEDIUM)

**Symptom:** Too many signals blocked or too many low-quality signals passing

**Root Cause:** TradingView ADX filter should be 15 for 5min (not 20+)

**Fix Applied:** Set ADX ≥15 in TradingView alerts for 5min charts. Bot's quality scoring provides second-layer filtering.
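
Pitfalls #18 and #20 both come down to thresholds that depend on the chart timeframe. As a minimal sketch of that idea, the ranges quoted in Pitfall #18 can be kept in a lookup table keyed by timeframe. The type names, the `'5'` and `'D'` keys, and `isHealthy()` are assumptions made for illustration; this is not the project's `scoreSignalQuality()`.

```typescript
// Illustrative only: names and timeframe keys are assumptions.
// The numeric ranges are copied from Pitfall #18.
type Timeframe = '5' | 'D'

interface HealthyRange {
  adxMin: number
  adxMax: number
  atrPercentMin: number
  atrPercentMax: number // Infinity where only a lower bound is given
}

const HEALTHY_RANGES: Record<Timeframe, HealthyRange> = {
  '5': { adxMin: 12, adxMax: 22, atrPercentMin: 0.2, atrPercentMax: 0.7 },
  'D': { adxMin: 18, adxMax: 30, atrPercentMin: 0.4, atrPercentMax: Infinity },
}

// True when both ADX and ATR% fall inside the healthy range for the timeframe.
function isHealthy(timeframe: Timeframe, adx: number, atrPercent: number): boolean {
  const range = HEALTHY_RANGES[timeframe]
  return (
    adx >= range.adxMin &&
    adx <= range.adxMax &&
    atrPercent >= range.atrPercentMin &&
    atrPercent <= range.atrPercentMax
  )
}
```

With this table, an ADX of 15 counts as healthy on a 5min chart but falls below the daily range, which is why a scorer that ignores the timeframe can reject valid 5min signals.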

---

### Pitfall #21: Prisma Decimal Type Handling (🟡 MEDIUM)

**Symptom:** Frontend errors with `.toFixed()` on undefined

**Root Cause:** Raw SQL queries return Prisma `Decimal` objects, not plain numbers

**Fix Applied:**

```typescript
// $queryRaw resolves to an array of rows; use `any` for numeric fields in the row type
const [stat] = await prisma.$queryRaw<{ total_pnl: any }[]>`...`
// Convert with Number() before returning to frontend
totalPnL: Number(stat.total_pnl) || 0
```

---

### Pitfall #22: ATR-Based Trailing Stop Implementation (🔴 CRITICAL - Nov 11, 2025)

**Symptom:** Trades with +7-9% MFE exited for losses

**Root Cause:** Runner system was using FIXED 0.3% trailing instead of ATR-based

**Real Incident:** At $168 SOL, 0.3% = $0.50 wiggle room - too tight

**Fix Applied:**

```typescript
// ATR as a percentage of price, scaled by the configured multiplier
trailingDistancePercent = (atrAtEntry / currentPrice) * 100 * trailingStopAtrMultiplier
```

**Configuration:**

- `TRAILING_STOP_ATR_MULTIPLIER=1.5`
- `MIN=0.25%`, `MAX=0.9%`
- `ACTIVATION=0.5%`

**Result:** 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50 = 2.26x more room)

**Documentation:** `ATR_TRAILING_STOP_FIX.md`

---

### Pitfall #23: CreateTradeParams Interface Sync (🟡 MEDIUM)

**Symptom:** TypeScript build fails when endpoint passes field not in interface

**Root Cause:** New database fields added to Trade model but not to `CreateTradeParams` interface

**Fix Applied:** When adding new fields:

1. Add to interface in `lib/database/trades.ts`
2. Add to Prisma create data object in `createTrade()` function

---

### Pitfall #24: Position.size Tokens vs USD Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Position Manager detects false TP1 hits, moves SL to breakeven prematurely

**Root Cause:** `lib/drift/client.ts` returns `position.size` as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950)

**Real Incident:** Comparing tokens (12.28) directly to USD ($1,950) → "99.4% reduction" → FALSE TP1!
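
To see where the 99.4% figure comes from, the arithmetic of the incident can be replayed in a few lines. This is a sketch using only the two numbers quoted above; `apparentReductionPercent` and the other names are illustrative, and the per-token price is merely the value implied by those two figures.

```typescript
// Illustrative names; the inputs are the figures quoted in the incident.
function apparentReductionPercent(trackedSizeUSD: number, reportedSize: number): number {
  return (1 - reportedSize / trackedSizeUSD) * 100
}

const trackedSizeUSD = 1950 // size tracked in USD
const sdkSizeTokens = 12.28 // position.size from the SDK, in SOL tokens

// Unit mismatch: a token count is compared against a USD amount.
const buggyReduction = apparentReductionPercent(trackedSizeUSD, sdkSizeTokens)

// Price per token implied by the two figures above (an inference, not a quoted price).
const impliedPrice = trackedSizeUSD / sdkSizeTokens

// Same comparison after converting tokens to USD first.
const correctReduction = apparentReductionPercent(trackedSizeUSD, sdkSizeTokens * impliedPrice)

console.log(buggyReduction.toFixed(1))   // "99.4"
console.log(correctReduction.toFixed(1)) // "0.0"
```

The position never shrank; the apparent reduction is entirely an artifact of comparing two different units.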

**Fix Applied:**

```typescript
// In Position Manager (lines 322, 519, 558, 591)
const positionSizeUSD = Math.abs(position.size) * currentPrice

// Now compare USD to USD
if (positionSizeUSD < trade.currentSize * 0.95) {
  // Actual 5%+ reduction detected
}
```

**Impact:** Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails

---

### Pitfall #25: Leverage Display Bug (🟡 MEDIUM - Fixed Nov 12, 2025)

**Symptom:** Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x

**Root Cause:** API response returning `config.leverage` (global default) instead of symbol-specific value

**Fix Applied:**

```typescript
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)
// Return symbol-specific leverage
leverage: leverage, // NOT: config.leverage
```

---

### Pitfall #26: Indicator Version Tracking (🟡 MEDIUM - Nov 12, 2025+)

**Symptom:** Unable to compare performance between TradingView strategies

**Root Cause:** No tracking of which indicator generated the signal

**Fix Applied:** Database field `indicatorVersion` tracks:

- v5: Buy/Sell Signal (pre-Nov 12)
- v6: HalfTrend + BarColor (Nov 12-18)
- v7: v6 with toggles (deprecated)
- v8: Money Line Sticky Trend (Nov 18+)
- v9: Money Line with Momentum Filter (Nov 26+)

---

### Pitfall #27: Runner Stop Loss Gap - No Protection Between TP1 and TP2 (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Runner position remained open despite price moving far past stop loss level

**Root Cause:** Position Manager only checked stop loss BEFORE TP1 (line 877), creating a protection gap

**Real Incident:**

1. SHORT opened, TP1 hit at 70% close (runner = 30% remaining)
2. Runner had stop loss at profit-lock level (+0.5%)
3. Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped)
4. Runner exposed to unlimited loss for hours during TP1→TP2 window

**Fix Applied:**

```typescript
// Added explicit runner stop loss check at line ~881:
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
  return
}
```

**Lesson Learned:** Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"

---

### Pitfall #28: External Closure Duplicate Updates Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss)

**Root Cause:** Position Manager monitoring loop re-processes external closures multiple times before trade removed from activeTrades Map

**Real Incident:**

1. Trade closed externally at -$7.98
2. Position Manager detects closure, calculates P&L → -$7.50 in DB
3. Trade still in Map (removal async), loop runs again
4. Accumulates P&L: -$7.50 + -$7.50 = -$15.00
5. Repeats 8 times → final -$58.43

**Fix Applied:**

```typescript
// BEFORE (BROKEN):
await updateTradeExit({ ... })
await this.removeTrade(trade.id) // Too late!

// AFTER (FIXED):
this.activeTrades.delete(trade.id) // Remove FIRST
await updateTradeExit({ ... })     // Then update DB
```

**Commit:** Fixed Nov 12, 2025

---

### Pitfall #29: Database-First Pattern (🔴 CRITICAL - Fixed Nov 13, 2025)

**Symptom:** Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection

**Root Cause:** Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch

**Real Incident:** Unprotected position opened, database save failed silently, Position Manager never tracked it

**Fix Applied:**

```typescript
// CRITICAL: Save to database FIRST before adding to Position Manager
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: `CLOSE POSITION MANUALLY IMMEDIATELY. Transaction: ${openResult.transactionSignature}`,
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)
```

**Documentation:** `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md`

---

### Pitfall #30: DNS Retry Logic (⚠️ HIGH - Nov 13, 2025)

**Symptom:** Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails

**Root Cause:** `EAI_AGAIN` errors are transient DNS issues that resolve in seconds

**Fix Applied:** Automatic retry in `lib/drift/client.ts`:

```typescript
// Detects: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
// Retries up to 3 times with 2s delay
await this.retryOperation(async () => {
  // Initialize Drift SDK, subscribe, get user account
}, 3, 2000, 'Drift initialization')
```

**Documentation:** `docs/DNS_RETRY_LOGIC.md`

---

### Pitfall #31: Declaring Fixes "Working" Before Deployment (🔴 CRITICAL - Nov 13, 2025)

**Symptom:** AI says "position is protected" when container still running old code

**Root Cause:** Conflating "code committed to git" with "code running in production"

**Real Incident:** Fix committed 15:56, declared "working" at 19:42, 
but container started 15:06 (old code)

**Verification Required:**

```bash
# ALWAYS check before declaring fix deployed:
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Compare container start time to git commit timestamp
# If container older: FIX NOT DEPLOYED
```

**Rule:** NEVER say "fixed", "working", "protected", or "deployed" without verifying container restart timestamp

---

### Pitfall #32: Phantom Trade Notification Workflow Breaks (🔴 CRITICAL - Nov 14, 2025)

**Symptom:** Phantom trade detected, position opened, but n8n workflow stops. User NOT notified.

**Root Cause:** Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt

**Fix Applied:** Auto-close phantom trades immediately + return HTTP 200 with warning:

```typescript
return NextResponse.json({
  success: true,
  warning: 'Phantom trade detected and auto-closed',
  isPhantom: true,
  message: '[Full notification text]',
  phantomDetails: {...}
})
```

**Database tracking:** `status='phantom'`, `exitReason='manual'`

---

### Pitfall #33: Wrong Entry Price After Orphaned Position Restoration (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking wrong entry price after container restart

**Root Cause:** Startup validation restored orphaned position using OLD database entry price instead of querying Drift

**Real Incident:** DB showed $141.51, Drift showed $141.31 actual entry → 0.14% SL placement error

**Fix Applied:** Query Drift SDK for actual entry price during orphaned position restoration:

```typescript
await prisma.trade.update({
  data: {
    entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price
    positionSizeUSD: positionSizeUSD,
  }
})
```

---

### Pitfall #35: Phantom Trades Need exitReason (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager keeps restoring phantom trade on every restart

**Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL`

**Real Incident:** Phantom trade caused 232% size 
mismatch, hundreds of false alerts

**Fix Applied:** MUST set exitReason when auto-closing phantoms:
```typescript
await updateTradeExit({
  tradeId: trade.id,
  exitPrice: currentPrice,
  exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
  status: 'phantom'
})
```

---

### Pitfall #36: closePosition() Missing Retry Logic (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tries to close, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts

**Root Cause:** `placeExitOrders()` had retry wrapper but `closePosition()` did NOT

**Real Incident:** 100+ "❌ Failed to close position: 429" + compounding P&L

**Fix Applied:** Wrapped closePosition() with retryWithBackoff():
```typescript
const txSig = await retryWithBackoff(async () => {
  return await driftClient.placePerpOrder(orderParams)
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
```

---

### Pitfall #37: Ghost Position Accumulation (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking 4+ positions when database shows only 1 open trade

**Root Cause:** Database has `exitReason IS NULL` for positions actually closed on Drift

**Real Incident:** 4+ ghosts → massive rate limiting, "vanishing orders"

**Fix Applied:** Periodic Drift position validation:
```typescript
private scheduleValidation(): void {
  this.validationInterval = setInterval(async () => {
    await this.validatePositions()
  }, 5 * 60 * 1000)
}
```

---

### Pitfall #38: Analytics Dashboard Wrong Size (🟡 MEDIUM - Fixed Nov 15, 2025)

**Symptom:** Analytics page displays $42.54 when actual runner is $12.59 after TP1

**Root Cause:** API returns `trade.positionSizeUSD` (original) not runner size

**Fix Applied:** Check Position Manager state for open positions:
```typescript
const currentSize = configSnapshot?.positionManagerState?.currentSize
const displaySize = trade.exitReason === null && currentSize ?
  currentSize : trade.positionSizeUSD
```

---

### Pitfall #40: Ghost Position Death Spiral (🔴 CRITICAL - Fixed Nov 15-16, 2025)

**Symptom:** Container crashes from cascading ghost detection failures

**Root Cause:** Position validation skipped during death spiral recovery, creating more ghosts

**Fix Applied:** Never skip validation during recovery operations

---

### Pitfall #41: Stats API Recalculating P&L Incorrectly (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** Analytics showing wrong P&L for trades with TP1+runner

**Root Cause:** Stats API recalculating P&L from partial position data

**Fix Applied:** Use stored `realizedPnL` directly, don't recalculate

---

### Pitfall #43: Runner Trailing Stop Never Activates (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Runner position sits without trailing stop after TP1

**Root Cause:** Trailing stop activation logic only ran in one code path

**Fix Applied:** Ensure trailing stop activates in all TP1 detection paths

---

### Pitfall #44: Telegram Bot DNS Resolution (⚠️ HIGH - Fixed Nov 16, 2025)

**Symptom:** Telegram notifications fail intermittently

**Root Cause:** DNS resolution failures for api.telegram.org

**Fix Applied:** Retry logic for Telegram API calls

---

### Pitfall #45: Drift SDK position.entryPrice Recalculates (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Entry price changes after partial closes

**Root Cause:** Drift SDK calculates `position.entryPrice` from `quoteAssetAmount / baseAssetAmount`

**Impact:** After TP1 closes 75%, remaining 25% has "new" entry price

**Fix Applied:** Store and use original entry price from trade record, not SDK

---

### Pitfall #46: 100% Position Sizing InsufficientCollateral (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Bot gets InsufficientCollateral errors when Drift UI can open same size

**Root Cause:** Drift's margin calculation includes fees, slippage buffers

**Real Incident:** $85.55 collateral, bot tries 100% → rejected, shortage: $0.03

**Fix Applied:**
```typescript
if
(configuredSize >= 100) {
  percentDecimal = 0.99
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}
```

**Commit:** 7129cbf

---

### Pitfall #47: Position Close Verification Gap (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Close transaction confirmed, database marked "closed", but position stayed open 6+ hours

**Root Cause:** Transaction confirmation ≠ Drift internal state updated immediately (5-10s delay)

**Real Incident:** Trailing stop triggered 02:51, position stayed open until 08:51 restart

**Fix Applied:** 2-layer verification:
```typescript
if (params.percentToClose === 100) {
  await cancelAllOrders(params.symbol)

  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    return { ...result, needsVerification: true }
  }
}
```

**Commit:** c607a66

---

### Pitfall #48: P&L Compounding During Close Verification (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** P&L accumulates during the 5-10s verification wait

**Root Cause:** Monitoring loop continues during verification, detecting "external closure" multiple times

**Fix Applied:** `closingInProgress` flag:
```typescript
if ((result as any).needsVerification) {
  trade.closingInProgress = true
  trade.closeConfirmedAt = Date.now()
  console.log(`🔒 Marked as closing in progress - external closure detection disabled`)
  return
}

// Skip external closure check if closingInProgress
if ((position === null || position.size === 0) && !trade.closingInProgress) {
  // ...
  // handle external closure
}
```

**Related:** Pitfalls #27, #49

---

### Pitfall #49: P&L Exponential Compounding in External Closure Detection (🔴 CRITICAL - Fixed Nov 17, 2025)

**Symptom:** Database P&L shows 15-20× actual value ($92.46 when Drift shows $6.00)

**Root Cause:** `trade.realizedPnL` was being mutated during each external closure detection cycle

**Real Incident (Nov 17, 13:54 CET):**
- SOL-PERP SHORT closed by on-chain orders
- Actual P&L: ~$6.00, Database recorded: $92.46 (15.4× too high)
- Rate limiting caused 15+ detection cycles → $6 → $12 → $24 → $48 → $96

**Fix Applied:**
```typescript
// DON'T mutate trade.realizedPnL - causes compounding!
// trade.realizedPnL = totalRealizedPnL ← REMOVED

// Use local variable for DB update
await updateTradeExit({
  realizedPnL: totalRealizedPnL, // Use local variable
})
```

**Commit:** 6156c0f

**Lesson Learned:** In monitoring loops, NEVER mutate shared state during calculation phases. Calculate locally, update shared state ONCE at the end.
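
The lesson from Pitfall #49 can be illustrated with a minimal, self-contained sketch. It is not the project's code: the `TrackedTrade` shape and both helper functions are hypothetical, and the accumulation here is a simple additive one chosen for clarity (the real handler's formula is not shown in this document). The mutating version writes its total back onto the shared object, so every repeated detection cycle starts from an inflated base; the local-variable version returns the same total however many times it runs. After four cycles the sketch reports 18 for the mutating pattern and 6 for the local one.

```typescript
// Hypothetical minimal shape for illustration only, not the project's ActiveTrade type.
interface TrackedTrade {
  id: string
  realizedPnL: number // P&L already banked by earlier partial closes (e.g. TP1)
}

// BROKEN pattern: the total is written back onto the shared trade object,
// so the next detection cycle adds the runner P&L on top of an inflated base.
function detectClosureMutating(trade: TrackedTrade, runnerPnL: number): number {
  trade.realizedPnL = trade.realizedPnL + runnerPnL
  return trade.realizedPnL
}

// FIXED pattern: the total lives in a local variable and the shared object
// is left untouched, so repeated cycles always produce the same number.
function detectClosureLocal(trade: Readonly<TrackedTrade>, runnerPnL: number): number {
  const totalRealizedPnL = trade.realizedPnL + runnerPnL
  return totalRealizedPnL
}

const mutatedTrade: TrackedTrade = { id: 'mutating', realizedPnL: 2 }
const safeTrade: TrackedTrade = { id: 'local', realizedPnL: 2 }

let mutatedResult = 0
let safeResult = 0

// Simulate four detection cycles firing for the same closure (e.g. during rate limiting).
for (let cycle = 0; cycle < 4; cycle++) {
  mutatedResult = detectClosureMutating(mutatedTrade, 4)
  safeResult = detectClosureLocal(safeTrade, 4)
}

console.log(`mutating pattern: ${mutatedResult}, local pattern: ${safeResult}`)
```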

---

### Pitfall #50: Database Not Tracking Trades (🔴 CRITICAL - RESOLVED Nov 19, 2025)

**Symptom:** Drift UI shows 6 trades, database shows only 3 trades

**Root Cause:** P&L compounding bug (#49) - in-memory object with stale/accumulated values

**Fix Applied:** Calculate P&L from immutable source values (entry/exit prices), never from in-memory fields

---

### Pitfall #51: TP1 Detection Fails When On-Chain Orders Fill Fast (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 order fills, but database records exitReason as "SL" instead of "TP1"

**Root Cause:** Position Manager detects closure AFTER both TP1 and runner already closed on-chain

**Real Incident:** LONG opened, TP1+runner closed within 7 minutes, `trade.tp1Hit = false`

**Fix Applied:** Simple percentage-based exit reason:
```typescript
if (runnerProfitPercent > 0.3) {
  if (runnerProfitPercent >= 1.2) {
    exitReason = 'TP2' // Large profit (>=1.2%)
  } else {
    exitReason = 'TP1' // Moderate profit (0.3-1.2%)
  }
} else {
  exitReason = 'SL' // Negative or tiny profit (<=0.3%)
}
```

**Commit:** de57c96

---

### Pitfall #52: ADX-Based Runner SL Only Applied in One Code Path (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 fills via on-chain order, runner gets breakeven SL instead of ADX-based positioning

**Root Cause:** Two TP1 detection paths, only one had ADX logic

**Fix Applied:** Added ADX-based runner SL to on-chain fill detection path (lines 607-642)

**Commits:** b2cb6a3, 66b2922

---

### Pitfall #53: Container Restart Kills Positions + Phantom Detection Bug (🔴 CRITICAL - Fixed Nov 19, 2025)

**Two bugs from container restart:**

**Bug 1: Startup order restore failure**
- Wrong database field names (`takeProfit1OrderTx` vs correct `tp1OrderTx`)
- Fix: Use correct field names

**Bug 2: Phantom detection killing runners**
- Runners (40% remaining) flagged as phantom
- Fix: Check `!trade.tp1Hit` before phantom detection:
```typescript
const wasPhantom = !trade.tp1Hit && trade.currentSize > 0 && (trade.currentSize /
  trade.positionSize) < 0.5
```

**Commit:** eccecf7

---

### Pitfall #54: MFE/MAE Storing Dollars Instead of Percentages (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Database showing maxFavorableExcursion = 64.08% when TradingView showed 0.48%

**Root Cause:** Position Manager storing DOLLAR amounts instead of PERCENTAGES

**Real Incident:** 133× inflation (64.08% stored vs 0.48% actual)

**Fix Applied:**
```typescript
// BEFORE (BROKEN):
if (currentPnLDollars > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = currentPnLDollars // Storing $64.08
}

// AFTER (FIXED):
if (profitPercent > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = profitPercent // Storing 0.48%
}
```

**Commit:** 6255662

**Lesson Learned:** Always verify data storage units match schema expectations. Comments don't override schema.

---

### Pitfall #55: Configuration Issues (🔴 CRITICAL - Fixed Nov 19-20, 2025)

**Two configuration bugs:**

**Bug 1: Settings UI quality score variable name mismatch**
- Settings API used `MIN_QUALITY_SCORE` (wrong)
- Code actually reads `MIN_SIGNAL_QUALITY_SCORE` (correct)
- User changes in UI had ZERO effect

**Bug 2: BlockedSignalTracker using Pyth cache instead of Drift oracle**
- `priceAfter1Min/5Min/15Min/30Min` fields staying NULL
- Fix: Use `driftService.getOraclePrice()` instead of `getPythPriceMonitor().getCachedPrice()`

**Commit:** 6b00303

---

### Pitfall #56: Ghost Orders After External Closures (🔴 CRITICAL - Fixed Nov 20-21, 2025)

**Symptom:** Position closed, but TP/SL orders remain active on Drift

**Root Cause:** External closure handler didn't call `cancelAllOrders()` before completing

**Real Incident:** Risk of ghost order filling → unintended positions

**Fix Applied:**
```typescript
// In external closure handler:
console.log(`🗑️ Cancelling remaining orders for ${trade.symbol}...`)
const cancelResult = await cancelAllOrders(trade.symbol)
```

**Additional Bug:** False positive "32 open orders" on restart
- Fix: Check `baseAssetAmount.eq(new
BN(0))` to filter truly active orders

**Commits:** a3a6222 (Nov 20), 29fce01 (Nov 21)

---

### Pitfall #57: P&L Calculation Inaccuracy for External Closures (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Database P&L shows -$101.68 when Drift UI shows -$138.35 (36% error)

**Root Cause:** External closure handler calculates P&L from monitoring loop's `currentPrice`, which lags behind actual fill price

**Fix Applied:** Query Drift's actual settledPnL:
```typescript
const position = userAccount.perpPositions.find((p: any) =>
  p.marketIndex === marketConfig.driftMarketIndex
)
// Optional chaining guards against find() returning undefined
const settledPnL = Number(position?.settledPnl || 0) / 1e6 // Convert to USD
if (Math.abs(settledPnL) > 0.01) {
  totalRealizedPnL = settledPnL
  console.log(`✅ Using Drift's actual P&L: $${totalRealizedPnL.toFixed(2)}`)
}
```

**Commit:** 8e600c8

---

### Pitfall #58: 5-Layer Database Protection System (⚠️ HIGH - Implemented Nov 21, 2025)

**Purpose:** Bulletproof protection against untracked positions from database failures

**5 Layers:**
1. **Persistent File Logger** (`lib/utils/persistent-logger.ts`) - Survives container restarts
2. **Database Save with Retry + Verification** - 3 retries with exponential backoff
3. **Orphan Position Detection** - Runs on EVERY container startup
4. **Critical Logging in Execute Endpoint** - Full trade details for recovery
5.
   **Infrastructure (Docker volumes)** - `./logs:/app/logs`

**Real-world validation:** Nov 21, 2025 - No database failure occurred, but protection now in place

---

### Pitfall #59: Layer 2 Ghost Detection Causing Duplicate Telegram Notifications (🔴 CRITICAL - Fixed Nov 22, 2025)

**Symptom:** Trade #8 sent 13 duplicate notifications with compounding P&L ($11.50 → $155.05)

**Root Cause:** Layer 2 ghost detection (failureCount > 20) didn't check `closingInProgress` flag

**Real Incident (Nov 22, 04:05 CET):**
- Actual P&L: +$18.79, Database final: $155.05 (8.2× actual)
- Rate limit storm: 6,581 failed close attempts

**Fix Applied:**
```typescript
// AFTER (FIXED):
if (trade.priceCheckCount > 20 && !trade.closingInProgress) {
  if (!position || Math.abs(position.size) < 0.01) {
    trade.closingInProgress = true
    trade.closeConfirmedAt = Date.now()
    await this.handleExternalClosure(trade, 'Layer 2: Ghost detected')
    return
  }
}
```

**Commit:** b19f156

---

### Pitfall #60: Stale Array Snapshot in Monitoring Loop (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Manual closure sends duplicate "POSITION CLOSED" Telegram notifications

**Root Cause:** Position Manager creates array snapshot before async processing

**Real Incident:** Two identical notifications for cmibdii4k0004pe07nzfmturo

**Fix Applied:**
```typescript
private async checkTradeConditions(trade: ActiveTrade, currentPrice: number): Promise<void> {
  // CRITICAL FIX: Check if trade still in monitoring
  if (!this.activeTrades.has(trade.id)) {
    console.log(`⏭️ Skipping ${trade.symbol} - already removed from monitoring`)
    return
  }
  // ...
  // rest of function
}
```

**Commit:** a7c5930

---

### Pitfall #61: P&L Compounding STILL Happening Despite All Guards (🔴 CRITICAL - Under Investigation Nov 24, 2025)

**Symptom:** Trade showed $974.05 P&L when actual was $72.41 (13.4× inflation)

**Evidence:** 14 duplicate Telegram notifications with compounding P&L

**Status:** All existing guards in place, yet duplicates still occurred

**Interim Fix:** Manual P&L correction, container restart with enhanced closingInProgress flag

**Investigation Needed:**
- Serialization lock around external closure detection
- Unique transaction ID to prevent duplicate DB updates
- Telegram notification deduplication

**Commit:** 0466295

---

### Pitfall #62: Adaptive Leverage and Quality Bypass (🔴 CRITICAL - Fixed Nov 24-27, 2025)

**Two related bugs:**

**Bug 1: Adaptive leverage not working (Nov 24)**
- `USE_ADAPTIVE_LEVERAGE` ENV variable not set in .env
- Quality 90 trade used 15x instead of intended 10x

**Bug 2: Execute endpoint bypassing quality threshold (Nov 27)**
- Bot executed trades at quality 30, 50, 50 when minimum is 90/95
- Execute endpoint calculated quality but never validated it

**Fix Applied (Nov 27):**
```typescript
if (qualityResult.score < minQualityScore) {
  console.log(`❌ QUALITY TOO LOW: ${qualityResult.score} < ${minQualityScore} threshold`)
  return NextResponse.json({
    success: false,
    error: 'Quality score too low',
  }, { status: 400 })
}
console.log(`✅ Quality check passed: ${qualityResult.score} >= ${minQualityScore}`)
```

**Commit:** cefa3e6

---

### Pitfall #63: Smart Entry Validation System (⚠️ HIGH - Deployed Nov 30, 2025)

**Purpose:** Recover profits from marginal quality signals (50-89)

**Implementation:** `lib/trading/smart-validation-queue.ts` (330+ lines)

**Threshold Results (Dec 1, 2025):**
- **±0.3%:** 28/200 entries (14%), 67.9% WR, +4.73% total ✅
- ±0.2%: 51/200 entries (26%), 43.1% WR, -18.49% total
- ±0.15%: 73/200 entries (36%), 35.6% WR, -38.27% total

**Commit:** 7c9cfba

---

### Pitfall #64: EPYC Cluster SSH Timeout (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Coordinator reports "SSH command timed out for v9_chunk_000002 on worker1"

**Root Cause:** 30-second subprocess timeout insufficient for nested SSH hop (master → worker1 → worker2)

**Fix Applied:**
```python
ssh_opts = "-o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=5"
result = subprocess.run(ssh_cmd, timeout=60)  # Increased from 30s to 60s
```

**Commit:** ef371a1

**Lesson Learned:** Nested SSH hops need 2× minimum timeout. Latency compounds at each hop.

---

### Pitfall #65: Distributed Worker Quality Filter - Dict vs Callable (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** ALL 2,096 distributed backtests returned 0 trades

**Root Cause:** Passed dict `{'min_adx': 15, 'min_volume_ratio': vol_min}` instead of lambda function

**Error:** `'dict' object is not callable`

**Fix Applied:**
```python
# BEFORE (BROKEN):
quality_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}

# AFTER (FIXED):
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

**Commit:** 11a0ea3

**Lesson Learned:** Silent failures are more dangerous than crashes. The exception handler hid the severity by returning zeros.

---

### Pitfall #66: Smart Entry Wrong Price Display (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Abandonment notifications showing impossible prices ($126 → $98 = -22% in 30 seconds)

**Root Cause:** Symbol format mismatch between validation queue ("SOLUSDT") and market data cache ("SOL-PERP")

**Real Incident:** Cache lookup `marketDataCache.get("SOLUSDT")` returned null

**Fix Applied:**
```typescript
// Normalize symbol before validation queue
const normalizedSymbol = normalizeTradingViewSymbol(body.symbol)

const queued = await validationQueue.addSignal({
  symbol: normalizedSymbol, // Use normalized format for cache lookup
  // ...
})
```

**Commit:** 6cec2e8

---

### Pitfall #67: Ghost Detection Race Condition (🔴 CRITICAL - Fixed Dec 2, 2025)

**Symptom:** 23 duplicate "POSITION CLOSED" notifications with P&L compounding (-$47.96 to -$1,129.24)

**Root Cause:** Race condition in ghost detection - check `Map.has()` happened AFTER function entry

**Real Incident (Dec 2, 17:20 CET):**
- Expected P&L: ~-$48
- Actual: 23 notifications with compounding P&L

**Fix Applied:** Use Map.delete() atomic return value as deduplication lock:
```typescript
// FIXED CODE:
async handleExternalClosure(trade: ActiveTrade, reason: string) {
  const tradeId = trade.id

  // ✅ Delete IMMEDIATELY - atomic operation
  if (!this.activeTrades.delete(tradeId)) {
    console.log('DUPLICATE PREVENTED (atomic lock)')
    return
  }

  // ONLY first caller reaches here
  // ... rest of cleanup
}
```

**Commit:** 93dd950

**Lesson Learned:** When an async handler can be called by multiple code paths simultaneously, use atomic operations (like Map.delete()) as locks at function entry.

---

### Pitfall #68: Smart Entry Using Webhook Percentage as Signal Price (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** $89 position sizes, 97% pullback calculations, impossible entry conditions

**Root Cause:** TradingView webhook `signal.price` contained percentage (70.80) instead of market price ($142.50)

**Real Incident:** Smart Entry log showed "97.4% pullback required" (impossible)

**Fix Applied:**
```typescript
// Use Pyth current price instead of webhook signal price
const pythPrice = await pythClient.getPrice(symbol)
const signalPrice = pythPrice.price // ✅ Use actual market price
```

**Commit:** 7d0d38a

**Lesson Learned:** Never trust webhook data for calculations. Use authoritative price sources (Pyth, Drift).
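
One way to apply the lesson from Pitfall #68 defensively is sketched below. This is an illustration under stated assumptions, not the deployed fix: `resolveSignalPrice`, the `PriceResolution` shape, and the 5% deviation threshold are all hypothetical. The sketch always returns the authoritative (oracle) price for calculations and only uses the webhook value to flag implausible payloads, so a percentage such as 70.80 arriving in a price field would be reported rather than silently used.

```typescript
// Hypothetical result shape for illustration only.
interface PriceResolution {
  price: number            // price to use for sizing and pullback math (always the oracle price)
  webhookTrusted: boolean  // false when the webhook value is missing or implausible
  deviationPercent: number // distance between webhook and oracle price, in percent
}

// Assumption: a webhook price more than maxDeviationPercent away from the
// oracle price is treated as suspect. The 5% default is an arbitrary example value.
function resolveSignalPrice(
  webhookPrice: number | undefined,
  oraclePrice: number,
  maxDeviationPercent = 5
): PriceResolution {
  if (!Number.isFinite(oraclePrice) || oraclePrice <= 0) {
    // Refuse to fall back to webhook data when the authoritative source is unavailable.
    throw new Error('Oracle price unavailable')
  }
  if (webhookPrice === undefined || !Number.isFinite(webhookPrice) || webhookPrice <= 0) {
    return { price: oraclePrice, webhookTrusted: false, deviationPercent: Infinity }
  }
  const deviationPercent = (Math.abs(webhookPrice - oraclePrice) / oraclePrice) * 100
  return {
    price: oraclePrice,
    webhookTrusted: deviationPercent <= maxDeviationPercent,
    deviationPercent,
  }
}

// The incident values from this pitfall: webhook carried 70.80, market was at $142.50.
const resolved = resolveSignalPrice(70.80, 142.50)
if (!resolved.webhookTrusted) {
  console.warn(`Webhook price deviates ${resolved.deviationPercent.toFixed(1)}% from oracle; ignoring it`)
}
```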

---

### Pitfall #69: Direction-Specific Leverage Thresholds Not Explicit (🟡 MEDIUM - Fixed Dec 3, 2025)

**Symptom:** Leverage code checked quality score without explicit direction context

**Root Cause:** Code pattern was ambiguous about which direction's threshold applied

**Fix Applied:** Made direction-specific thresholds explicit:
```typescript
if (body.direction === 'LONG') {
  if (qualityResult.score >= 90) leverage = 5
  // ...
} else { // SHORT
  if (qualityResult.score >= 90) leverage = 5 // Same as LONG but explicit
  // ...
}
```

**Commit:** 58f812f

---

### Pitfall #70: Smart Validation Queue Rejected by Execute Endpoint (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** Quality 50-89 signals validated by queue get rejected with "Quality score too low"

**Root Cause:** Execute endpoint applies quality threshold check AFTER validation queue confirmed price action

**Fix Applied:**
```typescript
const isValidatedEntry = body.validatedEntry === true

if (isValidatedEntry) {
  console.log(`✅ VALIDATED ENTRY BYPASS: Quality ${qualityResult.score} accepted`)
}

// Only apply quality threshold if NOT a validated entry
if (!isValidatedEntry && qualityResult.score < minQualityScore) {
  return NextResponse.json({ error: 'Quality too low' }, { status: 400 })
}
```

**Commit:** 785b09e

---

### Pitfall #71: Revenge System Missing External Closure Integration (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** High-quality signals (85+) stopped by external closures don't trigger revenge window

**Root Cause:** Revenge eligibility check only existed in executeExit() path, not handleExternalClosure()

**Real Incident (Nov 20):** Quality 90 SHORT at $141.37, stopped at $142.48 (-$138.35), price dropped to $131.32 (+$490 opportunity missed)

**Fix Applied:**
```typescript
// In external closure handler:
if (exitReason === 'SL' && trade.signalQualityScore && trade.signalQualityScore >= 85) {
  console.log(`🎯 External SL closure - Quality ${trade.signalQualityScore} >= 85`)
  await
  stopHuntTracker.recordStopHunt({
    originalTradeId: trade.id,
    symbol: trade.symbol,
    direction: trade.direction,
    stopHuntPrice: currentPrice,
    originalEntryPrice: trade.entryPrice,
    originalQualityScore: trade.signalQualityScore,
    stopLossAmount: Math.abs(totalRealizedPnL)
  })
  console.log(`✅ Revenge window activated for external closure (30min monitoring)`)
}
```

**Commit:** 785b09e

---

### Pitfall #72: Telegram Webhook Conflicts with Polling Bot (🔴 CRITICAL - Fixed Dec 4, 2025)

**Symptom:** Python Telegram bot crashes with "Conflict: can't use getUpdates method while webhook is active"

**Root Cause:** n8n had active Telegram webhook that intercepted ALL messages before Python bot

**Real Incident:** `/status` command returned n8n test message with broken template syntax

**Fix Applied:**
```bash
# Delete Telegram webhook
curl -s "https://api.telegram.org/bot{TOKEN}/deleteWebhook"

# Restart Python bot
docker restart telegram-trade-bot
```

**Architecture Decision:** Cannot run both n8n webhook AND Python polling bot simultaneously. Choose one.

---

## Appendix: Pattern Recognition

### Common Root Causes

1. **Race Conditions:** Multiple code paths detecting same event (P&L compounding bugs #48, #49, #59, #60, #67)
2. **Unit Mismatches:** Tokens vs USD, dollars vs percentages (#24, #54)
3. **Symbol Format:** TradingView ("SOLUSDT") vs Drift ("SOL-PERP") (#5, #66)
4. **Deployment Verification:** Declaring "fixed" without checking container timestamp (#31)
5. **SDK Behavior:** Documentation doesn't match reality (#2, #24, #45)
6. **Async Timing:** Operations completing out of expected order (#13, #28, #60)

### Prevention Strategies

1. **Use atomic operations** for state changes (Map.delete() returns boolean)
2. **Always normalize symbols** at integration boundaries
3. **Verify deployment** with container timestamp vs commit time
4. **Never mutate shared state** during calculation phases
5. **Add explicit checks** in ALL code paths, not just happy path
6.
   **Test with real infrastructure** before trusting provider claims

---

## Cross-Reference Index

- **See Also:** `.github/copilot-instructions.md` - Main AI agent instructions with Top 10 Critical Pitfalls
- **Related:** `docs/bugs/` - Additional bug documentation
- **Related:** `docs/architecture/` - System design context

---

**Last Updated:** December 4, 2025
**Maintainer:** AI Agent team following "NOTHING gets lost" principle