Common Pitfalls Reference Documentation
Last Updated: December 4, 2025
Total Documented: 72 Pitfalls
Primary Source: .github/copilot-instructions.md
Purpose
This document is the comprehensive reference for all documented pitfalls, bugs, and lessons learned from the Trading Bot v4 project. Each entry represents a real incident that caused financial loss, system instability, or operational issues.
How to Use This Document:
- Before making changes: Search for related pitfalls to avoid repeating mistakes
- When debugging: Look for symptoms matching your issue
- After fixing bugs: Add new entries to preserve institutional knowledge
- Code review: Verify changes don't reintroduce known issues
Severity Levels:
- 🔴 CRITICAL - Financial loss, data corruption, or system failure
- ⚠️ HIGH - System stability or significant operational impact
- 🟡 MEDIUM - Performance degradation or UX issues
- 🔵 LOW - Code quality or minor improvements
Quick Reference Table
| # | Severity | Category | Date | Summary |
|---|---|---|---|---|
| 1 | 🔴 CRITICAL | SDK/Memory | Nov 15, 2025 | Drift SDK memory leak - heap OOM after 10+ hours |
| 2 | 🔴 CRITICAL | RPC/Infrastructure | Nov 14, 2025 | Wrong RPC provider (Alchemy) breaks Drift SDK |
| 3 | 🟡 MEDIUM | Build/Docker | - | Prisma not generated in Docker |
| 4 | 🟡 MEDIUM | Configuration | - | Wrong DATABASE_URL for container vs host |
| 5 | 🟡 MEDIUM | Data/Symbols | - | Symbol format mismatch (TradingView → Drift) |
| 6 | ⚠️ HIGH | Orders | - | Missing reduce-only flag on exit orders |
| 7 | 🟡 MEDIUM | Architecture | - | Singleton violations (DriftClient, Position Manager) |
| 8 | 🟡 MEDIUM | Types/Prisma | - | Type errors with Prisma after generate |
| 9 | 🟡 MEDIUM | Code Quality | - | Quality score duplication in check-risk and execute |
| 10 | ⚠️ HIGH | Configuration | - | TP2-as-Runner configuration confusion |
| 11 | 🔴 CRITICAL | P&L Calculation | - | P&L calculation using SDK values incorrectly |
| 12 | 🔴 CRITICAL | Transactions | - | Transaction confirmation missing (phantom trades) |
| 13 | ⚠️ HIGH | Execution Order | - | Execution order matters (Position Manager before DB) |
| 14 | ⚠️ HIGH | Timing | - | New trade grace period (30s for Drift propagation) |
| 15 | 🟡 MEDIUM | SDK/Drift | - | Drift minimum position sizes differ from docs |
| 16 | 🔴 CRITICAL | Exit Logic | - | Exit reason detection bug (using current price) |
| 17 | 🟡 MEDIUM | Cooldown | - | Per-symbol cooldown, not global |
| 18 | ⚠️ HIGH | Quality Scoring | - | Timeframe-aware scoring crucial for 5min |
| 19 | 🔴 CRITICAL | Trading Logic | - | Price position chasing causes flip-flops |
| 20 | 🟡 MEDIUM | TradingView | - | TradingView ADX minimum for 5min charts |
| 21 | 🟡 MEDIUM | Types/Prisma | - | Prisma Decimal type handling in raw SQL |
| 22 | 🔴 CRITICAL | Trailing Stop | Nov 11, 2025 | ATR-based trailing stop implementation bug |
| 23 | 🟡 MEDIUM | Database Schema | - | CreateTradeParams interface sync required |
| 24 | 🔴 CRITICAL | SDK/Units | Nov 12, 2025 | Position.size returns tokens not USD |
| 25 | 🟡 MEDIUM | Display | Nov 12, 2025 | Leverage display showing global instead of symbol-specific |
| 26 | 🟡 MEDIUM | Tracking | Nov 12, 2025 | Indicator version tracking (v5→v6→v7→v8) |
| 27 | 🔴 CRITICAL | Race Condition | Nov 15, 2025 | Runner stop loss gap - no protection between TP1 and TP2 |
| 28 | 🔴 CRITICAL | Race Condition | Nov 12, 2025 | External closure duplicate updates bug |
| 29 | 🔴 CRITICAL | Database | Nov 13, 2025 | Database-First Pattern required |
| 30 | ⚠️ HIGH | Network | Nov 13, 2025 | DNS retry logic needed |
| 31 | 🔴 CRITICAL | Deployment | Nov 13, 2025 | Declaring fixes "working" before deployment |
| 32 | 🔴 CRITICAL | Workflow | Nov 14, 2025 | Phantom trade notification workflow breaks |
| 33 | 🔴 CRITICAL | Data Integrity | Nov 15, 2025 | Wrong entry price after orphaned position restoration |
| 34 | 🔴 CRITICAL | Monitoring | Nov 15, 2025 | Runner stop loss gap (duplicate of #27) |
| 35 | 🔴 CRITICAL | Database | Nov 15, 2025 | Phantom trades need exitReason for cleanup |
| 36 | 🔴 CRITICAL | Rate Limits | Nov 15, 2025 | closePosition() missing retry logic causes rate limit storm |
| 37 | 🔴 CRITICAL | Ghost Positions | Nov 15, 2025 | Ghost position accumulation from failed DB updates |
| 38 | 🟡 MEDIUM | Display | Nov 15, 2025 | Analytics dashboard showing original position size |
| 39 | 🔴 CRITICAL | Permissions | Nov 15, 2025 | Settings UI permission error (.env not writable) |
| 40 | 🔴 CRITICAL | Ghost Positions | Nov 15-16, 2025 | Ghost position death spiral from skipped validation |
| 41 | 🔴 CRITICAL | P&L Calculation | Nov 19, 2025 | Stats API recalculating P&L incorrectly for TP1+runner |
| 42 | 🟡 MEDIUM | Notifications | Nov 16, 2025 | Missing Telegram notifications for position closures |
| 43 | 🔴 CRITICAL | Trailing Stop | Nov 20, 2025 | Runner trailing stop never activates after TP1 |
| 44 | ⚠️ HIGH | DNS | Nov 16, 2025 | Telegram bot DNS resolution failures |
| 45 | 🔴 CRITICAL | SDK/Drift | Nov 16, 2025 | Drift SDK position.entryPrice recalculates after partial closes |
| 46 | 🔴 CRITICAL | Leverage | Nov 16, 2025 | Drift account leverage must be set in UI, not API |
| 47 | 🔴 CRITICAL | Verification | Nov 16, 2025 | Position close verification gap - 6 hours unmonitored |
| 48 | 🔴 CRITICAL | P&L Compounding | Nov 16, 2025 | P&L compounding during close verification |
| 49 | 🔴 CRITICAL | P&L Compounding | Nov 17, 2025 | P&L exponential compounding in external closure detection |
| 50 | 🔴 CRITICAL | Database | Nov 19, 2025 | Database not tracking trades despite successful Drift executions |
| 51 | 🔴 CRITICAL | Detection | Nov 19, 2025 | TP1 detection fails when on-chain orders fill fast |
| 52 | 🔴 CRITICAL | Exit Logic | Nov 19, 2025 | ADX-based runner SL only applied in one code path |
| 53 | 🔴 CRITICAL | Container | Nov 19, 2025 | Container restart kills positions + phantom detection bug |
| 54 | 🔴 CRITICAL | Data Integrity | Nov 23, 2025 | MFE/MAE storing dollars instead of percentages |
| 55 | 🔴 CRITICAL | Configuration | Nov 19-20, 2025 | Settings UI quality score variable name mismatch / BlockedSignalTracker using wrong price source |
| 56 | 🔴 CRITICAL | Ghost Orders | Nov 20-21, 2025 | Ghost orders after external closures + false order count bug |
| 57 | 🔴 CRITICAL | P&L Calculation | Nov 20, 2025 | P&L calculation inaccuracy for external closures |
| 58 | ⚠️ HIGH | Database | Nov 21, 2025 | 5-Layer Database Protection System implemented |
| 59 | 🔴 CRITICAL | Duplicates | Nov 22, 2025 | Layer 2 ghost detection causing duplicate Telegram notifications |
| 60 | 🔴 CRITICAL | Race Condition | Nov 23, 2025 | Stale array snapshot in monitoring loop causes duplicate processing |
| 61 | 🔴 CRITICAL | P&L Compounding | Nov 24, 2025 | P&L compounding STILL happening despite all guards |
| 62 | 🔴 CRITICAL | Quality Check | Nov 24-27, 2025 | Adaptive leverage not working / Execute endpoint bypassing quality threshold |
| 63 | ⚠️ HIGH | Feature | Nov 30, 2025 | Smart Entry Validation System - Block & Watch deployed |
| 64 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | EPYC Cluster SSH Timeout - nested hop requires longer timeouts |
| 65 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | Distributed Worker Quality Filter - dict vs callable |
| 66 | 🔴 CRITICAL | Smart Entry | Dec 1, 2025 | Smart Entry Validation Queue wrong price display |
| 67 | 🔴 CRITICAL | Race Condition | Dec 2, 2025 | Ghost detection race condition causing duplicate notifications with P&L compounding |
| 68 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Entry using webhook percentage as signal price |
| 69 | 🟡 MEDIUM | Configuration | Dec 3, 2025 | Direction-specific leverage thresholds not explicit in code |
| 70 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Validation Queue rejected by execute endpoint |
| 71 | 🔴 CRITICAL | Revenge System | Dec 3, 2025 | Revenge system missing external closure integration |
| 72 | 🔴 CRITICAL | Telegram | Dec 4, 2025 | Telegram webhook conflicts with polling bot |
Category Index
🔴 P&L Calculation Errors
- #11 - P&L calculation using SDK values incorrectly
- #41 - Stats API recalculating P&L incorrectly
- #48 - P&L compounding during close verification
- #49 - P&L exponential compounding
- #54 - MFE/MAE storing dollars instead of percentages
- #57 - P&L calculation inaccuracy for external closures
- #61 - P&L compounding STILL happening
🔴 Race Conditions & Duplicates
- #27 - Runner stop loss gap - no protection between TP1 and TP2
- #28 - External closure duplicate updates
- #59 - Layer 2 ghost detection duplicates
- #60 - Stale array snapshot duplicates
- #67 - Ghost detection race condition
🔴 SDK/API Integration
- #1 - Drift SDK memory leak
- #2 - Wrong RPC provider (Alchemy)
- #12 - Transaction confirmation missing
- #24 - Position.size tokens vs USD
- #36 - closePosition() missing retry logic
- #45 - position.entryPrice recalculates after partial closes
🔴 Database Operations
- #29 - Database-First Pattern required
- #35 - Phantom trades need exitReason
- #37 - Ghost position accumulation
- #50 - Database not tracking trades
- #58 - 5-Layer Database Protection System
🔴 Configuration & Settings
- #55 - Settings UI quality score variable name mismatch
- #62 - Adaptive leverage / Execute endpoint bypassing quality threshold
🔴 Deployment & Verification
- #31 - Declaring fixes "working" before deployment
- #47 - Position close verification gap - 6 hours unmonitored
🔴 Smart Entry & Validation
- #63 - Smart Entry Validation System
- #66 - Smart Entry wrong price display
- #68 - Smart Entry using webhook percentage
- #70 - Smart Validation Queue rejected by execute
⚠️ Ghost Positions & Orders
- #37 - Ghost position accumulation from failed DB updates
- #40 - Ghost position death spiral from skipped validation
- #56 - Ghost orders after external closures
⚠️ Network & Infrastructure
- #30 - DNS retry logic
- #44 - Telegram bot DNS resolution
- #64 - EPYC Cluster SSH timeout
- #65 - Distributed Worker dict vs callable
⚠️ Trailing Stop & Exit Logic
- #22 - ATR-based trailing stop implementation
- #43 - Runner trailing stop never activates
- #51 - TP1 detection fails on-chain
- #52 - ADX-based runner SL one code path
Detailed Pitfall Entries
Pitfall #1: Drift SDK Memory Leak (🔴 CRITICAL - Fixed Nov 15, 2025, Enhanced Nov 24, 2025)
Symptom: JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s)
Root Cause: Drift SDK accumulates WebSocket subscriptions over time without cleanup
Real Incident:
- Thousands of accountUnsubscribe error: readyState was 2 (CLOSING) entries in logs
- Heap growth: Normal ~200MB → 4GB+ after 10 hours → OOM crash
Impact: System crashes after extended uptime, requires manual container restart
Fix Applied:
- File: lib/monitoring/drift-health-monitor.ts
- Implementation: Smart error-based health monitoring replaces blind timer
- interceptWebSocketErrors() patches console.error to catch SDK WebSocket errors
- 30-second sliding window: Only restarts if 50+ errors in 30 seconds
- Container restart via flag: Writes /tmp/trading-bot-restart.flag for watch-restart.sh
- API: GET /api/drift/health - Check error count and health status
- Commit: Enhanced Nov 24, 2025
Code Reference:
// lib/monitoring/drift-health-monitor.ts
interceptWebSocketErrors() // Patches console.error
if (errorsInWindow > 50) {
writeRestartFlag() // Triggers container restart
}
Prevention: Monitor for 🏥 Drift health monitor started and error threshold logs
Lesson Learned: Smart, reactive monitoring is better than blind timers. Only restart when actual problems occur, not on a schedule.
Pitfall #2: Wrong RPC Provider (🔴 CRITICAL - Investigation Complete Nov 14, 2025)
Symptom: Trades fail, duplicate closes, Position Manager loses tracking, database save failures
Root Cause: Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization
Real Incident (Nov 14, 21:14 CET):
- Created diagnostic endpoint /api/testing/drift-init
- Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time
- Helius: 0 subscription errors EVERY init, 800ms avg init time
Impact: Complete system failure when using wrong RPC provider
Why Alchemy Fails:
- Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern)
- Alchemy's CUPS enforcement rate limits these burst requests
- Drift SDK does NOT retry failed subscriptions
- SDK reports "initialized successfully" but with incomplete subscription set
- Error: "Received JSON-RPC error calling accountSubscribe"
Fix Applied:
- Use Helius RPC (https://mainnet.helius-rpc.com/?api-key=...)
- Retry logic: 5s exponential backoff for rate limits
- Documentation: docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md
Code Reference:
# Test yourself
curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'
Prevention: ALWAYS use Helius RPC. Do not use Alchemy for Drift SDK.
Lesson Learned: Documentation doesn't always reflect reality. Test with real infrastructure before trusting provider claims.
Pitfall #3: Prisma Not Generated in Docker (🟡 MEDIUM)
Symptom: Build fails with Prisma client errors
Root Cause: Must run npx prisma generate in Dockerfile BEFORE npm run build
Fix Applied: Add RUN npx prisma generate before build step in Dockerfile
Pitfall #4: Wrong DATABASE_URL (🟡 MEDIUM)
Symptom: Database connection failures
Root Cause: Container runtime needs trading-bot-postgres (container name), Prisma CLI from host needs localhost:5432
Fix Applied: Use correct hostname based on context:
- Container: postgresql://postgres:password@trading-bot-postgres:5432/trading_bot_v4
- Host CLI: postgresql://postgres:password@localhost:5432/trading_bot_v4
Pitfall #5: Symbol Format Mismatch (🟡 MEDIUM)
Symptom: Drift API rejects orders, symbol not found errors
Root Cause: TradingView sends "SOLUSDT" but Drift requires "SOL-PERP"
Fix Applied: Always normalize with normalizeTradingViewSymbol() before calling Drift
- File: config/trading.ts
- Applies to ALL endpoints including /api/trading/close
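A minimal sketch of what this normalization has to do (the real implementation lives in config/trading.ts; the mapping table below is illustrative, not exhaustive):
// Hypothetical sketch - map TradingView tickers to Drift perp market symbols
const SYMBOL_MAP: Record<string, string> = {
  SOLUSDT: 'SOL-PERP',
  ETHUSDT: 'ETH-PERP',
  BTCUSDT: 'BTC-PERP',
}
function normalizeTradingViewSymbol(raw: string): string {
  const upper = raw.toUpperCase().trim()
  if (upper.endsWith('-PERP')) return upper // already in Drift format
  const mapped = SYMBOL_MAP[upper]
  if (!mapped) throw new Error(`Unknown TradingView symbol: ${raw}`)
  return mapped
}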
Pitfall #6: Missing Reduce-Only Flag (⚠️ HIGH)
Symptom: Exit orders accidentally open new positions instead of closing
Root Cause: Exit orders without reduceOnly: true can open new positions
Fix Applied: All TP/SL orders MUST include reduceOnly: true
const orderParams = {
reduceOnly: true, // CRITICAL for TP/SL orders
// ... other params
}
Pitfall #7: Singleton Violations (🟡 MEDIUM)
Symptom: Connection issues, state inconsistencies, multiple WebSocket connections
Root Cause: Creating multiple DriftClient or Position Manager instances
Fix Applied: Always use getter functions:
const driftService = await initializeDriftService() // NOT: new DriftService()
const positionManager = getPositionManager() // NOT: new PositionManager()
const prisma = getPrismaClient() // NOT: new PrismaClient()
Pitfall #8: Prisma Type Errors (🟡 MEDIUM)
Symptom: TypeScript compilation fails with Prisma types
Root Cause: Trade type from Prisma only available AFTER npx prisma generate
Fix Applied: Run npx prisma generate after any schema changes
Pitfall #9: Quality Score Duplication (🟡 MEDIUM)
Symptom: Inconsistent quality scoring between endpoints
Root Cause: Signal quality calculation exists in BOTH check-risk and execute endpoints
Fix Applied: Keep logic synchronized between both endpoints when making changes
Pitfall #10: TP2-as-Runner Configuration (⚠️ HIGH)
Symptom: Confusion about runner size and TP2 behavior
Root Cause: takeProfit2SizePercent: 0 means "TP2 activates trailing stop, no position close"
Fix Applied:
- TAKE_PROFIT_2_PERCENT=0.7 sets TP2 trigger price
- TAKE_PROFIT_2_SIZE_PERCENT should be 0 for runner system
- Runner = 100% - TAKE_PROFIT_1_SIZE_PERCENT (default 40%)
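The resulting split can be sanity-checked with simple arithmetic (a sketch assuming the env variable names above and the documented 40% default):
// Sketch: derive runner size from the TP1 close size
const tp1SizePercent = Number(process.env.TAKE_PROFIT_1_SIZE_PERCENT ?? 40) // portion closed at TP1
const tp2SizePercent = Number(process.env.TAKE_PROFIT_2_SIZE_PERCENT ?? 0)  // 0 = TP2 only arms the trailing stop
const runnerPercent = 100 - tp1SizePercent                                  // e.g. 100 - 40 = 60% rides as the runner
console.log(`TP1 closes ${tp1SizePercent}%, TP2 closes ${tp2SizePercent}%, runner = ${runnerPercent}%`)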
Pitfall #11: P&L Calculation Critical (🔴 CRITICAL)
Symptom: Incorrect P&L values in database and analytics
Root Cause: Using SDK values instead of actual entry vs exit price calculation
Fix Applied:
const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction)
const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100
trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
Pitfall #12: Transaction Confirmation Critical (🔴 CRITICAL)
Symptom: "Phantom trades" - SDK returns signatures for transactions that never execute
Root Cause: Both openPosition() AND closePosition() must call connection.confirmTransaction()
Fix Applied:
const txSig = await driftClient.placePerpOrder(orderParams)
console.log('⏳ Confirming transaction on-chain...')
const connection = driftService.getConnection()
const confirmation = await connection.confirmTransaction(txSig, 'confirmed')
if (confirmation.value.err) {
throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`)
}
console.log('✅ Transaction confirmed on-chain')
Pitfall #13: Execution Order Matters (⚠️ HIGH)
Symptom: Race conditions where monitoring starts before trade exists in database
Root Cause: Position Manager added before database save
Fix Applied: Order MUST be:
- Open position + place exit orders
- Save to database (createTrade())
- Add to Position Manager (positionManager.addTrade())
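A condensed sketch of that ordering inside the execute endpoint (function names follow the ones referenced above; the field name on the createTrade() payload is an assumption, and error handling is omitted - see Pitfall #29 for the required database failure handling):
// Sketch: open → save to DB → only then start monitoring
const openResult = await openPosition(params)                                            // 1. Open position + place exit orders
const trade = await createTrade({ ...params, entryTx: openResult.transactionSignature }) // 2. Persist the trade first
await positionManager.addTrade(trade)                                                    // 3. Monitoring starts last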
Pitfall #14: New Trade Grace Period (⚠️ HIGH)
Symptom: New positions immediately detected as "closed externally" and cancelled
Root Cause: Drift positions take 5-10 seconds to propagate after opening
Fix Applied: Position Manager skips "external closure" detection for trades <30 seconds old
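A minimal sketch of the grace-period guard (the createdAt field name is an assumption; the real check lives in the Position Manager monitoring loop):
// Sketch: skip external-closure detection for very young trades
const GRACE_PERIOD_MS = 30_000 // Drift needs 5-10s to propagate; 30s adds margin
const tradeAgeMs = Date.now() - new Date(trade.createdAt).getTime()
if (tradeAgeMs < GRACE_PERIOD_MS) {
  // Too early to trust "position not found" - Drift may simply not be showing it yet
  return
}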
Pitfall #15: Drift Minimum Position Sizes (🟡 MEDIUM)
Symptom: Orders rejected for being too small
Root Cause: Actual minimums differ from documentation:
- SOL-PERP: 0.1 SOL (~$5-15)
- ETH-PERP: 0.01 ETH (~$38-40)
- BTC-PERP: 0.0001 BTC (~$10-12)
Fix Applied: Calculate minOrderSize × currentPrice must exceed Drift's $4 minimum. Add buffer.
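A sketch of the implied pre-flight size check (the $4 minimum comes from the fix above; the buffer value is illustrative):
// Sketch: reject orders whose notional value is below Drift's practical minimum
const MIN_NOTIONAL_USD = 4   // Drift's stated minimum
const SAFETY_BUFFER = 1.25   // illustrative cushion above the bare minimum
const notionalUSD = minOrderSize * currentPrice
if (notionalUSD < MIN_NOTIONAL_USD * SAFETY_BUFFER) {
  throw new Error(`Order too small: $${notionalUSD.toFixed(2)} (need >= $${(MIN_NOTIONAL_USD * SAFETY_BUFFER).toFixed(2)})`)
}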
Pitfall #16: Exit Reason Detection Bug (🔴 CRITICAL)
Symptom: Profitable trades mislabeled as "SL" exits
Root Cause: Position Manager using current price to determine exit reason, but on-chain orders filled at different price
Fix Applied: Use trade.tp1Hit / trade.tp2Hit flags and realized P&L to correctly identify exit trigger
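A sketch of flag-based exit reason detection (the shape of the trade object is assumed; contrast with the broken current-price comparison):
// Sketch: derive the exit reason from what actually happened, not from the current price
function detectExitReason(trade: { tp1Hit: boolean; tp2Hit: boolean }, realizedPnL: number): 'TP2' | 'TP1' | 'SL' {
  if (trade.tp2Hit) return 'TP2'
  if (trade.tp1Hit && realizedPnL > 0) return 'TP1' // TP1 filled and the trade ended in profit
  return 'SL'                                        // no TP flags set, or the trade ended at a loss
}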
Pitfall #17: Per-Symbol Cooldown (🟡 MEDIUM)
Symptom: ETH trade incorrectly blocking SOL trade
Root Cause: Cooldown was global, not per-symbol
Fix Applied: Each coin (SOL/ETH/BTC) has independent cooldown timer via getLastTradeTimeForSymbol(symbol)
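A sketch of per-symbol cooldown tracking (the Map-based state is an assumption; getLastTradeTimeForSymbol() is the accessor named above):
// Sketch: one cooldown timer per symbol instead of a single global timestamp
const lastTradeTimeBySymbol = new Map<string, number>()
function getLastTradeTimeForSymbol(symbol: string): number {
  return lastTradeTimeBySymbol.get(symbol) ?? 0
}
function isInCooldown(symbol: string, cooldownMs: number): boolean {
  return Date.now() - getLastTradeTimeForSymbol(symbol) < cooldownMs
}
function recordTradeTime(symbol: string): void {
  lastTradeTimeBySymbol.set(symbol, Date.now())
}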
Pitfall #18: Timeframe-Aware Scoring Crucial (⚠️ HIGH)
Symptom: Valid 5min breakouts blocked as "low quality"
Root Cause: Signal quality thresholds not adjusted for 5min vs higher timeframes
- 5min: ADX 12-22 healthy, ATR 0.2-0.7%
- Daily: ADX 18-30 healthy, ATR 0.4%+
Fix Applied: Always pass timeframe parameter from TradingView alerts to scoreSignalQuality()
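A sketch of timeframe-dependent thresholds using the ranges above (the threshold table is illustrative; the production values live in scoreSignalQuality()):
// Sketch: pick "healthy" ADX/ATR ranges by timeframe before scoring a signal
function healthyRanges(timeframe: string) {
  if (timeframe === '5') {
    return { adxMin: 12, adxMax: 22, atrMinPercent: 0.2, atrMaxPercent: 0.7 }
  }
  // higher timeframes tolerate stronger trends and require wider ATR
  return { adxMin: 18, adxMax: 30, atrMinPercent: 0.4, atrMaxPercent: Infinity }
}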
Pitfall #19: Price Position Chasing (🔴 CRITICAL)
Symptom: Rapid flip-flop losses
Root Cause: Opening longs at 90%+ range or shorts at <10% range
Real Incident: Overnight flip-flop losses all had price position 9-94%
Fix Applied: Quality scoring now penalizes -15 to -30 points for range extremes
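A sketch of the range-extreme penalty (the exact penalty curve is illustrative; production values sit in the quality scorer):
// Sketch: penalize entries that chase price at the edges of the recent range
// pricePositionPercent: 0 = bottom of the range, 100 = top of the range
function rangeExtremePenalty(direction: 'LONG' | 'SHORT', pricePositionPercent: number): number {
  if (direction === 'LONG' && pricePositionPercent >= 90) {
    return pricePositionPercent >= 95 ? -30 : -15 // buying the very top is penalized hardest
  }
  if (direction === 'SHORT' && pricePositionPercent <= 10) {
    return pricePositionPercent <= 5 ? -30 : -15  // shorting the very bottom likewise
  }
  return 0
}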
Pitfall #20: TradingView ADX Minimum (🟡 MEDIUM)
Symptom: Too many signals blocked or too many low-quality signals passing
Root Cause: TradingView ADX filter should be 15 for 5min (not 20+)
Fix Applied: Set ADX ≥15 in TradingView alerts for 5min charts. Bot's quality scoring provides second-layer filtering.
Pitfall #21: Prisma Decimal Type Handling (🟡 MEDIUM)
Symptom: Frontend errors with .toFixed() on undefined
Root Cause: Raw SQL queries return Prisma Decimal objects, not plain numbers
Fix Applied:
// Use `any` type for numeric fields in $queryRaw results
const stat: { total_pnl: any } = await prisma.$queryRaw`...`
// Convert with Number() before returning to frontend
totalPnL: Number(stat.total_pnl) || 0
Pitfall #22: ATR-Based Trailing Stop Implementation (🔴 CRITICAL - Nov 11, 2025)
Symptom: Trades with +7-9% MFE exited for losses
Root Cause: Runner system was using FIXED 0.3% trailing instead of ATR-based
Real Incident: At $168 SOL, 0.3% = $0.50 wiggle room - too tight
Fix Applied:
trailingDistancePercent = (atrAtEntry / currentPrice * 100) × trailingStopAtrMultiplier
Configuration:
- TRAILING_STOP_ATR_MULTIPLIER=1.5
- MIN=0.25%, MAX=0.9%
- ACTIVATION=0.5%
Result: 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50 = 2.26x more room)
Documentation: ATR_TRAILING_STOP_FIX.md
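A worked sketch of the formula with the clamps above (the clamp bounds mirror MIN/MAX from the configuration):
// Sketch: ATR-based trailing distance, clamped to configured bounds
function trailingDistancePercent(atrAtEntry: number, currentPrice: number, multiplier = 1.5): number {
  const atrPercent = (atrAtEntry / currentPrice) * 100
  const raw = atrPercent * multiplier       // e.g. 0.45% × 1.5 = 0.675%
  return Math.min(Math.max(raw, 0.25), 0.9) // clamp to MIN=0.25% / MAX=0.9%
}
// At $168 SOL with 0.45% ATR: 0.675% ≈ $1.13 of room vs $0.50 with the old fixed 0.3% trail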
Pitfall #23: CreateTradeParams Interface Sync (🟡 MEDIUM)
Symptom: TypeScript build fails when endpoint passes field not in interface
Root Cause: New database fields added to Trade model but not to CreateTradeParams interface
Fix Applied: When adding new fields:
- Add to interface in lib/database/trades.ts
- Add to Prisma create data object in createTrade() function
Pitfall #24: Position.size Tokens vs USD Bug (🔴 CRITICAL - Fixed Nov 12, 2025)
Symptom: Position Manager detects false TP1 hits, moves SL to breakeven prematurely
Root Cause: lib/drift/client.ts returns position.size as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950)
Real Incident: Comparing tokens (12.28) directly to USD ($1,950) → "99.4% reduction" → FALSE TP1!
Fix Applied:
// In Position Manager (lines 322, 519, 558, 591)
const positionSizeUSD = Math.abs(position.size) * currentPrice
// Now compare USD to USD
if (positionSizeUSD < trade.currentSize * 0.95) {
// Actual 5%+ reduction detected
}
Impact: Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails
Pitfall #25: Leverage Display Bug (🟡 MEDIUM - Fixed Nov 12, 2025)
Symptom: Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x
Root Cause: API response returning config.leverage (global default) instead of symbol-specific value
Fix Applied:
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)
// Return symbol-specific leverage
leverage: leverage, // NOT: config.leverage
Pitfall #26: Indicator Version Tracking (🟡 MEDIUM - Nov 12, 2025+)
Symptom: Unable to compare performance between TradingView strategies
Root Cause: No tracking of which indicator generated the signal
Fix Applied: Database field indicatorVersion tracks:
- v5: Buy/Sell Signal (pre-Nov 12)
- v6: HalfTrend + BarColor (Nov 12-18)
- v7: v6 with toggles (deprecated)
- v8: Money Line Sticky Trend (Nov 18+)
- v9: Money Line with Momentum Filter (Nov 26+)
Pitfall #27: Runner Stop Loss Gap - No Protection Between TP1 and TP2 (🔴 CRITICAL - Fixed Nov 15, 2025)
Symptom: Runner position remained open despite price moving far past stop loss level
Root Cause: Position Manager only checked stop loss BEFORE TP1 (line 877), creating a protection gap
Real Incident:
- SHORT opened, TP1 hit at 70% close (runner = 30% remaining)
- Runner had stop loss at profit-lock level (+0.5%)
- Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped)
- Runner exposed to unlimited loss for hours during TP1→TP2 window
Fix Applied:
// Added explicit runner stop loss check at line ~881:
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol}`)
await this.executeExit(trade, 100, 'SL', currentPrice)
return
}
Lesson Learned: Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"
Pitfall #28: External Closure Duplicate Updates Bug (🔴 CRITICAL - Fixed Nov 12, 2025)
Symptom: Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss)
Root Cause: Position Manager monitoring loop re-processes external closures multiple times before trade removed from activeTrades Map
Real Incident:
- Trade closed externally at -$7.98
- Position Manager detects closure, calculates P&L → -$7.50 in DB
- Trade still in Map (removal async), loop runs again
- Accumulates P&L: -$7.50 + -$7.50 = -$15.00
- Repeats 8 times → final -$58.43
Fix Applied:
// BEFORE (BROKEN):
await updateTradeExit({ ... })
await this.removeTrade(trade.id) // Too late!
// AFTER (FIXED):
this.activeTrades.delete(trade.id) // Remove FIRST
await updateTradeExit({ ... }) // Then update DB
Commit: Fixed Nov 12, 2025
Pitfall #29: Database-First Pattern (🔴 CRITICAL - Fixed Nov 13, 2025)
Symptom: Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection
Root Cause: Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch
Real Incident: Unprotected position opened, database save failed silently, Position Manager never tracked it
Fix Applied:
// CRITICAL: Save to database FIRST before adding to Position Manager
try {
await createTrade({...})
} catch (dbError) {
console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
return NextResponse.json({
success: false,
error: 'Database save failed - position unprotected',
message: `CLOSE POSITION MANUALLY IMMEDIATELY. Transaction: ${openResult.transactionSignature}`,
}, { status: 500 })
}
// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)
Documentation: CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
Pitfall #30: DNS Retry Logic (⚠️ HIGH - Nov 13, 2025)
Symptom: Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails
Root Cause: EAI_AGAIN errors are transient DNS issues that resolve in seconds
Fix Applied: Automatic retry in lib/drift/client.ts:
// Detects: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
// Retries up to 3 times with 2s delay
await this.retryOperation(async () => {
// Initialize Drift SDK, subscribe, get user account
}, 3, 2000, 'Drift initialization')
Documentation: docs/DNS_RETRY_LOGIC.md
Pitfall #31: Declaring Fixes "Working" Before Deployment (🔴 CRITICAL - Nov 13, 2025)
Symptom: AI says "position is protected" when container still running old code
Root Cause: Conflating "code committed to git" with "code running in production"
Real Incident: Fix committed 15:56, declared "working" at 19:42, but container started 15:06 (old code)
Verification Required:
# ALWAYS check before declaring fix deployed:
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Compare container start time to git commit timestamp
# If container older: FIX NOT DEPLOYED
Rule: NEVER say "fixed", "working", "protected", or "deployed" without verifying container restart timestamp
Pitfall #32: Phantom Trade Notification Workflow Breaks (🔴 CRITICAL - Nov 14, 2025)
Symptom: Phantom trade detected, position opened, but n8n workflow stops. User NOT notified.
Root Cause: Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt
Fix Applied: Auto-close phantom trades immediately + return HTTP 200 with warning:
return NextResponse.json({
success: true,
warning: 'Phantom trade detected and auto-closed',
isPhantom: true,
message: '[Full notification text]',
phantomDetails: {...}
})
Database tracking: status='phantom', exitReason='manual'
Pitfall #33: Wrong Entry Price After Orphaned Position Restoration (🔴 CRITICAL - Fixed Nov 15, 2025)
Symptom: Position Manager tracking wrong entry price after container restart
Root Cause: Startup validation restored orphaned position using OLD database entry price instead of querying Drift
Real Incident: DB showed $141.51, Drift showed $141.31 actual entry → 0.14% SL placement error
Fix Applied: Query Drift SDK for actual entry price during orphaned position restoration:
await prisma.trade.update({
  where: { id: trade.id }, // the restored orphan's trade record (identifier shown for illustration)
  data: {
    entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price
    positionSizeUSD: positionSizeUSD,
  }
})
Pitfall #35: Phantom Trades Need exitReason (🔴 CRITICAL - Fixed Nov 15, 2025)
Symptom: Position Manager keeps restoring phantom trade on every restart
Root Cause: Phantom auto-closure sets status='phantom' but leaves exitReason=NULL
Real Incident: Phantom trade caused 232% size mismatch, hundreds of false alerts
Fix Applied: MUST set exitReason when auto-closing phantoms:
await updateTradeExit({
tradeId: trade.id,
exitPrice: currentPrice,
exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
status: 'phantom'
})
Pitfall #36: closePosition() Missing Retry Logic (🔴 CRITICAL - Fixed Nov 15, 2025)
Symptom: Position Manager tries to close, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts
Root Cause: placeExitOrders() had retry wrapper but closePosition() did NOT
Real Incident: 100+ "❌ Failed to close position: 429" + compounding P&L
Fix Applied: Wrapped closePosition() with retryWithBackoff():
const txSig = await retryWithBackoff(async () => {
return await driftClient.placePerpOrder(orderParams)
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
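For context, a minimal sketch of what a retryWithBackoff() helper of this shape typically looks like (this is an assumption about its structure, not the project's actual utility):
// Sketch: generic exponential-backoff wrapper (8s → 16s → 32s with the values above)
async function retryWithBackoff<T>(fn: () => Promise<T>, maxRetries = 3, baseDelayMs = 8000): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await fn()
    } catch (err) {
      lastError = err
      const delayMs = baseDelayMs * 2 ** attempt // double the wait after every failure
      console.warn(`Retry ${attempt + 1}/${maxRetries} in ${delayMs}ms:`, err)
      await new Promise(resolve => setTimeout(resolve, delayMs))
    }
  }
  throw lastError
}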
Pitfall #37: Ghost Position Accumulation (🔴 CRITICAL - Fixed Nov 15, 2025)
Symptom: Position Manager tracking 4+ positions when database shows only 1 open trade
Root Cause: Database has exitReason IS NULL for positions actually closed on Drift
Real Incident: 4+ ghosts → massive rate limiting, "vanishing orders"
Fix Applied: Periodic Drift position validation:
private scheduleValidation(): void {
this.validationInterval = setInterval(async () => {
await this.validatePositions()
}, 5 * 60 * 1000)
}
Pitfall #38: Analytics Dashboard Wrong Size (🟡 MEDIUM - Fixed Nov 15, 2025)
Symptom: Analytics page displays $42.54 when actual runner is $12.59 after TP1
Root Cause: API returns trade.positionSizeUSD (original) not runner size
Fix Applied: Check Position Manager state for open positions:
const currentSize = configSnapshot?.positionManagerState?.currentSize
const displaySize = trade.exitReason === null && currentSize
? currentSize
: trade.positionSizeUSD
Pitfall #40: Ghost Position Death Spiral (🔴 CRITICAL - Fixed Nov 15-16, 2025)
Symptom: Container crashes from cascading ghost detection failures
Root Cause: Position validation skipped during death spiral recovery, creating more ghosts
Fix Applied: Never skip validation during recovery operations
Pitfall #41: Stats API Recalculating P&L Incorrectly (🔴 CRITICAL - Fixed Nov 19, 2025)
Symptom: Analytics showing wrong P&L for trades with TP1+runner
Root Cause: Stats API recalculating P&L from partial position data
Fix Applied: Use stored realizedPnL directly, don't recalculate
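A sketch of the corrected aggregation (field names follow the Trade model used throughout this document; the actual stats route is not shown):
// Sketch: sum the P&L stored at close time instead of recomputing it from partial position data
const closedTrades = await prisma.trade.findMany({ where: { exitReason: { not: null } } })
const totalPnL = closedTrades.reduce((sum, t) => sum + Number(t.realizedPnL ?? 0), 0) // Number() handles Prisma Decimal (see Pitfall #21)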
Pitfall #43: Runner Trailing Stop Never Activates (🔴 CRITICAL - Fixed Nov 20, 2025)
Symptom: Runner position sits without trailing stop after TP1
Root Cause: Trailing stop activation logic only ran in one code path
Fix Applied: Ensure trailing stop activates in all TP1 detection paths
Pitfall #44: Telegram Bot DNS Resolution (⚠️ HIGH - Fixed Nov 16, 2025)
Symptom: Telegram notifications fail intermittently
Root Cause: DNS resolution failures for api.telegram.org
Fix Applied: Retry logic for Telegram API calls
Pitfall #45: Drift SDK position.entryPrice Recalculates (🔴 CRITICAL - Fixed Nov 16, 2025)
Symptom: Entry price changes after partial closes
Root Cause: Drift SDK calculates position.entryPrice from quoteAssetAmount / baseAssetAmount
Impact: After TP1 closes 75%, remaining 25% has "new" entry price
Fix Applied: Store and use original entry price from trade record, not SDK
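A sketch of the rule in code (field names follow the trade record used throughout this document):
// Sketch: P&L and SL placement must use the stored entry price, never the SDK's recalculated one
const entryPrice = trade.entryPrice        // stored at open time, immutable
// const entryPrice = position.entryPrice  // WRONG after partial closes - the SDK recomputes it
const profitPercent = trade.direction === 'LONG'
  ? ((currentPrice - entryPrice) / entryPrice) * 100
  : ((entryPrice - currentPrice) / entryPrice) * 100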
Pitfall #46: 100% Position Sizing InsufficientCollateral (🔴 CRITICAL - Fixed Nov 16, 2025)
Symptom: Bot gets InsufficientCollateral errors when Drift UI can open same size
Root Cause: Drift's margin calculation includes fees, slippage buffers
Real Incident: $85.55 collateral, bot tries 100% → rejected, shortage: $0.03
Fix Applied:
if (configuredSize >= 100) {
percentDecimal = 0.99
console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}
Commit: 7129cbf
Pitfall #47: Position Close Verification Gap (🔴 CRITICAL - Fixed Nov 16, 2025)
Symptom: Close transaction confirmed, database marked "closed", but position stayed open 6+ hours
Root Cause: Transaction confirmation ≠ Drift internal state updated immediately (5-10s delay)
Real Incident: Trailing stop triggered 02:51, position stayed open until 08:51 restart
Fix Applied: 2-layer verification:
if (params.percentToClose === 100) {
await cancelAllOrders(params.symbol)
console.log('⏳ Waiting 5s for Drift state to propagate...')
await new Promise(resolve => setTimeout(resolve, 5000))
const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
return { ...result, needsVerification: true }
}
}
Commit: c607a66
Pitfall #48: P&L Compounding During Close Verification (🔴 CRITICAL - Fixed Nov 16, 2025)
Symptom: P&L accumulates during the 5-10s verification wait
Root Cause: Monitoring loop continues during verification, detecting "external closure" multiple times
Fix Applied: closingInProgress flag:
if ((result as any).needsVerification) {
trade.closingInProgress = true
trade.closeConfirmedAt = Date.now()
console.log(`🔒 Marked as closing in progress - external closure detection disabled`)
return
}
// Skip external closure check if closingInProgress
if ((position === null || position.size === 0) && !trade.closingInProgress) {
// ... handle external closure
}
Related: Pitfalls #27, #49
Pitfall #49: P&L Exponential Compounding in External Closure Detection (🔴 CRITICAL - Fixed Nov 17, 2025)
Symptom: Database P&L shows 15-20× actual value ($92.46 when Drift shows $6.00)
Root Cause: trade.realizedPnL was being mutated during each external closure detection cycle
Real Incident (Nov 17, 13:54 CET):
- SOL-PERP SHORT closed by on-chain orders
- Actual P&L: ~$6.00, Database recorded: $92.46 (15.4× too high)
- Rate limiting caused 15+ detection cycles → $6 → $12 → $24 → $48 → $96
Fix Applied:
// DON'T mutate trade.realizedPnL - causes compounding!
// trade.realizedPnL = totalRealizedPnL ← REMOVED
// Use local variable for DB update
await updateTradeExit({
realizedPnL: totalRealizedPnL, // Use local variable
})
Commit: 6156c0f
Lesson Learned: In monitoring loops, NEVER mutate shared state during calculation phases. Calculate locally, update shared state ONCE at the end.
Pitfall #50: Database Not Tracking Trades (🔴 CRITICAL - RESOLVED Nov 19, 2025)
Symptom: Drift UI shows 6 trades, database shows only 3 trades
Root Cause: P&L compounding bug (#49) - in-memory object with stale/accumulated values
Fix Applied: Calculate P&L from immutable source values (entry/exit prices), never from in-memory fields
Pitfall #51: TP1 Detection Fails When On-Chain Orders Fill Fast (🔴 CRITICAL - Fixed Nov 19, 2025)
Symptom: TP1 order fills, but database records exitReason as "SL" instead of "TP1"
Root Cause: Position Manager detects closure AFTER both TP1 and runner already closed on-chain
Real Incident: LONG opened, TP1+runner closed within 7 minutes, trade.tp1Hit = false
Fix Applied: Simple percentage-based exit reason:
if (runnerProfitPercent > 0.3) {
if (runnerProfitPercent >= 1.2) {
exitReason = 'TP2' // Large profit (>1.2%)
} else {
exitReason = 'TP1' // Moderate profit (0.3-1.2%)
}
} else {
exitReason = 'SL' // Negative or tiny profit (<0.3%)
}
Commit: de57c96
Pitfall #52: ADX-Based Runner SL Only Applied in One Code Path (🔴 CRITICAL - Fixed Nov 19, 2025)
Symptom: TP1 fills via on-chain order, runner gets breakeven SL instead of ADX-based positioning
Root Cause: Two TP1 detection paths, only one had ADX logic
Fix Applied: Added ADX-based runner SL to on-chain fill detection path (lines 607-642)
Pitfall #53: Container Restart Kills Positions + Phantom Detection Bug (🔴 CRITICAL - Fixed Nov 19, 2025)
Two bugs from container restart:
Bug 1: Startup order restore failure
- Wrong database field names (takeProfit1OrderTx vs correct tp1OrderTx)
- Fix: Use correct field names
Bug 2: Phantom detection killing runners
- Runners (40% remaining) flagged as phantom
- Fix: Check !trade.tp1Hit before phantom detection:
const wasPhantom = !trade.tp1Hit && trade.currentSize > 0 && (trade.currentSize / trade.positionSize) < 0.5
Commit: eccecf7
Pitfall #54: MFE/MAE Storing Dollars Instead of Percentages (🔴 CRITICAL - Fixed Nov 23, 2025)
Symptom: Database showing maxFavorableExcursion = 64.08% when TradingView showed 0.48%
Root Cause: Position Manager storing DOLLAR amounts instead of PERCENTAGES
Real Incident: 133× inflation (64.08% stored vs 0.48% actual)
Fix Applied:
// BEFORE (BROKEN):
if (currentPnLDollars > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = currentPnLDollars // Storing $64.08
}
// AFTER (FIXED):
if (profitPercent > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = profitPercent // Storing 0.48%
}
Commit: 6255662
Lesson Learned: Always verify data storage units match schema expectations. Comments don't override schema.
Pitfall #55: Configuration Issues (🔴 CRITICAL - Fixed Nov 19-20, 2025)
Two configuration bugs:
Bug 1: Settings UI quality score variable name mismatch
- Settings API used MIN_QUALITY_SCORE (wrong)
- Code actually reads MIN_SIGNAL_QUALITY_SCORE (correct)
- User changes in UI had ZERO effect
Bug 2: BlockedSignalTracker using Pyth cache instead of Drift oracle
- priceAfter1Min/5Min/15Min/30Min fields staying NULL
- Fix: Use driftService.getOraclePrice() instead of getPythPriceMonitor().getCachedPrice()
Commit: 6b00303
Pitfall #56: Ghost Orders After External Closures (🔴 CRITICAL - Fixed Nov 20-21, 2025)
Symptom: Position closed, but TP/SL orders remain active on Drift
Root Cause: External closure handler didn't call cancelAllOrders() before completing
Real Incident: Risk of ghost order filling → unintended positions
Fix Applied:
// In external closure handler:
console.log(`🗑️ Cancelling remaining orders for ${trade.symbol}...`)
const cancelResult = await cancelAllOrders(trade.symbol)
Additional Bug: False positive "32 open orders" on restart
- Fix: Check baseAssetAmount.eq(new BN(0)) to filter truly active orders
Commits: a3a6222 (Nov 20), 29fce01 (Nov 21)
Pitfall #57: P&L Calculation Inaccuracy for External Closures (🔴 CRITICAL - Fixed Nov 20, 2025)
Symptom: Database P&L shows -$101.68 when Drift UI shows -$138.35 (36% error)
Root Cause: External closure handler calculates P&L from monitoring loop's currentPrice, which lags behind actual fill price
Fix Applied: Query Drift's actual settledPnL:
const position = userAccount.perpPositions.find((p: any) =>
p.marketIndex === marketConfig.driftMarketIndex
)
const settledPnL = Number(position.settledPnl || 0) / 1e6 // Convert to USD
if (Math.abs(settledPnL) > 0.01) {
totalRealizedPnL = settledPnL
console.log(`✅ Using Drift's actual P&L: $${totalRealizedPnL.toFixed(2)}`)
}
Commit: 8e600c8
Pitfall #58: 5-Layer Database Protection System (⚠️ HIGH - Implemented Nov 21, 2025)
Purpose: Bulletproof protection against untracked positions from database failures
5 Layers:
1. Persistent File Logger (lib/utils/persistent-logger.ts) - Survives container restarts
2. Database Save with Retry + Verification - 3 retries with exponential backoff
3. Orphan Position Detection - Runs on EVERY container startup
4. Critical Logging in Execute Endpoint - Full trade details for recovery
5. Infrastructure (Docker volumes) - ./logs:/app/logs
Real-world validation: Nov 21, 2025 - No database failure occurred, but protection now in place
Pitfall #59: Layer 2 Ghost Detection Causing Duplicate Telegram Notifications (🔴 CRITICAL - Fixed Nov 22, 2025)
Symptom: Trade #8 sent 13 duplicate notifications with compounding P&L ($11.50 → $155.05)
Root Cause: Layer 2 ghost detection (failureCount > 20) didn't check closingInProgress flag
Real Incident (Nov 22, 04:05 CET):
- Actual P&L: +$18.79, Database final: $155.05 (8.2× actual)
- Rate limit storm: 6,581 failed close attempts
Fix Applied:
// AFTER (FIXED):
if (trade.priceCheckCount > 20 && !trade.closingInProgress) {
if (!position || Math.abs(position.size) < 0.01) {
trade.closingInProgress = true
trade.closeConfirmedAt = Date.now()
await this.handleExternalClosure(trade, 'Layer 2: Ghost detected')
return
}
}
Commit: b19f156
Pitfall #60: Stale Array Snapshot in Monitoring Loop (🔴 CRITICAL - Fixed Nov 23, 2025)
Symptom: Manual closure sends duplicate "POSITION CLOSED" Telegram notifications
Root Cause: Position Manager creates array snapshot before async processing
Real Incident: Two identical notifications for cmibdii4k0004pe07nzfmturo
Fix Applied:
private async checkTradeConditions(trade: ActiveTrade, currentPrice: number): Promise<void> {
// CRITICAL FIX: Check if trade still in monitoring
if (!this.activeTrades.has(trade.id)) {
console.log(`⏭️ Skipping ${trade.symbol} - already removed from monitoring`)
return
}
// ... rest of function
}
Commit: a7c5930
Pitfall #61: P&L Compounding STILL Happening Despite All Guards (🔴 CRITICAL - Under Investigation Nov 24, 2025)
Symptom: Trade showed $974.05 P&L when actual was $72.41 (13.4× inflation)
Evidence: 14 duplicate Telegram notifications with compounding P&L
Status: All existing guards in place, yet duplicates still occurred
Interim Fix: Manual P&L correction, container restart with enhanced closingInProgress flag
Investigation Needed:
- Serialization lock around external closure detection
- Unique transaction ID to prevent duplicate DB updates
- Telegram notification deduplication
Commit: 0466295
Pitfall #62: Adaptive Leverage and Quality Bypass (🔴 CRITICAL - Fixed Nov 24-27, 2025)
Two related bugs:
Bug 1: Adaptive leverage not working (Nov 24)
- USE_ADAPTIVE_LEVERAGE env variable not set in .env
- Quality 90 trade used 15x instead of intended 10x
Bug 2: Execute endpoint bypassing quality threshold (Nov 27)
- Bot executed trades at quality 30, 50, 50 when minimum is 90/95
- Execute endpoint calculated quality but never validated it
Fix Applied (Nov 27):
if (qualityResult.score < minQualityScore) {
console.log(`❌ QUALITY TOO LOW: ${qualityResult.score} < ${minQualityScore} threshold`)
return NextResponse.json({
success: false,
error: 'Quality score too low',
}, { status: 400 })
}
console.log(`✅ Quality check passed: ${qualityResult.score} >= ${minQualityScore}`)
Commit: cefa3e6
Pitfall #63: Smart Entry Validation System (⚠️ HIGH - Deployed Nov 30, 2025)
Purpose: Recover profits from marginal quality signals (50-89)
Implementation: lib/trading/smart-validation-queue.ts (330+ lines)
Threshold Results (Dec 1, 2025):
- ±0.3%: 28/200 entries (14%), 67.9% WR, +4.73% total ✅
- ±0.2%: 51/200 entries (26%), 43.1% WR, -18.49% total
- ±0.15%: 73/200 entries (36%), 35.6% WR, -38.27% total
Commit: 7c9cfba
Pitfall #64: EPYC Cluster SSH Timeout (🔴 CRITICAL - Fixed Dec 1, 2025)
Symptom: Coordinator reports "SSH command timed out for v9_chunk_000002 on worker1"
Root Cause: 30-second subprocess timeout insufficient for nested SSH hop (master → worker1 → worker2)
Fix Applied:
ssh_opts = "-o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=5"
result = subprocess.run(ssh_cmd, timeout=60) # Increased from 30s to 60s
Commit: ef371a1
Lesson Learned: Nested SSH hops need 2× minimum timeout. Latency compounds at each hop.
Pitfall #65: Distributed Worker Quality Filter - Dict vs Callable (🔴 CRITICAL - Fixed Dec 1, 2025)
Symptom: ALL 2,096 distributed backtests returned 0 trades
Root Cause: Passed dict {'min_adx': 15, 'min_volume_ratio': vol_min} instead of lambda function
Error: 'dict' object is not callable
Fix Applied:
# BEFORE (BROKEN):
quality_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}
# AFTER (FIXED):
if vol_min > 0:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
quality_filter = None
Commit: 11a0ea3
Lesson Learned: Silent failures more dangerous than crashes. Exception handler hid severity by returning zeros.
Pitfall #66: Smart Entry Wrong Price Display (🔴 CRITICAL - Fixed Dec 1, 2025)
Symptom: Abandonment notifications showing impossible prices ($126 → $98 = -22% in 30 seconds)
Root Cause: Symbol format mismatch between validation queue ("SOLUSDT") and market data cache ("SOL-PERP")
Real Incident: Cache lookup marketDataCache.get("SOLUSDT") returned null
Fix Applied:
// Normalize symbol before validation queue
const normalizedSymbol = normalizeTradingViewSymbol(body.symbol)
const queued = await validationQueue.addSignal({
symbol: normalizedSymbol, // Use normalized format for cache lookup
// ...
})
Commit: 6cec2e8
Pitfall #67: Ghost Detection Race Condition (🔴 CRITICAL - Fixed Dec 2, 2025)
Symptom: 23 duplicate "POSITION CLOSED" notifications with P&L compounding (-$47.96 to -$1,129.24)
Root Cause: Race condition in ghost detection - check Map.has() happened AFTER function entry
Real Incident (Dec 2, 17:20 CET):
- Expected P&L: ~-$48
- Actual: 23 notifications with compounding P&L
Fix Applied: Use Map.delete() atomic return value as deduplication lock:
// FIXED CODE:
async handleExternalClosure(trade: ActiveTrade, reason: string) {
const tradeId = trade.id
// ✅ Delete IMMEDIATELY - atomic operation
if (!this.activeTrades.delete(tradeId)) {
console.log('DUPLICATE PREVENTED (atomic lock)')
return
}
// ONLY first caller reaches here
// ... rest of cleanup
}
Commit: 93dd950
Lesson Learned: When async handler can be called by multiple code paths simultaneously, use atomic operations (like Map.delete()) as locks at function entry.
Pitfall #68: Smart Entry Using Webhook Percentage as Signal Price (🔴 CRITICAL - Fixed Dec 3, 2025)
Symptom: $89 position sizes, 97% pullback calculations, impossible entry conditions
Root Cause: TradingView webhook signal.price contained percentage (70.80) instead of market price ($142.50)
Real Incident: Smart Entry log showed "97.4% pullback required" (impossible)
Fix Applied:
// Use Pyth current price instead of webhook signal price
const pythPrice = await pythClient.getPrice(symbol)
const signalPrice = pythPrice.price // ✅ Use actual market price
Commit: 7d0d38a
Lesson Learned: Never trust webhook data for calculations. Use authoritative price sources (Pyth, Drift).
Pitfall #69: Direction-Specific Leverage Thresholds Not Explicit (🟡 MEDIUM - Fixed Dec 3, 2025)
Symptom: Leverage code checked quality score without explicit direction context
Root Cause: Code pattern was ambiguous about which direction's threshold applied
Fix Applied: Made direction-specific thresholds explicit:
if (body.direction === 'LONG') {
if (qualityResult.score >= 90) leverage = 5
// ...
} else { // SHORT
if (qualityResult.score >= 90) leverage = 5 // Same as LONG but explicit
// ...
}
Commit: 58f812f
Pitfall #70: Smart Validation Queue Rejected by Execute Endpoint (🔴 CRITICAL - Fixed Dec 3, 2025)
Symptom: Quality 50-89 signals validated by queue get rejected with "Quality score too low"
Root Cause: Execute endpoint applies quality threshold check AFTER validation queue confirmed price action
Fix Applied:
const isValidatedEntry = body.validatedEntry === true
if (isValidatedEntry) {
console.log(`✅ VALIDATED ENTRY BYPASS: Quality ${qualityResult.score} accepted`)
}
// Only apply quality threshold if NOT a validated entry
if (!isValidatedEntry && qualityResult.score < minQualityScore) {
return NextResponse.json({ error: 'Quality too low' }, { status: 400 })
}
Commit: 785b09e
Pitfall #71: Revenge System Missing External Closure Integration (🔴 CRITICAL - Fixed Dec 3, 2025)
Symptom: High-quality signals (85+) stopped by external closures don't trigger revenge window
Root Cause: Revenge eligibility check only existed in executeExit() path, not handleExternalClosure()
Real Incident (Nov 20): Quality 90 SHORT at $141.37, stopped at $142.48 (-$138.35), price dropped to $131.32 (+$490 opportunity missed)
Fix Applied:
// In external closure handler:
if (exitReason === 'SL' && trade.signalQualityScore && trade.signalQualityScore >= 85) {
console.log(`🎯 External SL closure - Quality ${trade.signalQualityScore} >= 85`)
await stopHuntTracker.recordStopHunt({
originalTradeId: trade.id,
symbol: trade.symbol,
direction: trade.direction,
stopHuntPrice: currentPrice,
originalEntryPrice: trade.entryPrice,
originalQualityScore: trade.signalQualityScore,
stopLossAmount: Math.abs(totalRealizedPnL)
})
console.log(`✅ Revenge window activated for external closure (30min monitoring)`)
}
Commit: 785b09e
Pitfall #72: Telegram Webhook Conflicts with Polling Bot (🔴 CRITICAL - Fixed Dec 4, 2025)
Symptom: Python Telegram bot crashes with "Conflict: can't use getUpdates method while webhook is active"
Root Cause: n8n had active Telegram webhook that intercepted ALL messages before Python bot
Real Incident: /status command returned n8n test message with broken template syntax
Fix Applied:
# Delete Telegram webhook
curl -s "https://api.telegram.org/bot{TOKEN}/deleteWebhook"
# Restart Python bot
docker restart telegram-trade-bot
Architecture Decision: Cannot run both n8n webhook AND Python polling bot simultaneously. Choose one.
Appendix: Pattern Recognition
Common Root Causes
- Race Conditions: Multiple code paths detecting same event (P&L compounding bugs #48, #49, #59, #60, #67)
- Unit Mismatches: Tokens vs USD, dollars vs percentages (#24, #54)
- Symbol Format: TradingView ("SOLUSDT") vs Drift ("SOL-PERP") (#5, #66)
- Deployment Verification: Declaring "fixed" without checking container timestamp (#31)
- SDK Behavior: Documentation doesn't match reality (#2, #24, #45)
- Async Timing: Operations completing out of expected order (#13, #28, #60)
Prevention Strategies
- Use atomic operations for state changes (Map.delete() returns boolean)
- Always normalize symbols at integration boundaries
- Verify deployment with container timestamp vs commit time
- Never mutate shared state during calculation phases
- Add explicit checks in ALL code paths, not just happy path
- Test with real infrastructure before trusting provider claims
Cross-Reference Index
- See Also: .github/copilot-instructions.md - Main AI agent instructions with Top 10 Critical Pitfalls
- Related: docs/bugs/ - Additional bug documentation
- Related: docs/architecture/ - System design context
Last Updated: December 4, 2025
Maintainer: AI Agent team following "NOTHING gets lost" principle