trading_bot_v4/docs/COMMON_PITFALLS.md

# Common Pitfalls Reference Documentation

> **Last Updated:** December 4, 2025
> **Total Documented:** 72 Pitfalls
> **Primary Source:** `.github/copilot-instructions.md`

## Purpose

This document is the **comprehensive reference** for all documented pitfalls, bugs, and lessons learned from the Trading Bot v4 project. Each entry represents a real incident that caused financial loss, system instability, or operational issues.

**How to Use This Document:**
1. **Before making changes:** Search for related pitfalls to avoid repeating mistakes
2. **When debugging:** Look for symptoms matching your issue
3. **After fixing bugs:** Add new entries to preserve institutional knowledge
4. **Code review:** Verify changes don't reintroduce known issues

**Severity Levels:**
- 🔴 **CRITICAL** - Financial loss, data corruption, or system failure
- ⚠️ **HIGH** - System stability or significant operational impact
- 🟡 **MEDIUM** - Performance degradation or UX issues
- 🔵 **LOW** - Code quality or minor improvements

---

## Quick Reference Table

| # | Severity | Category | Date | Summary |
|---|----------|----------|------|---------|
| 1 | 🔴 CRITICAL | SDK/Memory | Nov 15, 2025 | Drift SDK memory leak - heap OOM after 10+ hours |
| 2 | 🔴 CRITICAL | RPC/Infrastructure | Nov 14, 2025 | Wrong RPC provider (Alchemy) breaks Drift SDK |
| 3 | 🟡 MEDIUM | Build/Docker | - | Prisma not generated in Docker |
| 4 | 🟡 MEDIUM | Configuration | - | Wrong DATABASE_URL for container vs host |
| 5 | 🟡 MEDIUM | Data/Symbols | - | Symbol format mismatch (TradingView → Drift) |
| 6 | ⚠️ HIGH | Orders | - | Missing reduce-only flag on exit orders |
| 7 | 🟡 MEDIUM | Architecture | - | Singleton violations (DriftClient, Position Manager) |
| 8 | 🟡 MEDIUM | Types/Prisma | - | Type errors with Prisma after generate |
| 9 | 🟡 MEDIUM | Code Quality | - | Quality score duplication in check-risk and execute |
| 10 | ⚠️ HIGH | Configuration | - | TP2-as-Runner configuration confusion |
| 11 | 🔴 CRITICAL | P&L Calculation | - | P&L calculation using SDK values incorrectly |
| 12 | 🔴 CRITICAL | Transactions | - | Transaction confirmation missing (phantom trades) |
| 13 | ⚠️ HIGH | Execution Order | - | Execution order matters (Position Manager before DB) |
| 14 | ⚠️ HIGH | Timing | - | New trade grace period (30s for Drift propagation) |
| 15 | 🟡 MEDIUM | SDK/Drift | - | Drift minimum position sizes differ from docs |
| 16 | 🔴 CRITICAL | Exit Logic | - | Exit reason detection bug (using current price) |
| 17 | 🟡 MEDIUM | Cooldown | - | Per-symbol cooldown, not global |
| 18 | ⚠️ HIGH | Quality Scoring | - | Timeframe-aware scoring crucial for 5min |
| 19 | 🔴 CRITICAL | Trading Logic | - | Price position chasing causes flip-flops |
| 20 | 🟡 MEDIUM | TradingView | - | TradingView ADX minimum for 5min charts |
| 21 | 🟡 MEDIUM | Types/Prisma | - | Prisma Decimal type handling in raw SQL |
| 22 | 🔴 CRITICAL | Trailing Stop | Nov 11, 2025 | ATR-based trailing stop implementation bug |
| 23 | 🟡 MEDIUM | Database Schema | - | CreateTradeParams interface sync required |
| 24 | 🔴 CRITICAL | SDK/Units | Nov 12, 2025 | Position.size returns tokens not USD |
| 25 | 🟡 MEDIUM | Display | Nov 12, 2025 | Leverage display showing global instead of symbol-specific |
| 26 | 🟡 MEDIUM | Tracking | Nov 12, 2025 | Indicator version tracking (v5→v6→v7→v8→v9) |
| 27 | 🔴 CRITICAL | Race Condition | Nov 15, 2025 | Runner stop loss gap - no protection between TP1 and TP2 |
| 28 | 🔴 CRITICAL | Race Condition | Nov 12, 2025 | External closure duplicate updates bug |
| 29 | 🔴 CRITICAL | Database | Nov 13, 2025 | Database-First Pattern required |
| 30 | ⚠️ HIGH | Network | Nov 13, 2025 | DNS retry logic needed |
| 31 | 🔴 CRITICAL | Deployment | Nov 13, 2025 | Declaring fixes "working" before deployment |
| 32 | 🔴 CRITICAL | Workflow | Nov 14, 2025 | Phantom trade notification workflow breaks |
| 33 | 🔴 CRITICAL | Data Integrity | Nov 15, 2025 | Wrong entry price after orphaned position restoration |
| 34 | 🔴 CRITICAL | Monitoring | Nov 15, 2025 | Runner stop loss gap (duplicate of #27) |
| 35 | 🔴 CRITICAL | Database | Nov 15, 2025 | Phantom trades need exitReason for cleanup |
| 36 | 🔴 CRITICAL | Rate Limits | Nov 15, 2025 | closePosition() missing retry logic causes rate limit storm |
| 37 | 🔴 CRITICAL | Ghost Positions | Nov 15, 2025 | Ghost position accumulation from failed DB updates |
| 38 | 🟡 MEDIUM | Display | Nov 15, 2025 | Analytics dashboard showing original position size |
| 39 | 🔴 CRITICAL | Permissions | Nov 15, 2025 | Settings UI permission error (.env not writable) |
| 40 | 🔴 CRITICAL | Ghost Positions | Nov 15-16, 2025 | Ghost position death spiral from skipped validation |
| 41 | 🔴 CRITICAL | P&L Calculation | Nov 19, 2025 | Stats API recalculating P&L incorrectly for TP1+runner |
| 42 | 🟡 MEDIUM | Notifications | Nov 16, 2025 | Missing Telegram notifications for position closures |
| 43 | 🔴 CRITICAL | Trailing Stop | Nov 20, 2025 | Runner trailing stop never activates after TP1 |
| 44 | ⚠️ HIGH | DNS | Nov 16, 2025 | Telegram bot DNS resolution failures |
| 45 | 🔴 CRITICAL | SDK/Drift | Nov 16, 2025 | Drift SDK position.entryPrice recalculates after partial closes |
| 46 | 🔴 CRITICAL | Leverage | Nov 16, 2025 | Drift account leverage must be set in UI, not API |
| 47 | 🔴 CRITICAL | Verification | Nov 16, 2025 | Position close verification gap - 6 hours unmonitored |
| 48 | 🔴 CRITICAL | P&L Compounding | Nov 16, 2025 | P&L compounding during close verification |
| 49 | 🔴 CRITICAL | P&L Compounding | Nov 17, 2025 | P&L exponential compounding in external closure detection |
| 50 | 🔴 CRITICAL | Database | Nov 19, 2025 | Database not tracking trades despite successful Drift executions |
| 51 | 🔴 CRITICAL | Detection | Nov 19, 2025 | TP1 detection fails when on-chain orders fill fast |
| 52 | 🔴 CRITICAL | Exit Logic | Nov 19, 2025 | ADX-based runner SL only applied in one code path |
| 53 | 🔴 CRITICAL | Container | Nov 19, 2025 | Container restart kills positions + phantom detection bug |
| 54 | 🔴 CRITICAL | Data Integrity | Nov 23, 2025 | MFE/MAE storing dollars instead of percentages |
| 55 | 🔴 CRITICAL | Configuration | Nov 19-20, 2025 | Settings UI quality score variable name mismatch / BlockedSignalTracker using wrong price source |
| 56 | 🔴 CRITICAL | Ghost Orders | Nov 20-21, 2025 | Ghost orders after external closures + false order count bug |
| 57 | 🔴 CRITICAL | P&L Calculation | Nov 20, 2025 | P&L calculation inaccuracy for external closures |
| 58 | ⚠️ HIGH | Database | Nov 21, 2025 | 5-Layer Database Protection System implemented |
| 59 | 🔴 CRITICAL | Duplicates | Nov 22, 2025 | Layer 2 ghost detection causing duplicate Telegram notifications |
| 60 | 🔴 CRITICAL | Race Condition | Nov 23, 2025 | Stale array snapshot in monitoring loop causes duplicate processing |
| 61 | 🔴 CRITICAL | P&L Compounding | Nov 24, 2025 | P&L compounding STILL happening despite all guards |
| 62 | 🔴 CRITICAL | Quality Check | Nov 24-27, 2025 | Adaptive leverage not working / Execute endpoint bypassing quality threshold |
| 63 | ⚠️ HIGH | Feature | Nov 30, 2025 | Smart Entry Validation System - Block & Watch deployed |
| 64 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | EPYC Cluster SSH Timeout - nested hop requires longer timeouts |
| 65 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | Distributed Worker Quality Filter - dict vs callable |
| 66 | 🔴 CRITICAL | Smart Entry | Dec 1, 2025 | Smart Entry Validation Queue wrong price display |
| 67 | 🔴 CRITICAL | Race Condition | Dec 2, 2025 | Ghost detection race condition causing duplicate notifications with P&L compounding |
| 68 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Entry using webhook percentage as signal price |
| 69 | 🟡 MEDIUM | Configuration | Dec 3, 2025 | Direction-specific leverage thresholds not explicit in code |
| 70 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Validation Queue rejected by execute endpoint |
| 71 | 🔴 CRITICAL | Revenge System | Dec 3, 2025 | Revenge system missing external closure integration |
| 72 | 🔴 CRITICAL | Telegram | Dec 4, 2025 | Telegram webhook conflicts with polling bot |

---

## Category Index

### 🔴 P&L Calculation Errors
- [#11](#pitfall-11-pl-calculation-critical) - P&L calculation using SDK values incorrectly
- [#41](#pitfall-41-stats-api-recalculating-pl-incorrectly-critical---fixed-nov-19-2025) - Stats API recalculating P&L incorrectly
- [#48](#pitfall-48-pl-compounding-during-close-verification-critical---fixed-nov-16-2025) - P&L compounding during close verification
- [#49](#pitfall-49-pl-exponential-compounding-in-external-closure-detection-critical---fixed-nov-17-2025) - P&L exponential compounding
- [#54](#pitfall-54-mfemae-storing-dollars-instead-of-percentages-critical---fixed-nov-23-2025) - MFE/MAE storing dollars instead of percentages
- [#57](#pitfall-57-pl-calculation-inaccuracy-for-external-closures-critical---fixed-nov-20-2025) - P&L calculation inaccuracy for external closures
- [#61](#pitfall-61-pl-compounding-still-happening-despite-all-guards-critical---under-investigation-nov-24-2025) - P&L compounding STILL happening

### 🔴 Race Conditions & Duplicates
- [#27](#pitfall-27-runner-stop-loss-gap---no-protection-between-tp1-and-tp2-critical---fixed-nov-15-2025) - Runner stop loss gap - no protection between TP1 and TP2
- [#28](#pitfall-28-external-closure-duplicate-updates-bug-critical---fixed-nov-12-2025) - External closure duplicate updates
- [#59](#pitfall-59-layer-2-ghost-detection-causing-duplicate-telegram-notifications-critical---fixed-nov-22-2025) - Layer 2 ghost detection duplicates
- [#60](#pitfall-60-stale-array-snapshot-in-monitoring-loop-critical---fixed-nov-23-2025) - Stale array snapshot duplicates
- [#67](#pitfall-67-ghost-detection-race-condition-critical---fixed-dec-2-2025) - Ghost detection race condition

### 🔴 SDK/API Integration
- [#1](#pitfall-1-drift-sdk-memory-leak-critical---fixed-nov-15-2025) - Drift SDK memory leak
- [#2](#pitfall-2-wrong-rpc-provider-critical---investigation-complete-nov-14-2025) - Wrong RPC provider (Alchemy)
- [#12](#pitfall-12-transaction-confirmation-critical) - Transaction confirmation missing
- [#24](#pitfall-24-positionsize-tokens-vs-usd-bug-critical---fixed-nov-12-2025) - Position.size tokens vs USD
- [#36](#pitfall-36-closeposition-missing-retry-logic-critical---fixed-nov-15-2025) - closePosition() missing retry logic
- [#45](#pitfall-45-drift-sdk-positionentryprice-recalculates-critical---fixed-nov-16-2025) - position.entryPrice recalculates after partial closes

### 🔴 Database Operations
- [#29](#pitfall-29-database-first-pattern-critical---fixed-nov-13-2025) - Database-First Pattern required
- [#35](#pitfall-35-phantom-trades-need-exitreason-critical---fixed-nov-15-2025) - Phantom trades need exitReason
- [#37](#pitfall-37-ghost-position-accumulation-critical---fixed-nov-15-2025) - Ghost position accumulation
- [#50](#pitfall-50-database-not-tracking-trades-resolved---nov-19-2025) - Database not tracking trades
- [#58](#pitfall-58-5-layer-database-protection-system-implemented---nov-21-2025) - 5-Layer Database Protection System

### 🔴 Configuration & Settings
- [#55](#pitfall-55-configuration-issues-critical---fixed-nov-19-20-2025) - Settings UI quality score variable name mismatch
- [#62](#pitfall-62-adaptive-leverage-and-quality-bypass-critical---fixed-nov-24-27-2025) - Adaptive leverage / Execute endpoint bypassing quality threshold

### 🔴 Deployment & Verification
- [#31](#pitfall-31-declaring-fixes-working-before-deployment-critical---nov-13-2025) - Declaring fixes "working" before deployment
- [#47](#pitfall-47-position-close-verification-gap-critical---fixed-nov-16-2025) - Position close verification gap - 6 hours unmonitored

### 🔴 Smart Entry & Validation
- [#63](#pitfall-63-smart-entry-validation-system-deployed---nov-30-2025) - Smart Entry Validation System
- [#66](#pitfall-66-smart-entry-wrong-price-display-critical---fixed-dec-1-2025) - Smart Entry wrong price display
- [#68](#pitfall-68-smart-entry-using-webhook-percentage-critical---fixed-dec-3-2025) - Smart Entry using webhook percentage
- [#70](#pitfall-70-smart-validation-queue-rejected-critical---fixed-dec-3-2025) - Smart Validation Queue rejected by execute

### ⚠️ Ghost Positions & Orders
- [#40](#pitfall-40-ghost-position-death-spiral-critical---fixed-nov-15-16-2025) - Ghost position death spiral
- [#56](#pitfall-56-ghost-orders-after-external-closures-critical---fixed-nov-20-21-2025) - Ghost orders after external closures

### ⚠️ Network & Infrastructure
- [#30](#pitfall-30-dns-retry-logic-high---nov-13-2025) - DNS retry logic
- [#44](#pitfall-44-telegram-bot-dns-resolution-high---fixed-nov-16-2025) - Telegram bot DNS resolution
- [#64](#pitfall-64-epyc-cluster-ssh-timeout-critical---fixed-dec-1-2025) - EPYC Cluster SSH timeout
- [#65](#pitfall-65-distributed-worker-quality-filter-critical---fixed-dec-1-2025) - Distributed Worker dict vs callable

### ⚠️ Trailing Stop & Exit Logic
- [#22](#pitfall-22-atr-based-trailing-stop-implementation-critical---nov-11-2025) - ATR-based trailing stop implementation
- [#43](#pitfall-43-runner-trailing-stop-never-activates-critical---fixed-nov-20-2025) - Runner trailing stop never activates
- [#51](#pitfall-51-tp1-detection-fails-critical---fixed-nov-19-2025) - TP1 detection fails on-chain
- [#52](#pitfall-52-adx-based-runner-sl-critical---fixed-nov-19-2025) - ADX-based runner SL one code path

---

## Detailed Pitfall Entries


### Pitfall #1: Drift SDK Memory Leak (🔴 CRITICAL - Fixed Nov 15, 2025, Enhanced Nov 24, 2025)

**Symptom:** JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s)

**Root Cause:** Drift SDK accumulates WebSocket subscriptions over time without cleanup

**Real Incident:**
- Thousands of `accountUnsubscribe error: readyState was 2 (CLOSING)` in logs
- Heap growth: Normal ~200MB → 4GB+ after 10 hours → OOM crash

**Impact:** System crashes after extended uptime, requires manual container restart

**Fix Applied:**
- **File:** `lib/monitoring/drift-health-monitor.ts`
- **Implementation:** Smart error-based health monitoring replaces blind timer
  - `interceptWebSocketErrors()` patches console.error to catch SDK WebSocket errors
  - 30-second sliding window: Only restarts if 50+ errors in 30 seconds
  - Container restart via flag: writes `/tmp/trading-bot-restart.flag`, which `watch-restart.sh` picks up
- **API:** `GET /api/drift/health` - Check error count and health status
- **Commit:** Enhanced Nov 24, 2025

**Code Reference:**
```typescript
// lib/monitoring/drift-health-monitor.ts
interceptWebSocketErrors()  // Patches console.error
if (errorsInWindow > 50) {
  writeRestartFlag()  // Triggers container restart
}
```
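The detection rule in the list above (restart only when errors cluster inside a 30-second window) can be sketched as a small sliding-window counter. This is an illustrative sketch, not the contents of `drift-health-monitor.ts`: the class name `SlidingWindowErrorCounter` is hypothetical, timestamps are passed in explicitly so the logic is testable, and writing the restart flag is left to the caller. It uses `>=` to match the "50+" wording; the snippet above uses `> 50`, so the real boundary may differ by one.

```typescript
// Hypothetical sketch: counts errors in a sliding time window and reports
// when the count reaches a threshold (defaults mirror the 30s / 50-error rule).
class SlidingWindowErrorCounter {
  private readonly windowMs: number
  private readonly threshold: number
  private timestamps: number[] = []

  constructor(windowMs = 30_000, threshold = 50) {
    this.windowMs = windowMs
    this.threshold = threshold
  }

  // Record one error at time `now` (ms) and drop entries that left the window.
  record(now: number): void {
    this.timestamps.push(now)
    this.prune(now)
  }

  // Number of errors still inside the window ending at `now`.
  count(now: number): number {
    this.prune(now)
    return this.timestamps.length
  }

  // True once the window holds at least `threshold` errors.
  shouldRestart(now: number): boolean {
    return this.count(now) >= this.threshold
  }

  private prune(now: number): void {
    const cutoff = now - this.windowMs
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift()
    }
  }
}
```

In a monitor like the one described, the patched `console.error` would call `record(Date.now())` for each matching WebSocket error and write the restart flag when `shouldRestart()` turns true. A burst of old errors ages out on its own, which is what keeps this from behaving like a blind restart timer.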

**Prevention:** Monitor for `🏥 Drift health monitor started` and error threshold logs

**Lesson Learned:** Smart, reactive monitoring is better than blind timers. Only restart when actual problems occur, not on a schedule.

---

### Pitfall #2: Wrong RPC Provider (🔴 CRITICAL - Investigation Complete Nov 14, 2025)

**Symptom:** Trades fail, duplicate closes, Position Manager loses tracking, database save failures

**Root Cause:** Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization

**Real Incident (Nov 14, 21:14 CET):**
- Created diagnostic endpoint `/api/testing/drift-init`
- Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time
- Helius: 0 subscription errors EVERY init, 800ms avg init time

**Impact:** Complete system failure when using wrong RPC provider

**Why Alchemy Fails:**
- Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern)
- Alchemy's CUPS (compute units per second) limit throttles these burst requests
- Drift SDK does NOT retry failed subscriptions
- SDK reports "initialized successfully" but with incomplete subscription set
- Error: `"Received JSON-RPC error calling accountSubscribe"`

**Fix Applied:**
- **Use Helius RPC** (https://mainnet.helius-rpc.com/?api-key=...)
- Retry logic: exponential backoff starting at 5s for rate-limit errors
- **Documentation:** `docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md`

**Code Reference:**
```bash
# Test yourself
curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'
```

**Prevention:** ALWAYS use Helius RPC. Do not use Alchemy for Drift SDK.

**Lesson Learned:** Documentation doesn't always reflect reality. Test with real infrastructure before trusting provider claims.

---

### Pitfall #3: Prisma Not Generated in Docker (🟡 MEDIUM)

**Symptom:** Build fails with Prisma client errors

**Root Cause:** Must run `npx prisma generate` in Dockerfile BEFORE `npm run build`

**Fix Applied:** Add `RUN npx prisma generate` before build step in Dockerfile

---

### Pitfall #4: Wrong DATABASE_URL (🟡 MEDIUM)

**Symptom:** Database connection failures

**Root Cause:** Inside the container, the app must reach Postgres through its container name `trading-bot-postgres`; the Prisma CLI running on the host must use `localhost:5432`

**Fix Applied:** Use correct hostname based on context:
- Container: `postgresql://postgres:password@trading-bot-postgres:5432/trading_bot_v4`
- Host CLI: `postgresql://postgres:password@localhost:5432/trading_bot_v4`

---

### Pitfall #5: Symbol Format Mismatch (🟡 MEDIUM)

**Symptom:** Drift API rejects orders, symbol not found errors

**Root Cause:** TradingView sends "SOLUSDT" but Drift requires "SOL-PERP"

**Fix Applied:** Always normalize with `normalizeTradingViewSymbol()` before calling Drift
- **File:** `config/trading.ts`
- Applies to ALL endpoints including `/api/trading/close`
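The real `normalizeTradingViewSymbol()` in `config/trading.ts` is not reproduced here. A minimal sketch of the same idea, assuming TradingView tickers end in a quote currency (`USDT`, `USDC`, `USD`) or `PERP`, and that Drift markets follow the `BASE-PERP` pattern. The function name `toDriftSymbol`, the suffix list, and the handling of `EXCHANGE:` prefixes and `.P` suffixes are all assumptions.

```typescript
// Hypothetical sketch of TradingView → Drift symbol normalization.
// Order matters: "USDT"/"USDC" must be tried before "USD".
const QUOTE_SUFFIXES = ["USDT", "USDC", "USD", "PERP"]

function toDriftSymbol(tvSymbol: string): string {
  // Drop an exchange prefix ("BINANCE:") and a ".P" perpetual suffix if present.
  let s = tvSymbol.trim().toUpperCase()
  const colon = s.indexOf(":")
  if (colon >= 0) s = s.slice(colon + 1)
  if (s.endsWith(".P")) s = s.slice(0, -2)

  // Already in Drift format: return unchanged so the function is idempotent.
  if (s.endsWith("-PERP")) return s

  // Strip the first matching quote currency to get the base asset.
  for (const quote of QUOTE_SUFFIXES) {
    if (s.endsWith(quote) && s.length > quote.length) {
      return `${s.slice(0, -quote.length)}-PERP`
    }
  }
  throw new Error(`Unrecognized TradingView symbol: ${tvSymbol}`)
}
```

Keeping the function idempotent means it is safe to call at every endpoint, including `/api/trading/close`, even when the caller already passes a Drift symbol.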

---

### Pitfall #6: Missing Reduce-Only Flag (⚠️ HIGH)

**Symptom:** Exit orders accidentally open new positions instead of closing

**Root Cause:** Exit orders without `reduceOnly: true` can open new positions

**Fix Applied:** All TP/SL orders MUST include `reduceOnly: true`

```typescript
const orderParams = {
  reduceOnly: true,  // CRITICAL for TP/SL orders
  // ... other params
}
```

---

### Pitfall #7: Singleton Violations (🟡 MEDIUM)

**Symptom:** Connection issues, state inconsistencies, multiple WebSocket connections

**Root Cause:** Creating multiple DriftClient or Position Manager instances

**Fix Applied:** Always use getter functions:
```typescript
const driftService = await initializeDriftService() // NOT: new DriftService()
const positionManager = getPositionManager()         // NOT: new PositionManager()
const prisma = getPrismaClient()                     // NOT: new PrismaClient()
```
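These getters follow the lazy-singleton pattern: module-level state plus a function that creates the instance on first call and returns the same object afterward. A generic sketch, assuming async initialization like `initializeDriftService()`; `createSingleton` and `FakeService` are hypothetical names. It caches the promise rather than the resolved value, so two callers that arrive during initialization share one client instead of racing to create two.

```typescript
// Hypothetical helper: wraps an async factory so it runs at most once.
function createSingleton<T>(factory: () => Promise<T>): () => Promise<T> {
  let instance: Promise<T> | null = null
  return () => {
    // Cache the promise, not the value, so concurrent callers share one init.
    if (instance === null) {
      instance = factory().catch((err) => {
        instance = null // allow a later call to retry a failed initialization
        throw err
      })
    }
    return instance
  }
}

// Stand-in service that counts how many times it was constructed.
let constructed = 0
class FakeService {
  constructor() {
    constructed++
  }
}

const getService = createSingleton(async () => new FakeService())
```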

---

### Pitfall #8: Prisma Type Errors (🟡 MEDIUM)

**Symptom:** TypeScript compilation fails with Prisma types

**Root Cause:** The `Trade` type from Prisma only exists AFTER `npx prisma generate` has run

**Fix Applied:** Run `npx prisma generate` after any schema changes

---

### Pitfall #9: Quality Score Duplication (🟡 MEDIUM)

**Symptom:** Inconsistent quality scoring between endpoints

**Root Cause:** Signal quality calculation exists in BOTH `check-risk` and `execute` endpoints

**Fix Applied:** Keep logic synchronized between both endpoints when making changes

---

### Pitfall #10: TP2-as-Runner Configuration (⚠️ HIGH)

**Symptom:** Confusion about runner size and TP2 behavior

**Root Cause:** `takeProfit2SizePercent: 0` means "TP2 activates trailing stop, no position close"

**Fix Applied:**
- `TAKE_PROFIT_2_PERCENT=0.7` sets TP2 trigger price
- `TAKE_PROFIT_2_SIZE_PERCENT` should be 0 for runner system
- Runner = 100% - TAKE_PROFIT_1_SIZE_PERCENT (default 40%)
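The runner sizing is plain arithmetic. As a worked sketch (helper names hypothetical), using the split from the Pitfall #27 incident, where TP1 closed 70% and left a 30% runner:

```typescript
// Hypothetical helpers: with TAKE_PROFIT_2_SIZE_PERCENT = 0, TP2 closes nothing
// and only activates the trailing stop, so the runner is whatever TP1 leaves.
function runnerSizePercent(tp1SizePercent: number): number {
  if (tp1SizePercent < 0 || tp1SizePercent > 100) {
    throw new RangeError(`TP1 size must be 0-100%, got ${tp1SizePercent}`)
  }
  return 100 - tp1SizePercent
}

// Runner size in USD for a given position (e.g. for notifications or analytics).
function runnerSizeUSD(positionSizeUSD: number, tp1SizePercent: number): number {
  return (positionSizeUSD * runnerSizePercent(tp1SizePercent)) / 100
}
```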

---

### Pitfall #11: P&L Calculation Critical (🔴 CRITICAL)

**Symptom:** Incorrect P&L values in database and analytics

**Root Cause:** Using the SDK's reported P&L instead of calculating it from the actual entry and exit prices

**Fix Applied:**
```typescript
const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction)
const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100
trade.realizedPnL += actualRealizedPnL  // NOT: result.realizedPnL from SDK
```
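`calculateProfitPercent()` is a Position Manager method whose body is not shown in this document. For a linear perp the direction-aware formula is standard; a standalone sketch, assuming `direction` is `'long' | 'short'` and prices are in USD (the `realizedPnL` helper name is hypothetical):

```typescript
type Direction = "long" | "short"

// Percent price move in the trade's favor: positive = profit, negative = loss.
function calculateProfitPercent(entryPrice: number, exitPrice: number, direction: Direction): number {
  if (entryPrice <= 0) throw new RangeError("entryPrice must be positive")
  const movePercent = ((exitPrice - entryPrice) / entryPrice) * 100
  return direction === "long" ? movePercent : -movePercent
}

// Dollar P&L for the notional size actually closed.
function realizedPnL(closedSizeUSD: number, entryPrice: number, exitPrice: number, direction: Direction): number {
  return (closedSizeUSD * calculateProfitPercent(entryPrice, exitPrice, direction)) / 100
}
```

Because the result depends only on the stored entry price, the fill price, and the closed size, it stays correct for partial closes, where SDK-reported values can reflect a recalculated entry.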

---

### Pitfall #12: Transaction Confirmation Critical (🔴 CRITICAL)

**Symptom:** "Phantom trades" - SDK returns signatures for transactions that never execute

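**Root Cause:** A returned transaction signature does not prove the transaction executed. Both `openPosition()` AND `closePosition()` must call `connection.confirmTransaction()` and check the result before treating the order as filled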

**Fix Applied:**
```typescript
const txSig = await driftClient.placePerpOrder(orderParams)
console.log('⏳ Confirming transaction on-chain...')
const connection = driftService.getConnection()
const confirmation = await connection.confirmTransaction(txSig, 'confirmed')

if (confirmation.value.err) {
  throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`)
}
console.log('✅ Transaction confirmed on-chain')
```

---

### Pitfall #13: Execution Order Matters (⚠️ HIGH)

**Symptom:** Race conditions where monitoring starts before trade exists in database

**Root Cause:** Trades were added to Position Manager before they were saved to the database

**Fix Applied:** Order MUST be:
1. Open position + place exit orders
2. Save to database (`createTrade()`)
3. Add to Position Manager (`positionManager.addTrade()`)

---

### Pitfall #14: New Trade Grace Period (⚠️ HIGH)

**Symptom:** New positions immediately detected as "closed externally" and cancelled

**Root Cause:** Drift positions take 5-10 seconds to propagate after opening

**Fix Applied:** Position Manager skips "external closure" detection for trades <30 seconds old
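A minimal sketch of the guard, assuming each tracked trade records an `entryTime` in milliseconds; the field name, interface, and helper are hypothetical:

```typescript
// Drift positions can take 5-10s to appear; 30s leaves a safety margin.
const NEW_TRADE_GRACE_PERIOD_MS = 30_000

interface TrackedTrade {
  id: string
  entryTime: number // ms since epoch when the position was opened
}

// External-closure detection should only run once the grace period has passed.
function canCheckExternalClosure(trade: TrackedTrade, now: number = Date.now()): boolean {
  return now - trade.entryTime >= NEW_TRADE_GRACE_PERIOD_MS
}
```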

---

### Pitfall #15: Drift Minimum Position Sizes (🟡 MEDIUM)

**Symptom:** Orders rejected for being too small

**Root Cause:** Actual minimums differ from documentation:
- SOL-PERP: 0.1 SOL (~$5-15)
- ETH-PERP: 0.01 ETH (~$38-40)
- BTC-PERP: 0.0001 BTC (~$10-12)

**Fix Applied:** Verify that `minOrderSize × currentPrice` exceeds Drift's $4 minimum, with a buffer on top.
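The check can be sketched as below. The per-market minimums come from the list above and the $4 floor from the fix; the 1.5× buffer, the table name, and the function names are assumptions.

```typescript
// Minimum base-asset order sizes observed on Drift (from the list above).
const MIN_ORDER_SIZE: Record<string, number> = {
  "SOL-PERP": 0.1,
  "ETH-PERP": 0.01,
  "BTC-PERP": 0.0001,
}

const DRIFT_MIN_NOTIONAL_USD = 4
const SAFETY_BUFFER = 1.5 // assumed margin, not a Drift constant

// Smallest USD position that clears both the base-size and notional minimums.
function minPositionUSD(symbol: string, currentPrice: number): number {
  const minBase = MIN_ORDER_SIZE[symbol]
  if (minBase === undefined) throw new Error(`No minimum known for ${symbol}`)
  const minFromBase = minBase * currentPrice
  return Math.max(minFromBase, DRIFT_MIN_NOTIONAL_USD) * SAFETY_BUFFER
}

function isOrderLargeEnough(symbol: string, sizeUSD: number, currentPrice: number): boolean {
  return sizeUSD >= minPositionUSD(symbol, currentPrice)
}
```

Because the base-asset minimum is converted at the current price, the USD floor moves with the market, which is why a fixed dollar threshold alone is not enough.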

---

### Pitfall #16: Exit Reason Detection Bug (🔴 CRITICAL)

**Symptom:** Profitable trades mislabeled as "SL" exits

**Root Cause:** Position Manager used the current price to determine the exit reason, but on-chain orders had filled at a different price

**Fix Applied:** Use `trade.tp1Hit` / `trade.tp2Hit` flags and realized P&L to correctly identify exit trigger

---

### Pitfall #17: Per-Symbol Cooldown (🟡 MEDIUM)

**Symptom:** ETH trade incorrectly blocking SOL trade

**Root Cause:** Cooldown was global, not per-symbol

**Fix Applied:** Each coin (SOL/ETH/BTC) has independent cooldown timer via `getLastTradeTimeForSymbol(symbol)`
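In the bot, `getLastTradeTimeForSymbol(symbol)` reads from the database. An in-memory sketch of the same per-symbol logic, where the `SymbolCooldown` class, its `Map`, and the cooldown length are illustrative assumptions:

```typescript
// Hypothetical in-memory version: one timestamp per symbol, never a shared one.
class SymbolCooldown {
  private readonly cooldownMs: number
  private readonly lastTradeTime = new Map<string, number>()

  constructor(cooldownMs: number) {
    this.cooldownMs = cooldownMs
  }

  recordTrade(symbol: string, now: number): void {
    this.lastTradeTime.set(symbol, now)
  }

  // A symbol with no previous trade is never in cooldown.
  isInCooldown(symbol: string, now: number): boolean {
    const last = this.lastTradeTime.get(symbol)
    return last !== undefined && now - last < this.cooldownMs
  }
}
```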

---

### Pitfall #18: Timeframe-Aware Scoring Crucial (⚠️ HIGH)

**Symptom:** Valid 5min breakouts blocked as "low quality"

**Root Cause:** Signal quality thresholds not adjusted for 5min vs higher timeframes
- 5min: ADX 12-22 healthy, ATR 0.2-0.7%
- Daily: ADX 18-30 healthy, ATR 0.4%+

**Fix Applied:** Always pass `timeframe` parameter from TradingView alerts to `scoreSignalQuality()`
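To show why the timeframe has to reach the scorer, here is a sketch that picks the healthy ADX band from the ranges above. The band table, the treatment of each range as a closed interval, the timeframe strings (`"5"` for 5min, `"D"` for daily, assumed to match what the alert sends), and the daily fallback are all assumptions; the real `scoreSignalQuality()` weighs many more inputs.

```typescript
interface AdxBand {
  min: number
  max: number
}

// Healthy ADX ranges from the list above; other timeframes are not covered.
const HEALTHY_ADX: Record<string, AdxBand> = {
  "5": { min: 12, max: 22 }, // 5-minute chart
  D: { min: 18, max: 30 },   // daily chart
}

// Without a timeframe, this falls back to the stricter daily band, which is
// exactly the behavior that blocks valid 5min breakouts.
function isAdxHealthy(adx: number, timeframe?: string): boolean {
  const band = (timeframe && HEALTHY_ADX[timeframe]) || HEALTHY_ADX.D
  return adx >= band.min && adx <= band.max
}
```

An ADX of 15 is healthy on a 5min chart but fails the daily band, so the same signal flips from "valid" to "low quality" purely because the timeframe was dropped.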

---

### Pitfall #19: Price Position Chasing (🔴 CRITICAL)

**Symptom:** Rapid flip-flop losses

**Root Cause:** Opening longs near the top of the range (price position 90%+) or shorts near the bottom (<10%)

**Real Incident:** Overnight flip-flop losses all had price position 9-94%

**Fix Applied:** Quality scoring now deducts 15 to 30 points for entries at range extremes
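Price position is where the current price sits within a recent high/low range, as a percentage. A sketch of the penalty, assuming the thresholds above (longs at 90%+, shorts below 10%). The linear scaling from -15 at the threshold to -30 at the range edge is an illustrative assumption, not the bot's actual scoring table, and both function names are hypothetical.

```typescript
type Direction = "long" | "short"

// 0% = at the range low, 100% = at the range high (clamped to that span).
function pricePosition(price: number, rangeLow: number, rangeHigh: number): number {
  if (rangeHigh <= rangeLow) throw new RangeError("rangeHigh must exceed rangeLow")
  const pct = ((price - rangeLow) / (rangeHigh - rangeLow)) * 100
  return Math.min(100, Math.max(0, pct))
}

// Negative points for chasing: buying near highs or selling near lows.
function chasePenalty(direction: Direction, positionPct: number): number {
  let overshoot: number // 0..1: how deep into the extreme zone the entry is
  if (direction === "long" && positionPct >= 90) {
    overshoot = (positionPct - 90) / 10
  } else if (direction === "short" && positionPct < 10) {
    overshoot = (10 - positionPct) / 10
  } else {
    return 0
  }
  return -(15 + 15 * Math.min(1, overshoot))
}
```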

---

### Pitfall #20: TradingView ADX Minimum (🟡 MEDIUM)

**Symptom:** Too many signals blocked or too many low-quality signals passing

**Root Cause:** TradingView ADX filter should be 15 for 5min (not 20+)

**Fix Applied:** Set ADX ≥15 in TradingView alerts for 5min charts. Bot's quality scoring provides second-layer filtering.

---

### Pitfall #21: Prisma Decimal Type Handling (🟡 MEDIUM)

**Symptom:** Frontend errors with `.toFixed()` on undefined

**Root Cause:** Raw SQL queries return Prisma `Decimal` objects, not plain numbers

**Fix Applied:**
```typescript
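// $queryRaw resolves to an ARRAY of rows; numeric/DECIMAL columns arrive as Prisma Decimal objects
// Use `any` type for numeric fields in $queryRaw results
const [stat] = await prisma.$queryRaw<{ total_pnl: any }[]>`...`

// Convert with Number() before returning to frontend
totalPnL: Number(stat?.total_pnl) || 0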
```

---

### Pitfall #22: ATR-Based Trailing Stop Implementation (🔴 CRITICAL - Nov 11, 2025)

**Symptom:** Trades with +7-9% MFE exited for losses

**Root Cause:** Runner system was using FIXED 0.3% trailing instead of ATR-based

**Real Incident:** At $168 SOL, 0.3% = $0.50 wiggle room - too tight

**Fix Applied:**
```typescript
// ATR at entry as a percentage of current price, scaled by the ATR multiplier
const trailingDistancePercent = (atrAtEntry / currentPrice * 100) * trailingStopAtrMultiplier
```

**Configuration:**
- `TRAILING_STOP_ATR_MULTIPLIER=1.5`
- `MIN=0.25%`, `MAX=0.9%`
- `ACTIVATION=0.5%`

**Result:** 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50, about 2.25x more room)

**Documentation:** `ATR_TRAILING_STOP_FIX.md`
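Putting the formula and configuration together: a sketch that converts ATR at entry into a trailing distance, clamps it to the MIN/MAX bounds, and derives the stop price from the best price seen so far. Treating MIN/MAX as clamps on the computed percentage, the `TrailingConfig` shape, and the function names are assumptions; `ACTIVATION` is not modeled here.

```typescript
type Direction = "long" | "short"

interface TrailingConfig {
  atrMultiplier: number // TRAILING_STOP_ATR_MULTIPLIER, e.g. 1.5
  minPercent: number    // MIN trailing distance, e.g. 0.25
  maxPercent: number    // MAX trailing distance, e.g. 0.9
}

// ATR-based trailing distance as a percent of current price, clamped to bounds.
function trailingDistancePercent(atrAtEntry: number, currentPrice: number, cfg: TrailingConfig): number {
  const atrPercent = (atrAtEntry / currentPrice) * 100
  const raw = atrPercent * cfg.atrMultiplier
  return Math.min(cfg.maxPercent, Math.max(cfg.minPercent, raw))
}

// The stop trails below the peak for longs and above the trough for shorts.
function trailingStopPrice(extremePrice: number, distancePercent: number, direction: Direction): number {
  const offset = extremePrice * (distancePercent / 100)
  return direction === "long" ? extremePrice - offset : extremePrice + offset
}
```

With the incident's numbers (ATR of $0.756, i.e. 0.45% at $168), this gives a 0.675% trail and a long stop about $1.13 below the peak, matching the result above.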

---

### Pitfall #23: CreateTradeParams Interface Sync (🟡 MEDIUM)

**Symptom:** TypeScript build fails when endpoint passes field not in interface

**Root Cause:** New database fields added to Trade model but not to `CreateTradeParams` interface

**Fix Applied:** When adding new fields:
1. Add to interface in `lib/database/trades.ts`
2. Add to Prisma create data object in `createTrade()` function

---

### Pitfall #24: Position.size Tokens vs USD Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Position Manager detects false TP1 hits, moves SL to breakeven prematurely

**Root Cause:** `lib/drift/client.ts` returns `position.size` as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950)

**Real Incident:** Comparing tokens (12.28) directly to USD ($1,950) → "99.4% reduction" → FALSE TP1!

**Fix Applied:**
```typescript
// In Position Manager (lines 322, 519, 558, 591)
const positionSizeUSD = Math.abs(position.size) * currentPrice

// Now compare USD to USD
if (positionSizeUSD < trade.currentSize * 0.95) {
  // Actual 5%+ reduction detected
}
```

**Impact:** Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails
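The unit mix-up is easy to reproduce with the incident's numbers (12.28 SOL tracked as a $1,950 position). A sketch of the reduction check; the function names are hypothetical and the 5% tolerance matches the fix above:

```typescript
// position.size from the Drift client is in base-asset tokens (e.g. SOL).
function positionSizeUSD(sizeTokens: number, currentPrice: number): number {
  return Math.abs(sizeTokens) * currentPrice
}

// Correct: treat the position as reduced only if it shrank by 5%+ in USD terms.
function hasReduced(sizeTokens: number, currentPrice: number, trackedSizeUSD: number): boolean {
  return positionSizeUSD(sizeTokens, currentPrice) < trackedSizeUSD * 0.95
}

// Buggy: compares tokens to USD, so any position looks ~99% smaller.
function buggyHasReduced(sizeTokens: number, trackedSizeUSD: number): boolean {
  return Math.abs(sizeTokens) < trackedSizeUSD * 0.95
}
```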

---

### Pitfall #25: Leverage Display Bug (🟡 MEDIUM - Fixed Nov 12, 2025)

**Symptom:** Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x

**Root Cause:** API response returning `config.leverage` (global default) instead of symbol-specific value

**Fix Applied:**
```typescript
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)
// Return symbol-specific leverage
leverage: leverage,  // NOT: config.leverage
```

---

### Pitfall #26: Indicator Version Tracking (🟡 MEDIUM - Nov 12, 2025+)

**Symptom:** Unable to compare performance between TradingView strategies

**Root Cause:** No tracking of which indicator generated the signal

**Fix Applied:** Database field `indicatorVersion` tracks:
- v5: Buy/Sell Signal (pre-Nov 12)
- v6: HalfTrend + BarColor (Nov 12-18)
- v7: v6 with toggles (deprecated)
- v8: Money Line Sticky Trend (Nov 18+)
- v9: Money Line with Momentum Filter (Nov 26+)

---

### Pitfall #27: Runner Stop Loss Gap - No Protection Between TP1 and TP2 (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Runner position remained open despite price moving far past stop loss level

**Root Cause:** Position Manager only checked stop loss BEFORE TP1 (line 877), creating a protection gap

**Real Incident:**
1. SHORT opened, TP1 hit at 70% close (runner = 30% remaining)
2. Runner had stop loss at profit-lock level (+0.5%)
3. Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped)
4. Runner exposed to unlimited loss for hours during TP1→TP2 window

**Fix Applied:**
```typescript
// Added explicit runner stop loss check at line ~881:
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
  return
}
```

**Lesson Learned:** Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"
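`shouldStopLoss()` itself is not shown in this document. A direction-aware sketch of what such a check does, assuming the trade carries `direction` and `stopLossPrice` (the `RunnerTrade` interface is hypothetical). It deliberately ignores `tp1Hit`/`tp2Hit`: the SL price changes per phase, but the check must run in every phase.

```typescript
type Direction = "long" | "short"

interface RunnerTrade {
  direction: Direction
  stopLossPrice: number // moves to breakeven/profit-lock after TP1
  tp1Hit: boolean
  tp2Hit: boolean
}

// Long stops trigger at or below the SL price, shorts at or above it.
function shouldStopLoss(currentPrice: number, trade: RunnerTrade): boolean {
  return trade.direction === "long"
    ? currentPrice <= trade.stopLossPrice
    : currentPrice >= trade.stopLossPrice
}
```

For the incident's shape, a SHORT runner with a profit-lock stop below entry triggers as soon as price climbs back to that stop, regardless of `tp1Hit`.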

---

### Pitfall #28: External Closure Duplicate Updates Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss)

**Root Cause:** The Position Manager monitoring loop re-processes an external closure multiple times before the trade is removed from the `activeTrades` Map

**Real Incident:**
1. Trade closed externally at -$7.98
2. Position Manager detects closure, calculates P&L → -$7.50 in DB
3. Trade still in Map (removal async), loop runs again
4. Accumulates P&L: -$7.50 + -$7.50 = -$15.00
5. Repeats 8 times → final -$58.43

**Fix Applied:**
```typescript
// BEFORE (BROKEN):
await updateTradeExit({ ... })
await this.removeTrade(trade.id)  // Too late!

// AFTER (FIXED):
this.activeTrades.delete(trade.id)  // Remove FIRST
await updateTradeExit({ ... })      // Then update DB
```

**Commit:** Fixed Nov 12, 2025
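`Map.prototype.delete()` returns `true` only for the call that actually removed the key, which makes it usable as a claim token: whichever loop iteration deletes the trade first owns the exit, and any later iteration holding a stale reference skips it. A sketch of that guard, where the class, method, and field names are hypothetical and the database write is stubbed:

```typescript
interface ActiveTrade {
  id: string
  symbol: string
}

class ExitProcessor {
  readonly activeTrades = new Map<string, ActiveTrade>()
  readonly exitsWritten: Array<{ id: string; realizedPnL: number }> = []

  // Returns true only for the call that actually processed the exit.
  async handleExternalClosure(trade: ActiveTrade, realizedPnL: number): Promise<boolean> {
    // Claim synchronously: delete() returns false if another iteration
    // (or a stale snapshot) already removed this trade.
    if (!this.activeTrades.delete(trade.id)) return false
    await this.updateTradeExit(trade.id, realizedPnL)
    return true
  }

  // Stand-in for the real database write.
  private async updateTradeExit(id: string, realizedPnL: number): Promise<void> {
    this.exitsWritten.push({ id, realizedPnL })
  }
}
```

The trade-off of claiming first: if the database write then fails, the trade is no longer monitored, so that failure path needs its own handling.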

---

### Pitfall #29: Database-First Pattern (🔴 CRITICAL - Fixed Nov 13, 2025)

**Symptom:** Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection

**Root Cause:** Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch

**Real Incident:** Unprotected position opened, database save failed silently, Position Manager never tracked it

**Fix Applied:**
```typescript
// CRITICAL: Save to database FIRST before adding to Position Manager
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: `CLOSE POSITION MANUALLY IMMEDIATELY. Transaction: ${openResult.transactionSignature}`,
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)
```

**Documentation:** `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md`

---

### Pitfall #30: DNS Retry Logic (⚠️ HIGH - Nov 13, 2025)

**Symptom:** Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails

**Root Cause:** `EAI_AGAIN` errors are transient DNS issues that resolve in seconds

**Fix Applied:** Automatic retry in `lib/drift/client.ts`:
```typescript
// Detects: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
// Retries up to 3 times with 2s delay
await this.retryOperation(async () => {
  // Initialize Drift SDK, subscribe, get user account
}, 3, 2000, 'Drift initialization')
```
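`retryOperation()` is a method on the Drift client and its body is not shown here. A standalone sketch of the behavior the comments describe: retry only the listed transient network errors, with a fixed 2s delay. Treating the `3` as the total number of attempts, and matching errors by `message`, `code`, or `cause.code`, are assumptions.

```typescript
const TRANSIENT_PATTERNS = ["fetch failed", "EAI_AGAIN", "ENOTFOUND", "ETIMEDOUT"]

// True for errors that usually clear up on their own (DNS hiccups, timeouts).
function isTransientNetworkError(err: unknown): boolean {
  const e = err as { message?: unknown; code?: unknown; cause?: unknown }
  const text = [e?.message, e?.code, (e?.cause as { code?: unknown } | undefined)?.code]
    .filter((v): v is string => typeof v === "string")
    .join(" ")
  return TRANSIENT_PATTERNS.some((p) => text.includes(p))
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms))

// Retry `operation` on transient errors with a fixed delay; rethrow anything else.
async function retryOperation<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 2000,
  label = "operation",
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (err) {
      if (!isTransientNetworkError(err) || attempt >= maxAttempts) throw err
      console.warn(`${label} failed (attempt ${attempt}/${maxAttempts}), retrying in ${delayMs}ms`)
      await sleep(delayMs)
    }
  }
}
```

Non-transient errors are rethrown on the first attempt, so a genuine misconfiguration still fails fast instead of being hidden behind retries.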

**Documentation:** `docs/DNS_RETRY_LOGIC.md`

---

### Pitfall #31: Declaring Fixes "Working" Before Deployment (🔴 CRITICAL - Nov 13, 2025)

**Symptom:** AI says "position is protected" when container still running old code

**Root Cause:** Conflating "code committed to git" with "code running in production"

**Real Incident:** Fix committed 15:56, declared "working" at 19:42, but container started 15:06 (old code)

**Verification Required:**
```bash
# ALWAYS check before declaring fix deployed:
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Compare container start time to git commit timestamp
# If container older: FIX NOT DEPLOYED
```

**Rule:** NEVER say "fixed", "working", "protected", or "deployed" without verifying container restart timestamp

---

### Pitfall #32: Phantom Trade Notification Workflow Breaks (🔴 CRITICAL - Nov 14, 2025)

**Symptom:** Phantom trade detected, position opened, but n8n workflow stops. User NOT notified.

**Root Cause:** Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt

**Fix Applied:** Auto-close phantom trades immediately + return HTTP 200 with warning:
```typescript
return NextResponse.json({
  success: true,
  warning: 'Phantom trade detected and auto-closed',
  isPhantom: true,
  message: '[Full notification text]',
  phantomDetails: {...}
})
```

**Database tracking:** `status='phantom'`, `exitReason='manual'`

---

### Pitfall #33: Wrong Entry Price After Orphaned Position Restoration (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking wrong entry price after container restart

**Root Cause:** Startup validation restored orphaned position using OLD database entry price instead of querying Drift

**Real Incident:** DB showed $141.51, Drift showed $141.31 actual entry → 0.14% SL placement error

**Fix Applied:** Query Drift SDK for actual entry price during orphaned position restoration:
```typescript
await prisma.trade.update({
  where: { id: trade.id }, // Prisma update() requires a unique `where` selector
  data: {
    entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price
    positionSizeUSD: positionSizeUSD,
  }
})
```

---

### Pitfall #35: Phantom Trades Need exitReason (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager keeps restoring phantom trade on every restart

**Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL`

**Real Incident:** Phantom trade caused 232% size mismatch, hundreds of false alerts

**Fix Applied:** MUST set exitReason when auto-closing phantoms:
```typescript
await updateTradeExit({
  tradeId: trade.id,
  exitPrice: currentPrice,
  exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
  status: 'phantom'
})
```

---

### Pitfall #36: closePosition() Missing Retry Logic (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tries to close, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts

**Root Cause:** `placeExitOrders()` had retry wrapper but `closePosition()` did NOT

**Real Incident:** 100+ "❌ Failed to close position: 429" + compounding P&L

**Fix Applied:** Wrapped closePosition() with retryWithBackoff():
```typescript
const txSig = await retryWithBackoff(async () => {
  return await driftClient.placePerpOrder(orderParams)
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
```
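
For reference, a minimal sketch of what a `retryWithBackoff(fn, maxRetries, baseDelayMs, label)` helper could look like, with the argument order inferred from the calls in this document (`3, 8000` above and `3, 2000, 'Drift initialization'` in the DNS retry example). This is not the project's actual helper. It assumes every error is retried, whereas the real helper may only retry rate-limit (429) or network errors.

```typescript
// Hypothetical sketch of retryWithBackoff(fn, maxRetries, baseDelayMs, label).
// Assumption: every error is retried; the project's real helper may only retry
// rate-limit (429) or network errors.
function sleep(ms: number): Promise<void> {
  return new Promise(resolve => setTimeout(resolve, ms))
}

async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 2000,
  label = 'operation',
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      if (attempt >= maxRetries) {
        throw error // retry budget exhausted: surface the last error
      }
      // Exponential backoff: base, 2x base, 4x base (8s -> 16s -> 32s for 8000)
      const delayMs = baseDelayMs * 2 ** attempt
      console.warn(`⏳ ${label} failed (attempt ${attempt + 1}), retrying in ${delayMs}ms`)
      await sleep(delayMs)
    }
  }
}
```

Each call makes at most `maxRetries + 1` attempts. With the 8000 ms base used above, the attempts are spaced 8s, 16s and 32s apart instead of every 2 seconds.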

---

### Pitfall #37: Ghost Position Accumulation (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking 4+ positions when database shows only 1 open trade

**Root Cause:** Database has `exitReason IS NULL` for positions actually closed on Drift

**Real Incident:** 4+ ghosts → massive rate limiting, "vanishing orders"

**Fix Applied:** Periodic Drift position validation:
```typescript
private scheduleValidation(): void {
  this.validationInterval = setInterval(async () => {
    await this.validatePositions()
  }, 5 * 60 * 1000)
}
```
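
A sketch of the reconciliation step that `validatePositions()` could perform, assuming each tracked trade knows its Drift market index. `TrackedTrade`, `OnChainPosition` and `findGhostTrades` are hypothetical names. The 0.01 dust threshold mirrors the size check used in Pitfall #47.

```typescript
// Hypothetical shapes: the real ActiveTrade type and Drift position objects
// carry many more fields than this sketch needs.
interface TrackedTrade {
  id: string
  symbol: string
  marketIndex: number
}

interface OnChainPosition {
  marketIndex: number
  size: number // signed base asset amount (negative = short)
}

// Return the IDs of tracked trades that have no meaningful open position on-chain.
function findGhostTrades(
  tracked: TrackedTrade[],
  onChain: OnChainPosition[],
  minSize = 0.01,
): string[] {
  // Markets with a real (non-dust) open position
  const openMarkets = new Set(
    onChain.filter(p => Math.abs(p.size) >= minSize).map(p => p.marketIndex),
  )
  // Anything tracked in memory but not open on Drift is a ghost
  return tracked.filter(t => !openMarkets.has(t.marketIndex)).map(t => t.id)
}
```

A dust-sized leftover is treated the same as a missing position, so a trade whose on-chain size is below the threshold is also reported as a ghost.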

---

### Pitfall #38: Analytics Dashboard Wrong Size (🟡 MEDIUM - Fixed Nov 15, 2025)

**Symptom:** Analytics page displays $42.54 when actual runner is $12.59 after TP1

**Root Cause:** API returns `trade.positionSizeUSD` (original) not runner size

**Fix Applied:** Check Position Manager state for open positions:
```typescript
const currentSize = configSnapshot?.positionManagerState?.currentSize
const displaySize = trade.exitReason === null && currentSize
  ? currentSize
  : trade.positionSizeUSD
```

---

### Pitfall #40: Ghost Position Death Spiral (🔴 CRITICAL - Fixed Nov 15-16, 2025)

**Symptom:** Container crashes from cascading ghost detection failures

**Root Cause:** Position validation skipped during death spiral recovery, creating more ghosts

**Fix Applied:** Never skip validation during recovery operations

---

### Pitfall #41: Stats API Recalculating P&L Incorrectly (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** Analytics showing wrong P&L for trades with TP1+runner

**Root Cause:** Stats API recalculating P&L from partial position data

**Fix Applied:** Use stored `realizedPnL` directly, don't recalculate

---

### Pitfall #43: Runner Trailing Stop Never Activates (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Runner position sits without trailing stop after TP1

**Root Cause:** Trailing stop activation logic only ran in one code path

**Fix Applied:** Ensure trailing stop activates in all TP1 detection paths

---

### Pitfall #44: Telegram Bot DNS Resolution (⚠️ HIGH - Fixed Nov 16, 2025)

**Symptom:** Telegram notifications fail intermittently

**Root Cause:** DNS resolution failures for api.telegram.org

**Fix Applied:** Retry logic for Telegram API calls

---

### Pitfall #45: Drift SDK position.entryPrice Recalculates (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Entry price changes after partial closes

**Root Cause:** Drift SDK calculates `position.entryPrice` from `quoteAssetAmount / baseAssetAmount`

**Impact:** After TP1 closes 75%, remaining 25% has "new" entry price

**Fix Applied:** Store and use original entry price from trade record, not SDK

---

### Pitfall #46: 100% Position Sizing InsufficientCollateral (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Bot gets InsufficientCollateral errors when Drift UI can open same size

**Root Cause:** Drift's margin calculation includes fees and slippage buffers, so requesting exactly 100% of free collateral exceeds what the margin check allows

**Real Incident:** $85.55 collateral, bot tries 100% → rejected, shortage: $0.03

**Fix Applied:**
```typescript
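// configuredSize: configured position size as a percentage of free collateral
let percentDecimal = configuredSize / 100
if (configuredSize >= 100) {
  percentDecimal = 0.99 // leave 1% headroom for Drift's fee/slippage margin buffers
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}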
```

**Commit:** 7129cbf

---

### Pitfall #47: Position Close Verification Gap (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Close transaction confirmed, database marked "closed", but position stayed open 6+ hours

**Root Cause:** A confirmed close transaction does not mean Drift's internal position state has updated yet; propagation takes roughly 5-10s

**Real Incident:** Trailing stop triggered 02:51, position stayed open until 08:51 restart

**Fix Applied:** 2-layer verification:
```typescript
if (params.percentToClose === 100) {
  await cancelAllOrders(params.symbol)

  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    return { ...result, needsVerification: true }
  }
}
```

**Commit:** c607a66

---

### Pitfall #48: P&L Compounding During Close Verification (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** P&L accumulates during the 5-10s verification wait

**Root Cause:** Monitoring loop continues during verification, detecting "external closure" multiple times

**Fix Applied:** `closingInProgress` flag:
```typescript
if ((result as any).needsVerification) {
  trade.closingInProgress = true
  trade.closeConfirmedAt = Date.now()
  console.log(`🔒 Marked as closing in progress - external closure detection disabled`)
  return
}

// Skip external closure check if closingInProgress
if ((position === null || position.size === 0) && !trade.closingInProgress) {
  // ... handle external closure
}
```

**Related:** Pitfalls #27, #49

---

### Pitfall #49: P&L Exponential Compounding in External Closure Detection (🔴 CRITICAL - Fixed Nov 17, 2025)

**Symptom:** Database P&L shows 15-20× actual value ($92.46 when Drift shows $6.00)

**Root Cause:** `trade.realizedPnL` was being mutated during each external closure detection cycle

**Real Incident (Nov 17, 13:54 CET):**
- SOL-PERP SHORT closed by on-chain orders
- Actual P&L: ~$6.00, Database recorded: $92.46 (15.4× too high)
- Rate limiting caused 15+ detection cycles → $6 → $12 → $24 → $48 → $96

**Fix Applied:**
```typescript
// DON'T mutate trade.realizedPnL - causes compounding!
// trade.realizedPnL = totalRealizedPnL  ← REMOVED

// Use local variable for DB update
await updateTradeExit({
  realizedPnL: totalRealizedPnL,  // Use local variable
})
```

**Commit:** 6156c0f

**Lesson Learned:** In monitoring loops, NEVER mutate shared state during calculation phases. Calculate locally, update shared state ONCE at the end.
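
The difference between the two patterns can be shown with a minimal, self-contained model. `TradeState`, `detectClosureMutating` and `detectClosureLocal` are hypothetical names. In this model every detection cycle adds the same close P&L, so the broken version grows with each cycle while the fixed version stays stable.

```typescript
// Hypothetical minimal trade shape for illustrating the bug.
interface TradeState {
  realizedPnL: number // P&L already realized before this closure (e.g. from TP1)
}

// BROKEN pattern: the running total is written back into shared state, so every
// repeated detection cycle re-adds on top of the previously inflated value.
function detectClosureMutating(trade: TradeState, closePnL: number): number {
  const totalRealizedPnL = trade.realizedPnL + closePnL
  trade.realizedPnL = totalRealizedPnL // shared state mutated on every cycle
  return totalRealizedPnL
}

// FIXED pattern: compute into a local variable; shared state is never touched,
// so repeated cycles always produce the same answer.
function detectClosureLocal(trade: Readonly<TradeState>, closePnL: number): number {
  const totalRealizedPnL = trade.realizedPnL + closePnL
  return totalRealizedPnL
}
```

After 15 cycles with a $6 close, the mutating version reports $90 while the local version still reports $6.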

---

### Pitfall #50: Database Not Tracking Trades (🔴 CRITICAL - RESOLVED Nov 19, 2025)

**Symptom:** Drift UI shows 6 trades, database shows only 3 trades

**Root Cause:** P&L compounding bug (#49) - in-memory object with stale/accumulated values

**Fix Applied:** Calculate P&L from immutable source values (entry/exit prices), never from in-memory fields

---

### Pitfall #51: TP1 Detection Fails When On-Chain Orders Fill Fast (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 order fills, but database records exitReason as "SL" instead of "TP1"

**Root Cause:** Position Manager detects closure AFTER both TP1 and runner already closed on-chain

**Real Incident:** LONG opened, TP1+runner closed within 7 minutes, `trade.tp1Hit = false`

**Fix Applied:** Simple percentage-based exit reason:
```typescript
if (runnerProfitPercent > 0.3) {
  if (runnerProfitPercent >= 1.2) {
    exitReason = 'TP2'  // Large profit (>= 1.2%)
  } else {
    exitReason = 'TP1'  // Moderate profit (> 0.3% and < 1.2%)
  }
} else {
  exitReason = 'SL'  // Negative or tiny profit (<= 0.3%)
}
```

**Commit:** de57c96

---

### Pitfall #52: ADX-Based Runner SL Only Applied in One Code Path (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 fills via on-chain order, runner gets breakeven SL instead of ADX-based positioning

**Root Cause:** Two TP1 detection paths, only one had ADX logic

**Fix Applied:** Added ADX-based runner SL to on-chain fill detection path (lines 607-642)

**Commits:** b2cb6a3, 66b2922

---

### Pitfall #53: Container Restart Kills Positions + Phantom Detection Bug (🔴 CRITICAL - Fixed Nov 19, 2025)

**Two bugs from container restart:**

**Bug 1: Startup order restore failure**
- Wrong database field names (`takeProfit1OrderTx` vs correct `tp1OrderTx`)
- Fix: Use correct field names

**Bug 2: Phantom detection killing runners**
- Runners (40% remaining) flagged as phantom
- Fix: Check `!trade.tp1Hit` before phantom detection:
```typescript
const wasPhantom = !trade.tp1Hit && trade.currentSize > 0 && (trade.currentSize / trade.positionSize) < 0.5
```

**Commit:** eccecf7

---

### Pitfall #54: MFE/MAE Storing Dollars Instead of Percentages (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Database showing maxFavorableExcursion = 64.08% when TradingView showed 0.48%

**Root Cause:** Position Manager storing DOLLAR amounts instead of PERCENTAGES

**Real Incident:** 133× inflation (64.08% stored vs 0.48% actual)

**Fix Applied:**
```typescript
// BEFORE (BROKEN):
if (currentPnLDollars > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = currentPnLDollars  // Storing $64.08
}

// AFTER (FIXED):
if (profitPercent > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = profitPercent      // Storing 0.48%
}

**Commit:** 6255662

**Lesson Learned:** Always verify data storage units match schema expectations. Comments don't override schema.
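
The percentage the schema expects can be derived from prices alone, so position size never enters the calculation. This sketch assumes LONG profits when price rises and SHORT profits when it falls, and that MAE is stored as the most negative percentage seen. `profitPercent`, `Excursions` and `updateExcursions` are hypothetical names, not the Position Manager's actual code.

```typescript
type Direction = 'LONG' | 'SHORT'

// Favourable price move in percent (0.48 means +0.48%), independent of position size.
function profitPercent(direction: Direction, entryPrice: number, currentPrice: number): number {
  const change = ((currentPrice - entryPrice) / entryPrice) * 100
  return direction === 'LONG' ? change : -change
}

// Assumption: MFE is the best and MAE the worst (most negative) profitPercent seen.
interface Excursions {
  maxFavorableExcursion: number
  maxAdverseExcursion: number
}

function updateExcursions(ex: Excursions, pct: number): void {
  if (pct > ex.maxFavorableExcursion) ex.maxFavorableExcursion = pct
  if (pct < ex.maxAdverseExcursion) ex.maxAdverseExcursion = pct
}
```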

---

### Pitfall #55: Configuration Issues (🔴 CRITICAL - Fixed Nov 19-20, 2025)

**Two configuration bugs:**

**Bug 1: Settings UI quality score variable name mismatch**
- Settings API used `MIN_QUALITY_SCORE` (wrong)
- Code actually reads `MIN_SIGNAL_QUALITY_SCORE` (correct)
- User changes in UI had ZERO effect

**Bug 2: BlockedSignalTracker using Pyth cache instead of Drift oracle**
- `priceAfter1Min/5Min/15Min/30Min` fields staying NULL
- Fix: Use `driftService.getOraclePrice()` instead of `getPythPriceMonitor().getCachedPrice()`

**Commit:** 6b00303

---

### Pitfall #56: Ghost Orders After External Closures (🔴 CRITICAL - Fixed Nov 20-21, 2025)

**Symptom:** Position closed, but TP/SL orders remain active on Drift

**Root Cause:** External closure handler didn't call `cancelAllOrders()` before completing

**Real Incident:** Risk of ghost order filling → unintended positions

**Fix Applied:**
```typescript
// In external closure handler:
console.log(`🗑️ Cancelling remaining orders for ${trade.symbol}...`)
const cancelResult = await cancelAllOrders(trade.symbol)
```

**Additional Bug:** False positive "32 open orders" on restart
- Fix: Skip empty order slots, i.e. those where `baseAssetAmount.eq(new BN(0))`, so only truly active orders are counted

**Commits:** a3a6222 (Nov 20), 29fce01 (Nov 21)

---

### Pitfall #57: P&L Calculation Inaccuracy for External Closures (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Database P&L shows -$101.68 when Drift UI shows -$138.35 (36% error)

**Root Cause:** External closure handler calculates P&L from monitoring loop's `currentPrice`, which lags behind actual fill price

**Fix Applied:** Query Drift's actual settledPnL:
```typescript
const position = userAccount.perpPositions.find((p: any) =>
  p.marketIndex === marketConfig.driftMarketIndex
)
const settledPnL = Number(position?.settledPnl ?? 0) / 1e6  // Convert to USD; guard against a missing position entry
if (Math.abs(settledPnL) > 0.01) {
  totalRealizedPnL = settledPnL
  console.log(`✅ Using Drift's actual P&L: $${totalRealizedPnL.toFixed(2)}`)
}
```

**Commit:** 8e600c8

---

### Pitfall #58: 5-Layer Database Protection System (⚠️ HIGH - Implemented Nov 21, 2025)

**Purpose:** Defense in depth against positions that open on Drift but are never recorded because a database write failed

**5 Layers:**
1. **Persistent File Logger** (`lib/utils/persistent-logger.ts`) - Survives container restarts
2. **Database Save with Retry + Verification** - 3 retries with exponential backoff
3. **Orphan Position Detection** - Runs on EVERY container startup
4. **Critical Logging in Execute Endpoint** - Full trade details for recovery
5. **Infrastructure (Docker volumes)** - `./logs:/app/logs`
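
Layer 2 can be sketched as write-then-read-back: a save only counts once the record can be loaded again. The `save`/`load` callbacks and `saveWithVerification` are hypothetical stand-ins for the project's real Prisma calls. The retry count and delays are assumptions chosen to match the "3 retries with exponential backoff" description.

```typescript
// Hypothetical sketch: write, read the record back, and only report success
// once it is actually present. Retries with exponential backoff otherwise.
async function saveWithVerification<T extends { id: string }>(
  record: T,
  save: (r: T) => Promise<void>,
  load: (id: string) => Promise<T | null>,
  maxRetries = 3,
  baseDelayMs = 1000,
): Promise<void> {
  let lastError: unknown = null
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      await save(record)
      const stored = await load(record.id)
      if (stored !== null) return // verified: the row really exists
      lastError = new Error(`Record ${record.id} missing after save`)
    } catch (error) {
      lastError = error
    }
    if (attempt < maxRetries) {
      await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
    }
  }
  // Caller is expected to fall back to Layer 1 (persistent file log) here
  throw lastError
}
```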

**Real-world validation:** None yet. As of Nov 21, 2025 no database failure had occurred, so the layers are in place but have not been exercised by a real failure

---

### Pitfall #59: Layer 2 Ghost Detection Causing Duplicate Telegram Notifications (🔴 CRITICAL - Fixed Nov 22, 2025)

**Symptom:** Trade #8 sent 13 duplicate notifications with compounding P&L ($11.50 → $155.05)

**Root Cause:** Layer 2 ghost detection (failureCount > 20) didn't check `closingInProgress` flag

**Real Incident (Nov 22, 04:05 CET):**
- Actual P&L: +$18.79, Database final: $155.05 (8.2× actual)
- Rate limit storm: 6,581 failed close attempts

**Fix Applied:**
```typescript
// AFTER (FIXED):
if (trade.priceCheckCount > 20 && !trade.closingInProgress) {
  if (!position || Math.abs(position.size) < 0.01) {
    trade.closingInProgress = true
    trade.closeConfirmedAt = Date.now()
    await this.handleExternalClosure(trade, 'Layer 2: Ghost detected')
    return
  }
}
```

**Commit:** b19f156

---

### Pitfall #60: Stale Array Snapshot in Monitoring Loop (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Manual closure sends duplicate "POSITION CLOSED" Telegram notifications

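**Root Cause:** Position Manager iterates over an array snapshot of active trades taken before async processing, so a trade that is closed and removed mid-loop is still checked (and closed again) from the stale snapshot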

**Real Incident:** Two identical notifications for cmibdii4k0004pe07nzfmturo

**Fix Applied:**
```typescript
private async checkTradeConditions(trade: ActiveTrade, currentPrice: number): Promise<void> {
  // CRITICAL FIX: Check if trade still in monitoring
  if (!this.activeTrades.has(trade.id)) {
    console.log(`⏭️ Skipping ${trade.symbol} - already removed from monitoring`)
    return
  }
  // ... rest of function
}
```

**Commit:** a7c5930

---

### Pitfall #61: P&L Compounding STILL Happening Despite All Guards (🔴 CRITICAL - Under Investigation Nov 24, 2025)

**Symptom:** Trade showed $974.05 P&L when actual was $72.41 (13.4× inflation)

**Evidence:** 14 duplicate Telegram notifications with compounding P&L

**Status:** All existing guards in place, yet duplicates still occurred

**Interim Fix:** Manual P&L correction, container restart with enhanced closingInProgress flag

**Investigation Needed:**
- Serialization lock around external closure detection
- Unique transaction ID to prevent duplicate DB updates
- Telegram notification deduplication

**Commit:** 0466295

---

### Pitfall #62: Adaptive Leverage and Quality Bypass (🔴 CRITICAL - Fixed Nov 24-27, 2025)

**Two related bugs:**

**Bug 1: Adaptive leverage not working (Nov 24)**
- `USE_ADAPTIVE_LEVERAGE` ENV variable not set in .env
- Quality 90 trade used 15x instead of intended 10x

**Bug 2: Execute endpoint bypassing quality threshold (Nov 27)**
- Bot executed trades at quality 30, 50, 50 when minimum is 90/95
- Execute endpoint calculated quality but never validated it

**Fix Applied (Nov 27):**
```typescript
if (qualityResult.score < minQualityScore) {
  console.log(`❌ QUALITY TOO LOW: ${qualityResult.score} < ${minQualityScore} threshold`)
  return NextResponse.json({
    success: false,
    error: 'Quality score too low',
  }, { status: 400 })
}
console.log(`✅ Quality check passed: ${qualityResult.score} >= ${minQualityScore}`)
```

**Commit:** cefa3e6

---

### Pitfall #63: Smart Entry Validation System (⚠️ HIGH - Deployed Nov 30, 2025)

**Purpose:** Recover profits from marginal quality signals (50-89)

**Implementation:** `lib/trading/smart-validation-queue.ts` (330+ lines)

**Threshold Results (Dec 1, 2025):**
- **±0.3%:** 28/200 entries (14%), 67.9% WR, +4.73% total ✅
- ±0.2%: 51/200 entries (26%), 43.1% WR, -18.49% total
- ±0.15%: 73/200 entries (36%), 35.6% WR, -38.27% total

**Commit:** 7c9cfba

---

### Pitfall #64: EPYC Cluster SSH Timeout (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Coordinator reports "SSH command timed out for v9_chunk_000002 on worker1"

**Root Cause:** 30-second subprocess timeout insufficient for nested SSH hop (master → worker1 → worker2)

**Fix Applied:**
```python
ssh_opts = "-o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=5"
result = subprocess.run(ssh_cmd, timeout=60)  # Increased from 30s to 60s
```

**Commit:** ef371a1

**Lesson Learned:** Nested SSH hops need 2× minimum timeout. Latency compounds at each hop.

---

### Pitfall #65: Distributed Worker Quality Filter - Dict vs Callable (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** ALL 2,096 distributed backtests returned 0 trades

**Root Cause:** Passed dict `{'min_adx': 15, 'min_volume_ratio': vol_min}` instead of lambda function

**Error:** `'dict' object is not callable`

**Fix Applied:**
```python
# BEFORE (BROKEN):
quality_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}

# AFTER (FIXED):
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

**Commit:** 11a0ea3

**Lesson Learned:** Silent failures are more dangerous than crashes. The exception handler hid the severity of the bug by returning zero trades instead of raising.

---

### Pitfall #66: Smart Entry Wrong Price Display (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Abandonment notifications showing impossible prices ($126 → $98 = -22% in 30 seconds)

**Root Cause:** Symbol format mismatch between validation queue ("SOLUSDT") and market data cache ("SOL-PERP")

**Real Incident:** Cache lookup `marketDataCache.get("SOLUSDT")` returned null

**Fix Applied:**
```typescript
// Normalize symbol before validation queue
const normalizedSymbol = normalizeTradingViewSymbol(body.symbol)

const queued = await validationQueue.addSignal({
  symbol: normalizedSymbol, // Use normalized format for cache lookup
  // ...
})
```

**Commit:** 6cec2e8
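
A minimal sketch of what normalization at this boundary might do, assuming TradingView sends tickers such as `SOLUSDT` (optionally with an exchange prefix like `BINANCE:`) and the cache is keyed by Drift's `SOL-PERP` form. The project's real `normalizeTradingViewSymbol()` may handle more cases than this.

```typescript
// Hypothetical sketch: map TradingView tickers to Drift perp market symbols.
function normalizeTradingViewSymbol(symbol: string): string {
  const upper = symbol.trim().toUpperCase()
  // Drop an optional exchange prefix, e.g. "BINANCE:SOLUSDT" -> "SOLUSDT"
  const ticker = upper.slice(upper.indexOf(':') + 1)
  if (ticker.endsWith('-PERP')) return ticker // already in Drift format
  // Strip the quote currency, e.g. "SOLUSDT" -> "SOL"
  const base = ticker.replace(/(USDT|USDC|USD)$/, '')
  return `${base}-PERP`
}
```

Normalizing once at the webhook boundary means every downstream lookup, such as `marketDataCache.get(...)`, sees the same key format.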

---

### Pitfall #67: Ghost Detection Race Condition (🔴 CRITICAL - Fixed Dec 2, 2025)

**Symptom:** 23 duplicate "POSITION CLOSED" notifications with P&L compounding (-$47.96 to -$1,129.24)

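**Root Cause:** Race condition in ghost detection: the `Map.has()` check and the removal of the trade from monitoring were separated by async work, so several concurrent callers passed the check before any of them removed the trade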

**Real Incident (Dec 2, 17:20 CET):**
- Expected P&L: ~-$48
- Actual: 23 notifications with compounding P&L

**Fix Applied:** Use Map.delete() atomic return value as deduplication lock:
```typescript
// FIXED CODE:
async handleExternalClosure(trade: ActiveTrade, reason: string) {
  const tradeId = trade.id

  // ✅ Delete IMMEDIATELY - atomic operation
  if (!this.activeTrades.delete(tradeId)) {
    console.log('DUPLICATE PREVENTED (atomic lock)')
    return
  }

  // ONLY first caller reaches here
  // ... rest of cleanup
}
```

**Commit:** 93dd950

**Lesson Learned:** When async handler can be called by multiple code paths simultaneously, use atomic operations (like Map.delete()) as locks at function entry.
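
The race and the fix can be reproduced in isolation. Node.js runs JavaScript on a single thread, so the code before the first `await` in an async function finishes before any other caller resumes. A check-and-remove is therefore safe as long as no `await` separates the check from the removal. `ClosureHandler` is a hypothetical stand-in for the Position Manager, with the slow DB/Telegram work simulated by a timer.

```typescript
// Hypothetical stand-in for the Position Manager's closure handling.
class ClosureHandler {
  private activeTrades = new Map<string, { id: string }>()
  notificationsSent = 0

  track(id: string): void {
    this.activeTrades.set(id, { id })
  }

  // FIXED: delete() before any await; only the first caller sees `true`.
  async handleExternalClosure(tradeId: string): Promise<boolean> {
    if (!this.activeTrades.delete(tradeId)) {
      return false // duplicate caller: another path already claimed this trade
    }
    await new Promise(resolve => setTimeout(resolve, 10)) // simulated DB + Telegram work
    this.notificationsSent++
    return true
  }

  // BROKEN: has() and delete() are separated by an await, so every concurrent
  // caller passes the check before anyone removes the trade.
  async handleExternalClosureBroken(tradeId: string): Promise<boolean> {
    if (!this.activeTrades.has(tradeId)) return false
    await new Promise(resolve => setTimeout(resolve, 10))
    this.activeTrades.delete(tradeId)
    this.notificationsSent++
    return true
  }
}
```

With 23 concurrent callers, the broken variant sends 23 notifications and the fixed variant sends exactly one.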

---

### Pitfall #68: Smart Entry Using Webhook Percentage as Signal Price (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** $89 position sizes, 97% pullback calculations, impossible entry conditions

**Root Cause:** TradingView webhook `signal.price` contained percentage (70.80) instead of market price ($142.50)

**Real Incident:** Smart Entry log showed "97.4% pullback required" (impossible)

**Fix Applied:**
```typescript
// Use Pyth current price instead of webhook signal price
const pythPrice = await pythClient.getPrice(symbol)
const signalPrice = pythPrice.price // ✅ Use actual market price
```

**Commit:** 7d0d38a

**Lesson Learned:** Never trust webhook data for calculations. Use authoritative price sources (Pyth, Drift).

---

### Pitfall #69: Direction-Specific Leverage Thresholds Not Explicit (🟡 MEDIUM - Fixed Dec 3, 2025)

**Symptom:** Leverage code checked quality score without explicit direction context

**Root Cause:** Code pattern was ambiguous about which direction's threshold applied

**Fix Applied:** Made direction-specific thresholds explicit:
```typescript
if (body.direction === 'LONG') {
  if (qualityResult.score >= 90) leverage = 5
  // ...
} else { // SHORT
  if (qualityResult.score >= 90) leverage = 5 // Same as LONG but explicit
  // ...
}
```

**Commit:** 58f812f

---

### Pitfall #70: Smart Validation Queue Rejected by Execute Endpoint (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** Quality 50-89 signals validated by queue get rejected with "Quality score too low"

**Root Cause:** Execute endpoint applies quality threshold check AFTER validation queue confirmed price action

**Fix Applied:**
```typescript
const isValidatedEntry = body.validatedEntry === true

if (isValidatedEntry) {
  console.log(`✅ VALIDATED ENTRY BYPASS: Quality ${qualityResult.score} accepted`)
}

// Only apply quality threshold if NOT a validated entry
if (!isValidatedEntry && qualityResult.score < minQualityScore) {
  return NextResponse.json({ error: 'Quality too low' }, { status: 400 })
}
```

**Commit:** 785b09e

---

### Pitfall #71: Revenge System Missing External Closure Integration (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** High-quality signals (85+) stopped by external closures don't trigger revenge window

**Root Cause:** Revenge eligibility check only existed in executeExit() path, not handleExternalClosure()

**Real Incident (Nov 20):** Quality 90 SHORT at $141.37, stopped at $142.48 (-$138.35), price dropped to $131.32 (+$490 opportunity missed)

**Fix Applied:**
```typescript
// In external closure handler:
if (exitReason === 'SL' && trade.signalQualityScore && trade.signalQualityScore >= 85) {
  console.log(`🎯 External SL closure - Quality ${trade.signalQualityScore} >= 85`)
  await stopHuntTracker.recordStopHunt({
    originalTradeId: trade.id,
    symbol: trade.symbol,
    direction: trade.direction,
    stopHuntPrice: currentPrice,
    originalEntryPrice: trade.entryPrice,
    originalQualityScore: trade.signalQualityScore,
    stopLossAmount: Math.abs(totalRealizedPnL)
  })
  console.log(`✅ Revenge window activated for external closure (30min monitoring)`)
}
```

**Commit:** 785b09e

---

### Pitfall #72: Telegram Webhook Conflicts with Polling Bot (🔴 CRITICAL - Fixed Dec 4, 2025)

**Symptom:** Python Telegram bot crashes with "Conflict: can't use getUpdates method while webhook is active"

**Root Cause:** n8n had active Telegram webhook that intercepted ALL messages before Python bot

**Real Incident:** `/status` command returned n8n test message with broken template syntax

**Fix Applied:**
```bash
# Delete Telegram webhook
curl -s "https://api.telegram.org/bot{TOKEN}/deleteWebhook"

# Restart Python bot
docker restart telegram-trade-bot
```

**Architecture Decision:** Cannot run both n8n webhook AND Python polling bot simultaneously. Choose one.

---

## Appendix: Pattern Recognition

### Common Root Causes

1. **Race Conditions:** Multiple code paths detecting same event (P&L compounding bugs #48, #49, #59, #60, #67)
2. **Unit Mismatches:** Tokens vs USD, dollars vs percentages (#24, #54)
3. **Symbol Format:** TradingView ("SOLUSDT") vs Drift ("SOL-PERP") (#5, #66)
4. **Deployment Verification:** Declaring "fixed" without checking container timestamp (#31)
5. **SDK Behavior:** Documentation doesn't match reality (#2, #24, #45)
6. **Async Timing:** Operations completing out of expected order (#13, #28, #60)

### Prevention Strategies

1. **Use atomic operations** for state changes (Map.delete() returns boolean)
2. **Always normalize symbols** at integration boundaries
3. **Verify deployment** with container timestamp vs commit time
4. **Never mutate shared state** during calculation phases
5. **Add explicit checks** in ALL code paths, not just happy path
6. **Test with real infrastructure** before trusting provider claims

---

## Cross-Reference Index

- **See Also:** `.github/copilot-instructions.md` - Main AI agent instructions with Top 10 Critical Pitfalls
- **Related:** `docs/bugs/` - Additional bug documentation
- **Related:** `docs/architecture/` - System design context

---

**Last Updated:** December 4, 2025
**Maintainer:** AI Agent team following "NOTHING gets lost" principle