# Common Pitfalls Reference Documentation

> **Last Updated:** December 4, 2025
> **Total Documented:** 72 Pitfalls
> **Primary Source:** `.github/copilot-instructions.md`

## Purpose

This document is the **comprehensive reference** for all documented pitfalls, bugs, and lessons learned from the Trading Bot v4 project. Each entry represents a real incident that caused financial loss, system instability, or operational issues.

**How to Use This Document:**
1. **Before making changes:** Search for related pitfalls to avoid repeating mistakes
2. **When debugging:** Look for symptoms matching your issue
3. **After fixing bugs:** Add new entries to preserve institutional knowledge
4. **Code review:** Verify changes don't reintroduce known issues

**Severity Levels:**
- 🔴 **CRITICAL** - Financial loss, data corruption, or system failure
- ⚠️ **HIGH** - System stability or significant operational impact
- 🟡 **MEDIUM** - Performance degradation or UX issues
- 🔵 **LOW** - Code quality or minor improvements

---

## Quick Reference Table

| # | Severity | Category | Date | Summary |
|---|----------|----------|------|---------|
| 1 | 🔴 CRITICAL | SDK/Memory | Nov 15, 2025 | Drift SDK memory leak - heap OOM after 10+ hours |
| 2 | 🔴 CRITICAL | RPC/Infrastructure | Nov 14, 2025 | Wrong RPC provider (Alchemy) breaks Drift SDK |
| 3 | 🟡 MEDIUM | Build/Docker | - | Prisma not generated in Docker |
| 4 | 🟡 MEDIUM | Configuration | - | Wrong DATABASE_URL for container vs host |
| 5 | 🟡 MEDIUM | Data/Symbols | - | Symbol format mismatch (TradingView → Drift) |
| 6 | ⚠️ HIGH | Orders | - | Missing reduce-only flag on exit orders |
| 7 | 🟡 MEDIUM | Architecture | - | Singleton violations (DriftClient, Position Manager) |
| 8 | 🟡 MEDIUM | Types/Prisma | - | Type errors with Prisma after generate |
| 9 | 🟡 MEDIUM | Code Quality | - | Quality score duplication in check-risk and execute |
| 10 | ⚠️ HIGH | Configuration | - | TP2-as-Runner configuration confusion |
| 11 | 🔴 CRITICAL | P&L Calculation | - | P&L calculation using SDK values incorrectly |
| 12 | 🔴 CRITICAL | Transactions | - | Transaction confirmation missing (phantom trades) |
| 13 | ⚠️ HIGH | Execution Order | - | Execution order matters (Position Manager before DB) |
| 14 | ⚠️ HIGH | Timing | - | New trade grace period (30s for Drift propagation) |
| 15 | 🟡 MEDIUM | SDK/Drift | - | Drift minimum position sizes differ from docs |
| 16 | 🔴 CRITICAL | Exit Logic | - | Exit reason detection bug (using current price) |
| 17 | 🟡 MEDIUM | Cooldown | - | Per-symbol cooldown, not global |
| 18 | ⚠️ HIGH | Quality Scoring | - | Timeframe-aware scoring crucial for 5min |
| 19 | 🔴 CRITICAL | Trading Logic | - | Price position chasing causes flip-flops |
| 20 | 🟡 MEDIUM | TradingView | - | TradingView ADX minimum for 5min charts |
| 21 | 🟡 MEDIUM | Types/Prisma | - | Prisma Decimal type handling in raw SQL |
| 22 | 🔴 CRITICAL | Trailing Stop | Nov 11, 2025 | ATR-based trailing stop implementation bug |
| 23 | 🟡 MEDIUM | Database Schema | - | CreateTradeParams interface sync required |
| 24 | 🔴 CRITICAL | SDK/Units | Nov 12, 2025 | Position.size returns tokens not USD |
| 25 | 🟡 MEDIUM | Display | Nov 12, 2025 | Leverage display showing global instead of symbol-specific |
| 26 | 🟡 MEDIUM | Tracking | Nov 12, 2025 | Indicator version tracking (v5→v6→v7→v8) |
| 27 | 🔴 CRITICAL | Race Condition | Nov 15, 2025 | Runner stop loss gap - no protection between TP1 and TP2 |
| 28 | 🔴 CRITICAL | Race Condition | Nov 12, 2025 | External closure duplicate updates bug |
| 29 | 🔴 CRITICAL | Database | Nov 13, 2025 | Database-First Pattern required |
| 30 | ⚠️ HIGH | Network | Nov 13, 2025 | DNS retry logic needed |
| 31 | 🔴 CRITICAL | Deployment | Nov 13, 2025 | Declaring fixes "working" before deployment |
| 32 | 🔴 CRITICAL | Workflow | Nov 14, 2025 | Phantom trade notification workflow breaks |
| 33 | 🔴 CRITICAL | Data Integrity | Nov 15, 2025 | Wrong entry price after orphaned position restoration |
| 34 | 🔴 CRITICAL | Monitoring | Nov 15, 2025 | Runner stop loss gap (duplicate of #27) |
| 35 | 🔴 CRITICAL | Database | Nov 15, 2025 | Phantom trades need exitReason for cleanup |
| 36 | 🔴 CRITICAL | Rate Limits | Nov 15, 2025 | closePosition() missing retry logic causes rate limit storm |
| 37 | 🔴 CRITICAL | Ghost Positions | Nov 15, 2025 | Ghost position accumulation from failed DB updates |
| 38 | 🟡 MEDIUM | Display | Nov 15, 2025 | Analytics dashboard showing original position size |
| 39 | 🔴 CRITICAL | Permissions | Nov 15, 2025 | Settings UI permission error (.env not writable) |
| 40 | 🔴 CRITICAL | Ghost Positions | Nov 15-16, 2025 | Ghost position death spiral from skipped validation |
| 41 | 🔴 CRITICAL | P&L Calculation | Nov 19, 2025 | Stats API recalculating P&L incorrectly for TP1+runner |
| 42 | 🟡 MEDIUM | Notifications | Nov 16, 2025 | Missing Telegram notifications for position closures |
| 43 | 🔴 CRITICAL | Trailing Stop | Nov 20, 2025 | Runner trailing stop never activates after TP1 |
| 44 | ⚠️ HIGH | DNS | Nov 16, 2025 | Telegram bot DNS resolution failures |
| 45 | 🔴 CRITICAL | SDK/Drift | Nov 16, 2025 | Drift SDK position.entryPrice recalculates after partial closes |
| 46 | 🔴 CRITICAL | Leverage | Nov 16, 2025 | Drift account leverage must be set in UI, not API |
| 47 | 🔴 CRITICAL | Verification | Nov 16, 2025 | Position close verification gap - 6 hours unmonitored |
| 48 | 🔴 CRITICAL | P&L Compounding | Nov 16, 2025 | P&L compounding during close verification |
| 49 | 🔴 CRITICAL | P&L Compounding | Nov 17, 2025 | P&L exponential compounding in external closure detection |
| 50 | 🔴 CRITICAL | Database | Nov 19, 2025 | Database not tracking trades despite successful Drift executions |
| 51 | 🔴 CRITICAL | Detection | Nov 19, 2025 | TP1 detection fails when on-chain orders fill fast |
| 52 | 🔴 CRITICAL | Exit Logic | Nov 19, 2025 | ADX-based runner SL only applied in one code path |
| 53 | 🔴 CRITICAL | Container | Nov 19, 2025 | Container restart kills positions + phantom detection bug |
| 54 | 🔴 CRITICAL | Data Integrity | Nov 23, 2025 | MFE/MAE storing dollars instead of percentages |
| 55 | 🔴 CRITICAL | Configuration | Nov 19-20, 2025 | Settings UI quality score variable name mismatch / BlockedSignalTracker using wrong price source |
| 56 | 🔴 CRITICAL | Ghost Orders | Nov 20-21, 2025 | Ghost orders after external closures + false order count bug |
| 57 | 🔴 CRITICAL | P&L Calculation | Nov 20, 2025 | P&L calculation inaccuracy for external closures |
| 58 | ⚠️ HIGH | Database | Nov 21, 2025 | 5-Layer Database Protection System implemented |
| 59 | 🔴 CRITICAL | Duplicates | Nov 22, 2025 | Layer 2 ghost detection causing duplicate Telegram notifications |
| 60 | 🔴 CRITICAL | Race Condition | Nov 23, 2025 | Stale array snapshot in monitoring loop causes duplicate processing |
| 61 | 🔴 CRITICAL | P&L Compounding | Nov 24, 2025 | P&L compounding STILL happening despite all guards |
| 62 | 🔴 CRITICAL | Quality Check | Nov 24-27, 2025 | Adaptive leverage not working / Execute endpoint bypassing quality threshold |
| 63 | ⚠️ HIGH | Feature | Nov 30, 2025 | Smart Entry Validation System - Block & Watch deployed |
| 64 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | EPYC Cluster SSH Timeout - nested hop requires longer timeouts |
| 65 | 🔴 CRITICAL | Cluster | Dec 1, 2025 | Distributed Worker Quality Filter - dict vs callable |
| 66 | 🔴 CRITICAL | Smart Entry | Dec 1, 2025 | Smart Entry Validation Queue wrong price display |
| 67 | 🔴 CRITICAL | Race Condition | Dec 2, 2025 | Ghost detection race condition causing duplicate notifications with P&L compounding |
| 68 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Entry using webhook percentage as signal price |
| 69 | 🟡 MEDIUM | Configuration | Dec 3, 2025 | Direction-specific leverage thresholds not explicit in code |
| 70 | 🔴 CRITICAL | Smart Entry | Dec 3, 2025 | Smart Validation Queue rejected by execute endpoint |
| 71 | 🔴 CRITICAL | Revenge System | Dec 3, 2025 | Revenge system missing external closure integration |
| 72 | 🔴 CRITICAL | Telegram | Dec 4, 2025 | Telegram webhook conflicts with polling bot |

---

## Category Index

### 🔴 P&L Calculation Errors
- [#11](#pitfall-11-pl-calculation-critical) - P&L calculation using SDK values incorrectly
- [#41](#pitfall-41-stats-api-recalculating-pl-incorrectly-critical---fixed-nov-19-2025) - Stats API recalculating P&L incorrectly
- [#48](#pitfall-48-pl-compounding-during-close-verification-critical---fixed-nov-16-2025) - P&L compounding during close verification
- [#49](#pitfall-49-pl-exponential-compounding-in-external-closure-detection-critical---fixed-nov-17-2025) - P&L exponential compounding
- [#54](#pitfall-54-mfemae-storing-dollars-instead-of-percentages-critical---fixed-nov-23-2025) - MFE/MAE storing dollars instead of percentages
- [#57](#pitfall-57-pl-calculation-inaccuracy-for-external-closures-critical---fixed-nov-20-2025) - P&L calculation inaccuracy for external closures
- [#61](#pitfall-61-pl-compounding-still-happening-despite-all-guards-critical---under-investigation-nov-24-2025) - P&L compounding STILL happening

### 🔴 Race Conditions & Duplicates
- [#27](#pitfall-27-runner-stop-loss-gap---no-protection-between-tp1-and-tp2-critical---fixed-nov-15-2025) - Runner stop loss gap - no protection between TP1 and TP2
- [#28](#pitfall-28-external-closure-duplicate-updates-bug-critical---fixed-nov-12-2025) - External closure duplicate updates
- [#59](#pitfall-59-layer-2-ghost-detection-causing-duplicate-telegram-notifications-critical---fixed-nov-22-2025) - Layer 2 ghost detection duplicates
- [#60](#pitfall-60-stale-array-snapshot-in-monitoring-loop-critical---fixed-nov-23-2025) - Stale array snapshot duplicates
- [#67](#pitfall-67-ghost-detection-race-condition-critical---fixed-dec-2-2025) - Ghost detection race condition

### 🔴 SDK/API Integration
- [#1](#pitfall-1-drift-sdk-memory-leak-critical---fixed-nov-15-2025) - Drift SDK memory leak
- [#2](#pitfall-2-wrong-rpc-provider-critical---investigation-complete-nov-14-2025) - Wrong RPC provider (Alchemy)
- [#12](#pitfall-12-transaction-confirmation-critical) - Transaction confirmation missing
- [#24](#pitfall-24-positionsize-tokens-vs-usd-bug-critical---fixed-nov-12-2025) - Position.size tokens vs USD
- [#36](#pitfall-36-closeposition-missing-retry-logic-critical---fixed-nov-15-2025) - closePosition() missing retry logic
- [#45](#pitfall-45-drift-sdk-positionentryprice-recalculates-critical---fixed-nov-16-2025) - position.entryPrice recalculates after partial closes

### 🔴 Database Operations
- [#29](#pitfall-29-database-first-pattern-critical---fixed-nov-13-2025) - Database-First Pattern required
- [#35](#pitfall-35-phantom-trades-need-exitreason-critical---fixed-nov-15-2025) - Phantom trades need exitReason
- [#37](#pitfall-37-ghost-position-accumulation-critical---fixed-nov-15-2025) - Ghost position accumulation
- [#50](#pitfall-50-database-not-tracking-trades-resolved---nov-19-2025) - Database not tracking trades
- [#58](#pitfall-58-5-layer-database-protection-system-implemented---nov-21-2025) - 5-Layer Database Protection System

### 🔴 Configuration & Settings
- [#55](#pitfall-55-configuration-issues-critical---fixed-nov-19-20-2025) - Settings UI quality score variable name mismatch
- [#62](#pitfall-62-adaptive-leverage-and-quality-bypass-critical---fixed-nov-24-27-2025) - Adaptive leverage / Execute endpoint bypassing quality threshold

### 🔴 Deployment & Verification
- [#31](#pitfall-31-declaring-fixes-working-before-deployment-critical---nov-13-2025) - Declaring fixes "working" before deployment
- [#47](#pitfall-47-position-close-verification-gap-critical---fixed-nov-16-2025) - Position close verification gap - 6 hours unmonitored

### 🔴 Smart Entry & Validation
- [#63](#pitfall-63-smart-entry-validation-system-deployed---nov-30-2025) - Smart Entry Validation System
- [#66](#pitfall-66-smart-entry-wrong-price-display-critical---fixed-dec-1-2025) - Smart Entry wrong price display
- [#68](#pitfall-68-smart-entry-using-webhook-percentage-critical---fixed-dec-3-2025) - Smart Entry using webhook percentage
- [#70](#pitfall-70-smart-validation-queue-rejected-critical---fixed-dec-3-2025) - Smart Validation Queue rejected by execute

### ⚠️ Ghost Positions & Orders
- [#40](#pitfall-40-ghost-position-death-spiral-critical---fixed-nov-15-16-2025) - Ghost position death spiral
- [#56](#pitfall-56-ghost-orders-after-external-closures-critical---fixed-nov-20-21-2025) - Ghost orders after external closures

### ⚠️ Network & Infrastructure
- [#30](#pitfall-30-dns-retry-logic-high---nov-13-2025) - DNS retry logic
- [#44](#pitfall-44-telegram-bot-dns-resolution-high---fixed-nov-16-2025) - Telegram bot DNS resolution
- [#64](#pitfall-64-epyc-cluster-ssh-timeout-critical---fixed-dec-1-2025) - EPYC Cluster SSH timeout
- [#65](#pitfall-65-distributed-worker-quality-filter-critical---fixed-dec-1-2025) - Distributed Worker dict vs callable

### ⚠️ Trailing Stop & Exit Logic
- [#22](#pitfall-22-atr-based-trailing-stop-implementation-critical---nov-11-2025) - ATR-based trailing stop implementation
- [#43](#pitfall-43-runner-trailing-stop-never-activates-critical---fixed-nov-20-2025) - Runner trailing stop never activates
- [#51](#pitfall-51-tp1-detection-fails-critical---fixed-nov-19-2025) - TP1 detection fails on-chain
- [#52](#pitfall-52-adx-based-runner-sl-critical---fixed-nov-19-2025) - ADX-based runner SL one code path

---

## Detailed Pitfall Entries

### Pitfall #1: Drift SDK Memory Leak (🔴 CRITICAL - Fixed Nov 15, 2025, Enhanced Nov 24, 2025)

**Symptom:** JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s)

**Root Cause:** Drift SDK accumulates WebSocket subscriptions over time without cleanup

**Real Incident:**
- Thousands of `accountUnsubscribe error: readyState was 2 (CLOSING)` in logs
- Heap growth: Normal ~200MB → 4GB+ after 10 hours → OOM crash

**Impact:** System crashes after extended uptime, requires manual container restart

**Fix Applied:**
- **File:** `lib/monitoring/drift-health-monitor.ts`
- **Implementation:** Smart error-based health monitoring replaces blind timer
  - `interceptWebSocketErrors()` patches console.error to catch SDK WebSocket errors
  - 30-second sliding window: Only restarts if 50+ errors in 30 seconds
  - Container restart via flag: Writes `/tmp/trading-bot-restart.flag` for watch-restart.sh
- **API:** `GET /api/drift/health` - Check error count and health status
- **Commit:** Enhanced Nov 24, 2025

**Code Reference:**
```typescript
// lib/monitoring/drift-health-monitor.ts
interceptWebSocketErrors() // Patches console.error
if (errorsInWindow > 50) {
  writeRestartFlag() // Triggers container restart
}
```

**Prevention:** Monitor for `🏥 Drift health monitor started` and error threshold logs

**Lesson Learned:** Smart, reactive monitoring is better than blind timers. Only restart when actual problems occur, not on a schedule.
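
The 30-second sliding window from the fix can be sketched as below. This is a minimal illustration, not the code in `lib/monitoring/drift-health-monitor.ts`: the class name `SlidingWindowErrorCounter` and its `record` method are hypothetical, and reading "50+" as "50 or more" is an assumption.

```typescript
// Hypothetical sketch of an error counter with a sliding time window.
class SlidingWindowErrorCounter {
  private timestamps: number[] = []

  constructor(
    private readonly windowMs: number = 30000, // 30-second window (from the entry above)
    private readonly threshold: number = 50,   // restart threshold (assumed to mean >= 50)
  ) {}

  /** Record one error at `nowMs`; returns true once the threshold is reached. */
  record(nowMs: number): boolean {
    this.timestamps.push(nowMs)
    // Discard errors that have aged out of the window.
    const cutoff = nowMs - this.windowMs
    while (this.timestamps.length > 0 && this.timestamps[0] <= cutoff) {
      this.timestamps.shift()
    }
    return this.timestamps.length >= this.threshold
  }
}

// Usage: a burst of 50 errors within 50ms reaches the threshold.
const counter = new SlidingWindowErrorCounter()
let shouldRestart = false
for (let i = 0; i < 50; i++) {
  shouldRestart = counter.record(i)
}
console.log(shouldRestart)
```

Because old timestamps are dropped on every call, a slow trickle of errors (for example one per second) never accumulates to the threshold, which is the property that distinguishes this from a plain running total.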

---

### Pitfall #2: Wrong RPC Provider (🔴 CRITICAL - Investigation Complete Nov 14, 2025)

**Symptom:** Trades fail, duplicate closes, Position Manager loses tracking, database save failures

**Root Cause:** Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization

**Real Incident (Nov 14, 21:14 CET):**
- Created diagnostic endpoint `/api/testing/drift-init`
- Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time
- Helius: 0 subscription errors EVERY init, 800ms avg init time

**Impact:** Complete system failure when using wrong RPC provider

**Why Alchemy Fails:**
- Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern)
- Alchemy's CUPS enforcement rate limits these burst requests
- Drift SDK does NOT retry failed subscriptions
- SDK reports "initialized successfully" but with incomplete subscription set
- Error: `"Received JSON-RPC error calling accountSubscribe"`

**Fix Applied:**
- **Use Helius RPC** (https://mainnet.helius-rpc.com/?api-key=...)
- Retry logic: 5s exponential backoff for rate limits
- **Documentation:** `docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md`

**Code Reference:**
```bash
# Test yourself
curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'
```

**Prevention:** ALWAYS use Helius RPC. Do not use Alchemy for Drift SDK.

**Lesson Learned:** Documentation doesn't always reflect reality. Test with real infrastructure before trusting provider claims.

---

### Pitfall #3: Prisma Not Generated in Docker (🟡 MEDIUM)

**Symptom:** Build fails with Prisma client errors

**Root Cause:** The Prisma client does not exist at build time unless `npx prisma generate` runs in the Dockerfile BEFORE `npm run build`

**Fix Applied:** Add `RUN npx prisma generate` before build step in Dockerfile

---

### Pitfall #4: Wrong DATABASE_URL (🟡 MEDIUM)

**Symptom:** Database connection failures

**Root Cause:** Container runtime needs `trading-bot-postgres` (container name), Prisma CLI from host needs `localhost:5432`

**Fix Applied:** Use correct hostname based on context:
- Container: `postgresql://postgres:password@trading-bot-postgres:5432/trading_bot_v4`
- Host CLI: `postgresql://postgres:password@localhost:5432/trading_bot_v4`

---

### Pitfall #5: Symbol Format Mismatch (🟡 MEDIUM)

**Symptom:** Drift API rejects orders, symbol not found errors

**Root Cause:** TradingView sends "SOLUSDT" but Drift requires "SOL-PERP"

**Fix Applied:** Always normalize with `normalizeTradingViewSymbol()` before calling Drift
- **File:** `config/trading.ts`
- Applies to ALL endpoints including `/api/trading/close`
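
To illustrate the kind of mapping involved, here is a minimal sketch of a normalizer. It is not the implementation of `normalizeTradingViewSymbol()` in `config/trading.ts`: the function name `normalizeSymbolSketch`, the list of quote suffixes, and the choice to throw on unknown input are all assumptions made for the example.

```typescript
// Hypothetical sketch: map a TradingView ticker such as "SOLUSDT" to "SOL-PERP".
const QUOTE_SUFFIXES = ["USDT", "USDC", "USD"] // assumed quote currencies

function normalizeSymbolSketch(tvSymbol: string): string {
  const upper = tvSymbol.trim().toUpperCase()
  if (upper.endsWith("-PERP")) {
    return upper // already in Drift format
  }
  for (const suffix of QUOTE_SUFFIXES) {
    if (upper.endsWith(suffix) && upper.length > suffix.length) {
      // Strip the quote currency and append the perp market suffix.
      return `${upper.slice(0, -suffix.length)}-PERP`
    }
  }
  // Failing loudly avoids sending an unrecognized symbol to the exchange.
  throw new Error(`Unrecognized TradingView symbol: ${tvSymbol}`)
}

console.log(normalizeSymbolSketch("SOLUSDT"))
```

Making the function idempotent (already-normalized symbols pass through unchanged) means it is safe to call at every endpoint boundary, including close requests that may receive either format.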

---

### Pitfall #6: Missing Reduce-Only Flag (⚠️ HIGH)

**Symptom:** Exit orders accidentally open new positions instead of closing

**Root Cause:** Exit orders without `reduceOnly: true` can open new positions

**Fix Applied:** All TP/SL orders MUST include `reduceOnly: true`

```typescript
const orderParams = {
  reduceOnly: true, // CRITICAL for TP/SL orders
  // ... other params
}
```

---

### Pitfall #7: Singleton Violations (🟡 MEDIUM)

**Symptom:** Connection issues, state inconsistencies, multiple WebSocket connections

**Root Cause:** Creating multiple DriftClient or Position Manager instances

**Fix Applied:** Always use getter functions:
```typescript
const driftService = await initializeDriftService() // NOT: new DriftService()
const positionManager = getPositionManager() // NOT: new PositionManager()
const prisma = getPrismaClient() // NOT: new PrismaClient()
```

---

### Pitfall #8: Prisma Type Errors (🟡 MEDIUM)

**Symptom:** TypeScript compilation fails with Prisma types

**Root Cause:** Trade type from Prisma only available AFTER `npx prisma generate`

**Fix Applied:** Run `npx prisma generate` after any schema changes

---

### Pitfall #9: Quality Score Duplication (🟡 MEDIUM)

**Symptom:** Inconsistent quality scoring between endpoints

**Root Cause:** Signal quality calculation exists in BOTH `check-risk` and `execute` endpoints

**Fix Applied:** Keep logic synchronized between both endpoints when making changes

---

### Pitfall #10: TP2-as-Runner Configuration (⚠️ HIGH)

**Symptom:** Confusion about runner size and TP2 behavior

**Root Cause:** `takeProfit2SizePercent: 0` means "TP2 activates trailing stop, no position close"

**Fix Applied:**
- `TAKE_PROFIT_2_PERCENT=0.7` sets TP2 trigger price
- `TAKE_PROFIT_2_SIZE_PERCENT` should be 0 for runner system
- Runner = 100% - TAKE_PROFIT_1_SIZE_PERCENT (default 40%)

---

### Pitfall #11: P&L Calculation Critical (🔴 CRITICAL)

**Symptom:** Incorrect P&L values in database and analytics

**Root Cause:** Using SDK values instead of actual entry vs exit price calculation

**Fix Applied:**
```typescript
const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction)
const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100
trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
```
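
The entry-versus-exit arithmetic in the fix above can be sketched in isolation. This is an illustration under assumptions: the helper names `profitPercentSketch` and `realizedPnlUsd` and the `"long" | "short"` direction values are hypothetical and are not taken from the project's `calculateProfitPercent()`.

```typescript
// Hypothetical sketch of direction-aware P&L from entry and exit prices.
type Direction = "long" | "short" // assumed representation of trade direction

function profitPercentSketch(entryPrice: number, exitPrice: number, direction: Direction): number {
  const changePercent = ((exitPrice - entryPrice) / entryPrice) * 100
  // A short profits when price falls, so the sign flips.
  return direction === "long" ? changePercent : -changePercent
}

function realizedPnlUsd(closedSizeUSD: number, profitPercent: number): number {
  return (closedSizeUSD * profitPercent) / 100
}

// Usage: closing $1,000 of a short entered at 100 and exited at 99.
const pct = profitPercentSketch(100, 99, "short")
console.log(pct, realizedPnlUsd(1000, pct))
```

Computing P&L from the closed notional and the price move keeps each partial close independent, so the running total does not depend on whatever aggregate value the SDK reports at that moment.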

---

### Pitfall #12: Transaction Confirmation Critical (🔴 CRITICAL)

**Symptom:** "Phantom trades" - SDK returns signatures for transactions that never execute

**Root Cause:** A returned signature does not prove execution. Both `openPosition()` AND `closePosition()` must call `connection.confirmTransaction()`

**Fix Applied:**
```typescript
const txSig = await driftClient.placePerpOrder(orderParams)
console.log('⏳ Confirming transaction on-chain...')
const connection = driftService.getConnection()
const confirmation = await connection.confirmTransaction(txSig, 'confirmed')

if (confirmation.value.err) {
  throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`)
}
console.log('✅ Transaction confirmed on-chain')
```

---

### Pitfall #13: Execution Order Matters (⚠️ HIGH)

**Symptom:** Race conditions where monitoring starts before trade exists in database

**Root Cause:** Trade was added to Position Manager before the database save

**Fix Applied:** Order MUST be:
1. Open position + place exit orders
2. Save to database (`createTrade()`)
3. Add to Position Manager (`positionManager.addTrade()`)

---

### Pitfall #14: New Trade Grace Period (⚠️ HIGH)

**Symptom:** New positions immediately detected as "closed externally" and cancelled

**Root Cause:** Drift positions take 5-10 seconds to propagate after opening

**Fix Applied:** Position Manager skips "external closure" detection for trades <30 seconds old
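
A grace-period guard of this kind can be as small as the sketch below. The function name `shouldCheckExternalClosure` and the millisecond-timestamp parameters are hypothetical; only the 30-second figure comes from the entry above.

```typescript
// Hypothetical sketch: skip external-closure detection for freshly opened trades.
const GRACE_PERIOD_MS = 30000 // 30 seconds, per the fix above

function shouldCheckExternalClosure(tradeOpenedAtMs: number, nowMs: number): boolean {
  // Positions may take several seconds to appear on Drift, so a missing
  // position inside the grace period is not evidence of a closure.
  return nowMs - tradeOpenedAtMs >= GRACE_PERIOD_MS
}

// Usage: a trade opened 10 seconds ago is still inside the grace period.
console.log(shouldCheckExternalClosure(0, 10000))
```

Passing the current time in as a parameter, instead of calling `Date.now()` inside the function, keeps the check deterministic and easy to test.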

---

### Pitfall #15: Drift Minimum Position Sizes (🟡 MEDIUM)

**Symptom:** Orders rejected for being too small

**Root Cause:** Actual minimums differ from documentation:
- SOL-PERP: 0.1 SOL (~$5-15)
- ETH-PERP: 0.01 ETH (~$38-40)
- BTC-PERP: 0.0001 BTC (~$10-12)

**Fix Applied:** Verify that `minOrderSize × currentPrice` exceeds Drift's $4 minimum, and add a buffer.
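
One way to express this check is sketched below. The token minimums and the $4 floor are the values listed in this entry; the function name `isOrderSizeValid`, the lookup table, and the 10% default buffer are assumptions for illustration, not the project's actual values.

```typescript
// Hypothetical sketch: validate an order's USD size against per-market minimums.
const MIN_ORDER_TOKENS: Record<string, number> = {
  "SOL-PERP": 0.1,
  "ETH-PERP": 0.01,
  "BTC-PERP": 0.0001,
}
const DRIFT_MIN_ORDER_USD = 4 // USD floor from the entry above

function isOrderSizeValid(
  symbol: string,
  sizeUsd: number,
  currentPrice: number,
  bufferPercent: number = 10, // assumed safety margin
): boolean {
  const minTokens = MIN_ORDER_TOKENS[symbol]
  if (minTokens === undefined) {
    throw new Error(`No minimum configured for ${symbol}`)
  }
  // The binding minimum is the larger of the token minimum (in USD) and the USD floor.
  const minUsd = Math.max(minTokens * currentPrice, DRIFT_MIN_ORDER_USD)
  return sizeUsd >= minUsd * (1 + bufferPercent / 100)
}

// Usage: with SOL at $150 the token minimum is about $15, so $20 passes.
console.log(isOrderSizeValid("SOL-PERP", 20, 150))
```

The buffer matters because the USD value of the token minimum moves with price: an order sized exactly at the minimum when it is computed can fall below it by the time it reaches the exchange.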

---

### Pitfall #16: Exit Reason Detection Bug (🔴 CRITICAL)

**Symptom:** Profitable trades mislabeled as "SL" exits

**Root Cause:** Position Manager using current price to determine exit reason, but on-chain orders filled at different price

**Fix Applied:** Use `trade.tp1Hit` / `trade.tp2Hit` flags and realized P&L to correctly identify exit trigger

---

### Pitfall #17: Per-Symbol Cooldown (🟡 MEDIUM)

**Symptom:** ETH trade incorrectly blocking SOL trade

**Root Cause:** Cooldown was global, not per-symbol

**Fix Applied:** Each coin (SOL/ETH/BTC) has independent cooldown timer via `getLastTradeTimeForSymbol(symbol)`
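
The difference between a global and a per-symbol cooldown comes down to keying the last-trade time by symbol. A minimal in-memory sketch follows; the `SymbolCooldown` class and the 15-minute duration are hypothetical, and the project's `getLastTradeTimeForSymbol(symbol)` may well read from the database instead.

```typescript
// Hypothetical sketch: independent cooldown timers keyed by symbol.
class SymbolCooldown {
  private lastTradeAt = new Map<string, number>()

  constructor(private readonly cooldownMs: number) {}

  recordTrade(symbol: string, nowMs: number): void {
    this.lastTradeAt.set(symbol, nowMs)
  }

  isCoolingDown(symbol: string, nowMs: number): boolean {
    const last = this.lastTradeAt.get(symbol)
    // A symbol with no recorded trade is never in cooldown.
    return last !== undefined && nowMs - last < this.cooldownMs
  }
}

// Usage: an ETH trade blocks ETH but leaves SOL free to trade.
const cooldown = new SymbolCooldown(15 * 60 * 1000) // assumed 15-minute cooldown
cooldown.recordTrade("ETH-PERP", 0)
console.log(cooldown.isCoolingDown("ETH-PERP", 60000), cooldown.isCoolingDown("SOL-PERP", 60000))
```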

---

### Pitfall #18: Timeframe-Aware Scoring Crucial (⚠️ HIGH)

**Symptom:** Valid 5min breakouts blocked as "low quality"

**Root Cause:** Signal quality thresholds not adjusted for 5min vs higher timeframes
- 5min: ADX 12-22 healthy, ATR 0.2-0.7%
- Daily: ADX 18-30 healthy, ATR 0.4%+

**Fix Applied:** Always pass `timeframe` parameter from TradingView alerts to `scoreSignalQuality()`

---

### Pitfall #19: Price Position Chasing (🔴 CRITICAL)

**Symptom:** Rapid flip-flop losses

**Root Cause:** Opening longs at 90%+ range or shorts at <10% range

**Real Incident:** Overnight flip-flop losses all had price position 9-94%

**Fix Applied:** Quality scoring now penalizes -15 to -30 points for range extremes

---

### Pitfall #20: TradingView ADX Minimum (🟡 MEDIUM)

**Symptom:** Too many signals blocked or too many low-quality signals passing

**Root Cause:** TradingView ADX filter should be 15 for 5min (not 20+)

**Fix Applied:** Set ADX ≥15 in TradingView alerts for 5min charts. Bot's quality scoring provides second-layer filtering.

---

### Pitfall #21: Prisma Decimal Type Handling (🟡 MEDIUM)

**Symptom:** Frontend errors with `.toFixed()` on undefined

**Root Cause:** Raw SQL queries return Prisma `Decimal` objects, not plain numbers

**Fix Applied:**
```typescript
// Use `any` type for numeric fields in $queryRaw results
const stat: { total_pnl: any } = await prisma.$queryRaw`...`

// Convert with Number() before returning to frontend
totalPnL: Number(stat.total_pnl) || 0
```

---

### Pitfall #22: ATR-Based Trailing Stop Implementation (🔴 CRITICAL - Nov 11, 2025)

**Symptom:** Trades with +7-9% MFE exited for losses

**Root Cause:** Runner system was using FIXED 0.3% trailing instead of ATR-based

**Real Incident:** At $168 SOL, 0.3% = $0.50 wiggle room - too tight

**Fix Applied:**
```typescript
trailingDistancePercent = (atrAtEntry / currentPrice * 100) * trailingStopAtrMultiplier
```

**Configuration:**
- `TRAILING_STOP_ATR_MULTIPLIER=1.5`
- `MIN=0.25%`, `MAX=0.9%`
- `ACTIVATION=0.5%`

**Result:** 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50 = 2.26x more room)

**Documentation:** `ATR_TRAILING_STOP_FIX.md`
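
The formula and the worked result above can be checked with a short sketch. The multiplier, minimum, and maximum are the configuration values listed in this entry; the function name `trailingDistanceSketch` and the assumption that MIN and MAX act as clamps on the computed trailing distance are mine, not confirmed by the source.

```typescript
// Hypothetical sketch: ATR-based trailing distance, clamped to configured bounds.
interface TrailingConfig {
  atrMultiplier: number
  minPercent: number
  maxPercent: number
}

const TRAILING_DEFAULTS: TrailingConfig = {
  atrMultiplier: 1.5, // TRAILING_STOP_ATR_MULTIPLIER
  minPercent: 0.25,   // MIN (assumed to be a lower clamp)
  maxPercent: 0.9,    // MAX (assumed to be an upper clamp)
}

function trailingDistanceSketch(
  atrAtEntry: number,   // ATR in price units
  currentPrice: number,
  cfg: TrailingConfig = TRAILING_DEFAULTS,
): number {
  const atrPercent = (atrAtEntry / currentPrice) * 100
  const raw = atrPercent * cfg.atrMultiplier
  return Math.min(cfg.maxPercent, Math.max(cfg.minPercent, raw))
}

// Usage: an ATR of $0.756 at $168 is 0.45%, giving roughly a 0.675% trail.
const trailPercent = trailingDistanceSketch(0.756, 168)
console.log(trailPercent, (trailPercent / 100) * 168)
```

At $168 a 0.675% trail is about $1.13 of room, compared with about $0.50 for the fixed 0.3% trail, which matches the result quoted above.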

---

### Pitfall #23: CreateTradeParams Interface Sync (🟡 MEDIUM)

**Symptom:** TypeScript build fails when endpoint passes field not in interface

**Root Cause:** New database fields added to Trade model but not to `CreateTradeParams` interface

**Fix Applied:** When adding new fields:
1. Add to interface in `lib/database/trades.ts`
2. Add to Prisma create data object in `createTrade()` function

---

### Pitfall #24: Position.size Tokens vs USD Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Position Manager detects false TP1 hits, moves SL to breakeven prematurely

**Root Cause:** `lib/drift/client.ts` returns `position.size` as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950)

**Real Incident:** Comparing tokens (12.28) directly to USD ($1,950) → "99.4% reduction" → FALSE TP1!

**Fix Applied:**
```typescript
// In Position Manager (lines 322, 519, 558, 591)
const positionSizeUSD = Math.abs(position.size) * currentPrice

// Now compare USD to USD
if (positionSizeUSD < trade.currentSize * 0.95) {
  // Actual 5%+ reduction detected
}
```

**Impact:** Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails

---

### Pitfall #25: Leverage Display Bug (🟡 MEDIUM - Fixed Nov 12, 2025)

**Symptom:** Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x

**Root Cause:** API response returning `config.leverage` (global default) instead of symbol-specific value

**Fix Applied:**
```typescript
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)
// Return symbol-specific leverage
leverage: leverage, // NOT: config.leverage
```

---

### Pitfall #26: Indicator Version Tracking (🟡 MEDIUM - Nov 12, 2025+)

**Symptom:** Unable to compare performance between TradingView strategies

**Root Cause:** No tracking of which indicator generated the signal

**Fix Applied:** Database field `indicatorVersion` tracks:
- v5: Buy/Sell Signal (pre-Nov 12)
- v6: HalfTrend + BarColor (Nov 12-18)
- v7: v6 with toggles (deprecated)
- v8: Money Line Sticky Trend (Nov 18+)
- v9: Money Line with Momentum Filter (Nov 26+)

---

### Pitfall #27: Runner Stop Loss Gap - No Protection Between TP1 and TP2 (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Runner position remained open despite price moving far past stop loss level

**Root Cause:** Position Manager only checked stop loss BEFORE TP1 (line 877), creating a protection gap

**Real Incident:**
1. SHORT opened, TP1 hit at 70% close (runner = 30% remaining)
2. Runner had stop loss at profit-lock level (+0.5%)
3. Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped)
4. Runner exposed to unlimited loss for hours during TP1→TP2 window

**Fix Applied:**
```typescript
// Added explicit runner stop loss check at line ~881:
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
  return
}
```

**Lesson Learned:** Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"

---

### Pitfall #28: External Closure Duplicate Updates Bug (🔴 CRITICAL - Fixed Nov 12, 2025)

**Symptom:** Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss)

**Root Cause:** Position Manager monitoring loop re-processes external closures multiple times before trade removed from activeTrades Map

**Real Incident:**
1. Trade closed externally at -$7.98
2. Position Manager detects closure, calculates P&L → -$7.50 in DB
3. Trade still in Map (removal async), loop runs again
4. Accumulates P&L: -$7.50 + -$7.50 = -$15.00
5. Repeats 8 times → final -$58.43

**Fix Applied:**
```typescript
// BEFORE (BROKEN):
await updateTradeExit({ ... })
await this.removeTrade(trade.id) // Too late!

// AFTER (FIXED):
this.activeTrades.delete(trade.id) // Remove FIRST
await updateTradeExit({ ... }) // Then update DB
```

**Commit:** Fixed Nov 12, 2025
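
The race can be reproduced in a few lines. The sketch below is a simplified simulation, not the Position Manager code: `simulateClosureHandling`, the in-memory `recordedPnl` total, and the stand-in `updateTradeExit` are all hypothetical, and three overlapping ticks are used in place of the eight seen in the incident.

```typescript
// Hypothetical simulation: why the Map entry must be deleted before the first await.
type SimTrade = { id: string; pnl: number }

async function simulateClosureHandling(removeFirst: boolean): Promise<number> {
  const activeTrades = new Map<string, SimTrade>([["t1", { id: "t1", pnl: -7.5 }]])
  let recordedPnl = 0

  // Stand-in for the async database write: yields once before completing.
  const updateTradeExit = async (trade: SimTrade): Promise<void> => {
    await Promise.resolve()
    recordedPnl += trade.pnl
  }

  const handleExternalClosure = async (trade: SimTrade): Promise<void> => {
    if (removeFirst) {
      activeTrades.delete(trade.id) // synchronous: later ticks no longer see the trade
      await updateTradeExit(trade)
    } else {
      await updateTradeExit(trade)
      activeTrades.delete(trade.id) // too late: other ticks already started
    }
  }

  // Three monitoring ticks that fire before the first database write finishes.
  const ticks: Promise<void>[] = []
  for (let i = 0; i < 3; i++) {
    const trade = activeTrades.get("t1")
    if (trade) ticks.push(handleExternalClosure(trade))
  }
  await Promise.all(ticks)
  return recordedPnl
}

simulateClosureHandling(false).then((pnl) => console.log("remove after await:", pnl))
simulateClosureHandling(true).then((pnl) => console.log("remove before await:", pnl))
```

The key point is that `Map.delete` runs synchronously, before control returns to the event loop, so no other tick can observe the trade once the first handler has claimed it.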

---

### Pitfall #29: Database-First Pattern (🔴 CRITICAL - Fixed Nov 13, 2025)

**Symptom:** Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection

**Root Cause:** Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch

**Real Incident:** Unprotected position opened, database save failed silently, Position Manager never tracked it

**Fix Applied:**
```typescript
// CRITICAL: Save to database FIRST before adding to Position Manager
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: `CLOSE POSITION MANUALLY IMMEDIATELY. Transaction: ${openResult.transactionSignature}`,
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)
```

**Documentation:** `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md`

---

### Pitfall #30: DNS Retry Logic (⚠️ HIGH - Nov 13, 2025)

**Symptom:** Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails

**Root Cause:** `EAI_AGAIN` errors are transient DNS issues that resolve in seconds

**Fix Applied:** Automatic retry in `lib/drift/client.ts`:
```typescript
// Detects: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
// Retries up to 3 times with 2s delay
await this.retryOperation(async () => {
  // Initialize Drift SDK, subscribe, get user account
}, 3, 2000, 'Drift initialization')
```

**Documentation:** `docs/DNS_RETRY_LOGIC.md`
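
A retry wrapper with this call shape could look roughly like the sketch below. It is an illustration only: the project's real `retryOperation` is not reproduced in this document, so the `isTransientError` helper, the parameter names, and the log format are all assumptions. The error patterns are the ones listed in the comment above.

```typescript
// Error substrings treated as transient (taken from the list in this pitfall).
const TRANSIENT_PATTERNS = ['fetch failed', 'EAI_AGAIN', 'ENOTFOUND', 'ETIMEDOUT']

// Hypothetical helper: decides whether an error is worth retrying.
function isTransientError(error: unknown): boolean {
  const message = error instanceof Error ? error.message : String(error)
  return TRANSIENT_PATTERNS.some((pattern) => message.includes(pattern))
}

// Runs `operation`, retrying up to `maxRetries` times with a fixed delay,
// but only when the failure looks transient. Other errors are rethrown at once.
async function retryOperation<T>(
  operation: () => Promise<T>,
  maxRetries: number,
  delayMs: number,
  label: string,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation()
    } catch (error) {
      if (!isTransientError(error) || attempt > maxRetries) throw error
      console.warn(`${label}: transient error, retry ${attempt}/${maxRetries} in ${delayMs}ms`)
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs))
    }
  }
}
```

Rethrowing non-transient errors immediately keeps genuine configuration problems (for example a bad keypair) from being hidden behind several seconds of retries.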

---

### Pitfall #31: Declaring Fixes "Working" Before Deployment (🔴 CRITICAL - Nov 13, 2025)

**Symptom:** AI says "position is protected" when container still running old code

**Root Cause:** Conflating "code committed to git" with "code running in production"

**Real Incident:** Fix committed 15:56, declared "working" at 19:42, but container started 15:06 (old code)

**Verification Required:**
```bash
# ALWAYS check before declaring fix deployed:
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Compare container start time to git commit timestamp
# If container older: FIX NOT DEPLOYED
```

**Rule:** NEVER say "fixed", "working", "protected", or "deployed" without verifying container restart timestamp
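
The timestamp comparison itself is simple enough to automate. The sketch below assumes both timestamps have already been obtained as ISO 8601 strings; the function name `isFixDeployed` is hypothetical and not part of the project. A newer container start is necessary but not sufficient, since the image must also have been rebuilt from the new commit.

```typescript
// Returns true only if the container started AFTER the fix was committed.
// Both arguments are ISO 8601 timestamps (an assumption about the input format).
function isFixDeployed(containerStartedAt: string, commitTimestamp: string): boolean {
  const started = Date.parse(containerStartedAt)
  const committed = Date.parse(commitTimestamp)
  if (Number.isNaN(started) || Number.isNaN(committed)) {
    throw new Error('Unparseable timestamp')
  }
  return started > committed
}

// The incident above: container started 15:06, fix committed 15:56.
// (UTC is used here purely for illustration.)
const deployed = isFixDeployed('2025-11-13T15:06:00Z', '2025-11-13T15:56:00Z')
console.log(deployed ? 'Container is newer than commit' : 'FIX NOT DEPLOYED')
```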

---

### Pitfall #32: Phantom Trade Notification Workflow Breaks (🔴 CRITICAL - Nov 14, 2025)

**Symptom:** Phantom trade detected, position opened, but n8n workflow stops. User NOT notified.

**Root Cause:** Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt

**Fix Applied:** Auto-close phantom trades immediately + return HTTP 200 with warning:
```typescript
return NextResponse.json({
  success: true,
  warning: 'Phantom trade detected and auto-closed',
  isPhantom: true,
  message: '[Full notification text]',
  phantomDetails: {...}
})
```

**Database tracking:** `status='phantom'`, `exitReason='manual'`

---

### Pitfall #33: Wrong Entry Price After Orphaned Position Restoration (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking wrong entry price after container restart

**Root Cause:** Startup validation restored orphaned position using OLD database entry price instead of querying Drift

**Real Incident:** DB showed $141.51, Drift showed $141.31 actual entry → 0.14% SL placement error

**Fix Applied:** Query Drift SDK for actual entry price during orphaned position restoration:
```typescript
await prisma.trade.update({
  data: {
    entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price
    positionSizeUSD: positionSizeUSD,
  }
})
```

---

### Pitfall #35: Phantom Trades Need exitReason (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager keeps restoring phantom trade on every restart

**Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL`

**Real Incident:** Phantom trade caused 232% size mismatch, hundreds of false alerts

**Fix Applied:** MUST set exitReason when auto-closing phantoms:
```typescript
await updateTradeExit({
  tradeId: trade.id,
  exitPrice: currentPrice,
  exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
  status: 'phantom'
})
```

---

### Pitfall #36: closePosition() Missing Retry Logic (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tries to close, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts

**Root Cause:** `placeExitOrders()` had retry wrapper but `closePosition()` did NOT

**Real Incident:** 100+ "❌ Failed to close position: 429" + compounding P&L

**Fix Applied:** Wrapped closePosition() with retryWithBackoff():
```typescript
const txSig = await retryWithBackoff(async () => {
  return await driftClient.placePerpOrder(orderParams)
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
```
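
The 8s → 16s → 32s schedule comes from doubling the base delay on each retry. The sketch below shows one way such a wrapper could be written. It is not the project's actual `retryWithBackoff`: the optional `sleep` parameter is added here only to make the delays observable, and unlike a production version this sketch retries on every error rather than only on HTTP 429.

```typescript
// Exponential backoff: waits baseDelayMs * 2^attempt between attempts.
// With maxRetries = 3 and baseDelayMs = 8000 the waits are 8s, 16s, 32s.
async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 8000,
  // Hypothetical injection point so callers (and tests) can replace the timer.
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn()
    } catch (error) {
      // Give up once the retry budget is spent.
      if (attempt >= maxRetries) throw error
      await sleep(baseDelayMs * Math.pow(2, attempt))
    }
  }
}
```

Because the delay grows with each failure, a rate-limited close produces at most four requests in about a minute instead of one request every two seconds.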

---

### Pitfall #37: Ghost Position Accumulation (🔴 CRITICAL - Fixed Nov 15, 2025)

**Symptom:** Position Manager tracking 4+ positions when database shows only 1 open trade

**Root Cause:** Database has `exitReason IS NULL` for positions actually closed on Drift

**Real Incident:** 4+ ghosts → massive rate limiting, "vanishing orders"

**Fix Applied:** Periodic Drift position validation:
```typescript
private scheduleValidation(): void {
  this.validationInterval = setInterval(async () => {
    await this.validatePositions()
  }, 5 * 60 * 1000)
}
```
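
The body of `validatePositions()` is not shown in this document. As a rough sketch of the comparison it needs to make, the function below flags tracked trades that have no matching open position. The `TrackedTrade` type, the `findGhostTradeIds` name, and the symbol-keyed size map are assumptions for illustration; the 0.01 size threshold mirrors the one used in other checks in this document.

```typescript
// Hypothetical minimal view of a trade tracked by the Position Manager.
interface TrackedTrade {
  id: string
  symbol: string
}

// Returns the IDs of tracked trades whose on-exchange position is missing
// or smaller than `minSize` (treated as closed).
function findGhostTradeIds(
  tracked: TrackedTrade[],
  openSizeBySymbol: Map<string, number>,
  minSize = 0.01,
): string[] {
  return tracked
    .filter((trade) => Math.abs(openSizeBySymbol.get(trade.symbol) ?? 0) < minSize)
    .map((trade) => trade.id)
}

const ghosts = findGhostTradeIds(
  [
    { id: 'a', symbol: 'SOL-PERP' },
    { id: 'b', symbol: 'ETH-PERP' },
  ],
  new Map([['SOL-PERP', 1.5]]),
)
console.log(ghosts) // trade 'b' has no open position, so it is a ghost
```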

---

### Pitfall #38: Analytics Dashboard Wrong Size (🟡 MEDIUM - Fixed Nov 15, 2025)

**Symptom:** Analytics page displays $42.54 when actual runner is $12.59 after TP1

**Root Cause:** API returns `trade.positionSizeUSD` (original) not runner size

**Fix Applied:** Check Position Manager state for open positions:
```typescript
const currentSize = configSnapshot?.positionManagerState?.currentSize
const displaySize = trade.exitReason === null && currentSize
  ? currentSize
  : trade.positionSizeUSD
```

---

### Pitfall #40: Ghost Position Death Spiral (🔴 CRITICAL - Fixed Nov 15-16, 2025)

**Symptom:** Container crashes from cascading ghost detection failures

**Root Cause:** Position validation skipped during death spiral recovery, creating more ghosts

**Fix Applied:** Never skip validation during recovery operations

---

### Pitfall #41: Stats API Recalculating P&L Incorrectly (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** Analytics showing wrong P&L for trades with TP1+runner

**Root Cause:** Stats API recalculating P&L from partial position data

**Fix Applied:** Use stored `realizedPnL` directly, don't recalculate

---

### Pitfall #43: Runner Trailing Stop Never Activates (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Runner position sits without trailing stop after TP1

**Root Cause:** Trailing stop activation logic only ran in one code path

**Fix Applied:** Ensure trailing stop activates in all TP1 detection paths

---

### Pitfall #44: Telegram Bot DNS Resolution (⚠️ HIGH - Fixed Nov 16, 2025)

**Symptom:** Telegram notifications fail intermittently

**Root Cause:** DNS resolution failures for api.telegram.org

**Fix Applied:** Retry logic for Telegram API calls

---

### Pitfall #45: Drift SDK position.entryPrice Recalculates (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Entry price changes after partial closes

**Root Cause:** Drift SDK calculates `position.entryPrice` from `quoteAssetAmount / baseAssetAmount`

**Impact:** After TP1 closes 75%, remaining 25% has "new" entry price

**Fix Applied:** Store and use original entry price from trade record, not SDK

---

### Pitfall #46: 100% Position Sizing InsufficientCollateral (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Bot gets InsufficientCollateral errors when Drift UI can open same size

**Root Cause:** Drift's margin calculation includes fees, slippage buffers

**Real Incident:** $85.55 collateral, bot tries 100% → rejected, shortage: $0.03

**Fix Applied:**
```typescript
if (configuredSize >= 100) {
  percentDecimal = 0.99
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}
```

**Commit:** 7129cbf
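
To make the arithmetic concrete, the sketch below applies the same cap to the collateral figure from the incident. The function name `usableCollateralUsd` is hypothetical, and leverage is deliberately left out.

```typescript
// Converts a configured percentage into the collateral actually committed,
// capping anything at or above 100% to 99% to leave room for fees and buffers.
function usableCollateralUsd(collateralUsd: number, configuredPercent: number): number {
  const percentDecimal = configuredPercent >= 100 ? 0.99 : configuredPercent / 100
  return collateralUsd * percentDecimal
}

// $85.55 at "100%" commits about $84.69, leaving roughly $0.86 of headroom,
// comfortably more than the $0.03 shortage seen in the incident.
console.log(usableCollateralUsd(85.55, 100).toFixed(2))
```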

---

### Pitfall #47: Position Close Verification Gap (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** Close transaction confirmed, database marked "closed", but position stayed open 6+ hours

**Root Cause:** A confirmed transaction does not mean Drift's internal state has updated; propagation takes 5-10s

**Real Incident:** Trailing stop triggered 02:51, position stayed open until 08:51 restart

**Fix Applied:** 2-layer verification:
```typescript
if (params.percentToClose === 100) {
  await cancelAllOrders(params.symbol)

  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    return { ...result, needsVerification: true }
  }
}
```

**Commit:** c607a66

---

### Pitfall #48: P&L Compounding During Close Verification (🔴 CRITICAL - Fixed Nov 16, 2025)

**Symptom:** P&L accumulates during the 5-10s verification wait

**Root Cause:** Monitoring loop continues during verification, detecting "external closure" multiple times

**Fix Applied:** `closingInProgress` flag:
```typescript
if ((result as any).needsVerification) {
  trade.closingInProgress = true
  trade.closeConfirmedAt = Date.now()
  console.log(`🔒 Marked as closing in progress - external closure detection disabled`)
  return
}

// Skip external closure check if closingInProgress
if ((position === null || position.size === 0) && !trade.closingInProgress) {
  // ... handle external closure
}
```

**Related:** Pitfalls #27, #49

---

### Pitfall #49: P&L Exponential Compounding in External Closure Detection (🔴 CRITICAL - Fixed Nov 17, 2025)

**Symptom:** Database P&L shows 15-20× actual value ($92.46 when Drift shows $6.00)

**Root Cause:** `trade.realizedPnL` was being mutated during each external closure detection cycle

**Real Incident (Nov 17, 13:54 CET):**
- SOL-PERP SHORT closed by on-chain orders
- Actual P&L: ~$6.00, Database recorded: $92.46 (15.4× too high)
- Rate limiting caused 15+ detection cycles → $6 → $12 → $24 → $48 → $96

**Fix Applied:**
```typescript
// DON'T mutate trade.realizedPnL - causes compounding!
// trade.realizedPnL = totalRealizedPnL ← REMOVED

// Use local variable for DB update
await updateTradeExit({
  realizedPnL: totalRealizedPnL, // Use local variable
})
```

**Commit:** 6156c0f

**Lesson Learned:** In monitoring loops, NEVER mutate shared state during calculation phases. Calculate locally, update shared state ONCE at the end.

---

### Pitfall #50: Database Not Tracking Trades (🔴 CRITICAL - RESOLVED Nov 19, 2025)

**Symptom:** Drift UI shows 6 trades, database shows only 3 trades

**Root Cause:** P&L compounding bug (#49) - in-memory object with stale/accumulated values

**Fix Applied:** Calculate P&L from immutable source values (entry/exit prices), never from in-memory fields
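
A pure function over immutable inputs cannot compound, because calling it again with the same arguments returns the same number. The sketch below illustrates the idea; the name `realizedPnlUsd`, the lowercase direction values, and the omission of fees and funding are simplifying assumptions, not the project's actual calculation.

```typescript
type Direction = 'long' | 'short'

// P&L in USD derived only from entry price, exit price and position size.
// No in-memory accumulator is read or written.
function realizedPnlUsd(
  direction: Direction,
  entryPrice: number,
  exitPrice: number,
  positionSizeUsd: number,
): number {
  const move = (exitPrice - entryPrice) / entryPrice
  return positionSizeUsd * (direction === 'long' ? move : -move)
}

// Running the "detection cycle" many times yields the same value every time.
let pnl = 0
for (let cycle = 0; cycle < 15; cycle++) {
  pnl = realizedPnlUsd('short', 100, 99, 600) // assignment, not accumulation
}
console.log(pnl.toFixed(2))
```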

---

### Pitfall #51: TP1 Detection Fails When On-Chain Orders Fill Fast (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 order fills, but database records exitReason as "SL" instead of "TP1"

**Root Cause:** Position Manager detects closure AFTER both TP1 and runner already closed on-chain

**Real Incident:** LONG opened, TP1+runner closed within 7 minutes, `trade.tp1Hit = false`

**Fix Applied:** Simple percentage-based exit reason:
```typescript
if (runnerProfitPercent > 0.3) {
  if (runnerProfitPercent >= 1.2) {
    exitReason = 'TP2' // Large profit (>= 1.2%)
  } else {
    exitReason = 'TP1' // Moderate profit (> 0.3% and < 1.2%)
  }
} else {
  exitReason = 'SL' // Negative or tiny profit (<= 0.3%)
}
```

**Commit:** de57c96

---

### Pitfall #52: ADX-Based Runner SL Only Applied in One Code Path (🔴 CRITICAL - Fixed Nov 19, 2025)

**Symptom:** TP1 fills via on-chain order, runner gets breakeven SL instead of ADX-based positioning

**Root Cause:** Two TP1 detection paths, only one had ADX logic

**Fix Applied:** Added ADX-based runner SL to on-chain fill detection path (lines 607-642)

**Commits:** b2cb6a3, 66b2922

---

### Pitfall #53: Container Restart Kills Positions + Phantom Detection Bug (🔴 CRITICAL - Fixed Nov 19, 2025)

**Two bugs from container restart:**

**Bug 1: Startup order restore failure**
- Wrong database field names (`takeProfit1OrderTx` vs correct `tp1OrderTx`)
- Fix: Use correct field names

**Bug 2: Phantom detection killing runners**
- Runners (40% remaining) flagged as phantom
- Fix: Check `!trade.tp1Hit` before phantom detection:
```typescript
const wasPhantom = !trade.tp1Hit && trade.currentSize > 0 && (trade.currentSize / trade.positionSize) < 0.5
```

**Commit:** eccecf7

---

### Pitfall #54: MFE/MAE Storing Dollars Instead of Percentages (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Database showing maxFavorableExcursion = 64.08% when TradingView showed 0.48%

**Root Cause:** Position Manager storing DOLLAR amounts instead of PERCENTAGES

**Real Incident:** 133× inflation (64.08% stored vs 0.48% actual)

**Fix Applied:**
```typescript
// BEFORE (BROKEN):
if (currentPnLDollars > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = currentPnLDollars // Storing $64.08
}

// AFTER (FIXED):
if (profitPercent > trade.maxFavorableExcursion) {
  trade.maxFavorableExcursion = profitPercent // Storing 0.48%
}
```

**Commit:** 6255662

**Lesson Learned:** Always verify data storage units match schema expectations. Comments don't override schema.
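
One way to keep the unit unambiguous is to derive the percentage from prices alone, so position size (and therefore dollars) never enters the calculation. The sketch below is an illustration under that assumption; `MfeTrade`, `profitPercent`, and `updateMfe` are hypothetical names and not the Position Manager's real code.

```typescript
// Hypothetical minimal trade shape; maxFavorableExcursion is in PERCENT.
interface MfeTrade {
  direction: 'long' | 'short'
  entryPrice: number
  maxFavorableExcursion: number // e.g. 0.48 means 0.48%
}

// Price move relative to entry, signed so that a favorable move is positive.
function profitPercent(trade: MfeTrade, currentPrice: number): number {
  const move = ((currentPrice - trade.entryPrice) / trade.entryPrice) * 100
  return trade.direction === 'long' ? move : -move
}

// Raises the stored MFE only when the new reading is more favorable.
function updateMfe(trade: MfeTrade, currentPrice: number): void {
  const pct = profitPercent(trade, currentPrice)
  if (pct > trade.maxFavorableExcursion) {
    trade.maxFavorableExcursion = pct
  }
}
```

Because the result depends only on the two prices, the same price move stores the same MFE whether the position is $100 or $10,000.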

---

### Pitfall #55: Configuration Issues (🔴 CRITICAL - Fixed Nov 19-20, 2025)

**Two configuration bugs:**

**Bug 1: Settings UI quality score variable name mismatch**
- Settings API used `MIN_QUALITY_SCORE` (wrong)
- Code actually reads `MIN_SIGNAL_QUALITY_SCORE` (correct)
- User changes in UI had ZERO effect

**Bug 2: BlockedSignalTracker using Pyth cache instead of Drift oracle**
- `priceAfter1Min/5Min/15Min/30Min` fields staying NULL
- Fix: Use `driftService.getOraclePrice()` instead of `getPythPriceMonitor().getCachedPrice()`

**Commit:** 6b00303

---

### Pitfall #56: Ghost Orders After External Closures (🔴 CRITICAL - Fixed Nov 20-21, 2025)

**Symptom:** Position closed, but TP/SL orders remain active on Drift

**Root Cause:** External closure handler didn't call `cancelAllOrders()` before completing

**Real Incident:** Risk of ghost order filling → unintended positions

**Fix Applied:**
```typescript
// In external closure handler:
console.log(`🗑️ Cancelling remaining orders for ${trade.symbol}...`)
const cancelResult = await cancelAllOrders(trade.symbol)
```

**Additional Bug:** False positive "32 open orders" on restart
- Fix: Check `baseAssetAmount.eq(new BN(0))` to filter truly active orders

**Commits:** a3a6222 (Nov 20), 29fce01 (Nov 21)

---

### Pitfall #57: P&L Calculation Inaccuracy for External Closures (🔴 CRITICAL - Fixed Nov 20, 2025)

**Symptom:** Database P&L shows -$101.68 when Drift UI shows -$138.35 (36% error)

**Root Cause:** External closure handler calculates P&L from monitoring loop's `currentPrice`, which lags behind actual fill price

**Fix Applied:** Query Drift's actual settledPnL:
```typescript
const position = userAccount.perpPositions.find((p: any) =>
  p.marketIndex === marketConfig.driftMarketIndex
)
// find() may return undefined, so guard the property access
const settledPnL = Number(position?.settledPnl || 0) / 1e6 // Convert to USD
if (Math.abs(settledPnL) > 0.01) {
  totalRealizedPnL = settledPnL
  console.log(`✅ Using Drift's actual P&L: $${totalRealizedPnL.toFixed(2)}`)
}
```

**Commit:** 8e600c8

---

### Pitfall #58: 5-Layer Database Protection System (⚠️ HIGH - Implemented Nov 21, 2025)

**Purpose:** Bulletproof protection against untracked positions from database failures

**5 Layers:**
1. **Persistent File Logger** (`lib/utils/persistent-logger.ts`) - Survives container restarts
2. **Database Save with Retry + Verification** - 3 retries with exponential backoff
3. **Orphan Position Detection** - Runs on EVERY container startup
4. **Critical Logging in Execute Endpoint** - Full trade details for recovery
5. **Infrastructure (Docker volumes)** - `./logs:/app/logs`

**Real-world validation:** Nov 21, 2025 - No database failure occurred, but protection now in place

---

### Pitfall #59: Layer 2 Ghost Detection Causing Duplicate Telegram Notifications (🔴 CRITICAL - Fixed Nov 22, 2025)

**Symptom:** Trade #8 sent 13 duplicate notifications with compounding P&L ($11.50 → $155.05)

**Root Cause:** Layer 2 ghost detection (failureCount > 20) didn't check `closingInProgress` flag

**Real Incident (Nov 22, 04:05 CET):**
- Actual P&L: +$18.79, Database final: $155.05 (8.2× actual)
- Rate limit storm: 6,581 failed close attempts

**Fix Applied:**
```typescript
// AFTER (FIXED):
if (trade.priceCheckCount > 20 && !trade.closingInProgress) {
  if (!position || Math.abs(position.size) < 0.01) {
    trade.closingInProgress = true
    trade.closeConfirmedAt = Date.now()
    await this.handleExternalClosure(trade, 'Layer 2: Ghost detected')
    return
  }
}
```

**Commit:** b19f156

---

### Pitfall #60: Stale Array Snapshot in Monitoring Loop (🔴 CRITICAL - Fixed Nov 23, 2025)

**Symptom:** Manual closure sends duplicate "POSITION CLOSED" Telegram notifications

**Root Cause:** Position Manager creates array snapshot before async processing

**Real Incident:** Two identical notifications for cmibdii4k0004pe07nzfmturo

**Fix Applied:**
```typescript
private async checkTradeConditions(trade: ActiveTrade, currentPrice: number): Promise<void> {
  // CRITICAL FIX: Check if trade still in monitoring
  if (!this.activeTrades.has(trade.id)) {
    console.log(`⏭️ Skipping ${trade.symbol} - already removed from monitoring`)
    return
  }
  // ... rest of function
}
```

**Commit:** a7c5930

---

### Pitfall #61: P&L Compounding STILL Happening Despite All Guards (🔴 CRITICAL - Under Investigation Nov 24, 2025)

**Symptom:** Trade showed $974.05 P&L when actual was $72.41 (13.4× inflation)

**Evidence:** 14 duplicate Telegram notifications with compounding P&L

**Status:** All existing guards in place, yet duplicates still occurred

**Interim Fix:** Manual P&L correction, container restart with enhanced closingInProgress flag

**Investigation Needed:**
- Serialization lock around external closure detection
- Unique transaction ID to prevent duplicate DB updates
- Telegram notification deduplication

**Commit:** 0466295

---

### Pitfall #62: Adaptive Leverage and Quality Bypass (🔴 CRITICAL - Fixed Nov 24-27, 2025)

**Two related bugs:**

**Bug 1: Adaptive leverage not working (Nov 24)**
- `USE_ADAPTIVE_LEVERAGE` ENV variable not set in .env
- Quality 90 trade used 15x instead of intended 10x

**Bug 2: Execute endpoint bypassing quality threshold (Nov 27)**
- Bot executed trades at quality 30, 50, 50 when minimum is 90/95
- Execute endpoint calculated quality but never validated it

**Fix Applied (Nov 27):**
```typescript
if (qualityResult.score < minQualityScore) {
  console.log(`❌ QUALITY TOO LOW: ${qualityResult.score} < ${minQualityScore} threshold`)
  return NextResponse.json({
    success: false,
    error: 'Quality score too low',
  }, { status: 400 })
}
console.log(`✅ Quality check passed: ${qualityResult.score} >= ${minQualityScore}`)
```

**Commit:** cefa3e6

---

### Pitfall #63: Smart Entry Validation System (⚠️ HIGH - Deployed Nov 30, 2025)

**Purpose:** Recover profits from marginal quality signals (50-89)

**Implementation:** `lib/trading/smart-validation-queue.ts` (330+ lines)

**Threshold Results (Dec 1, 2025):**
- **±0.3%:** 28/200 entries (14%), 67.9% WR, +4.73% total ✅
- ±0.2%: 51/200 entries (26%), 43.1% WR, -18.49% total
- ±0.15%: 73/200 entries (36%), 35.6% WR, -38.27% total

**Commit:** 7c9cfba

---

### Pitfall #64: EPYC Cluster SSH Timeout (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Coordinator reports "SSH command timed out for v9_chunk_000002 on worker1"

**Root Cause:** 30-second subprocess timeout insufficient for nested SSH hop (master → worker1 → worker2)

**Fix Applied:**
```python
ssh_opts = "-o StrictHostKeyChecking=no -o ConnectTimeout=10 -o ServerAliveInterval=5"
result = subprocess.run(ssh_cmd, timeout=60) # Increased from 30s to 60s
```

**Commit:** ef371a1

**Lesson Learned:** Nested SSH hops need 2× minimum timeout. Latency compounds at each hop.

---

### Pitfall #65: Distributed Worker Quality Filter - Dict vs Callable (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** ALL 2,096 distributed backtests returned 0 trades

**Root Cause:** Passed dict `{'min_adx': 15, 'min_volume_ratio': vol_min}` instead of lambda function

**Error:** `'dict' object is not callable`

**Fix Applied:**
```python
# BEFORE (BROKEN):
quality_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}

# AFTER (FIXED):
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

**Commit:** 11a0ea3

**Lesson Learned:** Silent failures are more dangerous than crashes. The exception handler hid the severity by returning zeros.

---

### Pitfall #66: Smart Entry Wrong Price Display (🔴 CRITICAL - Fixed Dec 1, 2025)

**Symptom:** Abandonment notifications showing impossible prices ($126 → $98 = -22% in 30 seconds)

**Root Cause:** Symbol format mismatch between validation queue ("SOLUSDT") and market data cache ("SOL-PERP")

**Real Incident:** Cache lookup `marketDataCache.get("SOLUSDT")` returned null

**Fix Applied:**
```typescript
// Normalize symbol before validation queue
const normalizedSymbol = normalizeTradingViewSymbol(body.symbol)

const queued = await validationQueue.addSignal({
  symbol: normalizedSymbol, // Use normalized format for cache lookup
  // ...
})
```

**Commit:** 6cec2e8
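
The implementation of `normalizeTradingViewSymbol` is not shown in this document. The sketch below is a hypothetical stand-in that demonstrates why the lookup failed and how normalizing at the boundary fixes it; the suffix rules (`USDT`, `USDC`, `USD`) are assumptions and the real mapping may differ.

```typescript
// Hypothetical normalizer: converts "SOLUSDT"-style symbols to "SOL-PERP".
// Symbols already in Drift format are returned unchanged.
function normalizeSymbol(raw: string): string {
  const upper = raw.trim().toUpperCase()
  if (upper.endsWith('-PERP')) return upper
  const base = upper.replace(/(USDT|USDC|USD)$/, '')
  return `${base}-PERP`
}

// The cache is keyed by Drift-format symbols.
const priceCache = new Map<string, number>([['SOL-PERP', 142.5]])

console.log(priceCache.get('SOLUSDT')) // undefined: raw webhook symbol misses
console.log(priceCache.get(normalizeSymbol('SOLUSDT'))) // 142.5: normalized symbol hits
```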

---

### Pitfall #67: Ghost Detection Race Condition (🔴 CRITICAL - Fixed Dec 2, 2025)

**Symptom:** 23 duplicate "POSITION CLOSED" notifications with P&L compounding (-$47.96 to -$1,129.24)

**Root Cause:** Race condition in ghost detection - check `Map.has()` happened AFTER function entry

**Real Incident (Dec 2, 17:20 CET):**
- Expected P&L: ~-$48
- Actual: 23 notifications with compounding P&L

**Fix Applied:** Use Map.delete() atomic return value as deduplication lock:
```typescript
// FIXED CODE:
async handleExternalClosure(trade: ActiveTrade, reason: string) {
  const tradeId = trade.id

  // ✅ Delete IMMEDIATELY - atomic operation
  if (!this.activeTrades.delete(tradeId)) {
    console.log('DUPLICATE PREVENTED (atomic lock)')
    return
  }

  // ONLY first caller reaches here
  // ... rest of cleanup
}
```

**Commit:** 93dd950

**Lesson Learned:** When an async handler can be called by multiple code paths simultaneously, use atomic operations (like Map.delete()) as locks at function entry.
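
The pattern can be demonstrated in isolation. In the sketch below, three concurrent calls race to handle the same closure and only one proceeds. `ClosureHandler` and its members are simplified, hypothetical stand-ins for the Position Manager, and the `await Promise.resolve()` merely represents the real database update and notification.

```typescript
// Hypothetical, simplified stand-in for the real ActiveTrade type.
interface ActiveTrade {
  id: string
  symbol: string
}

class ClosureHandler {
  private activeTrades = new Map<string, ActiveTrade>()
  public processedClosures = 0

  track(trade: ActiveTrade): void {
    this.activeTrades.set(trade.id, trade)
  }

  // Returns true if this call handled the closure, false if it was a duplicate.
  async handleExternalClosure(trade: ActiveTrade): Promise<boolean> {
    // Map.delete() returns true only for the caller that actually removed the key.
    // It runs synchronously before any await, so no other caller can interleave.
    if (!this.activeTrades.delete(trade.id)) {
      return false
    }
    await Promise.resolve() // placeholder for DB update + notification
    this.processedClosures += 1
    return true
  }
}
```

This works because JavaScript runs each synchronous segment to completion: the delete happens before the first `await`, so the check and the removal are one indivisible step.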

---

### Pitfall #68: Smart Entry Using Webhook Percentage as Signal Price (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** $89 position sizes, 97% pullback calculations, impossible entry conditions

**Root Cause:** TradingView webhook `signal.price` contained percentage (70.80) instead of market price ($142.50)

**Real Incident:** Smart Entry log showed "97.4% pullback required" (impossible)

**Fix Applied:**
```typescript
// Use Pyth current price instead of webhook signal price
const pythPrice = await pythClient.getPrice(symbol)
const signalPrice = pythPrice.price // ✅ Use actual market price
```

**Commit:** 7d0d38a

**Lesson Learned:** Never trust webhook data for calculations. Use authoritative price sources (Pyth, Drift).

---

### Pitfall #69: Direction-Specific Leverage Thresholds Not Explicit (🟡 MEDIUM - Fixed Dec 3, 2025)

**Symptom:** Leverage code checked quality score without explicit direction context

**Root Cause:** Code pattern was ambiguous about which direction's threshold applied

**Fix Applied:** Made direction-specific thresholds explicit:
```typescript
if (body.direction === 'LONG') {
  if (qualityResult.score >= 90) leverage = 5
  // ...
} else { // SHORT
  if (qualityResult.score >= 90) leverage = 5 // Same as LONG but explicit
  // ...
}
```

**Commit:** 58f812f

---

### Pitfall #70: Smart Validation Queue Rejected by Execute Endpoint (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** Quality 50-89 signals validated by queue get rejected with "Quality score too low"

**Root Cause:** Execute endpoint applies quality threshold check AFTER validation queue confirmed price action

**Fix Applied:**
```typescript
const isValidatedEntry = body.validatedEntry === true

if (isValidatedEntry) {
  console.log(`✅ VALIDATED ENTRY BYPASS: Quality ${qualityResult.score} accepted`)
}

// Only apply quality threshold if NOT a validated entry
if (!isValidatedEntry && qualityResult.score < minQualityScore) {
  return NextResponse.json({ error: 'Quality too low' }, { status: 400 })
}
```

**Commit:** 785b09e

---

### Pitfall #71: Revenge System Missing External Closure Integration (🔴 CRITICAL - Fixed Dec 3, 2025)

**Symptom:** High-quality signals (85+) stopped by external closures don't trigger revenge window

**Root Cause:** Revenge eligibility check only existed in executeExit() path, not handleExternalClosure()

**Real Incident (Nov 20):** Quality 90 SHORT at $141.37, stopped at $142.48 (-$138.35), price dropped to $131.32 (+$490 opportunity missed)

**Fix Applied:**
```typescript
// In external closure handler:
if (exitReason === 'SL' && trade.signalQualityScore && trade.signalQualityScore >= 85) {
  console.log(`🎯 External SL closure - Quality ${trade.signalQualityScore} >= 85`)
  await stopHuntTracker.recordStopHunt({
    originalTradeId: trade.id,
    symbol: trade.symbol,
    direction: trade.direction,
    stopHuntPrice: currentPrice,
    originalEntryPrice: trade.entryPrice,
    originalQualityScore: trade.signalQualityScore,
    stopLossAmount: Math.abs(totalRealizedPnL)
  })
  console.log(`✅ Revenge window activated for external closure (30min monitoring)`)
}
```

**Commit:** 785b09e

---

### Pitfall #72: Telegram Webhook Conflicts with Polling Bot (🔴 CRITICAL - Fixed Dec 4, 2025)

**Symptom:** Python Telegram bot crashes with "Conflict: can't use getUpdates method while webhook is active"

**Root Cause:** n8n had active Telegram webhook that intercepted ALL messages before Python bot

**Real Incident:** `/status` command returned n8n test message with broken template syntax

**Fix Applied:**
```bash
# Delete Telegram webhook
curl -s "https://api.telegram.org/bot{TOKEN}/deleteWebhook"

# Restart Python bot
docker restart telegram-trade-bot
```

**Architecture Decision:** Cannot run both n8n webhook AND Python polling bot simultaneously. Choose one.

---

## Appendix: Pattern Recognition

### Common Root Causes

1. **Race Conditions:** Multiple code paths detecting same event (P&L compounding bugs #48, #49, #59, #60, #67)
2. **Unit Mismatches:** Tokens vs USD, dollars vs percentages (#24, #54)
3. **Symbol Format:** TradingView ("SOLUSDT") vs Drift ("SOL-PERP") (#5, #66)
4. **Deployment Verification:** Declaring "fixed" without checking container timestamp (#31)
5. **SDK Behavior:** Documentation doesn't match reality (#2, #24, #45)
6. **Async Timing:** Operations completing out of expected order (#13, #28, #60)

### Prevention Strategies

1. **Use atomic operations** for state changes (Map.delete() returns boolean)
2. **Always normalize symbols** at integration boundaries
3. **Verify deployment** with container timestamp vs commit time
4. **Never mutate shared state** during calculation phases
5. **Add explicit checks** in ALL code paths, not just happy path
6. **Test with real infrastructure** before trusting provider claims

---

## Cross-Reference Index

- **See Also:** `.github/copilot-instructions.md` - Main AI agent instructions with Top 10 Critical Pitfalls
- **Related:** `docs/bugs/` - Additional bug documentation
- **Related:** `docs/architecture/` - System design context

---

**Last Updated:** December 4, 2025
**Maintainer:** AI Agent team following "NOTHING gets lost" principle