docs: Add Common Pitfall #37 - Ghost position accumulation

Documented:
- Root cause: Failed DB updates leaving exitReason NULL
- Impact: Rate limit storms from managing non-existent positions
- Real incidents: Nov 14-15, 4+ ghost positions tracked
- Solution: Periodic validation every 5 minutes with auto-cleanup
- Implementation details with code examples
- Benefits: Self-healing, minimal overhead, prevents recurrence
- Why a paid RPC plan doesn't fix it (state management vs. capacity)
mindesbunister
2025-11-15 19:22:06 +01:00
parent d236e08cc0
commit ffccf84676


@@ -1496,6 +1496,81 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
- **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
- **Lesson:** `status` is for classification and `exitReason` is for lifecycle management; both must be set on closure
37. **Ghost position accumulation from failed DB updates (CRITICAL - Fixed Nov 15, 2025):**
- **Symptom:** Position Manager tracks 4+ positions simultaneously while the database shows only 1 open trade
- **Root Cause:** Database rows keep `exitReason IS NULL` for positions that are actually closed on Drift
- **Impact:** Rate limit storms (4 positions × monitoring × order updates = 100+ RPC calls/second)
- **Bug sequence:**
1. Position closed externally (on-chain TP/SL order fills)
2. Position Manager attempts the database update, but the update fails silently
3. Trade remains in database with `exitReason IS NULL`
4. Container restart → Position Manager restores "open" trade from DB
5. Position no longer exists on Drift but is still tracked in memory: a ghost position
6. Ghosts accumulate over time: 1 ghost → 2 ghosts → 4+ ghosts
7. Each ghost triggers its own monitoring, order updates, and price checks
8. RPC rate limit exhaustion → 429 errors → system instability
- **Real incidents:**
* Nov 14: Untracked 0.09 SOL position with no TP/SL protection
* Nov 15 19:01: Position Manager tracking 4+ ghosts, massive rate limiting, "vanishing orders"
* After cleanup: 4+ ghosts → 1 actual position, system stable
- **Why manual restarts helped only temporarily:** They forced Position Manager to re-query Drift, but did not prevent recurrence
- **Solution:** Periodic Drift position validation (Nov 15, 2025)
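```typescript
// In lib/trading/position-manager.ts (excerpt from the Position Manager class):

// Schedule validation every 5 minutes
private scheduleValidation(): void {
  this.validationInterval = setInterval(() => {
    // Catch errors so a failed check cannot surface as an unhandled rejection
    this.validatePositions().catch((error) =>
      console.error('Position validation failed:', error)
    )
  }, 5 * 60 * 1000)
  console.log('🔍 Scheduled position validation every 5 minutes')
}

// Validate tracked positions against Drift reality
private async validatePositions(): Promise<void> {
  for (const trade of this.activeTrades.values()) {
    // marketConfig: Drift market config for trade.symbol (lookup elided in this excerpt)
    const position = await driftService.getPosition(marketConfig.driftMarketIndex)

    // Ghost detected: tracked in memory but missing (or dust-sized) on Drift
    if (!position || Math.abs(position.size) < 0.01) {
      console.log(`🔴 Ghost position detected: ${trade.symbol}`)
      await this.handleExternalClosure(trade, 'Ghost position cleanup')
      console.log(`✅ Ghost position cleaned up: ${trade.symbol}`)
    }
  }
}

// Reusable cleanup for positions closed outside the Position Manager
private async handleExternalClosure(trade: ActiveTrade, reason: string): Promise<void> {
  // Remove from monitoring FIRST (prevents race conditions with price checks)
  this.activeTrades.delete(trade.id)

  // Update database with estimated P&L (estimatedPnL computation elided in this excerpt)
  await updateTradeExit({
    positionId: trade.positionId,
    exitPrice: trade.lastPrice,
    exitReason: 'manual', // Ghost closures are recorded as manual exits
    realizedPnL: estimatedPnL,
    exitOrderTx: reason, // Store the cleanup reason in place of a transaction signature
    ...
  })

  // Stop the monitoring loop once nothing is left to track
  if (this.activeTrades.size === 0) {
    this.stopMonitoring()
  }
}
```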
- **Behavior:** Auto-detects and cleans up ghosts every 5 minutes with no manual intervention
- **RPC overhead:** Minimal, at 1 check per tracked position every 5 minutes (24 × 60 / 5 = 288 calls/day per position)
- **Benefits:**
* Self-healing system prevents ghost accumulation
* Eliminates rate limit storms from ghost management
* No more manual container restarts needed
* Addresses the root cause (state management), not the symptom (rate limits)
- **Logs:** `🔍 Scheduled position validation every 5 minutes` on startup
- **Monitoring:** `🔴 Ghost position detected` + `✅ Ghost position cleaned up` in logs
- **Verification:** After a container restart, Position Manager tracks 1 position instead of the 4+ seen before the fix
- **Why paid RPC doesn't fix this:** Ghost positions are a state-management bug, not a capacity issue
- **Lesson:** Periodic validation of in-memory state against authoritative source prevents state drift
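The ghost-position bug sequence and the validation fix for pitfall #37 can be reduced to a small in-memory model. This is a hedged sketch, not the Position Manager code: `TradeRow`, `ChainView`, and `PositionTracker` are hypothetical names, a plain `Map` stands in for Drift as the authoritative source, the exit-reason values in the sample data are illustrative, and `validate()` is synchronous so the example runs without an RPC connection. It shows why restoring every row with `exitReason` NULL resurrects closed positions, and how reconciling against the authoritative source removes them and persists the closure so the next restart stays clean.

```typescript
// Hypothetical model of the ghost-position bug and the reconciliation fix.
// TradeRow, ChainView, and PositionTracker are illustrative names, not project code.

interface TradeRow {
  id: string
  symbol: string
  exitReason: string | null // NULL is what the restore logic reads as "still open"
}

// Stand-in for the authoritative on-chain view (Drift in the real system):
// market symbol -> current position size (negative for shorts).
type ChainView = Map<string, number>

class PositionTracker {
  // In-memory set of monitored trades, keyed by trade id
  readonly active = new Map<string, TradeRow>()

  constructor(private readonly db: TradeRow[], private readonly chain: ChainView) {}

  // Bug sequence step 4: restore every row whose exitReason is NULL,
  // without checking whether the position still exists on-chain.
  restoreFromDb(): void {
    for (const row of this.db) {
      if (row.exitReason === null) this.active.set(row.id, row)
    }
  }

  // The fix: compare each tracked trade with the authoritative source.
  // Ghosts are dropped from memory first, then their closure is recorded
  // so a later restore skips them. Returns the ids that were cleaned up.
  validate(minSize = 0.01): string[] {
    const ghosts: string[] = []
    this.active.forEach((trade, id) => {
      const size = this.chain.get(trade.symbol) ?? 0
      if (Math.abs(size) < minSize) {
        this.active.delete(id) // deleting during Map.forEach is safe
        trade.exitReason = 'manual' // same exitReason the cleanup above writes
        ghosts.push(id)
      }
    })
    return ghosts
  }
}

// Usage: one real SOL position, one ETH trade whose closing DB update failed,
// and one BTC trade that was closed and recorded correctly.
const db: TradeRow[] = [
  { id: 't1', symbol: 'SOL-PERP', exitReason: null },
  { id: 't2', symbol: 'ETH-PERP', exitReason: null },
  { id: 't3', symbol: 'BTC-PERP', exitReason: 'SL' },
]
const chain: ChainView = new Map([['SOL-PERP', 0.5]])

const tracker = new PositionTracker(db, chain)
tracker.restoreFromDb()
console.log(tracker.active.size) // 2: t1 (real) and t2 (ghost)
console.log(tracker.validate().join(', ')) // t2
console.log(tracker.active.size) // 1

// A fresh tracker (simulating a container restart) no longer restores the ghost.
const afterRestart = new PositionTracker(db, chain)
afterRestart.restoreFromDb()
console.log(afterRestart.active.size) // 1
```

Recording the closure matters as much as dropping the trade from memory: if that write fails again, the next restart restores the ghost and only the next validation pass clears it. In the real Position Manager the check is asynchronous and runs on the 5-minute interval.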
## File Conventions
- **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router)