docs: Add Common Pitfall #37 - Ghost position accumulation
Documented:
- Root cause: Failed DB updates leaving exitReason NULL
- Impact: Rate limit storms from managing non-existent positions
- Real incidents: Nov 14-15, 4+ ghost positions tracked
- Solution: Periodic validation every 5 minutes with auto-cleanup
- Implementation details with code examples
- Benefits: Self-healing, minimal overhead, prevents recurrence
- Why paid RPC doesn't fix it (state management vs. capacity)
.github/copilot-instructions.md
@@ -1496,6 +1496,81 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
- **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
- **Lesson:** `status` is for classification, `exitReason` is for lifecycle management; both must be set on closure

37. **Ghost position accumulation from failed DB updates (CRITICAL - Fixed Nov 15, 2025):**
- **Symptom:** Position Manager tracking 4+ positions simultaneously when the database shows only 1 open trade
- **Root Cause:** The database has `exitReason IS NULL` for positions that were actually closed on Drift
- **Impact:** Rate limit storms (4 positions × monitoring × order updates = 100+ RPC calls/second)
- **Bug sequence:**
   1. Position closed externally (on-chain TP/SL order fills)
   2. Position Manager attempts the database update, but it fails silently
   3. Trade remains in the database with `exitReason IS NULL`
   4. Container restart → Position Manager restores the "open" trade from the DB
   5. The position no longer exists on Drift but is tracked in memory: a ghost position
   6. Ghosts accumulate over time: 1 ghost → 2 ghosts → 4+ ghosts
   7. Each ghost triggers monitoring, order updates, and price checks
   8. RPC rate limit exhaustion → 429 errors → system instability
- **Real incidents:**
   * Nov 14: Untracked 0.09 SOL position with no TP/SL protection
   * Nov 15 19:01: Position Manager tracking 4+ ghosts, massive rate limiting, "vanishing orders"
   * After cleanup: 4+ ghosts → 1 actual position, system stable
- **Why manual restarts worked:** They forced the Position Manager to re-query Drift, but did not prevent recurrence
- **Solution:** Periodic Drift position validation (Nov 15, 2025)
```typescript
// In lib/trading/position-manager.ts (excerpt; other class members not shown):

// Schedule validation every 5 minutes
private scheduleValidation(): void {
  console.log('🔍 Scheduled position validation every 5 minutes')
  this.validationInterval = setInterval(async () => {
    try {
      await this.validatePositions()
    } catch (error) {
      // setInterval ignores the returned promise, so catch here to avoid
      // unhandled rejections when an RPC call fails
      console.error('❌ Position validation failed:', error)
    }
  }, 5 * 60 * 1000)
}

// Validate tracked positions against Drift reality
private async validatePositions(): Promise<void> {
  for (const trade of this.activeTrades.values()) {
    // Assumed: getMarketConfig() resolves the symbol's Drift market and
    // driftService is the shared Drift client in scope
    const marketConfig = getMarketConfig(trade.symbol)
    const position = await driftService.getPosition(marketConfig.driftMarketIndex)

    // Ghost detected: tracked in memory but missing (or dust) on Drift
    if (!position || Math.abs(position.size) < 0.01) {
      console.log(`🔴 Ghost position detected: ${trade.symbol}`)
      await this.handleExternalClosure(trade, 'Ghost position cleanup')
      console.log(`✅ Ghost position cleaned up: ${trade.symbol}`)
    }
  }
}

// Reusable ghost cleanup method
private async handleExternalClosure(trade: ActiveTrade, reason: string): Promise<void> {
  // Remove from monitoring FIRST (prevent race conditions)
  this.activeTrades.delete(trade.id)

  // Hypothetical helper (not shown): P&L estimated from the last observed price
  const estimatedPnL = this.estimatePnL(trade, trade.lastPrice)

  // Update database with estimated P&L. Log failures instead of letting them
  // vanish: a silent failure here is what leaves exitReason NULL in the first place.
  try {
    await updateTradeExit({
      positionId: trade.positionId,
      exitPrice: trade.lastPrice,
      exitReason: 'manual', // Ghost closures = manual
      realizedPnL: estimatedPnL,
      exitOrderTx: reason, // Store cleanup reason
      // ...remaining fields elided
    })
  } catch (error) {
    console.error(`❌ Failed to record exit for ${trade.symbol}:`, error)
  }

  if (this.activeTrades.size === 0) {
    this.stopMonitoring()
  }
}
```
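
The detect-and-clean loop above is hard to exercise in isolation because it needs Drift and the database. A minimal sketch of the same idea as pure functions follows, assuming a simplified `TrackedTrade` row and a `Map` of symbol to on-chain size standing in for `driftService.getPosition()`; `restoreOpenTrades`, `findGhostTrades`, and `cleanUpGhosts` are hypothetical names, not functions from `lib/trading/position-manager.ts`. It replays the bug sequence (restore every row whose `exitReason` is NULL) and then the validation pass.

```typescript
// Hypothetical, simplified shape of a trade row restored from the database
interface TrackedTrade {
  id: string
  symbol: string
  exitReason: string | null // NULL means "still open" to the restore logic
}

// Bug sequence step 4: every row with exitReason NULL is restored as open,
// whether or not the position still exists on Drift.
function restoreOpenTrades(rows: TrackedTrade[]): Map<string, TrackedTrade> {
  const active = new Map<string, TrackedTrade>()
  for (const row of rows) {
    if (row.exitReason === null) active.set(row.id, row)
  }
  return active
}

// Validation: a tracked trade whose market shows no position, or only dust
// below minSize, on the authoritative source is a ghost.
function findGhostTrades(
  active: Map<string, TrackedTrade>,
  onChainSize: Map<string, number>, // symbol -> signed position size (stand-in for Drift)
  minSize = 0.01,
): TrackedTrade[] {
  const ghosts: TrackedTrade[] = []
  for (const trade of active.values()) {
    const size = onChainSize.get(trade.symbol)
    if (size === undefined || Math.abs(size) < minSize) ghosts.push(trade)
  }
  return ghosts
}

// Cleanup: stop tracking first, then record the closure so the row is not
// restored again (the rows array stands in for the database).
function cleanUpGhosts(active: Map<string, TrackedTrade>, ghosts: TrackedTrade[]): void {
  for (const ghost of ghosts) {
    active.delete(ghost.id)
    ghost.exitReason = 'manual' // same exitReason the snippet above uses for ghosts
  }
}

// Three trades whose exit was never recorded, plus one real open position
const rows: TrackedTrade[] = [
  { id: 't1', symbol: 'SOL-PERP', exitReason: null },
  { id: 't2', symbol: 'ETH-PERP', exitReason: null },
  { id: 't3', symbol: 'BTC-PERP', exitReason: null },
  { id: 't4', symbol: 'AVAX-PERP', exitReason: null },
]
const onChain = new Map<string, number>([
  ['SOL-PERP', 0.09],
  ['ETH-PERP', 0.001], // dust below minSize is treated as closed
])

const active = restoreOpenTrades(rows)
const trackedBefore = active.size
const ghosts = findGhostTrades(active, onChain)
cleanUpGhosts(active, ghosts)
console.log(trackedBefore, ghosts.length, active.size) // 4 3 1
console.log(restoreOpenTrades(rows).size) // 1: a "restart" no longer resurrects the ghosts
```

Like the snippet above, the check is keyed by market, so it assumes at most one tracked trade per market: two tracked trades on a market with one live position would both pass validation.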
- **Behavior:** Auto-detects and cleans up ghosts every 5 minutes, with no manual intervention
- **RPC overhead:** Minimal (1 check per position every 5 minutes = 288 calls/day for 1 position)
- **Benefits:**
   * Self-healing system prevents ghost accumulation
   * Eliminates rate limit storms from ghost management
   * No more manual container restarts needed
   * Addresses the root cause (state management) rather than the symptom (rate limits)
- **Logs:** `🔍 Scheduled position validation every 5 minutes` on startup
- **Monitoring:** `🔴 Ghost position detected` + `✅ Ghost position cleaned up` in logs
- **Verification:** Container restart shows 1 position, not 4+ like before
- **Why paid RPC doesn't fix this:** Ghost positions are a state-management bug, not a capacity issue
- **Lesson:** Periodic validation of in-memory state against the authoritative source (Drift) prevents state drift
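
The RPC overhead figure follows from one `getPosition()` call per tracked position per validation run. A small sketch with hypothetical function names makes the scaling explicit and, purely for scale, compares it with the 100+ calls/second seen during the ghost storm, assuming that rate were sustained for 24 hours.

```typescript
// Hypothetical helpers for illustration; not part of the codebase
const MINUTES_PER_DAY = 24 * 60
const SECONDS_PER_DAY = MINUTES_PER_DAY * 60

// One getPosition() call per tracked position per validation run
function validationCallsPerDay(intervalMinutes: number, trackedPositions: number): number {
  const runsPerDay = Math.floor(MINUTES_PER_DAY / intervalMinutes)
  return runsPerDay * trackedPositions
}

// Total calls if a given request rate were sustained for a full day
function sustainedCallsPerDay(callsPerSecond: number): number {
  return callsPerSecond * SECONDS_PER_DAY
}

console.log(validationCallsPerDay(5, 1)) // 288
console.log(validationCallsPerDay(5, 4)) // 1152
console.log(sustainedCallsPerDay(100)) // 8640000
```

Validation cost grows linearly with the number of tracked positions, so it stays small even with a few ghosts; the expensive part during the incident was the per-ghost monitoring and order updates described under **Impact**.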

## File Conventions

- **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router)