docs: Document closePosition retry logic bug (Common Pitfall #36)
CRITICAL BUG: Missing retry wrapper caused rate limit storm Real Incident (Nov 15, 16:49 CET): - Trade cmi0il8l30000r607l8aec701 triggered close attempt - closePosition() had NO retryWithBackoff() wrapper - Failed with 429 → Position Manager retried EVERY 2 SECONDS - 100+ close attempts exhausted Helius rate limit - On-chain TP2 filled during storm - External closure detected 8 times: $0.14 → $0.51 (compounding bug) Why This Was Missed: - placeExitOrders() got retry wrapper on Nov 14 - openPosition() still has no wrapper (less critical - runs once) - closePosition() overlooked - MOST CRITICAL because runs in monitoring loop - Position Manager executeExit() catches 429 and returns early - But monitoring continues, retries close every 2s = infinite loop The Fix: - Wrapped closePosition() placePerpOrder() with retryWithBackoff() - 8s base delay, 3 max retries (same as placeExitOrders) - Reduces RPC load by 30-50x during close operations - Container deployed 18:05 CET Nov 15 Impact: Prevents rate limit exhaustion + duplicate external closure updates Files: .github/copilot-instructions.md (added Common Pitfall #36)
This commit is contained in:
48
.github/copilot-instructions.md
vendored
48
.github/copilot-instructions.md
vendored
@@ -1448,6 +1448,54 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
|
||||
- **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
|
||||
- **Lesson:** status field is for classification, exitReason is for lifecycle management - both must be set on closure
|
||||
|
||||
36. **closePosition() missing retry logic causes rate limit storm (CRITICAL - Fixed Nov 15, 2025):**
|
||||
- **Symptom:** Position Manager tries to close trade, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts → rate limit exhaustion
|
||||
- **Root Cause:** `placeExitOrders()` has `retryWithBackoff()` wrapper (Nov 14 fix), but `closePosition()` did NOT
|
||||
- **Real incident:** Trade cmi0il8l30000r607l8aec701 (Nov 15, 16:49 CET)
|
||||
1. Position Manager tried to close (SL or TP trigger)
|
||||
2. closePosition() called raw `placePerpOrder()` → 429 error
|
||||
3. executeExit() caught 429, returned early (line 935-940)
|
||||
4. Position Manager kept monitoring, retried close EVERY 2 seconds
|
||||
5. Logs show 100+ "❌ Failed to close position: 429" + "⚠️ Rate limited while closing SOL-PERP"
|
||||
6. Meanwhile: On-chain TP2 limit order filled (unaffected by SDK rate limits)
|
||||
7. External closure detected, DB updated 8 TIMES: $0.14 → $0.20 → $0.26 → ... → $0.51
|
||||
8. Container eventually restarted (likely from rate limit exhaustion)
|
||||
- **Why duplicate updates:** Common Pitfall #27 fix (remove from Map before DB update) works UNLESS rate limits cause tons of retries before external closure detection
|
||||
- **Impact:** User saw $0.51 profit in DB, $0.03 on Drift UI (8× compounding vs 1 actual fill)
|
||||
- **Fix:** Wrapped closePosition() with retryWithBackoff() in lib/drift/orders.ts:
|
||||
```typescript
|
||||
// Line ~567 (BEFORE):
|
||||
const txSig = await driftClient.placePerpOrder(orderParams)
|
||||
|
||||
// Line ~567 (AFTER):
|
||||
const txSig = await retryWithBackoff(async () => {
|
||||
return await driftClient.placePerpOrder(orderParams)
|
||||
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
|
||||
```
|
||||
- **Behavior now:** 3 SDK retries over 56s (8+16+32) + Position Manager natural retry on next monitoring cycle = robust without spam
|
||||
- **RPC load reduction:** 30-50× fewer requests during close operations (3 retries vs 100+ attempts)
|
||||
- **Verification:** Container restarted 18:05 CET Nov 15, code deployed
|
||||
- **Lesson:** EVERY SDK order operation (open, close, cancel, place) MUST have retry wrapper - Position Manager monitoring creates infinite retry loop without it
|
||||
- **Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL`
|
||||
- **Bug:** Startup validator checks `exitReason !== null` (line 122 of init-position-manager.ts), ignores status field
|
||||
- **Consequence:** Phantom trade with exitReason=NULL treated as "open" and restored to Position Manager
|
||||
- **Real incident:** Nov 14 phantom trade (cmhy6xul20067nx077agh260n) caused 232% size mismatch, hundreds of false "🔴 RUNNER STOP LOSS" alerts
|
||||
- **Fix:** When auto-closing phantom trades, MUST set exitReason:
|
||||
```typescript
|
||||
// In app/api/trading/execute/route.ts (phantom detection):
|
||||
await updateTradeExit({
|
||||
tradeId: trade.id,
|
||||
exitPrice: currentPrice,
|
||||
exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
|
||||
realizedPnL: actualPnL,
|
||||
status: 'phantom'
|
||||
})
|
||||
```
|
||||
- **Manual cleanup:** If phantom already exists: `UPDATE "Trade" SET "exitReason" = 'manual' WHERE status = 'phantom' AND "exitReason" IS NULL`
|
||||
- **Impact:** Without exitReason, phantom trades create ghost positions that trigger false alerts and pollute monitoring
|
||||
- **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
|
||||
- **Lesson:** status field is for classification, exitReason is for lifecycle management - both must be set on closure
|
||||
|
||||
## File Conventions
|
||||
|
||||
- **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router)
|
||||
|
||||
Reference in New Issue
Block a user