Files
trading_bot_v4/CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
mindesbunister bd9633fbc2 CRITICAL FIX: Prevent unprotected positions via database-first pattern
Root Cause:
- Execute endpoint saved to database AFTER adding to Position Manager
- Database save failures were silently caught and ignored
- API returned success even when DB save failed
- Container restarts lost in-memory Position Manager state
- Result: Unprotected positions with no TP/SL monitoring

Fixes Applied:

1. Database-First Pattern (app/api/trading/execute/route.ts):
   - MOVED createTrade() BEFORE positionManager.addTrade()
   - If database save fails, return HTTP 500 with critical error
   - Error message: 'CLOSE POSITION MANUALLY IMMEDIATELY'
   - Position Manager only tracks database-persisted trades
   - Ensures container restarts can restore all positions

2. Transaction Timeout (lib/drift/orders.ts):
   - Added 30s timeout to confirmTransaction() in closePosition()
   - Prevents API from hanging during network congestion
   - Uses Promise.race() pattern for timeout enforcement

3. Telegram Error Messages (telegram_command_bot.py):
   - Parse JSON for ALL responses (not just 200 OK)
   - Extract detailed error messages from 'message' field
   - Shows critical warnings to user immediately
   - Fail-open: proceeds if analytics check fails

4. Position Manager (lib/trading/position-manager.ts):
   - Move lastPrice update to TOP of monitoring loop
   - Ensures /status endpoint always shows current price

Verification:
- Test trade cmhxj8qxl0000od076m21l58z executed successfully
- Database save completed BEFORE Position Manager tracking
- SL triggered correctly at -$4.21 after 15 minutes
- All protection systems working as expected

Impact:
- Eliminates risk of unprotected positions
- Provides immediate critical warnings if DB fails
- Enables safe container restarts with full position recovery
- Verified with live test trade on production

See: CRITICAL_INCIDENT_UNPROTECTED_POSITION.md for full incident report
2025-11-13 15:56:28 +01:00

7.3 KiB

CRITICAL INCIDENT: Unprotected Position (Nov 13, 2025)

Summary

User opened SOL SHORT via Telegram command. Position opened on Drift but was NOT tracked by Position Manager, resulting in NO TP/SL orders and -$5.40 loss when manually closed.

Timeline

  • ~14:00 CET: User sends short sol via Telegram
  • 14:14 CET: Container restarts (unknown reason)
  • ~15:10 CET: User notices position has no TP/SL in Drift UI
  • 15:15 CET: User manually closes position at -$5.40 loss
  • 15:20 CET: Investigation begins

Root Cause Analysis

Primary Cause: Database Save Failure Silently Ignored

File: app/api/trading/execute/route.ts lines 508-512 (original)

The Bug:

// Add to position manager for monitoring AFTER orders are placed
await positionManager.addTrade(activeTrade)

// ... later in code ...

// Save trade to database
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ Failed to save trade to database:', dbError)
  // Don't fail the trade if database save fails  ← THIS IS THE BUG
}

What Happened:

  1. Position opened successfully on Drift
  2. Exit orders placed on-chain (probably)
  3. Trade added to Position Manager in-memory
  4. Database save FAILED (error caught and logged)
  5. API returned success: true to user (user didn't know save failed)
  6. Container restarted at 14:14 CET
  7. Position Manager restoration logic queries database for open trades
  8. Trade not in database → Position Manager didn't monitor it
  9. Exit orders may have been canceled during restart or never placed
  10. Position left completely unprotected

Contributing Factors

Factor 1: Container Restart Lost In-Memory State

  • Position Manager tracks trades in a Map<string, ActiveTrade>
  • Container restart at 14:14 CET cleared all in-memory state
  • Restoration logic relies on database query:
    const openTrades = await prisma.trade.findMany({
      where: { exitReason: null }
    })
    
  • Since trade wasn't in DB, restoration failed silently

Factor 2: Ghost Trades Corrupting Database

Two trades found with stopLossPrice=0:

  • cmhkeenei0002nz07nl04uub8 (Nov 4, $70.35 entry)
  • cmho7ki8u000aof07k7lpivb0 (Nov 7, $119.18 entry)

These may have caused database schema issues or validation errors during the failed save.

Factor 3: No Database Save Verification

  • Execute endpoint doesn't verify createTrade() succeeded before returning success
  • User had no way to know their position was unprotected
  • Telegram bot showed "success" message despite database failure

The Fix

Fix 1: Database-First Pattern (CRITICAL)

File: app/api/trading/execute/route.ts

Before:

await positionManager.addTrade(activeTrade)  // Add to memory FIRST
// ... create response ...
try {
  await createTrade({...})  // Save to DB LATER
} catch (dbError) {
  // Ignore error  ← WRONG
}

After:

try {
  await createTrade({...})  // Save to DB FIRST
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: 'Position opened on Drift but database save failed. CLOSE POSITION MANUALLY IMMEDIATELY.'
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)

Impact:

  • If database save fails, API returns error
  • User/Telegram bot gets failure notification
  • Position Manager is NOT given the trade to monitor
  • User knows to close position manually on Drift UI
  • Prevents silent failures

Fix 2: Transaction Confirmation Timeout

File: lib/drift/orders.ts (closePosition function)

Problem: connection.confirmTransaction() could hang indefinitely, blocking API

Fix:

const confirmationPromise = connection.confirmTransaction(txSig, 'confirmed')
const timeoutPromise = new Promise((_, reject) => 
  setTimeout(() => reject(new Error('Transaction confirmation timeout')), 30000)
)

const confirmation = await Promise.race([confirmationPromise, timeoutPromise])

Impact:

  • Close API won't hang forever
  • 30s timeout allows user to retry or check Drift UI
  • Logs warning if timeout occurs

Fix 3: Ghost Trade Cleanup

Database: Marked 2 corrupted trades as closed

UPDATE "Trade" 
SET "exitReason" = 'ghost_trade_cleanup', 
    "exitPrice" = "entryPrice", 
    "realizedPnL" = 0 
WHERE id IN ('cmho7ki8u000aof07k7lpivb0', 'cmhkeenei0002nz07nl04uub8');

Impact:

  • Position Manager restoration no longer blocked by invalid data
  • Database queries for open trades won't return corrupted entries

Lessons Learned

1. NEVER Silently Swallow Critical Errors

Bad Pattern:

try {
  await criticalOperation()
} catch (err) {
  console.error('Error:', err)
  // Continue anyway  ← WRONG
}

Good Pattern:

try {
  await criticalOperation()
} catch (err) {
  console.error('CRITICAL ERROR:', err)
  return errorResponse()  // FAIL FAST
}

2. Database-First for Stateful Operations

When in-memory state depends on database:

  1. Save to database FIRST
  2. Verify save succeeded
  3. THEN update in-memory state
  4. If any step fails, ROLL BACK or return error

3. Container Restart Resilience

  • In-memory state is VOLATILE
  • Critical state must persist to database
  • Restoration logic must handle:
    • Corrupted data
    • Missing fields
    • Schema mismatches

4. User Notifications for Failures

  • API errors must propagate to user
  • Telegram bot must show FAILURE messages
  • Don't hide errors from users - they need to know!

5. Verification Mandate Still Critical

  • Even after this incident, we didn't verify the fix worked with real trade
  • ALWAYS execute test trade after deploying financial code changes
  • Monitor logs to ensure expected behavior

Prevention Measures

Immediate (Deployed)

  • Database save moved before Position Manager add
  • Transaction confirmation timeout (30s)
  • Ghost trades cleaned from database

Short-Term (To Do)

  • Add database save health check on startup
  • Create /api/admin/sync-positions endpoint to reconcile Drift vs Database
  • Add Telegram alert when trade save fails
  • Log database errors to SystemEvent table for monitoring

Long-Term (Future)

  • Implement database transactions (savepoint before trade execution)
  • Add automatic position sync check every 5 minutes
  • Create "orphaned position" detection (on Drift but not in DB)
  • Add Sentry/error tracking for database failures
  • Consider Redis/in-memory DB for critical state (survives restarts)

Financial Impact

  • Loss: -$5.40
  • Risk Exposure: Unlimited (position had no stop loss)
  • Duration Unprotected: ~1 hour
  • Prevented Loss: Unknown (market could have moved significantly)

Status

  • Position closed manually
  • Fixes implemented and deployed
  • Ghost trades cleaned
  • Container rebuilding with fixes
  • Need test trade to verify fixes

Next Steps

  1. Wait for container rebuild to complete
  2. Test with small position ($10-20)
  3. Verify database save succeeds before Position Manager add
  4. Monitor for any database errors
  5. Consider reducing position size until system proven stable

Created: Nov 13, 2025 15:30 CET
Status: RESOLVED
Severity: CRITICAL
Owner: AI Agent + User