Files
trading_bot_v4/docs/bugs/CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
mindesbunister 4c36fa2bc3 docs: Major documentation reorganization + ENV variable reference
**Documentation Structure:**
- Created docs/ subdirectory organization (analysis/, architecture/, bugs/,
  cluster/, deployments/, roadmaps/, setup/, archived/)
- Moved 68 root markdown files to appropriate categories
- Root directory now clean (only README.md remains)
- Total: 83 markdown files now organized by purpose

**New Content:**
- Added comprehensive Environment Variable Reference to copilot-instructions.md
- 100+ ENV variables documented with types, defaults, purpose, notes
- Organized by category: Required (Drift/RPC/Pyth), Trading Config (quality/
  leverage/sizing), ATR System, Runner System, Risk Limits, Notifications, etc.
- Includes usage examples (correct vs wrong patterns)

**File Distribution:**
- docs/analysis/ - Performance analyses, blocked signals, profit projections
- docs/architecture/ - Adaptive leverage, ATR trailing, indicator tracking
- docs/bugs/ - CRITICAL_*.md, FIXES_*.md bug reports (7 files)
- docs/cluster/ - EPYC setup, distributed computing docs (3 files)
- docs/deployments/ - *_COMPLETE.md, DEPLOYMENT_*.md status (12 files)
- docs/roadmaps/ - All *ROADMAP*.md strategic planning files (7 files)
- docs/setup/ - TradingView guides, signal quality, n8n setup (8 files)
- docs/archived/2025_pre_nov/ - Obsolete verification checklist (1 file)

**Key Improvements:**
- ENV variable reference: Single source of truth for all configuration
- Common Pitfalls #68-71: Already complete, verified during audit
- Better findability: Category-based navigation vs 68 files in root
- Preserves history: All files git mv (rename), not copy/delete
- Zero broken functionality: Only documentation moved, no code changes

**Verification:**
- 83 markdown files now in docs/ subdirectories
- Root directory cleaned: 68 files → 0 files (except README.md)
- Git history preserved for all moved files
- Container running: trading-bot-v4 (no restart needed)

**Next Steps:**
- Create README.md files in each docs subdirectory
- Add navigation index
- Update main README.md with new structure
- Consolidate duplicate deployment docs
- Archive truly obsolete files (old SQL backups)

See: docs/analysis/CLEANUP_PLAN.md for complete reorganization strategy
2025-12-04 08:29:59 +01:00

7.3 KiB

CRITICAL INCIDENT: Unprotected Position (Nov 13, 2025)

Summary

User opened SOL SHORT via Telegram command. Position opened on Drift but was NOT tracked by Position Manager, resulting in NO TP/SL orders and -$5.40 loss when manually closed.

Timeline

  • ~14:00 CET: User sends short sol via Telegram
  • 14:14 CET: Container restarts (unknown reason)
  • ~15:10 CET: User notices position has no TP/SL in Drift UI
  • 15:15 CET: User manually closes position at -$5.40 loss
  • 15:20 CET: Investigation begins

Root Cause Analysis

Primary Cause: Database Save Failure Silently Ignored

File: app/api/trading/execute/route.ts lines 508-512 (original)

The Bug:

// Add to position manager for monitoring AFTER orders are placed
await positionManager.addTrade(activeTrade)

// ... later in code ...

// Save trade to database
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ Failed to save trade to database:', dbError)
  // Don't fail the trade if database save fails  ← THIS IS THE BUG
}

What Happened:

  1. Position opened successfully on Drift
  2. Exit orders placed on-chain (probably)
  3. Trade added to Position Manager in-memory
  4. Database save FAILED (error caught and logged)
  5. API returned success: true to user (user didn't know save failed)
  6. Container restarted at 14:14 CET
  7. Position Manager restoration logic queries database for open trades
  8. Trade not in database → Position Manager didn't monitor it
  9. Exit orders may have been canceled during restart or never placed
  10. Position left completely unprotected

Contributing Factors

Factor 1: Container Restart Lost In-Memory State

  • Position Manager tracks trades in a Map<string, ActiveTrade>
  • Container restart at 14:14 CET cleared all in-memory state
  • Restoration logic relies on database query:
    const openTrades = await prisma.trade.findMany({
      where: { exitReason: null }
    })
    
  • Since trade wasn't in DB, restoration failed silently

Factor 2: Ghost Trades Corrupting Database

Two trades found with stopLossPrice=0:

  • cmhkeenei0002nz07nl04uub8 (Nov 4, $70.35 entry)
  • cmho7ki8u000aof07k7lpivb0 (Nov 7, $119.18 entry)

These may have caused database schema issues or validation errors during the failed save.

Factor 3: No Database Save Verification

  • Execute endpoint doesn't verify createTrade() succeeded before returning success
  • User had no way to know their position was unprotected
  • Telegram bot showed "success" message despite database failure

The Fix

Fix 1: Database-First Pattern (CRITICAL)

File: app/api/trading/execute/route.ts

Before:

await positionManager.addTrade(activeTrade)  // Add to memory FIRST
// ... create response ...
try {
  await createTrade({...})  // Save to DB LATER
} catch (dbError) {
  // Ignore error  ← WRONG
}

After:

try {
  await createTrade({...})  // Save to DB FIRST
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: 'Position opened on Drift but database save failed. CLOSE POSITION MANUALLY IMMEDIATELY.'
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)

Impact:

  • If database save fails, API returns error
  • User/Telegram bot gets failure notification
  • Position Manager is NOT given the trade to monitor
  • User knows to close position manually on Drift UI
  • Prevents silent failures

Fix 2: Transaction Confirmation Timeout

File: lib/drift/orders.ts (closePosition function)

Problem: connection.confirmTransaction() could hang indefinitely, blocking API

Fix:

const confirmationPromise = connection.confirmTransaction(txSig, 'confirmed')
const timeoutPromise = new Promise((_, reject) => 
  setTimeout(() => reject(new Error('Transaction confirmation timeout')), 30000)
)

const confirmation = await Promise.race([confirmationPromise, timeoutPromise])

Impact:

  • Close API won't hang forever
  • 30s timeout allows user to retry or check Drift UI
  • Logs warning if timeout occurs

Fix 3: Ghost Trade Cleanup

Database: Marked 2 corrupted trades as closed

UPDATE "Trade" 
SET "exitReason" = 'ghost_trade_cleanup', 
    "exitPrice" = "entryPrice", 
    "realizedPnL" = 0 
WHERE id IN ('cmho7ki8u000aof07k7lpivb0', 'cmhkeenei0002nz07nl04uub8');

Impact:

  • Position Manager restoration no longer blocked by invalid data
  • Database queries for open trades won't return corrupted entries

Lessons Learned

1. NEVER Silently Swallow Critical Errors

Bad Pattern:

try {
  await criticalOperation()
} catch (err) {
  console.error('Error:', err)
  // Continue anyway  ← WRONG
}

Good Pattern:

try {
  await criticalOperation()
} catch (err) {
  console.error('CRITICAL ERROR:', err)
  return errorResponse()  // FAIL FAST
}

2. Database-First for Stateful Operations

When in-memory state depends on database:

  1. Save to database FIRST
  2. Verify save succeeded
  3. THEN update in-memory state
  4. If any step fails, ROLL BACK or return error

3. Container Restart Resilience

  • In-memory state is VOLATILE
  • Critical state must persist to database
  • Restoration logic must handle:
    • Corrupted data
    • Missing fields
    • Schema mismatches

4. User Notifications for Failures

  • API errors must propagate to user
  • Telegram bot must show FAILURE messages
  • Don't hide errors from users - they need to know!

5. Verification Mandate Still Critical

  • Even after this incident, we didn't verify the fix worked with real trade
  • ALWAYS execute test trade after deploying financial code changes
  • Monitor logs to ensure expected behavior

Prevention Measures

Immediate (Deployed)

  • Database save moved before Position Manager add
  • Transaction confirmation timeout (30s)
  • Ghost trades cleaned from database

Short-Term (To Do)

  • Add database save health check on startup
  • Create /api/admin/sync-positions endpoint to reconcile Drift vs Database
  • Add Telegram alert when trade save fails
  • Log database errors to SystemEvent table for monitoring

Long-Term (Future)

  • Implement database transactions (savepoint before trade execution)
  • Add automatic position sync check every 5 minutes
  • Create "orphaned position" detection (on Drift but not in DB)
  • Add Sentry/error tracking for database failures
  • Consider Redis/in-memory DB for critical state (survives restarts)

Financial Impact

  • Loss: -$5.40
  • Risk Exposure: Unlimited (position had no stop loss)
  • Duration Unprotected: ~1 hour
  • Prevented Loss: Unknown (market could have moved significantly)

Status

  • Position closed manually
  • Fixes implemented and deployed
  • Ghost trades cleaned
  • Container rebuilding with fixes
  • Need test trade to verify fixes

Next Steps

  1. Wait for container rebuild to complete
  2. Test with small position ($10-20)
  3. Verify database save succeeds before Position Manager add
  4. Monitor for any database errors
  5. Consider reducing position size until system proven stable

Created: Nov 13, 2025 15:30 CET
Status: RESOLVED
Severity: CRITICAL
Owner: AI Agent + User