Commit Graph

4 Commits

Author SHA1 Message Date
mindesbunister
e5714e4376 critical: Bug #82 EMERGENCY FIX - Disable Drift State Verifier automatic close
Problem: Verifier can't distinguish OLD positions from NEW positions at same symbol
- User opened manual trade with SL working
- Verifier detected 6 old closed DB records (150-1064 min ago)
- All showed "15.45 tokens open on Drift" (user's CURRENT trade!)
- Automatic retry close removed user's SL orders

Root Cause: Lines 279-283 call closePosition() for every mismatch
- No verification if Drift position is OLD (should close) or NEW (active trade)
- No position ID/timestamp matching
- Result: Closes ACTIVE trades when cleaning up old database records

Solution: DISABLED automatic retry close (lines 276-298)
- Added BUG #82 warning logs
- Requires manual intervention if true orphan detected
- Will add proper position verification in follow-up fix

Impact: Stops SL removal on active trades
User incident: After Bug #81 fix deployed, THIS bug was killing SLs
Deployment: Dec 10, 2025 11:06 CET
2025-12-10 11:05:53 +01:00
copilot-swe-agent[bot]
63b94016fe fix: Implement critical risk management fixes for bugs #76, #77, #78, #80
Co-authored-by: mindesbunister <32161838+mindesbunister@users.noreply.github.com>
2025-12-09 22:23:43 +00:00
mindesbunister
1ed909c661 fix: Stop Drift verifier retry loop cancelling orders (Bug #80)
CRITICAL FIX (Dec 9, 2025): Drift state verifier now stops retry loop when close transaction confirms, preventing infinite retries that cancel orders.

Problem:
- Drift state verifier detected 'closed' positions still open on Drift
- Sent close transaction which CONFIRMED on-chain
- But Drift API still showed position (5-minute propagation delay)
- Verifier thought close failed, retried immediately
- Infinite loop: close → confirm → Drift still shows position → retry
- Eventually Position Manager gave up, cancelled ALL orders
- User's position left completely unprotected

Root Cause (Bug #80):
- Solana transaction confirms in ~400ms on-chain
- Drift.getPosition() caches state, takes 5+ minutes to update
- Verifier didn't account for propagation delay
- Kept retrying every 10 minutes because Drift API lagged behind
- Each retry attempt potentially cancelled orders as side effect

Solution:
- Check configSnapshot.retryCloseTime before retrying
- If last retry was <5 minutes ago, SKIP (wait for Drift to catch up)
- Log: 'Skipping retry - last attempt Xs ago (Drift propagation delay)'
- Prevents retry loop while Drift state propagates
- After 5 minutes, can retry if position truly stuck

Impact:
- Orders no longer disappear repeatedly due to retry loop
- Position stays protected with TP1/TP2/SL between retries
- User doesn't need to manually replace orders every 3 minutes
- System respects Drift API propagation delay

Testing:
- Deployed fix, orders placed successfully
- Database synced: tp1OrderTx and tp2OrderTx populated
- Monitoring logs for 'Skipping retry' messages on next verifier run
- Position tracking: 1 active trade, monitoring active

Note: This fixes the symptom (retry loop). Root cause is Drift SDK caching getPosition() results. Real fix would be to query on-chain state directly or increase cache TTL.

Files changed:
- lib/monitoring/drift-state-verifier.ts (added 5-minute skip window)
2025-12-09 21:04:29 +01:00
mindesbunister
4ab7bf58da feat: Drift state verifier double-checking system (WIP - build issues)
CRITICAL: Position Manager stops monitoring randomly
User had to manually close SOL-PERP position after PM stopped at 23:21.

Implemented double-checking system to detect when positions marked
closed in DB are still open on Drift (and vice versa):

1. DriftStateVerifier service (lib/monitoring/drift-state-verifier.ts)
   - Runs every 10 minutes automatically
   - Checks closed trades (24h) vs actual Drift positions
   - Retries close if mismatch found
   - Sends Telegram alerts

2. Manual verification API (app/api/monitoring/verify-drift-state)
   - POST: Force immediate verification check
   - GET: Service status

3. Integrated into startup (lib/startup/init-position-manager.ts)
   - Auto-starts on container boot
   - First check after 2min, then every 10min

STATUS: Build failing due to TypeScript compilation timeout
Need to fix and deploy, then investigate WHY Position Manager stops.

This addresses symptom (stuck positions) but not root cause (PM stopping).
2025-12-07 02:28:10 +01:00