REPLACES emergency disable with intelligent verification:
1. Position Identity Verification:
- Compares DB exitTime vs active trade timestamps
- Verifies size matches within 15% tolerance
- Verifies direction matches (long/short)
- Checks entry price matches within 2%
2. Grace Period Enforcement:
- 10-minute wait after DB exit before attempting close
- Allows Drift state propagation
3. Safety Checks:
- Cooldown (5 min) prevents retry loops
- Protection logging when position skipped
- Fail-open bias: when uncertain, do nothing
4. Test Coverage:
- 8 test scenarios covering active position protection
- Verified ghost closure tests
- Edge case handling tests
- Fail-open bias validation
Files:
- lib/monitoring/drift-state-verifier.ts (276 lines added)
- tests/integration/drift-state-verifier/position-verification.test.ts (420 lines)
User can now rely on automatic orphan cleanup without risk of
accidentally closing active positions. System protects newer trades
when old database records exist for same symbol.
Deployed: Dec 10, 2025 ~11:25 CET
Problem: Verifier can't distinguish OLD positions from NEW positions at same symbol
- User opened manual trade with SL working
- Verifier detected 6 old closed DB records (150-1064 min ago)
- All showed "15.45 tokens open on Drift" (user's CURRENT trade!)
- Automatic retry close removed user's SL orders
Root Cause: Lines 279-283 call closePosition() for every mismatch
- No verification if Drift position is OLD (should close) or NEW (active trade)
- No position ID/timestamp matching
- Result: Closes ACTIVE trades when cleaning up old database records
Solution: DISABLED automatic retry close (lines 276-298)
- Added BUG #82 warning logs
- Requires manual intervention if true orphan detected
- Will add proper position verification in follow-up fix
Impact: Stops SL removal on active trades
User incident: After Bug #81 fix deployed, THIS bug was killing SLs
Deployment: Dec 10, 2025 11:06 CET
CRITICAL FIX (Dec 9, 2025): Drift state verifier now stops retry loop when close transaction confirms, preventing infinite retries that cancel orders.
Problem:
- Drift state verifier detected 'closed' positions still open on Drift
- Sent close transaction which CONFIRMED on-chain
- But Drift API still showed position (5-minute propagation delay)
- Verifier thought close failed, retried immediately
- Infinite loop: close → confirm → Drift still shows position → retry
- Eventually Position Manager gave up, cancelled ALL orders
- User's position left completely unprotected
Root Cause (Bug #80):
- Solana transaction confirms in ~400ms on-chain
- Drift.getPosition() caches state, takes 5+ minutes to update
- Verifier didn't account for propagation delay
- Kept retrying every 10 minutes because Drift API lagged behind
- Each retry attempt potentially cancelled orders as side effect
Solution:
- Check configSnapshot.retryCloseTime before retrying
- If last retry was <5 minutes ago, SKIP (wait for Drift to catch up)
- Log: 'Skipping retry - last attempt Xs ago (Drift propagation delay)'
- Prevents retry loop while Drift state propagates
- After 5 minutes, can retry if position truly stuck
Impact:
- Orders no longer disappear repeatedly due to retry loop
- Position stays protected with TP1/TP2/SL between retries
- User doesn't need to manually replace orders every 3 minutes
- System respects Drift API propagation delay
Testing:
- Deployed fix, orders placed successfully
- Database synced: tp1OrderTx and tp2OrderTx populated
- Monitoring logs for 'Skipping retry' messages on next verifier run
- Position tracking: 1 active trade, monitoring active
Note: This fixes the symptom (retry loop). Root cause is Drift SDK caching getPosition() results. Real fix would be to query on-chain state directly or increase cache TTL.
Files changed:
- lib/monitoring/drift-state-verifier.ts (added 5-minute skip window)
CRITICAL: Position Manager stops monitoring randomly
User had to manually close SOL-PERP position after PM stopped at 23:21.
Implemented double-checking system to detect when positions marked
closed in DB are still open on Drift (and vice versa):
1. DriftStateVerifier service (lib/monitoring/drift-state-verifier.ts)
- Runs every 10 minutes automatically
- Checks closed trades (24h) vs actual Drift positions
- Retries close if mismatch found
- Sends Telegram alerts
2. Manual verification API (app/api/monitoring/verify-drift-state)
- POST: Force immediate verification check
- GET: Service status
3. Integrated into startup (lib/startup/init-position-manager.ts)
- Auto-starts on container boot
- First check after 2min, then every 10min
STATUS: Build failing due to TypeScript compilation timeout
Need to fix and deploy, then investigate WHY Position Manager stops.
This addresses symptom (stuck positions) but not root cause (PM stopping).
User Request: Replace blind 2-hour restart timer with smart monitoring that only restarts when accountUnsubscribe errors actually occur
Changes:
. Health Monitor (NEW):
- Created lib/monitoring/drift-health-monitor.ts
- Tracks accountUnsubscribe errors in 30-second sliding window
- Triggers container restart via flag file when 50+ errors detected
- Prevents unnecessary restarts when SDK healthy
. Drift Client:
- Removed blind scheduleReconnection() and 2-hour timer
- Added interceptWebSocketErrors() to catch SDK errors
- Patches console.error to monitor for accountUnsubscribe patterns
- Starts health monitor after successful initialization
- Removed unused reconnect() method and reconnectTimer field
. Health API (NEW):
- GET /api/drift/health - Check current error count and health status
- Returns: healthy boolean, errorCount, threshold, message
- Useful for external monitoring and debugging
Impact:
- System only restarts when actual memory leak detected
- Prevents unnecessary downtime every 2 hours
- More targeted response to SDK issues
- Better operational stability
Files:
- lib/monitoring/drift-health-monitor.ts (NEW - 165 lines)
- lib/drift/client.ts (removed timer, added error interception)
- app/api/drift/health/route.ts (NEW - health check endpoint)
Testing:
- Health monitor starts on initialization: ✅
- API endpoint returns healthy status: ✅
- No blind reconnection scheduled: ✅