Commit Graph

6 Commits

copilot-swe-agent[bot]
63b94016fe fix: Implement critical risk management fixes for bugs #76, #77, #78, #80
Co-authored-by: mindesbunister <32161838+mindesbunister@users.noreply.github.com>
2025-12-09 22:23:43 +00:00
mindesbunister
1ed909c661 fix: Stop Drift verifier retry loop cancelling orders (Bug #80)
CRITICAL FIX (Dec 9, 2025): Drift state verifier now stops retry loop when close transaction confirms, preventing infinite retries that cancel orders.

Problem:
- Drift state verifier detected 'closed' positions still open on Drift
- Sent close transaction which CONFIRMED on-chain
- But Drift API still showed position (5-minute propagation delay)
- Verifier thought close failed, retried immediately
- Infinite loop: close → confirm → Drift still shows position → retry
- Eventually Position Manager gave up, cancelled ALL orders
- User's position left completely unprotected

Root Cause (Bug #80):
- Solana transaction confirms in ~400ms on-chain
- Drift.getPosition() caches state, takes 5+ minutes to update
- Verifier didn't account for propagation delay
- Kept retrying every 10 minutes because Drift API lagged behind
- Each retry attempt potentially cancelled orders as a side effect

Solution:
- Check configSnapshot.retryCloseTime before retrying (see the sketch after this list)
- If last retry was <5 minutes ago, SKIP (wait for Drift to catch up)
- Log: 'Skipping retry - last attempt Xs ago (Drift propagation delay)'
- Prevents retry loop while Drift state propagates
- After 5 minutes, can retry if position truly stuck
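
A minimal sketch of that check; shouldRetryClose and the constant name are illustrative assumptions, since the commit only names configSnapshot.retryCloseTime:

    // Hypothetical sketch of the 5-minute propagation skip window.
    const DRIFT_PROPAGATION_WINDOW_MS = 5 * 60 * 1000;

    function shouldRetryClose(retryCloseTime: number | null, now = Date.now()): boolean {
      if (retryCloseTime === null) return true; // never retried: safe to attempt
      const elapsedMs = now - retryCloseTime;
      if (elapsedMs < DRIFT_PROPAGATION_WINDOW_MS) {
        console.log(`Skipping retry - last attempt ${Math.round(elapsedMs / 1000)}s ago (Drift propagation delay)`);
        return false; // wait for Drift's cached state to catch up
      }
      return true; // position may be truly stuck; allow another attempt
    }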

Impact:
- Orders no longer disappear repeatedly due to retry loop
- Position stays protected with TP1/TP2/SL between retries
- User doesn't need to manually replace orders every 3 minutes
- System respects Drift API propagation delay

Testing:
- Deployed fix, orders placed successfully
- Database synced: tp1OrderTx and tp2OrderTx populated
- Monitoring logs for 'Skipping retry' messages on next verifier run
- Position tracking: 1 active trade, monitoring active

Note: This fixes the symptom (retry loop). The root cause is the Drift SDK caching getPosition() results. A real fix would query on-chain state directly or increase the cache TTL.

Files changed:
- lib/monitoring/drift-state-verifier.ts (added 5-minute skip window)
2025-12-09 21:04:29 +01:00
mindesbunister
4ab7bf58da feat: Drift state verifier double-checking system (WIP - build issues)
CRITICAL: Position Manager stops monitoring randomly
User had to manually close SOL-PERP position after PM stopped at 23:21.

Implemented double-checking system to detect when positions marked
closed in DB are still open on Drift (and vice versa):

1. DriftStateVerifier service (lib/monitoring/drift-state-verifier.ts)
   - Runs every 10 minutes automatically
   - Checks closed trades (24h) vs actual Drift positions
   - Retries close if mismatch found
   - Sends Telegram alerts

2. Manual verification API (app/api/monitoring/verify-drift-state)
   - POST: Force immediate verification check
   - GET: Service status

3. Integrated into startup (lib/startup/init-position-manager.ts)
   - Auto-starts on container boot
   - First check after 2min, then every 10min
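
A hedged sketch of how pieces 1 and 3 could fit together; every name and shape below is an assumption, since the commit only lists the files involved:

    // Hypothetical sketch of the verifier loop and its startup schedule.
    const FIRST_CHECK_DELAY_MS = 2 * 60 * 1000; // first check after 2 minutes
    const CHECK_INTERVAL_MS = 10 * 60 * 1000;   // then every 10 minutes

    interface ClosedTrade { id: string; market: string; }
    interface DriftPosition { market: string; baseSize: number; }

    export class DriftStateVerifier {
      constructor(
        private fetchRecentlyClosed: () => Promise<ClosedTrade[]>, // closed in DB, last 24h
        private fetchDriftPositions: () => Promise<DriftPosition[]>,
        private retryClose: (market: string) => Promise<void>,
        private alert: (msg: string) => Promise<void>,
      ) {}

      start(): void {
        setTimeout(() => {
          void this.verify();
          setInterval(() => void this.verify(), CHECK_INTERVAL_MS);
        }, FIRST_CHECK_DELAY_MS);
      }

      async verify(): Promise<void> {
        const [closed, positions] = await Promise.all([
          this.fetchRecentlyClosed(),
          this.fetchDriftPositions(),
        ]);
        for (const trade of closed) {
          const stillOpen = positions.some(
            (p) => p.market === trade.market && p.baseSize !== 0,
          );
          if (stillOpen) {
            // DB says closed but Drift still shows it: alert and retry the close.
            await this.alert(`Mismatch: ${trade.market} closed in DB but open on Drift`);
            await this.retryClose(trade.market);
          }
        }
      }
    }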

STATUS: Build failing due to TypeScript compilation timeout
Need to fix and deploy, then investigate WHY Position Manager stops.

This addresses symptom (stuck positions) but not root cause (PM stopping).
2025-12-07 02:28:10 +01:00
mindesbunister
302511293c feat: Add production logging gating (Phase 1, Task 1.1)
- Created logger utility with environment-based gating (lib/utils/logger.ts; see the sketch after this list)
- Replaced 517 of 731 console.log statements with logger.log (71% immediate reduction)
- Fixed import paths in 15 files (resolved comment-trapped imports)
- Added DEBUG_LOGS=false to .env
- Expected 90% reduction in production when deployed
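
A minimal sketch of such an environment-gated logger, assuming the real lib/utils/logger.ts reads DEBUG_LOGS the obvious way (only the flag name appears in this commit):

    // Hypothetical lib/utils/logger.ts: debug output gated on DEBUG_LOGS.
    const debugEnabled = process.env.DEBUG_LOGS === 'true';

    export const logger = {
      // Suppressed in production unless DEBUG_LOGS=true.
      log: (...args: unknown[]): void => {
        if (debugEnabled) console.log(...args);
      },
      // Errors always surface regardless of the flag.
      error: (...args: unknown[]): void => console.error(...args),
    };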

Impact: Reduced I/O blocking, lower log volume in production
Risk: LOW (easy rollback, non-invasive)
Phase: Phase 1, Task 1.1 (Quick Wins - Console.log Production Gating)

Files changed:
- NEW: lib/utils/logger.ts (production-safe logging)
- NEW: scripts/replace-console-logs.js (automation tool)
- Modified: 15 lib/*.ts files (console.log → logger.log)
- Modified: .env (DEBUG_LOGS=false)

Next: Task 1.2 (Image Size Optimization)
2025-12-05 00:32:41 +01:00
mindesbunister
f420d98d55 critical: Make health monitor 3-4x more aggressive to prevent heap crashes
PROBLEM (Nov 27, 2025 - 11:53 UTC):
- accountUnsubscribe errors accumulated 200+ times in 2 seconds
- JavaScript heap out of memory crash BEFORE health monitor could trigger
- Old settings: 50 errors / 30s window / check every 10s = too slow
- Container crashed from memory exhaustion, not clean restart

SOLUTION - 3-4x FASTER RESPONSE:
- Error window: 30s → 10s (3× faster detection)
- Error threshold: 50 → 20 errors (2.5× more sensitive)
- Check frequency: 10s → 3s intervals (3× more frequent)
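
Expressed as constants, the change amounts to roughly this (names are illustrative; the commit only cites lines 13-14 and 32 of drift-health-monitor.ts):

    // Before (hypothetical names):
    //   ERROR_WINDOW_MS = 30_000   // 30s sliding window
    //   ERROR_THRESHOLD = 50       // errors before restart
    //   CHECK_INTERVAL_MS = 10_000 // poll every 10s

    // After: 3-4x more aggressive.
    const ERROR_WINDOW_MS = 10_000;  // 3x faster detection
    const ERROR_THRESHOLD = 20;      // 2.5x more sensitive
    const CHECK_INTERVAL_MS = 3_000; // 3x more frequent checks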

IMPACT:
- Before: 10-40 seconds to trigger restart
- After: 3-13 seconds to trigger restart (3-4× faster)
- Catches rapid error accumulation BEFORE heap exhaustion
- Clean restart instead of crash-and-recover

REAL INCIDENT TIMELINE:
11:53:43 - Errors start accumulating
11:53:45.606 - FATAL: heap out of memory (2.2 seconds)
11:53:47.803 - Docker restart (not health monitor)

NEW BEHAVIOR:
- 20 errors within the 10s window trips the restart; at ~100ms/error that accumulates in ~2s
- 3s check interval catches problem in 3-13s MAX
- Clean restart before memory leak causes crash

Files Changed:
- lib/monitoring/drift-health-monitor.ts (lines 13-14, 32)
2025-11-27 13:04:14 +01:00
mindesbunister
dc197f52a4 feat: Replace blind 2-hour reconnect with error-based health monitoring
User Request: Replace blind 2-hour restart timer with smart monitoring that only restarts when accountUnsubscribe errors actually occur

Changes:
1. Health Monitor (NEW):
- Created lib/monitoring/drift-health-monitor.ts
- Tracks accountUnsubscribe errors in 30-second sliding window
- Triggers container restart via flag file when 50+ errors detected
- Prevents unnecessary restarts when SDK healthy
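
A hedged sketch of the sliding-window tracker and flag-file trigger; the flag path and function shape are assumptions:

    import { writeFileSync } from 'fs';

    const WINDOW_MS = 30_000; // 30-second sliding window
    const THRESHOLD = 50;     // restart at 50+ errors
    const RESTART_FLAG = '/tmp/drift-restart-requested'; // assumed path

    const timestamps: number[] = [];

    export function recordUnsubscribeError(now = Date.now()): void {
      timestamps.push(now);
      // Evict entries older than the window.
      while (timestamps.length > 0 && now - timestamps[0] > WINDOW_MS) {
        timestamps.shift();
      }
      if (timestamps.length >= THRESHOLD) {
        // A supervisor (e.g. Docker healthcheck) watches for this file.
        writeFileSync(RESTART_FLAG, new Date(now).toISOString());
      }
    }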

2. Drift Client:
- Removed blind scheduleReconnection() and 2-hour timer
- Added interceptWebSocketErrors() to catch SDK errors
- Patches console.error to monitor for accountUnsubscribe patterns
- Starts health monitor after successful initialization
- Removed unused reconnect() method and reconnectTimer field
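
The console.error patch might look roughly like this (a sketch; the actual matcher in lib/drift/client.ts is not shown in the commit):

    // Hypothetical sketch of interceptWebSocketErrors(): wrap console.error
    // so SDK-internal accountUnsubscribe failures feed the health monitor.
    function interceptWebSocketErrors(onError: () => void): void {
      const originalError = console.error.bind(console);
      console.error = (...args: unknown[]): void => {
        if (args.map(String).join(' ').includes('accountUnsubscribe')) {
          onError(); // count toward the sliding-window threshold
        }
        originalError(...args); // preserve normal error output
      };
    }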

3. Health API (NEW):
- GET /api/drift/health - Check current error count and health status
- Returns: healthy boolean, errorCount, threshold, message
- Useful for external monitoring and debugging
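
Given the described response shape, a Next.js route handler could look like this sketch (the monitor exports are assumed):

    // Hypothetical app/api/drift/health/route.ts
    import { NextResponse } from 'next/server';
    // Assumed exports: the monitor would expose its current count and threshold.
    import { getErrorCount, ERROR_THRESHOLD } from '@/lib/monitoring/drift-health-monitor';

    export async function GET() {
      const errorCount = getErrorCount();
      const healthy = errorCount < ERROR_THRESHOLD;
      return NextResponse.json({
        healthy,
        errorCount,
        threshold: ERROR_THRESHOLD,
        message: healthy
          ? 'Drift SDK healthy'
          : 'accountUnsubscribe errors above threshold',
      });
    }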

Impact:
- System only restarts when actual memory leak detected
- Prevents unnecessary downtime every 2 hours
- More targeted response to SDK issues
- Better operational stability

Files:
- lib/monitoring/drift-health-monitor.ts (NEW - 165 lines)
- lib/drift/client.ts (removed timer, added error interception)
- app/api/drift/health/route.ts (NEW - health check endpoint)

Testing:
- Health monitor starts on initialization: ✓
- API endpoint returns healthy status: ✓
- No blind reconnection scheduled: ✓
2025-11-24 16:49:10 +01:00