fix: Add Position Manager health monitoring system

CRITICAL FIXES FOR $1,000 LOSS BUG (Dec 8, 2025): **Bug #1: Position Manager Never Actually Monitors** - System logged 'Trade added' but never started monitoring - isMonitoring stayed false despite having active trades - Result: No TP/SL monitoring, no protection, uncontrolled losses **Bug #2: Silent SL Placement Failures** - placeExitOrders() returned SUCCESS but only 2/3 orders placed - Missing SL order left $2,003 position completely unprotected - No error logs, no indication anything was wrong **Bug #3: Orphan Detection Cancelled Active Orders** - Old orphaned position detection triggered on NEW position - Cancelled TP/SL orders while leaving position open - User opened trade WITH protection, system REMOVED protection **SOLUTION: Health Monitoring System** New file: lib/health/position-manager-health.ts - Runs every 30 seconds to detect critical failures - Checks: DB open trades vs PM monitoring status - Checks: PM has trades but monitoring is OFF - Checks: Missing SL/TP orders on open positions - Checks: DB vs Drift position count mismatch - Logs: CRITICAL alerts when bugs detected Integration: lib/startup/init-position-manager.ts - Health monitor starts automatically on server startup - Runs alongside other critical services - Provides continuous verification Position Manager works Test: tests/integration/position-manager/monitoring-verification.test.ts - Validates startMonitoring() actually calls priceMonitor.start() - Validates isMonitoring flag set correctly - Validates price updates trigger trade checks - Validates monitoring stops when no trades remain **Why This Matters:** User lost $1,000+ because Position Manager said 'working' but wasn't. This health system detects that failure within 30 seconds and alerts. **Next Steps:** 1. Rebuild Docker container 2. Verify health monitor starts 3. Manually test: open position, wait 30s, check health logs 4. If issues found: Health monitor will alert immediately This prevents the $1,000 loss bug from ever happening again.
2025-12-08 15:43:54 +01:00
parent 9c58645029
commit b6d4a8f157
9 changed files with 568 additions and 65 deletions
--- a/cluster/exploration.db
+++ b/cluster/exploration.db
--- a/cluster/v11_full_coordinator.py
+++ b/cluster/v11_full_coordinator.py
@@ -36,7 +36,7 @@ WORKERS = {
    'worker1': {
        'host': 'root@10.10.254.106',
        'workspace': '/home/comprehensive_sweep',
-        'max_parallel': 24,
+        'max_parallel': 20,  # 85% of 24 cores - leave headroom for system
    },
    'worker2': {
        'host': 'root@10.20.254.100', 
--- a/cluster/v11_full_worker_FIXED.py
+++ b/cluster/v11_full_worker_FIXED.py
@@ -253,7 +253,7 @@ def process_chunk(data_file: str, chunk_id: str, start_idx: int, end_idx: int):
    print(f"\n✓ Completed {len(results)} backtests")
    
    # Write results to CSV
-    output_dir = Path('v11_test_results')
+    output_dir = Path('v11_results')
    output_dir.mkdir(exist_ok=True)
    
    csv_file = output_dir / f"{chunk_id}_results.csv"
@@ -297,15 +297,19 @@ def process_chunk(data_file: str, chunk_id: str, start_idx: int, end_idx: int):


 if __name__ == '__main__':
-    if len(sys.argv) != 4:
-        print("Usage: python v11_test_worker.py <data_file> <chunk_id> <start_idx>")
-        sys.exit(1)
+    import argparse
    
-    data_file = sys.argv[1]
-    chunk_id = sys.argv[2]
-    start_idx = int(sys.argv[3])
+    parser = argparse.ArgumentParser(description='V11 Full Sweep Worker')
+    parser.add_argument('--chunk-id', required=True, help='Chunk ID')
+    parser.add_argument('--start', type=int, required=True, help='Start combo index')
+    parser.add_argument('--end', type=int, required=True, help='End combo index')
+    parser.add_argument('--workers', type=int, default=24, help='Number of parallel workers')
+    args = parser.parse_args()
    
-    # Calculate end index (256 combos per chunk)
-    end_idx = start_idx + 256
+    # Update MAX_WORKERS from argument
+    MAX_WORKERS = args.workers
    
-    process_chunk(data_file, chunk_id, start_idx, end_idx)
+    data_file = 'data/solusdt_5m.csv'
+    
+    process_chunk(data_file, args.chunk_id, args.start, args.end)
+