fix: Database-first cluster status detection + Stop button clarification

CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST

Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: 'Cluster status: ACTIVE (database shows running chunks)'
   - Database is the source of truth; SSH is used only for supplementary metrics

2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed; the corrected status detection fixes it
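A minimal sketch of the two changes above, written in bash purely for illustration. The real logic is TypeScript in app/api/cluster/status/route.ts and app/cluster/page.tsx and is not shown in this commit. The function names `resolve_worker_status`, `resolve_cluster_status` and `button_for_status` are hypothetical, as are the fallbacks used when the database shows no running chunks.

```bash
#!/usr/bin/env bash
# Hypothetical bash illustration of the database-first rule; NOT the route.ts code.

# Resolve one worker's status, checking the database first.
#   $1: status from SSH detection (may be empty if SSH timed out)
#   $2: number of running chunks reported by the exploration database
resolve_worker_status() {
  local ssh_status="$1"
  local running_chunks="${2:-0}"
  if (( running_chunks > 0 )); then
    echo "active"            # database wins, even over 'offline' or a timeout
  elif [[ -n "$ssh_status" ]]; then
    echo "$ssh_status"       # assumption: no running chunks, so trust SSH
  else
    echo "offline"           # assumption: no chunks and no SSH answer
  fi
}

# Cluster status: any running chunk in the database means 'active'.
#   $1: total running chunks; $2: SSH-derived status (assumed fallback, default idle)
resolve_cluster_status() {
  if (( ${1:-0} > 0 )); then echo "active"; else echo "${2:-idle}"; fi
}

# Button the dashboard shows for a cluster status (Start on idle, Stop on active).
button_for_status() {
  case "$1" in
    idle)   echo "Start" ;;
    active) echo "Stop" ;;
    *)      echo "" ;;         # other statuses are not covered by this commit
  esac
}

status=$(resolve_cluster_status 3)
echo "Cluster: $status, button: $(button_for_status "$status")"  # Cluster: active, button: Stop
echo "Worker: $(resolve_worker_status offline 3)"                # Worker: active
```

Because the button only reads the resolved status, correcting the status is enough to make Stop appear, which matches the note that the page itself needed no changes.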

Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues

Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, plus worker processes on worker2

Author: mindesbunister
Date: 2025-11-30 22:23:01 +01:00
Parent: 83b4915d98
Commit: cc56b72df2
795 changed files with 312766 additions and 281 deletions

run_sweep_vanilla_epyc.sh (new executable file, 48 lines)

@@ -0,0 +1,48 @@
#!/bin/bash
# Run vanilla v9 parameter sweep on EPYC server
# Usage: ./run_sweep_vanilla_epyc.sh
set -e
# Activate virtual environment
source .venv/bin/activate
# Set PYTHONPATH to current directory
export PYTHONPATH="$(pwd)"
echo "=== Starting RAW v9 Parameter Sweep ==="
echo "Workers: 24 (EPYC optimized)"
echo "Started: $(date)"
echo "Output: sweep_v9_vanilla_epyc.csv"
echo "Log: v9_vanilla_sweep.log"
echo
# Run in background with nohup
nohup python scripts/run_backtest_sweep.py \
--csv data/solusdt_5m.csv \
--symbol SOLUSDT \
--timeframe 5m \
--flip-thresholds 0.4,0.5,0.6,0.7 \
--ma-gap-thresholds 0.20,0.30,0.40,0.50 \
--momentum-adx 18,21,24,27 \
--momentum-long-pos 60,65,70,75 \
--momentum-short-pos 20,25,30,35 \
--cooldown-bars 1,2,3,4 \
--momentum-spacing 2,3,4,5 \
--momentum-cooldown 1,2,3,4 \
--workers 24 \
> v9_vanilla_sweep.log 2>&1 &
SWEEP_PID=$!
echo "Sweep launched: PID $SWEEP_PID"
echo "Monitor: tail -f v9_vanilla_sweep.log"
echo
sleep 3
# Show initial output
echo "=== First 30 lines of log ==="
head -30 v9_vanilla_sweep.log
echo
echo "Sweep running in background (PID $SWEEP_PID)"
echo "Results will be saved to: sweep_v9_vanilla_epyc.csv"
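Assuming scripts/run_backtest_sweep.py evaluates the full Cartesian product of the value lists (the launcher script does not say so), the sweep covers eight parameters with four values each, so 4^8 = 65,536 combinations, or about 2,731 per worker at 24 workers. The hypothetical helper `grid_size` below reproduces that arithmetic from the same comma-separated lists, which is handy for estimating runtime before changing a list.

```bash
#!/usr/bin/env bash
# Hypothetical helper: size of a full parameter grid (assumes a Cartesian product).
grid_size() {
  local total=1 list
  local -a values
  for list in "$@"; do
    # Split one comma-separated list into an array and multiply by its length
    IFS=',' read -r -a values <<< "$list"
    total=$(( total * ${#values[@]} ))
  done
  echo "$total"
}

# Same lists as the --flip-thresholds ... --momentum-cooldown flags above
combos=$(grid_size 0.4,0.5,0.6,0.7 0.20,0.30,0.40,0.50 18,21,24,27 \
  60,65,70,75 20,25,30,35 1,2,3,4 2,3,4,5 1,2,3,4)
echo "Combinations: $combos"                              # Combinations: 65536
echo "Per worker (24 workers): $(( (combos + 23) / 24 ))" # Per worker (24 workers): 2731
```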