fix: Database-first cluster status detection + Stop button clarification

CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST

Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
   - Database is source of truth, SSH only for supplementary metrics
2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed - fixed by status detection

Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues

Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, workers on worker2
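
The database-first rule described under Changes can be sketched as a pure function. This is only an illustration, not the code in route.ts: the type names, the `resolveClusterStatus` helper, and its inputs (a count of running chunks plus per-worker SSH probe results) are all assumptions, since the commit page does not show the route's source. It also assumes the cluster still counts as active when the database is empty but SSH reports an active worker.

```typescript
// Hypothetical shapes; the real types in route.ts are not shown in this commit.
type WorkerStatus = "active" | "idle" | "offline";

interface WorkerProbe {
  name: string;
  status: WorkerStatus; // SSH probe result; "offline" when the probe timed out
}

interface ClusterStatus {
  status: "active" | "idle";
  activeWorkers: number;
  workers: WorkerProbe[];
}

// The database is consulted first: if it reports running chunks, workers that
// SSH marked "offline" are overridden to "active".
function resolveClusterStatus(
  runningChunks: number,
  probes: WorkerProbe[],
): ClusterStatus {
  const dbActive = runningChunks > 0;
  const workers: WorkerProbe[] = probes.map((w) =>
    dbActive && w.status === "offline" ? { ...w, status: "active" as const } : w,
  );
  const activeWorkers = workers.filter((w) => w.status === "active").length;
  return {
    // Assumption: SSH alone can still mark the cluster active when the DB is empty.
    status: dbActive || activeWorkers > 0 ? "active" : "idle",
    activeWorkers,
    workers,
  };
}

// Both SSH probes timed out, but the database shows running chunks.
const result = resolveClusterStatus(3, [
  { name: "worker1", status: "offline" },
  { name: "worker2", status: "offline" },
]);
console.log(result.status, result.activeWorkers); // prints "active 2"
```

Keeping the decision in a function with no I/O makes the override easy to test without a database or SSH access; the route would only need to gather the two inputs.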
45
run_sweep_epyc.sh
Executable file
@@ -0,0 +1,45 @@
#!/bin/bash
# v9 EXHAUSTIVE Parameter Sweep for AMD EPYC 7282 (16 cores)
# Optimized with 24 workers for maximum throughput
# Testing ALL 65,536 combinations in ~29 hours

echo "🚀 Starting EXHAUSTIVE v9 parameter sweep with 24 workers on EPYC..."
echo "📊 Testing ALL 65,536 parameter combinations (4×4×4×4×4×4×4×4 grid)"
echo "⏱️ Estimated completion: ~29 hours"
echo ""
echo "Parameter ranges:"
echo " flip_thresholds: 0.4, 0.5, 0.6, 0.7"
echo " ma_gap: 0.20, 0.30, 0.40, 0.50"
echo " momentum_adx: 18, 21, 24, 27"
echo " long_pos: 60, 65, 70, 75"
echo " short_pos: 20, 25, 30, 35"
echo " cooldown_bars: 1, 2, 3, 4"
echo " momentum_spacing: 2, 3, 4, 5"
echo " momentum_cooldown: 1, 2, 3, 4"
echo ""
echo "🎯 EXHAUSTIVE SEARCH - Every possible combination will be tested!"

# CRITICAL: Activate virtual environment before running Python
source .venv/bin/activate

nohup python3 scripts/run_backtest_sweep.py \
    --csv data/solusdt_5m.csv \
    --symbol SOL-PERP \
    --timeframe 5 \
    --position-size 8100 \
    --flip-thresholds "0.4,0.5,0.6,0.7" \
    --ma-gap-thresholds "0.20,0.30,0.40,0.50" \
    --momentum-adx "18,21,24,27" \
    --momentum-long-pos "60,65,70,75" \
    --momentum-short-pos "20,25,30,35" \
    --cooldown-bars "1,2,3,4" \
    --momentum-spacing "2,3,4,5" \
    --momentum-cooldown "1,2,3,4" \
    --workers 24 \
    --top 100 \
    --output sweep_v9_exhaustive_epyc.csv \
    > v9_sweep_epyc.log 2>&1 &

echo "✅ Background sweep started (PID: $!)"
echo "📋 Monitor progress: tail -f v9_sweep_epyc.log"
echo "📊 Results will be in: sweep_v9_exhaustive_epyc.csv"
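
As a sanity check on the numbers the script prints, the grid size and the per-backtest time budget implied by the ~29 hour estimate can be recomputed. This is a rough sketch: the `grid` object, `gridSize`, and `workerSecondsPerRun` are illustrative names that do not come from the repository, and the budget assumes all 24 workers stay evenly loaded for the whole run.

```typescript
// Values per parameter, copied from the flags passed to run_backtest_sweep.py.
const grid: Record<string, number[]> = {
  flipThresholds: [0.4, 0.5, 0.6, 0.7],
  maGap: [0.2, 0.3, 0.4, 0.5],
  momentumAdx: [18, 21, 24, 27],
  longPos: [60, 65, 70, 75],
  shortPos: [20, 25, 30, 35],
  cooldownBars: [1, 2, 3, 4],
  momentumSpacing: [2, 3, 4, 5],
  momentumCooldown: [1, 2, 3, 4],
};

// Number of combinations in a full grid: the product of the per-parameter counts.
function gridSize(g: Record<string, number[]>): number {
  return Object.keys(g).reduce((n, key) => n * g[key].length, 1);
}

// Worker-seconds available per combination, assuming evenly loaded workers.
function workerSecondsPerRun(combos: number, hours: number, workers: number): number {
  return (hours * 3600 * workers) / combos;
}

const combos = gridSize(grid);
console.log(combos); // prints 65536
console.log(workerSecondsPerRun(combos, 29, 24).toFixed(1)); // prints "38.2"
```

Eight parameters with four values each give 4^8 = 65,536 combinations, which matches the script's banner. Under the even-load assumption, the 29 hour estimate corresponds to roughly 38 seconds of worker time per backtest.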