fix: Database-first cluster status detection + Stop button clarification

CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST

Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: ' Cluster status: ACTIVE (database shows running chunks)'
   - Database is source of truth, SSH only for supplementary metrics

2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed - fixed by status detection

Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues

Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, workers on worker2
mindesbunister
2025-11-30 22:23:01 +01:00
parent 83b4915d98
commit cc56b72df2
795 changed files with 312766 additions and 281 deletions

EPYC_SETUP_COMPREHENSIVE.md Normal file

@@ -0,0 +1,149 @@
# Running Comprehensive Sweep on EPYC Server
## Transfer Package to EPYC
```bash
# From your local machine
scp comprehensive_sweep_package.tar.gz root@72.62.39.24:/root/
```
## Setup on EPYC
```bash
# SSH to EPYC
ssh root@72.62.39.24
# Extract package
cd /root
tar -xzf comprehensive_sweep_package.tar.gz
cd comprehensive_sweep
# Setup Python environment
python3 -m venv .venv
source .venv/bin/activate
pip install pandas numpy
# Create logs directory
mkdir -p backtester/logs
# Make scripts executable
chmod +x run_comprehensive_sweep.sh
chmod +x backtester/scripts/comprehensive_sweep.py
```
## Run the Sweep
```bash
# Start the sweep in background
./run_comprehensive_sweep.sh
# Or manually with more control:
cd /root/comprehensive_sweep
source .venv/bin/activate
nohup python3 backtester/scripts/comprehensive_sweep.py > sweep.log 2>&1 &
# Save the PID (run this immediately after the nohup line, since $! holds the last background job)
echo $! > sweep.pid
```
## Monitor Progress
```bash
# Watch live progress (updates every 100 configs)
tail -f backtester/logs/sweep_comprehensive_*.log
# Or if using manual method:
tail -f sweep.log
# See current best result
grep 'Best so far' backtester/logs/sweep_comprehensive_*.log | tail -5
# Check if still running (pgrep does not match itself, unlike ps | grep)
pgrep -af comprehensive_sweep
# Check CPU usage
htop
```
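When the sweep was started with the manual method, the PID stored in `sweep.pid` can also be checked from a script. The following is a minimal sketch, not part of the sweep package: the helper names `pid_is_alive` and `sweep_is_running` are hypothetical, and it assumes a POSIX system where signal 0 only tests for the process's existence.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def sweep_is_running(pid_file: str = "sweep.pid") -> bool:
    """Check the PID recorded by `echo $! > sweep.pid` (hypothetical helper)."""
    path = Path(pid_file)
    if not path.exists():
        return False
    try:
        pid = int(path.read_text().strip())
    except ValueError:
        return False  # file exists but does not contain a number
    return pid > 0 and pid_is_alive(pid)


if __name__ == "__main__":
    print("running" if sweep_is_running() else "not running")
```

A PID can be reused by an unrelated process after the sweep exits, so a positive answer is a hint rather than proof; the log tail remains the more reliable signal.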
## Stop if Needed
```bash
# Using PID file:
kill $(cat sweep.pid)
# Or by name:
pkill -f comprehensive_sweep
```
## EPYC Performance Estimate
- **Your EPYC:** 16 cores/32 threads
- **Local Server:** 6 cores
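- **Speedup:** ~5-6× faster on EPYC, assuming the sweep keeps all 32 threads busy (by physical core count alone the ratio is 16/6 ≈ 2.7×)

**Total combinations:** 14,929,920

**Estimated times:**
- Local (6 cores): ~30-40 hours
- EPYC (16 cores/32 threads): ~6-8 hours 🚀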
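
The time estimates can be turned into a throughput figure to compare against what the log actually reports. The sketch below is plain arithmetic on the numbers in this section; the function names are hypothetical, and the ETA assumes throughput stays constant for the whole run. The 6-8 hour estimate implies roughly 520 to 690 configurations per second, and the 30-40 hour estimate roughly 104 to 138.

```python
TOTAL_COMBINATIONS = 14_929_920  # total from the estimate in this section


def implied_rate(total: int, hours: float) -> float:
    """Configurations per second needed to finish `total` in `hours`."""
    return total / (hours * 3600)


def eta_hours(total: int, done: int, elapsed_hours: float) -> float:
    """Remaining hours, assuming the observed throughput stays constant."""
    if done <= 0 or elapsed_hours <= 0:
        raise ValueError("need at least some completed work to extrapolate")
    rate_per_hour = done / elapsed_hours
    return (total - done) / rate_per_hour


if __name__ == "__main__":
    for label, hours in [("EPYC, 6 h", 6), ("EPYC, 8 h", 8),
                         ("local, 30 h", 30), ("local, 40 h", 40)]:
        rate = implied_rate(TOTAL_COMBINATIONS, hours)
        print(f"{label}: {rate:.0f} configs/s")
```

If the count of completed configurations after the first hour is well below the implied rate, `eta_hours` gives a more realistic finish time than the table.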
## Retrieve Results
```bash
# On the server, after completion: check the top results first (header row + top 20)
head -21 /root/comprehensive_sweep/sweep_comprehensive.csv
# Then, from your local machine: download the results
scp root@72.62.39.24:/root/comprehensive_sweep/sweep_comprehensive.csv .
```
## Results Format
CSV columns:
- rank
- trades
- win_rate
- total_pnl
- pnl_per_1k (most important - profitability per $1000)
- flip_threshold
- ma_gap
- adx_min
- long_pos_max
- short_pos_min
- cooldown
- position_size
- tp1_mult
- tp2_mult
- sl_mult
- tp1_close_pct
- trailing_mult
- vol_min
- max_bars
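
Once the CSV is downloaded, the columns above can be used to re-rank or filter results locally. This is a minimal sketch using only the standard library; it assumes the file has a header row with exactly these column names and numeric values in `trades` and `pnl_per_1k`. The function `top_configs` and its `min_trades` filter are illustrative, not part of the sweep package.

```python
import csv
from typing import Dict, List


def top_configs(csv_path: str, n: int = 5, min_trades: int = 0) -> List[Dict[str, str]]:
    """Return the n rows with the highest pnl_per_1k, skipping low-trade configs."""
    with open(csv_path, newline="") as fh:
        rows = [
            row for row in csv.DictReader(fh)
            if float(row["trades"]) >= min_trades
        ]
    # Sort by the column the guide calls most important, best first.
    rows.sort(key=lambda row: float(row["pnl_per_1k"]), reverse=True)
    return rows[:n]
```

For example, `top_configs("sweep_comprehensive.csv", n=5, min_trades=50)` would return the five best configurations that made at least 50 trades, which guards against a high `pnl_per_1k` produced by a handful of lucky trades.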
## Quick Test
Before running the full sweep, test that everything works:
```bash
cd /root/comprehensive_sweep
source .venv/bin/activate
# Quick test with a single configuration
python3 -c "
from pathlib import Path
from backtester.data_loader import load_csv
from backtester.simulator import simulate_money_line, TradeConfig
from backtester.indicators.money_line import MoneyLineInputs
data_slice = load_csv(Path('backtester/data/solusdt_5m_aug_nov.csv'), 'SOL-PERP', '5m')
print(f'Loaded {len(data_slice.data)} candles')
inputs = MoneyLineInputs(flip_threshold_percent=0.6)
config = TradeConfig(position_size=210.0)
results = simulate_money_line(data_slice.data, 'SOL-PERP', inputs, config)
print(f'Test: {len(results.trades)} trades, {results.win_rate*100:.1f}% WR, \${results.total_pnl:.2f} P&L')
print('✅ Everything working!')
"
```
If the test passes, run the full sweep!