fix: Database-first cluster status detection + Stop button clarification
CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
- Query exploration database before SSH detection
- If running chunks exist, mark workers 'active' even if SSH fails
- Override worker status: 'offline' → 'active' when chunks running
- Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
- Database is the source of truth; SSH is used only for supplementary metrics
2. app/cluster/page.tsx:
- Stop button ALREADY EXISTS (conditionally shown)
- Shows Start when status='idle', Stop when status='active'
- No code changes needed - fixed by status detection
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, additional worker processes on worker2
New file: EPYC_SETUP_COMPREHENSIVE.md (149 lines)
# Running Comprehensive Sweep on EPYC Server

## Transfer Package to EPYC

```bash
# From your local machine
scp comprehensive_sweep_package.tar.gz root@72.62.39.24:/root/
```

## Setup on EPYC

```bash
# SSH to EPYC
ssh root@72.62.39.24

# Extract package
cd /root
tar -xzf comprehensive_sweep_package.tar.gz
cd comprehensive_sweep

# Setup Python environment
python3 -m venv .venv
source .venv/bin/activate
pip install pandas numpy

# Create logs directory
mkdir -p backtester/logs

# Make scripts executable
chmod +x run_comprehensive_sweep.sh
chmod +x backtester/scripts/comprehensive_sweep.py
```

## Run the Sweep

```bash
# Start the sweep in background
./run_comprehensive_sweep.sh

# Or manually with more control:
cd /root/comprehensive_sweep
source .venv/bin/activate
nohup python3 backtester/scripts/comprehensive_sweep.py > sweep.log 2>&1 &

# Get the PID
echo $! > sweep.pid
```

## Monitor Progress

```bash
# Watch live progress (updates every 100 configs)
tail -f backtester/logs/sweep_comprehensive_*.log

# Or if using manual method:
tail -f sweep.log

# See current best result
grep 'Best so far' backtester/logs/sweep_comprehensive_*.log | tail -5

# Check if still running
ps aux | grep comprehensive_sweep

# Check CPU usage
htop
```

## Stop if Needed

```bash
# Using PID file:
kill $(cat sweep.pid)

# Or by name:
pkill -f comprehensive_sweep
```

## EPYC Performance Estimate

- **Your EPYC:** 16 cores/32 threads
- **Local Server:** 6 cores
- **Speedup:** ~5-6× faster on EPYC

**Total combinations:** 14,929,920

**Estimated times:**
- Local (6 cores): ~30-40 hours
- EPYC (16 cores): ~6-8 hours 🚀
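
These estimates can be sanity-checked with a little arithmetic. The sketch below assumes throughput scales roughly with usable thread count (SMT threads included), which is an optimistic simplification; real scaling is usually somewhat worse:

```python
# Back-of-the-envelope check of the estimates above.
# Assumption: throughput scales roughly with thread count (SMT included);
# real-world scaling is usually a bit worse than linear.
total_combos = 14_929_920

local_threads = 6        # local server: 6 cores
local_hours = 35         # midpoint of the ~30-40 hour local estimate
per_thread_rate = total_combos / (local_hours * 3600 * local_threads)  # configs/sec per thread

epyc_threads = 32        # EPYC: 16 cores / 32 threads
epyc_hours = total_combos / (per_thread_rate * epyc_threads * 3600)

print(f"Per-thread throughput: ~{per_thread_rate:.0f} configs/sec")
print(f"Projected EPYC runtime: ~{epyc_hours:.1f} hours (estimate above: 6-8 hours)")
```
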
## Retrieve Results

```bash
# After completion, download results
scp root@72.62.39.24:/root/comprehensive_sweep/sweep_comprehensive.csv .

# Check top results on the server first (header + top 20 rows):
head -21 /root/comprehensive_sweep/sweep_comprehensive.csv
```

## Results Format

CSV columns:
- rank
- trades
- win_rate
- total_pnl
- pnl_per_1k (most important - profitability per $1000)
- flip_threshold
- ma_gap
- adx_min
- long_pos_max
- short_pos_min
- cooldown
- position_size
- tp1_mult
- tp2_mult
- sl_mult
- tp1_close_pct
- trailing_mult
- vol_min
- max_bars
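
To rank and filter the downloaded CSV by these columns, here is a minimal sketch using the pandas installed during setup; it assumes the file has a header row with the column names listed above:

```python
# Sketch: inspect sweep results with pandas.
# Assumes sweep_comprehensive.csv has a header row matching the columns above.
import pandas as pd

df = pd.read_csv("sweep_comprehensive.csv")

# Top 10 configurations by profitability per $1000.
top = df.sort_values("pnl_per_1k", ascending=False).head(10)
print(top[["rank", "trades", "win_rate", "total_pnl", "pnl_per_1k"]])

# Optionally require a minimum trade count (threshold is illustrative) before ranking.
robust = df[df["trades"] >= 30].sort_values("pnl_per_1k", ascending=False)
print(robust.head(10)[["rank", "trades", "win_rate", "pnl_per_1k"]])
```
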
## Quick Test

Before running the full sweep, test that everything works:

```bash
cd /root/comprehensive_sweep
source .venv/bin/activate

# Quick smoke test with a single configuration
python3 -c "
from pathlib import Path
from backtester.data_loader import load_csv
from backtester.simulator import simulate_money_line, TradeConfig
from backtester.indicators.money_line import MoneyLineInputs

data_slice = load_csv(Path('backtester/data/solusdt_5m_aug_nov.csv'), 'SOL-PERP', '5m')
print(f'Loaded {len(data_slice.data)} candles')

inputs = MoneyLineInputs(flip_threshold_percent=0.6)
config = TradeConfig(position_size=210.0)
results = simulate_money_line(data_slice.data, 'SOL-PERP', inputs, config)
print(f'Test: {len(results.trades)} trades, {results.win_rate*100:.1f}% WR, \${results.total_pnl:.2f} P&L')
print('✅ Everything working!')
"
```

If the test passes, run the full sweep!