CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
- Query exploration database before SSH detection
- If running chunks exist, mark workers 'active' even if SSH fails
- Override worker status: 'offline' → 'active' when chunks running
- Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
- Database is source of truth, SSH only for supplementary metrics
2. app/cluster/page.tsx:
- Stop button ALREADY EXISTS (conditionally shown)
- Shows Start when status='idle', Stop when status='active'
- No code changes needed - fixed by status detection
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, workers on worker2
# Running Comprehensive Sweep on EPYC Server

## Transfer Package to EPYC

```bash
# From your local machine
scp comprehensive_sweep_package.tar.gz root@72.62.39.24:/root/
```

## Setup on EPYC

```bash
# SSH to EPYC
ssh root@72.62.39.24

# Extract package
cd /root
tar -xzf comprehensive_sweep_package.tar.gz
cd comprehensive_sweep

# Setup Python environment
python3 -m venv .venv
source .venv/bin/activate
pip install pandas numpy

# Create logs directory
mkdir -p backtester/logs

# Make scripts executable
chmod +x run_comprehensive_sweep.sh
chmod +x backtester/scripts/comprehensive_sweep.py
```

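Before launching a multi-hour run, it can help to confirm that the virtual environment actually sees the two packages installed above. The following is a minimal sketch, not part of the package: the helper name `missing_packages` is hypothetical, and it only checks that the modules can be located, not that their versions are suitable.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # pandas and numpy are the two packages installed in the setup step.
    missing = missing_packages(["pandas", "numpy"])
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("pandas and numpy are importable")
```

Run it with the virtual environment activated so that it inspects `.venv` rather than the system interpreter.
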
## Run the Sweep

```bash
# Start the sweep in background
./run_comprehensive_sweep.sh

# Or manually with more control:
cd /root/comprehensive_sweep
source .venv/bin/activate
nohup python3 backtester/scripts/comprehensive_sweep.py > sweep.log 2>&1 &

# Save the PID of the background job (run this immediately after the nohup line)
echo $! > sweep.pid
```

## Monitor Progress

```bash
# Watch live progress (updates every 100 configs)
tail -f backtester/logs/sweep_comprehensive_*.log

# Or, if you used the manual method:
tail -f sweep.log

# See current best result
grep 'Best so far' backtester/logs/sweep_comprehensive_*.log | tail -5

# Check if still running (pgrep does not match itself, unlike ps | grep)
pgrep -af comprehensive_sweep

# Check CPU usage
htop
```

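The `grep ... | tail -5` check above can also be done from Python, for example inside a small status script. This is a sketch under assumptions: the function name `last_matching_lines` is hypothetical, and it relies only on the `Best so far` marker and the log path pattern shown above, not on any particular format for the rest of the line.

```python
from collections import deque
from pathlib import Path


def last_matching_lines(log_path, needle="Best so far", count=5):
    """Return the last `count` lines of `log_path` that contain `needle`."""
    matches = deque(maxlen=count)  # older matches fall off automatically
    with Path(log_path).open("r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if needle in line:
                matches.append(line.rstrip("\n"))
    return list(matches)


if __name__ == "__main__":
    import glob

    # Same path pattern as the grep command in the monitoring section.
    for path in sorted(glob.glob("backtester/logs/sweep_comprehensive_*.log")):
        for line in last_matching_lines(path):
            print(line)
```

Unlike the shell pipeline, which keeps the last five matches across all log files combined, this version reports up to five matches per file.
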
## Stop if Needed

```bash
# Using the PID file:
kill $(cat sweep.pid)

# Or by name:
pkill -f comprehensive_sweep
```

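Before sending `kill`, it may be worth checking that the PID stored in `sweep.pid` still refers to a live process. The sketch below assumes a Linux or other POSIX host and the `sweep.pid` file written by the manual launch; the function name `sweep_is_running` is hypothetical.

```python
import os
from pathlib import Path


def sweep_is_running(pid_file="sweep.pid"):
    """Return True if the PID stored in `pid_file` refers to a live process."""
    try:
        pid = int(Path(pid_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        return False  # no PID file, or its content is not a number
    try:
        os.kill(pid, 0)  # signal 0 only checks existence; nothing is delivered
    except ProcessLookupError:
        return False  # no process with that PID
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


if __name__ == "__main__":
    print("running" if sweep_is_running() else "not running")
```

A live PID does not prove the process is the sweep, since the kernel can reuse PIDs after the original process exits; the process-name check from the monitoring section is a useful cross-check.
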
## EPYC Performance Estimate

- **Your EPYC:** 16 cores/32 threads
- **Local Server:** 6 cores
- **Speedup:** ~5-6× faster on EPYC

**Total combinations:** 14,929,920

**Estimated times:**
- Local (6 cores): ~30-40 hours
- EPYC (16 cores): ~6-8 hours 🚀

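The estimates above imply a throughput that can be compared against the live log once the sweep is running. The following is plain arithmetic on the figures quoted in this section; no measured data is involved, and the helper name `configs_per_second` is hypothetical.

```python
TOTAL_COMBINATIONS = 14_929_920


def configs_per_second(total, hours):
    """Average throughput needed to finish `total` configs in `hours`."""
    return total / (hours * 3600)


# (fastest estimate, slowest estimate) in hours, as quoted above.
estimates = {"Local (6 cores)": (30, 40), "EPYC (16 cores)": (6, 8)}

for label, (fast_hours, slow_hours) in estimates.items():
    low = configs_per_second(TOTAL_COMBINATIONS, slow_hours)
    high = configs_per_second(TOTAL_COMBINATIONS, fast_hours)
    print(f"{label}: roughly {low:.0f}-{high:.0f} configs/s")
```

If the rate observed in the log is well below the implied EPYC range of about 518 to 691 configs per second, the 6-8 hour estimate will not hold.
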
## Retrieve Results

```bash
# After completion, download results (run from your local machine)
scp root@72.62.39.24:/root/comprehensive_sweep/sweep_comprehensive.csv .

# Or check the top results on the server first (run on the EPYC):
head -21 /root/comprehensive_sweep/sweep_comprehensive.csv
```

## Results Format

CSV columns:
- rank
- trades
- win_rate
- total_pnl
- pnl_per_1k (most important - profitability per $1000)
- flip_threshold
- ma_gap
- adx_min
- long_pos_max
- short_pos_min
- cooldown
- position_size
- tp1_mult
- tp2_mult
- sl_mult
- tp1_close_pct
- trailing_mult
- vol_min
- max_bars

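Since `pnl_per_1k` is the column that matters most, a short script can pull the best rows out of the downloaded file. This sketch assumes the CSV has a header row using the column names listed above; the function name `top_by_pnl_per_1k` is hypothetical. It streams the file rather than loading it whole, in case the results file is large.

```python
import csv
import heapq
import os


def top_by_pnl_per_1k(csv_file, limit=20):
    """Return the `limit` rows with the highest pnl_per_1k, best first."""
    reader = csv.DictReader(csv_file)
    # nlargest keeps only `limit` rows in memory while scanning the file.
    return heapq.nlargest(limit, reader, key=lambda row: float(row["pnl_per_1k"]))


if __name__ == "__main__":
    path = "sweep_comprehensive.csv"
    if os.path.exists(path):
        with open(path, newline="", encoding="utf-8") as handle:
            for row in top_by_pnl_per_1k(handle, limit=5):
                print(row["rank"], row["pnl_per_1k"], row["win_rate"], row["trades"])
```

Every value comes back as a string, so convert the other columns with `float()` or `int()` before doing arithmetic on them.
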
## Quick Test

Before running the full sweep, test that everything works:

```bash
cd /root/comprehensive_sweep
source .venv/bin/activate

# Quick test with a single configuration
python3 -c "
from pathlib import Path
from backtester.data_loader import load_csv
from backtester.simulator import simulate_money_line, TradeConfig
from backtester.indicators.money_line import MoneyLineInputs

data_slice = load_csv(Path('backtester/data/solusdt_5m_aug_nov.csv'), 'SOL-PERP', '5m')
print(f'Loaded {len(data_slice.data)} candles')

inputs = MoneyLineInputs(flip_threshold_percent=0.6)
config = TradeConfig(position_size=210.0)
results = simulate_money_line(data_slice.data, 'SOL-PERP', inputs, config)
print(f'Test: {len(results.trades)} trades, {results.win_rate*100:.1f}% WR, \${results.total_pnl:.2f} P&L')
print('✅ Everything working')
"
```

If the test passes, run the full sweep!