CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
   - Database is source of truth, SSH only for supplementary metrics
2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed - fixed by status detection
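The database-first logic in change 1 and the button rule in change 2 can be sketched as follows. This is a minimal illustration, not the code in route.ts or page.tsx: the names `WorkerInfo`, `ClusterStatus`, `resolveClusterStatus`, and `controlButtonFor` are hypothetical, the running-chunk count is assumed to have been read from the exploration database already, and the fallback to the SSH result when no chunks are running is an assumption.

```typescript
// Hypothetical types; the real route.ts may use different names and shapes.
type WorkerStatus = "active" | "idle" | "offline";

interface WorkerInfo {
  name: string;
  status: WorkerStatus; // as reported by SSH detection ("offline" on timeout)
}

interface ClusterStatus {
  status: "active" | "idle";
  activeWorkers: number;
  workers: WorkerInfo[];
}

// Database-first resolution: running chunks in the database decide the
// cluster state; SSH results only matter when the database shows no work.
function resolveClusterStatus(
  runningChunks: number,
  sshWorkers: WorkerInfo[],
): ClusterStatus {
  if (runningChunks > 0) {
    // Override 'offline' (e.g. SSH timed out) to 'active' while chunks run.
    const workers: WorkerInfo[] = sshWorkers.map((w) =>
      w.status === "offline" ? { ...w, status: "active" as const } : w,
    );
    console.log("✅ Cluster status: ACTIVE (database shows running chunks)");
    return {
      status: "active",
      activeWorkers: workers.filter((w) => w.status === "active").length,
      workers,
    };
  }
  // Assumption: with no running chunks, trust whatever SSH detection reported.
  const activeWorkers = sshWorkers.filter((w) => w.status === "active").length;
  return {
    status: activeWorkers > 0 ? "active" : "idle",
    activeWorkers,
    workers: sshWorkers,
  };
}

// Mirrors the page's existing conditional: Start when idle, Stop when active.
function controlButtonFor(status: ClusterStatus["status"]): "Start" | "Stop" {
  return status === "active" ? "Stop" : "Start";
}
```

With both workers reported 'offline' by a timed-out SSH probe and three running chunks, `resolveClusterStatus` returns status 'active' with activeWorkers 2, and `controlButtonFor` then selects Stop.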
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues
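One way to keep SSH as a supplementary signal that cannot stall the status route is to bound each probe with a timeout and degrade to a fallback value. The sketch below is an assumption about how that could be done, not a description of the actual route: `withTimeout` is a hypothetical helper, and the probe passed to it stands in for whatever SSH call the route makes.

```typescript
// Resolve with `fallback` if `probe` has not settled within `ms` milliseconds,
// or if it rejects. The returned promise never rejects, so a slow or failing
// SSH probe cannot break the status response.
function withTimeout<T>(probe: Promise<T>, ms: number, fallback: T): Promise<T> {
  return new Promise<T>((resolve) => {
    const timer = setTimeout(() => resolve(fallback), ms);
    probe.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        // SSH error: degrade to the fallback instead of throwing.
        clearTimeout(timer);
        resolve(fallback);
      },
    );
  });
}
```

A caller could wrap each worker probe, for example `withTimeout(probe, 5000, "offline")`, and let the database check decide the final status; the 5-second limit here is illustrative only.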
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, additional processes on worker2
38 lines
1.0 KiB
Bash
Executable File
#!/bin/bash
# Setup script for v9 backtesting on EPYC machine
# Run this after extracting backtest_v9_sweep.tar.gz

# Stop at the first failing command so a broken step is not reported as success
set -e

echo "🔧 Setting up v9 backtest environment on EPYC..."

# Check Python version
PYTHON_VERSION=$(python3 --version 2>&1 | awk '{print $2}')
echo "✅ Python version: $PYTHON_VERSION"

# Check if python3-venv is installed
# (pattern is quoted so the shell cannot glob-expand it before grep sees it)
if ! dpkg -l | grep -q 'python3.*-venv'; then
    echo "📦 Installing python3-venv package..."
    apt update
    apt install -y python3-venv
fi

# Create virtual environment
echo "📦 Creating Python virtual environment..."
python3 -m venv .venv

# Activate and install dependencies
echo "📥 Installing dependencies (pandas, numpy)..."
source .venv/bin/activate
pip3 install --upgrade pip
pip3 install pandas numpy

echo ""
echo "✅ Setup complete!"
echo ""
echo "🚀 To run the EXHAUSTIVE sweep:"
echo "   source .venv/bin/activate"
echo "   ./run_sweep_epyc.sh"
echo ""
echo "📊 65,536 combinations with 24 workers"
echo "⏱️ Expected completion time: ~29 hours"
echo "📋 Monitor progress: tail -f v9_sweep_epyc.log"