fix: Database-first cluster status detection + Stop button clarification
CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection was timing out, so workers were reported 'offline'
- Solution: Check the database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
- Query exploration database before SSH detection
- If running chunks exist, mark workers 'active' even if SSH fails
- Override worker status: 'offline' → 'active' when chunks are running
- Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
- The database is the source of truth; SSH is used only for supplementary metrics
2. app/cluster/page.tsx:
- Stop button ALREADY EXISTS (conditionally shown)
- Shows Start when status='idle', Stop when status='active'
- No code changes needed; the button appears once status detection correctly reports 'active'
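The rule in items 1 and 2 can be sketched in shell as follows. This is a minimal illustration under stated assumptions, not the logic in route.ts or page.tsx. The function names are hypothetical. The running-chunk count is passed in rather than queried from the exploration database. The SSH check is any probe command that exits 0 when worker processes are found. It runs under GNU `timeout`, so a hung connection falls back to 'offline' instead of blocking.

```bash
# Hypothetical sketch of database-first status detection (not route.ts).
# Assumptions: RUNNING_CHUNKS was already read from the exploration database,
# and the probe command exits 0 when worker processes are found.
# Requires GNU coreutils `timeout`.

# resolve_worker_status RUNNING_CHUNKS TIMEOUT_SECS PROBE_CMD [ARGS...]
resolve_worker_status() {
    local running_chunks=$1 timeout_secs=$2
    shift 2
    # Database first: running chunks mean the worker is busy, so report
    # 'active' without waiting on the probe ('offline' -> 'active' override).
    if [ "$running_chunks" -gt 0 ]; then
        echo "active"
        return 0
    fi
    # No running chunks: fall back to the probe, bounded by a time limit.
    if timeout "$timeout_secs" "$@" > /dev/null 2>&1; then
        echo "active"
    else
        echo "offline"
    fi
}

# resolve_cluster_status WORKER_STATUS... -> 'active' if any worker is active
resolve_cluster_status() {
    local status
    for status in "$@"; do
        if [ "$status" = "active" ]; then
            echo "active"
            return 0
        fi
    done
    echo "idle"
}

# control_button CLUSTER_STATUS -> which control the dashboard would show
control_button() {
    case "$1" in
        active) echo "Stop" ;;
        *) echo "Start" ;;   # assumption: every non-active status shows Start
    esac
}
```

For example, `resolve_worker_status "$chunks" 5 ssh -o BatchMode=yes worker1 pgrep -f python3` prints `active` whenever `$chunks` is positive, even if `worker1` is unreachable. With no running chunks, an SSH call that hangs past 5 seconds yields `offline`. Both `$chunks` and the remote `pgrep` pattern are placeholders.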
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- Status detection is now resilient to SSH timeouts and network issues
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, plus worker processes on worker2
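The API check listed above can be repeated from a shell with a small helper like the one below. It is a sketch, not part of this commit. `check_cluster_json` is a hypothetical name. It assumes the endpoint returns a JSON object with top-level `status` and `activeWorkers` fields, the names reported above. To parse the body it uses the `python3` interpreter that the sweep script already relies on, rather than `jq`.

```bash
# Hypothetical helper: read a status API JSON body on stdin, print the two
# fields, and exit 0 only if status is 'active' with at least one worker.
# Assumes top-level "status" and "activeWorkers" keys.
check_cluster_json() {
    python3 -c '
import json
import sys

data = json.load(sys.stdin)
status = data.get("status")
workers = int(data.get("activeWorkers") or 0)
print(f"status={status} activeWorkers={workers}")
sys.exit(0 if status == "active" and workers >= 1 else 1)
'
}
```

Hypothetical usage, with the host and port as placeholders: `curl -s http://localhost:3000/api/cluster/status | check_cluster_json`.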
run_comprehensive_sweep.sh (new executable file, 52 lines)
#!/bin/bash
# Run comprehensive parameter sweep in background with logging

cd /home/icke/traderv4 || exit 1

# Activate virtual environment
source .backtester/bin/activate || exit 1

# Create logs directory if it does not exist
mkdir -p backtester/logs

# Generate timestamp for log file
TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
LOGFILE="backtester/logs/sweep_comprehensive_${TIMESTAMP}.log"

echo "Starting comprehensive parameter sweep..."
echo "Log file: $LOGFILE"
echo ""
echo "To monitor progress in real-time, run:"
echo " tail -f $LOGFILE"
echo ""
echo "To check if still running:"
echo " pgrep -af comprehensive_sweep"
echo ""
echo "To stop the sweep:"
echo " pkill -f comprehensive_sweep"
echo ""

# Run sweep in background with output redirection
nohup python3 backtester/scripts/comprehensive_sweep.py > "$LOGFILE" 2>&1 &

# Get PID
SWEEP_PID=$!

echo "Sweep started with PID: $SWEEP_PID"
echo "Process running in background with nohup"
echo ""
echo "Quick commands:"
echo " tail -f $LOGFILE # Watch progress"
echo " tail -100 $LOGFILE # Last 100 lines"
echo " grep 'Best so far' $LOGFILE # See current best"
echo " kill $SWEEP_PID # Stop sweep"
echo ""

# Wait a moment and check if process started
sleep 2
if ps -p "$SWEEP_PID" > /dev/null 2>&1; then
    echo "✅ Sweep running successfully (PID: $SWEEP_PID)"
else
    echo "❌ Sweep may have failed to start. Check log:"
    echo " tail -50 $LOGFILE"
fi
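The launcher reports `$SWEEP_PID` only on the terminal, so a later shell has to find the process again. One option, sketched here, is to record the PID in a file and test liveness with `kill -0`. That command checks that the process exists and can be signalled, without actually sending a signal. The PID file, its default path `backtester/logs/sweep.pid`, and the helper names are hypothetical. The script above does not create them, so the launcher would need an extra line such as `echo "$SWEEP_PID" > backtester/logs/sweep.pid`.

```bash
# Hypothetical helpers; the launcher does not write a PID file by itself.
# PIDFILE is an assumed path that the caller may override.
PIDFILE=${PIDFILE:-backtester/logs/sweep.pid}

# sweep_running: exit 0 if the PID recorded in $PIDFILE is still alive.
sweep_running() {
    local pid
    [ -f "$PIDFILE" ] || return 1
    pid=$(cat "$PIDFILE")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2> /dev/null
}

# stop_sweep: send SIGTERM to the recorded PID, then remove the PID file.
stop_sweep() {
    if sweep_running; then
        kill "$(cat "$PIDFILE")"
    fi
    rm -f "$PIDFILE"
}
```

A PID file can go stale. Once the sweep exits, the kernel may reuse its PID for an unrelated process, so `kill -0` can report a false positive on a long-lived host. Matching on the command line with `pgrep -f comprehensive_sweep` avoids that problem. The trade-off is that it may also match other processes whose arguments contain the same string.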