fix: Database-first cluster status detection + Stop button clarification
CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
   - Database is source of truth, SSH only for supplementary metrics
2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed - fixed by status detection
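The database-first rule described above can be sketched as follows. This is an illustrative Python sketch, not the TypeScript in route.ts: the `Worker` type, the `resolve_cluster_status` function, and the `running_chunks` count are hypothetical names, and the fallback used when the database shows no running chunks is an assumption, since the commit message does not describe it.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Worker:
    # Hypothetical shape; the real route.ts worker object is not shown in this commit.
    name: str
    status: str  # 'active' or 'offline', as reported by SSH detection


def resolve_cluster_status(
    running_chunks: int, ssh_workers: List[Worker]
) -> Tuple[str, List[Worker]]:
    """Treat the database as the source of truth and SSH as supplementary."""
    if running_chunks > 0:
        # Running chunks exist, so an 'offline' SSH result is assumed to be
        # a timeout rather than a dead worker: override it to 'active'.
        workers = [
            replace(w, status="active") if w.status == "offline" else w
            for w in ssh_workers
        ]
        return "active", workers

    # Assumption: with no running chunks, fall back to whatever SSH reported.
    any_active = any(w.status == "active" for w in ssh_workers)
    return ("active" if any_active else "idle"), list(ssh_workers)


if __name__ == "__main__":
    ssh_result = [Worker("worker1", "offline"), Worker("worker2", "offline")]
    status, workers = resolve_cluster_status(3, ssh_result)
    print(status, [w.status for w in workers])
```

Because page.tsx already switches between Start and Stop based on the status value, correcting the status alone is enough to make the Stop button appear.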
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, workers on worker2
@@ -219,9 +219,20 @@ class ProgressBar:
         elapsed = time.time() - self.start_time
         rate = elapsed / count if count else 0
         remaining = rate * (self.total - count) if rate else 0
+
+        # Format elapsed time
+        elapsed_hours = int(elapsed // 3600)
+        elapsed_mins = int((elapsed % 3600) // 60)
+        elapsed_str = f"{elapsed_hours}h {elapsed_mins}m" if elapsed_hours > 0 else f"{elapsed_mins}m"
+
+        # Format remaining time
+        remaining_hours = int(remaining // 3600)
+        remaining_mins = int((remaining % 3600) // 60)
+        remaining_str = f"{remaining_hours}h {remaining_mins}m" if remaining_hours > 0 else f"{remaining_mins}m"
+
         sys.stdout.write(
             f"\r[{bar}] {percent*100:5.1f}% ({count}/{self.total}) | "
-            f"Elapsed {elapsed:6.1f}s | ETA {remaining:6.1f}s"
+            f"Elapsed {elapsed_str:>7} | ETA {remaining_str:>7}"
         )
         sys.stdout.flush()
         if count >= self.total:
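The elapsed and ETA formatting in this hunk follows the same pattern twice, so it can be read as a single helper. The sketch below is one way to express it; `format_duration` is a hypothetical name that does not appear in the commit, and the arithmetic is copied from the added lines.

```python
def format_duration(seconds: float) -> str:
    """Format seconds as 'Hh Mm' when at least one hour has passed, else 'Mm'."""
    hours = int(seconds // 3600)
    mins = int((seconds % 3600) // 60)
    return f"{hours}h {mins}m" if hours > 0 else f"{mins}m"


if __name__ == "__main__":
    for secs in (45, 754, 3725):
        # '>7' right-aligns the string in a 7-character field, as in the diff.
        print(f"Elapsed {format_duration(secs):>7} |")
```

Two properties of this format are worth noting. Anything under a minute displays as `0m`, so the new display is coarser than the old `6.1f` seconds display for short runs. A duration of 100 hours or more produces a string longer than 7 characters, and the progress line then widens instead of truncating.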