fix: Database-first cluster status detection + Stop button clarification

CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST

Changes:
1. app/api/cluster/status/route.ts:
   - Query exploration database before SSH detection
   - If running chunks exist, mark workers 'active' even if SSH fails
   - Override worker status: 'offline' → 'active' when chunks running
   - Log: 'Cluster status: ACTIVE (database shows running chunks)'
   - Database is the source of truth; SSH is used only for supplementary metrics (sketch below)
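
   A minimal sketch of the database-first flow, assuming hypothetical
   helpers getRunningChunkCount() and detectWorkersViaSsh() standing in
   for the real DB query and SSH probe (not the exact code in this repo):

       import { NextResponse } from 'next/server'

       type Worker = { id: string; status: 'active' | 'offline' }
       // Hypothetical helpers: the real DB query and SSH probe live elsewhere.
       declare function getRunningChunkCount(): Promise<number>
       declare function detectWorkersViaSsh(): Promise<Worker[]>

       export async function GET() {
         // Database is the source of truth for whether the cluster is active.
         const runningChunks = await getRunningChunkCount()

         // SSH is supplementary and may time out; treat failure as "no data".
         const sshWorkers = await detectWorkersViaSsh().catch(() => [] as Worker[])

         if (runningChunks > 0) {
           // Override: chunks are running, so 'offline' workers are really active.
           const workers = sshWorkers.map((w) =>
             w.status === 'offline' ? { ...w, status: 'active' as const } : w
           )
           console.log('Cluster status: ACTIVE (database shows running chunks)')
           return NextResponse.json({ status: 'active', activeWorkers: workers.length, workers })
         }

         return NextResponse.json({ status: 'idle', activeWorkers: 0, workers: sshWorkers })
       }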

2. app/cluster/page.tsx:
   - Stop button ALREADY EXISTS (conditionally shown)
   - Shows Start when status='idle', Stop when status='active'
   - No code changes needed - fixed by status detection (rough shape below)
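
   For reference, the existing conditional render in page.tsx has roughly
   this shape (handler names are illustrative assumptions):

       {status === 'idle' ? (
         <button onClick={startCluster}>Start</button>
       ) : (
         <button onClick={stopCluster}>Stop</button>
       )}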

Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues

Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, additional worker processes on worker2
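
The API check is reproducible with a plain fetch against the status route
(host/port are assumptions; the path follows from app/api/cluster/status/route.ts):

    // Run against the app server; localhost:3000 is the Next.js default.
    const res = await fetch('http://localhost:3000/api/cluster/status')
    console.log(await res.json())
    // → { status: 'active', activeWorkers: 2, workers: [...] }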
Author: mindesbunister
Date:   2025-11-30 22:23:01 +01:00
Parent: 83b4915d98
Commit: cc56b72df2
Stats:  795 changed files, 312766 additions, 281 deletions

One of the files added by this commit, a chunk-definition JSON for the parameter grid (path not shown in this excerpt):

@@ -0,0 +1,68 @@
{
"chunk_id": "v9_chunk_000000",
"chunk_start": 0,
"chunk_end": 2000,
"grid": {
"flip_thresholds": [
0.4,
0.5,
0.6,
0.7
],
"ma_gaps": [
0.2,
0.3,
0.4,
0.5
],
"adx_mins": [
18,
21,
24,
27
],
"long_pos_maxs": [
60,
65,
70,
75
],
"short_pos_mins": [
20,
25,
30,
35
],
"cooldowns": [
1,
2,
3,
4
],
"position_sizes": [
10000
],
"tp1_multipliers": [
2.0
],
"tp2_multipliers": [
4.0
],
"sl_multipliers": [
3.0
],
"tp1_close_percents": [
60
],
"trailing_multipliers": [
1.5
],
"vol_mins": [
1.0
],
"max_bars_list": [
500
]
},
"num_workers": 32
}
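
For scale: the six swept axes (flip_thresholds, ma_gaps, adx_mins,
long_pos_maxs, short_pos_mins, cooldowns) have 4 values each while the
remaining axes are fixed single values, giving 4^6 = 4096 grid
combinations; if chunk_start/chunk_end index into that combination
space, this chunk covers combinations 0-2000, split across 32 workers.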