Problem:
- Start button reported 'already running' when the cluster was not actually running
- Database held stale chunks in the 'running' state, left behind by a crashed or killed coordinator
- Control endpoint checked the coordinator process but not the database state
Solution:
1. Reset stale 'running' chunks to 'pending' before starting the coordinator
2. Verify the coordinator is not already running before starting it (prevents duplicate instances)
3. Add the same database cleanup to the stop action (prevents future stale states)
4. Include coordinator log output in start-failure error reports
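The start/stop cleanup described above can be sketched roughly as follows. This is a minimal in-memory illustration, not the code in app/api/cluster/control/route.ts: the `Chunk` shape, the status values other than 'running' and 'pending', and the `ClusterDeps` interface (standing in for the real process check, database, and coordinator launcher) are all hypothetical.

```typescript
// Hypothetical chunk states; the real table and column names in route.ts may differ.
type ChunkStatus = "pending" | "running" | "completed" | "failed";

interface Chunk {
  id: number;
  status: ChunkStatus;
}

// Hypothetical stand-ins for the real process check, database, and coordinator control.
interface ClusterDeps {
  chunks: Chunk[];
  isCoordinatorRunning(): boolean;
  startCoordinator(): void;
  stopCoordinator(): void;
}

type StartResult =
  | { ok: true; resetChunks: number }
  | { ok: false; error: string };

// Chunks still marked 'running' with no live coordinator are orphaned:
// move them back to 'pending' so they are picked up again. Returns the count.
function resetStaleChunks(chunks: Chunk[]): number {
  let reset = 0;
  for (const chunk of chunks) {
    if (chunk.status === "running") {
      chunk.status = "pending";
      reset++;
    }
  }
  return reset;
}

function handleStart(deps: ClusterDeps): StartResult {
  // Decide "already running" from the live process, not from database state,
  // and check it first so a live coordinator's chunks are never reset.
  if (deps.isCoordinatorRunning()) {
    return { ok: false, error: "Coordinator is already running" };
  }
  const resetChunks = resetStaleChunks(deps.chunks);
  deps.startCoordinator();
  return { ok: true, resetChunks };
}

function handleStop(deps: ClusterDeps): number {
  deps.stopCoordinator();
  // With the process gone, no chunk can legitimately remain 'running'.
  return resetStaleChunks(deps.chunks);
}
```

The sketch runs the process check before the reset, so a start request against a healthy coordinator cannot flip its in-progress chunks back to 'pending'. Against a real database, the loop in `resetStaleChunks` would typically become a single UPDATE filtered on status.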
Changes:
- app/api/cluster/control/route.ts
  - Added database cleanup in start action (reset running chunks)
  - Added process check before start (prevent duplicates)
  - Added database cleanup in stop action (clean up orphaned state)
  - Added coordinator log output on start failure
  - Improved error messages and logging
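Attaching coordinator log output to a start failure might look like the sketch below. It assumes the log has already been read into a string; `tailLines`, `buildStartFailure`, the 20-line default, and the response field names are hypothetical and not taken from the route.

```typescript
// Hypothetical response shape for a failed start.
interface StartFailure {
  success: false;
  error: string;
  coordinatorLog: string[];
}

// Keep only the last `maxLines` non-empty lines so the error response stays short.
function tailLines(logText: string, maxLines: number): string[] {
  if (maxLines <= 0) return []; // slice(-0) would return every line
  const lines = logText.split(/\r?\n/).filter((line) => line.trim() !== "");
  return lines.slice(-maxLines);
}

// Combine the failure reason with the tail of the coordinator log.
function buildStartFailure(reason: string, logText: string, maxLines = 20): StartFailure {
  return {
    success: false,
    error: `Failed to start coordinator: ${reason}`,
    coordinatorLog: tailLines(logText, maxLines),
  };
}
```

If the route is a Next.js App Router handler (as the app/api/.../route.ts path suggests), the object could be returned with `NextResponse.json(failure, { status: 500 })`.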
Impact:
- Start button now works correctly even after unclean coordinator shutdown
- Prevents false 'already running' reports
- Automatic cleanup of stale database state
- Better error diagnostics
Verified:
- Container rebuilt and restarted successfully
- Cluster status shows 'idle' after database cleanup
- Ready for user to test start button functionality