trading_bot_v4/CLUSTER_START_BUTTON_FIX.md
mindesbunister 83b4915d98 fix: Reduce coordinator chunk_size from 10k to 2k for small explorations
- Changed default chunk_size from 10,000 to 2,000
- Fixes bug where coordinator exited immediately for 4,096 combo exploration
- Coordinator was calculating: chunk 1 starts at 10,000 > 4,096 total = 'all done'
- Now creates 2-3 appropriately-sized chunks for distribution
- Verified: Workers now start and process assigned chunks
- Status: Docker rebuilt and deployed to port 3001
2025-11-30 22:07:59 +01:00


Cluster Start Button Fix - Nov 30, 2025

Problem

The cluster start button in the web dashboard ran the coordinator command successfully, but the coordinator exited immediately without doing any work.

Root Cause

The coordinator's --chunk-size argument defaulted to 10,000, a value designed for large explorations with millions of combinations. For the v9 exploration, which has only 4,096 combinations, this default caused a logic error:

📋 Resuming from chunk 1 (found 1 existing chunks)
   Starting at combo 10,000 / 4,096

The coordinator calculated that chunk 1 would start at combo 10,000 (chunk_id × chunk_size = 1 × 10,000). Because 10,000 exceeds the 4,096 total combinations, it concluded that all work was complete and exited immediately.
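The check can be reproduced in a few lines. This is a minimal sketch, assuming the coordinator treats a run as finished once the next chunk's start index reaches the total number of combinations; the function names chunk_start and looks_finished are hypothetical and are not taken from cluster/distributed_coordinator.py.

```python
# Hypothetical reconstruction of the resume check described above; the real
# logic lives in cluster/distributed_coordinator.py and may differ in detail.

def chunk_start(chunk_id: int, chunk_size: int) -> int:
    """First combination index covered by chunk `chunk_id` (chunk_id x chunk_size)."""
    return chunk_id * chunk_size


def looks_finished(next_chunk_id: int, chunk_size: int, total_combos: int) -> bool:
    """True when the next chunk would start at or past the end of the search space."""
    return chunk_start(next_chunk_id, chunk_size) >= total_combos


TOTAL = 4096    # v9 exploration size
NEXT_CHUNK = 1  # "Resuming from chunk 1 (found 1 existing chunks)"

for size in (10_000, 2_000):
    start = chunk_start(NEXT_CHUNK, size)
    done = looks_finished(NEXT_CHUNK, size, TOTAL)
    print(f"chunk_size={size}: start={start}, finished={done}")
# chunk_size=10000: start=10000, finished=True
# chunk_size=2000: start=2000, finished=False
```

With the old default, resuming at chunk 1 already points past combo 4,096, so the loop has nothing left to schedule.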

Fix Applied

Changed the default chunk_size from 10,000 to 2,000 in cluster/distributed_coordinator.py:

# Before:
parser.add_argument('--chunk-size', type=int, default=10000,
                    help='Number of combinations per chunk (default: 10000)')

# After:
parser.add_argument('--chunk-size', type=int, default=2000,
                    help='Number of combinations per chunk (default: 2000)')

With the 2,000 default, the 4,096-combination exploration is split into three chunks (2,000 + 2,000 + 96 combinations), so the work can be distributed across both workers.
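The effect on the split can be sketched as follows, assuming chunks are contiguous ranges starting at chunk_id × chunk_size and that each chunk is handed to exactly one worker. plan_chunks, assign_round_robin and the round-robin policy itself are illustrative assumptions, not the coordinator's actual scheduler.

```python
from itertools import cycle


def plan_chunks(total_combos: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, total_combos) into contiguous half-open ranges of at most chunk_size."""
    return [(start, min(start + chunk_size, total_combos))
            for start in range(0, total_combos, chunk_size)]


def assign_round_robin(chunks, workers):
    """Hypothetical round-robin assignment of chunks to workers."""
    return [(worker, chunk) for worker, chunk in zip(cycle(workers), chunks)]


old_plan = plan_chunks(4096, 10_000)
new_plan = plan_chunks(4096, 2_000)
print(old_plan)  # [(0, 4096)]
print(new_plan)  # [(0, 2000), (2000, 4000), (4000, 4096)]

for worker, (start, end) in assign_round_robin(new_plan, ["worker1", "worker2"]):
    print(f"{worker}: combos {start}-{end - 1} ({end - start} combos)")
```

Under these assumptions, the old default puts the whole space into a single (0, 4096) chunk, so at most one worker could receive work; with 2,000, the last chunk holds the 96 leftover combinations.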

Verification

  1. Manual coordinator run created chunks successfully
  2. Both workers (worker1 and worker2) started processing
  3. Docker image rebuilt with fix
  4. Container deployed and running

Result

The start button now works correctly:

  • Coordinator creates appropriately sized chunks
  • Workers are assigned work
  • Exploration runs to completion
  • Progress is tracked in the database
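The database schema is not described in this note. The following is a minimal sketch, assuming a hypothetical chunks table with half-open combo ranges and a status column, of how a progress figure could be derived from it. The table, columns and status values are all placeholders, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3

# Hypothetical schema: the real coordinator's database, table and column names
# are not documented here; this only illustrates deriving a progress figure.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE chunks (
        chunk_id    INTEGER PRIMARY KEY,
        start_combo INTEGER NOT NULL,  -- inclusive
        end_combo   INTEGER NOT NULL,  -- exclusive
        status      TEXT NOT NULL      -- e.g. 'pending', 'running', 'completed'
    )
""")
# Example rows for the three 2,000-sized chunks of a 4,096-combo run.
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?, ?)",
    [(0, 0, 2000, "completed"), (1, 2000, 4000, "running"), (2, 4000, 4096, "pending")],
)


def progress_percent(conn, total_combos):
    """Share of combinations that belong to completed chunks, as a percentage."""
    (done,) = conn.execute(
        "SELECT COALESCE(SUM(end_combo - start_combo), 0) "
        "FROM chunks WHERE status = 'completed'"
    ).fetchone()
    return 100.0 * done / total_combos


print(f"{progress_percent(conn, 4096):.1f}%")  # 48.8%
```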

Next Steps

You can now use the start button in the web dashboard at http://10.0.0.48:3001/cluster to start explorations. The system will:

  1. Create 3 chunks (two of 2,000 combinations and a final chunk of 96)
  2. Distribute to worker1 and worker2
  3. Run for ~30-60 minutes to complete 4,096 combinations
  4. Save top 100 results to CSV
  5. Update dashboard with live progress
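Step 4 can be illustrated with a short sketch. The ranking metric and result fields are not specified in this note, so save_top_results, score_key, combo_id and the synthetic scores below are placeholders. heapq.nlargest and csv.DictWriter are standard-library calls.

```python
import csv
import heapq
import os
import random
import tempfile


def save_top_results(results, path, n=100, score_key="score"):
    """Write the n highest-scoring result rows to a CSV file (hypothetical schema)."""
    top = heapq.nlargest(n, results, key=lambda row: row[score_key])  # sorted best-first
    if not top:
        return []
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(top[0].keys()))
        writer.writeheader()
        writer.writerows(top)
    return top


# Synthetic stand-in for 4,096 finished combinations (made-up fields and scores).
rng = random.Random(0)
results = [{"combo_id": i, "score": rng.random()} for i in range(4096)]

out_path = os.path.join(tempfile.mkdtemp(), "top_results.csv")
top = save_top_results(results, out_path)
with open(out_path, newline="") as fh:
    rows = list(csv.DictReader(fh))
print(len(rows))  # 100 data rows (header excluded)
```

heapq.nlargest avoids sorting all 4,096 rows when only the best 100 are needed.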

Files Modified

  • cluster/distributed_coordinator.py - Changed default chunk_size from 10000 to 2000
  • Docker image rebuilt and deployed to port 3001