CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)
🔥 Critical Bug Discovered
Date: December 1, 2025, 14:40 UTC
Impact: ALL 2,096 backtests failed with 'dict' object is not callable error
Severity: CRITICAL - Blocked all distributed work
Symptom
All parameter combinations tested returned 0 trades:
- Chunk 0: 2,000 configs, all with trades=0
- Chunk 2: 96 configs, all with trades=0
- Worker logs showed: Error testing config X: 'dict' object is not callable (repeated 2,096 times)
Root Cause
File: cluster/distributed_worker.py
Lines: 67-70
BROKEN CODE:
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
Problem: The worker passed a dict as quality_filter, but simulate_money_line() expects a callable.
Investigation Timeline
- 14:35 - User reported "something finished"
- 14:40 - Discovered all 2,096 results had 0 trades
- 14:45 - Found error in worker logs: 'dict' object is not callable
- 14:50 - Compared to comprehensive_sweep.py (working version)
- 14:52 - ROOT CAUSE IDENTIFIED: dict vs lambda function
- 14:55 - Fix applied and deployed
- 15:00 - Fix verified working (workers at 100% CPU, no errors)
The Fix
BEFORE (BROKEN):
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
AFTER (FIXED):
# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
# Bug was passing dict which caused "'dict' object is not callable" error
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
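The same branch can be wrapped in a small factory so that each filter is built in one place and keeps the vol_min value it was created with, which matters if filters are created in a loop and called later. This is only a sketch: make_quality_filter and its min_adx parameter are hypothetical names, not part of the worker, and it assumes signals expose adx and volume_ratio attributes as in the lambda above.

```python
from types import SimpleNamespace
from typing import Callable, Optional


def make_quality_filter(vol_min: float, min_adx: float = 15) -> Optional[Callable]:
    """Return a predicate over signals, or None when vol_min is not positive.

    Hypothetical helper mirroring the if/else in the fix above.
    """
    if vol_min <= 0:
        return None

    def quality_filter(signal) -> bool:
        # vol_min and min_adx are fixed at the values passed to the factory.
        return signal.adx >= min_adx and signal.volume_ratio >= vol_min

    return quality_filter


# Stand-in signal object for illustration only.
example_signal = SimpleNamespace(adx=20.0, volume_ratio=1.5)
example_filter = make_quality_filter(1.2)
passes = example_filter(example_signal)
```

Whatever receives the result still has to handle the None case before calling it.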
Why It Broke
In backtester/simulator.py (line 118):
if not quality_filter(signal):
    continue
The simulator calls quality_filter as a function. Because the worker passed a dict, each call raised TypeError: 'dict' object is not callable.
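The failure is easy to reproduce in isolation, and a callable() check at the call boundary turns it into an explicit error. The following is a minimal sketch, not the simulator's code: apply_filter is a hypothetical helper, the signal is a stand-in object, and treating None as "no filtering" is an assumption.

```python
from types import SimpleNamespace

# Stand-in for a signal; only the attributes used by the filter are modelled.
signal = SimpleNamespace(adx=20.0, volume_ratio=1.5)

broken_filter = {"min_adx": 15, "min_volume_ratio": 1.0}
fixed_filter = lambda s: s.adx >= 15 and s.volume_ratio >= 1.0


def apply_filter(quality_filter, sig):
    """Hypothetical guard: reject non-callable filters with a clear message."""
    if quality_filter is not None and not callable(quality_filter):
        raise TypeError(
            "quality_filter must be callable or None, "
            f"got {type(quality_filter).__name__}"
        )
    # Assumption: None means "accept every signal".
    return True if quality_filter is None else bool(quality_filter(sig))


# Calling the dict directly reproduces the message seen in the worker logs.
try:
    broken_filter(signal)
except TypeError as exc:
    reproduced_message = str(exc)
```

A guard like this fails once, at setup, instead of once per config inside an exception handler.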
How It Was Missed
- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- Worker's exception handler caught the error and returned zeros
- Silent failure: No crash, just invalid results
- Files created looked successful (183KB)
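One way to keep a per-config exception handler without hiding a systematic failure is to record errors as errors and abort when most of a chunk fails. This is a sketch under assumptions: run_configs, run_one, and the 50% threshold are hypothetical and do not describe the actual worker.

```python
from collections import Counter


def run_configs(configs, run_one, max_error_ratio=0.5):
    """Run each config; mark failures explicitly instead of reporting zero trades."""
    results, errors = [], Counter()
    for config in configs:
        try:
            results.append({"config": config, "status": "ok", **run_one(config)})
        except Exception as exc:  # broad on purpose: one bad config should not kill the worker
            errors[f"{type(exc).__name__}: {exc}"] += 1
            results.append({"config": config, "status": "error", "error": str(exc)})

    failed = sum(errors.values())
    if configs and failed / len(configs) > max_error_ratio:
        # A chunk where most configs fail points to a bug, not to bad parameters.
        most_common, count = errors.most_common(1)[0]
        raise RuntimeError(
            f"{failed}/{len(configs)} configs failed; most common: {most_common} ({count}x)"
        )
    return results
```

With this shape, a chunk in which every config raises the same TypeError would stop with that message instead of producing a result file full of zeros.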
Verification Steps
- ✅ Deployed fixed code to worker1
- ✅ Cleaned up invalid results and database
- ✅ Restarted coordinator with fixed worker
- ✅ Verified no 'dict' object is not callable errors in logs
- ✅ Confirmed 24 Python processes running at 100% CPU
- ✅ Workers actively computing (no immediate errors for 2+ minutes)
Lessons Learned
- Type matters: Dict vs callable - subtle but critical difference
- Silent failures are dangerous: Exception handler hid the severity
- Compare to working code: comprehensive_sweep.py had correct pattern
- Verify results quality: All zeros = red flag, investigate immediately
- Test changes locally first: A local run would have caught this earlier
- Add validation: Should detect all-zero results and abort
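The all-zero check from the last lesson could look roughly like this. It is a sketch only: assert_results_plausible and min_configs are hypothetical, and it assumes each result is a dict with a trades count.

```python
def assert_results_plausible(results, min_configs=10):
    """Raise if every config in a reasonably large batch produced zero trades."""
    if len(results) < min_configs:
        return  # too few results to draw a conclusion
    if all(r.get("trades", 0) == 0 for r in results):
        raise ValueError(
            f"all {len(results)} results have trades=0; "
            "check worker logs before importing"
        )
```

Running it before results are written or imported would stop a chunk like the failed ones from being saved as if it had succeeded.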
Files Changed
cluster/distributed_worker.py - Fixed quality_filter (dict → lambda)
Commit
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function
Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.
Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.
Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
"
git push
Status
- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total
Expected Timeline
- Chunk 0: ~22 minutes (2,000 configs)
- Chunk 1: ~22 minutes (2,000 configs)
- Chunk 2: ~1 minute (96 configs)
- Total: ~45 minutes for complete sweep
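The chunk sizes and the total follow from simple arithmetic. The sketch below assumes a fixed chunk size of 2,000, runtime proportional to the number of configs, and chunks processed one after another on a single worker; chunk_sizes is a hypothetical helper.

```python
def chunk_sizes(total, chunk_size):
    """Split total configs into full chunks plus one remainder chunk."""
    full, rest = divmod(total, chunk_size)
    return [chunk_size] * full + ([rest] if rest else [])


sizes = chunk_sizes(4096, 2000)  # two full chunks and a 96-config remainder

# Rate taken from the estimate above: 2,000 configs in about 22 minutes.
minutes_per_config = 22 / 2000
per_chunk_minutes = [n * minutes_per_config for n in sizes]
total_minutes = sum(per_chunk_minutes)  # roughly 45 minutes
```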
Next Steps
- Monitor chunk 0 completion (~10 minutes remaining)
- Verify results have trades > 0 (not all zeros)
- Import successful results to database
- Analyze top performers
- Deploy to worker2 for parallel processing