trading_bot_v4/cluster/CRITICAL_BUG_FIX_DEC1_2025.md
mindesbunister 11a0ea324b critical: Fix distributed worker quality_filter - dict to lambda function
Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
2025-12-01 14:59:08 +01:00

CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)

🔥 Critical Bug Discovered

Date: December 1, 2025, 14:40 UTC
Impact: ALL 2,096 backtests failed with 'dict' object is not callable error
Severity: CRITICAL - Blocked all distributed work

Symptom

All parameter combinations tested returned 0 trades:

  • Chunk 0: 2,000 configs, all with trades=0
  • Chunk 2: 96 configs, all with trades=0
  • Worker logs showed: Error testing config X: 'dict' object is not callable (repeated 2,096 times)

Root Cause

File: cluster/distributed_worker.py
Lines: 67-70

BROKEN CODE:

# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}

Problem: quality_filter is built as a dict, but simulate_money_line() expects a callable and invokes it on each signal.

Investigation Timeline

  1. 14:35 - User reported "something finished"
  2. 14:40 - Discovered all 2,096 results had 0 trades
  3. 14:45 - Found error in worker logs: 'dict' object is not callable
  4. 14:50 - Compared to comprehensive_sweep.py (working version)
  5. 14:52 - ROOT CAUSE IDENTIFIED: dict vs lambda function
  6. 14:55 - Fix applied and deployed
  7. 15:00 - Fix verified working (workers at 100% CPU, no errors)

The Fix

BEFORE (BROKEN):

quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}

AFTER (FIXED):

# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
# Bug was passing dict which caused "'dict' object is not callable" error
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
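
One way to keep this construction in a single place is a small factory that always returns either a callable or None. The sketch below is an illustration under stated assumptions, not the worker's actual code: `Signal` and `make_quality_filter` are hypothetical names, and only the `adx` and `volume_ratio` attributes and the threshold of 15 come from the fix above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Signal:
    """Hypothetical stand-in for the simulator's signal object."""
    adx: float
    volume_ratio: float


def make_quality_filter(
    vol_min: float, min_adx: float = 15
) -> Optional[Callable[[Signal], bool]]:
    """Return a callable filter, or None when volume filtering is disabled."""
    if vol_min <= 0:
        return None

    def quality_filter(signal: Signal) -> bool:
        # Same condition as the lambda in the fix, with the thresholds
        # bound when the factory is called.
        return signal.adx >= min_adx and signal.volume_ratio >= vol_min

    return quality_filter


strict = make_quality_filter(vol_min=1.2)   # callable
disabled = make_quality_filter(vol_min=0)   # None
```

Because each call to the factory binds its own `vol_min`, filters built in a loop over parameter combinations do not share a threshold.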

Why It Broke

In backtester/simulator.py (line 118):

if not quality_filter(signal):
    continue

The simulator calls quality_filter(signal) as a function. When the worker passed a dict instead, Python tried to call the dict object and raised TypeError: 'dict' object is not callable.
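
The failure is easy to reproduce in isolation. The sketch below is not the project's simulator: `Signal` and `passes_filter` are hypothetical stand-ins that only mimic the `quality_filter(signal)` call, and 1.0 is an arbitrary value for `vol_min`.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Hypothetical stand-in for the simulator's signal object."""
    adx: float
    volume_ratio: float


def passes_filter(quality_filter, signal):
    """Mimic the simulator's call site: invoke the filter on one signal."""
    return quality_filter(signal)


signal = Signal(adx=20.0, volume_ratio=1.5)

# Broken version: a dict cannot be called, so Python raises TypeError.
broken_filter = {'min_adx': 15, 'min_volume_ratio': 1.0}
try:
    passes_filter(broken_filter, signal)
    error_message = None
except TypeError as exc:
    error_message = str(exc)  # "'dict' object is not callable"

# Fixed version: a lambda is callable and returns a bool.
fixed_filter = lambda s: s.adx >= 15 and s.volume_ratio >= 1.0
result = passes_filter(fixed_filter, signal)  # True
```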

How It Was Missed

  • Coordinator and worker infrastructure all worked correctly
  • Data loaded successfully (34,273 rows)
  • Multiprocessing started without errors
  • Worker's exception handler caught the error and returned zeros for each failed config
  • Silent failure: no crash, just invalid results
  • Output files looked successful (183KB)
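
One way to keep a failure like this from staying silent is to record the exception alongside each result and abort the chunk once too many configs fail. The following is a minimal sketch under stated assumptions, not the worker's actual handler: `run_chunk`, `run_one`, the result dict layout, and the 5% threshold are all hypothetical.

```python
def run_chunk(configs, run_one, max_error_rate=0.05):
    """Run every config, keep per-config errors, and abort on mass failure."""
    results = []
    error_count = 0
    first_error = None
    for config in configs:
        try:
            trades = run_one(config)
            results.append({"config": config, "trades": trades, "error": None})
        except Exception as exc:
            # Record the failure instead of silently returning zeros.
            error_count += 1
            if first_error is None:
                first_error = repr(exc)
            results.append({"config": config, "trades": 0, "error": repr(exc)})

    # A high failure rate points to a systematic bug, not to bad parameters.
    if configs and error_count / len(configs) > max_error_rate:
        raise RuntimeError(
            f"{error_count}/{len(configs)} configs failed; "
            f"first error: {first_error}"
        )
    return results
```

With a handler like this, a bug that breaks every config stops the chunk with the first error message attached, instead of producing a result file full of zeros.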

Verification Steps

  1. Deployed fixed code to worker1
  2. Cleaned up invalid results and database
  3. Restarted coordinator with fixed worker
  4. Verified no 'dict' object is not callable errors in logs
  5. Confirmed 24 Python processes running at 100% CPU
  6. Confirmed workers were actively computing, with no errors in the first 2+ minutes
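
The log check in step 4 can be scripted. This is a rough sketch: `count_matching_lines` is a hypothetical helper, and the sample lines are made up for illustration, with the first one following the error format quoted in the Symptom section.

```python
ERROR_TEXT = "'dict' object is not callable"


def count_matching_lines(lines, needle=ERROR_TEXT):
    """Count the lines that contain the given error text."""
    return sum(1 for line in lines if needle in line)


# Made-up sample; in practice, pass an open log file instead of a list.
sample_log = [
    "Error testing config 17: 'dict' object is not callable",
    "Chunk 0 started",
]
error_lines = count_matching_lines(sample_log)  # 1
```

A count of zero after the restart is consistent with the fix working, though it only covers the lines written so far.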

Lessons Learned

  1. Type matters: a dict and a callable are not interchangeable, a subtle but critical difference
  2. Silent failures are dangerous: the exception handler hid the severity
  3. Compare to working code: comprehensive_sweep.py had the correct pattern
  4. Verify result quality: all zeros is a red flag, investigate immediately
  5. Test changes locally first: a local run would have caught this earlier
  6. Add validation: detect all-zero results and abort
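
The validation in lesson 6 could look roughly like the check below. It is only a sketch: `assert_has_trades` is a hypothetical helper, and it assumes each result is a dict with a `trades` count, which may not match the worker's real result format.

```python
def assert_has_trades(results, chunk_id):
    """Raise if a chunk produced no results or only zero-trade results."""
    if not results:
        raise ValueError(f"chunk {chunk_id}: no results returned")
    if all(r.get("trades", 0) == 0 for r in results):
        raise ValueError(
            f"chunk {chunk_id}: all {len(results)} configs returned 0 trades; "
            "check the worker logs before importing"
        )


# A chunk with at least one trading config passes silently.
assert_has_trades([{"trades": 0}, {"trades": 12}], chunk_id=0)
```

Running a check like this before the database import would stop an all-zero chunk from being stored as if it were a valid result.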

Files Changed

  • cluster/distributed_worker.py - Fixed quality_filter (dict → lambda)

Commit

git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
"
git push

Status

  • Bug identified and fixed
  • Code deployed to worker1
  • Coordinator restarted
  • Workers actively processing (100% CPU, no errors)
  • Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
  • Full sweep restart: 4,096 configs total

Expected Timeline

  • Chunk 0: ~22 minutes (2,000 configs)
  • Chunk 1: ~22 minutes (2,000 configs)
  • Chunk 2: ~1 minute (96 configs)
  • Total: ~45 minutes for complete sweep
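
The estimates above follow from simple arithmetic, sketched here under two assumptions that are not stated in the source: chunks run one after another, and throughput stays at the observed rate of about 2,000 configs per 22 minutes. The function name `split_into_chunks` is hypothetical.

```python
def split_into_chunks(total_configs, chunk_size):
    """Return the size of each chunk when splitting total_configs."""
    return [
        min(chunk_size, total_configs - start)
        for start in range(0, total_configs, chunk_size)
    ]


chunk_sizes = split_into_chunks(4096, 2000)  # [2000, 2000, 96]

# Assumed constant rate: 2,000 configs take about 22 minutes.
minutes_per_chunk = [size * 22 / 2000 for size in chunk_sizes]
total_minutes = sum(minutes_per_chunk)  # about 45 minutes
```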

Next Steps

  1. Monitor chunk 0 completion (~10 minutes remaining)
  2. Verify results have trades > 0 (not all zeros)
  3. Import successful results to database
  4. Analyze top performers
  5. Deploy to worker2 for parallel processing