# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)

## 🔥 Critical Bug Discovered

**Date:** December 1, 2025, 14:40 UTC
**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable`
**Severity:** CRITICAL - Blocked all distributed work

## Symptom

All parameter combinations tested returned 0 trades:

- Chunk 0: 2,000 configs, all with `trades=0`
- Chunk 2: 96 configs, all with `trades=0`
- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)

## Root Cause

**File:** `cluster/distributed_worker.py`
**Lines:** 67-70

**BROKEN CODE:**

```python
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**Problem:** Passing a `dict` object where `simulate_money_line()` expects a **callable function**.

## Investigation Timeline

1. **14:35** - User reported "something finished"
2. **14:40** - Discovered all 2,096 results had 0 trades
3. **14:45** - Found the error in worker logs: `'dict' object is not callable`
4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
6. **14:55** - Fix applied and deployed
7. **15:00** - Fix verified working (workers at 100% CPU, no errors)

## The Fix

**BEFORE (BROKEN):**

```python
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**AFTER (FIXED):**

```python
# CRITICAL FIX (Dec 1, 2025): Must be a lambda function, not a dict!
# The bug passed a dict, which caused "'dict' object is not callable".
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

## Why It Broke

In `backtester/simulator.py` (line 118):

```python
if not quality_filter(signal):
    continue
```

The simulator calls `quality_filter()` as a **function**. When we passed a dict, Python tried to call the dict object, raising `'dict' object is not callable`.

## How It Was Missed

- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- The worker's exception handler caught the error and returned zeros
- **Silent failure:** No crash, just invalid results
- The result files looked successful (183KB)

## Verification Steps

1. ✅ Deployed fixed code to worker1
2. ✅ Cleaned up invalid results and database
3. ✅ Restarted coordinator with fixed worker
4. ✅ Verified no `'dict' object is not callable` errors in logs
5. ✅ Confirmed 24 Python processes running at 100% CPU
6. ✅ Workers actively computing (no immediate errors for 2+ minutes)

## Lessons Learned

1. **Type matters:** Dict vs callable - a subtle but critical difference (a minimal guard sketch follows this list)
2. **Silent failures are dangerous:** The exception handler hid the severity
3. **Compare to working code:** `comprehensive_sweep.py` had the correct pattern
4. **Verify result quality:** All zeros = red flag, investigate immediately
5. **Test fixes locally first:** Would have caught this earlier
6. **Add validation:** Should detect all-zero results and abort
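A lightweight way to enforce lesson 1 is to assert the filter contract before dispatching configs. The sketch below is illustrative only: `build_quality_filter` is a hypothetical helper (not part of the current codebase), and the `vol_min` example value and signal attribute names simply mirror the fixed code above.

```python
from typing import Callable, Optional


def build_quality_filter(vol_min: float) -> Optional[Callable]:
    """Hypothetical helper: always returns a callable (or None), never a dict,
    so the simulator's `quality_filter(signal)` call cannot blow up."""
    if vol_min > 0:
        return lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
    return None


# Guard the contract before launching the sweep: a dict (or any other
# non-callable) fails fast here instead of silently zeroing 2,096 results.
quality_filter = build_quality_filter(vol_min=1.2)  # 1.2 is an example value
assert quality_filter is None or callable(quality_filter), (
    f"quality_filter must be a callable or None, got {type(quality_filter)!r}"
)
```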
## Files Changed

- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)

## Commit

```bash
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min}
when simulate_money_line() expects callable function. Bug caused ALL
2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes."
git push
```

## Status

- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total

## Expected Timeline

- **Chunk 0:** ~22 minutes (2,000 configs)
- **Chunk 1:** ~22 minutes (2,000 configs)
- **Chunk 2:** ~1 minute (96 configs)
- **Total:** ~45 minutes for the complete sweep

## Next Steps

1. Monitor chunk 0 completion (~10 minutes remaining)
2. Verify results have trades > 0, not all zeros (see the sketch after this list)
3. Import successful results to database
4. Analyze top performers
5. Deploy to worker2 for parallel processing
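The pre-import check in step 2 could be as simple as the sketch below. It is a sketch under assumptions: the file path and the per-config `trades` field are placeholders for the real result schema, not the actual format produced by the workers.

```python
import json
from pathlib import Path


def verify_chunk(path: Path) -> None:
    """Refuse to import a chunk whose results are all zero trades.

    Assumes the chunk file is a JSON list of per-config dicts, each with
    a 'trades' count; adjust to the real result schema before use.
    """
    results = json.loads(path.read_text())
    if not results:
        raise RuntimeError(f"{path}: empty result file")
    if all(r.get("trades", 0) == 0 for r in results):
        raise RuntimeError(
            f"{path}: every config reported 0 trades - "
            "likely a silent worker failure, do not import"
        )
    nonzero = sum(1 for r in results if r.get("trades", 0) > 0)
    print(f"{path.name}: {nonzero}/{len(results)} configs produced trades")


# Example (hypothetical path):
# verify_chunk(Path("results/chunk_0_results.json"))
```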