From 11a0ea324b851ed256429cfcf168d51f0a83a53a Mon Sep 17 00:00:00 2001
From: mindesbunister
Date: Mon, 1 Dec 2025 14:59:08 +0100
Subject: [PATCH] critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
---
 cluster/CRITICAL_BUG_FIX_DEC1_2025.md | 144 ++++++++++++++++++++++++++
 cluster/distributed_worker.py         |  12 ++-
 2 files changed, 151 insertions(+), 5 deletions(-)
 create mode 100644 cluster/CRITICAL_BUG_FIX_DEC1_2025.md

diff --git a/cluster/CRITICAL_BUG_FIX_DEC1_2025.md b/cluster/CRITICAL_BUG_FIX_DEC1_2025.md
new file mode 100644
index 0000000..8235df4
--- /dev/null
+++ b/cluster/CRITICAL_BUG_FIX_DEC1_2025.md
@@ -0,0 +1,144 @@
+# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)
+
+## 🔥 Critical Bug Discovered
+
+**Date:** December 1, 2025, 14:40 UTC
+**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable` error
+**Severity:** CRITICAL - Blocked all distributed work
+
+## Symptom
+
+All parameter combinations tested returned 0 trades:
+- Chunk 0: 2,000 configs, all with `trades=0`
+- Chunk 2: 96 configs, all with `trades=0`
+- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)
+
+## Root Cause
+
+**File:** `cluster/distributed_worker.py`
+**Lines:** 67-70
+
+**BROKEN CODE:**
+```python
+# Quality filter (matches comprehensive_sweep.py)
+quality_filter = {
+    'min_adx': 15,
+    'min_volume_ratio': vol_min,
+}
+```
+
+**Problem:** Passing a `dict` object when `simulate_money_line()` expects a **callable function**.
+
+## Investigation Timeline
+
+1. **14:35** - User reported "something finished"
+2. **14:40** - Discovered all 2,096 results had 0 trades
+3. **14:45** - Found error in worker logs: `'dict' object is not callable`
+4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
+5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
+6. **14:55** - Fix applied and deployed
+7. **15:00** - Fix verified working (workers at 100% CPU, no errors)
+
+## The Fix
+
+**BEFORE (BROKEN):**
+```python
+quality_filter = {
+    'min_adx': 15,
+    'min_volume_ratio': vol_min,
+}
+```
+
+**AFTER (FIXED):**
+```python
+# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
+# Bug was passing dict which caused "'dict' object is not callable" error
+if vol_min > 0:
+    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+else:
+    quality_filter = None
+```
+
+## Why It Broke
+
+In `backtester/simulator.py` (line 118):
+```python
+if not quality_filter(signal):
+    continue
+```
+
+The code calls `quality_filter()` as a **function**. When we passed a dict, Python tried to call a dict object, causing `'dict' object is not callable`.
+
+## How It Was Missed
+
+- Coordinator and worker infrastructure all worked correctly
+- Data loaded successfully (34,273 rows)
+- Multiprocessing started without errors
+- Worker's exception handler caught the error and returned zeros
+- **Silent failure:** No crash, just invalid results
+- Files created looked successful (183KB)
+
+## Verification Steps
+
+1. ✅ Deployed fixed code to worker1
+2. ✅ Cleaned up invalid results and database
+3. ✅ Restarted coordinator with fixed worker
+4. ✅ Verified no `'dict' object is not callable` errors in logs
+5. ✅ Confirmed 24 Python processes running at 100% CPU
+6. ✅ Workers actively computing (no immediate errors for 2+ minutes)
+
+## Lessons Learned
+
+1. **Type matters:** Dict vs callable - subtle but critical difference
+2. **Silent failures are dangerous:** Exception handler hid the severity
+3. **Compare to working code:** `comprehensive_sweep.py` had correct pattern
+4. **Verify results quality:** All zeros = red flag, investigate immediately
+5. **Test fixes locally first:** Would have caught this earlier
+6. **Add validation:** Should detect all-zero results and abort
+
+## Files Changed
+
+- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)
+
+## Commit
+
+```bash
+git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
+git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function
+
+Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
+simulate_money_line() expects callable function.
+
+Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.
+
+Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
+  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+
+Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
+"
+git push
+```
+
+## Status
+
+- ✅ Bug identified and fixed
+- ✅ Code deployed to worker1
+- ✅ Coordinator restarted
+- ✅ Workers actively processing (100% CPU, no errors)
+- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
+- ⏳ Full sweep restart: 4,096 configs total
+
+## Expected Timeline
+
+- **Chunk 0:** ~22 minutes (2,000 configs)
+- **Chunk 1:** ~22 minutes (2,000 configs)
+- **Chunk 2:** ~1 minute (96 configs)
+- **Total:** ~45 minutes for complete sweep
+
+## Next Steps
+
+1. Monitor chunk 0 completion (~10 minutes remaining)
+2. Verify results have trades > 0 (not all zeros)
+3. Import successful results to database
+4. Analyze top performers
+5. Deploy to worker2 for parallel processing
diff --git a/cluster/distributed_worker.py b/cluster/distributed_worker.py
index eca498b..a120ec3 100644
--- a/cluster/distributed_worker.py
+++ b/cluster/distributed_worker.py
@@ -63,11 +63,13 @@ def test_config(args):
         max_bars_per_trade=max_bars,
     )
 
-    # Quality filter (matches comprehensive_sweep.py)
-    quality_filter = {
-        'min_adx': 15,
-        'min_volume_ratio': vol_min,
-    }
+    # Quality filter (matches comprehensive_sweep.py signature)
+    # CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
+    # Bug was passing dict which caused "'dict' object is not callable" error
+    if vol_min > 0:
+        quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+    else:
+        quality_filter = None
 
     # Run simulation
     try:
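
Follow-up sketch, not part of this patch: lessons 2 and 6 in the bug-fix document call for failing loudly instead of returning zeros. One possible shape for that guard is shown here. The helper names `validate_quality_filter` and `all_results_empty` are hypothetical, and the code assumes each result is a dict with a `trades` key, which the `trades=0` output suggests but this patch does not confirm.

```python
from typing import Any, Callable, Optional, Sequence


def validate_quality_filter(quality_filter: Optional[Callable[[Any], bool]]) -> None:
    """Raise TypeError unless the filter is None or callable (hypothetical helper)."""
    if quality_filter is not None and not callable(quality_filter):
        raise TypeError(
            "quality_filter must be callable or None, got "
            f"{type(quality_filter).__name__}"
        )


def all_results_empty(results: Sequence[dict]) -> bool:
    """Return True when a non-empty batch reports zero trades everywhere.

    Assumes each result dict carries a 'trades' count; a missing key is
    treated as zero trades.
    """
    return len(results) > 0 and all(r.get("trades", 0) == 0 for r in results)
```

Calling `validate_quality_filter(quality_filter)` before the simulation would have turned the dict into an immediate `TypeError`, and checking `all_results_empty(chunk_results)` before writing a chunk would flag a batch like the 2,096 zero-trade results for inspection.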