critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
mindesbunister
2025-12-01 14:59:08 +01:00
parent a886555d44
commit 11a0ea324b
2 changed files with 151 additions and 5 deletions


@@ -0,0 +1,144 @@
# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)
## 🔥 Critical Bug Discovered
**Date:** December 1, 2025, 14:40 UTC
**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable` error
**Severity:** CRITICAL - Blocked all distributed work
## Symptom
All parameter combinations tested returned 0 trades:
- Chunk 0: 2,000 configs, all with `trades=0`
- Chunk 2: 96 configs, all with `trades=0`
- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)
## Root Cause
**File:** `cluster/distributed_worker.py`
**Lines:** 67-70
**BROKEN CODE:**
```python
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```
**Problem:** The worker passed a `dict` as `quality_filter`, but `simulate_money_line()` expects a **callable** that takes a signal and returns a truthy or falsy value.
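The failure mode can be reproduced in isolation. The sketch below is a minimal stand-in, not the real simulator: `run_filter` and the `SimpleNamespace` signals are hypothetical, and only the `adx` and `volume_ratio` attribute names are taken from the fix.

```python
from types import SimpleNamespace


def run_filter(signals, quality_filter):
    """Hypothetical stand-in for the simulator's filtering loop."""
    return [s for s in signals if quality_filter(s)]


# Illustrative signals; real signal objects carry more fields.
signals = [
    SimpleNamespace(adx=20, volume_ratio=1.5),
    SimpleNamespace(adx=10, volume_ratio=2.0),
]
vol_min = 1.0

# Broken: a dict describes the thresholds but cannot be called.
broken = {'min_adx': 15, 'min_volume_ratio': vol_min}
try:
    run_filter(signals, broken)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Working: a callable that evaluates the thresholds per signal.
working = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
kept = run_filter(signals, working)
```

Here `error_message` holds the same text that appeared in the worker logs, and `kept` contains only the first signal.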
## Investigation Timeline
1. **14:35** - User reported "something finished"
2. **14:40** - Discovered all 2,096 results had 0 trades
3. **14:45** - Found error in worker logs: `'dict' object is not callable`
4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
6. **14:55** - Fix applied and deployed
7. **15:00** - Fix verified working (workers at 100% CPU, no errors)
## The Fix
**BEFORE (BROKEN):**
```python
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```
**AFTER (FIXED):**
```python
# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
# Bug was passing dict which caused "'dict' object is not callable" error
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```
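Note that the `else` branch sets `quality_filter = None`, which only works if the simulator skips filtering when no filter is supplied; that behavior is assumed here, not confirmed by this document. A rough sketch of the same logic as a named helper, where `make_quality_filter` and `passes` are hypothetical names for illustration:

```python
from types import SimpleNamespace


def make_quality_filter(vol_min, min_adx=15):
    """Hypothetical helper: build the predicate, or None when filtering is off."""
    if vol_min <= 0:
        return None

    def quality_filter(signal):
        return signal.adx >= min_adx and signal.volume_ratio >= vol_min

    return quality_filter


def passes(signal, quality_filter):
    """Assumed caller-side convention: a missing filter accepts everything."""
    return quality_filter is None or quality_filter(signal)


strong = SimpleNamespace(adx=20, volume_ratio=1.5)
weak = SimpleNamespace(adx=20, volume_ratio=1.0)
active_filter = make_quality_filter(1.2)
```

A named inner function behaves the same as the lambda but shows up with a readable name in tracebacks.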
## Why It Broke
In `backtester/simulator.py` (line 118):
```python
if not quality_filter(signal):
    continue
```
The simulator calls `quality_filter(signal)` as a **function**. Because the worker passed a dict, that call raised `TypeError: 'dict' object is not callable` for every config.
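One way to surface this kind of mistake before any backtest runs is to check the argument once at the boundary. This is a sketch only; `validate_quality_filter` is a hypothetical helper and is not part of `backtester/simulator.py`.

```python
def validate_quality_filter(quality_filter):
    """Hypothetical guard: accept None or a callable, reject anything else."""
    if quality_filter is not None and not callable(quality_filter):
        raise TypeError(
            "quality_filter must be callable or None, "
            f"got {type(quality_filter).__name__}"
        )
    return quality_filter


# A dict is rejected immediately, with a message naming the offending type.
try:
    validate_quality_filter({'min_adx': 15})
    guard_error = None
except TypeError as exc:
    guard_error = str(exc)
```

Called once when a worker starts, a check like this would turn 2,096 logged errors into a single early failure.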
## How It Was Missed
- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- Worker's exception handler caught the error and returned zeros
- **Silent failure:** No crash, just invalid results
- Files created looked successful (183KB)
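The silent failure came from catching each exception and carrying on. A possible alternative, sketched below under assumptions, still logs each error but aborts the chunk once too many configs fail. `run_chunk`, `TooManyFailures`, and the 50% threshold are all hypothetical and do not reflect the actual worker code.

```python
import logging

logger = logging.getLogger("distributed_worker")


class TooManyFailures(RuntimeError):
    """Raised when a chunk fails often enough that its results are suspect."""


def run_chunk(configs, test_one, max_failure_ratio=0.5):
    """Run test_one on each config; abort if too many of them raise."""
    results, failures = [], 0
    for config in configs:
        try:
            results.append(test_one(config))
        except Exception:
            failures += 1
            logger.exception("Error testing config %r", config)
    if configs and failures / len(configs) > max_failure_ratio:
        raise TooManyFailures(f"{failures}/{len(configs)} configs raised")
    return results
```

With a guard like this, a chunk where every config raises would end in an exception instead of a results file full of zeros.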
## Verification Steps
1. ✅ Deployed fixed code to worker1
2. ✅ Cleaned up invalid results and database
3. ✅ Restarted coordinator with fixed worker
4. ✅ Verified no `'dict' object is not callable` errors in logs
5. ✅ Confirmed 24 Python processes running at 100% CPU
6. ✅ Workers actively computing (no immediate errors for 2+ minutes)
## Lessons Learned
1. **Type matters:** Dict vs callable - subtle but critical difference
2. **Silent failures are dangerous:** Exception handler hid the severity
3. **Compare to working code:** `comprehensive_sweep.py` had correct pattern
4. **Verify results quality:** All zeros = red flag, investigate immediately
5. **Test changes locally first:** A single local run would have caught this earlier
6. **Add validation:** Should detect all-zero results and abort
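The validation in lesson 6 could look roughly like the following. The result shape (a list of dicts with a `trades` key) is an assumption made for illustration, and `check_results_plausible` is a hypothetical name.

```python
def check_results_plausible(results, min_nonzero=1):
    """Return the number of results with trades > 0, or raise if too few.

    Assumes each result is a dict with a 'trades' count.
    """
    if not results:
        raise ValueError("no results to validate")
    nonzero = sum(1 for r in results if r.get("trades", 0) > 0)
    if nonzero < min_nonzero:
        raise ValueError(
            f"only {nonzero} of {len(results)} results have trades > 0"
        )
    return nonzero
```

Running a check like this before a chunk is written to disk or imported into the database would have flagged both failed chunks.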
## Files Changed
- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)
## Commit
```bash
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
  quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
"
git push
```
## Status
- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total
## Expected Timeline
- **Chunk 0:** ~22 minutes (2,000 configs)
- **Chunk 1:** ~22 minutes (2,000 configs)
- **Chunk 2:** ~1 minute (96 configs)
- **Total:** ~45 minutes for complete sweep
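These estimates follow from a single throughput figure. The arithmetic below assumes the rate implied by the chunk 0 estimate (2,000 configs in about 22 minutes) holds for every chunk.

```python
# Chunk sizes from the sweep: two full chunks plus a 96-config remainder.
configs_per_chunk = [2000, 2000, 96]

# Assumed throughput, derived from the chunk 0 estimate.
configs_per_minute = 2000 / 22  # about 91 configs per minute

estimates = [n / configs_per_minute for n in configs_per_chunk]
total_configs = sum(configs_per_chunk)
total_minutes = sum(estimates)
```

This gives roughly 22, 22, and 1 minutes per chunk, and about 45 minutes for all 4,096 configs.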
## Next Steps
1. Monitor chunk 0 completion (~10 minutes remaining)
2. Verify results have trades > 0 (not all zeros)
3. Import successful results to database
4. Analyze top performers
5. Deploy to worker2 for parallel processing


@@ -63,11 +63,13 @@ def test_config(args):
         max_bars_per_trade=max_bars,
     )
-    # Quality filter (matches comprehensive_sweep.py)
-    quality_filter = {
-        'min_adx': 15,
-        'min_volume_ratio': vol_min,
-    }
+    # Quality filter (matches comprehensive_sweep.py signature)
+    # CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
+    # Bug was passing dict which caused "'dict' object is not callable" error
+    if vol_min > 0:
+        quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+    else:
+        quality_filter = None
     # Run simulation
     try: