critical: Fix distributed worker quality_filter - dict to lambda function
Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.
Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.
Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
144
cluster/CRITICAL_BUG_FIX_DEC1_2025.md
Normal file
@@ -0,0 +1,144 @@
# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)

## 🔥 Critical Bug Discovered

**Date:** December 1, 2025, 14:40 UTC
**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable` error
**Severity:** CRITICAL - Blocked all distributed work

## Symptom

All parameter combinations tested returned 0 trades:
- Chunk 0: 2,000 configs, all with `trades=0`
- Chunk 2: 96 configs, all with `trades=0`
- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)

## Root Cause

**File:** `cluster/distributed_worker.py`
**Lines:** 67-70

**BROKEN CODE:**
```python
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**Problem:** Passing a `dict` object when `simulate_money_line()` expects a **callable function**.

## Investigation Timeline

1. **14:35** - User reported "something finished"
2. **14:40** - Discovered all 2,096 results had 0 trades
3. **14:45** - Found error in worker logs: `'dict' object is not callable`
4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
6. **14:55** - Fix applied and deployed
7. **15:00** - Fix verified working (workers at 100% CPU, no errors)

## The Fix

**BEFORE (BROKEN):**
```python
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**AFTER (FIXED):**
```python
# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
# Bug was passing dict which caused "'dict' object is not callable" error
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

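A cheap guard where the worker builds its arguments would turn this class of mistake into an immediate, loud failure. The sketch below is hypothetical: `validate_quality_filter` is not part of the codebase, and it assumes `simulate_money_line()` treats `None` as "no filter", which the AFTER code implies but this document does not show.

```python
def validate_quality_filter(quality_filter):
    """Return the filter unchanged if it is None or callable; raise otherwise.

    Hypothetical helper, not part of the existing codebase.
    """
    if quality_filter is not None and not callable(quality_filter):
        raise TypeError(
            "quality_filter must be callable or None, "
            f"got {type(quality_filter).__name__}"
        )
    return quality_filter


# Usage: the original dict is rejected before any backtest starts.
try:
    validate_quality_filter({"min_adx": 15, "min_volume_ratio": 1.2})
    rejected = False
except TypeError:
    rejected = True
```

Calling the helper once per config, before the simulation starts, would have surfaced the problem on the first config instead of after 2,096 silent failures.
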
## Why It Broke

In `backtester/simulator.py` (line 118):
```python
if not quality_filter(signal):
    continue
```

The code calls `quality_filter()` as a **function**. When the worker passed a dict instead, Python raised `TypeError: 'dict' object is not callable`.

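The failure can be reproduced in isolation. In this minimal sketch, `signal` is a stand-in built with `SimpleNamespace` and the `vol_min` value is made up for illustration; neither comes from the real backtester.

```python
from types import SimpleNamespace

vol_min = 1.2  # illustrative threshold, not a value from the sweep
signal = SimpleNamespace(adx=20.0, volume_ratio=1.5)  # stand-in for a real signal

broken_filter = {"min_adx": 15, "min_volume_ratio": vol_min}
fixed_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

# Calling the dict raises TypeError, the same error seen in the worker logs.
try:
    broken_filter(signal)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# The lambda is callable and evaluates the thresholds against the signal.
passed = fixed_filter(signal)
```
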
## How It Was Missed

- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- Worker's exception handler caught the error and returned zeros
- **Silent failure:** No crash, just invalid results
- Files created looked successful (183KB)

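One way to keep an exception handler from hiding a systemic failure is to record errors separately from results and abort when too many configs fail. This is a sketch under assumptions: `run_chunk`, its arguments, and the 50% threshold are hypothetical and do not describe the actual worker code.

```python
def run_chunk(configs, test_fn, max_error_rate=0.5):
    """Run test_fn over configs, keeping errors separate from results.

    Hypothetical sketch: the names and the 50% threshold are assumptions.
    """
    results, errors = [], []
    for index, config in enumerate(configs):
        try:
            results.append(test_fn(config))
        except Exception as exc:  # record the failure instead of returning zeros
            errors.append((index, repr(exc)))

    # Abort loudly when most of the chunk failed: that points to a bug,
    # not to a handful of bad parameter combinations.
    if configs and len(errors) / len(configs) > max_error_rate:
        first_index, first_error = errors[0]
        raise RuntimeError(
            f"{len(errors)}/{len(configs)} configs failed; "
            f"first failure at config {first_index}: {first_error}"
        )
    return results, errors
```

With a handler like this, a chunk in which every config raises the same `TypeError` would stop with an error instead of writing a plausible-looking results file.
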
## Verification Steps

1. ✅ Deployed fixed code to worker1
2. ✅ Cleaned up invalid results and database
3. ✅ Restarted coordinator with fixed worker
4. ✅ Verified no `'dict' object is not callable` errors in logs
5. ✅ Confirmed 24 Python processes running at 100% CPU
6. ✅ Workers actively computing (no immediate errors for 2+ minutes)

## Lessons Learned

1. **Type matters:** Dict vs callable - subtle but critical difference
2. **Silent failures are dangerous:** Exception handler hid the severity
3. **Compare to working code:** `comprehensive_sweep.py` had correct pattern
4. **Verify results quality:** All zeros = red flag, investigate immediately
5. **Test fixes locally first:** Would have caught this earlier
6. **Add validation:** Should detect all-zero results and abort

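The validation in lesson 6 could be as small as the check below. It is a sketch only: `check_results_not_all_zero` is a hypothetical name, and it assumes each result is a dict with a numeric `trades` entry, which this document does not confirm.

```python
def check_results_not_all_zero(results, key="trades"):
    """Raise if the result set is empty or every row reports zero for `key`.

    Hypothetical sketch: the result layout (dicts with a `trades` key)
    is an assumption.
    """
    if not results:
        raise ValueError("no results to validate")
    if all(row.get(key, 0) == 0 for row in results):
        raise ValueError(
            f"all {len(results)} results have {key}=0; refusing to import"
        )
    return results
```

Running a check like this before importing a chunk into the database would flag a chunk of 2,000 configs with `trades=0` immediately.
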
## Files Changed

- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)

## Commit

```bash
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
"
git push
```

## Status

- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total

## Expected Timeline

- **Chunk 0:** ~22 minutes (2,000 configs)
- **Chunk 1:** ~22 minutes (2,000 configs)
- **Chunk 2:** ~1 minute (96 configs)
- **Total:** ~45 minutes for complete sweep

## Next Steps

1. Monitor chunk 0 completion (~10 minutes remaining)
2. Verify results have trades > 0 (not all zeros)
3. Import successful results to database
4. Analyze top performers
5. Deploy to worker2 for parallel processing

@@ -63,11 +63,13 @@ def test_config(args):
         max_bars_per_trade=max_bars,
     )
 
-    # Quality filter (matches comprehensive_sweep.py)
-    quality_filter = {
-        'min_adx': 15,
-        'min_volume_ratio': vol_min,
-    }
+    # Quality filter (matches comprehensive_sweep.py signature)
+    # CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
+    # Bug was passing dict which caused "'dict' object is not callable" error
+    if vol_min > 0:
+        quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+    else:
+        quality_filter = None
 
     # Run simulation
     try: