critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
144
cluster/CRITICAL_BUG_FIX_DEC1_2025.md
Normal file
@@ -0,0 +1,144 @@
# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)

## 🔥 Critical Bug Discovered

**Date:** December 1, 2025, 14:40 UTC
**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable` error
**Severity:** CRITICAL - Blocked all distributed work

## Symptom

All parameter combinations tested returned 0 trades:
- Chunk 0: 2,000 configs, all with `trades=0`
- Chunk 2: 96 configs, all with `trades=0`
- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)

## Root Cause

**File:** `cluster/distributed_worker.py`
**Lines:** 67-70

**BROKEN CODE:**
```python
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**Problem:** Passing a `dict` object when `simulate_money_line()` expects a **callable function**.

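The failure is easy to reproduce in isolation. The sketch below is illustrative only: `Signal` is a hypothetical stand-in for the backtester's signal object (the real type is not shown in this report), and the `vol_min` value is made up.

```python
from dataclasses import dataclass


@dataclass
class Signal:
    """Hypothetical stand-in for the backtester's signal object."""
    adx: float
    volume_ratio: float


vol_min = 1.2  # illustrative value only
signal = Signal(adx=20.0, volume_ratio=1.5)

# The broken form: a dict describing thresholds, not a function.
broken_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}

try:
    broken_filter(signal)  # calling a dict raises TypeError
except TypeError as exc:
    error_message = str(exc)

# callable() distinguishes the two forms before any backtest starts.
dict_is_callable = callable(broken_filter)
lambda_is_callable = callable(lambda s: s.adx >= 15)
```

A `callable()` check at worker start-up would have rejected the dict before any config was tested.
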
## Investigation Timeline

1. **14:35** - User reported "something finished"
2. **14:40** - Discovered all 2,096 results had 0 trades
3. **14:45** - Found error in worker logs: `'dict' object is not callable`
4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
6. **14:55** - Fix applied and deployed
7. **15:00** - Fix verified working (workers at 100% CPU, no errors)

## The Fix

**BEFORE (BROKEN):**
```python
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**AFTER (FIXED):**
```python
# CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
# Bug was passing dict which caused "'dict' object is not callable" error
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

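One way to keep this logic in a single, testable place is a small factory function. This is a sketch, not the code that was deployed: `make_quality_filter` and its `min_adx` parameter are hypothetical names, and `SimpleNamespace` merely stands in for a real signal object.

```python
from types import SimpleNamespace
from typing import Callable, Optional


def make_quality_filter(vol_min: float, min_adx: float = 15) -> Optional[Callable]:
    """Hypothetical helper: return a signal predicate, or None to disable filtering.

    Mirrors the fixed worker code: no filter at all when vol_min is not positive.
    """
    if vol_min > 0:
        return lambda s: s.adx >= min_adx and s.volume_ratio >= vol_min
    return None


# SimpleNamespace stands in for a real signal object in this sketch.
strong = SimpleNamespace(adx=22.0, volume_ratio=1.4)
weak = SimpleNamespace(adx=12.0, volume_ratio=1.4)

quality_filter = make_quality_filter(vol_min=1.2)
accepts_strong = quality_filter(strong)
accepts_weak = quality_filter(weak)
```

Note that, as in the fixed code, a non-positive `vol_min` returns `None` and therefore disables the ADX check as well.
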
## Why It Broke

In `backtester/simulator.py` (line 118):
```python
if not quality_filter(signal):
    continue
```

The code calls `quality_filter()` as a **function**. When we passed a dict, Python tried to call a dict object, causing `'dict' object is not callable`.

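Because the fix passes `None` when no filter applies, the simulator presumably tolerates `None`; only the two lines above are shown in this report, so that is an assumption. The sketch below shows one way a loop could accept an optional filter and fail fast on a wrong type. `select_signals` is a hypothetical function, not part of `backtester/simulator.py`.

```python
from types import SimpleNamespace


def select_signals(signals, quality_filter=None):
    """Hypothetical loop showing how a simulator might apply an optional filter."""
    if quality_filter is not None and not callable(quality_filter):
        # Fail once, with a clear message, instead of erroring on every signal.
        raise TypeError(
            f"quality_filter must be callable or None, got {type(quality_filter).__name__}"
        )
    accepted = []
    for signal in signals:
        if quality_filter is not None and not quality_filter(signal):
            continue  # signal rejected by the filter
        accepted.append(signal)
    return accepted


signals = [
    SimpleNamespace(adx=22.0, volume_ratio=1.4),
    SimpleNamespace(adx=12.0, volume_ratio=1.4),
]

unfiltered = select_signals(signals)  # None means "keep everything"
filtered = select_signals(signals, lambda s: s.adx >= 15)

try:
    select_signals(signals, {'min_adx': 15})
    rejected_dict = False
except TypeError:
    rejected_dict = True
```
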
## How It Was Missed

- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- Worker's exception handler caught the error and returned zeros
- **Silent failure:** No crash, just invalid results
- Files created looked successful (183KB)

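The points above come down to a failed run being indistinguishable from a run with zero trades. A minimal sketch of an alternative follows, assuming results are plain dicts; `run_config_safely` and the `error` field are hypothetical and do not describe the worker's actual result format.

```python
def run_config_safely(config_id, run_backtest):
    """Hypothetical wrapper: keep failures distinguishable from genuine zero-trade results."""
    try:
        result = run_backtest()
        return {"config_id": config_id, "trades": result["trades"], "error": None}
    except Exception as exc:
        # Record the failure explicitly instead of reporting trades=0.
        return {"config_id": config_id, "trades": None, "error": f"{type(exc).__name__}: {exc}"}


def broken_backtest():
    """Simulates the bug: the 'filter' is a dict, so calling it raises TypeError."""
    quality_filter = {"min_adx": 15}
    return {"trades": int(quality_filter(None))}


ok = run_config_safely(1, lambda: {"trades": 7})
failed = run_config_safely(2, broken_backtest)
```

With a record like `failed`, a coordinator could count errors per chunk and stop early rather than writing out a plausible-looking results file.
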
## Verification Steps

1. ✅ Deployed fixed code to worker1
2. ✅ Cleaned up invalid results and database
3. ✅ Restarted coordinator with fixed worker
4. ✅ Verified no `'dict' object is not callable` errors in logs
5. ✅ Confirmed 24 Python processes running at 100% CPU
6. ✅ Workers actively computing (no immediate errors for 2+ minutes)

## Lessons Learned

1. **Type matters:** Dict vs callable - subtle but critical difference
2. **Silent failures are dangerous:** The exception handler hid the severity
3. **Compare to working code:** `comprehensive_sweep.py` had the correct pattern
4. **Verify results quality:** All zeros = red flag, investigate immediately
5. **Test locally first:** A single local run of the worker would have caught this before the cluster run
6. **Add validation:** The pipeline should detect all-zero results and abort

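The all-zero check from the last lesson could look roughly like the sketch below. It assumes each result is a dict with a `trades` key; `validate_chunk`, `AllZeroResultsError`, and the `min_configs` threshold are hypothetical names and values, not existing code.

```python
class AllZeroResultsError(RuntimeError):
    """Raised when every result in a chunk reports zero trades."""


def validate_chunk(results, min_configs=10):
    """Hypothetical sanity check to run before saving a chunk's results."""
    if len(results) >= min_configs and all(r.get("trades", 0) == 0 for r in results):
        raise AllZeroResultsError(
            f"all {len(results)} configs returned 0 trades; check worker logs"
        )
    return results


healthy = [{"trades": 0}] * 9 + [{"trades": 12}]
suspicious = [{"trades": 0}] * 96  # same shape as the failed chunk 2

validated = validate_chunk(healthy)

try:
    validate_chunk(suspicious)
    aborted = False
except AllZeroResultsError:
    aborted = True
```

The `min_configs` threshold is there so that a very small chunk with legitimately no trades does not abort the sweep; the right value depends on the strategy.
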
## Files Changed

- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)

## Commit

```bash
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.

Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
"
git push
```

## Status

- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total

## Expected Timeline

- **Chunk 0:** ~22 minutes (2,000 configs)
- **Chunk 1:** ~22 minutes (2,000 configs)
- **Chunk 2:** ~1 minute (96 configs)
- **Total:** ~45 minutes for complete sweep

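The estimates above follow from the ~22 minutes quoted for 2,000 configs. The arithmetic, assuming throughput stays constant across chunks (an assumption, not a measurement):

```python
SECONDS_FOR_CHUNK_0 = 22 * 60  # ~22 minutes quoted for chunk 0
CONFIGS_IN_CHUNK_0 = 2000

# Implied per-config cost under the constant-throughput assumption.
seconds_per_config = SECONDS_FOR_CHUNK_0 / CONFIGS_IN_CHUNK_0

chunk_sizes = [2000, 2000, 96]
chunk_minutes = [size * seconds_per_config / 60 for size in chunk_sizes]
total_configs = sum(chunk_sizes)
total_minutes = sum(chunk_minutes)

for index, minutes in enumerate(chunk_minutes):
    print(f"Chunk {index}: ~{minutes:.1f} min")
print(f"Total: {total_configs} configs, ~{total_minutes:.0f} min")
```

This gives roughly 0.66 seconds per config, about one minute for the 96-config chunk, and about 45 minutes overall.
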
## Next Steps

1. Monitor chunk 0 completion (~10 minutes remaining)
2. Verify results have trades > 0 (not all zeros)
3. Import successful results to database
4. Analyze top performers
5. Deploy to worker2 for parallel processing

@@ -63,11 +63,13 @@ def test_config(args):
         max_bars_per_trade=max_bars,
     )

-    # Quality filter (matches comprehensive_sweep.py)
-    quality_filter = {
-        'min_adx': 15,
-        'min_volume_ratio': vol_min,
-    }
+    # Quality filter (matches comprehensive_sweep.py signature)
+    # CRITICAL FIX (Dec 1, 2025): Must be lambda function, not dict!
+    # Bug was passing dict which caused "'dict' object is not callable" error
+    if vol_min > 0:
+        quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+    else:
+        quality_filter = None

     # Run simulation
     try: