# CRITICAL BUG FIX - Distributed Worker Quality Filter (Dec 1, 2025)

## 🔥 Critical Bug Discovered

**Date:** December 1, 2025, 14:40 UTC
**Impact:** ALL 2,096 backtests failed with `'dict' object is not callable`
**Severity:** CRITICAL - Blocked all distributed work

## Symptom

All parameter combinations tested returned 0 trades:

- Chunk 0: 2,000 configs, all with `trades=0`
- Chunk 2: 96 configs, all with `trades=0`
- Worker logs showed: `Error testing config X: 'dict' object is not callable` (repeated 2,096 times)

## Root Cause

**File:** `cluster/distributed_worker.py`
**Lines:** 67-70

**BROKEN CODE:**

```python
# Quality filter (matches comprehensive_sweep.py)
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**Problem:** Passing a `dict` object where `simulate_money_line()` expects a **callable function**.

## Investigation Timeline

1. **14:35** - User reported "something finished"
2. **14:40** - Discovered all 2,096 results had 0 trades
3. **14:45** - Found the error in worker logs: `'dict' object is not callable`
4. **14:50** - Compared to `comprehensive_sweep.py` (working version)
5. **14:52** - **ROOT CAUSE IDENTIFIED**: dict vs lambda function
6. **14:55** - Fix applied and deployed
7. **15:00** - Fix verified working (workers at 100% CPU, no errors)

## The Fix

**BEFORE (BROKEN):**

```python
quality_filter = {
    'min_adx': 15,
    'min_volume_ratio': vol_min,
}
```

**AFTER (FIXED):**

```python
# CRITICAL FIX (Dec 1, 2025): Must be a lambda function, not a dict!
# The bug passed a dict, which caused "'dict' object is not callable".
if vol_min > 0:
    quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
else:
    quality_filter = None
```

## Why It Broke

In `backtester/simulator.py` (line 118):

```python
if not quality_filter(signal):
    continue
```

The simulator calls `quality_filter()` as a **function**. When we passed a dict, Python tried to call the dict object, raising `'dict' object is not callable`.

## How It Was Missed

- Coordinator and worker infrastructure all worked correctly
- Data loaded successfully (34,273 rows)
- Multiprocessing started without errors
- The worker's exception handler caught the error and returned zeros
- **Silent failure:** No crash, just invalid results
- The result files looked successful (183KB)

## Verification Steps

1. ✅ Deployed fixed code to worker1
2. ✅ Cleaned up invalid results and database
3. ✅ Restarted coordinator with fixed worker
4. ✅ Verified no `'dict' object is not callable` errors in logs
5. ✅ Confirmed 24 Python processes running at 100% CPU
6. ✅ Workers actively computing (no immediate errors for 2+ minutes)

## Lessons Learned

1. **Type matters:** Dict vs callable - a subtle but critical difference (a minimal guard sketch follows this list)
2. **Silent failures are dangerous:** The exception handler hid the severity
3. **Compare to working code:** `comprehensive_sweep.py` had the correct pattern
4. **Verify result quality:** All zeros = red flag, investigate immediately
5. **Test fixes locally first:** Would have caught this earlier
6. **Add validation:** Should detect all-zero results and abort
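A lightweight way to enforce lesson 1 is to assert the filter contract before dispatching configs. The sketch below is illustrative only: `build_quality_filter` is a hypothetical helper (not part of the current codebase), and the `vol_min` example value and signal attribute names simply mirror the fixed code above.

```python
from typing import Callable, Optional


def build_quality_filter(vol_min: float) -> Optional[Callable]:
    """Hypothetical helper: always returns a callable (or None), never a dict,
    so the simulator's `quality_filter(signal)` call cannot blow up."""
    if vol_min > 0:
        return lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
    return None


# Guard the contract before launching the sweep: a dict (or any other
# non-callable) fails fast here instead of silently zeroing 2,096 results.
quality_filter = build_quality_filter(vol_min=1.2)  # 1.2 is an example value
assert quality_filter is None or callable(quality_filter), (
    f"quality_filter must be a callable or None, got {type(quality_filter)!r}"
)
```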
## Files Changed

- `cluster/distributed_worker.py` - Fixed quality_filter (dict → lambda)

## Commit

```bash
git add cluster/distributed_worker.py cluster/CRITICAL_BUG_FIX_DEC1_2025.md
git commit -m "critical: Fix distributed worker quality_filter - dict to lambda function

Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min}
when simulate_money_line() expects callable function. Bug caused ALL
2,096 backtests to fail with 'dict' object is not callable.

Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min

Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes."
git push
```

## Status

- ✅ Bug identified and fixed
- ✅ Code deployed to worker1
- ✅ Coordinator restarted
- ✅ Workers actively processing (100% CPU, no errors)
- ⏳ Awaiting completion of chunk 0 (2,000 configs, ~22 minutes estimated)
- ⏳ Full sweep restart: 4,096 configs total

## Expected Timeline

- **Chunk 0:** ~22 minutes (2,000 configs)
- **Chunk 1:** ~22 minutes (2,000 configs)
- **Chunk 2:** ~1 minute (96 configs)
- **Total:** ~45 minutes for the complete sweep

## Next Steps

1. Monitor chunk 0 completion (~10 minutes remaining)
2. Verify results have trades > 0, not all zeros (see the sketch after this list)
3. Import successful results to database
4. Analyze top performers
5. Deploy to worker2 for parallel processing
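The pre-import check in step 2 could be as simple as the sketch below. It is a sketch under assumptions: the file path and the per-config `trades` field are placeholders for the real result schema, not the actual format produced by the workers.

```python
import json
from pathlib import Path


def verify_chunk(path: Path) -> None:
    """Refuse to import a chunk whose results are all zero trades.

    Assumes the chunk file is a JSON list of per-config dicts, each with
    a 'trades' count; adjust to the real result schema before use.
    """
    results = json.loads(path.read_text())
    if not results:
        raise RuntimeError(f"{path}: empty result file")
    if all(r.get("trades", 0) == 0 for r in results):
        raise RuntimeError(
            f"{path}: every config reported 0 trades - "
            "likely a silent worker failure, do not import"
        )
    nonzero = sum(1 for r in results if r.get("trades", 0) > 0)
    print(f"{path.name}: {nonzero}/{len(results)} configs produced trades")


# Example (hypothetical path):
# verify_chunk(Path("results/chunk_0_results.json"))
```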