diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 60c5ff5..607b43f 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -4807,6 +4807,38 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
      * Monitor coordinator logs for timeout patterns, increase if needed
      * Consider SSH multiplexing (ControlMaster) to speed up nested hops
 
+65. **Distributed Worker Quality Filter - Dict vs Callable (CRITICAL - Fixed Dec 1, 2025):**
+   - **Symptom:** ALL 2,096 distributed backtests returned 0 trades (expected 500-600 each)
+   - **Error Message:** `Error testing config X: 'dict' object is not callable` repeated 2,096 times in worker logs
+   - **Root Cause:** `cluster/distributed_worker.py` lines 67-70 passed dict `{'min_adx': 15, 'min_volume_ratio': vol_min}` instead of lambda function to `simulate_money_line()`
+   - **Technical Detail:** `backtester/simulator.py:118` calls `quality_filter(signal)` - expects CALLABLE function, not dict configuration
+   - **Silent Failure Pattern:** Worker's try/except caught exception and returned default values (0 trades, 0 P&L) without crashing
+   - **Discovery:** Found only when analyzing result content - CSVs looked valid but all trades=0
+   - **Fix (Commit 11a0ea3):**
+     ```python
+     # BEFORE (BROKEN):
+     quality_filter = {'min_adx': 15, 'min_volume_ratio': vol_min}
+
+     # AFTER (FIXED):
+     if vol_min > 0:
+         quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
+     else:
+         quality_filter = None
+     ```
+   - **Pattern Source:** Matches working `comprehensive_sweep.py:106` implementation
+   - **Why It Broke:** Developer treated quality_filter as configuration dict, but it's actually a callback function called on every signal
+   - **Verification:** Workers at 100% CPU, no errors after 2+ minutes, actively computing valid backtests
+   - **Lessons Learned:**
+     1. **Type mismatches (dict vs callable) can cause catastrophic silent failures** - Python's dynamic typing hides the error until runtime
+     2. **Always validate result quality** - All zeros = red flag requiring immediate investigation
+     3. **Compare to working code when debugging** - comprehensive_sweep.py had correct pattern
+     4. **Silent failures more dangerous than crashes** - Exception handler hid severity by returning zeros instead of crashing
+     5. **Test single case before running full sweep** - Would have caught error in 30 seconds vs 22 minutes
+     6. **Add result assertions** - System should detect all-zero results and abort early
+   - **Documentation:** `cluster/CRITICAL_BUG_FIX_DEC1_2025.md`
+   - **Impact:** Blocked 45 minutes of distributed work, wasted cluster compute, caught before Stage 1 analysis
+   - **Files changed:** `cluster/distributed_worker.py` lines 67-77
+
 ## File Conventions
 
 - **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router)