- Discovery: All TradingView alerts (5min/15min/1H/4H/Daily) attached to OLD v9 version
- Impact: 11,429 records from wrong indicator settings (confirmBars=0 vs current)
- Solution: Marked as DATA_COLLECTION_OLD_V9_VERSION to prevent analysis contamination
- Exception: 1-minute data (11,398) kept as DATA_COLLECTION_ONLY (unaffected)
- Fresh data from corrected alerts will use DATA_COLLECTION_ONLY going forward
- Old data preserved for historical reference, clearly marked
CRITICAL DATA BUG DISCOVERED (Dec 5, 2025):
Previous commits a67a338 and f65aae5 implemented optimizations based on
INCORRECT analysis of maxFavorableExcursion (MFE) data.
Problem: Old Trade records stored MFE in DOLLARS, not PERCENTAGES
- Appeared to show 20%+ average favorable movement
- Actually only 0.76% (long) and 1.20% (short) average movement
- 26× inflation of perceived performance due to unit mismatch
Incorrect Changes Reverted:
- ATR_MULTIPLIER_TP1: 1.5 → back to 2.0
- ATR_MULTIPLIER_TP2: 3.0 → back to 4.0
- ATR_MULTIPLIER_SL: 2.5 → back to 3.0
- TAKE_PROFIT_1_SIZE_PERCENT: 75 → back to 60
- LEVERAGE: 5 → back to 1
- Safety bounds restored to original values
- TRAILING_STOP_ATR_MULTIPLIER: back to 2.5
REAL FINDINGS (after data correction):
- TP1 orders ARE being placed (tp1OrderTx populated)
- TP1 prices NOT being reached (only 2/11 trades in sample)
- Recent trades (6 total): avg MFE 0.74%, only 2/6 reached TP1
- Problem is ENTRY QUALITY, not exit timing
- Quality 90+ signals barely move favorably before reversing
See Common Pitfall #54 - MFE data stored in mixed units
Need to filter by createdAt >= '2025-11-23' for accurate analysis
User requirement: Manual long/short commands via Telegram shall execute
immediately without quality checks.
Changes:
- Execute endpoint now checks for timeframe='manual' flag
- Added isManualTrade bypass alongside isValidatedEntry bypass
- Manual trades skip quality threshold validation completely
- Logs show 'MANUAL TRADE BYPASS' for transparency
Impact: Telegram commands (long sol, short eth) now execute instantly
without being blocked by low quality scores.
Commit: Dec 4, 2025
- Worker2 (bd-host01) now only runs 19:00-06:00 due to noise
- Added is_worker_allowed_to_run() function for time-based control
- Worker1 continues 24/7 operation
- Reset stuck chunk 14 that was blocking progress since Dec 2
- Create moneyline_1min_price_feed.pinescript (70% smaller payload)
- Remove ATR/ADX/RSI/VOL/POS from 1-minute alerts (not used for decisions)
- Keep only price + symbol + timeframe for market data cache
- Document rationale in docs/1MIN_SIMPLIFIED_FEED.md
- Fix: 5-minute trading signals being dropped due to 1-minute flood (60/hour)
- Impact: Preserve priority for actual trading signals
Bug: Multiple monitoring loops detect ghost simultaneously
- Loop 1: has(tradeId) → true → proceeds
- Loop 2: has(tradeId) → true → ALSO proceeds (race condition)
- Both send Telegram notifications with compounding P&L
Real incident (Dec 2, 2025):
- Manual SHORT at $138.84
- 23 duplicate notifications
- P&L compounded: -$47.96 → -$1,129.24 (23× accumulation)
- Database shows single trade with final compounded value
Fix: Map.delete() returns true if key existed, false if already removed
- Call delete() FIRST
- Check return value
proceeds
- All other loops get false → skip immediately
- Atomic operation prevents race condition
Pattern: This is variant of Common Pitfalls #48, #49, #59, #60, #61
- All had "check then delete" pattern
- All vulnerable to async timing issues
- Solution: "delete then check" pattern
- Map.delete() is synchronous and atomic
Files changed:
- lib/trading/position-manager.ts lines 390-410
Related: DUPLICATE PREVENTED message was working but too late
- Bug: Validation queue used TradingView symbol format (SOLUSDT) to lookup market data cache
- Cache uses normalized Drift format (SOL-PERP)
- Result: Cache lookup failed, wrong/stale price shown in Telegram abandonment notifications
- Real incident: Signal at $126.00 showed $98.18 abandonment price (-22.08% impossible drop)
- Fix: Added normalizeTradingViewSymbol() call in check-risk endpoint before passing to validation queue
- Files changed: app/api/trading/check-risk/route.ts (import + symbol normalization)
- Impact: Validation queue now correctly retrieves current price from market data cache
- Deployed: Dec 1, 2025
- CRITICAL FIX: Python output buffering caused silent failure
- Solution: python3 -u flag for unbuffered output
- 70% CPU optimization: int(cpu_count() * 0.7) = 22-24 cores per server
- Current state: 47 workers, load ~22 per server, 16.3 hour timeline
- System operational since Dec 1 22:50:32
- Expected completion: Dec 2 15:15
- ProxyJump (-J) doesn't work from Docker container
- Changed to nested SSH: hop -> target
- Proper command escaping for nested SSH
- Worker2 (srv-bd-host01) only accessible via worker1 (pve-nu-monitor01)
Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.
Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.
Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
**Comprehensive documentation including:**
- Root cause analysis for both bugs
- Manual test procedures that validated fixes
- Code changes with before/after comparisons
- Verification results (24 worker processes running)
- Lessons learned for future debugging
- Current cluster state and next steps
Files: cluster/SSH_TIMEOUT_FIX_COMPLETE.md (288 lines)
- Created /api/cluster/logs endpoint to read coordinator.log
- Added real-time log display in cluster UI (updates every 3s)
- Shows last 100 lines of coordinator.log in terminal-style display
- Includes manual refresh button
- Improves debugging experience - no need to SSH for logs
User feedback: 'why dont we add the output of the log at the bottom of the page so i know whats going on'
This addresses poor visibility into coordinator errors and failures.
Next step: Fix SSH timeout issue blocking worker execution.
- Split QUALITY_LEVERAGE_THRESHOLD into separate LONG and SHORT variants
- Added /api/drift/account-health endpoint for real-time collateral data
- Updated settings UI to show separate controls for LONG/SHORT thresholds
- Position size calculations now use dynamic collateral from Drift account
- Updated .env and docker-compose.yml with new environment variables
- LONG threshold: 95, SHORT threshold: 90 (configurable independently)
Files changed:
- app/api/drift/account-health/route.ts (NEW) - Account health API endpoint
- app/settings/page.tsx - Added collateral state, separate threshold inputs
- app/api/settings/route.ts - GET/POST handlers for LONG/SHORT thresholds
- .env - Added QUALITY_LEVERAGE_THRESHOLD_LONG/SHORT variables
- docker-compose.yml - Added new env vars with fallback defaults
Impact:
- Users can now configure quality thresholds independently for LONG vs SHORT signals
- Position size display dynamically updates based on actual Drift account collateral
- More flexible risk management with direction-specific leverage tiers
- Changed default chunk_size from 10,000 to 2,000
- Fixes bug where coordinator exited immediately for 4,096 combo exploration
- Coordinator was calculating: chunk 1 starts at 10,000 > 4,096 total = 'all done'
- Now creates 2-3 appropriately-sized chunks for distribution
- Verified: Workers now start and process assigned chunks
- Status: ✅ Docker rebuilt and deployed to port 3001