CRITICAL BUG FIX: Stop loss and take profit exits were sending duplicate
Telegram notifications with compounding P&L (16 duplicates, 796x inflation).
Real Incident (Dec 2, 2025):
- Manual SOL-PERP SHORT position stopped out
- 16 duplicate Telegram notifications received
- P&L compounding: $0.23 → $12.10 → $24.21 → $183.12 (796× multiplication)
- All showed identical: entry $139.64, hold 4h 5-6m, exit reason SL
- First notification: Ghost detected (handled correctly)
- Next 15 notifications: SL exit (all duplicates with compounding P&L)
Root Cause:
- Multiple monitoring loops detect SL condition simultaneously
- All call executeExit() before any can remove position from tracking
- Race condition: check closingInProgress → both true → both proceed
- Database update happens BEFORE activeTrades.delete()
- Each execution sends Telegram notification
- P&L values compound across notifications
Solution:
Applied same atomic delete pattern as ghost detection fix (commit 93dd950):
- Move activeTrades.delete() to START of executeExit() (before any async operations)
- Check wasInMap return value (only true for first caller, false for duplicates)
- Early return if already deleted (atomic deduplication guard)
- Only first loop proceeds to close, save DB, send notification
- Removed redundant removeTrade() call (already deleted at start)
Impact:
- Prevents duplicate notifications for SL, TP1, TP2, emergency stops
- Ensures accurate P&L reporting (no compounding)
- Database receives correct single exit record
- User receives ONE notification per exit (as intended)
Code Changes:
- Line ~1520: Added atomic delete guard for full closes (percentToClose >= 100)
- Line ~1651: Removed redundant removeTrade() call
- Both changes prevent race condition at function entry
Scope:
- ✅ Stop loss exits: Fixed
- ✅ Take profit 2 exits: Fixed
- ✅ Emergency stops: Fixed
- ✅ Trailing stops: Fixed
- ℹ️ Take profit 1: Not affected (partial close keeps position in monitoring)
Related:
- Ghost detection fix: commit 93dd950 (Dec 2, 2025) - same pattern, different function
- Manual trade enhancement: commit 23277b7 (Dec 2, 2025) - unrelated feature
- P&L compounding series: Common Pitfalls #48-49, #59-61, #67 in docs
PHASE 2 ENHANCED: Manual trades now wait for next 1-minute datapoint
instead of using cached/stale data. Guarantees fresh ATR (<60s old).
User requirement: 'when i send a telegram message to enter the market,
the bot will simply wait for the next 1 minute datapoint'
Implementation:
- Add wait_for_fresh_market_data() async helper function
- Polls market data cache every 5 seconds (max 60s)
- Detects fresh data by timestamp change
- Extracts real ATR/ADX/RSI from 1-minute TradingView data
- User sees waiting message + confirmation when fresh data arrives
- Falls back to preset ATR 0.43 on timeout (fail-safe)
Benefits:
- Adaptive targets match CURRENT volatility (not historical)
- No stale data risk (guaranteed <60s old)
- Better than Phase 2 v1 (5-minute tolerance)
- Consistent with automated trades (same 1-min data source)
User Experience:
1. User: /long sol
2. Bot: ⏳ Waiting for next 1-minute datapoint...
3. [Wait 15-45 seconds typically]
4. Bot: ✅ Fresh ATR: 0.4523 | ADX: 34.2 | RSI: 56.8
5. Bot: ✅ Position opened with adaptive targets
Changes:
- Add asyncio import for async sleep
- Add wait_for_fresh_market_data() before manual_trade_handler
- Replace Phase 2 v1 (5min tolerance) with polling logic
- Add 3 user messages (waiting, confirmation, timeout)
- Extract ATR/ADX/RSI from fresh data or fallback
Files:
- telegram_command_bot.py: +70 lines polling logic
Bug: 23 duplicate Telegram notifications with P&L compounding (-7.96 to -,129.24)
Cause: Multiple monitoring loops passed has() check before any deleted from Map
Fix: Use Map.delete() atomic return value as deduplication lock
Result: First caller deletes and proceeds, subsequent callers return immediately
Related: #48-49 (TP1 P&L compound), #59-61 (external closure duplicates)
Deployed: Dec 2, 2025 17:32:52 UTC (commit 93dd950)
Bug: Multiple monitoring loops detect ghost simultaneously
- Loop 1: has(tradeId) → true → proceeds
- Loop 2: has(tradeId) → true → ALSO proceeds (race condition)
- Both send Telegram notifications with compounding P&L
Real incident (Dec 2, 2025):
- Manual SHORT at $138.84
- 23 duplicate notifications
- P&L compounded: -$47.96 → -$1,129.24 (23× accumulation)
- Database shows single trade with final compounded value
Fix: Map.delete() returns true if key existed, false if already removed
- Call delete() FIRST
- Check return value
proceeds
- All other loops get false → skip immediately
- Atomic operation prevents race condition
Pattern: This is variant of Common Pitfalls #48, #49, #59, #60, #61
- All had "check then delete" pattern
- All vulnerable to async timing issues
- Solution: "delete then check" pattern
- Map.delete() is synchronous and atomic
Files changed:
- lib/trading/position-manager.ts lines 390-410
Related: DUPLICATE PREVENTED message was working but too late
CRITICAL UPDATES to AI assistant instructions:
1. MANDATORY GIT WORKFLOW (DO NOT SKIP):
- Added explicit requirement: implement → test → verify → document → commit → push
- Made git commits NON-OPTIONAL for all significant changes
- Added to both general prompt and copilot-instructions.md
- Rationale: Agent has pattern of skipping documentation/commits
2. CHALLENGE USER IDEAS:
- Added requirement to think critically about user requests
- Instruction: "Think freely and don't hold back"
- Goal: Find BEST solution, not just A solution
- Push back on ideas that don't make sense
- Ask "is there a simpler/faster/safer way?"
3. COMPREHENSIVE DOCUMENTATION SECTION:
- Replaced brief documentation note with full workflow guide
- Added 80+ lines of detailed documentation requirements
- Includes examples, red flags, mindset principles
- Emphasizes: "Git commit + Documentation = Complete work"
Files modified:
- .github/prompts/general prompt.prompt.md (added sections 5a, 6, updated 7-8)
- .github/copilot-instructions.md (comprehensive documentation workflow)
User mandate: "I am sick and tired of reminding you" - this makes it automatic.
Impact: Future implementations will ALWAYS include documentation and git commits as part of standard workflow, not as afterthoughts.
- Bug: Validation queue used TradingView symbol format (SOLUSDT) to lookup market data cache
- Cache uses normalized Drift format (SOL-PERP)
- Result: Cache lookup failed, wrong/stale price shown in Telegram abandonment notifications
- Real incident: Signal at $126.00 showed $98.18 abandonment price (-22.08% impossible drop)
- Fix: Added normalizeTradingViewSymbol() call in check-risk endpoint before passing to validation queue
- Files changed: app/api/trading/check-risk/route.ts (import + symbol normalization)
- Impact: Validation queue now correctly retrieves current price from market data cache
- Deployed: Dec 1, 2025
- CRITICAL FIX: Python output buffering caused silent failure
- Solution: python3 -u flag for unbuffered output
- 70% CPU optimization: int(cpu_count() * 0.7) = 22-24 cores per server
- Current state: 47 workers, load ~22 per server, 16.3 hour timeline
- System operational since Dec 1 22:50:32
- Expected completion: Dec 2 15:15
- ProxyJump (-J) doesn't work from Docker container
- Changed to nested SSH: hop -> target
- Proper command escaping for nested SSH
- Worker2 (srv-bd-host01) only accessible via worker1 (pve-nu-monitor01)
Root cause: Passing dict {'min_adx': 15, 'min_volume_ratio': vol_min} when
simulate_money_line() expects callable function.
Bug caused ALL 2,096 backtests to fail with 'dict' object is not callable.
Fix: Changed to lambda function matching comprehensive_sweep.py pattern:
quality_filter = lambda s: s.adx >= 15 and s.volume_ratio >= vol_min
Verified fix working: Workers running at 100% CPU, no errors after 2+ minutes.
**Comprehensive documentation including:**
- Root cause analysis for both bugs
- Manual test procedures that validated fixes
- Code changes with before/after comparisons
- Verification results (24 worker processes running)
- Lessons learned for future debugging
- Current cluster state and next steps
Files: cluster/SSH_TIMEOUT_FIX_COMPLETE.md (288 lines)
- Created /api/cluster/logs endpoint to read coordinator.log
- Added real-time log display in cluster UI (updates every 3s)
- Shows last 100 lines of coordinator.log in terminal-style display
- Includes manual refresh button
- Improves debugging experience - no need to SSH for logs
User feedback: 'why dont we add the output of the log at the bottom of the page so i know whats going on'
This addresses poor visibility into coordinator errors and failures.
Next step: Fix SSH timeout issue blocking worker execution.
USER MANDATE (Dec 1, 2025): Documentation MUST go hand-in-hand with EVERY git commit.
This is NOT optional. This is NOT a suggestion. This is MANDATORY.
Changes:
- Elevated documentation section to #1 PRIORITY status
- Added user's direct quote: 'this HAS to go hand in hand'
- Expanded from 15 lines to 100+ lines with comprehensive guidelines
- Added 'Why This is #1 Priority' section with user's frustration quote
- Added explicit 'When Documentation is MANDATORY' checklist
- Added 'The Correct Mindset' section emphasizing it's part of the work
- Added 4 scenario examples showing what MUST be documented
- Added 'Red Flags' section to catch missing documentation
- Added 'Integration with Existing Sections' guide
- Made it crystal clear: Code without documentation = INCOMPLETE WORK
This addresses user's repeated reminders about documentation being mandatory.
Future AI agents will now see this as the #1 priority it is.
NO MORE PUSHING CODE WITHOUT DOCUMENTATION UPDATES.
- Split QUALITY_LEVERAGE_THRESHOLD into separate LONG and SHORT variants
- Added /api/drift/account-health endpoint for real-time collateral data
- Updated settings UI to show separate controls for LONG/SHORT thresholds
- Position size calculations now use dynamic collateral from Drift account
- Updated .env and docker-compose.yml with new environment variables
- LONG threshold: 95, SHORT threshold: 90 (configurable independently)
Files changed:
- app/api/drift/account-health/route.ts (NEW) - Account health API endpoint
- app/settings/page.tsx - Added collateral state, separate threshold inputs
- app/api/settings/route.ts - GET/POST handlers for LONG/SHORT thresholds
- .env - Added QUALITY_LEVERAGE_THRESHOLD_LONG/SHORT variables
- docker-compose.yml - Added new env vars with fallback defaults
Impact:
- Users can now configure quality thresholds independently for LONG vs SHORT signals
- Position size display dynamically updates based on actual Drift account collateral
- More flexible risk management with direction-specific leverage tiers
Added 4 adaptive leverage environment variables to docker-compose.yml
so they are properly passed to the container:
- USE_ADAPTIVE_LEVERAGE (default: true)
- HIGH_QUALITY_LEVERAGE (default: 5)
- LOW_QUALITY_LEVERAGE (default: 1)
- QUALITY_LEVERAGE_THRESHOLD (default: 95)
Without these in the environment section, the container couldn't
access them via process.env, causing the settings API to return null.
Now the settings UI can properly load and save adaptive leverage
configuration via the web interface.
Complete implementation of adaptive leverage configuration via web interface:
Frontend (app/settings/page.tsx):
- Added 4 fields to TradingSettings interface:
* USE_ADAPTIVE_LEVERAGE: boolean
* HIGH_QUALITY_LEVERAGE: number
* LOW_QUALITY_LEVERAGE: number
* QUALITY_LEVERAGE_THRESHOLD: number
- Added complete Adaptive Leverage section with:
* Purple-themed informational box explaining quality-based leverage
* Toggle switch for enabling/disabling (🎯 Enable Adaptive Leverage)
* Number inputs for high leverage (1-20), low leverage (1-20), threshold (80-100)
* Visual tier display showing leverage multipliers and position sizes
* Dynamic calculation based on $560 free collateral
Backend (app/api/settings/route.ts):
- GET handler: Load 4 adaptive leverage fields from environment variables
- POST handler: Save 4 adaptive leverage fields to .env file
- Proper type conversion (boolean from 'true', numbers from parseInt/parseFloat)
Visual Tier Display Example:
Below Threshold: Blocked (no trade)
Changes enable users to adjust leverage settings via web UI instead of
manually editing .env file and restarting container.
Problem:
- Start button showed 'already running' when cluster wasn't actually running
- Database had stale chunks in 'running' state from crashed/killed coordinator
- Control endpoint checked process but not database state
Solution:
1. Reset stale 'running' chunks to 'pending' before starting coordinator
2. Verify coordinator not running before starting (prevent duplicates)
3. Add database cleanup to stop action as well (prevent future stale states)
4. Enhanced error reporting with coordinator log output
Changes:
- app/api/cluster/control/route.ts
- Added database cleanup in start action (reset running chunks)
- Added process check before start (prevent duplicates)
- Added database cleanup in stop action (cleanup orphaned state)
- Added coordinator log output on start failure
- Improved error messages and logging
Impact:
- Start button now works correctly even after unclean coordinator shutdown
- Prevents false 'already running' reports
- Automatic cleanup of stale database state
- Better error diagnostics
Verified:
- Container rebuilt and restarted successfully
- Cluster status shows 'idle' after database cleanup
- Ready for user to test start button functionality
- Created lib/trading/smart-validation-queue.ts (270 lines)
- Queue marginal quality signals (50-89) for validation
- Monitor 1-minute price action for 10 minutes
- Enter if +0.3% confirms direction (LONG up, SHORT down)
- Abandon if -0.4% invalidates direction
- Auto-execute via /api/trading/execute when confirmed
- Integrated into check-risk endpoint (queues blocked signals)
- Integrated into startup initialization (boots with container)
- Expected: Catch ~30% of blocked winners, filter ~70% of losers
- Estimated profit recovery: +$1,823/month
Files changed:
- lib/trading/smart-validation-queue.ts (NEW - 270 lines)
- app/api/trading/check-risk/route.ts (import + queue call)
- lib/startup/init-position-manager.ts (import + startup call)
User approval: 'sounds like we can not loose anymore with this system. go for it'
CRITICAL BUG FIXED (Nov 30, 2025):
Position Manager was setting tp1Hit=true based ONLY on size mismatch,
without verifying price actually reached TP1 target. This caused:
- Premature order cancellation (on-chain TP1 removed before fill)
- Lost profit potential (optimal exits missed)
- Ghost orders after container restarts
ROOT CAUSE (line 1086 in position-manager.ts):
trade.tp1Hit = true // Set without checking this.shouldTakeProfit1()
FIX IMPLEMENTED:
- Added price verification: this.shouldTakeProfit1(currentPrice, trade)
- Only set tp1Hit when BOTH conditions met:
1. Size reduced by 5%+ (positionSizeUSD < trade.currentSize * 0.95)
2. Price crossed TP1 target (this.shouldTakeProfit1 returns true)
- Verbose logging for debugging (shows price vs target, size ratio)
- Fallback: Update tracked size but don't trigger TP1 logic
REAL INCIDENT:
- Trade cmim4ggkr00canv07pgve2to9 (SHORT SOL-PERP Nov 30)
- TP1 target: $137.07, actual exit: $136.84
- False detection triggered premature order cancellation
- Position closed successfully but system integrity compromised
FILES CHANGED:
- lib/trading/position-manager.ts (lines 1082-1111)
- CRITICAL_TP1_FALSE_DETECTION_BUG.md (comprehensive incident report)
TESTING REQUIRED:
- Monitor next trade with TP1 for correct detection
- Verify logs show TP1 VERIFIED or TP1 price NOT reached
- Confirm no premature order cancellation
ALSO FIXED:
- Restarted telegram-trade-bot to fix /status command conflict
See: Common Pitfall #63 in copilot-instructions.md (to be added)
- Document database-first architecture pattern
- Include problem, root cause, and solution details
- Add verification methodology with before/after examples
- Document cluster control system (Start/Stop buttons)
- Include database schema and operational state
- Add lessons learned about infrastructure vs business logic
- Reference STATUS_DETECTION_FIX_COMPLETE.md for full details
- Current state: 2 workers active, processing 4000 combinations
- Changed default chunk_size from 10,000 to 2,000
- Fixes bug where coordinator exited immediately for 4,096 combo exploration
- Coordinator was calculating: chunk 1 starts at 10,000 > 4,096 total = 'all done'
- Now creates 2-3 appropriately-sized chunks for distribution
- Verified: Workers now start and process assigned chunks
- Status: ✅ Docker rebuilt and deployed to port 3001
- Removed v10 TradingView indicator (moneyline_v10_momentum_dots.pinescript)
- Removed v10 penalty system from signal-quality.ts (-30/-25 point penalties)
- Removed backtest result files (sweep_*.csv)
- Updated copilot-instructions.md to remove v10 references
- Simplified direction-specific quality thresholds (LONG 90+, SHORT 80+)
Rationale:
- 1,944 parameter combinations tested in backtest
- All top results IDENTICAL (568 trades, $498 P&L, 61.09% WR)
- Momentum parameters had ZERO impact on trade selection
- Profit factor 1.027 too low (barely profitable after fees)
- Max drawdown -$1,270 vs +$498 profit = terrible risk-reward
- v10 penalties were blocking good trades (bug: applied to wrong positions)
Keeping v9 as production system - simpler, proven, effective.
- Bug: Execute endpoint calculated quality but never validated it
- Three trades executed at quality 30/50/50 (threshold: 90/95)
- All three stopped out, confirming low quality = losing trades
- Root cause: TradingView sent incomplete data (metrics=0, old v5) + missing validation after timeframe check
- Fix: Added validation block lines 193-213 in execute/route.ts
- Returns HTTP 400 if quality < minQualityScore
- Deployed: Nov 27, 2025 23:16 UTC (commit cefa3e6)
- Lesson: Calculate ≠ Validate - minQualityScore must be enforced at ALL execution pathways
This documents the CRITICAL FIX from commit cefa3e6.
Per Nov 27 mandatory documentation rules, work is INCOMPLETE without copilot-instructions.md updates.
ROOT CAUSE:
- Execute endpoint calculated quality score but NEVER checked it
- After timeframe='5' validation, proceeded directly to execution
- TradingView sent signal with all metrics=0 (ADX, ATR, RSI, etc.)
- Quality scored as 30, but no threshold check existed
- Position opened with 909.77 size at quality 30 (need 90+ for LONG)
THE FIX:
- Added MANDATORY quality check after timeframe validation
- Blocks execution if score < minQualityScore (90 LONG, 95 SHORT)
- Returns HTTP 400 with detailed error message
- Logs Quality check passed OR ❌ QUALITY TOO LOW:
AFFECTED TRADES:
- cmihwkjmb0088m407lqd8mmbb: Quality 30 LONG (stopped out)
- cmih6ghn20002ql07zxfvna1l: Quality 50 LONG (stopped out)
- cmih5vrpu0001ql076mj3nm63: Quality 50 LONG (stopped out)
This is a FINANCIAL SAFETY critical fix - prevents low-quality trades.
CRITICAL: Added iron-clad rule that copilot-instructions.md MUST be updated
for every significant change. User is 'sick and tired' of reminding.
New mandatory section explains:
- When to update this file (8 specific scenarios)
- Why it's the primary knowledge base for future developers
- Automatic workflow: Change → Code → Test → Update Docs → Commit
1-Minute Data Collection documented:
- Direction field is meaningless (TradingView artifact)
- Analysis should ignore direction for timeframe='1'
- Focus on ADX/ATR/RSI/volume/price position metrics
- Example correct vs wrong SQL queries
This is NON-NEGOTIABLE going forward.