trading_bot_v4/docs/bugs/README.md


Bug Reports & Critical Fixes

Historical record of critical incidents, bugs, and their resolutions.

This directory contains CRITICAL_*.md and FIXES_*.md bug reports. Every file documents a real incident that cost money or time, or that could have caused financial loss had it not been caught.


🚨 Critical Bug Reports

Position Management

  • CRITICAL_FIX_POSITION_SIZE_BUG.md - Smart Entry using webhook percentage as signal price

    • Impact: $89 position sizes instead of $2,300, 97% pullback calculations
    • Root Cause: TradingView webhook sent percentage (70.80) not price ($142.50)
    • Fix: Use Pyth oracle price instead of webhook signal.price
    • Date: Dec 3, 2025
  • CRITICAL_INCIDENT_UNPROTECTED_POSITION.md - Database-first pattern violation

    • Impact: Position opened with NO database record, NO Position Manager tracking
    • Root Cause: Position Manager added before database save
    • Fix: Database save FIRST, then Position Manager add
    • Date: Nov 13, 2025
  • CRITICAL_TP1_FALSE_DETECTION_BUG.md - TP1 detection fails when on-chain orders fill fast

    • Impact: Winners marked as "SL" exits, analytics incorrect
    • Root Cause: External closure detected after both TP1 + runner closed
    • Fix: Percentage-based exit reason inference
    • Date: Nov 19, 2025
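
The "percentage-based exit reason inference" fix for the TP1 bug could look roughly like the sketch below. It is only an illustration: the type names, the `manual` fallback, and the threshold fields are assumptions, not the bot's actual code or configuration.

```typescript
// Hypothetical types; the real bot may use different names and reasons.
type TradeSide = "long" | "short";
type ExitReason = "TP1" | "TP2" | "SL" | "manual";

interface ExitTargets {
  tp1Pct: number; // take-profit 1 distance in percent (assumed field)
  tp2Pct: number; // take-profit 2 distance in percent (assumed field)
  slPct: number;  // stop-loss distance in percent, as a positive number (assumed field)
}

// Infer the exit reason from the realized price move instead of from
// which on-chain order was observed last.
function inferExitReason(
  side: TradeSide,
  entryPrice: number,
  exitPrice: number,
  targets: ExitTargets,
): ExitReason {
  const rawMovePct = ((exitPrice - entryPrice) / entryPrice) * 100;
  const profitPct = side === "long" ? rawMovePct : -rawMovePct;

  if (profitPct >= targets.tp2Pct) return "TP2";
  if (profitPct >= targets.tp1Pct) return "TP1";
  if (profitPct <= -targets.slPct) return "SL";
  return "manual"; // closed between the levels: not attributable to TP or SL
}
```

With this approach a profitable exit can no longer be labelled "SL" just because the closure was detected late.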

Trade Execution

  • CRITICAL_MISSING_TRADES_NOV19.md - Trades executed but not in database

    • Impact: 3 trades missing from database, P&L values inflated 5-14×
    • Root Cause: P&L compounding in external closure detection
    • Fix: Don't mutate trade.realizedPnL during calculation
    • Date: Nov 19, 2025
  • CRITICAL_ISSUES_FOUND.md - Multiple critical discovery incidents

    • Container restart killing positions
    • Phantom detection killing runners
    • Field name mismatches in startup validation
    • Dates: Various Nov 2025

🔧 Fix Documentation

Runner System

  • FIXES_RUNNER_AND_CANCELLATION.md - Runner stop loss gap fixes
    • Problem: No SL protection between TP1 and TP2
    • Solution: Explicit runner SL check in monitoring loop
    • Impact: Prevented unlimited risk exposure on 25-40% runner
    • Date: Nov 15, 2025
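
As a rough illustration of an explicit runner SL check, the monitoring loop could call something like the function below on every price update. The `Runner` shape and function name are hypothetical and only show the direction-aware comparison.

```typescript
// Hypothetical runner state; field names are assumptions for illustration.
type RunnerDirection = "long" | "short";

interface Runner {
  direction: RunnerDirection;
  stopLossPrice: number;
}

// Returns true when the current price has reached or crossed the runner's stop loss.
// Longs are stopped when price falls to the stop; shorts when price rises to it.
function runnerStopLossHit(runner: Runner, currentPrice: number): boolean {
  return runner.direction === "long"
    ? currentPrice <= runner.stopLossPrice
    : currentPrice >= runner.stopLossPrice;
}
```

Running this check in the loop itself means the runner stays protected even if no on-chain stop order exists between TP1 and TP2.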

Applied Fixes

  • FIXES_APPLIED.md - Collection of multiple bug fixes
    • TP1 detection
    • P&L calculation
    • External closure handling
    • Order cancellation
    • Period: Nov 2025

📋 Common Bug Patterns

1. P&L Compounding (Multiple Incidents)

  • Pattern: Monitoring loop detects closure multiple times, accumulates P&L
  • Symptoms: Database shows 5-20× actual P&L, duplicate Telegram notifications
  • Root Cause: Async operations + monitoring loop = race condition
  • Solution: Delete from Map IMMEDIATELY (atomic operation) before any async work

Example Fixes:

  • Common Pitfall #49 (Nov 17, 2025)
  • Common Pitfall #61 (Nov 22, 2025)
  • Common Pitfall #67 (Dec 2, 2025)
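
A minimal sketch of the "delete first" idea, assuming active trades are kept in a `Map` keyed by trade ID. The names `activeTrades`, `claimClosure`, and `saveExit` are hypothetical. Because `Map.prototype.delete()` returns `true` only for the call that actually removed the entry, it can serve as a one-shot lock.

```typescript
// Hypothetical in-memory registry of monitored trades.
interface ActiveTrade {
  id: string;
  realizedPnL: number;
}

const activeTrades = new Map<string, ActiveTrade>();

// Claims the right to process a closure. Only the first caller gets the trade;
// every later caller gets undefined because the entry is already gone.
function claimClosure(tradeId: string): ActiveTrade | undefined {
  const trade = activeTrades.get(tradeId);
  if (trade === undefined) return undefined;
  // Synchronous get + delete: no await in between, so no other tick can interleave.
  return activeTrades.delete(tradeId) ? trade : undefined;
}

async function handleExternalClosure(
  tradeId: string,
  closurePnL: number,
  saveExit: (id: string, pnl: number) => Promise<void>, // hypothetical DB writer
): Promise<boolean> {
  const trade = claimClosure(tradeId);
  if (trade === undefined) return false; // another monitoring tick already handled it
  await saveExit(trade.id, closurePnL);  // async work happens only after the claim
  return true;
}
```

The anti-pattern is checking `Map.has()`, awaiting something, and deleting afterwards: a second monitoring tick can pass the `has()` check during the await and record the P&L again.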

2. Database-First Violations

  • Pattern: In-memory state updated before database write
  • Symptoms: Container restart loses tracking, positions unprotected
  • Root Cause: Developer assumes database always succeeds
  • Solution: Always database write FIRST, then update in-memory state

Example Fixes:

  • Common Pitfall #29 (Nov 13, 2025)
  • CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
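
The ordering can be sketched as follows. This is an assumption-laden illustration: `saveToDatabase` stands in for whatever persistence call the bot uses, and the `tracked` map stands in for Position Manager state.

```typescript
// Hypothetical position shape for illustration only.
interface OpenPosition {
  id: string;
  symbol: string;
  sizeUsd: number;
}

// Persist first; touch in-memory state only after the write has succeeded.
async function openPositionDatabaseFirst(
  position: OpenPosition,
  saveToDatabase: (p: OpenPosition) => Promise<void>, // hypothetical persistence call
  tracked: Map<string, OpenPosition>,                 // stands in for Position Manager
): Promise<boolean> {
  try {
    await saveToDatabase(position);
  } catch (err) {
    // The failure is surfaced instead of silently tracking an unrecorded position.
    console.error(`CRITICAL: database save failed for ${position.id}`, err);
    return false;
  }
  tracked.set(position.id, position);
  return true;
}
```

How the caller reacts to a `false` result (for example, closing the position or alerting) is deliberately left out; the point is only that in-memory tracking never exists without a database record.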

3. External Closure Detection

  • Pattern: Position closed on-chain, Position Manager detects late
  • Symptoms: Duplicate updates, wrong exit reasons, ghost positions
  • Root Cause: Drift state propagation delay (5-10 seconds)
  • Solution: Verification wait + closingInProgress flag + ghost detection

Example Fixes:

  • Common Pitfall #47 (Nov 16, 2025) - Close verification gap
  • Common Pitfall #56 (Nov 20, 2025) - Ghost orders
  • Common Pitfall #57 (Nov 20, 2025) - P&L accuracy
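
A sketch of the verification wait combined with a `closingInProgress` flag. The `isOpenOnDrift` callback is a hypothetical stand-in for the real Drift query, and the 10-second default simply reflects the upper end of the propagation delay mentioned above.

```typescript
// Hypothetical monitored-position shape; only the flag matters here.
interface MonitoredPosition {
  id: string;
  closingInProgress: boolean;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Returns true only when the closure is confirmed after the wait.
// On a confirmed closure the flag stays set so no second handler can start;
// otherwise it is cleared and normal monitoring continues.
async function confirmExternalClosure(
  position: MonitoredPosition,
  isOpenOnDrift: (id: string) => Promise<boolean>, // hypothetical Drift query
  waitMs: number = 10_000,
): Promise<boolean> {
  if (position.closingInProgress) return false; // a verification is already running
  position.closingInProgress = true;
  let confirmed = false;
  try {
    await sleep(waitMs); // give Drift state time to propagate
    confirmed = !(await isOpenOnDrift(position.id));
    return confirmed;
  } finally {
    if (!confirmed) position.closingInProgress = false;
  }
}
```

The flag is set synchronously before the first `await`, so overlapping monitoring ticks cannot both enter the verification path.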

4. Unit Conversion Errors

  • Pattern: Tokens vs USD, percentage vs decimal, on-chain vs display units
  • Symptoms: Position sizes off by 100×+, TP/SL at wrong prices
  • Root Cause: SDK returns different units than expected
  • Solution: Always log raw values, verify units explicitly

Example Fixes:

  • Common Pitfall #22 (Nov 12, 2025) - position.size is TOKENS not USD
  • Common Pitfall #68 (Dec 3, 2025) - Signal price is percentage not USD
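
Two small helpers illustrate what "verify units explicitly" can mean in practice. Both are hypothetical and not taken from the codebase: one makes the tokens-to-USD conversion explicit, the other flags a signal price that is implausibly far from a trusted reference such as the oracle price. The 5% tolerance is an arbitrary example value.

```typescript
// Convert a position size in TOKENS to a USD notional value.
function tokensToUsd(sizeTokens: number, priceUsd: number): number {
  return sizeTokens * priceUsd;
}

// Returns true when the candidate price lies within `tolerance`
// (as a fraction, e.g. 0.05 = 5%) of the reference price.
function isPlausiblePrice(
  candidatePrice: number,
  referencePrice: number,
  tolerance: number = 0.05,
): boolean {
  if (referencePrice <= 0) return false; // no usable reference
  const deviation = Math.abs(candidatePrice - referencePrice) / referencePrice;
  return deviation <= tolerance;
}

// Example from the Dec 3 incident: the webhook value 70.80 was a percentage,
// while the actual price was around $142.50.
const looksLikePrice = isPlausiblePrice(70.8, 142.5);
console.log(`raw signal value=70.8 reference=142.5 plausible=${looksLikePrice}`);
```

A check like this does not replace the actual fix (using the oracle price), but it turns a silent unit mismatch into a visible log line.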

🔍 Debugging Checklist

When investigating bugs:

  1. Check logs first:
       docker logs -f trading-bot-v4 2>&1 | grep -E "CRITICAL|ERROR|Failed"
  2. Verify deployment:
       # Container start time MUST be > commit timestamp
       docker inspect trading-bot-v4 --format='{{.State.StartedAt}}'
       git log -1 --format='%ai %s'
  3. Query database state:
       -- Check for inconsistencies (camelCase columns must be quoted in PostgreSQL)
       SELECT * FROM "Trade"
       WHERE "exitReason" IS NULL
         AND "createdAt" < NOW() - INTERVAL '1 hour'
       ORDER BY "createdAt" DESC;
  4. Compare to Drift:
       # Get actual position from Drift
       curl -X GET http://localhost:3001/api/trading/positions \
         -H "Authorization: Bearer $API_SECRET_KEY"
  5. Search Common Pitfalls:
       grep -n "symptom_keyword" .github/copilot-instructions.md | head -20
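
The deployment check in step 2 compares two timestamps by eye. As a sketch, the comparison could be automated as below. It assumes the commit time is printed in strict ISO 8601 form (`git log -1 --format=%aI`) and that Docker reports `StartedAt` with nanosecond precision, which is trimmed to milliseconds before parsing. The function names are hypothetical.

```typescript
// Parse an ISO 8601 timestamp, trimming sub-millisecond digits
// (Docker's StartedAt carries nanoseconds) before handing it to Date.parse.
function toEpochMs(timestamp: string): number {
  const trimmed = timestamp.replace(/(\.\d{3})\d+/, "$1");
  const ms = Date.parse(trimmed);
  if (Number.isNaN(ms)) {
    throw new Error(`Unparseable timestamp: ${timestamp}`);
  }
  return ms;
}

// True when the container was started after the commit was authored.
function isFixDeployed(containerStartedAt: string, commitDate: string): boolean {
  return toEpochMs(containerStartedAt) > toEpochMs(commitDate);
}
```

A later start time is a necessary condition, not proof: it does not show that the running image was actually built from that commit.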

📝 Creating Bug Reports

Required Sections:

# [Bug Title] (CRITICAL - Fixed [Date])

**Symptom:** [What user observed]

**Root Cause:** [Technical explanation]

**Real Incident ([Date]):**
* [Specific trade or event]
* [Expected behavior]
* [Actual behavior]
* [Financial impact]

**Impact:** [Scope of problem]

**Fix ([Date]):**
```typescript
// Code showing the fix
```

**Files changed:** [List of files]

**Git commit:** [Commit hash]

**Deployed:** [Deployment timestamp]

**Verification Required:**
* [How to test fix works]
* [Expected logs/behavior]

**Lessons Learned:**
1. [Key insight #1]
2. [Key insight #2]

**Naming Convention:**
- `CRITICAL_[TOPIC]_BUG.md` - Single critical bug
- `CRITICAL_[TOPIC]_[DATE].md` - Dated incident report
- `FIXES_[SYSTEM].md` - Multiple related fixes

---

## ⚠️ Prevention Guidelines

**From Past Incidents:**

1. **Always verify deployment before declaring "fixed"**
   - Container restart timestamp > commit timestamp
   - Test with actual trade if possible
   - Check logs for expected behavior change

2. **Never trust SDK data formats**
   - Log raw values first
   - Verify units explicitly (tokens vs USD, % vs decimal)
   - Cross-check SDK docs against observed values; the docs are often wrong

3. **Database writes before in-memory updates**
   - Save to database FIRST
   - Only then update Position Manager, caches, etc.
   - If DB fails, don't proceed

4. **Async operations need serialization**
   - Delete from Map IMMEDIATELY (atomic lock)
   - Don't check Map.has() then delete later (race condition)
   - Use Map.delete() return value as lock

5. **External closures need verification**
   - Wait 5-10 seconds for Drift state propagation
   - Query Drift to confirm position actually closed
   - Keep monitoring if verification fails

---

## 📞 Escalation

**When to declare "CRITICAL":**
- Financial loss occurred or was narrowly avoided
- System produced incorrect data (P&L, positions, exits)
- Position left unprotected (no monitoring, no TP/SL)
- Bug could recur and cause future losses

**Response Required:**
1. Stop trading immediately if data integrity affected
2. Document incident in this directory
3. Add to Common Pitfalls in copilot-instructions.md
4. Fix + deploy + verify within 24 hours
5. Update relevant architecture docs

---

See `../README.md` for overall documentation structure.