# Bug Reports & Critical Fixes

**Historical record of critical incidents, bugs, and their resolutions.**

This directory contains `CRITICAL_*.md` and `FIXES_*.md` bug reports. Every file documents a real incident that cost money or time, or that could have caused financial loss had it not been caught.

---
## 🚨 Critical Bug Reports
### **Position Management**
- `CRITICAL_FIX_POSITION_SIZE_BUG.md` - Smart Entry using webhook percentage as signal price
  - **Impact:** $89 position sizes instead of $2,300, 97% pullback calculations
  - **Root Cause:** TradingView webhook sent percentage (70.80) not price ($142.50)
  - **Fix:** Use Pyth oracle price instead of webhook `signal.price`
  - **Date:** Dec 3, 2025
- `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md` - Database-first pattern violation
  - **Impact:** Position opened with NO database record, NO Position Manager tracking
  - **Root Cause:** Position Manager added before database save
  - **Fix:** Database save FIRST, then Position Manager add
  - **Date:** Nov 13, 2025
- `CRITICAL_TP1_FALSE_DETECTION_BUG.md` - TP1 detection fails when on-chain orders fill fast
  - **Impact:** Winners marked as "SL" exits, analytics incorrect
  - **Root Cause:** External closure detected after both TP1 + runner closed
  - **Fix:** Percentage-based exit reason inference
  - **Date:** Nov 19, 2025
### **Trade Execution**
- `CRITICAL_MISSING_TRADES_NOV19.md` - Trades executed but not in database
  - **Impact:** 3 trades missing from database, P&L values inflated 5-14×
  - **Root Cause:** P&L compounding in external closure detection
  - **Fix:** Don't mutate `trade.realizedPnL` during calculation
  - **Date:** Nov 19, 2025
- `CRITICAL_ISSUES_FOUND.md` - Multiple critical discovery incidents
  - Container restart killing positions
  - Phantom detection killing runners
  - Field name mismatches in startup validation
  - **Dates:** Various Nov 2025
---
## 🔧 Fix Documentation
### **Runner System**
- `FIXES_RUNNER_AND_CANCELLATION.md` - Runner stop loss gap fixes
  - **Problem:** No SL protection between TP1 and TP2
  - **Solution:** Explicit runner SL check in monitoring loop
  - **Impact:** Prevented unlimited risk exposure on 25-40% runner
  - **Date:** Nov 15, 2025
### **Applied Fixes**
- `FIXES_APPLIED.md` - Collection of multiple bug fixes
  - TP1 detection
  - P&L calculation
  - External closure handling
  - Order cancellation
  - **Period:** Nov 2025
---
## 📋 Common Bug Patterns
### **1. P&L Compounding (Multiple Incidents)**
**Pattern:** Monitoring loop detects closure multiple times, accumulates P&L

**Symptoms:** Database shows 5-20× actual P&L, duplicate Telegram notifications

**Root Cause:** Async operations + monitoring loop = race condition

**Solution:** Delete from Map IMMEDIATELY (atomic operation) before any async work

**Example Fixes:**
- Common Pitfall #49 (Nov 17, 2025)
- Common Pitfall #61 (Nov 22, 2025)
- Common Pitfall #67 (Dec 2, 2025)
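
The "delete first" solution can be sketched roughly as follows. This is a minimal illustration rather than the bot's actual code: `ActiveTrade`, `activeTrades`, `claimForClosure`, and the `save` callback are hypothetical names. It relies on `Map.prototype.delete()` being synchronous and returning `true` only for the caller that actually removed the entry.

```typescript
// Hypothetical trade shape; the real Position Manager type is not shown here.
interface ActiveTrade {
  id: string;
  realizedPnL: number;
}

// Hypothetical in-memory registry of monitored trades.
const activeTrades = new Map<string, ActiveTrade>();

/**
 * Claims a trade for closure handling. Because get() and delete() run
 * synchronously with no await in between, only one caller can win the claim,
 * even if the monitoring loop detects the same closure several times.
 */
function claimForClosure(tradeId: string): ActiveTrade | undefined {
  const trade = activeTrades.get(tradeId);
  if (trade === undefined) return undefined;
  // delete() returns false if the entry was already removed.
  return activeTrades.delete(tradeId) ? trade : undefined;
}

/** Handles a detected closure at most once per trade. */
async function handleExternalClosure(
  tradeId: string,
  closurePnL: number,
  save: (id: string, pnl: number) => Promise<void>, // stand-in for the DB write
): Promise<boolean> {
  const trade = claimForClosure(tradeId);
  if (trade === undefined) return false; // another pass already handled it
  // Compute the final value locally instead of mutating trade.realizedPnL.
  const finalPnL = trade.realizedPnL + closurePnL;
  await save(tradeId, finalPnL);
  return true;
}
```

A second call for the same `tradeId` returns `false` without touching the database, which is what prevents the accumulated P&L and duplicate notifications described above.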
### **2. Database-First Violations**
**Pattern:** In-memory state updated before database write

**Symptoms:** Container restart loses tracking, positions unprotected

**Root Cause:** Developer assumes database always succeeds

**Solution:** Always database write FIRST, then update in-memory state

**Example Fixes:**
- Common Pitfall #29 (Nov 13, 2025)
- CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
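
As a rough sketch of the database-first ordering, assuming a hypothetical `TradeRecord` shape, an in-memory `trackedPositions` map standing in for the Position Manager, and a `saveToDatabase` callback standing in for the real persistence call:

```typescript
// Hypothetical record shape; names are illustrative only.
interface TradeRecord {
  id: string;
  symbol: string;
}

// Stand-in for the Position Manager's in-memory state.
const trackedPositions = new Map<string, TradeRecord>();

/**
 * Persists the trade first and registers it in memory only after the write
 * succeeds. Returns false, and tracks nothing, when the write fails.
 */
async function openWithDatabaseFirst(
  trade: TradeRecord,
  saveToDatabase: (t: TradeRecord) => Promise<void>, // assumed DB call
): Promise<boolean> {
  try {
    await saveToDatabase(trade);
  } catch {
    // DB write failed: stop here so memory never holds state the DB lacks.
    return false;
  }
  trackedPositions.set(trade.id, trade);
  return true;
}
```

With this ordering, a container restart can rebuild in-memory tracking from the database, because nothing is tracked in memory that was not persisted first.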
### **3. External Closure Detection**
**Pattern:** Position closed on-chain, Position Manager detects late

**Symptoms:** Duplicate updates, wrong exit reasons, ghost positions

**Root Cause:** Drift state propagation delay (5-10 seconds)

**Solution:** Verification wait + closingInProgress flag + ghost detection

**Example Fixes:**
- Common Pitfall #47 (Nov 16, 2025) - Close verification gap
- Common Pitfall #56 (Nov 20, 2025) - Ghost orders
- Common Pitfall #57 (Nov 20, 2025) - P&L accuracy
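
The verification wait and `closingInProgress` flag might look something like the sketch below. `MonitoredPosition`, `verifyExternalClosure`, and the `isOpenOnChain` callback are hypothetical; the callback stands in for whatever query the bot uses to ask Drift whether the position still exists. Ghost detection is not covered here.

```typescript
// Hypothetical position shape carrying the guard flag named above.
interface MonitoredPosition {
  symbol: string;
  closingInProgress: boolean;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

/**
 * Waits for on-chain state to propagate, then confirms the closure.
 * - "skipped": another verification is already running for this position
 * - "closed": the chain confirms the position is gone
 * - "still-open": verification failed, so the caller keeps monitoring
 */
async function verifyExternalClosure(
  position: MonitoredPosition,
  isOpenOnChain: (symbol: string) => Promise<boolean>, // assumed Drift query
  waitMs = 5000, // lower end of the 5-10 second propagation delay
): Promise<"closed" | "still-open" | "skipped"> {
  if (position.closingInProgress) return "skipped";
  position.closingInProgress = true; // set synchronously, before any await
  let confirmedClosed = false;
  try {
    await sleep(waitMs);
    confirmedClosed = !(await isOpenOnChain(position.symbol));
  } finally {
    // Release the guard unless the closure was confirmed.
    if (!confirmedClosed) position.closingInProgress = false;
  }
  return confirmedClosed ? "closed" : "still-open";
}
```

The flag is set before the first `await`, so a second monitoring pass that arrives during the wait sees it and returns `"skipped"` rather than producing a duplicate update.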
### **4. Unit Conversion Errors**
**Pattern:** Tokens vs USD, percentage vs decimal, on-chain vs display units

**Symptoms:** Position sizes off by 100×+, TP/SL at wrong prices

**Root Cause:** SDK returns different units than expected

**Solution:** Always log raw values, verify units explicitly

**Example Fixes:**
- Common Pitfall #22 (Nov 12, 2025) - position.size is TOKENS not USD
- Common Pitfall #68 (Dec 3, 2025) - Signal price is percentage not USD
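
Explicit unit checks for the two pitfalls above could be sketched as follows. Both helper names are hypothetical, and the 5% default tolerance is an arbitrary illustration, not a value taken from the bot's configuration. With the figures from the Dec 3 incident, a webhook value of 70.80 checked against an oracle price of $142.50 deviates by roughly 50% and is rejected.

```typescript
/** Converts a size in base-asset tokens to USD notional at the given price. */
function tokensToUsd(sizeTokens: number, priceUsd: number): number {
  if (!Number.isFinite(sizeTokens) || !Number.isFinite(priceUsd) || priceUsd <= 0) {
    throw new Error(`Invalid inputs: size=${sizeTokens}, price=${priceUsd}`);
  }
  // Short positions may carry a negative size; notional is reported unsigned.
  return Math.abs(sizeTokens) * priceUsd;
}

/**
 * Returns true when a candidate value lies within maxDeviation (a fraction,
 * e.g. 0.05 = 5%) of a trusted reference price such as an oracle price.
 */
function isPlausiblePrice(
  candidate: number,
  referencePrice: number,
  maxDeviation = 0.05, // assumed tolerance, for illustration only
): boolean {
  if (!(referencePrice > 0) || !Number.isFinite(candidate)) return false;
  return Math.abs(candidate - referencePrice) / referencePrice <= maxDeviation;
}
```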
---
## 🔍 Debugging Checklist
**When investigating bugs:**
1. **Check logs first:**
```bash
# docker logs replays the container's stderr on stderr, so merge it into the pipe
docker logs -f trading-bot-v4 2>&1 | grep -E "CRITICAL|ERROR|Failed"
```
2. **Verify deployment:**
```bash
# Container start time MUST be > commit timestamp
docker inspect trading-bot-v4 --format='{{.State.StartedAt}}'
git log -1 --format='%ai %s'
```
3. **Query database state:**
```sql
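-- Check for inconsistencies: trades older than one hour with no exit reason
-- (assumes PostgreSQL with camelCase columns, which must be double-quoted)
SELECT * FROM "Trade"
WHERE "exitReason" IS NULL
  AND "createdAt" < NOW() - INTERVAL '1 hour'
ORDER BY "createdAt" DESC;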
```
4. **Compare to Drift:**
```bash
# Get actual position from Drift
curl -X GET http://localhost:3001/api/trading/positions \
-H "Authorization: Bearer $API_SECRET_KEY"
```
5. **Search Common Pitfalls:**
```bash
grep -n "symptom_keyword" .github/copilot-instructions.md | head -20
```
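
The timestamp comparison in step 2 can also be automated. The sketch below assumes the two values are passed in as strings: Docker's `StartedAt` (RFC 3339 with nanosecond precision) and a strict ISO 8601 commit date such as the output of `git log -1 --format=%aI`. The function name `isDeployed` is hypothetical.

```typescript
/**
 * Returns true when the container started after the commit was authored,
 * i.e. the running container can contain that commit.
 */
function isDeployed(containerStartedAt: string, commitDateIso: string): boolean {
  // Docker reports nanoseconds; trim to milliseconds for portable parsing.
  const trimmed = containerStartedAt.replace(/(\.\d{3})\d+/, "$1");
  const started = Date.parse(trimmed);
  const committed = Date.parse(commitDateIso);
  if (Number.isNaN(started) || Number.isNaN(committed)) {
    throw new Error("Unparseable timestamp");
  }
  return started > committed;
}
```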
---
## 📝 Creating Bug Reports
**Required Sections:**
````markdown
# [Bug Title] (CRITICAL - Fixed [Date])

**Symptom:** [What user observed]

**Root Cause:** [Technical explanation]

**Real Incident ([Date]):**
* [Specific trade or event]
* [Expected behavior]
* [Actual behavior]
* [Financial impact]

**Impact:** [Scope of problem]

**Fix ([Date]):**
```typescript
// Code showing the fix
```

**Files changed:** [List of files]

**Git commit:** [Commit hash]

**Deployed:** [Deployment timestamp]

**Verification Required:**
* [How to test fix works]
* [Expected logs/behavior]

**Lessons Learned:**
1. [Key insight #1]
2. [Key insight #2]
````
**Naming Convention:**
- `CRITICAL_[TOPIC]_BUG.md` - Single critical bug
- `CRITICAL_[TOPIC]_[DATE].md` - Dated incident report
- `FIXES_[SYSTEM].md` - Multiple related fixes
---
## ⚠️ Prevention Guidelines
**From Past Incidents:**
1. **Always verify deployment before declaring "fixed"**
   - Container start timestamp must be later than the commit timestamp
   - Test with an actual trade if possible
   - Check logs for the expected behavior change
2. **Never trust SDK data formats**
   - Log raw values first
   - Verify units explicitly (tokens vs USD, % vs decimal)
   - Don't rely on SDK docs; they are often wrong
3. **Database writes before in-memory updates**
   - Save to database FIRST
   - Only then update Position Manager, caches, etc.
   - If DB fails, don't proceed
4. **Async operations need serialization**
   - Delete from Map IMMEDIATELY (atomic lock)
   - Don't check `Map.has()` then delete later (race condition)
   - Use `Map.delete()` return value as lock
5. **External closures need verification**
   - Wait 5-10 seconds for Drift state propagation
   - Query Drift to confirm position actually closed
   - Keep monitoring if verification fails
---
## 📞 Escalation
**When to declare "CRITICAL":**
- Financial loss occurred or was narrowly avoided
- System produced incorrect data (P&L, positions, exits)
- Position left unprotected (no monitoring, no TP/SL)
- Bug could recur and cause future losses

**Response Required:**
1. Stop trading immediately if data integrity affected
2. Document incident in this directory
3. Add to Common Pitfalls in copilot-instructions.md
4. Fix + deploy + verify within 24 hours
5. Update relevant architecture docs
---
See `../README.md` for overall documentation structure.