Bug Reports & Critical Fixes
Historical record of critical incidents, bugs, and their resolutions.
This directory contains CRITICAL_*.md and FIXES_*.md bug reports. Every file documents a real incident that cost money or time, or that could have caused financial loss if not caught.
🚨 Critical Bug Reports
Position Management
- CRITICAL_FIX_POSITION_SIZE_BUG.md - Smart Entry using webhook percentage as signal price
  - Impact: $89 position sizes instead of $2,300, 97% pullback calculations
  - Root Cause: TradingView webhook sent percentage (70.80) not price ($142.50)
  - Fix: Use Pyth oracle price instead of webhook signal.price
  - Date: Dec 3, 2025
- CRITICAL_INCIDENT_UNPROTECTED_POSITION.md - Database-first pattern violation
  - Impact: Position opened with NO database record, NO Position Manager tracking
  - Root Cause: Position Manager added before database save
  - Fix: Database save FIRST, then Position Manager add
  - Date: Nov 13, 2025
- CRITICAL_TP1_FALSE_DETECTION_BUG.md - TP1 detection fails when on-chain orders fill fast
  - Impact: Winners marked as "SL" exits, analytics incorrect
  - Root Cause: External closure detected after both TP1 + runner closed
  - Fix: Percentage-based exit reason inference
  - Date: Nov 19, 2025
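
The percentage-based inference from the TP1 fix could look roughly like the sketch below. It is not the code from the fix itself: the names `pnlPercent`, `inferExitReason`, and `ExitThresholds`, the threshold fields, and the default tolerance are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch: infer the exit reason from the realized move in percent
// when on-chain orders filled before the monitoring loop observed them.
type ExitReason = "TP1" | "TP2" | "SL" | "UNKNOWN";

interface ExitThresholds {
  tp1Percent: number; // positive, e.g. 0.8 means +0.8% from entry
  tp2Percent: number; // positive and greater than tp1Percent
  slPercent: number;  // negative, e.g. -1.5 means -1.5% from entry
}

// Signed move in percent, positive when the trade went in our favor.
function pnlPercent(entry: number, exit: number, direction: "long" | "short"): number {
  const raw = ((exit - entry) / entry) * 100;
  return direction === "long" ? raw : -raw;
}

// Classify by the largest threshold reached; `tolerance` absorbs slippage.
function inferExitReason(pct: number, t: ExitThresholds, tolerance = 0.1): ExitReason {
  if (pct >= t.tp2Percent - tolerance) return "TP2";
  if (pct >= t.tp1Percent - tolerance) return "TP1";
  if (pct <= t.slPercent + tolerance) return "SL";
  return "UNKNOWN";
}
```

With this shape, a profitable exit can never be labeled "SL" merely because the closure was detected late.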
Trade Execution
- CRITICAL_MISSING_TRADES_NOV19.md - Trades executed but not in database
  - Impact: 3 trades missing from database, P&L values inflated 5-14×
  - Root Cause: P&L compounding in external closure detection
  - Fix: Don't mutate trade.realizedPnL during calculation
  - Date: Nov 19, 2025
- CRITICAL_ISSUES_FOUND.md - Multiple critical discovery incidents
  - Container restart killing positions
  - Phantom detection killing runners
  - Field name mismatches in startup validation
  - Dates: Various Nov 2025
🔧 Fix Documentation
Runner System
FIXES_RUNNER_AND_CANCELLATION.md - Runner stop loss gap fixes
- Problem: No SL protection between TP1 and TP2
- Solution: Explicit runner SL check in monitoring loop
- Impact: Prevented unlimited risk exposure on 25-40% runner
- Date: Nov 15, 2025
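
An explicit runner SL check of the kind described above might be sketched like this. The `RunnerPosition` shape and the function name `runnerStopHit` are assumptions for illustration, not the actual Position Manager types.

```typescript
// Hypothetical shape of a tracked position; field names are assumptions.
interface RunnerPosition {
  direction: "long" | "short";
  tp1Hit: boolean;         // true once TP1 has filled and only the runner remains
  runnerStopPrice: number; // stop price protecting the runner
}

// Returns true when the monitoring loop should close the runner at this price.
function runnerStopHit(pos: RunnerPosition, currentPrice: number): boolean {
  if (!pos.tp1Hit) return false; // before TP1 the regular SL applies
  return pos.direction === "long"
    ? currentPrice <= pos.runnerStopPrice
    : currentPrice >= pos.runnerStopPrice;
}
```

Calling a check like this on every monitoring tick closes the gap between TP1 and TP2, because the runner no longer depends on an on-chain order being present.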
Applied Fixes
FIXES_APPLIED.md - Collection of multiple bug fixes
- TP1 detection
- P&L calculation
- External closure handling
- Order cancellation
- Period: Nov 2025
📋 Common Bug Patterns
1. P&L Compounding (Multiple Incidents)
- Pattern: Monitoring loop detects closure multiple times, accumulates P&L
- Symptoms: Database shows 5-20× actual P&L, duplicate Telegram notifications
- Root Cause: Async operations + monitoring loop = race condition
- Solution: Delete from Map IMMEDIATELY (atomic operation) before any async work
Example Fixes:
- Common Pitfall #49 (Nov 17, 2025)
- Common Pitfall #61 (Nov 22, 2025)
- Common Pitfall #67 (Dec 2, 2025)
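
A minimal sketch of the "delete from Map first" solution, assuming a hypothetical `activeTrades` registry and `handleClosure` function (the real Position Manager may be structured differently). It relies on the fact that `Map.prototype.delete()` returns `true` only for the call that actually removed the key.

```typescript
// Hypothetical in-memory registry; names are assumptions for illustration.
interface ActiveTrade {
  id: string;
  realizedPnL: number;
}

const activeTrades = new Map<string, ActiveTrade>();

// Returns true only for the single caller that won the right to process the closure.
async function handleClosure(
  tradeId: string,
  saveExit: (trade: ActiveTrade) => Promise<void>,
): Promise<boolean> {
  const trade = activeTrades.get(tradeId);
  // No await happens between get() and delete(), so two passes of the
  // monitoring loop cannot both get past this line for the same trade.
  if (!trade || !activeTrades.delete(tradeId)) return false;
  await saveExit(trade); // async work starts only after the key is gone
  return true;
}
```

If two loop iterations detect the same closure, the second one finds the key already removed and returns without touching P&L or sending a notification.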
2. Database-First Violations
- Pattern: In-memory state updated before database write
- Symptoms: Container restart loses tracking, positions unprotected
- Root Cause: Developer assumes database always succeeds
- Solution: Always database write FIRST, then update in-memory state
Example Fixes:
- Common Pitfall #29 (Nov 13, 2025)
- CRITICAL_INCIDENT_UNPROTECTED_POSITION.md
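
The database-first ordering can be sketched as below. `TradeStore`, `Tracker`, and `registerPosition` are hypothetical stand-ins for the persistence layer and the Position Manager, not their real interfaces.

```typescript
// Hypothetical interfaces; the real persistence layer and Position Manager may differ.
interface TradeRecord {
  id: string;
  symbol: string;
  sizeUsd: number;
}
interface TradeStore {
  save(trade: TradeRecord): Promise<void>;
}
interface Tracker {
  add(trade: TradeRecord): void;
}

async function registerPosition(trade: TradeRecord, db: TradeStore, tracker: Tracker): Promise<void> {
  // 1. Persist first. If this throws, nothing below runs and the caller
  //    has to deal with the open position explicitly.
  await db.save(trade);
  // 2. Only after the write succeeded, start in-memory tracking.
  tracker.add(trade);
}
```

Because the write is awaited before `tracker.add()`, a restart can always rebuild tracking from the database, and a failed write surfaces as an error instead of an untracked position.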
3. External Closure Detection
- Pattern: Position closed on-chain, Position Manager detects late
- Symptoms: Duplicate updates, wrong exit reasons, ghost positions
- Root Cause: Drift state propagation delay (5-10 seconds)
- Solution: Verification wait + closingInProgress flag + ghost detection
Example Fixes:
- Common Pitfall #47 (Nov 16, 2025) - Close verification gap
- Common Pitfall #56 (Nov 20, 2025) - Ghost orders
- Common Pitfall #57 (Nov 20, 2025) - P&L accuracy
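
The verification wait combined with the closingInProgress flag might be sketched as follows. Only the flag name comes from this document; `Monitored`, `confirmExternalClosure`, and the `fetchOnChainSize` callback (a stand-in for whatever call queries Drift) are assumptions.

```typescript
// Hypothetical sketch; `fetchOnChainSize` stands in for the real Drift query.
interface Monitored {
  symbol: string;
  closingInProgress: boolean;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function confirmExternalClosure(
  pos: Monitored,
  fetchOnChainSize: (symbol: string) => Promise<number>,
  waitMs = 5000, // lower bound of the 5-10 second propagation delay
): Promise<boolean> {
  if (pos.closingInProgress) return false; // another pass is already verifying
  pos.closingInProgress = true;
  let closed = false;
  try {
    await sleep(waitMs); // give on-chain state time to propagate
    closed = (await fetchOnChainSize(pos.symbol)) === 0;
    return closed;
  } finally {
    // Not confirmed (or the query failed): release the flag and keep monitoring.
    // On success the flag stays set while the caller records the exit.
    if (!closed) pos.closingInProgress = false;
  }
}
```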
4. Unit Conversion Errors
- Pattern: Tokens vs USD, percentage vs decimal, on-chain vs display units
- Symptoms: Position sizes off by 100×+, TP/SL at wrong prices
- Root Cause: SDK returns different units than expected
- Solution: Always log raw values, verify units explicitly
Example Fixes:
- Common Pitfall #22 (Nov 12, 2025) - position.size is TOKENS not USD
- Common Pitfall #68 (Dec 3, 2025) - Signal price is percentage not USD
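
Explicit unit checks could look like the sketch below. The helper names and the 20% default deviation are illustrative assumptions. The plausibility guard is an extra safety net, not the documented fix for Pitfall #68 (which was to use the Pyth oracle price).

```typescript
// Hypothetical helpers; names and the default tolerance are assumptions.

// position.size arrives in TOKENS; convert explicitly before comparing to USD limits.
function tokensToUsd(sizeTokens: number, priceUsd: number): number {
  return sizeTokens * priceUsd;
}

// Reject a signal "price" that is implausibly far from a trusted reference price,
// e.g. a percentage such as 70.80 arriving where a price near 142.50 is expected.
function isPlausiblePrice(signalPrice: number, referencePrice: number, maxDeviation = 0.2): boolean {
  if (!(signalPrice > 0) || !(referencePrice > 0)) return false; // also rejects NaN
  return Math.abs(signalPrice - referencePrice) / referencePrice <= maxDeviation;
}
```

Using the values from the Dec 3 incident, `isPlausiblePrice(70.80, 142.50)` returns `false`, so the percentage would have been rejected before it reached position sizing.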
🔍 Debugging Checklist
When investigating bugs:
- Check logs first:
docker logs -f trading-bot-v4 | grep -E "CRITICAL|ERROR|Failed"
- Verify deployment:
# Container start time MUST be > commit timestamp
docker inspect trading-bot-v4 --format='{{.State.StartedAt}}'
git log -1 --format='%ai %s'
- Query database state:
-- Check for inconsistencies
SELECT * FROM "Trade"
WHERE "exitReason" IS NULL
AND "createdAt" < NOW() - INTERVAL '1 hour'
ORDER BY "createdAt" DESC;
- Compare to Drift:
# Get actual position from Drift
curl -X GET http://localhost:3001/api/trading/positions \
-H "Authorization: Bearer $API_SECRET_KEY"
- Search Common Pitfalls:
grep -n "symptom_keyword" .github/copilot-instructions.md | head -20
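
The "Verify deployment" step compares two timestamps in different formats, which is easy to misread by eye. A small sketch of that comparison, assuming the two strings are the raw outputs of the `docker inspect` and `git log` commands above; the function names are hypothetical.

```typescript
// Docker prints RFC 3339 with nanoseconds; trim to milliseconds before parsing.
function parseDockerTime(s: string): number {
  return Date.parse(s.trim().replace(/(\.\d{3})\d+/, "$1"));
}

// `git log --format='%ai %s'` starts with "YYYY-MM-DD HH:MM:SS +ZZZZ";
// rewrite that prefix to strict ISO 8601 and ignore the commit subject.
function parseGitTime(s: string): number {
  const m = s.trim().match(/^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) ([+-]\d{2})(\d{2})/);
  if (!m) return NaN;
  return Date.parse(`${m[1]}T${m[2]}${m[3]}:${m[4]}`);
}

// True only when both timestamps parse and the container started after the commit.
function isDeployed(containerStartedAt: string, commitLine: string): boolean {
  const started = parseDockerTime(containerStartedAt);
  const committed = parseGitTime(commitLine);
  return Number.isFinite(started) && Number.isFinite(committed) && started > committed;
}
```

Both values are converted to epoch milliseconds, so differing time zones in the two outputs do not affect the result.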
📝 Creating Bug Reports
Required Sections:
# [Bug Title] (CRITICAL - Fixed [Date])
**Symptom:** [What user observed]
**Root Cause:** [Technical explanation]
**Real Incident ([Date]):**
* [Specific trade or event]
* [Expected behavior]
* [Actual behavior]
* [Financial impact]
**Impact:** [Scope of problem]
**Fix ([Date]):**
```typescript
// Code showing the fix
```
**Files changed:** [List of files]
**Git commit:** [Commit hash]
**Deployed:** [Deployment timestamp]
**Verification Required:**
- [How to test fix works]
- [Expected logs/behavior]
**Lessons Learned:**
- [Key insight #1]
- [Key insight #2]
**Naming Convention:**
- `CRITICAL_[TOPIC]_BUG.md` - Single critical bug
- `CRITICAL_[TOPIC]_[DATE].md` - Dated incident report
- `FIXES_[SYSTEM].md` - Multiple related fixes
---
## ⚠️ Prevention Guidelines
**From Past Incidents:**
1. **Always verify deployment before declaring "fixed"**
- Container restart timestamp > commit timestamp
- Test with actual trade if possible
- Check logs for expected behavior change
2. **Never trust SDK data formats**
- Log raw values first
- Verify units explicitly (tokens vs USD, % vs decimal)
- Cross-check SDK docs against observed values; the docs are often wrong
3. **Database writes before in-memory updates**
- Save to database FIRST
- Only then update Position Manager, caches, etc.
- If DB fails, don't proceed
4. **Async operations need serialization**
- Delete from Map IMMEDIATELY (atomic lock)
- Don't check Map.has() then delete later (race condition)
- Use Map.delete() return value as lock
5. **External closures need verification**
- Wait 5-10 seconds for Drift state propagation
- Query Drift to confirm position actually closed
- Keep monitoring if verification fails
---
## 📞 Escalation
**When to declare "CRITICAL":**
- Financial loss occurred or was narrowly avoided
- System produced incorrect data (P&L, positions, exits)
- Position left unprotected (no monitoring, no TP/SL)
- Bug could recur and cause future losses
**Response Required:**
1. Stop trading immediately if data integrity is affected
2. Document incident in this directory
3. Add to Common Pitfalls in copilot-instructions.md
4. Fix + deploy + verify within 24 hours
5. Update relevant architecture docs
---
See `../README.md` for overall documentation structure.