docs: Add Common Pitfall #73 - Service initialization bug ($1k impact)

Added to copilot-instructions.md Common Pitfalls section:

PITFALL #73: Service Initialization Never Ran (Dec 5, 2025)
- Duration: 16 days (Nov 19 - Dec 5)
- Financial impact: $700-1,400 ($1k user estimate)
- Root cause: Services initialized after a validation step with an early return
- Affected: Stop hunt revenge, smart validation, blocked signal tracker, data cleanup
- Fix: Move services BEFORE validation (commits 51b63f4, f6c9a7b, 35c2d7f)
- Prevention: Test suite, CI/CD, startup health checks, console.log for critical logs
- Full docs: docs/CRITICAL_SERVICE_INITIALIZATION_BUG_DEC5_2025.md
mindesbunister
2025-12-05 19:05:59 +01:00
parent 3f60983b11
commit 5e5a905eee

@@ -2966,7 +2966,7 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent
**Smart Entry:** #63, #66, #68, #70
**Deployment:** #31, #47
📚 **Full Documentation:** `docs/COMMON_PITFALLS.md` (73 pitfalls with code examples, git commits, deployment dates)
72. **CRITICAL: MFE Data Unit Mismatch - ALWAYS Filter by Date (CRITICAL - Dec 5, 2025):**
- **Symptom:** SQL analysis shows "20%+ average MFE" but TP1 (0.6% target) never hits
@@ -3098,6 +3098,85 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent
- **Git commit:** PR #3 on branch `copilot/remove-env-from-git-tracking`
- **Status:** ✅ Fixed - .env removed from tracking, .gitignore updated
73. **CRITICAL: Service Initialization Never Ran - $1,000 Lost (CRITICAL - Dec 5, 2025):**
- **Symptom:** 4 critical services were coded correctly but never started (for up to 16 days, counting from the earliest deployment)
- **Financial Impact:** $700-1,400 in missed opportunities (user estimate: $1,000)
- **Duration:** Nov 19 - Dec 5, 2025 (16 days)
- **Root Cause:** Services initialized AFTER validation function with early return
- **Code Flow (BROKEN):**
```typescript
// lib/startup/init-position-manager.ts
await validateOpenTrades() // Line 43
// validateOpenTrades() returns early if no trades (line 111)
// SERVICE INITIALIZATION (Lines 59-72) - NEVER REACHED
startDataCleanup()
startBlockedSignalTracking()
await startStopHuntTracking()
await startSmartValidation()
```
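- **Why the services were skipped (illustrative sketch):** A `return` inside a called function such as `validateOpenTrades()` only exits that callee, so for the service block to be skipped the startup function itself must have stopped, for example through its own early `return` on an empty trade list (or an exception). The sketch below assumes that shape; `initBroken` and the service name strings are hypothetical, and the calls are synchronous for brevity.
```typescript
// Minimal reproduction of the broken ordering (hypothetical names, synchronous for brevity).
// Assumption: the early return lives in the startup function itself.

type Trade = { id: string }

function initBroken(openTrades: Trade[], started: string[]): void {
  // Validation step that returns early when there is nothing to validate
  if (openTrades.length === 0) {
    return // everything below is skipped on an idle system
  }
  // ... validate each open trade here ...

  // SERVICE INITIALIZATION: unreachable whenever openTrades is empty
  started.push("dataCleanup")
  started.push("blockedSignalTracking")
  started.push("stopHuntTracking")
  started.push("smartValidation")
}

const startedIdle: string[] = []
initBroken([], startedIdle)
console.log(startedIdle.length) // 0: no services started

const startedBusy: string[] = []
initBroken([{ id: "t1" }], startedBusy)
console.log(startedBusy.length) // 4
```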
- **Affected Services:**
1. **Stop Hunt Revenge Tracker** (Nov 20) - Never attempted revenge on quality 85+ stop-outs
2. **Smart Entry Validation** (Nov 30) - Manual Telegram trades used stale data instead of fresh TradingView metrics
3. **Blocked Signal Price Tracker** (Nov 19) - No data collected for threshold optimization
4. **Data Cleanup Service** (Dec 2) - Database bloat, no 28-day retention enforcement
- **Why It Went Undetected:**
* **Silent failure:** No errors thrown, services simply never initialized
* **Logger silencing:** Production logger (`logger.log`) silenced by `NODE_ENV=production`
* **Split logging:** Some logs appeared (from service functions), others didn't (from init function)
* **Common trigger:** Bug only occurred when `openTrades.length === 0` (frequent in production)
- **Financial Breakdown:**
* Stop hunt revenge: $300-600 lost (missed reversal opportunities)
* Smart validation: $200-400 lost (stale data caused bad entries)
* Blocked signals: $200-400 lost (suboptimal quality thresholds)
* Total: $700-1,400 over 16 days
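- **Range check:** The stated total is the sum of the three per-service ranges (the Data Cleanup service carries no dollar estimate):
```latex
\text{low} = 300 + 200 + 200 = 700, \qquad \text{high} = 600 + 400 + 400 = 1{,}400
```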
- **Fix (Dec 5, 2025):**
```typescript
// CORRECT ORDER:
// 1. Start services FIRST (lines 34-50)
startDataCleanup()
startBlockedSignalTracking()
await startStopHuntTracking()
await startSmartValidation()
// 2. THEN validate (line 56) - can return early safely
await validateAllOpenTrades()
await validateOpenTrades() // Early return OK now
// 3. Finally init Position Manager
const manager = await getInitializedPositionManager()
```
- **Logging Fix:** Changed `logger.log()` to `console.log()` for production visibility
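- **Logger behavior (hypothetical sketch):** The silencing described above can be modeled as a logger whose `log()` does nothing when `NODE_ENV` is `production`, next to an always-on channel equivalent to the `console.log()` fix. This is an assumed shape, not the project's actual `logger` utility; `createLogger` and `critical()` are illustrative names, and the environment value is passed in rather than read from `process.env` so the example stays self-contained.
```typescript
// Hypothetical logger shape: debug output gated on the environment,
// critical output always written (equivalent to the console.log() fix).

type Sink = (msg: string) => void

function createLogger(nodeEnv: string | undefined, sink: Sink) {
  const isProduction = nodeEnv === "production"
  return {
    // Suppressed in production: this is how the init messages disappeared
    log(msg: string): void {
      if (!isProduction) sink(msg)
    },
    // Never suppressed: intended for service startup and other critical events
    critical(msg: string): void {
      sink(msg)
    },
  }
}

const lines: string[] = []
const logger = createLogger("production", (m) => lines.push(m))
logger.log("🧹 Starting data cleanup service...") // dropped
logger.critical("🧹 Starting data cleanup service...") // kept
console.log(lines.length) // 1
```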
- **Verification:**
```bash
$ docker logs trading-bot-v4 | grep -E "🧹|🔬|🎯|🧠|📊"
🧹 Starting data cleanup service...
🔬 Starting blocked signal price tracker...
🎯 Starting stop hunt revenge tracker...
📊 No active stop hunts - tracker will start when needed
🧠 Starting smart entry validation system...
```
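- **Automated log check (sketch):** The grep above can be turned into a smoke test that scans captured container logs for each expected startup line. This assumes the log text has already been captured (for example from `docker logs trading-bot-v4`); `findMissingServices` is a hypothetical helper, and the conditional `📊 No active stop hunts` line is deliberately not required.
```typescript
// Expected startup lines, copied from the verification output above.
const EXPECTED_STARTUP_LINES: string[] = [
  "🧹 Starting data cleanup service...",
  "🔬 Starting blocked signal price tracker...",
  "🎯 Starting stop hunt revenge tracker...",
  "🧠 Starting smart entry validation system...",
]

// Returns the expected lines that do not appear anywhere in the captured logs.
function findMissingServices(logText: string): string[] {
  return EXPECTED_STARTUP_LINES.filter((line) => !logText.includes(line))
}

const sample = [
  "🧹 Starting data cleanup service...",
  "🔬 Starting blocked signal price tracker...",
].join("\n")
console.log(findMissingServices(sample).length) // 2
```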
- **Prevention Measures:**
1. **Test suite (PR #2):** 113 tests covering Position Manager - add service initialization tests
2. **CI/CD pipeline (PR #5):** Automated quality gates - add service startup validation
3. **Startup health check:** Verify all expected services initialized, throw error if missing
4. **Production logging standard:** Critical operations use `console.log()`, not `logger.log()`
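- **Health check sketch (measure 3):** A startup health check could be a small registry in which each service marks itself as started and the init routine throws if any expected service is missing. `ServiceRegistry`, `markStarted()`, and `assertAllStarted()` are hypothetical names, not existing project code.
```typescript
// Hypothetical startup health check: fail loudly instead of silently.
class ServiceRegistry {
  private readonly expected: readonly string[]
  private readonly started = new Set<string>()

  constructor(expected: readonly string[]) {
    this.expected = expected
  }

  // Called by each service once its start function has completed
  markStarted(name: string): void {
    this.started.add(name)
  }

  // Throws if any expected service never registered
  assertAllStarted(): void {
    const missing = this.expected.filter((name) => !this.started.has(name))
    if (missing.length > 0) {
      throw new Error(`Services not started: ${missing.join(", ")}`)
    }
  }
}

const registry = new ServiceRegistry([
  "dataCleanup",
  "blockedSignalTracking",
  "stopHuntTracking",
  "smartValidation",
])
registry.markStarted("dataCleanup")

try {
  registry.assertAllStarted()
} catch (err) {
  console.log((err as Error).message)
  // Services not started: blockedSignalTracking, stopHuntTracking, smartValidation
}
```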
- **Lessons Learned:**
* Service initialization order matters - never place critical services after functions with early returns
* Silent failures are dangerous - add explicit verification that services started
* Production logging must be visible - logger utilities that silence logs in production make failures like this very hard to debug
* Test real-world conditions - the bug was triggered by `openTrades.length === 0` and hidden by `NODE_ENV=production` log silencing, so tests must cover both
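- **Regression test sketch:** The last lesson maps onto a test that runs startup under the exact condition that hid the bug: production environment and zero open trades. `runStartup` is a hypothetical stand-in for the real init routine, with its dependencies injected so the test controls the trade list and environment; plain checks are used instead of a specific test framework.
```typescript
// Hypothetical startup routine with injected dependencies for testability.
interface StartupDeps {
  nodeEnv: string
  loadOpenTrades: () => unknown[]
  startService: (name: string) => void
}

const SERVICES = ["dataCleanup", "blockedSignalTracking", "stopHuntTracking", "smartValidation"]

function runStartup(deps: StartupDeps): void {
  for (const name of SERVICES) {
    deps.startService(name) // services first
    // Debug-only output; silenced in production, so it must not be the only signal
    if (deps.nodeEnv !== "production") console.log(`started ${name}`)
  }
  const trades = deps.loadOpenTrades()
  if (trades.length === 0) return // early return is safe once services are running
  // ... validate trades, then initialize the Position Manager ...
}

// Regression test: production environment + zero open trades
function testServicesStartWithNoTrades(): boolean {
  const started: string[] = []
  runStartup({
    nodeEnv: "production",
    loadOpenTrades: () => [],
    startService: (name) => started.push(name),
  })
  return SERVICES.every((name) => started.includes(name))
}

console.log(testServicesStartWithNoTrades()) // true
```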
- **Timeline:**
* Nov 19: Blocked Signal Tracker deployed (never ran)
* Nov 20: Stop Hunt Revenge deployed (never ran)
* Nov 30: Smart Validation deployed (never ran)
* Dec 2: Data Cleanup deployed (never ran)
* Dec 5: Bug discovered and fixed
* Result: **16 days in which newly deployed features never executed in production**
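- **Per-service idle window:** The 16 days run from the first deployment (Nov 19) to the fix (Dec 5); later deployments sat idle for less time (November has 30 days):
```latex
\begin{aligned}
\text{Blocked Signal Tracker:}&\quad (30 - 19) + 5 = 16 \text{ days}\\
\text{Stop Hunt Revenge:}&\quad (30 - 20) + 5 = 15 \text{ days}\\
\text{Smart Validation:}&\quad (30 - 30) + 5 = 5 \text{ days}\\
\text{Data Cleanup:}&\quad 5 - 2 = 3 \text{ days}
\end{aligned}
```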
- **Git commits:** 51b63f4 (service order fix), f6c9a7b (console.log fix), 35c2d7f (stop hunt logs fix)
- **Full documentation:** `docs/CRITICAL_SERVICE_INITIALIZATION_BUG_DEC5_2025.md`
- **Status:** ✅ Fixed - All services now start on every container restart, verified in production logs
## File Conventions
- **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router)