docs: Add Docker cache bug to Common Pitfalls (Bug #86)
- Documents critical incident where --force-recreate didn't deploy code
- Telegram showed 0.15% instead of 0.3% despite commits and rebuild
- Root cause: Docker cached build layers, only container recreated
- Solution: docker compose build --no-cache trading-bot required
- Adds when to use --no-cache vs --force-recreate guidelines
- Includes verification steps and prevention rules
- 2 hours debugging time, now documented for future reference
105
.github/copilot-instructions.md
vendored
@@ -3229,6 +3229,111 @@ This section contains the **TOP 10 MOST CRITICAL** pitfalls that every AI agent
- **Why Bug #81 Didn't Fix This:** Bug #81 = orders never placed, Bug #82 = orders placed then REMOVED by verifier
- **Status:** ✅ EMERGENCY FIX DEPLOYED Dec 10, 2025 11:06 CET (commit e5714e4)

**12. Docker Cache Prevents Telegram Notification Code Deployment (#86 - CRITICAL - Dec 17, 2025)** - `--force-recreate` ≠ `--no-cache`
- **Symptom:** Container newer than commits, but Telegram shows OLD notification format (0.15% instead of 0.3%)
- **User Report:** "telegram is not fixed" after multiple rebuilds showing old code
- **Financial Impact:** 2 hours of debugging, delayed feature deployment, confusion about system state
- **Real Incident Timeline (Dec 17, 2025):**
  * 13:38:09 - Committed dd9e5bd: Changed Telegram 0.15% → 0.3%
  * 14:00:11 - Committed 6ac2647: Made Telegram thresholds adaptive
  * 14:03:31 - Container rebuilt and restarted with `--force-recreate`
  * 14:40:00 - User receives NEW signal showing **0.15% old format** (pre-dd9e5bd code!)
  * Investigation: Container start time (14:03:31) > commit time (14:00:11) ✅ BUT showing old code ❌
- **Root Cause:**
  * Multi-stage Dockerfile caches `COPY . .` and `RUN npm run build` layers
  * `docker compose up -d --force-recreate` only recreates the CONTAINER; it does not rebuild the IMAGE layers
  * Docker reused a cached build layer from BEFORE the dd9e5bd commit
  * Notification string changes (telegram.ts) didn't trigger cache invalidation
  * Container appeared "new" but contained "old" compiled code
- **Git History Analysis:**
  ```bash
  git show dd9e5bd^  # Showed exact 0.15% format user was seeing
  git show 6ac2647   # Showed current adaptive format in repo
  # Conclusion: Container had code from BEFORE dd9e5bd (oldest version)
  ```
- **THE FIX (Dec 17, 2025 14:37:51 CET):**
  ```bash
  # WRONG: Only recreates container from cached image
  docker compose up -d --force-recreate trading-bot

  # CORRECT: Forces complete image rebuild without cache
  docker compose build --no-cache trading-bot
  docker compose up -d --force-recreate trading-bot

  # Build time: 295.4s (vs ~30s with cache)
  # Result: ALL code freshly compiled, ALL commits deployed ✅
  ```
- **Why `--force-recreate` is Misleading:**
  * Flag name suggests "rebuild everything from scratch"
  * Reality: Only destroys and recreates container from existing image
  * Image layers remain cached from previous builds
  * Common misconception in Docker workflows
- **When to Use `--no-cache`:**
  * Telegram/notification message changes not appearing
  * UI text/string constants showing old values
  * Hardcoded values not updating after code changes
  * Normal rebuild deploys everything EXCEPT your specific change
  * Debugging "why does new container have old code?"
  * Any time Docker layer cache might be stale
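
One hedged way to tell whether a normal rebuild produced a new image at all is to compare the image ID the container runs from before and after the rebuild. The sketch below is illustrative only: the helper names `image_changed` and `running_image_id` are invented for this example, and the container name `trading-bot-v4` is taken from the verification commands in this entry. An unchanged ID means the rebuild changed nothing; a changed ID does not by itself prove that a specific edit made it in.

```bash
# Pure helper: compare two image IDs and report whether the rebuild changed the image.
# $1 = image ID before the rebuild, $2 = image ID after the rebuild
image_changed() {
  if [ "$1" = "$2" ]; then
    echo "unchanged"
  else
    echo "changed"
  fi
}

# Print the ID of the image a container was created from (requires docker).
# $1 = container name
running_image_id() {
  docker inspect --format '{{.Image}}' "$1"
}

# Example session (commented out so that sourcing this file runs nothing):
#   before="$(running_image_id trading-bot-v4)"
#   docker compose build trading-bot
#   docker compose up -d --force-recreate trading-bot
#   after="$(running_image_id trading-bot-v4)"
#   image_changed "$before" "$after"   # "unchanged" suggests --no-cache is needed
```
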
- **When `--force-recreate` is Sufficient** (after a normal cached `docker compose build trading-bot`, since `--force-recreate` itself never builds an image):
  * Configuration file changes (config.ts logic changes)
  * Environment variable updates (.env changes); these are applied when the container is recreated and need no build
  * Database schema migrations (Prisma changes)
  * API route logic changes (usually - depends on what changed)
  * Dependency updates (package.json changes invalidate the cached layers during the build)
- **Verification After Rebuild:**
  ```bash
  # 1. Check container start time
  docker logs trading-bot-v4 | grep "Server starting" | head -1
  # Output: Server starting at 2025-12-17T14:37:51.123Z

  # 2. Check latest commit time
  git log -1 --format='%ai'
  # Output: 2025-12-17 14:00:11 +0100

  # 3. Verify container NEWER than commit
  # (normalize time zones first: "Z" in the log line is UTC, the commit time is +0100)
  # Container 14:37:51 > Commit 14:00:11 ✅

  # 4. Test actual behavior (wait for next signal or test manually)
  # Expected: New format with 0.3%, "confirms"/"against" text
  ```
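
Container start time only says when the container was created, not when its image was built, which is exactly the gap this incident fell into. A hedged complement to step 3 is to compare the image's build timestamp against the latest commit. This is a minimal sketch: the function names are invented for illustration, the image name must be supplied by the caller (for example from `docker compose images`), and `date -d` assumes GNU date. A fully cached build typically keeps the old image and its old timestamp, but treat that as an assumption to confirm on your setup.

```bash
# Pure helper: decide whether an image is at least as new as a commit.
# $1 = image build time (epoch seconds), $2 = commit time (epoch seconds)
image_freshness() {
  if [ "$1" -ge "$2" ]; then
    echo "fresh"
  else
    echo "stale"
  fi
}

# Gather both timestamps and compare them (requires docker, git, GNU date).
# $1 = image name or ID
check_image_vs_commit() {
  local created
  local image_epoch
  local commit_epoch
  created="$(docker image inspect --format '{{.Created}}' "$1")" || return 1
  image_epoch="$(date -d "$created" +%s)" || return 1
  commit_epoch="$(git log -1 --format=%ct)" || return 1
  image_freshness "$image_epoch" "$commit_epoch"
}

# Example (commented out so that sourcing this file runs nothing):
#   check_image_vs_commit "<your-image-name>"
```

Working in epoch seconds sidesteps the UTC versus CET mismatch that makes eyeballing the two timestamps error-prone.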
- **Code Evolution (3 Versions):**
  * **Version 1** (pre-dd9e5bd): Hardcoded 0.15%, old structure
  * **Version 2** (dd9e5bd to 6ac2647^): Hardcoded 0.3%, old structure
  * **Version 3** (6ac2647+): Adaptive display, new 4-line format
  * Container showed Version 1 due to cache, now shows Version 3 ✅
- **Files Affected:**
  * lib/notifications/telegram.ts (lines 118-130)
  * lib/trading/smart-validation-queue.ts (lines 107-109, 120-128)
  * All notification text changes susceptible to this bug
- **Prevention Rules:**
  1. ALWAYS use `--no-cache` when notification/UI text changes
  2. NEVER trust `--force-recreate` alone for code deployment
  3. ALWAYS verify actual behavior after "deployment" (not just container timestamp)
  4. Check specific changed files in running container if possible
  5. Build time 10× longer (30s → 300s) is normal and expected
  6. Add to deployment checklist: "String changes require --no-cache"
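
Rule 4 can be sketched as follows: copy the build output out of the running container with `docker cp` and search it for the old and new literals. This is a rough illustration under stated assumptions: `contains_literal` is a made-up helper, and `/app` is a hypothetical path, since the real location of the compiled output inside the container is not given here.

```bash
# Report whether a fixed string occurs anywhere under a directory.
# $1 = literal string to look for, $2 = directory to search recursively
contains_literal() {
  if grep -rqF -- "$1" "$2" 2>/dev/null; then
    echo "present"
  else
    echo "absent"
  fi
}

# Example (commented out so that sourcing this file runs nothing).
# /app is a HYPOTHETICAL container path; substitute the real build directory.
#   docker cp trading-bot-v4:/app ./container-snapshot
#   contains_literal '0.15%' ./container-snapshot   # "present" means old code is still deployed
#   contains_literal '0.3%'  ./container-snapshot
```

Searching a local copy avoids depending on which tools the runtime image ships with. Minified or bundled output may split or rewrite strings, so an "absent" result is a hint rather than proof.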
- **Red Flags Indicating Docker Cache Bug:**
  * Container start time > commit time, but showing old behavior
  * Code changes in repository don't appear in container
  * Telegram/UI showing old text despite rebuilds
  * User says "it's not fixed" after rebuild
  * git show reveals old code format matches what's running
  * No TypeScript compilation errors, but behavior unchanged
- **Why This Matters:**
  * **This is a REAL MONEY system** - delayed deployments = missed features
  * User confusion: "I rebuilt, why isn't it working?"
  * Wasted debugging time (2 hours investigating non-existent code issues)
  * Misleading system state (appears deployed, actually isn't)
  * Future notification changes will hit same issue without --no-cache
- **Git Commits:**
  * dd9e5bd - "fix: Correct Smart Validation Queue confirmation threshold (0.15% → 0.3%)"
  * 6ac2647 - "feat: Make Smart Validation Queue thresholds adaptive in Telegram notifications"
  * 0310b14 - "fix: Enable BlockedSignalTracker for SMART_VALIDATION_QUEUED signals"
- **Deployment:** Dec 17, 2025 14:37:51 CET (--no-cache rebuild)
- **Status:** ✅ FIXED - All notification code deployed, next signal will show correct format
- **Lesson Learned:** Docker cache optimization (fast builds) can backfire for notification/UI changes. `--force-recreate` is misleadingly named: it only recreates the container, not the image layers. Always use `--no-cache` for string/notification changes. The build time cost (295s vs 30s) is worth it for correct code deployment in a real-money system.

---

**REMOVED FROM TOP 10 (Still documented in full section):**
