# AI Agent Instructions for Trading Bot v4

## Mission & Financial Goals

**Primary Objective:** Build wealth systematically from $106 → $100,000+ through algorithmic trading

**Current Phase:** Phase 1 - Survival & Proof (Nov 2025 - Jan 2026)
- **Current Capital:** $97.55 USDC (zero debt, 100% health)
- **Starting Capital:** $106 (Nov 2025)
- **Target:** $2,500 by end of Phase 1 (Month 2.5)
- **Strategy:** Aggressive compounding, 0 withdrawals
- **Position Sizing:** 100% of free collateral (~$97 at 15x leverage = ~$1,463 notional)
- **Risk Tolerance:** EXTREME - This is recovery/proof-of-concept mode
- **Win Target:** 20-30% monthly returns to reach $2,500
- **Trades Executed:** 161 (as of Nov 12, 2025)

**Why This Matters for AI Agents:**
- Every dollar counts at this stage - optimize for profitability, not just safety
- User needs this system to work for long-term financial goals ($300-500/month withdrawals starting Month 3)
- No changes that reduce win rate unless they improve profit factor
- System must prove itself before scaling (see `TRADING_GOALS.md` for full 8-phase roadmap)

**Key Constraints:**
- Can't afford extended drawdowns (limited capital)
- Must maintain 60%+ win rate to compound effectively
- Quality over quantity - only trade 60+ signal quality scores (lowered from 65 on Nov 12, 2025)
- After 3 consecutive losses, STOP and review system

## Architecture Overview

**Type:** Autonomous cryptocurrency trading bot with Next.js 15 frontend + Solana/Drift Protocol backend

**Data Flow:** TradingView → n8n webhook → Next.js API → Drift Protocol (Solana DEX) → Real-time monitoring → Auto-exit

**CRITICAL: RPC Provider Choice**
- **MUST use Alchemy RPC** (https://solana-mainnet.g.alchemy.com/v2/YOUR_API_KEY)
- **DO NOT use Helius free tier** - causes catastrophic rate limiting (239 errors in 10 minutes)
- Helius free: 10 req/sec sustained = TOO LOW for trade execution + Position Manager monitoring
- Alchemy free: 300M compute units/month = adequate for bot operations
- **Symptom if wrong RPC:** Trades hit SL immediately, duplicate closes, Position Manager loses tracking, database save failures
- **Fixed Nov 14, 2025:** Switched to Alchemy; system now works as designed (TP1/TP2/runner all functioning)

**Key Design Principle:** Dual-layer redundancy - every trade has both on-chain orders (Drift) AND software monitoring (Position Manager) as backup.

**Exit Strategy:** TP2-as-Runner system (CURRENT):
- TP1 at +0.4%: Close configurable % (default 75%, adjustable via `TAKE_PROFIT_1_SIZE_PERCENT`)
- TP2 at +0.7%: **Activates trailing stop** on full remaining % (no position close)
- Runner: Remaining % after TP1 with ATR-based trailing stop (default 25%, configurable)
- **Note:** All UI displays dynamically calculate runner% as `100 - TAKE_PROFIT_1_SIZE_PERCENT`

**Per-Symbol Configuration:** SOL and ETH have independent enable/disable toggles and position sizing:
- `SOLANA_ENABLED`, `SOLANA_POSITION_SIZE`, `SOLANA_LEVERAGE` (defaults: true, 100%, 15x)
- `ETHEREUM_ENABLED`, `ETHEREUM_POSITION_SIZE`, `ETHEREUM_LEVERAGE` (defaults: true, 100%, 1x)
- BTC and other symbols fall back to global settings (`MAX_POSITION_SIZE_USD`, `LEVERAGE`)
- **Priority:** Per-symbol ENV → Market config → Global ENV → Defaults

**Signal Quality System:** Filters trades based on 5 metrics (ATR, ADX, RSI, volumeRatio, pricePosition) scored 0-100. Only trades scoring 60+ are executed (lowered from 65 after data analysis showed the 60-64 tier outperformed higher scores). Scores are stored in the database for future optimization.
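The **Priority** rule under Per-Symbol Configuration is a first-defined-wins lookup across layers. The sketch below illustrates that order for leverage and the enable toggle only. `resolveSymbolSettings`, the `MarketConfig` shape and the prefix map are hypothetical names for illustration, not the project's actual `getPositionSizeForSymbol()` implementation.

```typescript
// Hypothetical sketch of the documented precedence:
// per-symbol ENV → market config → global ENV → defaults.
type Env = Record<string, string | undefined>

interface MarketConfig {
  leverage?: number // Assumed shape of a per-market config entry
}

interface SymbolSettings {
  leverage: number
  enabled: boolean
}

// Only SOL and ETH have per-symbol ENV variables; other symbols fall through.
const ENV_PREFIX: Record<string, string> = {
  'SOL-PERP': 'SOLANA',
  'ETH-PERP': 'ETHEREUM',
}

function parseNumber(raw: string | undefined): number | undefined {
  if (raw === undefined || raw.trim() === '') return undefined
  const value = Number(raw)
  return Number.isFinite(value) ? value : undefined
}

function parseBoolean(raw: string | undefined): boolean | undefined {
  if (raw === undefined) return undefined
  const normalized = raw.trim().toLowerCase()
  if (normalized === 'true') return true
  if (normalized === 'false') return false
  return undefined // Unrecognized values fall through to the next layer
}

// Return the first candidate that is not undefined, in priority order.
function firstDefined<T>(...candidates: (T | undefined)[]): T | undefined {
  return candidates.find((c) => c !== undefined)
}

function resolveSymbolSettings(
  symbol: string,
  env: Env,
  market: MarketConfig | undefined,
  defaults: SymbolSettings
): SymbolSettings {
  const prefix = ENV_PREFIX[symbol]
  const leverage =
    firstDefined(
      prefix ? parseNumber(env[`${prefix}_LEVERAGE`]) : undefined, // 1. per-symbol ENV
      market?.leverage, // 2. market config
      parseNumber(env['LEVERAGE']) // 3. global ENV
    ) ?? defaults.leverage // 4. default
  // No global ENV toggle is documented for enable/disable, so it is per-symbol ENV or default.
  const enabled =
    (prefix ? parseBoolean(env[`${prefix}_ENABLED`]) : undefined) ?? defaults.enabled
  return { leverage, enabled }
}
```

Parsing each layer separately means a malformed value such as `SOLANA_LEVERAGE=abc` falls through to the next layer instead of becoming `NaN`. Whether the real config loader behaves this way is not stated in this document.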
**Timeframe-Aware Scoring:** Signal quality thresholds adjust based on timeframe (5min vs daily):
- 5min: ADX 12+ trending (vs 18+ for daily), ATR 0.2-0.7% healthy (vs 0.4%+ for daily)
- Anti-chop filter: -20 points for extreme sideways (ADX <10, ATR <0.25%, Vol <0.9x)
- Pass `timeframe` param to `scoreSignalQuality()` from TradingView alerts (e.g., `timeframe: "5"`)

**MAE/MFE Tracking:** Every trade tracks Maximum Favorable Excursion (best profit %) and Maximum Adverse Excursion (worst loss %), updated every 2s. Used for data-driven optimization of TP/SL levels.

**Manual Trading via Telegram:** Send plain-text messages like `long sol`, `short eth`, `long btc` to open positions instantly (bypasses n8n, calls `/api/trading/execute` directly with preset healthy metrics). **CRITICAL:** Manual trades are marked with `signalSource='manual'` and excluded from TradingView indicator analysis (prevents data contamination).

**Re-Entry Analytics System:** Manual trades are validated before execution using fresh TradingView data:
- Market data cached from TradingView signals (5min expiry)
- `/api/analytics/reentry-check` scores re-entry based on fresh metrics + recent performance
- Telegram bot blocks low-quality re-entries unless `--force` flag used
- Uses real TradingView ADX/ATR/RSI when available, falls back to historical data
- Penalty for recent losing trades, bonus for winning streaks

## VERIFICATION MANDATE: Financial Code Requires Proof

**CRITICAL: THIS IS A REAL MONEY TRADING SYSTEM - NOT A TOY PROJECT**

**Core Principle:** In trading systems, "working" means "verified with real data", NOT "code looks correct".

**NEVER declare something working without:**
1. Observing actual logs showing expected behavior
2. Verifying database state matches expectations
3. Comparing calculated values to source data
4. Testing with real trades when applicable
5. **CONFIRMING CODE IS DEPLOYED** - Check container start time vs commit time

**CODE COMMITTED ≠ CODE DEPLOYED**
- Git commit at 15:56 means NOTHING if container started at 15:06
- ALWAYS verify the most recent start: `docker logs trading-bot-v4 | grep "Server starting" | tail -1` (logs survive `docker restart`, so `head -1` would show the oldest start)
- Compare container start time to commit timestamp
- If container older than commit: **CODE NOT DEPLOYED, FIX NOT ACTIVE**
- Never say "fixed" or "protected" until deployment verified

### Critical Path Verification Requirements

**Position Manager Changes:**
- [ ] Execute test trade with DRY_RUN=false (small size)
- [ ] Watch docker logs for full TP1 → TP2 → exit cycle
- [ ] SQL query: verify `tp1Hit`, `slMovedToBreakeven`, `currentSize` match Position Manager logs
- [ ] Compare Position Manager tracked size to actual Drift position size
- [ ] Check exit reason matches actual trigger (TP1/TP2/SL/trailing)

**Exit Logic Changes (TP/SL/Trailing):**
- [ ] Log EXPECTED values (TP1 price, SL price after breakeven, trailing stop distance)
- [ ] Log ACTUAL values from Drift position and Position Manager state
- [ ] Verify: Does TP1 hit when price crosses TP1? Does SL move to breakeven?
- [ ] Test: Open position, let it hit TP1, verify the TP1 size closed (default 75%) + SL moved
- [ ] Document: What SHOULD happen vs what ACTUALLY happened

**API Endpoint Changes:**
- [ ] curl test with real payload from TradingView/n8n
- [ ] Check response JSON matches expectations
- [ ] Verify database record created with correct fields
- [ ] Check Telegram notification shows correct values (leverage, size, etc.)
- [ ] SQL query: confirm all fields populated correctly

**Calculation Changes (P&L, Position Sizing, Percentages):**
- [ ] Add console.log for EVERY step of calculation
- [ ] Verify units match (tokens vs USD, percent vs decimal, etc.)
- [ ] SQL query with manual calculation: does code result match hand calculation?
- [ ] Test edge cases: 0%, 100%, negative values, very small/large numbers

**SDK/External Data Integration:**
- [ ] Log raw SDK response to verify assumptions about data format
- [ ] NEVER trust documentation - verify with console.log
- [ ] Example: position.size doc said "USD" but logs showed "tokens"
- [ ] Document actual behavior in Common Pitfalls section

### Red Flags Requiring Extra Verification

**High-Risk Changes:**
- Unit conversions (tokens ↔ USD, percent ↔ decimal)
- State transitions (TP1 hit → move SL to breakeven)
- Configuration precedence (per-symbol vs global vs defaults)
- Display values from complex calculations (leverage, size, P&L)
- Timing-dependent logic (grace periods, cooldowns, race conditions)

**Verification Steps for Each:**
1. **Before declaring working**: Show proof (logs, SQL results, test output)
2. **After deployment**: Monitor first real trade closely, verify behavior
3. **Edge cases**: Test boundary conditions (0, 100%, max leverage, min size)
4. **Regression**: Check that fix didn't break other functionality

### SQL Verification Queries

**After Position Manager changes:**
```sql
-- Prisma's camelCase columns must be double-quoted in PostgreSQL
-- Verify TP1 detection worked correctly
SELECT symbol, "entryPrice", "currentSize", "realizedPnL",
       "tp1Hit", "slMovedToBreakeven", "exitReason",
       TO_CHAR("createdAt", 'MM-DD HH24:MI') AS time
FROM "Trade"
WHERE "exitReason" IS NULL                     -- Open positions
   OR "createdAt" > NOW() - INTERVAL '1 hour'  -- Recent closes
ORDER BY "createdAt" DESC
LIMIT 5;

-- Compare Position Manager state to expectations
SELECT "configSnapshot"->'positionManagerState' AS pm_state
FROM "Trade"
WHERE symbol = 'SOL-PERP' AND "exitReason" IS NULL;
```

**After calculation changes:**
```sql
-- Verify P&L calculations
SELECT symbol, direction, "entryPrice", "exitPrice", "positionSize", "realizedPnL",
       -- Manual calculation:
       CASE
         WHEN direction = 'long' THEN "positionSize" * (("exitPrice" - "entryPrice") / "entryPrice")
         ELSE "positionSize" * (("entryPrice" - "exitPrice") / "entryPrice")
       END AS expected_pnl,
       -- Difference:
       "realizedPnL" - CASE
         WHEN direction = 'long' THEN "positionSize" * (("exitPrice" - "entryPrice") / "entryPrice")
         ELSE "positionSize" * (("entryPrice" - "exitPrice") / "entryPrice")
       END AS pnl_difference
FROM "Trade"
WHERE "exitReason" IS NOT NULL
  AND "createdAt" > NOW() - INTERVAL '24 hours'
ORDER BY "createdAt" DESC
LIMIT 10;
```

### Example: How Position.size Bug Should Have Been Caught

**What went wrong:**
- Read code: "Looks like it's comparing sizes correctly"
- Declared: "Position Manager is working!"
- Didn't verify with actual trade

**What should have been done:**
```typescript
// In Position Manager monitoring loop - ADD THIS LOGGING:
console.log('🔍 VERIFICATION:', {
  positionSizeRaw: position.size,                 // What SDK returns
  positionSizeUSD: position.size * currentPrice,  // Converted to USD
  trackedSizeUSD: trade.currentSize,              // What we're tracking
  ratio: (position.size * currentPrice) / trade.currentSize,
  tp1ShouldTrigger: (position.size * currentPrice) < trade.currentSize * 0.95
})
```

Then observe logs on actual trade:
```
🔍 VERIFICATION: {
  positionSizeRaw: 12.28,    // ← AH! This is SOL tokens, not USD!
  positionSizeUSD: 1950.84,  // ← Correct USD value
  trackedSizeUSD: 1950.00,
  ratio: 1.0004,             // ← Should be near 1.0 when position full
  tp1ShouldTrigger: false    // ← Correct
}
```

**Lesson:** One console.log would have exposed the bug immediately.
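The one-off logging above can also live on as a small, testable check. This is a sketch under stated assumptions: `checkTrackedSize` and `SizeCheck` are hypothetical names, the input is Drift's `position.size` in base-asset tokens (per the Order Placement notes), and the 0.95 partial-close threshold is borrowed from the logging example.

```typescript
// Hypothetical helper (not part of the codebase): convert Drift's base-asset
// position size to USD and compare it with the size the Position Manager tracks.
interface SizeCheck {
  positionSizeUSD: number   // On-chain size converted to USD notional
  ratio: number             // On-chain USD / tracked USD (≈1.0 while position is full)
  tp1ShouldTrigger: boolean // True once the on-chain size drops below the threshold
}

function checkTrackedSize(
  positionSizeTokens: number,  // Drift position.size: base-asset tokens, NOT USD
  currentPrice: number,        // USD per token
  trackedSizeUSD: number,      // Position Manager's tracked notional (USD)
  partialCloseThreshold = 0.95 // Same 0.95 factor as the logging example
): SizeCheck {
  if (!(trackedSizeUSD > 0)) {
    throw new Error(`trackedSizeUSD must be positive, got ${trackedSizeUSD}`)
  }
  // Compare magnitudes, as the Math.abs(position.size) snippet does
  const positionSizeUSD = Math.abs(positionSizeTokens) * currentPrice
  const ratio = positionSizeUSD / trackedSizeUSD
  return { positionSizeUSD, ratio, tp1ShouldTrigger: ratio < partialCloseThreshold }
}

// Full position: 10 SOL at $150 against $1,500 tracked → ratio 1, no trigger
console.log(checkTrackedSize(10, 150, 1500))
// The original bug: treating tokens as USD (price = 1) looks like a partial close every time
console.log(checkTrackedSize(10, 1, 1500).tp1ShouldTrigger) // true
```

Passing the raw token count where USD is expected collapses the ratio toward 0, which is exactly the silent misdetection the verification log exposed.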
### Deployment Checklist

**MANDATORY PRE-DEPLOYMENT VERIFICATION:**
- [ ] Check container start time: `docker logs trading-bot-v4 | grep "Server starting" | tail -1`
- [ ] Compare to commit timestamp: Container MUST be newer than code changes
- [ ] If container older: **STOP - Code not deployed, fix not active**
- [ ] Never declare "fixed" or "working" until container restarted with new code

Before marking feature complete:
- [ ] Code review completed
- [ ] Unit tests pass (if applicable)
- [ ] Integration test with real API calls
- [ ] Logs show expected behavior
- [ ] Database state verified with SQL
- [ ] Edge cases tested
- [ ] **Container restarted and verified running new code**
- [ ] Documentation updated (including Common Pitfalls if applicable)
- [ ] User notified of what to verify during first real trade

### When to Escalate to User

**Don't say "it's working" if:**
- You haven't observed actual logs showing the expected behavior
- SQL query shows unexpected values
- Test trade behaved differently than expected
- You're unsure about unit conversions or SDK behavior
- Change affects money (position sizing, P&L, exits)
- **Container hasn't been restarted since code commit**

**Instead say:**
- "Code is updated. Need to verify with test trade - watch for [specific log message]"
- "Fixed, but requires verification: check database shows [expected value]"
- "Deployed. First real trade should show [behavior]. If not, there's still a bug."
- **"Code committed but NOT deployed - container running old version, fix not active yet"**

### Docker Build Best Practices

**CRITICAL: Prevent build interruptions with background execution + live monitoring**

Docker builds take 40-70 seconds and are easily interrupted by terminal issues.
Use this pattern:
```bash
# Start build in background (nohup survives terminal disconnects), then follow the log
cd /home/icke/traderv4 && nohup docker compose build trading-bot > /tmp/docker-build-live.log 2>&1 &
BUILD_PID=$!; echo "Build started, PID: $BUILD_PID"
tail -f /tmp/docker-build-live.log
```

**Why this works:**
- Build runs in the background under `nohup` (`&`) - survives Ctrl+C and terminal disconnects
- Output redirected to log file - can review later if needed
- `tail -f` shows real-time progress - see compilation, linting, errors
- Can Ctrl+C the `tail -f` without killing build - build continues
- Verification after: `tail -50 /tmp/docker-build-live.log` to check success

**Success indicators:**
- `✓ Compiled successfully in 27s`
- `✓ Generating static pages (30/30)`
- `#22 naming to docker.io/library/traderv4-trading-bot done`
- `DONE X.Xs` on final step

**Failure indicators:**
- `Failed to compile.`
- `Type error:`
- `ERROR: process "/bin/sh -c npm run build" did not complete successfully: exit code: 1`

**After successful build:**
```bash
# Deploy new container
docker compose up -d --force-recreate trading-bot

# Verify it started
docker logs --tail=30 trading-bot-v4

# Confirm deployed version (most recent start)
docker logs trading-bot-v4 | grep "Server starting" | tail -1
```

**DO NOT use:** `docker compose build trading-bot` in the foreground - one dropped terminal session kills 60s of work

### Docker Cleanup After Builds

**CRITICAL: Prevent disk full issues from build cache accumulation**

Docker builds create intermediate layers (1.3+ GB per build) that accumulate over time. Build cache can reach 40-50 GB after frequent rebuilds.
**After successful deployment, clean up:** ```bash # Remove dangling images (old builds) docker image prune -f # Remove build cache (biggest space hog - 40+ GB typical) docker builder prune -f # Optional: Remove dangling volumes (if no important data) docker volume prune -f # Check space saved docker system df ``` **When to run:** - After each successful deployment (recommended) - Weekly if building frequently - When disk space warnings appear - Before major updates/migrations **Space typically freed:** - Dangling images: 2-5 GB - Build cache: 40-50 GB - Dangling volumes: 0.5-1 GB - **Total: 40-55 GB per cleanup** **What's safe to delete:** - `` tagged images (old builds) - Build cache (recreated on next build) - Dangling volumes (orphaned from removed containers) **What NOT to delete:** - Named volumes (contain data: `trading-bot-postgres`, etc.) - Active containers - Tagged images currently in use --- ## Critical Components ### 1. Phantom Trade Auto-Closure System **Purpose:** Automatically close positions when size mismatch detected (position opened but wrong size) **When triggered:** - Position opened on Drift successfully - Expected size: $50 (50% @ 1x leverage) - Actual size: $1.37 (7% fill - likely oracle price stale or exchange rejection) - Size ratio < 50% threshold → phantom detected **Automated response (all happens in <1 second):** 1. **Immediate closure:** Market order closes 100% of phantom position 2. **Database logging:** Creates trade record with `status='phantom'`, saves P&L 3. **n8n notification:** Returns HTTP 200 with full details (not 500 - allows workflow to continue) 4. 
**Telegram alert:** Message includes entry/exit prices, P&L, reason, transaction IDs **Why auto-close instead of manual intervention:** - User may be asleep, away from devices, unavailable for hours - Unmonitored position = unlimited risk exposure - Position Manager won't track phantom (by design) - No TP/SL protection, no trailing stop, no monitoring - Better to exit with small loss/gain than leave position exposed - Re-entry always possible if setup was actually good **Example notification:** ``` ⚠️ PHANTOM TRADE AUTO-CLOSED Symbol: SOL-PERP Direction: LONG Expected Size: $48.75 Actual Size: $1.37 (2.8%) Entry: $168.50 Exit: $168.45 P&L: -$0.02 Reason: Size mismatch detected - likely oracle price issue or exchange rejection Action: Position auto-closed for safety (unmonitored positions = risk) TX: 5Yx2Fm8vQHKLdPaw... ``` **Database tracking:** - `status='phantom'` field identifies these trades - `isPhantom=true`, `phantomReason='ORACLE_PRICE_MISMATCH'` - `expectedSizeUSD`, `actualSizeUSD` fields for analysis - Exit reason: `'manual'` (phantom auto-close category) - Enables post-trade analysis of phantom frequency and patterns **Code location:** `app/api/trading/execute/route.ts` lines 322-445 ### 2. 
Signal Quality Scoring (`lib/trading/signal-quality.ts`) **Purpose:** Unified quality validation system that scores trading signals 0-100 based on 5 market metrics **Timeframe-aware thresholds:** ```typescript scoreSignalQuality({ atr, adx, rsi, volumeRatio, pricePosition, timeframe?: string // "5" for 5min, undefined for higher timeframes }) ``` **5min chart adjustments:** - ADX healthy range: 12-22 (vs 18-30 for daily) - ATR healthy range: 0.2-0.7% (vs 0.4%+ for daily) - Anti-chop filter: -20 points for extreme sideways (ADX <10, ATR <0.25%, Vol <0.9x) **Price position penalties (all timeframes):** - Long at 90-95%+ range: -15 to -30 points (chasing highs) - Short at <5-10% range: -15 to -30 points (chasing lows) - Prevents flip-flop losses from entering range extremes **Key behaviors:** - Returns score 0-100 and detailed breakdown object - Minimum score 60 required to execute trade - Called by both `/api/trading/check-risk` and `/api/trading/execute` - Scores saved to database for post-trade analysis ### 2. 
Position Manager (`lib/trading/position-manager.ts`) **Purpose:** Software-based monitoring loop that checks prices every 2 seconds and closes positions via market orders **Singleton pattern:** Always use `getInitializedPositionManager()` - never instantiate directly ```typescript const positionManager = await getInitializedPositionManager() await positionManager.addTrade(activeTrade) ``` **Key behaviors:** - Tracks `ActiveTrade` objects in a Map - **TP2-as-Runner system**: TP1 (configurable %, default 75%) → TP2 trigger (no close, activate trailing) → Runner (remaining %) with ATR-based trailing stop - Dynamic SL adjustments: Moves to breakeven after TP1, locks profit at +1.2% - **On-chain order synchronization:** After TP1 hits, calls `cancelAllOrders()` then `placeExitOrders()` with updated SL price at breakeven (uses `retryWithBackoff()` for rate limit handling) - **ATR-based trailing stop:** Calculates trail distance as `(atrAtEntry / currentPrice × 100) × trailingStopAtrMultiplier`, clamped between min/max % - Trailing stop: Activates when TP2 price hit, tracks `peakPrice` and trails dynamically - Closes positions via `closePosition()` market orders when targets hit - Acts as backup if on-chain orders don't fill - State persistence: Saves to database, restores on restart via `configSnapshot.positionManagerState` - **Startup validation:** On container restart, cross-checks last 24h "closed" trades against Drift to detect orphaned positions (see `lib/startup/init-position-manager.ts`) - **Grace period for new trades:** Skips "external closure" detection for positions <30 seconds old (Drift positions take 5-10s to propagate) - **Exit reason detection:** Uses trade state flags (`tp1Hit`, `tp2Hit`) and realized P&L to determine exit reason, NOT current price (avoids misclassification when price moves after order fills) - **Real P&L calculation:** Calculates actual profit based on entry vs exit price, not SDK's potentially incorrect values - **Rate limit-aware 
exit:** On 429 errors during close, keeps trade in monitoring (doesn't mark closed), retries naturally on next price update ### 3. Telegram Bot (`telegram_command_bot.py`) **Purpose:** Python-based Telegram bot for manual trading commands and position status monitoring **Manual trade commands via plain text:** ```python # User sends plain text message (not slash commands) "long sol" → Validates via analytics, then opens SOL-PERP long "short eth" → Validates via analytics, then opens ETH-PERP short "long btc --force" → Skips analytics validation, opens BTC-PERP long immediately ``` **Key behaviors:** - MessageHandler processes all text messages (not just commands) - Maps user-friendly symbols (sol, eth, btc) to Drift format (SOL-PERP, etc.) - **Analytics validation:** Calls `/api/analytics/reentry-check` before execution - Blocks trades with score <55 unless `--force` flag used - Uses fresh TradingView data (<5min old) when available - Falls back to historical metrics with penalty - Considers recent trade performance (last 3 trades) - Calls `/api/trading/execute` directly with preset healthy metrics (ATR=0.45, ADX=32, RSI=58/42) - Bypasses n8n workflow and TradingView requirements - 60-second timeout for API calls - Responds with trade confirmation or analytics rejection message **Status command:** ```python /status → Returns JSON of open positions from Drift ``` **Implementation details:** - Uses `python-telegram-bot` library - Deployed via `docker-compose.telegram-bot.yml` - Requires `TELEGRAM_BOT_TOKEN` and `TELEGRAM_CHANNEL_ID` in .env - API calls to `http://trading-bot:3000/api/trading/execute` **Drift client integration:** - Singleton pattern: Use `initializeDriftService()` and `getDriftService()` - maintains single connection ```typescript const driftService = await initializeDriftService() const health = await driftService.getAccountHealth() ``` - Wallet handling: Supports both JSON array `[91,24,...]` and base58 string formats from Phantom wallet ### 4. 
Rate Limit Monitoring (`lib/drift/orders.ts` + `app/api/analytics/rate-limits`) **Purpose:** Track and analyze Solana RPC rate limiting (429 errors) to prevent silent failures **Helius RPC Limits (Free Tier):** - **Burst:** 100 requests/second - **Sustained:** 10 requests/second - **Monthly:** 100k requests - See `docs/HELIUS_RATE_LIMITS.md` for upgrade recommendations **Retry mechanism with exponential backoff (Nov 14, 2025 - Updated):** ```typescript await retryWithBackoff(async () => { return await driftClient.cancelOrders(...) }, maxRetries = 3, baseDelay = 5000) // Increased from 2s to 5s ``` **Progression:** 5s → 10s → 20s (vs old 2s → 4s → 8s) **Rationale:** Gives Helius time to recover, reduces cascade pressure by 2.5x **Database logging:** Three event types in SystemEvent table: - `rate_limit_hit`: Each 429 error (logged with attempt #, delay, error snippet) - `rate_limit_recovered`: Successful retry (logged with total time, retry count) - `rate_limit_exhausted`: Failed after max retries (CRITICAL - order operation failed) **Analytics endpoint:** ```bash curl http://localhost:3001/api/analytics/rate-limits ``` Returns: Total hits/recoveries/failures, hourly patterns, recovery times, success rate **Key behaviors:** - Only RPC calls wrapped: `cancelAllOrders()`, `placeExitOrders()`, `closePosition()` - Position Manager monitoring: Event-driven via Pyth WebSocket (not polling) - Rate limit-aware exit: Position Manager keeps monitoring on 429 errors (retries naturally) - Logs to both console and database for post-trade analysis **Monitoring queries:** See `docs/RATE_LIMIT_MONITORING.md` for SQL queries **Startup Position Validation (Nov 14, 2025 - Added):** On container startup, cross-checks last 24h of "closed" trades against actual Drift positions: - If DB says closed but Drift shows open → reopens in DB to restore Position Manager tracking - Prevents orphaned positions from failed close transactions - Logs: `🔴 CRITICAL: ${symbol} marked as CLOSED in DB but 
still OPEN on Drift!` - Implementation: `lib/startup/init-position-manager.ts` - `validateOpenTrades()` ### 5. Order Placement (`lib/drift/orders.ts`) **Critical functions:** - `openPosition()` - Opens market position with transaction confirmation - `closePosition()` - Closes position with transaction confirmation - `placeExitOrders()` - Places TP/SL orders on-chain - `cancelAllOrders()` - Cancels all reduce-only orders for a market **CRITICAL: Transaction Confirmation Pattern** Both `openPosition()` and `closePosition()` MUST confirm transactions on-chain: ```typescript const txSig = await driftClient.placePerpOrder(orderParams) console.log('⏳ Confirming transaction on-chain...') const connection = driftService.getConnection() const confirmation = await connection.confirmTransaction(txSig, 'confirmed') if (confirmation.value.err) { throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`) } console.log('✅ Transaction confirmed on-chain') ``` Without this, the SDK returns signatures for transactions that never execute, causing phantom trades/closes. **CRITICAL: Drift SDK position.size is BASE ASSET TOKENS, not USD** The Drift SDK returns `position.size` as token quantity (SOL/ETH/BTC), NOT USD notional: ```typescript // CORRECT: Convert tokens to USD by multiplying by current price const positionSizeUSD = Math.abs(position.size) * currentPrice // WRONG: Using position.size directly as USD (off by 150x+ for SOL!) const positionSizeUSD = Math.abs(position.size) ``` **This affects Position Manager's TP1/TP2 detection** - if position.size is not converted to USD before comparing to tracked USD values, the system will never detect partial closes correctly. See Common Pitfall #22 for the full bug details and fix applied Nov 12, 2025. **Solana RPC Rate Limiting with Exponential Backoff** Solana RPC endpoints return 429 errors under load. 
Always use retry logic for order operations: ```typescript export async function retryWithBackoff( operation: () => Promise, maxRetries: number = 3, initialDelay: number = 5000 // Increased from 2000ms to 5000ms (Nov 14, 2025) ): Promise { for (let attempt = 0; attempt < maxRetries; attempt++) { try { return await operation() } catch (error: any) { if (error?.message?.includes('429') && attempt < maxRetries - 1) { const delay = initialDelay * Math.pow(2, attempt) console.log(`⏳ Rate limited, retrying in ${delay/1000}s... (attempt ${attempt + 1}/${maxRetries})`) await new Promise(resolve => setTimeout(resolve, delay)) continue } throw error } } throw new Error('Max retries exceeded') } // Usage in cancelAllOrders await retryWithBackoff(() => driftClient.cancelOrders(...)) ``` **Note:** Increased from 2s to 5s base delay to give Helius RPC more recovery time. See `docs/HELIUS_RATE_LIMITS.md` for detailed analysis. Without this, order cancellations fail silently during TP1→breakeven order updates, leaving ghost orders that cause incorrect fills. **Dual Stop System** (USE_DUAL_STOPS=true): ```typescript // Soft stop: TRIGGER_LIMIT at -1.5% (avoids wicks) // Hard stop: TRIGGER_MARKET at -2.5% (guarantees exit) ``` **Order types:** - Entry: MARKET (immediate execution) - TP1/TP2: LIMIT reduce-only orders - Soft SL: TRIGGER_LIMIT reduce-only - Hard SL: TRIGGER_MARKET reduce-only ### 6. 
Database (`lib/database/trades.ts` + `prisma/schema.prisma`) **Purpose:** PostgreSQL via Prisma ORM for trade history and analytics **Models:** Trade, PriceUpdate, SystemEvent, DailyStats, BlockedSignal **Singleton pattern:** Use `getPrismaClient()` - never instantiate PrismaClient directly **Key functions:** - `createTrade()` - Save trade after execution (includes dual stop TX signatures + signalQualityScore) - `updateTradeExit()` - Record exit with P&L - `addPriceUpdate()` - Track price movements (called by Position Manager) - `getTradeStats()` - Win rate, profit factor, avg win/loss - `getLastTrade()` - Fetch most recent trade for analytics dashboard - `createBlockedSignal()` - Save blocked signals for data-driven optimization analysis - `getRecentBlockedSignals()` - Query recent blocked signals - `getBlockedSignalsForAnalysis()` - Fetch signals needing price analysis (future automation) **Important fields:** - `signalSource` (String?) - Identifies trade origin: 'tradingview', 'manual', or NULL (old trades) - **CRITICAL:** Manual Telegram trades are marked `signalSource='manual'` and excluded from TradingView indicator analysis - Use filter: `WHERE ("signalSource" IS NULL OR "signalSource" != 'manual')` for indicator optimization queries - See `docs/MANUAL_TRADE_FILTERING.md` for complete SQL filtering guide - `signalQualityScore` (Int?) - 0-100 score for data-driven optimization - `signalQualityVersion` (String?) 
- Tracks which scoring logic was used ('v1', 'v2', 'v3', 'v4') - v1: Original logic (price position < 5% threshold) - v2: Added volume compensation for low ADX (2025-11-07) - v3: Stricter breakdown requirements: positions < 15% require (ADX > 18 AND volume > 1.2x) OR (RSI < 35 for shorts / RSI > 60 for longs) - v4: CURRENT - Blocked signals tracking enabled for data-driven threshold optimization (2025-11-11) - All new trades tagged with current version for comparative analysis - `maxFavorableExcursion` / `maxAdverseExcursion` - Track best/worst P&L during trade lifetime - `maxFavorablePrice` / `maxAdversePrice` - Track prices at MFE/MAE points - `configSnapshot` (Json) - Stores Position Manager state for crash recovery - `atr`, `adx`, `rsi`, `volumeRatio`, `pricePosition` - Context metrics from TradingView **BlockedSignal model fields (NEW):** - Signal metrics: `atr`, `adx`, `rsi`, `volumeRatio`, `pricePosition`, `timeframe` - Quality scoring: `signalQualityScore`, `signalQualityVersion`, `scoreBreakdown` (JSON), `minScoreRequired` - Block tracking: `blockReason` (QUALITY_SCORE_TOO_LOW, COOLDOWN_PERIOD, HOURLY_TRADE_LIMIT, etc.), `blockDetails` - Future analysis: `priceAfter1/5/15/30Min`, `wouldHitTP1/TP2/SL`, `analysisComplete` - Automatically saved by check-risk endpoint when signals are blocked - Enables data-driven optimization: collect 10-20 blocked signals → analyze patterns → adjust thresholds **Per-symbol functions:** - `getLastTradeTimeForSymbol(symbol)` - Get last trade time for specific coin (enables per-symbol cooldown) - Each coin (SOL/ETH/BTC) has independent cooldown timer to avoid missed opportunities ## Configuration System **Three-layer merge:** 1. `DEFAULT_TRADING_CONFIG` (config/trading.ts) 2. Environment variables (.env) via `getConfigFromEnv()` 3. 
Runtime overrides via `getMergedConfig(overrides)` **Always use:** `getMergedConfig()` to get final config - never read env vars directly in business logic **Per-symbol position sizing:** Use `getPositionSizeForSymbol(symbol, config)` which returns `{ size, leverage, enabled }` ```typescript const { size, leverage, enabled } = getPositionSizeForSymbol('SOL-PERP', config) if (!enabled) { return NextResponse.json({ success: false, error: 'Symbol trading disabled' }, { status: 400 }) } ``` **Symbol normalization:** TradingView sends "SOLUSDT" → must convert to "SOL-PERP" for Drift ```typescript const driftSymbol = normalizeTradingViewSymbol(body.symbol) ``` ## API Endpoints Architecture **Authentication:** All `/api/trading/*` endpoints (except `/test`) require `Authorization: Bearer API_SECRET_KEY` **Pattern:** Each endpoint follows same flow: 1. Auth check 2. Get config via `getMergedConfig()` 3. Initialize Drift service 4. Check account health 5. Execute operation 6. Save to database 7. 
Add to Position Manager if applicable **Key endpoints:** - `/api/trading/execute` - Main entry point from n8n (production, requires auth), **auto-caches market data** - `/api/trading/check-risk` - Pre-execution validation (duplicate check, quality score, **per-symbol cooldown**, rate limits, **symbol enabled check**, **saves blocked signals automatically**) - `/api/trading/test` - Test trades from settings UI (no auth required, **respects symbol enable/disable**) - `/api/trading/close` - Manual position closing (requires symbol normalization) - `/api/trading/sync-positions` - **Force Position Manager sync with Drift** (POST, requires auth) - restores tracking for orphaned positions - `/api/trading/cancel-orders` - **Manual order cleanup** (for stuck/ghost orders after rate limit failures) - `/api/trading/positions` - Query open positions from Drift - `/api/trading/market-data` - Webhook for TradingView market data updates (GET for debug, POST for data) - `/api/settings` - Get/update config (writes to .env file, **includes per-symbol settings**) - `/api/analytics/last-trade` - Fetch most recent trade details for dashboard (includes quality score) - `/api/analytics/reentry-check` - **Validate manual re-entry** with fresh TradingView data + recent performance - `/api/analytics/version-comparison` - Compare performance across signal quality logic versions (v1/v2/v3/v4) - `/api/restart` - Create restart flag for watch-restart.sh script ## Critical Workflows ### Execute Trade (Production) ``` TradingView alert → n8n Parse Signal Enhanced (extracts metrics + timeframe) ↓ /api/trading/check-risk [validates quality score ≥60, checks duplicates, per-symbol cooldown] ↓ /api/trading/execute ↓ normalize symbol (SOLUSDT → SOL-PERP) ↓ getMergedConfig() ↓ getPositionSizeForSymbol() [check if symbol enabled + get sizing] ↓ openPosition() [MARKET order] ↓ calculate dual stop prices if enabled ↓ placeExitOrders() [on-chain TP1/TP2/SL orders] ↓ scoreSignalQuality({ ..., timeframe }) 
[compute 0-100 score with timeframe-aware thresholds] ↓ createTrade() [CRITICAL: save to database FIRST - see Common Pitfall #29, Database-First Pattern] ↓ positionManager.addTrade() [ONLY after DB save succeeds - prevents unprotected positions] ``` **CRITICAL EXECUTION ORDER (Nov 13, 2025 Fix):** The order database save → Position Manager add is NOT arbitrary - it's a safety requirement: - If the database save fails, the API returns HTTP 500 with a critical warning - User sees: "CLOSE POSITION MANUALLY IMMEDIATELY" with the transaction signature - Position Manager only tracks database-persisted trades - Container restarts can restore all positions from the database - **Never add to Position Manager before the database save** - doing so creates unprotected positions ### Position Monitoring Loop ``` Position Manager every 2s: ↓ Verify on-chain position still exists (detect external closures; skipped for trades <30s old) ↓ getPythPriceMonitor().getLatestPrice() ↓ Calculate current P&L and update MAE/MFE metrics ↓ Check emergency stop (-2%) → closePosition(100%) ↓ Check SL hit (before TP1) → closePosition(100%) ↓ Check TP1 hit → closePosition(TAKE_PROFIT_1_SIZE_PERCENT, default 75%), cancelAllOrders(), placeExitOrders() with SL at breakeven ↓ Check profit lock trigger (+1.2%) → move SL to +configured% ↓ Check runner SL (after TP1, before TP2) → closePosition(100%) ↓ Check TP2 hit → activate trailing stop on the runner (no close; TAKE_PROFIT_2_SIZE_PERCENT=0) ↓ Check trailing stop (if runner active) → adjust SL dynamically based on peakPrice ↓ addPriceUpdate() [save to database every N checks] ↓ saveTradeState() [persist Position Manager state + MAE/MFE for crash recovery] ``` ### Settings Update ``` Web UI → /api/settings POST ↓ Validate new settings ↓ Write to .env file using string replacement ↓ Return success ↓ User clicks "Restart Bot" → /api/restart ↓ Creates /tmp/trading-bot-restart.flag ↓ watch-restart.sh detects flag ↓ Executes: docker restart trading-bot-v4 ``` ## Docker Context **Multi-stage build:** deps → builder → runner (Node 20 Alpine) **Critical Dockerfile steps:** 1. Install deps with `npm install --production` 2.
Copy source and `npx prisma generate` (MUST happen before build) 3. `npm run build` (Next.js standalone output) 4. Runner stage copies standalone + static + node_modules + Prisma client **Container networking:** - External: `trading-bot-v4` on host port 3001 - Internal: Next.js on container port 3000 - Database: `trading-bot-postgres` on the 172.28.0.0/16 network **DATABASE_URL caveat:** Use `trading-bot-postgres` (container name) in .env for runtime, but `localhost:5432` for Prisma CLI migrations from the host ## Project-Specific Patterns ### 1. Singleton Services Never create multiple instances - always use getter functions: ```typescript const driftService = await initializeDriftService() // NOT: new DriftService() const positionManager = getPositionManager() // NOT: new PositionManager() const prisma = getPrismaClient() // NOT: new PrismaClient() ``` ### 2. Price Calculations Direction matters for long vs short (`percent` > 0 is the profit side, e.g. a TP; `percent` < 0 is the loss side, e.g. an SL): ```typescript function calculatePrice(entry: number, percent: number, direction: 'long' | 'short'): number { if (direction === 'long') { return entry * (1 + percent / 100) // Long: +1% = higher price } else { return entry * (1 - percent / 100) // Short: +1% = lower price } } ``` ### 3. Error Handling Non-critical database writes (price updates, analytics) should not fail a trade or the monitoring loop - wrap them in try/catch and log. **Exception:** a failed `createTrade()` in `/api/trading/execute` must NOT be swallowed; it returns HTTP 500 so the position can be closed manually (see Common Pitfall #29, Database-First Pattern): ```typescript try { await addPriceUpdate(priceUpdate) // non-critical write; `priceUpdate` is an illustrative name console.log('💾 Price update saved') } catch (dbError) { console.error('❌ Failed to save price update:', dbError) // Keep monitoring - a missed price sample is recoverable } ``` ### 4. Reduce-Only Orders All exit orders MUST be reduce-only (can only close, not open positions): ```typescript const orderParams = { reduceOnly: true, // CRITICAL for TP/SL orders // ... other params } ``` ### 5.
Nextcloud Deck Roadmap Sync **Purpose:** Visual kanban board for tracking optimization roadmap progress **Key Components:** - `scripts/discover-deck-ids.sh` - Find Nextcloud Deck board/stack IDs - `scripts/sync-roadmap-to-deck.py` - Sync roadmap files to Deck cards - `docs/NEXTCLOUD_DECK_SYNC.md` - Complete documentation **Workflow:** ```bash # One-time setup (already done) bash scripts/discover-deck-ids.sh # Creates /tmp/deck-config.json # Always dry-run first to preview changes python3 scripts/sync-roadmap-to-deck.py --init --dry-run # Then sync roadmap to Deck (creates/updates cards) python3 scripts/sync-roadmap-to-deck.py --init ``` **Stack Mapping:** - 📥 **Backlog:** Future phases, ideas, ML work (status: FUTURE) - 📋 **Planning:** Next phases, ready to implement (status: PENDING, NEXT) - 🚀 **In Progress:** Currently active work (status: CURRENT, IN PROGRESS, DEPLOYED) - ✅ **Complete:** Finished phases (status: COMPLETE) **Card Structure:** - 3 high-level initiative cards (from `OPTIMIZATION_MASTER_ROADMAP.md`) - 18 detailed phase cards (from individual roadmap files) - Total: 21 cards tracking all optimization work **When to Sync:** - After completing a phase (update markdown status → re-sync) - When starting a new phase (move card in Deck UI) - Weekly during active development to keep the visual state current **Important Notes:** - The Deck API doesn't support duplicate detection - always use `--dry-run` first - Manual card deletion required (API returns 405 on DELETE) - Code blocks are stripped from descriptions (to prevent API errors) - Card titles are cleaned (markdown and emojis removed for readability) ## Testing Commands ```bash # Local development npm run dev # Build production npm run build && npm start # Docker build and restart docker compose build trading-bot docker compose up -d --force-recreate trading-bot docker logs -f trading-bot-v4 # Database operations npx prisma generate # Generate client DATABASE_URL="postgresql://...@localhost:5432/..."
npx prisma migrate dev docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "\dt" # Test trade from UI # Go to http://localhost:3001/settings # Click "Test LONG" or "Test SHORT" ``` ## SQL Analysis Queries Essential queries for monitoring signal quality and blocked signals. Run them in an interactive psql session, because passing them with `-c "..."` fails when the shell strips the double quotes around table and column names: ```bash docker exec -it trading-bot-postgres psql -U postgres -d trading_bot_v4 ``` With Prisma's default naming (no `@map`), columns keep their camelCase field names and must be double-quoted, since PostgreSQL folds unquoted identifiers to lowercase. ### Phase 1: Monitor Data Collection Progress ```sql -- Check blocked signals count (target: 10-20 for Phase 2) SELECT COUNT(*) as total_blocked FROM "BlockedSignal"; -- Score distribution of blocked signals SELECT CASE WHEN "signalQualityScore" >= 60 THEN '60-64 (Close Call)' WHEN "signalQualityScore" >= 55 THEN '55-59 (Marginal)' WHEN "signalQualityScore" >= 50 THEN '50-54 (Weak)' ELSE '0-49 (Very Weak)' END as tier, COUNT(*) as count, ROUND(AVG("signalQualityScore")::numeric, 1) as avg_score FROM "BlockedSignal" WHERE "blockReason" = 'QUALITY_SCORE_TOO_LOW' GROUP BY tier ORDER BY MIN("signalQualityScore") DESC; -- Recent blocked signals with full details SELECT symbol, direction, "signalQualityScore" as score, ROUND(adx::numeric, 1) as adx, ROUND(atr::numeric, 2) as atr, ROUND("pricePosition"::numeric, 1) as pos, ROUND("volumeRatio"::numeric, 2) as vol, "blockReason", TO_CHAR("createdAt", 'MM-DD HH24:MI') as time FROM "BlockedSignal" ORDER BY "createdAt" DESC LIMIT 10; ``` ### Phase 2: Compare Blocked vs Executed Trades ```sql -- Compare executed trades in 60-69 score range SELECT "signalQualityScore" as score, COUNT(*) as trades, ROUND(AVG("realizedPnL")::numeric, 2) as avg_pnl, ROUND(SUM("realizedPnL")::numeric, 2) as total_pnl, ROUND(100.0 * SUM(CASE WHEN "realizedPnL" > 0 THEN 1 ELSE 0 END) / COUNT(*)::numeric, 1) as win_rate FROM "Trade" WHERE "exitReason" IS NOT NULL AND "signalQualityScore" BETWEEN 60 AND 69 GROUP BY "signalQualityScore" ORDER BY "signalQualityScore"; -- Block reason breakdown SELECT "blockReason", COUNT(*) as count, ROUND(AVG("signalQualityScore")::numeric, 1) as avg_score FROM
"BlockedSignal" GROUP BY "blockReason" ORDER BY count DESC; ``` ### Analyze Specific Patterns ```sql -- Blocked signals at range extremes (price position) SELECT direction, "signalQualityScore" as score, ROUND("pricePosition"::numeric, 1) as pos, ROUND(adx::numeric, 1) as adx, ROUND("volumeRatio"::numeric, 2) as vol, symbol, TO_CHAR("createdAt", 'MM-DD HH24:MI') as time FROM "BlockedSignal" WHERE "blockReason" = 'QUALITY_SCORE_TOO_LOW' AND ("pricePosition" < 10 OR "pricePosition" > 90) ORDER BY "signalQualityScore" DESC; -- ADX distribution in blocked signals SELECT CASE WHEN adx >= 25 THEN 'Strong (25+)' WHEN adx >= 20 THEN 'Moderate (20-25)' WHEN adx >= 15 THEN 'Weak (15-20)' ELSE 'Very Weak (<15)' END as adx_tier, COUNT(*) as count, ROUND(AVG("signalQualityScore")::numeric, 1) as avg_score FROM "BlockedSignal" WHERE "blockReason" = 'QUALITY_SCORE_TOO_LOW' AND adx IS NOT NULL GROUP BY adx_tier ORDER BY MIN(adx) DESC; ``` **Usage Pattern:** 1. Run "Monitor Data Collection" queries weekly during Phase 1 2. Once 10+ blocked signals have been collected, run "Compare Blocked vs Executed" queries 3. Use "Analyze Specific Patterns" to identify optimization opportunities 4. Full query reference: `BLOCKED_SIGNALS_TRACKING.md` ## Common Pitfalls 1.
**DRIFT SDK MEMORY LEAK (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s) - **Root Cause:** Drift SDK accumulates WebSocket subscriptions over time without cleanup - **Manifestation:** Thousands of `accountUnsubscribe error: readyState was 2 (CLOSING)` in logs - **Heap Growth:** Normal ~200MB → 4GB+ after 10 hours → OOM crash - **Solution:** Automatic reconnection every 4 hours (`lib/drift/client.ts`) - **Implementation:** * `scheduleReconnection()` - Sets 4-hour timer after initialization * `reconnect()` - Unsubscribes, resets state, reinitializes Drift client * Timer cleared in `disconnect()` to prevent orphaned timers - **Manual Control:** `/api/drift/reconnect` endpoint (POST with auth, GET for status) - **Impact:** System now self-healing, can run indefinitely without manual restarts - **Monitoring:** Watch for scheduled reconnection logs: `🔄 Scheduled reconnection...` 2. **WRONG RPC PROVIDER (CRITICAL - CATASTROPHIC SYSTEM FAILURE):** - **FINAL CONCLUSION Nov 14, 2025 (INVESTIGATION COMPLETE):** Helius is the ONLY reliable RPC provider for Drift SDK; this supersedes the earlier "MUST use Alchemy RPC" note in the Architecture Overview - **Root Cause CONFIRMED:** Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization - **Definitive Proof (Nov 14, 21:14 CET):** * Created diagnostic endpoint `/api/testing/drift-init` * Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time * Helius: 0 subscription errors EVERY init, 800ms avg init time * See `docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md` for full test data - **Why Alchemy Fails:** * Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern) * Alchemy's CUPS (compute units per second) enforcement rate-limits these burst requests * Drift SDK does NOT retry failed subscriptions * SDK reports "initialized successfully" but with an incomplete subscription set * Subsequent operations fail/time out due to missing account data * Error message: "Received JSON-RPC
error calling `accountSubscribe`" - **Why "Breakthrough" at 14:25 Wasn't Real:** * First Alchemy test had 17-71 subscription errors (random variation) * It sometimes gets lucky with "just enough" subscriptions for one operation * The SDK is in a degraded state from the start, which is not obvious until the second operation * This explains why the first trade "worked" but subsequent trades failed - **Why Helius Works:** * Higher burst tolerance for Solana dApp subscription patterns * Zero subscription errors during init * Faster initialization (800ms vs 1600ms) * Stable for continuous operations - **Technical Reality vs Documentation:** * Alchemy DOES support WebSocket subscriptions (research confirmed) * Alchemy DOES support the accountSubscribe method (calls don't fail with JSON-RPC -32601 "Method not found") * BUT: its rate limit enforcement model is incompatible with Drift's burst pattern * Documentation doesn't mention burst subscription limits - **Production Status:** * Using: Helius RPC (https://mainnet.helius-rpc.com/?api-key=...) * Retry logic: exponential backoff (5s base delay) for rate limits * System: Stable, TP1/TP2/SL working, Position Manager tracking correctly - **Investigation Closed:** This is DEFINITIVE. Use Helius. Do not use Alchemy. - **Test Yourself:** `curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'` 3. **Prisma not generated in Docker:** Must run `npx prisma generate` in Dockerfile BEFORE `npm run build` 4. **Wrong DATABASE_URL:** Container runtime needs `trading-bot-postgres`, Prisma CLI from host needs `localhost:5432` 5. **Symbol format mismatch:** Always normalize with `normalizeTradingViewSymbol()` before calling Drift (applies to ALL endpoints including `/api/trading/close`) 6. **Missing reduce-only flag:** Exit orders without `reduceOnly: true` can accidentally open new positions 7. **Singleton violations:** Creating multiple DriftClient or Position Manager instances causes connection/state issues 8.
**Type errors with Prisma:** The Trade type from Prisma is only available AFTER `npx prisma generate` - use explicit types or `// @ts-ignore` carefully 9. **Quality score duplication:** Signal quality calculation exists in BOTH `check-risk` and `execute` endpoints - keep logic synchronized 10. **TP2-as-Runner configuration:** - `takeProfit2SizePercent: 0` means "TP2 activates trailing stop, no position close" - This creates runner of remaining % after TP1 (default 25%, configurable via TAKE_PROFIT_1_SIZE_PERCENT) - `TAKE_PROFIT_2_PERCENT=0.7` sets TP2 trigger price, `TAKE_PROFIT_2_SIZE_PERCENT` should be 0 - Settings UI correctly shows "TP2 activates trailing stop" with dynamic runner % calculation 11. **P&L calculation CRITICAL:** Use actual entry vs exit price calculation, not SDK values: ```typescript const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction) const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100 trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK ``` 12. **Transaction confirmation CRITICAL:** Both `openPosition()` AND `closePosition()` MUST call `connection.confirmTransaction()` after `placePerpOrder()`. Without this, the SDK returns transaction signatures that aren't confirmed on-chain, causing "phantom trades" or "phantom closes". Always check `confirmation.value.err` before proceeding. 13. **Execution order matters:** When creating trades via API endpoints, the order MUST be: 1. Open position + place exit orders 2. Save to database (`createTrade()`) 3. Add to Position Manager (`positionManager.addTrade()`) If Position Manager is added before database save, race conditions occur where monitoring checks before the trade exists in DB. 14. **New trade grace period:** Position Manager skips "external closure" detection for trades <30 seconds old because Drift positions take 5-10 seconds to propagate after opening. 
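The age check behind this grace period can be sketched as follows. This is a minimal illustration and not the Position Manager's actual code. `TrackedTrade`, `entryTime` and `shouldTreatAsExternalClosure` are hypothetical names. The 30-second window is the value stated in this pitfall.

```typescript
// Sketch only: `TrackedTrade` and `entryTime` are hypothetical, not the real
// ActiveTrade interface used by the Position Manager.
interface TrackedTrade {
  id: string
  entryTime: number // ms since epoch when the position was opened (assumed field)
}

// Drift positions can take 5-10s to propagate; 30s leaves a safety margin.
const NEW_TRADE_GRACE_PERIOD_MS = 30000

// Decide whether a position missing from Drift should count as an external closure.
function shouldTreatAsExternalClosure(
  trade: TrackedTrade,
  positionFoundOnChain: boolean,
  now: number = Date.now()
): boolean {
  if (positionFoundOnChain) return false // still open: nothing to record
  const ageMs = now - trade.entryTime
  // Too new: the position may simply not be visible on-chain yet, so keep monitoring.
  return ageMs >= NEW_TRADE_GRACE_PERIOD_MS
}

// Usage: a trade opened 10s ago with no visible position is still in its grace period.
const demoTrade: TrackedTrade = { id: 'demo', entryTime: 1000 }
console.log(shouldTreatAsExternalClosure(demoTrade, false, 11000)) // false (10s old)
console.log(shouldTreatAsExternalClosure(demoTrade, false, 41000)) // true (40s old)
```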
Without this grace period, new positions are immediately detected as "closed externally" and cancelled. 15. **Drift minimum position sizes:** The minimums observed in practice differ from the documentation: - SOL-PERP: 0.1 SOL (~$5-15 depending on price) - ETH-PERP: 0.01 ETH (~$40 at $4,000/ETH) - BTC-PERP: 0.0001 BTC (~$10 at $100k/BTC) Always check that `minOrderSize × currentPrice` exceeds Drift's $4 minimum, and add a buffer for price movement. 16. **Exit reason detection bug:** Position Manager was using the current price to determine the exit reason, but on-chain orders had filled at a DIFFERENT price earlier. It now uses the `trade.tp1Hit` / `trade.tp2Hit` flags and realized P&L to correctly identify whether TP1, TP2, or SL triggered. This prevents profitable trades from being mislabeled as "SL" exits. 17. **Per-symbol cooldown:** Cooldown period is per-symbol, NOT global. An ETH trade at 10:00 does NOT block a SOL trade at 10:01. Each coin (SOL/ETH/BTC) has its own cooldown timer to avoid missing opportunities on different assets. 18. **Timeframe-aware scoring crucial:** Signal quality thresholds MUST adjust for 5min vs higher timeframes: - 5min charts naturally have lower ADX (12-22 healthy) and ATR (0.2-0.7% healthy) than daily charts - Without timeframe awareness, valid 5min breakouts get blocked as "low quality" - Anti-chop filter applies -20 points for extremely sideways (choppy) conditions regardless of timeframe - Always pass the `timeframe` parameter from TradingView alerts to `scoreSignalQuality()` 19. **Price position chasing causes flip-flops:** Opening longs at 90%+ of the range or shorts below 10% of the range reliably loses money: - Database analysis showed overnight flip-flop losses all had price position 9-94% (chasing extremes) - These trades had valid ADX (16-18) but entered at the worst possible time - Quality scoring now applies a -15 to -30 point penalty for range extremes - Prevents rapid reversals when price is already overextended 20.
**TradingView ADX minimum for 5min:** Set the ADX filter to 15 (not 20+) in TradingView alerts for 5min charts: - Higher timeframes can use ADX 20+ for strong trends - 5min charts need a lower threshold to catch valid breakouts - Bot's quality scoring provides second-layer filtering with context-aware metrics - Two-stage filtering (TradingView + bot) guards against both overtrading and missing valid signals 21. **Prisma Decimal/BigInt type handling:** Raw SQL queries (`$queryRaw`) return Prisma `Decimal` objects for `numeric` results, not plain numbers: - Use `any` type for numeric fields in `$queryRaw` results: `total_pnl: any` - Convert with `Number()` before returning to frontend: `totalPnL: Number(stat.total_pnl) || 0` - A `Decimal` serializes to JSON as a string, so frontend `.toFixed()` calls fail on it - Applies to any aggregate that yields `numeric` (e.g. `ROUND(AVG(x)::numeric, 2)`, SUM()/AVG() over numeric columns); COUNT(*) yields `bigint`, which arrives as a JS `BigInt` (not JSON-serializable) and also needs `Number()` - Example: `/api/analytics/version-comparison` converts all numeric fields 22. **ATR-based trailing stop implementation (Nov 11, 2025):** Runner system was using a FIXED 0.3% trailing stop, causing immediate stops: - **Problem:** At $168 SOL, 0.3% = $0.50 wiggle room. Trades with +7-9% MFE exited for losses. - **Fix:** `trailingDistancePercent = (atrAtEntry / currentPrice * 100) × trailingStopAtrMultiplier`, clamped between `MIN` and `MAX` - **Config:** `TRAILING_STOP_ATR_MULTIPLIER=1.5`, `MIN=0.25%`, `MAX=0.9%`, `ACTIVATION=0.5%` - **Typical improvement:** 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50 = 2.25x more room) - **Fallback:** If `atrAtEntry` is unavailable, uses the clamped legacy `trailingStopPercent` - **Log verification:** Look for "📊 ATR-based trailing: 0.0045 (0.52%) × 1.5x = 0.78%" messages - **ActiveTrade interface:** Must include `atrAtEntry?: number` field for calculation - See `ATR_TRAILING_STOP_FIX.md` for full details and database analysis 23.
**CreateTradeParams interface sync:** When adding new database fields to Trade model, MUST update `CreateTradeParams` interface in `lib/database/trades.ts`: - Interface defines what parameters `createTrade()` accepts - Must add new field to interface (e.g., `indicatorVersion?: string`) - Must add field to Prisma create data object in `createTrade()` function - TypeScript build will fail if endpoint passes field not in interface - Example: indicatorVersion tracking required 3-file update (execute route.ts, CreateTradeParams interface, createTrade function) 24. **Position.size tokens vs USD bug (CRITICAL - Fixed Nov 12, 2025):** - **Symptom:** Position Manager detects false TP1 hits, moves SL to breakeven prematurely - **Root Cause:** `lib/drift/client.ts` returns `position.size` as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950) - **Bug:** Comparing tokens (12.28) directly to USD ($1,950) → 12.28 < 1,950 × 0.95 = "99.4% reduction" → FALSE TP1! - **Fix:** Always convert to USD before comparisons: ```typescript // In Position Manager (lines 322, 519, 558, 591) const positionSizeUSD = Math.abs(position.size) * currentPrice // Now compare USD to USD if (positionSizeUSD < trade.currentSize * 0.95) { // Actual 5%+ reduction detected } ``` - **Impact:** Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails - **Where it matters:** Position Manager, any code querying Drift positions - **Database evidence:** Trade showed `tp1Hit: true` when 100% still open, `slMovedToBreakeven: true` prematurely 25. 
**Leverage display showing global config instead of symbol-specific (Fixed Nov 12, 2025):** - **Symptom:** Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x or 20x - **Root Cause:** API response returning `config.leverage` (global default) instead of symbol-specific value - **Fix:** Use actual leverage from `getPositionSizeForSymbol()`: ```typescript // app/api/trading/execute/route.ts (lines 345, 448, 522, 557) const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config) // Return symbol-specific leverage leverage: leverage, // NOT: config.leverage ``` - **Impact:** Misleading notifications, user confusion about actual position risk - **Hierarchy:** Per-symbol ENV (SOLANA_LEVERAGE) → Market config → Global ENV (LEVERAGE) → Defaults 26. **Indicator version tracking (Nov 12, 2025+):** - Database field `indicatorVersion` tracks which TradingView strategy generated the signal - **v5:** Buy/Sell Signal strategy (pre-Nov 12) - **v6:** HalfTrend + BarColor strategy (Nov 12+) - Used for performance comparison between strategies 27. **Runner stop loss gap - NO protection between TP1 and TP2 (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Runner position remained open despite price moving far past stop loss level - **Root Cause:** Position Manager only checked stop loss BEFORE TP1 (line 877: `if (!trade.tp1Hit && this.shouldStopLoss(...)`), creating a protection gap - **Bug sequence:** 1. SHORT opened, TP1 hit at 70% close (runner = 30% remaining) 2. Runner had stop loss at profit-lock level (+0.5%) 3. Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped) 4. Runner exposed to unlimited loss for hours during TP1→TP2 window 5. 
Made worse by runner below Drift minimum size ($12.79 < $15) = no on-chain orders either - **Impact:** Hours of unprotected runner exposure = potential unlimited loss on 25-30% remaining position - **Code analysis:** ```typescript // Line 877: Stop loss checked ONLY before TP1 if (!trade.tp1Hit && this.shouldStopLoss(currentPrice, trade)) { console.log(`🔴 STOP LOSS: ${trade.symbol}`) await this.executeExit(trade, 100, 'SL', currentPrice) } // Lines 881-895: TP1 and TP2 processing - NO STOP LOSS CHECK // BUG: Runner between TP1-TP2 had ZERO stop loss protection! ``` - **Fix:** Added explicit runner stop loss check at line ~881: ```typescript // 2b. CRITICAL: Runner stop loss (AFTER TP1, BEFORE TP2) // This protects the runner position after TP1 closes main position if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) { console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol} at ${profitPercent.toFixed(2)}% (profit lock triggered)`) await this.executeExit(trade, 100, 'SL', currentPrice) return } ``` - **Why undetected:** Runner system relatively new (Nov 11), most trades hit TP2 quickly without price reversals - **Compounded by:** Drift minimum size check ($15 for SOL) prevented on-chain SL orders for small runners - **Log warning:** `⚠️ SL size below market min, skipping on-chain SL` indicates runner has NO on-chain protection - **Lesson:** Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere" 27. **External closure duplicate updates bug (CRITICAL - Fixed Nov 12, 2025):** - **Symptom:** Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss) - **Root Cause:** Position Manager monitoring loop re-processes external closures multiple times before trade removed from activeTrades Map - **Bug sequence:** 1. Trade closed externally (on-chain SL order fills at -$7.98) 2. Position Manager detects closure: `position === null` 3. 
Calculates P&L and calls `updateTradeExit()` → -$7.50 in DB 4. **BUT:** Trade still in `activeTrades` Map (removal happens after DB update) 5. Next monitoring loop (2s later) detects closure AGAIN 6. Accumulates P&L: `previouslyRealized (-$7.50) + runnerRealized (-$7.50) = -$15.00` 7. Updates database AGAIN → -$15.00 in DB 8. Repeats 8 times → final -$58.43 (~7-8× the actual loss) - **Fix:** Remove trade from `activeTrades` Map BEFORE database update: ```typescript // BEFORE (BROKEN): await updateTradeExit({ ... }) await this.removeTrade(trade.id) // Too late! Loop already ran again // AFTER (FIXED): this.activeTrades.delete(trade.id) // Remove FIRST await updateTradeExit({ ... }) // Then update DB if (this.activeTrades.size === 0) { this.stopMonitoring() } ``` - **Impact:** Without this fix, every external closure is recorded 5-8 times with compounding P&L - **Mechanism:** Async timing issue - `removeTrade()` is async but the monitoring loop keeps firing in the meantime - **Evidence:** Logs showed 8 consecutive "External closure recorded" messages with increasing P&L - **Line:** `lib/trading/position-manager.ts` line 493 (external closure detection block) - **Related to indicator version tracking (#26):** Must update `CreateTradeParams` interface when adding new database fields (see pitfall #23); analytics endpoint `/api/analytics/version-comparison` compares v5 vs v6 performance 28. **Signal quality threshold adjustment (Nov 12, 2025):** - **Lowered from 65 → 60** based on data analysis of 161 trades - **Reason:** Score 60-64 tier outperformed higher scores: - 60-64: 2 trades, +$45.78 total, 100% WR, +$22.89 avg - 65-69: 13 trades, +$28.28 total, 53.8% WR, +$2.18 avg - 70-79: 67 trades, +$8.28 total, 44.8% WR (worst performance!)
- **Paradox:** Higher quality scores don't correlate with better performance in current data - **Expected impact:** 2-3 additional trades/week, +$46-69 weekly profit potential - **Data collection:** Enables blocked signals at 55-59 range for Phase 2 optimization - **Risk:** Small sample size (2 trades) could be outliers, but downside limited - SQL analysis showed clear pattern: stricter filtering was blocking profitable setups 29. **Database-First Pattern (CRITICAL - Fixed Nov 13, 2025):** - **Symptom:** Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection - **Root Cause:** Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch - **Bug sequence:** 1. TradingView signal → `/api/trading/execute` 2. Position opened on Drift ✅ 3. Position Manager tracking added ✅ 4. Database save attempted ❌ (fails silently) 5. API returns success to user ❌ 6. Container restarts → Position Manager loses in-memory state ❌ 7. Result: Unprotected position with no monitoring or TP/SL orders - **Fix:** Database-first execution order in `app/api/trading/execute/route.ts`: ```typescript // CRITICAL: Save to database FIRST before adding to Position Manager try { await createTrade({...}) } catch (dbError) { console.error('❌ CRITICAL: Failed to save trade to database:', dbError) return NextResponse.json({ success: false, error: 'Database save failed - position unprotected', message: `Position opened on Drift but database save failed. CLOSE POSITION MANUALLY IMMEDIATELY. 
Transaction: ${openResult.transactionSignature}`, }, { status: 500 }) } // ONLY add to Position Manager if database save succeeded await positionManager.addTrade(activeTrade) ``` - **Impact:** Without this fix, ANY database failure creates unprotected positions - **Verification:** Test trade cmhxj8qxl0000od076m21l58z (Nov 13) confirmed fix working - **Documentation:** See `CRITICAL_INCIDENT_UNPROTECTED_POSITION.md` for full incident report - **Rule:** Database persistence ALWAYS comes before in-memory state updates 30. **DNS retry logic (Nov 13, 2025):** - **Problem:** Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails for `mainnet.helius-rpc.com` - **Impact:** n8n workflow failures, missed trades, container restart failures - **Root Cause:** `EAI_AGAIN` errors are transient DNS issues that resolve in seconds, but bot treated them as permanent failures - **Fix:** Automatic retry in `lib/drift/client.ts` - `retryOperation()` wrapper: ```typescript // Detects transient errors: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT // Retries up to 3 times with 2s delay between attempts (DNS-specific, separate from rate limit retries) // Fails fast on non-transient errors (auth, config, permanent network issues) await this.retryOperation(async () => { // Initialize Drift SDK, subscribe, get user account }, 3, 2000, 'Drift initialization') ``` - **Success logs:** `⚠️ Drift initialization failed (attempt 1/3): fetch failed` → `⏳ Retrying in 2000ms...` → `✅ Drift service initialized successfully` - **Impact:** 99% of transient DNS failures now auto-recover, preventing missed trades - **Note:** DNS retries use 2s delays (fast recovery), rate limit retries use 5s delays (RPC cooldown) - **Documentation:** See `docs/DNS_RETRY_LOGIC.md` for monitoring queries and metrics 31. 
**Declaring fixes "working" before deployment (CRITICAL - Nov 13, 2025):** - **Symptom:** AI says "position is protected" or "fix is deployed" while the container is still running old code - **Root Cause:** Conflating "code committed to git" with "code running in production" - **Real Incident:** Database-first fix committed 15:56, declared "working" at 19:42, but container started 15:06 (old code) - **Result:** Unprotected position opened, database save failed silently, Position Manager never tracked it - **Financial Impact:** User discovered a $250+ unprotected position 3.5 hours after opening - **Verification Required:** ```bash # ALWAYS check before declaring a fix deployed: docker inspect -f '{{.State.StartedAt}}' trading-bot-v4 # container start time (UTC) git log -1 --format=%cd # time of the latest commit docker logs trading-bot-v4 | grep "Server starting" | tail -1 # most recent startup (after `docker restart` the log keeps earlier runs, so head -1 shows the oldest) # If the container started before the commit: FIX NOT DEPLOYED ``` - **Rule:** NEVER say "fixed", "working", "protected", or "deployed" without verifying the container restart timestamp - **Impact:** This is a REAL MONEY system - premature declarations cause financial losses - **Documentation:** Added mandatory deployment verification to VERIFICATION MANDATE section 32. **Phantom trade notification workflow breaks (Nov 14, 2025):** - **Symptom:** Phantom trade detected, position opened on Drift, but n8n workflow stops with HTTP 500 error. User NOT notified. - **Root Cause:** Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt before Telegram notification - **Problem:** Unmonitored phantom position on exchange while user is asleep/away = unlimited risk exposure - **Fix:** Auto-close phantom trades immediately + return HTTP 200 with warning (allows n8n to continue) ```typescript // When phantom detected in app/api/trading/execute/route.ts: // 1. Immediately close position via closePosition() // 2. Save to database (create trade + update with exit info) // 3. Return HTTP 200 with full notification message in response // 4.
n8n workflow continues to Telegram notification step ``` - **Response format change:** `{ success: true, warning: 'Phantom trade detected and auto-closed', isPhantom: true, message: '[Full notification text]', phantomDetails: {...} }` - **Why auto-close:** User can't always respond (sleeping, no phone, traveling). Better to exit with small loss/gain than leave unmonitored position exposed. - **Impact:** Protects user from unlimited risk during unavailable hours. Phantom trades are rare edge cases (oracle issues, exchange rejections). - **Database tracking:** `status='phantom'`, `exitReason='manual'`, enables analysis of phantom frequency and patterns 33. **Wrong entry price after orphaned position restoration (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Position Manager tracking SHORT at $141.51 entry, but Drift UI shows $141.31 actual entry - **Root Cause:** Startup validation restored orphaned position but used OLD database entry price instead of querying Drift for real value - **Bug sequence:** 1. Position opened at $141.317 (per Drift order history) 2. TP1 closed 70% at $140.942 3. Database incorrectly saved entry as $141.508 (maybe averaged or from previous position) 4. Container restart → startup validation found position on Drift 5. Reopened trade in DB but used stale `trade.entryPrice` from database 6. Position Manager tracked with wrong entry ($141.51 vs actual $141.31) 7. 
Stop loss calculated from wrong base: $141.08 instead of $140.89 - **Impact:** 0.14% difference ($0.20/SOL) in SL placement - could mean the difference between a small profit and a small loss - **Fix:** Query Drift SDK for the actual entry price during orphaned position restoration ```typescript // In lib/startup/init-position-manager.ts (lines 121-144): // When reopening closed trade found on Drift: const currentPrice = await driftService.getOraclePrice(marketConfig.driftMarketIndex) const positionSizeUSD = Math.abs(position.size) * currentPrice // size is in base-asset tokens (see Pitfall #24); abs() as there, in case size is signed for shorts await prisma.trade.update({ where: { id: trade.id }, data: { status: 'open', exitReason: null, entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price positionSizeUSD: positionSizeUSD, // Update to current size (runner after TP1) } }) ``` - **Drift SDK returns real entry:** `position.entryPrice` from `getPosition()` calculates from on-chain data (quoteAssetAmount / baseAssetAmount) - **Future-proofed:** All orphaned position restorations now use the authoritative Drift entry price, not a stale DB value - **Manual fix required once:** Had to manually UPDATE database for existing position, then restart container - **Lesson:** Always prefer on-chain data over cached database values for critical trading parameters 34. **Runner stop loss gap - NO protection between TP1 and TP2 (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Runner position remained open despite price moving far above stop loss level - **Root Cause:** Position Manager only checked stop loss BEFORE TP1 hit (line 693) OR AFTER TP2 hit (line 835), creating a gap - **Bug sequence:** 1. SHORT opened at $141.317, TP1 hit at $140.942 (70% closed) 2. Runner (30% remaining, $12.70) had stop loss at $140.89 (profit lock) 3. Price rose to $141.98 (way above $140.89 SL) → NO STOP LOSS CHECK 4. Position exposed to unlimited loss for hours during TP1→TP2 window 5. User manually checked: "runner close did not work.
still open and the price is above 141,98" - **Impact:** Hours of unprotected runner exposure = potential unlimited loss on 25-30% remaining position - **Code analysis:** ```typescript // Line 693: Stop loss checked ONLY before TP1 if (!trade.tp1Hit && this.shouldStopLoss(currentPrice, trade)) { console.log(`🔴 STOP LOSS: ${trade.symbol}`) await this.executeExit(trade, 100, 'SL', currentPrice) } // Lines 706-831: TP1 and TP2 processing - NO STOP LOSS CHECK // Line 835: Stop loss checked ONLY after TP2 if (trade.tp2Hit && this.config.useTrailingStop && this.shouldStopLoss(currentPrice, trade)) { console.log(`🔴 TRAILING STOP: ${trade.symbol}`) await this.executeExit(trade, 100, 'SL', currentPrice) } // BUG: Runner between TP1-TP2 has ZERO stop loss protection! ``` - **Fix:** Added explicit runner stop loss check at line ~795: ```typescript // CRITICAL: Check stop loss for runner (after TP1, before TP2) if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) { console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol} at ${profitPercent.toFixed(2)}% (profit lock triggered)`) await this.executeExit(trade, 100, 'SL', currentPrice) return } ``` - **Live verification (Nov 15, 22:03):** Runner SL triggered successfully after deployment, closed with +$2.94 profit - **Rate limit issue:** Hit 429 storm during close (20+ attempts over several minutes), but eventually succeeded - **Database evidence:** Trade shows `exitReason='SL'`, proving runner stop loss triggered correctly - **Why undetected:** Runner system relatively new (Nov 11), most trades hit TP2 quickly without price reversals - **Lesson:** Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere" 38. 
**Analytics dashboard showing original position size instead of current runner size (Fixed Nov 15, 2025):** - **Symptom:** Analytics page displays $42.54 when actual runner is $12.59 after TP1 - **Root Cause:** `/api/analytics/last-trade` returns `trade.positionSizeUSD` (original size), not runner size - **Database structure:** No separate `currentSize` column - stored in `configSnapshot.positionManagerState.currentSize` - **Impact:** User sees misleading exposure information on dashboard - **Fix:** Modified API to check Position Manager state for open positions: ```typescript // In app/api/analytics/last-trade/route.ts const configSnapshot = trade.configSnapshot as any const positionManagerState = configSnapshot?.positionManagerState const currentSize = positionManagerState?.currentSize // Use currentSize for open positions (after TP1), fallback to original const displaySize = trade.exitReason === null && currentSize ? currentSize : trade.positionSizeUSD const formattedTrade = { // ... positionSizeUSD: displaySize, // Shows runner size for open positions // ... } ``` - **Behavior:** Open positions show current runner size, closed positions show original size - **Benefits:** Accurate exposure visibility, correct risk assessment on dashboard - **No container restart needed:** API-only change, live immediately after deployment 34. **Flip-flop price context using wrong data (CRITICAL - Fixed Nov 14, 2025):** - **Symptom:** Flip-flop detection showing "100% price move" when actual movement was 0.2%, allowing trades that should be blocked - **Root Cause:** `currentPrice` parameter not available in check-risk endpoint (trade hasn't opened yet), so calculation used undefined/zero - **Real incident:** Nov 14, 06:05 CET - SHORT allowed with 0.2% flip-flop, lost -$1.56 in 5 minutes - **Bug sequence:** 1. LONG opened at $143.86 (06:00) 2. SHORT signal 4min later at $143.58 (0.2% move) 3. Flip-flop check: `(undefined - 143.86) / 143.86 * 100` = garbage → showed "100%" 4. 
System thought it was reversal → allowed trade 5. Should have been blocked as tight-range chop - **Fix:** Two-part fix in commits 77a9437 and 795026a: ```typescript // In app/api/trading/check-risk/route.ts: // Get current price from Pyth BEFORE quality scoring const priceMonitor = getPythPriceMonitor() const latestPrice = priceMonitor.getCachedPrice(body.symbol) const currentPrice = latestPrice?.price || body.currentPrice // In lib/trading/signal-quality.ts: // Validate price data exists before calculation if (!params.currentPrice || params.currentPrice === 0) { // No current price available - apply penalty (conservative) console.warn(`⚠️ Flip-flop check: No currentPrice available, applying penalty`) frequencyPenalties.flipFlop = -25 score -= 25 } else { const priceChangePercent = Math.abs( (params.currentPrice - recentSignals.oppositeDirectionPrice) / recentSignals.oppositeDirectionPrice * 100 ) console.log(`🔍 Flip-flop price check: $${recentSignals.oppositeDirectionPrice.toFixed(2)} → $${params.currentPrice.toFixed(2)} = ${priceChangePercent.toFixed(2)}%`) // Apply penalty only if < 2% move } ``` - **Impact:** Without this fix, flip-flop detection is useless - blocks reversals, allows chop - **Lesson:** Always validate input data for financial calculations, especially when data might not exist yet - **Monitoring:** Watch logs for "🔍 Flip-flop price check: $X → $Y = Z%" to verify correct calculations 35. 
**Phantom trades need exitReason for cleanup (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Position Manager keeps restoring phantom trade on every restart, triggers false runner stop loss alerts - **Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL` - **Bug:** Startup validator checks `exitReason !== null` (line 122 of init-position-manager.ts), ignores status field - **Consequence:** Phantom trade with exitReason=NULL treated as "open" and restored to Position Manager - **Real incident:** Nov 14 phantom trade (cmhy6xul20067nx077agh260n) caused 232% size mismatch, hundreds of false "🔴 RUNNER STOP LOSS" alerts - **Fix:** When auto-closing phantom trades, MUST set exitReason: ```typescript // In app/api/trading/execute/route.ts (phantom detection): await updateTradeExit({ tradeId: trade.id, exitPrice: currentPrice, exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup realizedPnL: actualPnL, status: 'phantom' }) ``` - **Manual cleanup:** If phantom already exists: `UPDATE "Trade" SET "exitReason" = 'manual' WHERE status = 'phantom' AND "exitReason" IS NULL` - **Impact:** Without exitReason, phantom trades create ghost positions that trigger false alerts and pollute monitoring - **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore") - **Lesson:** status field is for classification, exitReason is for lifecycle management - both must be set on closure 36. **closePosition() missing retry logic causes rate limit storm (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Position Manager tries to close trade, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts → rate limit exhaustion - **Root Cause:** `placeExitOrders()` has `retryWithBackoff()` wrapper (Nov 14 fix), but `closePosition()` did NOT - **Real incident:** Trade cmi0il8l30000r607l8aec701 (Nov 15, 16:49 CET) 1. Position Manager tried to close (SL or TP trigger) 2. 
closePosition() called raw `placePerpOrder()` → 429 error 3. executeExit() caught 429, returned early (line 935-940) 4. Position Manager kept monitoring, retried close EVERY 2 seconds 5. Logs show 100+ "❌ Failed to close position: 429" + "⚠️ Rate limited while closing SOL-PERP" 6. Meanwhile: On-chain TP2 limit order filled (unaffected by SDK rate limits) 7. External closure detected, DB updated 8 TIMES: $0.14 → $0.20 → $0.26 → ... → $0.51 8. Container eventually restarted (likely from rate limit exhaustion) - **Why duplicate updates:** Common Pitfall #27 fix (remove from Map before DB update) works UNLESS rate limits cause tons of retries before external closure detection - **Impact:** User saw $0.51 profit in DB, $0.03 on Drift UI (8× compounding vs 1 actual fill) - **Fix:** Wrapped closePosition() with retryWithBackoff() in lib/drift/orders.ts: ```typescript // Line ~567 (BEFORE): const txSig = await driftClient.placePerpOrder(orderParams) // Line ~567 (AFTER): const txSig = await retryWithBackoff(async () => { return await driftClient.placePerpOrder(orderParams) }, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s) ``` - **Behavior now:** 3 SDK retries over 56s (8+16+32) + Position Manager natural retry on next monitoring cycle = robust without spam - **RPC load reduction:** 30-50× fewer requests during close operations (3 retries vs 100+ attempts) - **Verification:** Container restarted 18:05 CET Nov 15, code deployed - **Lesson:** EVERY SDK order operation (open, close, cancel, place) MUST have retry wrapper - Position Manager monitoring creates infinite retry loop without it - **Root Cause:** Phantom auto-closure sets `status='phantom'` but leaves `exitReason=NULL` - **Bug:** Startup validator checks `exitReason !== null` (line 122 of init-position-manager.ts), ignores status field - **Consequence:** Phantom trade with exitReason=NULL treated as "open" and restored to Position Manager - **Real incident:** Nov 14 phantom trade 
(cmhy6xul20067nx077agh260n) caused 232% size mismatch, hundreds of false "🔴 RUNNER STOP LOSS" alerts - **Fix:** When auto-closing phantom trades, MUST set exitReason: ```typescript // In app/api/trading/execute/route.ts (phantom detection): await updateTradeExit({ tradeId: trade.id, exitPrice: currentPrice, exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup realizedPnL: actualPnL, status: 'phantom' }) ``` - **Manual cleanup:** If phantom already exists: `UPDATE "Trade" SET "exitReason" = 'manual' WHERE status = 'phantom' AND "exitReason" IS NULL` - **Impact:** Without exitReason, phantom trades create ghost positions that trigger false alerts and pollute monitoring - **Verification:** After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore") - **Lesson:** status field is for classification, exitReason is for lifecycle management - both must be set on closure 37. **Ghost position accumulation from failed DB updates (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Position Manager tracking 4+ positions simultaneously when database shows only 1 open trade - **Root Cause:** Database has `exitReason IS NULL` for positions actually closed on Drift - **Impact:** Rate limit storms (4 positions × monitoring × order updates = 100+ RPC calls/second) - **Bug sequence:** 1. Position closed externally (on-chain TP/SL order fills) 2. Position Manager attempts database update but fails silently 3. Trade remains in database with `exitReason IS NULL` 4. Container restart → Position Manager restores "open" trade from DB 5. Position doesn't exist on Drift but is tracked in memory = ghost position 6. Accumulates over time: 1 ghost → 2 ghosts → 4+ ghosts 7. Each ghost triggers monitoring, order updates, price checks 8. 
RPC rate limit exhaustion → 429 errors → system instability - **Real incidents:** * Nov 14: Untracked 0.09 SOL position with no TP/SL protection * Nov 15 19:01: Position Manager tracking 4+ ghosts, massive rate limiting, "vanishing orders" * After cleanup: 4+ ghosts → 1 actual position, system stable - **Why manual restarts worked:** Forced Position Manager to re-query Drift, but didn't prevent recurrence - **Solution:** Periodic Drift position validation (Nov 15, 2025) ```typescript // In lib/trading/position-manager.ts: // Schedule validation every 5 minutes private scheduleValidation(): void { this.validationInterval = setInterval(async () => { await this.validatePositions() }, 5 * 60 * 1000) } // Validate tracked positions against Drift reality private async validatePositions(): Promise { for (const [tradeId, trade] of this.activeTrades) { const position = await driftService.getPosition(marketConfig.driftMarketIndex) // Ghost detected: tracked but missing on Drift if (!position || Math.abs(position.size) < 0.01) { console.log(`🔴 Ghost position detected: ${trade.symbol}`) await this.handleExternalClosure(trade, 'Ghost position cleanup') } } } // Reusable ghost cleanup method private async handleExternalClosure(trade: ActiveTrade, reason: string): Promise { // Remove from monitoring FIRST (prevent race conditions) this.activeTrades.delete(trade.id) // Update database with estimated P&L await updateTradeExit({ positionId: trade.positionId, exitPrice: trade.lastPrice, exitReason: 'manual', // Ghost closures = manual realizedPnL: estimatedPnL, exitOrderTx: reason, // Store cleanup reason ... 
}) if (this.activeTrades.size === 0) { this.stopMonitoring() } } ``` - **Behavior:** Auto-detects and cleans ghosts every 5 minutes, no manual intervention - **RPC overhead:** Minimal (1 check per 5 min per position = ~288 calls/day for 1 position) - **Benefits:** * Self-healing system prevents ghost accumulation * Eliminates rate limit storms from ghost management * No more manual container restarts needed * Addresses root cause (state management) not symptom (rate limits) - **Logs:** `🔍 Scheduled position validation every 5 minutes` on startup - **Monitoring:** `🔴 Ghost position detected` + `✅ Ghost position cleaned up` in logs - **Verification:** Container restart shows 1 position, not 4+ like before - **Why paid RPC doesn't fix this:** Ghost positions are state management bug, not capacity issue - **Lesson:** Periodic validation of in-memory state against authoritative source prevents state drift 39. **Settings UI permission error - .env file not writable by container user (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Settings UI save fails with "Failed to save new settings" error - **Root Cause:** .env file on host owned by root:root, nextjs user (UID 1001) inside container has read-only access - **Impact:** Users cannot adjust ANY configuration via settings UI (position size, leverage, TP/SL levels, etc.) - **Error message:** `EACCES: permission denied, open '/app/.env'` (errno -13, syscall 'open') - **User escalation:** "thats a major flaw. THIS NEEDS TO WORK." - **Why it happens:** 1. Docker mounts .env file from host: `./.env:/app/.env` (docker-compose.yml line 62) 2. Mounted files retain host ownership (root:root on host = root:root in container) 3. Container runs as nextjs user (UID 1001) for security 4. 
Settings API attempts `fs.writeFileSync('/app/.env')` → permission denied - **Attempted fix (FAILED):** `docker exec trading-bot-v4 chown nextjs:nodejs /app/.env` * Error: "Operation not permitted" - cannot change ownership on mounted files from inside container - **Correct fix:** Change ownership on HOST before container starts ```bash # On host as root chown 1001:1001 /home/icke/traderv4/.env chmod 644 /home/icke/traderv4/.env # Restart container to pick up new permissions docker compose restart trading-bot # Verify inside container docker exec trading-bot-v4 ls -la /app/.env # Should show: -rw-r--r-- 1 nextjs nodejs ``` - **Why UID 1001:** Matches nextjs user created in Dockerfile: ```dockerfile RUN addgroup --system --gid 1001 nodejs && \ adduser --system --uid 1001 nextjs ``` - **Verification:** Settings UI now saves successfully, .env file updated with new values - **Impact:** Restores full settings UI functionality - users can adjust position sizing, leverage, TP/SL percentages - **Alternative solution (NOT used):** Copy .env during Docker build with `COPY --chown=nextjs:nodejs`, but this breaks runtime config updates - **Lesson:** Docker volume mounts retain host ownership - must plan for writability by setting host file ownership to match container user UID 40. 
**Ghost position death spiral from skipped validation (CRITICAL - Fixed Nov 15, 2025):** - **Symptom:** Telegram /status shows 2 open positions when database shows all closed, massive rate limit storms (100+ RPC calls/minute) - **Root Cause:** Periodic validation (every 5min) SKIPPED when Drift service rate-limited: `⏳ Drift service not ready, skipping validation` - **Death Spiral:** Ghosts → rate limits → validation skipped → more rate limits → more ghosts - **Impact:** System unusable, requires manual container restart, user can't be away from laptop - **User Requirement:** "bot has to work all the time especially when i am not on my laptop" - MUST be fully autonomous - **Real Incident (Nov 15, 2025):** * Position Manager tracking 2 ghost positions * Both positions closed on Drift but still in memory * Trying to close non-existent positions every 2 seconds * Rate limit exhaustion prevented validation from running * Only solution was container restart (not autonomous) - **Solution: 3-layer protection system** ```typescript // LAYER 1: Database-based age check (doesn't require RPC) private async cleanupStalePositions(): Promise { const sixHoursAgo = Date.now() - (6 * 60 * 60 * 1000) for (const [tradeId, trade] of this.activeTrades) { if (trade.entryTime < sixHoursAgo) { console.log(`🔴 STALE GHOST DETECTED: ${trade.symbol} (age: ${hours}h)`) await this.handleExternalClosure(trade, 'Stale position cleanup (>6h old)') } } } // LAYER 2: Death spiral detector in executeExit() if (errorMsg.includes('429')) { if (trade.priceCheckCount > 20) { // 20+ failed close attempts (40+ seconds) console.log(`🔴 DEATH SPIRAL DETECTED: ${trade.symbol}`) await this.handleExternalClosure(trade, 'Death spiral prevention') return // Force remove from monitoring } } // LAYER 3: Ghost check during normal monitoring (every 20 price updates) if (trade.priceCheckCount % 20 === 0) { const position = await driftService.getPosition(marketConfig.driftMarketIndex) if (!position || 
Math.abs(position.size) < 0.01) { console.log(`🔴 GHOST DETECTED in monitoring loop`) await this.handleExternalClosure(trade, 'Ghost detected during monitoring') return } } ``` - **Key Changes:** * validatePositions() now runs database cleanup FIRST (Layer 1) before Drift RPC checks * Changed skip message from "skipping validation" to "using database-only validation" * Layer 1 ALWAYS runs (no RPC required) - prevents long-term ghost accumulation (>6h) * Layer 2 breaks death spirals within 40 seconds of detection * Layer 3 catches ghosts quickly during normal monitoring (every 40s vs 5min) - **Impact:** * System now self-healing - no manual intervention needed * Ghost positions cleaned within 40-360 seconds (depending on layer) * Works even during severe rate limiting (Layer 1 doesn't need RPC) * Telegram /status always accurate * User can be away - bot handles itself autonomously - **Verification:** Container restart + new code = no more ghost accumulation possible - **Lesson:** Critical validation logic must NEVER skip during error conditions - use fallback methods that don't require the failing resource ## File Conventions - **API routes:** `app/api/[feature]/[action]/route.ts` (Next.js 15 App Router) - **Services:** `lib/[service]/[module].ts` (drift, pyth, trading, database) - **Config:** Single source in `config/trading.ts` with env merging - **Types:** Define interfaces in same file as implementation (not separate types directory) - **Console logs:** Use emojis for visual scanning: 🎯 🚀 ✅ ❌ 💰 📊 🛡️ ## Re-Entry Analytics System (Phase 1) **Purpose:** Validate manual Telegram trades using fresh TradingView data + recent performance analysis **Components:** 1. **Market Data Cache** (`lib/trading/market-data-cache.ts`) - Singleton service storing TradingView metrics - 5-minute expiry on cached data - Tracks: ATR, ADX, RSI, volume ratio, price position, timeframe 2. 
**Market Data Webhook** (`app/api/trading/market-data/route.ts`) - Receives TradingView alerts every 1-5 minutes - POST: Updates cache with fresh metrics - GET: View cached data (debugging) 3. **Re-Entry Check Endpoint** (`app/api/analytics/reentry-check/route.ts`) - Validates manual trade requests - Uses fresh TradingView data if available (<5min old) - Falls back to historical metrics from last trade - Scores signal quality + applies performance modifiers: - **-20 points** if last 3 trades lost money (avgPnL < -5%) - **+10 points** if last 3 trades won (avgPnL > +5%, WR >= 66%) - **-5 points** for stale data, **-10 points** for no data - Minimum score: 55 (vs 60 for new signals) 4. **Auto-Caching** (`app/api/trading/execute/route.ts`) - Every trade signal from TradingView auto-caches metrics - Ensures fresh data available for manual re-entries 5. **Telegram Integration** (`telegram_command_bot.py`) - Calls `/api/analytics/reentry-check` before executing manual trades - Shows data freshness ("✅ FRESH 23s old" vs "⚠️ Historical") - Blocks low-quality re-entries unless `--force` flag used - Fail-open: Proceeds if analytics check fails **User Flow:** ``` User: "long sol" ↓ Check cache for SOL-PERP ↓ Fresh data? → Use real TradingView metrics ↓ Stale/missing? → Use historical + penalty ↓ Score quality + recent performance ↓ Score >= 55? → Execute ↓ Score < 55? 
→ Block (unless --force) ``` **TradingView Setup:** Create alerts that fire every 1-5 minutes with this webhook message: ```json { "action": "market_data", "symbol": "{{ticker}}", "timeframe": "{{interval}}", "atr": {{ta.atr(14)}}, "adx": {{ta.dmi(14, 14)}}, "rsi": {{ta.rsi(14)}}, "volumeRatio": {{volume / ta.sma(volume, 20)}}, "pricePosition": {{(close - ta.lowest(low, 100)) / (ta.highest(high, 100) - ta.lowest(low, 100)) * 100}}, "currentPrice": {{close}} } ``` Webhook URL: `https://your-domain.com/api/trading/market-data` ## Per-Symbol Trading Controls **Purpose:** Independent enable/disable toggles and position sizing for SOL and ETH to support different trading strategies (e.g., ETH for data collection at minimal size, SOL for profit generation). **Configuration Priority:** 1. **Per-symbol ENV vars** (highest priority) - `SOLANA_ENABLED`, `SOLANA_POSITION_SIZE`, `SOLANA_LEVERAGE` - `ETHEREUM_ENABLED`, `ETHEREUM_POSITION_SIZE`, `ETHEREUM_LEVERAGE` 2. **Market-specific config** (from `MARKET_CONFIGS` in config/trading.ts) 3. **Global ENV vars** (fallback for BTC and other symbols) - `MAX_POSITION_SIZE_USD`, `LEVERAGE` 4. **Default config** (lowest priority) **Settings UI:** `app/settings/page.tsx` has dedicated sections: - 💎 Solana section: Toggle + position size + leverage + risk calculator - ⚡ Ethereum section: Toggle + position size + leverage + risk calculator - 💰 Global fallback: For BTC-PERP and future symbols **Example usage:** ```typescript // In execute/test endpoints const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config) if (!enabled) { return NextResponse.json({ success: false, error: 'Symbol trading disabled' }, { status: 400 }) } ``` **Test buttons:** Settings UI has symbol-specific test buttons: - 💎 Test SOL LONG/SHORT (disabled when `SOLANA_ENABLED=false`) - ⚡ Test ETH LONG/SHORT (disabled when `ETHEREUM_ENABLED=false`) ## When Making Changes 1. 
**Adding new config:** Update DEFAULT_TRADING_CONFIG + getConfigFromEnv() + .env file 2. **Adding database fields:** Update prisma/schema.prisma → `npx prisma migrate dev` → `npx prisma generate` → rebuild Docker 3. **Changing order logic:** Test with DRY_RUN=true first, use small position sizes ($10) 4. **API endpoint changes:** Update both endpoint + corresponding n8n workflow JSON (Check Risk and Execute Trade nodes) 5. **Docker changes:** Rebuild with `docker compose build trading-bot` then restart container 6. **Modifying quality score logic:** Update BOTH `/api/trading/check-risk` and `/api/trading/execute` endpoints, ensure timeframe-aware thresholds are synchronized 7. **Exit strategy changes:** Modify Position Manager logic + update on-chain order placement in `placeExitOrders()` 8. **TradingView alert changes:** Ensure alerts pass `timeframe` field (e.g., `"timeframe": "5"`) to enable proper signal quality scoring 9. **Position Manager changes:** ALWAYS execute test trade after deployment - Use `/api/trading/test` endpoint or Telegram `long sol --force` - Monitor `docker logs -f trading-bot-v4` for full cycle - Verify TP1 hit → 75% close → SL moved to breakeven - SQL: Check `tp1Hit`, `slMovedToBreakeven`, `currentSize` in Trade table - Compare: Position Manager logs vs actual Drift position size 10. **Calculation changes:** Add verbose logging and verify with SQL - Log every intermediate step, especially unit conversions - Never assume SDK data format - log raw values to verify - SQL query with manual calculation to compare results - Test boundary cases: 0%, 100%, min/max values 11. **DEPLOYMENT VERIFICATION (MANDATORY):** Before declaring ANY fix working: - Check container start time vs commit timestamp - If container older than commit: CODE NOT DEPLOYED - Restart container and verify new code is running - Never say "fixed" or "protected" without deployment confirmation - This is a REAL MONEY system - unverified fixes cause losses 12. 
**GIT COMMIT AND PUSH (MANDATORY):** After completing ANY feature, fix, or significant change: - ALWAYS commit changes with descriptive message - ALWAYS push to remote repository - User should NOT have to ask for this - it's part of completion - Commit message format: ```bash git add -A git commit -m "type: brief description - Bullet point details - Files changed - Why the change was needed " git push ``` - Types: `feat:` (feature), `fix:` (bug fix), `docs:` (documentation), `refactor:` (code restructure) - This is NOT optional - code exists only when committed and pushed 13. **NEXTCLOUD DECK SYNC (MANDATORY):** After completing phases or making significant roadmap progress: - Update roadmap markdown files with new status (🔄 IN PROGRESS, ✅ COMPLETE, 🔜 NEXT) - Run sync to update Deck cards: `python3 scripts/sync-roadmap-to-deck.py --init` - Move cards between stacks in Nextcloud Deck UI to reflect progress visually - Backlog (📥) → Planning (📋) → In Progress (🚀) → Complete (✅) - Keep Deck in sync with actual work - it's the visual roadmap tracker - Documentation: `docs/NEXTCLOUD_DECK_SYNC.md` 14. 
**UPDATE COPILOT-INSTRUCTIONS.MD (MANDATORY):** After implementing ANY significant feature or system change: - Document new database fields and their purpose - Add filtering requirements (e.g., manual vs TradingView trades) - Update "Important fields" sections with new schema changes - Add new API endpoints to the architecture overview - Document data integrity requirements (what must be excluded from analysis) - Add SQL query patterns for common operations - Update "When Making Changes" section with new patterns learned - Create reference docs in `docs/` for complex features (e.g., `MANUAL_TRADE_FILTERING.md`) - **WHY:** Future AI agents need complete context to maintain data integrity and avoid breaking analysis - **EXAMPLES:** signalSource field for filtering, MAE/MFE tracking, phantom trade detection ## Development Roadmap **Current Status (Nov 14, 2025):** - **168 trades executed** with quality scores and MAE/MFE tracking - **Capital:** $97.55 USDC at 100% health (zero debt, all USDC collateral) - **Leverage:** 15x SOL (reduced from 20x for safer liquidation cushion) - **Three active optimization initiatives** in data collection phase: 1. **Signal Quality:** 0/20 blocked signals collected → need 10-20 for analysis 2. **Position Scaling:** 161 v5 trades, collecting v6 data → need 50+ v6 trades 3. 
**ATR-based TP:** 1/50 trades with ATR data → need 50 for validation - **Expected combined impact:** 35-40% P&L improvement when all three optimizations complete - **Master roadmap:** See `OPTIMIZATION_MASTER_ROADMAP.md` for consolidated view See `SIGNAL_QUALITY_OPTIMIZATION_ROADMAP.md` for systematic signal quality improvements: - **Phase 1 (🔄 IN PROGRESS):** Collect 10-20 blocked signals with quality scores (1-2 weeks) - **Phase 2 (🔜 NEXT):** Analyze patterns and make data-driven threshold decisions - **Phase 3 (🎯 FUTURE):** Implement dual-threshold system or other optimizations based on data - **Phase 4 (🤖 FUTURE):** Automated price analysis for blocked signals - **Phase 5 (🧠 DISTANT):** ML-based scoring weight optimization See `POSITION_SCALING_ROADMAP.md` for planned position management optimizations: - **Phase 1 (✅ COMPLETE):** Collect data with quality scores (20-50 trades needed) - **Phase 2:** ATR-based dynamic targets (adapt to volatility) - **Phase 3:** Signal quality-based scaling (high quality = larger runners) - **Phase 4:** Direction-based optimization (shorts vs longs have different performance) - **Phase 5 (✅ COMPLETE):** TP2-as-runner system implemented - configurable runner (default 25%, adjustable via TAKE_PROFIT_1_SIZE_PERCENT) with ATR-based trailing stop - **Phase 6:** ML-based exit prediction (future) **Recent Implementation:** TP2-as-runner system provides 5x larger runner (default 25% vs old 5%) for better profit capture on extended moves. When TP2 price is hit, trailing stop activates on full remaining position instead of closing partial amount. Runner size is configurable (100% - TP1 close %). **Blocked Signals Tracking (Nov 11, 2025):** System now automatically saves all blocked signals to database for data-driven optimization. See `BLOCKED_SIGNALS_TRACKING.md` for SQL queries and analysis workflows. **Data-driven approach:** Each phase requires validation through SQL analysis before implementation. No premature optimization. 
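
The TP2-as-runner split described above comes down to a small amount of arithmetic. The following is a minimal, hypothetical sketch (`planExits`, `targetPrice` and `ExitPlan` are illustrative names, not the bot's code) that assumes the defaults from the architecture overview: TP1 at +0.4% closing `TAKE_PROFIT_1_SIZE_PERCENT` (75%), TP2 at +0.7%, and a runner of `100 - TAKE_PROFIT_1_SIZE_PERCENT` that is not closed at TP2 but switched to trailing-stop mode.

```typescript
// Hypothetical sketch - not the bot's actual implementation.
type Direction = 'long' | 'short'

interface ExitPlan {
  tp1Price: number      // price at which TP1 closes part of the position
  tp2Price: number      // price at which the runner's trailing stop activates
  tp1CloseUSD: number   // notional closed at TP1
  runnerUSD: number     // notional left open as the runner
  runnerPercent: number // 100 - TAKE_PROFIT_1_SIZE_PERCENT
}

// Price after a favourable move of `percent` percent (up for longs, down for shorts).
function targetPrice(entry: number, direction: Direction, percent: number): number {
  const move = percent / 100
  return direction === 'long' ? entry * (1 + move) : entry * (1 - move)
}

function planExits(
  entryPrice: number,
  direction: Direction,
  positionSizeUSD: number,
  tp1SizePercent = 75, // assumed default of TAKE_PROFIT_1_SIZE_PERCENT
  tp1Percent = 0.4,    // assumed TP1 distance (+0.4%)
  tp2Percent = 0.7,    // assumed TP2 distance (+0.7%)
): ExitPlan {
  if (tp1SizePercent < 0 || tp1SizePercent > 100) {
    throw new RangeError('tp1SizePercent must be between 0 and 100')
  }
  const tp1CloseUSD = (positionSizeUSD * tp1SizePercent) / 100
  return {
    tp1Price: targetPrice(entryPrice, direction, tp1Percent),
    tp2Price: targetPrice(entryPrice, direction, tp2Percent),
    tp1CloseUSD,
    // At TP2 nothing is closed: this whole remainder switches to trailing-stop mode.
    runnerUSD: positionSizeUSD - tp1CloseUSD,
    runnerPercent: 100 - tp1SizePercent,
  }
}
```

With the defaults, a $1,000 SHORT entered at $100 gets TP1 at about $99.60 (closing $750) and TP2 at about $99.30, leaving a $250 (25%) runner. Deriving the runner from the TP1 percentage, rather than configuring it separately, keeps the two values from drifting apart, which is also why the UI computes runner% as `100 - TAKE_PROFIT_1_SIZE_PERCENT`.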
**Signal Quality Version Tracking:** The database tracks a `signalQualityVersion` field to compare algorithm performance:
- Analytics dashboard shows version comparison: trades, win rate, P&L, extreme position stats
- v4 (current) includes blocked signals tracking for data-driven optimization
- Focus on extreme positions (< 15% range)
- v3 aimed to reduce losses from weak ADX entries
- SQL queries in `docs/analysis/SIGNAL_QUALITY_VERSION_ANALYSIS.sql` for deep-dive analysis
- Need 20+ trades per version before a meaningful comparison

**Financial Roadmap Integration:** All technical improvements must align with the current phase objectives (see top of document):
- **Phase 1 (CURRENT):** Prove the system works, compound aggressively, 60%+ win rate mandatory
- **Phase 2-3:** Transition to sustainable growth while funding withdrawals
- **Phase 4+:** Scale capital while progressively reducing risk
- See `TRADING_GOALS.md` for the complete 8-phase plan ($106 → $1M+)

**Blocked Signals Analysis:** See `BLOCKED_SIGNALS_TRACKING.md` for:
- SQL queries to analyze blocked signal patterns
- Score distribution and metric analysis
- Comparison with executed trades at similar quality levels
- Future automation of price tracking (would TP1/TP2/SL have hit?)

## Integration Points

- **n8n:** Expects the exact response format from `/api/trading/execute` (see n8n-complete-workflow.json)
- **Drift Protocol:** Uses SDK v2.75.0 - check the docs at docs.drift.trade for API changes
- **Pyth Network:** WebSocket + HTTP fallback for price feeds (handles reconnection)
- **PostgreSQL:** Version 16-alpine, must be running before the bot starts

---

**Key Mental Model:** Think of this as two parallel systems (on-chain orders + software monitoring) working together. The Position Manager is the "backup brain" that constantly watches and acts if on-chain orders fail. Both write to the same database for complete trade history.
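
To make the "backup brain" role concrete, and to capture the lesson from the runner stop loss gap (pitfall #34), here is a minimal, hypothetical sketch of one monitoring tick reduced to a pure decision function. `TrackedTrade`, `Action` and `decide` are illustrative names, not the Position Manager's API; order placement, database writes and trailing-stop ratcheting of `stopLossPrice` are deliberately left out. The key design choice is that the stop loss is evaluated first and unconditionally, so the pre-TP1, TP1→TP2 runner and post-TP2 trailing phases are all covered by the same check.

```typescript
// Hypothetical sketch - not the bot's Position Manager.
type Direction = 'long' | 'short'

interface TrackedTrade {
  direction: Direction
  stopLossPrice: number // current SL: initial, breakeven/profit lock, or trailing level
  tp1Price: number
  tp2Price: number
  tp1Hit: boolean
  tp2Hit: boolean
}

type Action =
  | { kind: 'hold' }
  | { kind: 'close'; percent: number; reason: 'SL' | 'TP1' }
  | { kind: 'activateTrailing' } // TP2 reached: nothing is closed, runner starts trailing

function stopHit(t: TrackedTrade, price: number): boolean {
  return t.direction === 'long' ? price <= t.stopLossPrice : price >= t.stopLossPrice
}

function targetHit(t: TrackedTrade, price: number, target: number): boolean {
  return t.direction === 'long' ? price >= target : price <= target
}

// One monitoring tick. The SL check comes first and has no phase conditions,
// so there is no branch (e.g. the runner between TP1 and TP2) without protection.
function decide(t: TrackedTrade, price: number, tp1SizePercent: number): Action {
  if (stopHit(t, price)) {
    return { kind: 'close', percent: 100, reason: 'SL' }
  }
  if (!t.tp1Hit && targetHit(t, price, t.tp1Price)) {
    return { kind: 'close', percent: tp1SizePercent, reason: 'TP1' }
  }
  if (t.tp1Hit && !t.tp2Hit && targetHit(t, price, t.tp2Price)) {
    return { kind: 'activateTrailing' }
  }
  return { kind: 'hold' }
}
```

Replaying the Nov 15 runner incident (SHORT runner after TP1 with a $140.89 profit-lock SL, price at $141.98) through `decide` yields a 100% `SL` close instead of silence. Keeping the decision pure also makes it easy to unit-test every phase without touching Drift or the RPC.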