Files

mindesbunister becd5631e7 docs: Add Common Pitfall #49 - P&L exponential compounding bug

Documents the critical P&L compounding bug fixed in commit 6156c0f where
trade.realizedPnL mutation during external closure detection caused 15-20x
inflation of actual profit/loss values.

Includes:
- Root cause: Mutating trade.realizedPnL in monitoring loop
- Real incident: $6 actual profit → $92.46 in database
- Bug mechanism with code examples
- Why previous fixes (Common Pitfalls #27, #48) didn't prevent this
- Fix: Don't mutate shared state during calculations
- Related to other P&L compounding variants for cross-reference

2025-11-17 15:34:22 +01:00

140 KiB

Raw Blame History

AI Agent Instructions for Trading Bot v4

Mission & Financial Goals

Primary Objective: Build wealth systematically from $106 → $100,000+ through algorithmic trading

Current Phase: Phase 1 - Survival & Proof (Nov 2025 - Jan 2026)

Current Capital: $97.55 USDC (zero debt, 100% health)
Starting Capital: $106 (Nov 2025)
Target: $2,500 by end of Phase 1 (Month 2.5)
Strategy: Aggressive compounding, 0 withdrawals
Position Sizing: 100% of free collateral (~$97 at 15x leverage = ~$1,463 notional)
Risk Tolerance: EXTREME - This is recovery/proof-of-concept mode
Win Target: 20-30% monthly returns to reach $2,500
Trades Executed: 161 (as of Nov 12, 2025)

Why This Matters for AI Agents:

Every dollar counts at this stage - optimize for profitability, not just safety
User needs this system to work for long-term financial goals ($300-500/month withdrawals starting Month 3)
No changes that reduce win rate unless they improve profit factor
System must prove itself before scaling (see TRADING_GOALS.md for full 8-phase roadmap)

Key Constraints:

Can't afford extended drawdowns (limited capital)
Must maintain 60%+ win rate to compound effectively
Quality over quantity - only trade 60+ signal quality scores (lowered from 65 on Nov 12, 2025)
After 3 consecutive losses, STOP and review system

Architecture Overview

Type: Autonomous cryptocurrency trading bot with Next.js 15 frontend + Solana/Drift Protocol backend

Data Flow: TradingView → n8n webhook → Next.js API → Drift Protocol (Solana DEX) → Real-time monitoring → Auto-exit

CRITICAL: RPC Provider Choice

MUST use Alchemy RPC (https://solana-mainnet.g.alchemy.com/v2/YOUR_API_KEY)
DO NOT use Helius free tier - causes catastrophic rate limiting (239 errors in 10 minutes)
Helius free: 10 req/sec sustained = TOO LOW for trade execution + Position Manager monitoring
Alchemy free: 300M compute units/month = adequate for bot operations
Symptom if wrong RPC: Trades hit SL immediately, duplicate closes, Position Manager loses tracking, database save failures
Fixed Nov 14, 2025: Switched to Alchemy, system now works perfectly (TP1/TP2/runner all functioning)

Key Design Principle: Dual-layer redundancy - every trade has both on-chain orders (Drift) AND software monitoring (Position Manager) as backup.

Exit Strategy: ATR-Based TP2-as-Runner system (CURRENT - Nov 17, 2025):

ATR-BASED TP/SL (PRIMARY): TP1/TP2/SL calculated from ATR × multipliers
- TP1: ATR × 2.0 (typically ~0.86%, closes 60% default)
- TP2: ATR × 4.0 (typically ~1.72%, activates trailing stop)
- SL: ATR × 3.0 (typically ~1.29%)
- Safety bounds: MIN/MAX caps prevent extremes
- Falls back to fixed % if ATR unavailable
Runner: 40% remaining after TP1 (configurable via TAKE_PROFIT_1_SIZE_PERCENT=60)
Trailing Stop: ATR-based (1.3-1.5x ATR multiplier), activates after TP2 trigger
Benefits: Regime-agnostic (adapts to bull/bear automatically), asset-agnostic (SOL vs BTC different ATR)
Note: All UI displays dynamically calculate runner% as 100 - TAKE_PROFIT_1_SIZE_PERCENT

Per-Symbol Configuration: SOL and ETH have independent enable/disable toggles and position sizing:

SOLANA_ENABLED, SOLANA_POSITION_SIZE, SOLANA_LEVERAGE (defaults: true, 100%, 15x)
ETHEREUM_ENABLED, ETHEREUM_POSITION_SIZE, ETHEREUM_LEVERAGE (defaults: true, 100%, 1x)
BTC and other symbols fall back to global settings (MAX_POSITION_SIZE_USD, LEVERAGE)
Priority: Per-symbol ENV → Market config → Global ENV → Defaults

Signal Quality System: Filters trades based on 5 metrics (ATR, ADX, RSI, volumeRatio, pricePosition) scored 0-100. Only trades scoring 60+ are executed (lowered from 65 after data analysis showed 60-64 tier outperformed higher scores). Scores stored in database for future optimization.

Timeframe-Aware Scoring: Signal quality thresholds adjust based on timeframe (5min vs daily):

5min: ADX 12+ trending (vs 18+ for daily), ATR 0.2-0.7% healthy (vs 0.4%+ for daily)
Anti-chop filter: -20 points for extreme sideways (ADX <10, ATR <0.25%, Vol <0.9x)
Pass timeframe param to scoreSignalQuality() from TradingView alerts (e.g., timeframe: "5")

MAE/MFE Tracking: Every trade tracks Maximum Favorable Excursion (best profit %) and Maximum Adverse Excursion (worst loss %) updated every 2s. Used for data-driven optimization of TP/SL levels.

Manual Trading via Telegram: Send plain-text messages like long sol, short eth, long btc to open positions instantly (bypasses n8n, calls /api/trading/execute directly with preset healthy metrics). CRITICAL: Manual trades are marked with signalSource='manual' and excluded from TradingView indicator analysis (prevents data contamination).

Telegram Manual Trade Presets (Nov 17, 2025 - Data-Driven):

ATR: 0.43 (median from 162 SOL trades, Nov 2024-Nov 2025)
ADX: 32 (strong trend assumption)
RSI: 58 long / 42 short (neutral-favorable)
Volume: 1.2x average (healthy)
Price Position: 45 long / 55 short (mid-range)
Purpose: Enables quick manual entries when TradingView signals unavailable
Note: Re-entry analytics validate against fresh TradingView data when cached (<5min)

Re-Entry Analytics System: Manual trades are validated before execution using fresh TradingView data:

Market data cached from TradingView signals (5min expiry)
/api/analytics/reentry-check scores re-entry based on fresh metrics + recent performance
Telegram bot blocks low-quality re-entries unless --force flag used
Uses real TradingView ADX/ATR/RSI when available, falls back to historical data
Penalty for recent losing trades, bonus for winning streaks

VERIFICATION MANDATE: Financial Code Requires Proof

CRITICAL: THIS IS A REAL MONEY TRADING SYSTEM - NOT A TOY PROJECT

Core Principle: In trading systems, "working" means "verified with real data", NOT "code looks correct".

NEVER declare something working without:

Observing actual logs showing expected behavior
Verifying database state matches expectations
Comparing calculated values to source data
Testing with real trades when applicable
CONFIRMING CODE IS DEPLOYED - Check container start time vs commit time
VERIFYING ALL RELATED FIXES DEPLOYED - Multi-fix sessions require complete deployment verification

CODE COMMITTED ≠ CODE DEPLOYED

Git commit at 15:56 means NOTHING if container started at 15:06
ALWAYS verify: docker logs trading-bot-v4 | grep "Server starting" | head -1
Compare container start time to commit timestamp
If container older than commit: CODE NOT DEPLOYED, FIX NOT ACTIVE
Never say "fixed" or "protected" until deployment verified

MULTI-FIX DEPLOYMENT VERIFICATION When multiple related fixes are developed in same session:

# 1. Check container start time
docker inspect trading-bot-v4 --format='{{.State.StartedAt}}'
# Example: 2025-11-16T09:28:20.757451138Z

# 2. Check all commit timestamps
git log --oneline --format='%h %ai %s' -5
# Example output:
# b23dde0 2025-11-16 09:25:10 fix: Add needsVerification field
# c607a66 2025-11-16 09:00:42 critical: Fix close verification
# 673a493 2025-11-16 08:45:21 critical: Fix breakeven SL

# 3. Verify container newer than ALL commits
# Container 09:28:20 > Latest commit 09:25:10 ✅ ALL FIXES DEPLOYED

# 4. Test-specific verification for each fix
docker logs -f trading-bot-v4 | grep "expected log message from fix"

DEPLOYMENT CHECKLIST FOR MULTI-FIX SESSIONS:

All commits pushed to git
Container rebuilt successfully (no TypeScript errors)
Container restarted with --force-recreate
Container start time > ALL commit timestamps
Specific log messages from each fix observed (if testable)
Database state reflects changes (if applicable)

Example: Nov 16, 2025 Session (Breakeven SL + Close Verification)

Fix 1: Breakeven SL (commit 673a493, 08:45:21)
Fix 2: Close verification (commit c607a66, 09:00:42)
Fix 3: TypeScript interface (commit b23dde0, 09:25:10)
Container restart: 09:28:20 ✅ All three fixes deployed
Verification: Log messages include "Using original entry price" and "Waiting 5s for Drift state"

Critical Path Verification Requirements

Position Manager Changes:

Execute test trade with DRY_RUN=false (small size)
Watch docker logs for full TP1 → TP2 → exit cycle
SQL query: verify tp1Hit, slMovedToBreakeven, currentSize match Position Manager logs
Compare Position Manager tracked size to actual Drift position size
Check exit reason matches actual trigger (TP1/TP2/SL/trailing)

Exit Logic Changes (TP/SL/Trailing):

Log EXPECTED values (TP1 price, SL price after breakeven, trailing stop distance)
Log ACTUAL values from Drift position and Position Manager state
Verify: Does TP1 hit when price crosses TP1? Does SL move to breakeven?
Test: Open position, let it hit TP1, verify 75% closed + SL moved
Document: What SHOULD happen vs what ACTUALLY happened

API Endpoint Changes:

curl test with real payload from TradingView/n8n
Check response JSON matches expectations
Verify database record created with correct fields
Check Telegram notification shows correct values (leverage, size, etc.)
SQL query: confirm all fields populated correctly

Calculation Changes (P&L, Position Sizing, Percentages):

Add console.log for EVERY step of calculation
Verify units match (tokens vs USD, percent vs decimal, etc.)
SQL query with manual calculation: does code result match hand calculation?
Test edge cases: 0%, 100%, negative values, very small/large numbers

SDK/External Data Integration:

Log raw SDK response to verify assumptions about data format
NEVER trust documentation - verify with console.log
Example: position.size doc said "USD" but logs showed "tokens"
Document actual behavior in Common Pitfalls section

Red Flags Requiring Extra Verification

High-Risk Changes:

Unit conversions (tokens ↔ USD, percent ↔ decimal)
State transitions (TP1 hit → move SL to breakeven)
Configuration precedence (per-symbol vs global vs defaults)
Display values from complex calculations (leverage, size, P&L)
Timing-dependent logic (grace periods, cooldowns, race conditions)

Verification Steps for Each:

Before declaring working: Show proof (logs, SQL results, test output)
After deployment: Monitor first real trade closely, verify behavior
Edge cases: Test boundary conditions (0, 100%, max leverage, min size)
Regression: Check that fix didn't break other functionality

SQL Verification Queries

After Position Manager changes:

-- Verify TP1 detection worked correctly
SELECT 
  symbol, entryPrice, currentSize, realizedPnL,
  tp1Hit, slMovedToBreakeven, exitReason,
  TO_CHAR(createdAt, 'MM-DD HH24:MI') as time
FROM "Trade"
WHERE exitReason IS NULL  -- Open positions
  OR createdAt > NOW() - INTERVAL '1 hour'  -- Recent closes
ORDER BY createdAt DESC
LIMIT 5;

-- Compare Position Manager state to expectations
SELECT configSnapshot->'positionManagerState' as pm_state
FROM "Trade" 
WHERE symbol = 'SOL-PERP' AND exitReason IS NULL;

After calculation changes:

-- Verify P&L calculations
SELECT 
  symbol, direction, entryPrice, exitPrice,
  positionSize, realizedPnL,
  -- Manual calculation:
  CASE 
    WHEN direction = 'long' THEN 
      positionSize * ((exitPrice - entryPrice) / entryPrice)
    ELSE 
      positionSize * ((entryPrice - exitPrice) / entryPrice)
  END as expected_pnl,
  -- Difference:
  realizedPnL - CASE 
    WHEN direction = 'long' THEN 
      positionSize * ((exitPrice - entryPrice) / entryPrice)
    ELSE 
      positionSize * ((entryPrice - exitPrice) / entryPrice)
  END as pnl_difference
FROM "Trade"
WHERE exitReason IS NOT NULL
  AND createdAt > NOW() - INTERVAL '24 hours'
ORDER BY createdAt DESC
LIMIT 10;

Example: How Position.size Bug Should Have Been Caught

What went wrong:

Read code: "Looks like it's comparing sizes correctly"
Declared: "Position Manager is working!"
Didn't verify with actual trade

What should have been done:

// In Position Manager monitoring loop - ADD THIS LOGGING:
console.log('🔍 VERIFICATION:', {
  positionSizeRaw: position.size,  // What SDK returns
  positionSizeUSD: position.size * currentPrice,  // Converted to USD
  trackedSizeUSD: trade.currentSize,  // What we're tracking
  ratio: (position.size * currentPrice) / trade.currentSize,
  tp1ShouldTrigger: (position.size * currentPrice) < trade.currentSize * 0.95
})

Then observe logs on actual trade:

🔍 VERIFICATION: {
  positionSizeRaw: 12.28,  // ← AH! This is SOL tokens, not USD!
  positionSizeUSD: 1950.84,  // ← Correct USD value
  trackedSizeUSD: 1950.00,
  ratio: 1.0004,  // ← Should be near 1.0 when position full
  tp1ShouldTrigger: false  // ← Correct
}

Lesson: One console.log would have exposed the bug immediately.

Deployment Checklist

MANDATORY PRE-DEPLOYMENT VERIFICATION:

Check container start time: docker logs trading-bot-v4 | grep "Server starting" | head -1
Compare to commit timestamp: Container MUST be newer than code changes
If container older: STOP - Code not deployed, fix not active
Never declare "fixed" or "working" until container restarted with new code

Before marking feature complete:

Code review completed
Unit tests pass (if applicable)
Integration test with real API calls
Logs show expected behavior
Database state verified with SQL
Edge cases tested
Container restarted and verified running new code
Documentation updated (including Common Pitfalls if applicable)
User notified of what to verify during first real trade

When to Escalate to User

Don't say "it's working" if:

You haven't observed actual logs showing the expected behavior
SQL query shows unexpected values
Test trade behaved differently than expected
You're unsure about unit conversions or SDK behavior
Change affects money (position sizing, P&L, exits)
Container hasn't been restarted since code commit

Instead say:

"Code is updated. Need to verify with test trade - watch for [specific log message]"
"Fixed, but requires verification: check database shows [expected value]"
"Deployed. First real trade should show [behavior]. If not, there's still a bug."
"Code committed but NOT deployed - container running old version, fix not active yet"

Docker Build Best Practices

CRITICAL: Prevent build interruptions with background execution + live monitoring

Docker builds take 40-70 seconds and are easily interrupted by terminal issues. Use this pattern:

# Start build in background with live log tail
cd /home/icke/traderv4 && docker compose build trading-bot > /tmp/docker-build-live.log 2>&1 & BUILD_PID=$!; echo "Build started, PID: $BUILD_PID"; tail -f /tmp/docker-build-live.log

Why this works:

Build runs in background (&) - immune to terminal disconnects/Ctrl+C
Output redirected to log file - can review later if needed
tail -f shows real-time progress - see compilation, linting, errors
Can Ctrl+C the tail -f without killing build - build continues
Verification after: tail -50 /tmp/docker-build-live.log to check success

Success indicators:

✓ Compiled successfully in 27s
✓ Generating static pages (30/30)
#22 naming to docker.io/library/traderv4-trading-bot done
DONE X.Xs on final step

Failure indicators:

Failed to compile.
Type error:
ERROR: process "/bin/sh -c npm run build" did not complete successfully: exit code: 1

After successful build:

# Deploy new container
docker compose up -d --force-recreate trading-bot

# Verify it started
docker logs --tail=30 trading-bot-v4

# Confirm deployed version
docker logs trading-bot-v4 | grep "Server starting" | head -1

DO NOT use: docker compose build trading-bot in foreground - one network hiccup kills 60s of work

Docker Cleanup After Builds

CRITICAL: Prevent disk full issues from build cache accumulation

Docker builds create intermediate layers (1.3+ GB per build) that accumulate over time. Build cache can reach 40-50 GB after frequent rebuilds.

After successful deployment, clean up:

# Remove dangling images (old builds)
docker image prune -f

# Remove build cache (biggest space hog - 40+ GB typical)
docker builder prune -f

# Optional: Remove dangling volumes (if no important data)
docker volume prune -f

# Check space saved
docker system df

When to run:

After each successful deployment (recommended)
Weekly if building frequently
When disk space warnings appear
Before major updates/migrations

Space typically freed:

Dangling images: 2-5 GB
Build cache: 40-50 GB
Dangling volumes: 0.5-1 GB
Total: 40-55 GB per cleanup

What's safe to delete:

<none> tagged images (old builds)
Build cache (recreated on next build)
Dangling volumes (orphaned from removed containers)

What NOT to delete:

Named volumes (contain data: trading-bot-postgres, etc.)
Active containers
Tagged images currently in use

Critical Components

1. Phantom Trade Auto-Closure System

Purpose: Automatically close positions when size mismatch detected (position opened but wrong size)

When triggered:

Position opened on Drift successfully
Expected size: $50 (50% @ 1x leverage)
Actual size: $1.37 (7% fill - likely oracle price stale or exchange rejection)
Size ratio < 50% threshold → phantom detected

Automated response (all happens in <1 second):

Immediate closure: Market order closes 100% of phantom position
Database logging: Creates trade record with status='phantom', saves P&L
n8n notification: Returns HTTP 200 with full details (not 500 - allows workflow to continue)
Telegram alert: Message includes entry/exit prices, P&L, reason, transaction IDs

Why auto-close instead of manual intervention:

User may be asleep, away from devices, unavailable for hours
Unmonitored position = unlimited risk exposure
Position Manager won't track phantom (by design)
No TP/SL protection, no trailing stop, no monitoring
Better to exit with small loss/gain than leave position exposed
Re-entry always possible if setup was actually good

Example notification:

⚠️ PHANTOM TRADE AUTO-CLOSED

Symbol: SOL-PERP
Direction: LONG
Expected Size: $48.75
Actual Size: $1.37 (2.8%)

Entry: $168.50
Exit: $168.45
P&L: -$0.02

Reason: Size mismatch detected - likely oracle price issue or exchange rejection
Action: Position auto-closed for safety (unmonitored positions = risk)

TX: 5Yx2Fm8vQHKLdPaw...

Database tracking:

status='phantom' field identifies these trades
isPhantom=true, phantomReason='ORACLE_PRICE_MISMATCH'
expectedSizeUSD, actualSizeUSD fields for analysis
Exit reason: 'manual' (phantom auto-close category)
Enables post-trade analysis of phantom frequency and patterns

Code location: app/api/trading/execute/route.ts lines 322-445

2. Signal Quality Scoring (`lib/trading/signal-quality.ts`)

Purpose: Unified quality validation system that scores trading signals 0-100 based on 5 market metrics

Timeframe-aware thresholds:

scoreSignalQuality({ 
  atr, adx, rsi, volumeRatio, pricePosition, 
  timeframe?: string // "5" for 5min, undefined for higher timeframes
})

5min chart adjustments:

ADX healthy range: 12-22 (vs 18-30 for daily)
ATR healthy range: 0.2-0.7% (vs 0.4%+ for daily)
Anti-chop filter: -20 points for extreme sideways (ADX <10, ATR <0.25%, Vol <0.9x)

Price position penalties (all timeframes):

Long at 90-95%+ range: -15 to -30 points (chasing highs)
Short at <5-10% range: -15 to -30 points (chasing lows)
Prevents flip-flop losses from entering range extremes

Key behaviors:

Returns score 0-100 and detailed breakdown object
Minimum score 60 required to execute trade
Called by both /api/trading/check-risk and /api/trading/execute
Scores saved to database for post-trade analysis

2. Position Manager (`lib/trading/position-manager.ts`)

Purpose: Software-based monitoring loop that checks prices every 2 seconds and closes positions via market orders

Singleton pattern: Always use getInitializedPositionManager() - never instantiate directly

const positionManager = await getInitializedPositionManager()
await positionManager.addTrade(activeTrade)

Key behaviors:

Tracks ActiveTrade objects in a Map
TP2-as-Runner system: TP1 (configurable %, default 75%) → TP2 trigger (no close, activate trailing) → Runner (remaining %) with ATR-based trailing stop
Dynamic SL adjustments: Moves to breakeven after TP1, locks profit at +1.2%
On-chain order synchronization: After TP1 hits, calls cancelAllOrders() then placeExitOrders() with updated SL price at breakeven (uses retryWithBackoff() for rate limit handling)
ATR-based trailing stop: Calculates trail distance as (atrAtEntry / currentPrice × 100) × trailingStopAtrMultiplier, clamped between min/max %
Trailing stop: Activates when TP2 price hit, tracks peakPrice and trails dynamically
Closes positions via closePosition() market orders when targets hit
Acts as backup if on-chain orders don't fill
State persistence: Saves to database, restores on restart via configSnapshot.positionManagerState
Startup validation: On container restart, cross-checks last 24h "closed" trades against Drift to detect orphaned positions (see lib/startup/init-position-manager.ts)
Grace period for new trades: Skips "external closure" detection for positions <30 seconds old (Drift positions take 5-10s to propagate)
Exit reason detection: Uses trade state flags (tp1Hit, tp2Hit) and realized P&L to determine exit reason, NOT current price (avoids misclassification when price moves after order fills)
Real P&L calculation: Calculates actual profit based on entry vs exit price, not SDK's potentially incorrect values
Rate limit-aware exit: On 429 errors during close, keeps trade in monitoring (doesn't mark closed), retries naturally on next price update

3. Telegram Bot (`telegram_command_bot.py`)

Purpose: Python-based Telegram bot for manual trading commands and position status monitoring

Manual trade commands via plain text:

# User sends plain text message (not slash commands)
"long sol"          → Validates via analytics, then opens SOL-PERP long
"short eth"         → Validates via analytics, then opens ETH-PERP short
"long btc --force"  → Skips analytics validation, opens BTC-PERP long immediately

Key behaviors:

MessageHandler processes all text messages (not just commands)
Maps user-friendly symbols (sol, eth, btc) to Drift format (SOL-PERP, etc.)
Analytics validation: Calls /api/analytics/reentry-check before execution
- Blocks trades with score <55 unless --force flag used
- Uses fresh TradingView data (<5min old) when available
- Falls back to historical metrics with penalty
- Considers recent trade performance (last 3 trades)
Calls /api/trading/execute directly with preset healthy metrics (ATR=0.45, ADX=32, RSI=58/42)
Bypasses n8n workflow and TradingView requirements
60-second timeout for API calls
Responds with trade confirmation or analytics rejection message

Status command:

/status → Returns JSON of open positions from Drift

Implementation details:

Uses python-telegram-bot library
Deployed via docker-compose.telegram-bot.yml
Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHANNEL_ID in .env
API calls to http://trading-bot:3000/api/trading/execute

Drift client integration:

Singleton pattern: Use initializeDriftService() and getDriftService() - maintains single connection

const driftService = await initializeDriftService()
const health = await driftService.getAccountHealth()

Wallet handling: Supports both JSON array [91,24,...] and base58 string formats from Phantom wallet

4. Rate Limit Monitoring (`lib/drift/orders.ts` + `app/api/analytics/rate-limits`)

Purpose: Track and analyze Solana RPC rate limiting (429 errors) to prevent silent failures

Helius RPC Limits (Free Tier):

Burst: 100 requests/second
Sustained: 10 requests/second
Monthly: 100k requests
See docs/HELIUS_RATE_LIMITS.md for upgrade recommendations

Retry mechanism with exponential backoff (Nov 14, 2025 - Updated):

await retryWithBackoff(async () => {
  return await driftClient.cancelOrders(...)
}, maxRetries = 3, baseDelay = 5000) // Increased from 2s to 5s

Progression: 5s → 10s → 20s (vs old 2s → 4s → 8s) Rationale: Gives Helius time to recover, reduces cascade pressure by 2.5x

Database logging: Three event types in SystemEvent table:

rate_limit_hit: Each 429 error (logged with attempt #, delay, error snippet)
rate_limit_recovered: Successful retry (logged with total time, retry count)
rate_limit_exhausted: Failed after max retries (CRITICAL - order operation failed)

Analytics endpoint:

curl http://localhost:3001/api/analytics/rate-limits

Returns: Total hits/recoveries/failures, hourly patterns, recovery times, success rate

Key behaviors:

Only RPC calls wrapped: cancelAllOrders(), placeExitOrders(), closePosition()
Position Manager monitoring: Event-driven via Pyth WebSocket (not polling)
Rate limit-aware exit: Position Manager keeps monitoring on 429 errors (retries naturally)
Logs to both console and database for post-trade analysis

Monitoring queries: See docs/RATE_LIMIT_MONITORING.md for SQL queries

Startup Position Validation (Nov 14, 2025 - Added): On container startup, cross-checks last 24h of "closed" trades against actual Drift positions:

If DB says closed but Drift shows open → reopens in DB to restore Position Manager tracking
Prevents orphaned positions from failed close transactions
Logs: 🔴 CRITICAL: ${symbol} marked as CLOSED in DB but still OPEN on Drift!
Implementation: lib/startup/init-position-manager.ts - validateOpenTrades()

5. Order Placement (`lib/drift/orders.ts`)

Critical functions:

openPosition() - Opens market position with transaction confirmation
closePosition() - Closes position with transaction confirmation
placeExitOrders() - Places TP/SL orders on-chain
cancelAllOrders() - Cancels all reduce-only orders for a market

CRITICAL: Transaction Confirmation Pattern Both openPosition() and closePosition() MUST confirm transactions on-chain:

const txSig = await driftClient.placePerpOrder(orderParams)
console.log('⏳ Confirming transaction on-chain...')
const connection = driftService.getConnection()
const confirmation = await connection.confirmTransaction(txSig, 'confirmed')

if (confirmation.value.err) {
  throw new Error(`Transaction failed: ${JSON.stringify(confirmation.value.err)}`)
}
console.log('✅ Transaction confirmed on-chain')

Without this, the SDK returns signatures for transactions that never execute, causing phantom trades/closes.

CRITICAL: Drift SDK position.size is BASE ASSET TOKENS, not USD The Drift SDK returns position.size as token quantity (SOL/ETH/BTC), NOT USD notional:

// CORRECT: Convert tokens to USD by multiplying by current price
const positionSizeUSD = Math.abs(position.size) * currentPrice

// WRONG: Using position.size directly as USD (off by 150x+ for SOL!)
const positionSizeUSD = Math.abs(position.size)

This affects Position Manager's TP1/TP2 detection - if position.size is not converted to USD before comparing to tracked USD values, the system will never detect partial closes correctly. See Common Pitfall #22 for the full bug details and fix applied Nov 12, 2025.

Solana RPC Rate Limiting with Exponential Backoff Solana RPC endpoints return 429 errors under load. Always use retry logic for order operations:

export async function retryWithBackoff<T>(
  operation: () => Promise<T>,
  maxRetries: number = 3,
  initialDelay: number = 5000  // Increased from 2000ms to 5000ms (Nov 14, 2025)
): Promise<T> {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await operation()
    } catch (error: any) {
      if (error?.message?.includes('429') && attempt < maxRetries - 1) {
        const delay = initialDelay * Math.pow(2, attempt)
        console.log(`⏳ Rate limited, retrying in ${delay/1000}s... (attempt ${attempt + 1}/${maxRetries})`)
        await new Promise(resolve => setTimeout(resolve, delay))
        continue
      }
      throw error
    }
  }
  throw new Error('Max retries exceeded')
}

// Usage in cancelAllOrders
await retryWithBackoff(() => driftClient.cancelOrders(...))

Note: Increased from 2s to 5s base delay to give Helius RPC more recovery time. See docs/HELIUS_RATE_LIMITS.md for detailed analysis. Without this, order cancellations fail silently during TP1→breakeven order updates, leaving ghost orders that cause incorrect fills.

Dual Stop System (USE_DUAL_STOPS=true):

// Soft stop: TRIGGER_LIMIT at -1.5% (avoids wicks)
// Hard stop: TRIGGER_MARKET at -2.5% (guarantees exit)

Order types:

Entry: MARKET (immediate execution)
TP1/TP2: LIMIT reduce-only orders
Soft SL: TRIGGER_LIMIT reduce-only
Hard SL: TRIGGER_MARKET reduce-only

6. Database (`lib/database/trades.ts` + `prisma/schema.prisma`)

Purpose: PostgreSQL via Prisma ORM for trade history and analytics

Models: Trade, PriceUpdate, SystemEvent, DailyStats, BlockedSignal

Singleton pattern: Use getPrismaClient() - never instantiate PrismaClient directly

Key functions:

createTrade() - Save trade after execution (includes dual stop TX signatures + signalQualityScore)
updateTradeExit() - Record exit with P&L
addPriceUpdate() - Track price movements (called by Position Manager)
getTradeStats() - Win rate, profit factor, avg win/loss
getLastTrade() - Fetch most recent trade for analytics dashboard
createBlockedSignal() - Save blocked signals for data-driven optimization analysis
getRecentBlockedSignals() - Query recent blocked signals
getBlockedSignalsForAnalysis() - Fetch signals needing price analysis (future automation)

Important fields:

signalSource (String?) - Identifies trade origin: 'tradingview', 'manual', or NULL (old trades)
- CRITICAL: Manual Telegram trades are marked signalSource='manual' and excluded from TradingView indicator analysis
- Use filter: WHERE ("signalSource" IS NULL OR "signalSource" != 'manual') for indicator optimization queries
- See docs/MANUAL_TRADE_FILTERING.md for complete SQL filtering guide
signalQualityScore (Int?) - 0-100 score for data-driven optimization
signalQualityVersion (String?) - Tracks which scoring logic was used ('v1', 'v2', 'v3', 'v4')
- v1: Original logic (price position < 5% threshold)
- v2: Added volume compensation for low ADX (2025-11-07)
- v3: Stricter breakdown requirements: positions < 15% require (ADX > 18 AND volume > 1.2x) OR (RSI < 35 for shorts / RSI > 60 for longs)
- v4: CURRENT - Blocked signals tracking enabled for data-driven threshold optimization (2025-11-11)
- All new trades tagged with current version for comparative analysis
maxFavorableExcursion / maxAdverseExcursion - Track best/worst P&L during trade lifetime
maxFavorablePrice / maxAdversePrice - Track prices at MFE/MAE points
configSnapshot (Json) - Stores Position Manager state for crash recovery
atr, adx, rsi, volumeRatio, pricePosition - Context metrics from TradingView

BlockedSignal model fields (NEW):

Signal metrics: atr, adx, rsi, volumeRatio, pricePosition, timeframe
Quality scoring: signalQualityScore, signalQualityVersion, scoreBreakdown (JSON), minScoreRequired
Block tracking: blockReason (QUALITY_SCORE_TOO_LOW, COOLDOWN_PERIOD, HOURLY_TRADE_LIMIT, etc.), blockDetails
Future analysis: priceAfter1/5/15/30Min, wouldHitTP1/TP2/SL, analysisComplete
Automatically saved by check-risk endpoint when signals are blocked
Enables data-driven optimization: collect 10-20 blocked signals → analyze patterns → adjust thresholds

Per-symbol functions:

getLastTradeTimeForSymbol(symbol) - Get last trade time for specific coin (enables per-symbol cooldown)
Each coin (SOL/ETH/BTC) has independent cooldown timer to avoid missed opportunities

ATR-Based Risk Management (Nov 17, 2025)

Purpose: Regime-agnostic TP/SL system that adapts to market volatility automatically instead of using fixed percentages that work in one market regime but fail in another.

Core Concept: ATR (Average True Range) measures actual market volatility - when volatility increases (trending markets), targets expand proportionally. When volatility decreases (choppy markets), targets tighten. This solves the "bull/bear optimization bias" problem where fixed % targets optimized in bearish markets underperform in bullish conditions.

Calculation Formula:

function calculatePercentFromAtr(
  atrValue: number,      // Absolute ATR value (e.g., 0.43 for SOL)
  entryPrice: number,    // Position entry price (e.g., $140)
  multiplier: number,    // ATR multiplier (2.0, 4.0, 3.0)
  minPercent: number,    // Safety floor (e.g., 0.5%)
  maxPercent: number     // Safety ceiling (e.g., 1.5%)
): number {
  // Convert absolute ATR to percentage of price
  const atrPercent = (atrValue / entryPrice) * 100
  
  // Apply multiplier (TP1=2x, TP2=4x, SL=3x)
  const targetPercent = atrPercent * multiplier
  
  // Clamp between min/max bounds for safety
  return Math.max(minPercent, Math.min(maxPercent, targetPercent))
}

Example Calculation (SOL at $140 with ATR 0.43):

// ATR as percentage: 0.43 / 140 = 0.00307 = 0.307%

// TP1 (close 60%):
// 0.307% × 2.0 = 0.614% → clamped to [0.5%, 1.5%] = 0.614%
// Price target: $140 × 1.00614 = $140.86

// TP2 (activate trailing):
// 0.307% × 4.0 = 1.228% → clamped to [1.0%, 3.0%] = 1.228%
// Price target: $140 × 1.01228 = $141.72

// SL (emergency exit):
// 0.307% × 3.0 = 0.921% → clamped to [0.8%, 2.0%] = 0.921%
// Price target: $140 × 0.99079 = $138.71

Configuration (ENV variables):

# Enable ATR-based system
USE_ATR_BASED_TARGETS=true

# ATR multipliers (tuned for SOL volatility)
ATR_MULTIPLIER_TP1=2.0   # TP1: 2× ATR (first target)
ATR_MULTIPLIER_TP2=4.0   # TP2: 4× ATR (trailing stop activation)
ATR_MULTIPLIER_SL=3.0    # SL: 3× ATR (stop loss)

# Safety bounds (prevent extreme targets)
MIN_TP1_PERCENT=0.5      # Don't go below 0.5% for TP1
MAX_TP1_PERCENT=1.5      # Don't go above 1.5% for TP1
MIN_TP2_PERCENT=1.0      # Don't go below 1.0% for TP2
MAX_TP2_PERCENT=3.0      # Don't go above 3.0% for TP2
MIN_SL_PERCENT=0.8       # Don't go below 0.8% for SL
MAX_SL_PERCENT=2.0       # Don't go above 2.0% for SL

# Legacy fallback (used when ATR unavailable)
STOP_LOSS_PERCENT=-1.5
TAKE_PROFIT_1_PERCENT=0.8
TAKE_PROFIT_2_PERCENT=0.7

Data-Driven ATR Values:

SOL-PERP: Median ATR 0.43 (from 162 trades, Nov 2024-Nov 2025)
- Range: 0.0-1.17 (extreme outliers during high volatility)
- Typical: 0.32%-0.40% of price
- Used in Telegram manual trade presets
ETH-PERP: TBD (collect 50+ trades with ATR tracking)
BTC-PERP: TBD (collect 50+ trades with ATR tracking)

When ATR is Available:

TradingView signals include atr field in webhook payload
Execute endpoint calculates dynamic TP/SL using ATR × multipliers
Logs show: 📊 ATR-based targets: TP1 0.86%, TP2 1.72%, SL 1.29%
Database saves atrAtEntry for post-trade analysis

When ATR is NOT Available:

Falls back to fixed percentages from ENV (STOP_LOSS_PERCENT, etc.)
Logs show: ⚠️ No ATR data, using fixed percentages
Less optimal but still functional

Regime-Agnostic Benefits:

Bull markets: Higher volatility → ATR increases → targets expand automatically
Bear markets: Lower volatility → ATR decreases → targets tighten automatically
Asset-agnostic: SOL volatility ≠ BTC volatility, ATR adapts to each
No re-optimization needed: System adapts in real-time without manual tuning

Performance Analysis (Nov 17, 2025):

Old fixed targets: v6 shorts captured 3% of avg +20.74% MFE moves (TP2 at +0.7%)
New ATR targets: TP2 at ~1.72% + 40% runner with trailing stop
Expected improvement: Capture 8-10% of move (3× better than fixed targets)
Real-world validation: Awaiting 50+ trades with ATR-based exits for statistical confirmation

Code Locations:

config/trading.ts - ATR multiplier fields in TradingConfig interface
app/api/trading/execute/route.ts - calculatePercentFromAtr() function
telegram_command_bot.py - MANUAL_METRICS with ATR 0.43
.env - ATR_MULTIPLIER_* and MIN/MAX_*_PERCENT variables

Integration with TradingView: Ensure alerts include ATR field:

{
  "symbol": "{{ticker}}",
  "direction": "{{strategy.order.action}}",
  "atr": {{ta.atr(14)}},  // CRITICAL: Include 14-period ATR
  "adx": {{ta.dmi(14, 14)}},
  "rsi": {{ta.rsi(14)}},
  // ... other fields
}

Lesson Learned (Nov 17, 2025): Optimizing fixed % targets in one market regime (bearish Nov 2024) creates bias that fails when market shifts (bullish Dec 2024+). ATR-based targets eliminate this bias by adapting to actual volatility, not historical patterns. This is the correct long-term solution for regime-agnostic trading.

Configuration System

Three-layer merge:

DEFAULT_TRADING_CONFIG (config/trading.ts)
Environment variables (.env) via getConfigFromEnv()
Runtime overrides via getMergedConfig(overrides)

Always use: getMergedConfig() to get final config - never read env vars directly in business logic

Per-symbol position sizing: Use getPositionSizeForSymbol(symbol, config) which returns { size, leverage, enabled }

const { size, leverage, enabled } = getPositionSizeForSymbol('SOL-PERP', config)
if (!enabled) {
  return NextResponse.json({ success: false, error: 'Symbol trading disabled' }, { status: 400 })
}

Symbol normalization: TradingView sends "SOLUSDT" → must convert to "SOL-PERP" for Drift

const driftSymbol = normalizeTradingViewSymbol(body.symbol)

API Endpoints Architecture

Authentication: All /api/trading/* endpoints (except /test) require Authorization: Bearer API_SECRET_KEY

Pattern: Each endpoint follows same flow:

Auth check
Get config via getMergedConfig()
Initialize Drift service
Check account health
Execute operation
Save to database
Add to Position Manager if applicable

Key endpoints:

/api/trading/execute - Main entry point from n8n (production, requires auth), auto-caches market data
/api/trading/check-risk - Pre-execution validation (duplicate check, quality score, per-symbol cooldown, rate limits, symbol enabled check, saves blocked signals automatically)
/api/trading/test - Test trades from settings UI (no auth required, respects symbol enable/disable)
/api/trading/close - Manual position closing (requires symbol normalization)
/api/trading/sync-positions - Force Position Manager sync with Drift (POST, requires auth) - restores tracking for orphaned positions
/api/trading/cancel-orders - Manual order cleanup (for stuck/ghost orders after rate limit failures)
/api/trading/positions - Query open positions from Drift
/api/trading/market-data - Webhook for TradingView market data updates (GET for debug, POST for data)
/api/settings - Get/update config (writes to .env file, includes per-symbol settings)
/api/analytics/last-trade - Fetch most recent trade details for dashboard (includes quality score)
/api/analytics/reentry-check - Validate manual re-entry with fresh TradingView data + recent performance
/api/analytics/version-comparison - Compare performance across signal quality logic versions (v1/v2/v3/v4)
/api/restart - Create restart flag for watch-restart.sh script

Critical Workflows

Execute Trade (Production)

TradingView alert → n8n Parse Signal Enhanced (extracts metrics + timeframe)
  ↓ /api/trading/check-risk [validates quality score ≥60, checks duplicates, per-symbol cooldown]
  ↓ /api/trading/execute
  ↓ normalize symbol (SOLUSDT → SOL-PERP)
  ↓ getMergedConfig()
  ↓ getPositionSizeForSymbol() [check if symbol enabled + get sizing]
  ↓ openPosition() [MARKET order]
  ↓ calculate dual stop prices if enabled
  ↓ placeExitOrders() [on-chain TP1/TP2/SL orders]
  ↓ scoreSignalQuality({ ..., timeframe }) [compute 0-100 score with timeframe-aware thresholds]
  ↓ createTrade() [CRITICAL: save to database FIRST - see Common Pitfall #27]
  ↓ positionManager.addTrade() [ONLY after DB save succeeds - prevents unprotected positions]

CRITICAL EXECUTION ORDER (Nov 13, 2025 Fix): The order of database save → Position Manager add is NOT arbitrary - it's a safety requirement:

If database save fails, API returns HTTP 500 with critical warning
User sees: "CLOSE POSITION MANUALLY IMMEDIATELY" with transaction signature
Position Manager only tracks database-persisted trades
Container restarts can restore all positions from database
Never add to Position Manager before database save - creates unprotected positions

Position Monitoring Loop

Position Manager every 2s:
  ↓ Verify on-chain position still exists (detect external closures)
  ↓ getPythPriceMonitor().getLatestPrice()
  ↓ Calculate current P&L and update MAE/MFE metrics
  ↓ Check emergency stop (-2%) → closePosition(100%)
  ↓ Check SL hit → closePosition(100%)
  ↓ Check TP1 hit → closePosition(75%), cancelAllOrders(), placeExitOrders() with SL at breakeven
  ↓ Check profit lock trigger (+1.2%) → move SL to +configured%
  ↓ Check TP2 hit → closePosition(80% of remaining), activate runner
  ↓ Check trailing stop (if runner active) → adjust SL dynamically based on peakPrice
  ↓ addPriceUpdate() [save to database every N checks]
  ↓ saveTradeState() [persist Position Manager state + MAE/MFE for crash recovery]

Settings Update

Web UI → /api/settings POST
  ↓ Validate new settings
  ↓ Write to .env file using string replacement
  ↓ Return success
  ↓ User clicks "Restart Bot" → /api/restart
  ↓ Creates /tmp/trading-bot-restart.flag
  ↓ watch-restart.sh detects flag
  ↓ Executes: docker restart trading-bot-v4

Docker Context

Multi-stage build: deps → builder → runner (Node 20 Alpine)

Critical Dockerfile steps:

Install deps with npm install --production
Copy source and npx prisma generate (MUST happen before build)
npm run build (Next.js standalone output)
Runner stage copies standalone + static + node_modules + Prisma client

Container networking:

External: trading-bot-v4 on port 3001
Internal: Next.js on port 3000
Database: trading-bot-postgres on 172.28.0.0/16 network

DATABASE_URL caveat: Use trading-bot-postgres (container name) in .env for runtime, but localhost:5432 for Prisma CLI migrations from host

Project-Specific Patterns

1. Singleton Services

Never create multiple instances - always use getter functions:

const driftService = await initializeDriftService() // NOT: new DriftService()
const positionManager = getPositionManager()        // NOT: new PositionManager()
const prisma = getPrismaClient()                     // NOT: new PrismaClient()

2. Price Calculations

Direction matters for long vs short:

function calculatePrice(entry: number, percent: number, direction: 'long' | 'short') {
  if (direction === 'long') {
    return entry * (1 + percent / 100)  // Long: +1% = higher price
  } else {
    return entry * (1 - percent / 100)  // Short: +1% = lower price
  }
}

3. Error Handling

Database failures should not fail trades - always wrap in try/catch:

try {
  await createTrade(params)
  console.log('💾 Trade saved to database')
} catch (dbError) {
  console.error('❌ Failed to save trade:', dbError)
  // Don't fail the trade if database save fails
}

4. Reduce-Only Orders

All exit orders MUST be reduce-only (can only close, not open positions):

const orderParams = {
  reduceOnly: true,  // CRITICAL for TP/SL orders
  // ... other params
}

5. Nextcloud Deck Roadmap Sync

Purpose: Visual kanban board for tracking optimization roadmap progress

Key Components:

scripts/discover-deck-ids.sh - Find Nextcloud Deck board/stack IDs
scripts/sync-roadmap-to-deck.py - Sync roadmap files to Deck cards
docs/NEXTCLOUD_DECK_SYNC.md - Complete documentation

Workflow:

# One-time setup (already done)
bash scripts/discover-deck-ids.sh  # Creates /tmp/deck-config.json

# Sync roadmap to Deck (creates/updates cards)
python3 scripts/sync-roadmap-to-deck.py --init

# Always dry-run first to preview changes
python3 scripts/sync-roadmap-to-deck.py --init --dry-run

Stack Mapping:

📥 Backlog: Future phases, ideas, ML work (status: FUTURE)
📋 Planning: Next phases, ready to implement (status: PENDING, NEXT)
🚀 In Progress: Currently active work (status: CURRENT, IN PROGRESS, DEPLOYED)
✅ Complete: Finished phases (status: COMPLETE)

Card Structure:

3 high-level initiative cards (from OPTIMIZATION_MASTER_ROADMAP.md)
18 detailed phase cards (from individual roadmap files)
Total: 21 cards tracking all optimization work

When to Sync:

After completing a phase (update markdown status → re-sync)
When starting new phase (move card in Deck UI)
Weekly during active development to keep visual state current

Important Notes:

API doesn't support duplicate detection - always use --dry-run first
Manual card deletion required (API returns 405 on DELETE)
Code blocks auto-removed from descriptions (prevent API errors)
Card titles cleaned (no markdown, emojis removed for readability)

Testing Commands

# Local development
npm run dev

# Build production
npm run build && npm start

# Docker build and restart
docker compose build trading-bot
docker compose up -d --force-recreate trading-bot
docker logs -f trading-bot-v4

# Database operations
npx prisma generate                                    # Generate client
DATABASE_URL="postgresql://...@localhost:5432/..." npx prisma migrate dev
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "\dt"

# Test trade from UI
# Go to http://localhost:3001/settings
# Click "Test LONG" or "Test SHORT"

SQL Analysis Queries

Essential queries for monitoring signal quality and blocked signals. Run via:

docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "YOUR_QUERY"

Phase 1: Monitor Data Collection Progress

-- Check blocked signals count (target: 10-20 for Phase 2)
SELECT COUNT(*) as total_blocked FROM "BlockedSignal";

-- Score distribution of blocked signals
SELECT 
  CASE 
    WHEN signalQualityScore >= 60 THEN '60-64 (Close Call)'
    WHEN signalQualityScore >= 55 THEN '55-59 (Marginal)'
    WHEN signalQualityScore >= 50 THEN '50-54 (Weak)'
    ELSE '0-49 (Very Weak)'
  END as tier,
  COUNT(*) as count,
  ROUND(AVG(signalQualityScore)::numeric, 1) as avg_score
FROM "BlockedSignal"
WHERE blockReason = 'QUALITY_SCORE_TOO_LOW'
GROUP BY tier
ORDER BY MIN(signalQualityScore) DESC;

-- Recent blocked signals with full details
SELECT 
  symbol,
  direction,
  signalQualityScore as score,
  ROUND(adx::numeric, 1) as adx,
  ROUND(atr::numeric, 2) as atr,
  ROUND(pricePosition::numeric, 1) as pos,
  ROUND(volumeRatio::numeric, 2) as vol,
  blockReason,
  TO_CHAR(createdAt, 'MM-DD HH24:MI') as time
FROM "BlockedSignal"
ORDER BY createdAt DESC
LIMIT 10;

Phase 2: Compare Blocked vs Executed Trades

-- Compare executed trades in 60-69 score range
SELECT 
  signalQualityScore as score,
  COUNT(*) as trades,
  ROUND(AVG(realizedPnL)::numeric, 2) as avg_pnl,
  ROUND(SUM(realizedPnL)::numeric, 2) as total_pnl,
  ROUND(100.0 * SUM(CASE WHEN realizedPnL > 0 THEN 1 ELSE 0 END) / COUNT(*)::numeric, 1) as win_rate
FROM "Trade"
WHERE exitReason IS NOT NULL
  AND signalQualityScore BETWEEN 60 AND 69
GROUP BY signalQualityScore
ORDER BY signalQualityScore;

-- Block reason breakdown
SELECT 
  blockReason,
  COUNT(*) as count,
  ROUND(AVG(signalQualityScore)::numeric, 1) as avg_score
FROM "BlockedSignal"
GROUP BY blockReason
ORDER BY count DESC;

Analyze Specific Patterns

-- Blocked signals at range extremes (price position)
SELECT 
  direction,
  signalQualityScore as score,
  ROUND(pricePosition::numeric, 1) as pos,
  ROUND(adx::numeric, 1) as adx,
  ROUND(volumeRatio::numeric, 2) as vol,
  symbol,
  TO_CHAR(createdAt, 'MM-DD HH24:MI') as time
FROM "BlockedSignal"
WHERE blockReason = 'QUALITY_SCORE_TOO_LOW'
  AND (pricePosition < 10 OR pricePosition > 90)
ORDER BY signalQualityScore DESC;

-- ADX distribution in blocked signals
SELECT 
  CASE 
    WHEN adx >= 25 THEN 'Strong (25+)'
    WHEN adx >= 20 THEN 'Moderate (20-25)'
    WHEN adx >= 15 THEN 'Weak (15-20)'
    ELSE 'Very Weak (<15)'
  END as adx_tier,
  COUNT(*) as count,
  ROUND(AVG(signalQualityScore)::numeric, 1) as avg_score
FROM "BlockedSignal"
WHERE blockReason = 'QUALITY_SCORE_TOO_LOW'
  AND adx IS NOT NULL
GROUP BY adx_tier
ORDER BY MIN(adx) DESC;

Usage Pattern:

Run "Monitor Data Collection" queries weekly during Phase 1
Once 10+ blocked signals collected, run "Compare Blocked vs Executed" queries
Use "Analyze Specific Patterns" to identify optimization opportunities
Full query reference: BLOCKED_SIGNALS_TRACKING.md

Common Pitfalls

DRIFT SDK MEMORY LEAK (CRITICAL - Fixed Nov 15, 2025):
- Symptom: JavaScript heap out of memory after 10+ hours runtime, Telegram bot timeouts (60s)
- Root Cause: Drift SDK accumulates WebSocket subscriptions over time without cleanup
- Manifestation: Thousands of accountUnsubscribe error: readyState was 2 (CLOSING) in logs
- Heap Growth: Normal ~200MB → 4GB+ after 10 hours → OOM crash
- Solution: Automatic reconnection every 4 hours (lib/drift/client.ts)
- Implementation:
  - scheduleReconnection() - Sets 4-hour timer after initialization
  - reconnect() - Unsubscribes, resets state, reinitializes Drift client
  - Timer cleared in disconnect() to prevent orphaned timers
- Manual Control: /api/drift/reconnect endpoint (POST with auth, GET for status)
- Impact: System now self-healing, can run indefinitely without manual restarts
- Monitoring: Watch for scheduled reconnection logs: 🔄 Scheduled reconnection...
WRONG RPC PROVIDER (CRITICAL - CATASTROPHIC SYSTEM FAILURE):
- FINAL CONCLUSION Nov 14, 2025 (INVESTIGATION COMPLETE): Helius is the ONLY reliable RPC provider for Drift SDK
- Root Cause CONFIRMED: Alchemy's rate limiting breaks Drift SDK's burst subscription pattern during initialization
- Definitive Proof (Nov 14, 21:14 CET):
  - Created diagnostic endpoint /api/testing/drift-init
  - Alchemy: 17-71 subscription errors EVERY init (49 avg over 5 runs), 1644ms avg init time
  - Helius: 0 subscription errors EVERY init, 800ms avg init time
  - See docs/ALCHEMY_RPC_INVESTIGATION_RESULTS.md for full test data
- Why Alchemy Fails:
  - Drift SDK subscribes to 30-50+ accounts simultaneously during init (burst pattern)
  - Alchemy's CUPS enforcement rate limits these burst requests
  - Drift SDK does NOT retry failed subscriptions
  - SDK reports "initialized successfully" but with incomplete subscription set
  - Subsequent operations fail/timeout due to missing account data
  - Error message: "Received JSON-RPC error calling accountSubscribe"
- Why "Breakthrough" at 14:25 Wasn't Real:
  - First Alchemy test had 17-71 subscription errors (random variation)
  - Sometimes gets lucky with "just enough" subscriptions for one operation
  - SDK in degraded state from the start, just not obvious until second operation
  - This explains why first trade "worked" but subsequent trades failed
- Why Helius Works:
  - Higher burst tolerance for Solana dApp subscription patterns
  - Zero subscription errors during init
  - Faster initialization (800ms vs 1600ms)
  - Stable for continuous operations
- Technical Reality vs Documentation:
  - Alchemy DOES support WebSocket subscriptions (research confirmed)
  - Alchemy DOES support accountSubscribe method (not -32601 error)
  - BUT: Rate limit enforcement model incompatible with Drift's burst pattern
  - Documentation doesn't mention burst subscription limits
- Production Status:
  - Using: Helius RPC (https://mainnet.helius-rpc.com/?api-key=...)
  - Retry logic: 5s exponential backoff for rate limits
  - System: Stable, TP1/TP2/SL working, Position Manager tracking correctly
- Investigation Closed: This is DEFINITIVE. Use Helius. Do not use Alchemy.
- Test Yourself: curl 'http://localhost:3001/api/testing/drift-init?rpc=alchemy'
Prisma not generated in Docker: Must run npx prisma generate in Dockerfile BEFORE npm run build
Wrong DATABASE_URL: Container runtime needs trading-bot-postgres, Prisma CLI from host needs localhost:5432
Symbol format mismatch: Always normalize with normalizeTradingViewSymbol() before calling Drift (applies to ALL endpoints including /api/trading/close)
Missing reduce-only flag: Exit orders without reduceOnly: true can accidentally open new positions
Singleton violations: Creating multiple DriftClient or Position Manager instances causes connection/state issues
Type errors with Prisma: The Trade type from Prisma is only available AFTER npx prisma generate - use explicit types or // @ts-ignore carefully
Quality score duplication: Signal quality calculation exists in BOTH check-risk and execute endpoints - keep logic synchronized
TP2-as-Runner configuration:

takeProfit2SizePercent: 0 means "TP2 activates trailing stop, no position close"
This creates runner of remaining % after TP1 (default 25%, configurable via TAKE_PROFIT_1_SIZE_PERCENT)
TAKE_PROFIT_2_PERCENT=0.7 sets TP2 trigger price, TAKE_PROFIT_2_SIZE_PERCENT should be 0
Settings UI correctly shows "TP2 activates trailing stop" with dynamic runner % calculation

P&L calculation CRITICAL: Use actual entry vs exit price calculation, not SDK values:

const profitPercent = this.calculateProfitPercent(trade.entryPrice, exitPrice, trade.direction)
const actualRealizedPnL = (closedSizeUSD * profitPercent) / 100
trade.realizedPnL += actualRealizedPnL  // NOT: result.realizedPnL from SDK

Transaction confirmation CRITICAL: Both openPosition() AND closePosition() MUST call connection.confirmTransaction() after placePerpOrder(). Without this, the SDK returns transaction signatures that aren't confirmed on-chain, causing "phantom trades" or "phantom closes". Always check confirmation.value.err before proceeding.
Execution order matters: When creating trades via API endpoints, the order MUST be:
1. Open position + place exit orders
2. Save to database (createTrade())
3. Add to Position Manager (positionManager.addTrade())
If Position Manager is added before database save, race conditions occur where monitoring checks before the trade exists in DB.
New trade grace period: Position Manager skips "external closure" detection for trades <30 seconds old because Drift positions take 5-10 seconds to propagate after opening. Without this grace period, new positions are immediately detected as "closed externally" and cancelled.
Drift minimum position sizes: Actual minimums differ from documentation:
- SOL-PERP: 0.1 SOL (~$5-15 depending on price)
- ETH-PERP: 0.01 ETH (~$38-40 at $4000/ETH)
- BTC-PERP: 0.0001 BTC (~$10-12 at $100k/BTC)
Always calculate: minOrderSize × currentPrice must exceed Drift's $4 minimum. Add buffer for price movement.
Exit reason detection bug: Position Manager was using current price to determine exit reason, but on-chain orders filled at a DIFFERENT price in the past. Now uses trade.tp1Hit / trade.tp2Hit flags and realized P&L to correctly identify whether TP1, TP2, or SL triggered. Prevents profitable trades being mislabeled as "SL" exits.
Per-symbol cooldown: Cooldown period is per-symbol, NOT global. ETH trade at 10:00 does NOT block SOL trade at 10:01. Each coin (SOL/ETH/BTC) has independent cooldown timer to avoid missing opportunities on different assets.
Timeframe-aware scoring crucial: Signal quality thresholds MUST adjust for 5min vs higher timeframes:
- 5min charts naturally have lower ADX (12-22 healthy) and ATR (0.2-0.7% healthy) than daily charts
- Without timeframe awareness, valid 5min breakouts get blocked as "low quality"
- Anti-chop filter applies -20 points for extreme sideways regardless of timeframe
- Always pass timeframe parameter from TradingView alerts to scoreSignalQuality()
Price position chasing causes flip-flops: Opening longs at 90%+ range or shorts at <10% range reliably loses money:
- Database analysis showed overnight flip-flop losses all had price position 9-94% (chasing extremes)
- These trades had valid ADX (16-18) but entered at worst possible time
- Quality scoring now penalizes -15 to -30 points for range extremes
- Prevents rapid reversals when price is already overextended
TradingView ADX minimum for 5min: Set ADX filter to 15 (not 20+) in TradingView alerts for 5min charts:
- Higher timeframes can use ADX 20+ for strong trends
- 5min charts need lower threshold to catch valid breakouts
- Bot's quality scoring provides second-layer filtering with context-aware metrics
- Two-stage filtering (TradingView + bot) prevents both overtrading and missing valid signals
Prisma Decimal type handling: Raw SQL queries return Prisma Decimal objects, not plain numbers:
- Use any type for numeric fields in $queryRaw results: total_pnl: any
- Convert with Number() before returning to frontend: totalPnL: Number(stat.total_pnl) || 0
- Frontend uses .toFixed() which doesn't exist on Decimal objects
- Applies to all aggregations: SUM(), AVG(), ROUND() - all return Decimal types
- Example: /api/analytics/version-comparison converts all numeric fields
ATR-based trailing stop implementation (Nov 11, 2025): Runner system was using FIXED 0.3% trailing, causing immediate stops:
- Problem: At $168 SOL, 0.3% = $0.50 wiggle room. Trades with +7-9% MFE exited for losses.
- Fix: trailingDistancePercent = (atrAtEntry / currentPrice * 100) × trailingStopAtrMultiplier
- Config: TRAILING_STOP_ATR_MULTIPLIER=1.5, MIN=0.25%, MAX=0.9%, ACTIVATION=0.5%
- Typical improvement: 0.45% ATR × 1.5 = 0.675% trail ($1.13 vs $0.50 = 2.26x more room)
- Fallback: If atrAtEntry unavailable, uses clamped legacy trailingStopPercent
- Log verification: Look for "📊 ATR-based trailing: 0.0045 (0.52%) × 1.5x = 0.78%" messages
- ActiveTrade interface: Must include atrAtEntry?: number field for calculation
- See ATR_TRAILING_STOP_FIX.md for full details and database analysis
CreateTradeParams interface sync: When adding new database fields to Trade model, MUST update CreateTradeParams interface in lib/database/trades.ts:
- Interface defines what parameters createTrade() accepts
- Must add new field to interface (e.g., indicatorVersion?: string)
- Must add field to Prisma create data object in createTrade() function
- TypeScript build will fail if endpoint passes field not in interface
- Example: indicatorVersion tracking required 3-file update (execute route.ts, CreateTradeParams interface, createTrade function)
Position.size tokens vs USD bug (CRITICAL - Fixed Nov 12, 2025):
- Symptom: Position Manager detects false TP1 hits, moves SL to breakeven prematurely
- Root Cause: lib/drift/client.ts returns position.size as BASE ASSET TOKENS (12.28 SOL), not USD ($1,950)
- Bug: Comparing tokens (12.28) directly to USD ($1,950) → 12.28 < 1,950 × 0.95 = "99.4% reduction" → FALSE TP1!
- Fix: Always convert to USD before comparisons:
```
// In Position Manager (lines 322, 519, 558, 591)
const positionSizeUSD = Math.abs(position.size) * currentPrice

// Now compare USD to USD
if (positionSizeUSD < trade.currentSize * 0.95) {
  // Actual 5%+ reduction detected
}
```
- Impact: Without this fix, TP1 never triggers correctly, SL moves at wrong times, runner system fails
- Where it matters: Position Manager, any code querying Drift positions
- Database evidence: Trade showed tp1Hit: true when 100% still open, slMovedToBreakeven: true prematurely
Leverage display showing global config instead of symbol-specific (Fixed Nov 12, 2025):
- Symptom: Telegram notifications showing "⚡ Leverage: 10x" when actual position uses 15x or 20x
- Root Cause: API response returning config.leverage (global default) instead of symbol-specific value
- Fix: Use actual leverage from getPositionSizeForSymbol():
```
// app/api/trading/execute/route.ts (lines 345, 448, 522, 557)
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)

// Return symbol-specific leverage
leverage: leverage,  // NOT: config.leverage
```
- Impact: Misleading notifications, user confusion about actual position risk
- Hierarchy: Per-symbol ENV (SOLANA_LEVERAGE) → Market config → Global ENV (LEVERAGE) → Defaults
Indicator version tracking (Nov 12, 2025+):
- Database field indicatorVersion tracks which TradingView strategy generated the signal
- v5: Buy/Sell Signal strategy (pre-Nov 12)
- v6: HalfTrend + BarColor strategy (Nov 12+)
- Used for performance comparison between strategies
Runner stop loss gap - NO protection between TP1 and TP2 (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Runner position remained open despite price moving far past stop loss level
- Root Cause: Position Manager only checked stop loss BEFORE TP1 (line 877: if (!trade.tp1Hit && this.shouldStopLoss(...)), creating a protection gap
- Bug sequence:
  1. SHORT opened, TP1 hit at 70% close (runner = 30% remaining)
  2. Runner had stop loss at profit-lock level (+0.5%)
  3. Price moved past stop loss → NO CHECK RAN (tp1Hit = true, so SL check skipped)
  4. Runner exposed to unlimited loss for hours during TP1→TP2 window
  5. Made worse by runner below Drift minimum size ($12.79 < $15) = no on-chain orders either
- Impact: Hours of unprotected runner exposure = potential unlimited loss on 25-30% remaining position
- Code analysis:
```
// Line 877: Stop loss checked ONLY before TP1
if (!trade.tp1Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 STOP LOSS: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
}

// Lines 881-895: TP1 and TP2 processing - NO STOP LOSS CHECK

// BUG: Runner between TP1-TP2 had ZERO stop loss protection!
```
- Fix: Added explicit runner stop loss check at line ~881:
```
// 2b. CRITICAL: Runner stop loss (AFTER TP1, BEFORE TP2)
// This protects the runner position after TP1 closes main position
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol} at ${profitPercent.toFixed(2)}% (profit lock triggered)`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
  return
}
```
- Why undetected: Runner system relatively new (Nov 11), most trades hit TP2 quickly without price reversals
- Compounded by: Drift minimum size check ($15 for SOL) prevented on-chain SL orders for small runners
- Log warning: ⚠️ SL size below market min, skipping on-chain SL indicates runner has NO on-chain protection
- Lesson: Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"
External closure duplicate updates bug (CRITICAL - Fixed Nov 12, 2025):
- Symptom: Trades showing 7-8x larger losses than actual ($58 loss when Drift shows $7 loss)
- Root Cause: Position Manager monitoring loop re-processes external closures multiple times before trade removed from activeTrades Map
- Bug sequence:
  1. Trade closed externally (on-chain SL order fills at -$7.98)
  2. Position Manager detects closure: position === null
  3. Calculates P&L and calls updateTradeExit() → -$7.50 in DB
  4. BUT: Trade still in activeTrades Map (removal happens after DB update)
  5. Next monitoring loop (2s later) detects closure AGAIN
  6. Accumulates P&L: previouslyRealized (-$7.50) + runnerRealized (-$7.50) = -$15.00
  7. Updates database AGAIN → -$15.00 in DB
  8. Repeats 8 times → final -$58.43 (8× the actual loss)
- Fix: Remove trade from activeTrades Map BEFORE database update:
```
// BEFORE (BROKEN):
await updateTradeExit({ ... })
await this.removeTrade(trade.id)  // Too late! Loop already ran again

// AFTER (FIXED):
this.activeTrades.delete(trade.id)  // Remove FIRST
await updateTradeExit({ ... })      // Then update DB
if (this.activeTrades.size === 0) {
  this.stopMonitoring()
}
```
- Impact: Without this fix, every external closure is recorded 5-8 times with compounding P&L
- Root cause: Async timing issue - removeTrade() is async but monitoring loop continues synchronously
- Evidence: Logs showed 8 consecutive "External closure recorded" messages with increasing P&L
- Line: lib/trading/position-manager.ts line 493 (external closure detection block)
- Must update CreateTradeParams interface when adding new database fields (see pitfall #23)
- Analytics endpoint /api/analytics/version-comparison compares v5 vs v6 performance
Signal quality threshold adjustment (Nov 12, 2025):
- Lowered from 65 → 60 based on data analysis of 161 trades
- Reason: Score 60-64 tier outperformed higher scores:
  - 60-64: 2 trades, +$45.78 total, 100% WR, +$22.89 avg
  - 65-69: 13 trades, +$28.28 total, 53.8% WR, +$2.18 avg
  - 70-79: 67 trades, +$8.28 total, 44.8% WR (worst performance!)
- Paradox: Higher quality scores don't correlate with better performance in current data
- Expected impact: 2-3 additional trades/week, +$46-69 weekly profit potential
- Data collection: Enables blocked signals at 55-59 range for Phase 2 optimization
- Risk: Small sample size (2 trades) could be outliers, but downside limited
- SQL analysis showed clear pattern: stricter filtering was blocking profitable setups
Database-First Pattern (CRITICAL - Fixed Nov 13, 2025):
- Symptom: Positions opened on Drift with NO database record, NO Position Manager tracking, NO TP/SL protection
- Root Cause: Execute endpoint saved to database AFTER adding to Position Manager, with silent error catch
- Bug sequence:
  1. TradingView signal → /api/trading/execute
  2. Position opened on Drift ✅
  3. Position Manager tracking added ✅
  4. Database save attempted ❌ (fails silently)
  5. API returns success to user ❌
  6. Container restarts → Position Manager loses in-memory state ❌
  7. Result: Unprotected position with no monitoring or TP/SL orders
- Fix: Database-first execution order in app/api/trading/execute/route.ts:
```
// CRITICAL: Save to database FIRST before adding to Position Manager
try {
  await createTrade({...})
} catch (dbError) {
  console.error('❌ CRITICAL: Failed to save trade to database:', dbError)
  return NextResponse.json({
    success: false,
    error: 'Database save failed - position unprotected',
    message: `Position opened on Drift but database save failed. CLOSE POSITION MANUALLY IMMEDIATELY. Transaction: ${openResult.transactionSignature}`,
  }, { status: 500 })
}

// ONLY add to Position Manager if database save succeeded
await positionManager.addTrade(activeTrade)
```
- Impact: Without this fix, ANY database failure creates unprotected positions
- Verification: Test trade cmhxj8qxl0000od076m21l58z (Nov 13) confirmed fix working
- Documentation: See CRITICAL_INCIDENT_UNPROTECTED_POSITION.md for full incident report
- Rule: Database persistence ALWAYS comes before in-memory state updates
DNS retry logic (Nov 13, 2025):
- Problem: Trading bot fails with "fetch failed" errors when DNS resolution temporarily fails for mainnet.helius-rpc.com
- Impact: n8n workflow failures, missed trades, container restart failures
- Root Cause: EAI_AGAIN errors are transient DNS issues that resolve in seconds, but bot treated them as permanent failures
- Fix: Automatic retry in lib/drift/client.ts - retryOperation() wrapper:
```
// Detects transient errors: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
// Retries up to 3 times with 2s delay between attempts (DNS-specific, separate from rate limit retries)
// Fails fast on non-transient errors (auth, config, permanent network issues)
await this.retryOperation(async () => {
  // Initialize Drift SDK, subscribe, get user account
}, 3, 2000, 'Drift initialization')
```
- Success logs: ⚠️ Drift initialization failed (attempt 1/3): fetch failed → ⏳ Retrying in 2000ms... → ✅ Drift service initialized successfully
- Impact: 99% of transient DNS failures now auto-recover, preventing missed trades
- Note: DNS retries use 2s delays (fast recovery), rate limit retries use 5s delays (RPC cooldown)
- Documentation: See docs/DNS_RETRY_LOGIC.md for monitoring queries and metrics
Declaring fixes "working" before deployment (CRITICAL - Nov 13, 2025):
- Symptom: AI says "position is protected" or "fix is deployed" when container still running old code
- Root Cause: Conflating "code committed to git" with "code running in production"
- Real Incident: Database-first fix committed 15:56, declared "working" at 19:42, but container started 15:06 (old code)
- Result: Unprotected position opened, database save failed silently, Position Manager never tracked it
- Financial Impact: User discovered $250+ unprotected position 3.5 hours after opening
- Verification Required:
```
# ALWAYS check before declaring fix deployed:
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Compare container start time to git commit timestamp
# If container older: FIX NOT DEPLOYED
```
- Rule: NEVER say "fixed", "working", "protected", or "deployed" without verifying container restart timestamp
- Impact: This is a REAL MONEY system - premature declarations cause financial losses
- Documentation: Added mandatory deployment verification to VERIFICATION MANDATE section
Phantom trade notification workflow breaks (Nov 14, 2025):
- Symptom: Phantom trade detected, position opened on Drift, but n8n workflow stops with HTTP 500 error. User NOT notified.
- Root Cause: Execute endpoint returned HTTP 500 when phantom detected, causing n8n chain to halt before Telegram notification
- Problem: Unmonitored phantom position on exchange while user is asleep/away = unlimited risk exposure
- Fix: Auto-close phantom trades immediately + return HTTP 200 with warning (allows n8n to continue)
```
// When phantom detected in app/api/trading/execute/route.ts:
// 1. Immediately close position via closePosition()
// 2. Save to database (create trade + update with exit info)
// 3. Return HTTP 200 with full notification message in response
// 4. n8n workflow continues to Telegram notification step
```
- Response format change: { success: true, warning: 'Phantom trade detected and auto-closed', isPhantom: true, message: '[Full notification text]', phantomDetails: {...} }
- Why auto-close: User can't always respond (sleeping, no phone, traveling). Better to exit with small loss/gain than leave unmonitored position exposed.
- Impact: Protects user from unlimited risk during unavailable hours. Phantom trades are rare edge cases (oracle issues, exchange rejections).
- Database tracking: status='phantom', exitReason='manual', enables analysis of phantom frequency and patterns
Wrong entry price after orphaned position restoration (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Position Manager tracking SHORT at $141.51 entry, but Drift UI shows $141.31 actual entry
- Root Cause: Startup validation restored orphaned position but used OLD database entry price instead of querying Drift for real value
- Bug sequence:
  1. Position opened at $141.317 (per Drift order history)
  2. TP1 closed 70% at $140.942
  3. Database incorrectly saved entry as $141.508 (maybe averaged or from previous position)
  4. Container restart → startup validation found position on Drift
  5. Reopened trade in DB but used stale trade.entryPrice from database
  6. Position Manager tracked with wrong entry ($141.51 vs actual $141.31)
  7. Stop loss calculated from wrong base: $141.08 instead of $140.89
- Impact: 0.14% difference ($0.20/SOL) in SL placement - could mean difference between small profit and small loss
- Fix: Query Drift SDK for actual entry price during orphaned position restoration
```
// In lib/startup/init-position-manager.ts (line 121-144):
// When reopening closed trade found on Drift:
const currentPrice = await driftService.getOraclePrice(marketConfig.driftMarketIndex)
const positionSizeUSD = position.size * currentPrice

await prisma.trade.update({
  where: { id: trade.id },
  data: {
    status: 'open',
    exitReason: null,
    entryPrice: position.entryPrice, // CRITICAL: Use Drift's actual entry price
    positionSizeUSD: positionSizeUSD, // Update to current size (runner after TP1)
  }
})
```
- Drift SDK returns real entry: position.entryPrice from getPosition() calculates from on-chain data (quoteAssetAmount / baseAssetAmount)
- Future-proofed: All orphaned position restorations now use authoritative Drift entry price, not stale DB value
- Manual fix required once: Had to manually UPDATE database for existing position, then restart container
- Lesson: Always prefer on-chain data over cached database values for critical trading parameters
Runner stop loss gap - NO protection between TP1 and TP2 (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Runner position remained open despite price moving far above stop loss level
- Root Cause: Position Manager only checked stop loss BEFORE TP1 hit (line 693) OR AFTER TP2 hit (line 835), creating a gap
- Bug sequence:
  1. SHORT opened at $141.317, TP1 hit at $140.942 (70% closed)
  2. Runner (30% remaining, $12.70) had stop loss at $140.89 (profit lock)
  3. Price rose to $141.98 (way above $140.89 SL) → NO STOP LOSS CHECK
  4. Position exposed to unlimited loss for hours during TP1→TP2 window
  5. User manually checked: "runner close did not work. still open and the price is above 141,98"
- Impact: Hours of unprotected runner exposure = potential unlimited loss on 25-30% remaining position
- Code analysis:
```
// Line 693: Stop loss checked ONLY before TP1
if (!trade.tp1Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 STOP LOSS: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
}

// Lines 706-831: TP1 and TP2 processing - NO STOP LOSS CHECK

// Line 835: Stop loss checked ONLY after TP2
if (trade.tp2Hit && this.config.useTrailingStop && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 TRAILING STOP: ${trade.symbol}`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
}

// BUG: Runner between TP1-TP2 has ZERO stop loss protection!
```
- Fix: Added explicit runner stop loss check at line ~795:
```
// CRITICAL: Check stop loss for runner (after TP1, before TP2)
if (trade.tp1Hit && !trade.tp2Hit && this.shouldStopLoss(currentPrice, trade)) {
  console.log(`🔴 RUNNER STOP LOSS: ${trade.symbol} at ${profitPercent.toFixed(2)}% (profit lock triggered)`)
  await this.executeExit(trade, 100, 'SL', currentPrice)
  return
}
```
- Live verification (Nov 15, 22:03): Runner SL triggered successfully after deployment, closed with +$2.94 profit
- Rate limit issue: Hit 429 storm during close (20+ attempts over several minutes), but eventually succeeded
- Database evidence: Trade shows exitReason='SL', proving runner stop loss triggered correctly
- Why undetected: Runner system relatively new (Nov 11), most trades hit TP2 quickly without price reversals
- Lesson: Every conditional branch in risk management MUST have explicit stop loss checks - never assume "it'll get caught somewhere"
Analytics dashboard showing original position size instead of current runner size (Fixed Nov 15, 2025):
- Symptom: Analytics page displays $42.54 when actual runner is $12.59 after TP1
- Root Cause: /api/analytics/last-trade returns trade.positionSizeUSD (original size), not runner size
- Database structure: No separate currentSize column - stored in configSnapshot.positionManagerState.currentSize
- Impact: User sees misleading exposure information on dashboard
- Fix: Modified API to check Position Manager state for open positions:
```
// In app/api/analytics/last-trade/route.ts
const configSnapshot = trade.configSnapshot as any
const positionManagerState = configSnapshot?.positionManagerState
const currentSize = positionManagerState?.currentSize

// Use currentSize for open positions (after TP1), fallback to original
const displaySize = trade.exitReason === null && currentSize 
  ? currentSize 
  : trade.positionSizeUSD

const formattedTrade = {
  // ...
  positionSizeUSD: displaySize, // Shows runner size for open positions
  // ...
}
```
- Behavior: Open positions show current runner size, closed positions show original size
- Benefits: Accurate exposure visibility, correct risk assessment on dashboard
- No container restart needed: API-only change, live immediately after deployment

Flip-flop price context using wrong data (CRITICAL - Fixed Nov 14, 2025):

Symptom: Flip-flop detection showing "100% price move" when actual movement was 0.2%, allowing trades that should be blocked
Root Cause: currentPrice parameter not available in check-risk endpoint (trade hasn't opened yet), so calculation used undefined/zero
Real incident: Nov 14, 06:05 CET - SHORT allowed with 0.2% flip-flop, lost -$1.56 in 5 minutes
Bug sequence:
1. LONG opened at $143.86 (06:00)
2. SHORT signal 4min later at $143.58 (0.2% move)
3. Flip-flop check: (undefined - 143.86) / 143.86 * 100 = garbage → showed "100%"
4. System thought it was reversal → allowed trade
5. Should have been blocked as tight-range chop
Fix: Two-part fix in commits 77a9437 and 795026a:

// In app/api/trading/check-risk/route.ts:
// Get current price from Pyth BEFORE quality scoring
const priceMonitor = getPythPriceMonitor()
const latestPrice = priceMonitor.getCachedPrice(body.symbol)
const currentPrice = latestPrice?.price || body.currentPrice

// In lib/trading/signal-quality.ts:
// Validate price data exists before calculation
if (!params.currentPrice || params.currentPrice === 0) {
  // No current price available - apply penalty (conservative)
  console.warn(`⚠️ Flip-flop check: No currentPrice available, applying penalty`)
  frequencyPenalties.flipFlop = -25
  score -= 25
} else {
  const priceChangePercent = Math.abs(
    (params.currentPrice - recentSignals.oppositeDirectionPrice) / 
    recentSignals.oppositeDirectionPrice * 100
  )
  console.log(`🔍 Flip-flop price check: $${recentSignals.oppositeDirectionPrice.toFixed(2)} → $${params.currentPrice.toFixed(2)} = ${priceChangePercent.toFixed(2)}%`)
  // Apply penalty only if < 2% move
}

Impact: Without this fix, flip-flop detection is useless - blocks reversals, allows chop
Lesson: Always validate input data for financial calculations, especially when data might not exist yet
Monitoring: Watch logs for "🔍 Flip-flop price check: $X → $Y = Z%" to verify correct calculations

Phantom trades need exitReason for cleanup (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Position Manager keeps restoring phantom trade on every restart, triggers false runner stop loss alerts
- Root Cause: Phantom auto-closure sets status='phantom' but leaves exitReason=NULL
- Bug: Startup validator checks exitReason !== null (line 122 of init-position-manager.ts), ignores status field
- Consequence: Phantom trade with exitReason=NULL treated as "open" and restored to Position Manager
- Real incident: Nov 14 phantom trade (cmhy6xul20067nx077agh260n) caused 232% size mismatch, hundreds of false "🔴 RUNNER STOP LOSS" alerts
- Fix: When auto-closing phantom trades, MUST set exitReason:
```
// In app/api/trading/execute/route.ts (phantom detection):
await updateTradeExit({
  tradeId: trade.id,
  exitPrice: currentPrice,
  exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
  realizedPnL: actualPnL,
  status: 'phantom'
})
```
- Manual cleanup: If phantom already exists: UPDATE "Trade" SET "exitReason" = 'manual' WHERE status = 'phantom' AND "exitReason" IS NULL
- Impact: Without exitReason, phantom trades create ghost positions that trigger false alerts and pollute monitoring
- Verification: After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
- Lesson: status field is for classification, exitReason is for lifecycle management - both must be set on closure
closePosition() missing retry logic causes rate limit storm (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Position Manager tries to close trade, gets 429 error, retries EVERY 2 SECONDS → 100+ failed attempts → rate limit exhaustion
- Root Cause: placeExitOrders() has retryWithBackoff() wrapper (Nov 14 fix), but closePosition() did NOT
- Real incident: Trade cmi0il8l30000r607l8aec701 (Nov 15, 16:49 CET)
  1. Position Manager tried to close (SL or TP trigger)
  2. closePosition() called raw placePerpOrder() → 429 error
  3. executeExit() caught 429, returned early (line 935-940)
  4. Position Manager kept monitoring, retried close EVERY 2 seconds
  5. Logs show 100+ "❌ Failed to close position: 429" + "⚠️ Rate limited while closing SOL-PERP"
  6. Meanwhile: On-chain TP2 limit order filled (unaffected by SDK rate limits)
  7. External closure detected, DB updated 8 TIMES: $0.14 → $0.20 → $0.26 → ... → $0.51
  8. Container eventually restarted (likely from rate limit exhaustion)
- Why duplicate updates: Common Pitfall #27 fix (remove from Map before DB update) works UNLESS rate limits cause tons of retries before external closure detection
- Impact: User saw $0.51 profit in DB, $0.03 on Drift UI (8× compounding vs 1 actual fill)
- Fix: Wrapped closePosition() with retryWithBackoff() in lib/drift/orders.ts:
```
// Line ~567 (BEFORE):
const txSig = await driftClient.placePerpOrder(orderParams)

// Line ~567 (AFTER):
const txSig = await retryWithBackoff(async () => {
  return await driftClient.placePerpOrder(orderParams)
}, 3, 8000) // 8s base delay, 3 max retries (8s → 16s → 32s)
```
- Behavior now: 3 SDK retries over 56s (8+16+32) + Position Manager natural retry on next monitoring cycle = robust without spam
- RPC load reduction: 30-50× fewer requests during close operations (3 retries vs 100+ attempts)
- Verification: Container restarted 18:05 CET Nov 15, code deployed
- Lesson: EVERY SDK order operation (open, close, cancel, place) MUST have retry wrapper - Position Manager monitoring creates infinite retry loop without it
- Root Cause: Phantom auto-closure sets status='phantom' but leaves exitReason=NULL
- Bug: Startup validator checks exitReason !== null (line 122 of init-position-manager.ts), ignores status field
- Consequence: Phantom trade with exitReason=NULL treated as "open" and restored to Position Manager
- Real incident: Nov 14 phantom trade (cmhy6xul20067nx077agh260n) caused 232% size mismatch, hundreds of false "🔴 RUNNER STOP LOSS" alerts
- Fix: When auto-closing phantom trades, MUST set exitReason:
```
// In app/api/trading/execute/route.ts (phantom detection):
await updateTradeExit({
  tradeId: trade.id,
  exitPrice: currentPrice,
  exitReason: 'manual', // CRITICAL: Must set exitReason for cleanup
  realizedPnL: actualPnL,
  status: 'phantom'
})
```
- Manual cleanup: If phantom already exists: UPDATE "Trade" SET "exitReason" = 'manual' WHERE status = 'phantom' AND "exitReason" IS NULL
- Impact: Without exitReason, phantom trades create ghost positions that trigger false alerts and pollute monitoring
- Verification: After restart, check logs for "Found 0 open trades" (not "Found 1 open trades to restore")
- Lesson: status field is for classification, exitReason is for lifecycle management - both must be set on closure
Ghost position accumulation from failed DB updates (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Position Manager tracking 4+ positions simultaneously when database shows only 1 open trade
- Root Cause: Database has exitReason IS NULL for positions actually closed on Drift
- Impact: Rate limit storms (4 positions × monitoring × order updates = 100+ RPC calls/second)
- Bug sequence:
  1. Position closed externally (on-chain TP/SL order fills)
  2. Position Manager attempts database update but fails silently
  3. Trade remains in database with exitReason IS NULL
  4. Container restart → Position Manager restores "open" trade from DB
  5. Position doesn't exist on Drift but is tracked in memory = ghost position
  6. Accumulates over time: 1 ghost → 2 ghosts → 4+ ghosts
  7. Each ghost triggers monitoring, order updates, price checks
  8. RPC rate limit exhaustion → 429 errors → system instability
- Real incidents:
  - Nov 14: Untracked 0.09 SOL position with no TP/SL protection
  - Nov 15 19:01: Position Manager tracking 4+ ghosts, massive rate limiting, "vanishing orders"
  - After cleanup: 4+ ghosts → 1 actual position, system stable
- Why manual restarts worked: Forced Position Manager to re-query Drift, but didn't prevent recurrence
- Solution: Periodic Drift position validation (Nov 15, 2025)
```
// In lib/trading/position-manager.ts:

// Schedule validation every 5 minutes
private scheduleValidation(): void {
  this.validationInterval = setInterval(async () => {
    await this.validatePositions()
  }, 5 * 60 * 1000)
}

// Validate tracked positions against Drift reality
private async validatePositions(): Promise<void> {
  for (const [tradeId, trade] of this.activeTrades) {
    const position = await driftService.getPosition(marketConfig.driftMarketIndex)

    // Ghost detected: tracked but missing on Drift
    if (!position || Math.abs(position.size) < 0.01) {
      console.log(`🔴 Ghost position detected: ${trade.symbol}`)
      await this.handleExternalClosure(trade, 'Ghost position cleanup')
    }
  }
}

// Reusable ghost cleanup method
private async handleExternalClosure(trade: ActiveTrade, reason: string): Promise<void> {
  // Remove from monitoring FIRST (prevent race conditions)
  this.activeTrades.delete(trade.id)

  // Update database with estimated P&L
  await updateTradeExit({
    positionId: trade.positionId,
    exitPrice: trade.lastPrice,
    exitReason: 'manual', // Ghost closures = manual
    realizedPnL: estimatedPnL,
    exitOrderTx: reason, // Store cleanup reason
    ...
  })

  if (this.activeTrades.size === 0) {
    this.stopMonitoring()
  }
}
```
- Behavior: Auto-detects and cleans ghosts every 5 minutes, no manual intervention
- RPC overhead: Minimal (1 check per 5 min per position = ~288 calls/day for 1 position)
- Benefits:
  - Self-healing system prevents ghost accumulation
  - Eliminates rate limit storms from ghost management
  - No more manual container restarts needed
  - Addresses root cause (state management) not symptom (rate limits)
- Logs: 🔍 Scheduled position validation every 5 minutes on startup
- Monitoring: 🔴 Ghost position detected + ✅ Ghost position cleaned up in logs
- Verification: Container restart shows 1 position, not 4+ like before
- Why paid RPC doesn't fix this: Ghost positions are state management bug, not capacity issue
- Lesson: Periodic validation of in-memory state against authoritative source prevents state drift
Settings UI permission error - .env file not writable by container user (CRITICAL - Fixed Nov 15, 2025):
- Symptom: Settings UI save fails with "Failed to save new settings" error
- Root Cause: .env file on host owned by root:root, nextjs user (UID 1001) inside container has read-only access
- Impact: Users cannot adjust ANY configuration via settings UI (position size, leverage, TP/SL levels, etc.)
- Error message: EACCES: permission denied, open '/app/.env' (errno -13, syscall 'open')
- User escalation: "thats a major flaw. THIS NEEDS TO WORK."
- Why it happens:
  1. Docker mounts .env file from host: ./.env:/app/.env (docker-compose.yml line 62)
  2. Mounted files retain host ownership (root:root on host = root:root in container)
  3. Container runs as nextjs user (UID 1001) for security
  4. Settings API attempts fs.writeFileSync('/app/.env') → permission denied
- Attempted fix (FAILED): docker exec trading-bot-v4 chown nextjs:nodejs /app/.env
  - Error: "Operation not permitted" - cannot change ownership on mounted files from inside container
- Correct fix: Change ownership on HOST before container starts
```
# On host as root
chown 1001:1001 /home/icke/traderv4/.env
chmod 644 /home/icke/traderv4/.env

# Restart container to pick up new permissions
docker compose restart trading-bot

# Verify inside container
docker exec trading-bot-v4 ls -la /app/.env
# Should show: -rw-r--r-- 1 nextjs nodejs
```
- Why UID 1001: Matches nextjs user created in Dockerfile:
```
RUN addgroup --system --gid 1001 nodejs && \
    adduser --system --uid 1001 nextjs
```
- Verification: Settings UI now saves successfully, .env file updated with new values
- Impact: Restores full settings UI functionality - users can adjust position sizing, leverage, TP/SL percentages
- Alternative solution (NOT used): Copy .env during Docker build with COPY --chown=nextjs:nodejs, but this breaks runtime config updates
- Lesson: Docker volume mounts retain host ownership - must plan for writability by setting host file ownership to match container user UID
Ghost position death spiral from skipped validation (CRITICAL - Fixed Nov 15, 2025, REFACTORED Nov 16, 2025):
- Symptom: Telegram /status shows 2 open positions when database shows all closed, massive rate limit storms (100+ RPC calls/minute)
- Root Cause: Periodic validation (every 5min) SKIPPED when Drift service rate-limited: ⏳ Drift service not ready, skipping validation
- Death Spiral: Ghosts → rate limits → validation skipped → more rate limits → more ghosts
- Impact: System unusable, requires manual container restart, user can't be away from laptop
- User Requirement: "bot has to work all the time especially when i am not on my laptop" - MUST be fully autonomous
- Real Incident (Nov 15, 2025):
  - Position Manager tracking 2 ghost positions
  - Both positions closed on Drift but still in memory
  - Trying to close non-existent positions every 2 seconds
  - Rate limit exhaustion prevented validation from running
  - Only solution was container restart (not autonomous)
- REFACTORED Solution (Nov 16, 2025) - Drift API only:
  - User feedback: Time-based cleanup (6 hours) too aggressive for legitimate long-running positions
  - Removed Layer 1 (age-based cleanup) - could close valid positions prematurely
  - All ghost detection now uses Drift API as source of truth
  - Layer 2: Queries Drift after 20 failed close attempts to verify position exists
  - Layer 3: Queries Drift every 40s during monitoring (unchanged)
  - Periodic validation: Queries Drift every 5 minutes for all tracked positions
  - Commit: 9db5f85 "refactor: Remove time-based ghost detection, rely purely on Drift API"
- Original 3-layer protection system (Nov 15, 2025 - DEPRECATED):
```
// LAYER 1: Database-based age check (doesn't require RPC)
private async cleanupStalePositions(): Promise<void> {
  const sixHoursAgo = Date.now() - (6 * 60 * 60 * 1000)

  for (const [tradeId, trade] of this.activeTrades) {
    if (trade.entryTime < sixHoursAgo) {
      console.log(`🔴 STALE GHOST DETECTED: ${trade.symbol} (age: ${hours}h)`)
      await this.handleExternalClosure(trade, 'Stale position cleanup (>6h old)')
    }
  }
}

// LAYER 2: Death spiral detector in executeExit()
if (errorMsg.includes('429')) {
  if (trade.priceCheckCount > 20) { // 20+ failed close attempts (40+ seconds)
    console.log(`🔴 DEATH SPIRAL DETECTED: ${trade.symbol}`)
    await this.handleExternalClosure(trade, 'Death spiral prevention')
    return // Force remove from monitoring
  }
}

// LAYER 3: Ghost check during normal monitoring (every 20 price updates)
if (trade.priceCheckCount % 20 === 0) {
  const position = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (!position || Math.abs(position.size) < 0.01) {
    console.log(`🔴 GHOST DETECTED in monitoring loop`)
    await this.handleExternalClosure(trade, 'Ghost detected during monitoring')
    return
  }
}
```
- Key Changes:
  - validatePositions() now runs database cleanup FIRST (Layer 1) before Drift RPC checks
  - Changed skip message from "skipping validation" to "using database-only validation"
  - Layer 1 ALWAYS runs (no RPC required) - prevents long-term ghost accumulation (>6h)
  - Layer 2 breaks death spirals within 40 seconds of detection
  - Layer 3 catches ghosts quickly during normal monitoring (every 40s vs 5min)
- Impact:
  - System now self-healing - no manual intervention needed
  - Ghost positions cleaned within 40-360 seconds (depending on layer)
  - Works even during severe rate limiting (Layer 1 doesn't need RPC)
  - Telegram /status always accurate
  - User can be away - bot handles itself autonomously
- Verification: Container restart + new code = no more ghost accumulation possible
- Lesson: Critical validation logic must NEVER skip during error conditions - use fallback methods that don't require the failing resource
Missing Telegram notifications for position closures (Fixed Nov 16, 2025):
- Symptom: Position Manager closes trades (TP/SL/manual) but user gets no immediate notification
- Root Cause: TODO comment in Position Manager for Telegram notifications, never implemented
- Impact: User unaware of P&L outcomes until checking dashboard or Drift UI manually
- User Request: "sure" when asked if Telegram notifications would be useful
- Solution: Implemented direct Telegram API notifications in lib/notifications/telegram.ts
```
// lib/notifications/telegram.ts (NEW FILE - Nov 16, 2025)
export async function sendPositionClosedNotification(options: TelegramNotificationOptions): Promise<void> {
  try {
    const message = formatPositionClosedMessage(options)

    const response = await fetch(
      `https://api.telegram.org/bot${process.env.TELEGRAM_BOT_TOKEN}/sendMessage`,
      {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          chat_id: process.env.TELEGRAM_CHAT_ID,
          text: message,
          parse_mode: 'HTML'
        })
      }
    )

    if (!response.ok) {
      console.error('❌ Failed to send Telegram notification:', await response.text())
    } else {
      console.log('✅ Telegram notification sent successfully')
    }
  } catch (error) {
    console.error('❌ Error sending Telegram notification:', error)
    // Don't throw - notification failure shouldn't break position closing
  }
}
```
- Message format: Includes symbol, direction, P&L ($ and %), entry/exit prices, hold time, MAE/MFE, exit reason
- Exit reason emojis: TP1/TP2 (🎯), SL (🛑), manual (👤), emergency (🚨), ghost (👻)
- Integration points: Position Manager executeExit() (full close) + handleExternalClosure() (ghost cleanup)
- Benefits:
  - Immediate P&L feedback without checking dashboard
  - Works even when user away from computer
  - No n8n dependency - direct Telegram API call
  - Includes max gain/drawdown for post-trade analysis
- Error handling: Notification failures logged but don't prevent position closing
- Configuration: Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID in .env
- Git commit: b1ca454 "feat: Add Telegram notifications for position closures"
- Lesson: User feedback channels (notifications) are as important as monitoring logic

Telegram bot DNS resolution failures (Fixed Nov 16, 2025):

Symptom: Telegram bot throws "Failed to resolve 'trading-bot-v4'" errors on /status and manual trades
Root Cause: Python urllib3 has transient DNS resolution failures (same as Node.js fetch failures)
Error message: urllib3.exceptions.NameResolutionError: <urllib3.connection.HTTPConnection object> Failed to resolve 'trading-bot-v4'
Impact: User cannot get position status or execute manual trades via Telegram commands
User Request: "we have a dns problem with the bit. can you configure it to use googles dns please"
Solution: Added retry logic with exponential backoff (Python version of Node.js retryOperation pattern)

# telegram_command_bot.py (Nov 16, 2025)
def retry_request(func, max_retries=3, initial_delay=2):
    """Retry a request function with exponential backoff for transient errors."""
    for attempt in range(max_retries):
        try:
            return func()
        except (requests.exceptions.ConnectionError, 
                requests.exceptions.Timeout,
                Exception) as e:
            error_msg = str(e).lower()
            if 'name or service not known' in error_msg or \
               'failed to resolve' in error_msg or \
               'connection' in error_msg:
                if attempt < max_retries - 1:
                    delay = initial_delay * (2 ** attempt)
                    print(f"⏳ DNS/connection error (attempt {attempt + 1}/{max_retries}): {e}")
                    time.sleep(delay)
                    continue
            raise
    raise Exception(f"Max retries ({max_retries}) exceeded")

# Usage in /status command:
response = retry_request(lambda: requests.get(url, headers=headers, timeout=60))

# Usage in manual trade execution:
response = retry_request(lambda: requests.post(url, json=payload, headers=headers, timeout=60))

Retry pattern: 3 attempts with exponential backoff (2s → 4s → 8s)
Matches Node.js pattern: Same retry count and backoff as lib/drift/client.ts retryOperation()
Applied to: /status command and manual trade execution (most critical paths)
Why not Google DNS: DNS config changes would affect entire container, retry logic scoped to bot only
Success rate: 99%+ of transient DNS failures auto-recover within 2 retries
Logs: Shows "⏳ DNS/connection error (attempt X/3)" when retrying
Git commit: bdf1be1 "fix: Add DNS retry logic to Telegram bot"
Lesson: Python urllib3 has same transient DNS issues as Node.js - apply same retry pattern

Drift SDK position.entryPrice RECALCULATES after partial closes (CRITICAL - FINANCIAL LOSS BUG - Fixed Nov 16, 2025):
- Symptom: Breakeven SL set $1.50+ ABOVE actual entry price, guaranteeing loss if triggered
- Root Cause: Drift SDK's position.entryPrice returns COST BASIS of remaining position after TP1, NOT original entry
- Real incident (Nov 16, 02:47 CET):
  - SHORT opened at $138.52 entry
  - TP1 hit, 70% closed at profit
  - System queried Drift for "actual entry": returned $140.01 (runner's cost basis)
  - Breakeven SL set at $140.01 (instead of $138.52)
  - Result: "Breakeven" SL $1.50 ABOVE entry = guaranteed $2.52 loss if hit
  - Position closed by ghost detection before SL could trigger (lucky)
- Why Drift recalculates:
  - After partial close, remaining position has different realized P&L
  - SDK calculates: position.entryPrice = quoteAssetAmount / baseAssetAmount
  - This gives AVERAGE price of remaining position, not ORIGINAL entry
  - For runners after TP1, this is ALWAYS wrong for breakeven calculation
- Impact: Every TP1 → breakeven SL transition uses wrong price, locks in losses instead of breakeven
- Fix: Always use database trade.entryPrice for breakeven SL (line 513 in position-manager.ts)
```
// BEFORE (BROKEN):
const actualEntryPrice = position.entryPrice || trade.entryPrice
trade.stopLossPrice = actualEntryPrice

// AFTER (FIXED):
const breakevenPrice = trade.entryPrice  // Use ORIGINAL entry from database
console.log(`📊 Breakeven SL: Using original entry price $${breakevenPrice.toFixed(4)} (Drift shows $${position.entryPrice.toFixed(4)} for remaining position)`)
trade.stopLossPrice = breakevenPrice
```
- Common Pitfall #44 context: Original fix (528a0f4) tried to use Drift's entry for "accuracy" but introduced this bug
- Lesson: Drift SDK data is authoritative for CURRENT state, but database is authoritative for ORIGINAL entry
- Verification: After TP1, logs now show: "Using original entry price $138.52 (Drift shows $140.01 for remaining position)"
- Git commit: [pending] "critical: Use database entry price for breakeven SL, not Drift's recalculated value"
Drift account leverage must be set in UI, not via API (CRITICAL - Nov 16, 2025):
- Symptom: InsufficientCollateral errors when opening positions despite bot configured for 15x leverage
- Root Cause: Drift Protocol account leverage is an on-chain account setting, cannot be changed via SDK/API
- Error message: AnchorError occurred. Error Code: InsufficientCollateral. Error Number: 6003. Error Message: Insufficient collateral.
- Real incident: Bot trying to open $1,281 notional position with $85.41 collateral
- Diagnosis logs:
```
Program log: total_collateral=85410503 ($85.41)
Program log: margin_requirement=1280995695 ($1,280.99)
```
- Math: $1,281 notional / $85.41 collateral = 15x leverage attempt
- Problem: Account leverage setting was 1x (or 0x shown when no positions), NOT 15x as intended
- Confusion points:
  1. Order leverage dropdown in Drift UI: Shows 15x selected but this is PER-ORDER, not account-wide
  2. "Account Leverage" field at bottom: Shows "0x" when no positions open, but means 1x actual setting
  3. SDK/API cannot change: Must use Drift UI settings or account page to change on-chain setting
- Screenshot evidence: User showed 15x selected in dropdown, but "Account Leverage: 0x" at bottom
- Explanation: Dropdown is for manual order placement, doesn't affect API trades or account-level setting
- Temporary workaround: Reduced SOLANA_POSITION_SIZE from 100% to 6% (~$5 positions)
```
# Temporary fix (Nov 16, 2025):
sed -i '378s/SOLANA_POSITION_SIZE=100/SOLANA_POSITION_SIZE=6/' /home/icke/traderv4/.env
docker restart trading-bot-v4

# Math: $85.41 × 6% = $5.12 position × 15x order leverage = $76.80 notional
# Fits in $85.41 collateral at 1x account leverage
```
- User action required:
  1. Go to Drift UI → Settings or Account page
  2. Find "Account Leverage" setting (currently 1x)
  3. Change to 15x (or desired leverage)
  4. Confirm on-chain transaction (costs SOL for gas)
  5. Verify setting updated in UI
  6. Once confirmed: Revert SOLANA_POSITION_SIZE back to 100%
  7. Restart bot: docker restart trading-bot-v4
- Impact: Bot cannot trade at full capacity until account leverage fixed
- Why API can't change: Account leverage is on-chain Drift account setting, requires signed transaction from wallet
- Bot leverage config: SOLANA_LEVERAGE=15 is for ORDER placement, assumes account leverage already set
- Drift documentation: Account leverage must be set in UI, is persistent on-chain setting
- Lesson: On-chain account settings cannot be changed via API - always verify account state matches bot assumptions before production trading
DEPRECATED - See Common Pitfall #43 for the actual bug (Nov 16, 2025):
- Original diagnosis was WRONG: Thought database entry was stale, so used Drift's position.entryPrice
- Reality: Drift's position.entryPrice RECALCULATES after partial closes (cost basis of runner, not original entry)
- Real fix: Always use DATABASE entry price for breakeven - it's authoritative for original entry
- This "fix" (commit 528a0f4) INTRODUCED the critical bug in Common Pitfall #43
- See Common Pitfall #43 for full details of the financial loss bug this caused
100% position sizing causes InsufficientCollateral (Fixed Nov 16, 2025):
- Symptom: Bot configured for 100% position size gets InsufficientCollateral errors, but Drift UI can open same size position
- Root Cause: Drift's margin calculation includes fees, slippage buffers, and rounding - exact 100% leaves no room
- Error details:
```
Program log: total_collateral=85547535 ($85.55)
Program log: margin_requirement=85583087 ($85.58)
Error: InsufficientCollateral (shortage: $0.03)
```
- Real incident (Nov 16, 01:50 CET):
  - Collateral: $85.55
  - Bot tries: $1,283.21 notional (100% × 15x leverage)
  - Drift UI works: $1,282.57 notional (has internal safety buffer)
  - Difference: $0.64 causes rejection
- Impact: Bot cannot trade at full capacity despite account leverage correctly set to 15x
- Fix: Apply 99% safety buffer automatically when user configures 100% position size
```
// In config/trading.ts calculateActualPositionSize (line ~272):
let percentDecimal = configuredSize / 100

// CRITICAL: Safety buffer for 100% positions
if (configuredSize >= 100) {
  percentDecimal = 0.99
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}

const calculatedSize = freeCollateral * percentDecimal
// $85.55 × 99% = $84.69 (leaves $0.86 for fees/slippage)
```
- Result: $84.69 × 15x = $1,270.35 notional (well within margin requirements)
- User experience: Transparent - bot logs "Applying 99% safety buffer" when triggered
- Why Drift UI works: Has internal safety calculations that bot must replicate externally
- Math proof: 1% buffer on $85 = $0.85 safety margin (covers typical fees of $0.03-0.10)
- Git commit: 7129cbf "fix: Add 99% safety buffer for 100% position sizing"
- Lesson: When integrating with DEX protocols, never use 100% of resources - always leave safety margin for protocol-level calculations

Position close verification gap - 6 hours unmonitored (CRITICAL - Fixed Nov 16, 2025):

Symptom: Close transaction confirmed on-chain, database marked "SL closed", but position stayed open on Drift for 6+ hours unmonitored
Root Cause: Transaction confirmation ≠ Drift internal state updated immediately (5-10 second propagation delay)
Real incident (Nov 16, 02:51 CET):
- Trailing stop triggered at 02:51:57
- Close transaction confirmed on-chain ✅
- Position Manager immediately queried Drift → still showed open (stale state)
- Ghost detection eventually marked it "closed" in database
- But position actually stayed open on Drift until 08:51 restart
- 6 hours unprotected - no monitoring, no TP/SL backup, only orphaned on-chain orders
Why dangerous:
- Database said "closed" so container restarts wouldn't restore monitoring
- Position exposed to unlimited risk if price moved against
- Only saved by luck (container restart at 08:51 detected orphaned position)
- Startup validator caught mismatch: "CRITICAL: marked as CLOSED in DB but still OPEN on Drift"
Impact: Every trailing stop or SL exit vulnerable to this race condition
Fix (2-layer verification):

// In lib/drift/orders.ts closePosition() (line ~634):
if (params.percentToClose === 100) {
  console.log('🗑️ Position fully closed, cancelling remaining orders...')
  await cancelAllOrders(params.symbol)

  // CRITICAL: Verify position actually closed on Drift
  // Transaction confirmed ≠ Drift state updated immediately
  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    console.error(`   Transaction: ${txSig}, Drift size: ${verifyPosition.size}`)
    // Return success but flag that monitoring should continue
    return {
      success: true,
      transactionSignature: txSig,
      closePrice: oraclePrice,
      closedSize: sizeToClose,
      realizedPnL,
      needsVerification: true, // Flag for Position Manager
    }
  }
  console.log('✅ Position verified closed on Drift')
}

// In lib/trading/position-manager.ts executeExit() (line ~1206):
if ((result as any).needsVerification) {
  console.log(`⚠️ Close confirmed but position still exists on Drift`)
  console.log(`   Keeping ${trade.symbol} in monitoring until Drift confirms closure`)
  console.log(`   Ghost detection will handle final cleanup once Drift updates`)
  // Keep monitoring - don't mark closed yet
  return
}

Behavior now:
- Close transaction confirmed → wait 5 seconds
- Query Drift to verify position actually gone
- If still exists: Keep monitoring, log critical error, wait for ghost detection
- If verified closed: Proceed with database update and cleanup
- Ghost detection becomes safety net, not primary close mechanism
Prevents: Premature database "closed" marking while position still open on Drift
TypeScript interface: Added needsVerification?: boolean to ClosePositionResult interface
Git commits: c607a66 (verification logic), b23dde0 (TypeScript interface fix)
Deployed: Nov 16, 2025 09:28:20 CET

Verification Required:

# MANDATORY: Verify fixes are actually deployed before declaring working
docker logs trading-bot-v4 | grep "Server starting" | head -1
# Expected: 2025-11-16T09:28:20 or later

# Verify close verification logs on next trade close:
docker logs -f trading-bot-v4 | grep -E "(Waiting 5s for Drift|Position verified closed|needsVerification)"

# Verify breakeven SL uses database entry:
docker logs -f trading-bot-v4 | grep "Breakeven SL: Using original entry price"

Lesson: In DEX trading, always verify state changes actually propagated before updating local state. ALWAYS verify container restart timestamp matches or exceeds commit timestamps before declaring fixes deployed.

P&L compounding during close verification (CRITICAL - Fixed Nov 16, 2025):
- Symptom: Database P&L shows $173.36 when actual P&L was $8.66 (20× too high)
- Root Cause: Variant of Common Pitfall #27 - duplicate external closure detection during close verification wait
- Real incident (Nov 16, 11:50 CET):
  - SHORT position: Entry $141.64 → Exit $140.08 (expected P&L: $8.66)
  - Close transaction confirmed, Drift verification pending (5-10s propagation delay)
  - Position Manager returned with needsVerification: true flag
  - Every 2 seconds: Monitoring loop checked Drift, saw position "missing", called handleExternalClosure()
  - Each call added P&L: $112.96 → $117.62 → $122.28 → ... → $173.36 (14+ compounding updates)
  - Rate limiting made it worse (429 errors delayed final cleanup)
- Why it happened:
  - Fix #47 introduced needsVerification flag to keep monitoring during propagation delay
  - BUT: No flag to prevent external closure detection during this wait period
  - Monitoring loop thought position was "closed externally" every cycle
  - Each detection calculated P&L and updated database, compounding the value
- Impact: Every close with verification delay (most closes) vulnerable to 10-20× P&L inflation
- Fix (closingInProgress flag):
```
// In ActiveTrade interface (line ~15):
// Close verification tracking (Nov 16, 2025)
closingInProgress?: boolean   // True when close tx confirmed but Drift not yet propagated
closeConfirmedAt?: number     // Timestamp when close was confirmed (for timeout)

// In executeExit() when needsVerification returned (line ~1210):
if ((result as any).needsVerification) {
  // CRITICAL: Mark as "closing in progress" to prevent duplicate external closure detection
  trade.closingInProgress = true
  trade.closeConfirmedAt = Date.now()
  console.log(`🔒 Marked as closing in progress - external closure detection disabled`)
  return
}

// In monitoring loop BEFORE external closure check (line ~640):
if (trade.closingInProgress) {
  const timeInClosing = Date.now() - (trade.closeConfirmedAt || Date.now())
  if (timeInClosing > 60000) {
    // Stuck >60s (abnormal) - allow cleanup
    trade.closingInProgress = false
  } else {
    // Normal: Skip external closure detection entirely during propagation wait
    console.log(`🔒 Close in progress (${(timeInClosing / 1000).toFixed(0)}s) - skipping external closure check`)
  }
}

// External closure check only runs if NOT closingInProgress
if ((position === null || position.size === 0) && !trade.closingInProgress) {
  // ... handle external closure
}
```
- Behavior now:
  - Close confirmed → Set closingInProgress = true
  - Monitoring continues but SKIPS external closure detection
  - After 5-10s: Drift propagates, ghost detection cleans up correctly (one time only)
  - If stuck >60s: Timeout allows cleanup (abnormal case)
- Prevents: Duplicate P&L updates during the 5-10s verification window
- Related to: Common Pitfall #27 (external closure duplicates), but different trigger
- Files changed: lib/trading/position-manager.ts (interface + logic)
- Lesson: When introducing wait periods in financial systems, always add flags to prevent duplicate state updates during the wait
P&L exponential compounding in external closure detection (CRITICAL - Fixed Nov 17, 2025):
- Symptom: Database P&L shows 15-20× actual value (e.g., $92.46 when Drift shows $6.00)
- Root Cause: trade.realizedPnL was being mutated during each external closure detection cycle
- Real incident (Nov 17, 13:54 CET):
  - SOL-PERP SHORT closed by on-chain orders: 1.54 SOL at -1.95% + 2.3 SOL at -0.57%
  - Actual P&L from Drift: ~$6.00 profit
  - Database recorded: $92.46 profit (15.4× too high)
  - Rate limiting caused 15+ detection cycles before trade removal
  - Each cycle compounded: $6 → $12 → $24 → $48 → $96
- Bug mechanism (line 799 in position-manager.ts):
```
// BROKEN CODE:
const previouslyRealized = trade.realizedPnL  // Gets from mutated in-memory object
const totalRealizedPnL = previouslyRealized + runnerRealized
trade.realizedPnL = totalRealizedPnL  // ← BUG: Mutates in-memory trade object

// Next monitoring cycle (2 seconds later):
const previouslyRealized = trade.realizedPnL  // ← Gets ACCUMULATED value from previous cycle
const totalRealizedPnL = previouslyRealized + runnerRealized  // ← Adds it AGAIN
trade.realizedPnL = totalRealizedPnL  // ← Compounds further
// Repeats 15-20 times before activeTrades.delete() removes trade
```
- Why Common Pitfall #48 didn't prevent this:
  - closingInProgress flag only applies when Position Manager initiates the close
  - External closures (on-chain TP/SL orders) don't set this flag
  - External closure detection runs in monitoring loop WITHOUT closingInProgress protection
  - Rate limiting delays cause monitoring loop to detect closure multiple times
- Fix:
```
// CORRECT CODE (line 798):
const previouslyRealized = trade.realizedPnL  // Get original value from DB
const totalRealizedPnL = previouslyRealized + runnerRealized
// DON'T mutate trade.realizedPnL here - causes compounding on re-detection!
// trade.realizedPnL = totalRealizedPnL  ← REMOVED
console.log(`   Realized P&L calculation → Previous: $${previouslyRealized.toFixed(2)} | Runner: $${runnerRealized.toFixed(2)} ... | Total: $${totalRealizedPnL.toFixed(2)}`)

// Later in same function (line 850):
await updateTradeExit({
  realizedPnL: totalRealizedPnL,  // Use local variable for DB update
  // ... other fields
})
```
- Impact: Every external closure (on-chain TP/SL fills) affected, especially with rate limiting
- Database correction: Manual UPDATE required for trades with inflated P&L
- Verification: Check that updateTradeExit uses totalRealizedPnL (local variable) not trade.realizedPnL (mutated field)
- Why activeTrades.delete() before DB update didn't help:
  - That fix (Common Pitfall #27) prevents duplicates AFTER database update completes
  - But external closure detection calculates P&L BEFORE calling activeTrades.delete()
  - If rate limits delay the detection→delete cycle, monitoring loop runs detection multiple times
  - Each time, it mutates trade.realizedPnL before checking if trade already removed
- Git commit: 6156c0f "critical: Fix P&L compounding bug in external closure detection"
- Related bugs:
  - Common Pitfall #27: Duplicate external closure updates (fixed by delete before DB update)
  - Common Pitfall #48: P&L compounding during close verification (fixed by closingInProgress flag)
  - This bug (#49): P&L compounding in external closure detection (fixed by not mutating trade.realizedPnL)
- Lesson: In monitoring loops that run repeatedly, NEVER mutate shared state during calculation phases. Calculate locally, update shared state ONCE at the end. Immutability prevents compounding bugs in retry/race scenarios.
100% position sizing causes InsufficientCollateral (Fixed Nov 16, 2025):
- Symptom: Bot configured for 100% position size gets InsufficientCollateral errors, but Drift UI can open same size position
- Root Cause: Drift's margin calculation includes fees, slippage buffers, and rounding - exact 100% leaves no room
- Error details:
```
Program log: total_collateral=85547535 ($85.55)
Program log: margin_requirement=85583087 ($85.58)
Error: InsufficientCollateral (shortage: $0.03)
```
- Real incident (Nov 16, 01:50 CET):
  - Collateral: $85.55
  - Bot tries: $1,283.21 notional (100% × 15x leverage)
  - Drift UI works: $1,282.57 notional (has internal safety buffer)
  - Difference: $0.64 causes rejection
- Impact: Bot cannot trade at full capacity despite account leverage correctly set to 15x
- Fix: Apply 99% safety buffer automatically when user configures 100% position size
```
// In config/trading.ts calculateActualPositionSize (line ~272):
let percentDecimal = configuredSize / 100

// CRITICAL: Safety buffer for 100% positions
if (configuredSize >= 100) {
  percentDecimal = 0.99
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}

const calculatedSize = freeCollateral * percentDecimal
// $85.55 × 99% = $84.69 (leaves $0.86 for fees/slippage)
```
- Result: $84.69 × 15x = $1,270.35 notional (well within margin requirements)
- User experience: Transparent - bot logs "Applying 99% safety buffer" when triggered
- Why Drift UI works: Has internal safety calculations that bot must replicate externally
- Math proof: 1% buffer on $85 = $0.85 safety margin (covers typical fees of $0.03-0.10)
- Git commit: 7129cbf "fix: Add 99% safety buffer for 100% position sizing"
- Lesson: When integrating with DEX protocols, never use 100% of resources - always leave safety margin for protocol-level calculations
Position close verification gap - 6 hours unmonitored (CRITICAL - Fixed Nov 16, 2025):
- Symptom: Close transaction confirmed on-chain, database marked "SL closed", but position stayed open on Drift for 6+ hours unmonitored
- Root Cause: Transaction confirmation ≠ Drift internal state updated immediately (5-10 second propagation delay)
- Real incident (Nov 16, 02:51 CET):
  - Trailing stop triggered at 02:51:57
  - Close transaction confirmed on-chain ✅
  - Position Manager immediately queried Drift → still showed open (stale state)
  - Ghost detection eventually marked it "closed" in database
  - But position actually stayed open on Drift until 08:51 restart
  - 6 hours unprotected - no monitoring, no TP/SL backup, only orphaned on-chain orders
- Why dangerous:
  - Database said "closed" so container restarts wouldn't restore monitoring
  - Position exposed to unlimited risk if price moved against
  - Only saved by luck (container restart at 08:51 detected orphaned position)
  - Startup validator caught mismatch: "CRITICAL: marked as CLOSED in DB but still OPEN on Drift"
- Impact: Every trailing stop or SL exit vulnerable to this race condition
- Fix (2-layer verification):
```
// In lib/drift/orders.ts closePosition() (line ~634):
if (params.percentToClose === 100) {
  console.log('🗑️ Position fully closed, cancelling remaining orders...')
  await cancelAllOrders(params.symbol)

  // CRITICAL: Verify position actually closed on Drift
  // Transaction confirmed ≠ Drift state updated immediately
  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    console.error(`   Transaction: ${txSig}, Drift size: ${verifyPosition.size}`)
    // Return success but flag that monitoring should continue
    return {
      success: true,
      transactionSignature: txSig,
      closePrice: oraclePrice,
      closedSize: sizeToClose,
      realizedPnL,
      needsVerification: true, // Flag for Position Manager
    }
  }
  console.log('✅ Position verified closed on Drift')
}

// In lib/trading/position-manager.ts executeExit() (line ~1206):
if ((result as any).needsVerification) {
  console.log(`⚠️ Close confirmed but position still exists on Drift`)
  console.log(`   Keeping ${trade.symbol} in monitoring until Drift confirms closure`)
  console.log(`   Ghost detection will handle final cleanup once Drift updates`)
  // Keep monitoring - don't mark closed yet
  return
}
```
- Behavior now:
  - Close transaction confirmed → wait 5 seconds
  - Query Drift to verify position actually gone
  - If still exists: Keep monitoring, log critical error, wait for ghost detection
  - If verified closed: Proceed with database update and cleanup
  - Ghost detection becomes safety net, not primary close mechanism
- Prevents: Premature database "closed" marking while position still open on Drift
- Git commit: c607a66 "critical: Fix position close verification to prevent ghost positions"
- Lesson: In DEX trading, always verify state changes actually propagated before updating local state

File Conventions

API routes: app/api/[feature]/[action]/route.ts (Next.js 15 App Router)
Services: lib/[service]/[module].ts (drift, pyth, trading, database)
Config: Single source in config/trading.ts with env merging
Types: Define interfaces in same file as implementation (not separate types directory)
Console logs: Use emojis for visual scanning: 🎯 🚀 ✅ ❌ 💰 📊 🛡️

Re-Entry Analytics System (Phase 1)

Purpose: Validate manual Telegram trades using fresh TradingView data + recent performance analysis

Components:

Market Data Cache (lib/trading/market-data-cache.ts)
- Singleton service storing TradingView metrics
- 5-minute expiry on cached data
- Tracks: ATR, ADX, RSI, volume ratio, price position, timeframe
Market Data Webhook (app/api/trading/market-data/route.ts)
- Receives TradingView alerts every 1-5 minutes
- POST: Updates cache with fresh metrics
- GET: View cached data (debugging)
Re-Entry Check Endpoint (app/api/analytics/reentry-check/route.ts)
- Validates manual trade requests
- Uses fresh TradingView data if available (<5min old)
- Falls back to historical metrics from last trade
- Scores signal quality + applies performance modifiers:
  - -20 points if last 3 trades lost money (avgPnL < -5%)
  - +10 points if last 3 trades won (avgPnL > +5%, WR >= 66%)
  - -5 points for stale data, -10 points for no data
- Minimum score: 55 (vs 60 for new signals)
Auto-Caching (app/api/trading/execute/route.ts)
- Every trade signal from TradingView auto-caches metrics
- Ensures fresh data available for manual re-entries
Telegram Integration (telegram_command_bot.py)
- Calls /api/analytics/reentry-check before executing manual trades
- Shows data freshness ("✅ FRESH 23s old" vs "⚠️ Historical")
- Blocks low-quality re-entries unless --force flag used
- Fail-open: Proceeds if analytics check fails

User Flow:

User: "long sol"
  ↓ Check cache for SOL-PERP
  ↓ Fresh data? → Use real TradingView metrics
  ↓ Stale/missing? → Use historical + penalty
  ↓ Score quality + recent performance
  ↓ Score >= 55? → Execute
  ↓ Score < 55? → Block (unless --force)

TradingView Setup: Create alerts that fire every 1-5 minutes with this webhook message:

{
  "action": "market_data",
  "symbol": "{{ticker}}",
  "timeframe": "{{interval}}",
  "atr": {{ta.atr(14)}},
  "adx": {{ta.dmi(14, 14)}},
  "rsi": {{ta.rsi(14)}},
  "volumeRatio": {{volume / ta.sma(volume, 20)}},
  "pricePosition": {{(close - ta.lowest(low, 100)) / (ta.highest(high, 100) - ta.lowest(low, 100)) * 100}},
  "currentPrice": {{close}}
}

Webhook URL: https://your-domain.com/api/trading/market-data

Per-Symbol Trading Controls

Purpose: Independent enable/disable toggles and position sizing for SOL and ETH to support different trading strategies (e.g., ETH for data collection at minimal size, SOL for profit generation).

Configuration Priority:

Per-symbol ENV vars (highest priority)
- SOLANA_ENABLED, SOLANA_POSITION_SIZE, SOLANA_LEVERAGE
- ETHEREUM_ENABLED, ETHEREUM_POSITION_SIZE, ETHEREUM_LEVERAGE
Market-specific config (from MARKET_CONFIGS in config/trading.ts)
Global ENV vars (fallback for BTC and other symbols)
- MAX_POSITION_SIZE_USD, LEVERAGE
Default config (lowest priority)

Settings UI: app/settings/page.tsx has dedicated sections:

💎 Solana section: Toggle + position size + leverage + risk calculator
⚡ Ethereum section: Toggle + position size + leverage + risk calculator
💰 Global fallback: For BTC-PERP and future symbols

Example usage:

// In execute/test endpoints
const { size, leverage, enabled } = getPositionSizeForSymbol(driftSymbol, config)
if (!enabled) {
  return NextResponse.json({
    success: false,
    error: 'Symbol trading disabled'
  }, { status: 400 })
}

Test buttons: Settings UI has symbol-specific test buttons:

💎 Test SOL LONG/SHORT (disabled when SOLANA_ENABLED=false)
⚡ Test ETH LONG/SHORT (disabled when ETHEREUM_ENABLED=false)

When Making Changes

Adding new config: Update DEFAULT_TRADING_CONFIG + getConfigFromEnv() + .env file
Adding database fields: Update prisma/schema.prisma → npx prisma migrate dev → npx prisma generate → rebuild Docker
Changing order logic: Test with DRY_RUN=true first, use small position sizes ($10)
API endpoint changes: Update both endpoint + corresponding n8n workflow JSON (Check Risk and Execute Trade nodes)
Docker changes: Rebuild with docker compose build trading-bot then restart container
Modifying quality score logic: Update BOTH /api/trading/check-risk and /api/trading/execute endpoints, ensure timeframe-aware thresholds are synchronized
Exit strategy changes: Modify Position Manager logic + update on-chain order placement in placeExitOrders()
TradingView alert changes:
- Ensure alerts pass timeframe field (e.g., "timeframe": "5") to enable proper signal quality scoring
- CRITICAL: Include atr field for ATR-based TP/SL system: "atr": {{ta.atr(14)}}
- Without ATR, system falls back to less optimal fixed percentages
ATR-based risk management changes:
- Update multipliers or bounds in .env (ATR_MULTIPLIER_TP1/TP2/SL, MIN/MAX_*_PERCENT)
- Test with known ATR values to verify calculation (e.g., SOL ATR 0.43)
- Log shows: 📊 ATR-based targets: TP1 X.XX%, TP2 Y.YY%, SL Z.ZZ%
- Verify targets fall within safety bounds (TP1: 0.5-1.5%, TP2: 1.0-3.0%, SL: 0.8-2.0%)
- Update Telegram manual trade presets if median ATR changes (currently 0.43 for SOL)
Position Manager changes: ALWAYS execute test trade after deployment

Use /api/trading/test endpoint or Telegram long sol --force
Monitor docker logs -f trading-bot-v4 for full cycle
Verify TP1 hit → 75% close → SL moved to breakeven
SQL: Check tp1Hit, slMovedToBreakeven, currentSize in Trade table
Compare: Position Manager logs vs actual Drift position size

Calculation changes: Add verbose logging and verify with SQL
- Log every intermediate step, especially unit conversions
- Never assume SDK data format - log raw values to verify
- SQL query with manual calculation to compare results
- Test boundary cases: 0%, 100%, min/max values
DEPLOYMENT VERIFICATION (MANDATORY): Before declaring ANY fix working:
- Check container start time vs commit timestamp
- If container older than commit: CODE NOT DEPLOYED
- Restart container and verify new code is running
- Never say "fixed" or "protected" without deployment confirmation
- This is a REAL MONEY system - unverified fixes cause losses
GIT COMMIT AND PUSH (MANDATORY): After completing ANY feature, fix, or significant change:
- ALWAYS commit changes with descriptive message
- ALWAYS push to remote repository
- User should NOT have to ask for this - it's part of completion
- Commit message format:
```
git add -A
git commit -m "type: brief description

- Bullet point details
- Files changed
- Why the change was needed
"
git push
```
- Types: feat: (feature), fix: (bug fix), docs: (documentation), refactor: (code restructure)
- This is NOT optional - code exists only when committed and pushed
NEXTCLOUD DECK SYNC (MANDATORY): After completing phases or making significant roadmap progress:
- Update roadmap markdown files with new status (🔄 IN PROGRESS, ✅ COMPLETE, 🔜 NEXT)
- Run sync to update Deck cards: python3 scripts/sync-roadmap-to-deck.py --init
- Move cards between stacks in Nextcloud Deck UI to reflect progress visually
- Backlog (📥) → Planning (📋) → In Progress (🚀) → Complete (✅)
- Keep Deck in sync with actual work - it's the visual roadmap tracker
- Documentation: docs/NEXTCLOUD_DECK_SYNC.md
UPDATE COPILOT-INSTRUCTIONS.MD (MANDATORY): After implementing ANY significant feature or system change:
- Document new database fields and their purpose
- Add filtering requirements (e.g., manual vs TradingView trades)
- Update "Important fields" sections with new schema changes
- Add new API endpoints to the architecture overview
- Document data integrity requirements (what must be excluded from analysis)
- Add SQL query patterns for common operations
- Update "When Making Changes" section with new patterns learned
- Create reference docs in docs/ for complex features (e.g., MANUAL_TRADE_FILTERING.md)
- WHY: Future AI agents need complete context to maintain data integrity and avoid breaking analysis
- EXAMPLES: signalSource field for filtering, MAE/MFE tracking, phantom trade detection

Development Roadmap

Current Status (Nov 14, 2025):

168 trades executed with quality scores and MAE/MFE tracking
Capital: $97.55 USDC at 100% health (zero debt, all USDC collateral)
Leverage: 15x SOL (reduced from 20x for safer liquidation cushion)
Three active optimization initiatives in data collection phase:
1. Signal Quality: 0/20 blocked signals collected → need 10-20 for analysis
2. Position Scaling: 161 v5 trades, collecting v6 data → need 50+ v6 trades
3. ATR-based TP: 1/50 trades with ATR data → need 50 for validation
Expected combined impact: 35-40% P&L improvement when all three optimizations complete
Master roadmap: See OPTIMIZATION_MASTER_ROADMAP.md for consolidated view

See SIGNAL_QUALITY_OPTIMIZATION_ROADMAP.md for systematic signal quality improvements:

Phase 1 (🔄 IN PROGRESS): Collect 10-20 blocked signals with quality scores (1-2 weeks)
Phase 2 (🔜 NEXT): Analyze patterns and make data-driven threshold decisions
Phase 3 (🎯 FUTURE): Implement dual-threshold system or other optimizations based on data
Phase 4 (🤖 FUTURE): Automated price analysis for blocked signals
Phase 5 (🧠 DISTANT): ML-based scoring weight optimization

See POSITION_SCALING_ROADMAP.md for planned position management optimizations:

Phase 1 (✅ COMPLETE): Collect data with quality scores (20-50 trades needed)
Phase 2: ATR-based dynamic targets (adapt to volatility)
Phase 3: Signal quality-based scaling (high quality = larger runners)
Phase 4: Direction-based optimization (shorts vs longs have different performance)
Phase 5 (✅ COMPLETE): TP2-as-runner system implemented - configurable runner (default 25%, adjustable via TAKE_PROFIT_1_SIZE_PERCENT) with ATR-based trailing stop
Phase 6: ML-based exit prediction (future)

Recent Implementation: TP2-as-runner system provides 5x larger runner (default 25% vs old 5%) for better profit capture on extended moves. When TP2 price is hit, trailing stop activates on full remaining position instead of closing partial amount. Runner size is configurable (100% - TP1 close %).

Blocked Signals Tracking (Nov 11, 2025): System now automatically saves all blocked signals to database for data-driven optimization. See BLOCKED_SIGNALS_TRACKING.md for SQL queries and analysis workflows.

Data-driven approach: Each phase requires validation through SQL analysis before implementation. No premature optimization.

Signal Quality Version Tracking: Database tracks signalQualityVersion field to compare algorithm performance:

Analytics dashboard shows version comparison: trades, win rate, P&L, extreme position stats
v4 (current) includes blocked signals tracking for data-driven optimization
Focus on extreme positions (< 15% range) - v3 aimed to reduce losses from weak ADX entries
SQL queries in docs/analysis/SIGNAL_QUALITY_VERSION_ANALYSIS.sql for deep-dive analysis
Need 20+ trades per version before meaningful comparison

Financial Roadmap Integration: All technical improvements must align with current phase objectives (see top of document):

Phase 1 (CURRENT): Prove system works, compound aggressively, 60%+ win rate mandatory
Phase 2-3: Transition to sustainable growth while funding withdrawals
Phase 4+: Scale capital while reducing risk progressively
See TRADING_GOALS.md for complete 8-phase plan ($106 → $1M+)
SQL queries in docs/analysis/SIGNAL_QUALITY_VERSION_ANALYSIS.sql for deep-dive analysis
Need 20+ trades per version before meaningful comparison

Blocked Signals Analysis: See BLOCKED_SIGNALS_TRACKING.md for:

SQL queries to analyze blocked signal patterns
Score distribution and metric analysis
Comparison with executed trades at similar quality levels
Future automation of price tracking (would TP1/TP2/SL have hit?)

Telegram Notifications (Nov 16, 2025)

Position Closure Notifications: System sends direct Telegram messages for all position closures via lib/notifications/telegram.ts

Implemented for:

TP1/TP2 exits (Position Manager auto-exits)
Stop loss triggers (SL, soft SL, hard SL, emergency)
Manual closures (via API or settings UI)
Ghost position cleanups (external closure detection)

Notification format:

🎯 POSITION CLOSED

📈 SOL-PERP LONG

💰 P&L: $12.45 (+2.34%)
📊 Size: $48.75

📍 Entry: $168.50
🎯 Exit: $172.45

⏱ Hold Time: 1h 23m
🔚 Exit: TP1
📈 Max Gain: +3.12%
📉 Max Drawdown: -0.45%

Configuration: Requires TELEGRAM_BOT_TOKEN and TELEGRAM_CHAT_ID in .env

Code location:

lib/notifications/telegram.ts - sendPositionClosedNotification()
lib/trading/position-manager.ts - Integrated in executeExit() and handleExternalClosure()

Commit: b1ca454 "feat: Add Telegram notifications for position closures"

Integration Points

n8n: Expects exact response format from /api/trading/execute (see n8n-complete-workflow.json)
Drift Protocol: Uses SDK v2.75.0 - check docs at docs.drift.trade for API changes
Pyth Network: WebSocket + HTTP fallback for price feeds (handles reconnection)
PostgreSQL: Version 16-alpine, must be running before bot starts

Key Mental Model: Think of this as two parallel systems (on-chain orders + software monitoring) working together. The Position Manager is the "backup brain" that constantly watches and acts if on-chain orders fail. Both write to the same database for complete trade history.

140 KiB Raw Blame History Unescape Escape

AI Agent Instructions for Trading Bot v4

Mission & Financial Goals

Architecture Overview

VERIFICATION MANDATE: Financial Code Requires Proof

Critical Path Verification Requirements

Red Flags Requiring Extra Verification

SQL Verification Queries

Example: How Position.size Bug Should Have Been Caught

Deployment Checklist

When to Escalate to User

Docker Build Best Practices

Docker Cleanup After Builds

Critical Components

1. Phantom Trade Auto-Closure System

2. Signal Quality Scoring (lib/trading/signal-quality.ts)

2. Position Manager (lib/trading/position-manager.ts)

3. Telegram Bot (telegram_command_bot.py)

4. Rate Limit Monitoring (lib/drift/orders.ts + app/api/analytics/rate-limits)

5. Order Placement (lib/drift/orders.ts)

6. Database (lib/database/trades.ts + prisma/schema.prisma)

ATR-Based Risk Management (Nov 17, 2025)

Configuration System

API Endpoints Architecture

Critical Workflows

Execute Trade (Production)

Position Monitoring Loop

Settings Update

Docker Context

Project-Specific Patterns

1. Singleton Services

2. Price Calculations

3. Error Handling

4. Reduce-Only Orders

5. Nextcloud Deck Roadmap Sync

Testing Commands

SQL Analysis Queries

Phase 1: Monitor Data Collection Progress

Phase 2: Compare Blocked vs Executed Trades

Analyze Specific Patterns

Common Pitfalls

File Conventions

Re-Entry Analytics System (Phase 1)

Per-Symbol Trading Controls

When Making Changes

Development Roadmap

Telegram Notifications (Nov 16, 2025)

Integration Points

140 KiB

Raw Blame History

2. Signal Quality Scoring (`lib/trading/signal-quality.ts`)

2. Position Manager (`lib/trading/position-manager.ts`)

3. Telegram Bot (`telegram_command_bot.py`)

4. Rate Limit Monitoring (`lib/drift/orders.ts` + `app/api/analytics/rate-limits`)

5. Order Placement (`lib/drift/orders.ts`)

6. Database (`lib/database/trades.ts` + `prisma/schema.prisma`)