diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
index 1b43bb2..fc69261 100644
--- a/.github/copilot-instructions.md
+++ b/.github/copilot-instructions.md
@@ -393,9 +393,11 @@ await positionManager.addTrade(activeTrade)
 - Closes positions via `closePosition()` market orders when targets hit
 - Acts as backup if on-chain orders don't fill
 - State persistence: Saves to database, restores on restart via `configSnapshot.positionManagerState`
+- **Startup validation:** On container restart, cross-checks last 24h "closed" trades against Drift to detect orphaned positions (see `lib/startup/init-position-manager.ts`)
 - **Grace period for new trades:** Skips "external closure" detection for positions <30 seconds old (Drift positions take 5-10s to propagate)
 - **Exit reason detection:** Uses trade state flags (`tp1Hit`, `tp2Hit`) and realized P&L to determine exit reason, NOT current price (avoids misclassification when price moves after order fills)
 - **Real P&L calculation:** Calculates actual profit based on entry vs exit price, not SDK's potentially incorrect values
+- **Rate limit-aware exit:** On 429 errors during close, keeps trade in monitoring (doesn't mark closed), retries naturally on next price update
 
 ### 3. Telegram Bot (`telegram_command_bot.py`)
 **Purpose:** Python-based Telegram bot for manual trading commands and position status monitoring
@@ -443,12 +445,20 @@ const health = await driftService.getAccountHealth()
 ### 4. Rate Limit Monitoring (`lib/drift/orders.ts` + `app/api/analytics/rate-limits`)
 **Purpose:** Track and analyze Solana RPC rate limiting (429 errors) to prevent silent failures
 
-**Retry mechanism with exponential backoff:**
+**Helius RPC Limits (Free Tier):**
+- **Burst:** 100 requests/second
+- **Sustained:** 10 requests/second
+- **Monthly:** 100k requests
+- See `docs/HELIUS_RATE_LIMITS.md` for upgrade recommendations
+
+**Retry mechanism with exponential backoff (Nov 14, 2025 - Updated):**
 ```typescript
 await retryWithBackoff(async () => {
   return await driftClient.cancelOrders(...)
-}, maxRetries = 3, baseDelay = 2000)
+}, maxRetries = 3, baseDelay = 5000) // Increased from 2s to 5s
 ```
+**Progression:** 5s → 10s → 20s (vs old 2s → 4s → 8s)
+**Rationale:** Gives Helius time to recover, reduces cascade pressure by 2.5x
 
 **Database logging:** Three event types in SystemEvent table:
 - `rate_limit_hit`: Each 429 error (logged with attempt #, delay, error snippet)
@@ -463,12 +473,19 @@ Returns: Total hits/recoveries/failures, hourly patterns, recovery times, succes
 
 **Key behaviors:**
 - Only RPC calls wrapped: `cancelAllOrders()`, `placeExitOrders()`, `closePosition()`
-- Position Manager 2s loop does NOT make RPC calls (only price checks via Pyth WebSocket)
-- Exponential backoff: 2s → 4s → 8s delays on retry
+- Position Manager monitoring: Event-driven via Pyth WebSocket (not polling)
+- Rate limit-aware exit: Position Manager keeps monitoring on 429 errors (retries naturally)
 - Logs to both console and database for post-trade analysis
 
 **Monitoring queries:** See `docs/RATE_LIMIT_MONITORING.md` for SQL queries
 
+**Startup Position Validation (Nov 14, 2025 - Added):**
+On container startup, cross-checks last 24h of "closed" trades against actual Drift positions:
+- If DB says closed but Drift shows open → reopens in DB to restore Position Manager tracking
+- Prevents orphaned positions from failed close transactions
+- Logs: `🔴 CRITICAL: ${symbol} marked as CLOSED in DB but still OPEN on Drift!`
+- Implementation: `lib/startup/init-position-manager.ts` - `validateOpenTrades()`
+
 ### 5. Order Placement (`lib/drift/orders.ts`)
 **Critical functions:**
 - `openPosition()` - Opens market position with transaction confirmation
@@ -508,7 +525,7 @@ Solana RPC endpoints return 429 errors under load. Always use retry logic for or
 export async function retryWithBackoff(
   operation: () => Promise,
   maxRetries: number = 3,
-  initialDelay: number = 2000
+  initialDelay: number = 5000 // Increased from 2000ms to 5000ms (Nov 14, 2025)
 ): Promise {
   for (let attempt = 0; attempt < maxRetries; attempt++) {
     try {
@@ -529,6 +546,7 @@ export async function retryWithBackoff(
 // Usage in cancelAllOrders
 await retryWithBackoff(() => driftClient.cancelOrders(...))
 ```
+**Note:** Increased from 2s to 5s base delay to give Helius RPC more recovery time. See `docs/HELIUS_RATE_LIMITS.md` for detailed analysis.
 Without this, order cancellations fail silently during TP1→breakeven order updates, leaving ghost orders that cause incorrect fills.
 
 **Dual Stop System** (USE_DUAL_STOPS=true):
@@ -625,6 +643,7 @@ const driftSymbol = normalizeTradingViewSymbol(body.symbol)
 - `/api/trading/check-risk` - Pre-execution validation (duplicate check, quality score, **per-symbol cooldown**, rate limits, **symbol enabled check**, **saves blocked signals automatically**)
 - `/api/trading/test` - Test trades from settings UI (no auth required, **respects symbol enable/disable**)
 - `/api/trading/close` - Manual position closing (requires symbol normalization)
+- `/api/trading/sync-positions` - **Force Position Manager sync with Drift** (POST, requires auth) - restores tracking for orphaned positions
 - `/api/trading/cancel-orders` - **Manual order cleanup** (for stuck/ghost orders after rate limit failures)
 - `/api/trading/positions` - Query open positions from Drift
 - `/api/trading/market-data` - Webhook for TradingView market data updates (GET for debug, POST for data)
@@ -1094,7 +1113,7 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
    - **Fix:** Automatic retry in `lib/drift/client.ts` - `retryOperation()` wrapper:
    ```typescript
    // Detects transient errors: fetch failed, EAI_AGAIN, ENOTFOUND, ETIMEDOUT
-   // Retries up to 3 times with 2s delay between attempts
+   // Retries up to 3 times with 2s delay between attempts (DNS-specific, separate from rate limit retries)
    // Fails fast on non-transient errors (auth, config, permanent network issues)
    await this.retryOperation(async () => {
      // Initialize Drift SDK, subscribe, get user account
@@ -1102,6 +1121,7 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
    ```
    - **Success logs:** `⚠️ Drift initialization failed (attempt 1/3): fetch failed` → `⏳ Retrying in 2000ms...` → `✅ Drift service initialized successfully`
    - **Impact:** 99% of transient DNS failures now auto-recover, preventing missed trades
+   - **Note:** DNS retries use 2s delays (fast recovery), rate limit retries use 5s delays (RPC cooldown)
    - **Documentation:** See `docs/DNS_RETRY_LOGIC.md` for monitoring queries and metrics
 
 29. **Declaring fixes "working" before deployment (CRITICAL - Nov 13, 2025):**