# Rate Limit Monitoring - SQL Queries ## Quick Access ```bash # View rate limit analytics via API curl http://localhost:3001/api/analytics/rate-limits | python3 -m json.tool # Direct database queries docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 ``` ## Common Queries ### 1. Recent Rate Limit Events (Last 24 Hours) ```sql SELECT "eventType", message, details, TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as time FROM "SystemEvent" WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted') AND "createdAt" > NOW() - INTERVAL '24 hours' ORDER BY "createdAt" DESC LIMIT 20; ``` ### 2. Rate Limit Statistics (Last 7 Days) ```sql SELECT "eventType", COUNT(*) as occurrences, MIN("createdAt") as first_seen, MAX("createdAt") as last_seen FROM "SystemEvent" WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted') AND "createdAt" > NOW() - INTERVAL '7 days' GROUP BY "eventType" ORDER BY occurrences DESC; ``` ### 3. Rate Limit Pattern by Hour (Find Peak Times) ```sql SELECT EXTRACT(HOUR FROM "createdAt") as hour, COUNT(*) as rate_limit_hits, COUNT(DISTINCT DATE("createdAt")) as days_affected FROM "SystemEvent" WHERE "eventType" = 'rate_limit_hit' AND "createdAt" > NOW() - INTERVAL '7 days' GROUP BY EXTRACT(HOUR FROM "createdAt") ORDER BY rate_limit_hits DESC; ``` ### 4. Recovery Time Analysis ```sql SELECT (details->>'retriesNeeded')::int as retries, (details->>'totalTimeMs')::int as recovery_ms, TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as recovered_at FROM "SystemEvent" WHERE "eventType" = 'rate_limit_recovered' AND "createdAt" > NOW() - INTERVAL '7 days' ORDER BY recovery_ms DESC; ``` ### 5. Failed Recoveries (Exhausted Retries) ```sql SELECT details->>'errorMessage' as error, (details->>'totalTimeMs')::int as failed_after_ms, TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as failed_at FROM "SystemEvent" WHERE "eventType" = 'rate_limit_exhausted' AND "createdAt" > NOW() - INTERVAL '7 days' ORDER BY "createdAt" DESC; ``` ### 6. Rate Limit Health Score (Last 24h) ```sql SELECT COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) as total_hits, COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) as recovered, COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) as failed, CASE WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) = 0 THEN '✅ HEALTHY' WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) > 0 THEN '🔴 CRITICAL' WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) > 10 THEN '⚠️ WARNING' ELSE '✅ HEALTHY' END as health_status, ROUND(100.0 * COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) / NULLIF(COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END), 0), 1) as recovery_rate FROM "SystemEvent" WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted') AND "createdAt" > NOW() - INTERVAL '24 hours'; ``` ## What to Watch For ### 🔴 Critical Alerts - **rate_limit_exhausted** events: Order placement/cancellation failed completely - Recovery rate below 80%: System struggling to handle rate limits - Multiple exhausted events in short time: RPC endpoint may be degraded ### ⚠️ Warnings - More than 10 rate_limit_hit events per hour: High trading frequency - Recovery times > 10 seconds: Backoff delays stacking up - Rate limits during specific hours: Identify peak Solana network times ### ✅ Healthy Patterns - 100% recovery rate: All rate limits handled successfully - Recovery times 2-4 seconds: Retries working efficiently - Zero rate_limit_exhausted events: No failed operations ## Optimization Actions **If seeing frequent rate limits:** 1. Increase `baseDelay` in `retryWithBackoff()` (currently 2000ms) 2. Add delay between `cancelAllOrders()` and `placeExitOrders()` (currently immediate) 3. Consider using a faster RPC endpoint (Helius Pro, Triton, etc.) 4. Batch order operations if possible **If seeing exhausted retries:** 1. Increase `maxRetries` from 3 to 5 2. Increase exponential backoff multiplier (currently 2x) 3. Check RPC endpoint health/status page 4. Consider implementing circuit breaker pattern ## Live Monitoring Commands ```bash # Watch rate limits in real-time docker logs -f trading-bot-v4 | grep -i "rate limit" # Count rate limit events today docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c " SELECT COUNT(*) FROM \"SystemEvent\" WHERE \"eventType\" = 'rate_limit_hit' AND DATE(\"createdAt\") = CURRENT_DATE;" # Check latest rate limit event docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c " SELECT * FROM \"SystemEvent\" WHERE \"eventType\" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted') ORDER BY \"createdAt\" DESC LIMIT 1;" ``` ## Integration with Alerts When implementing automated alerts, trigger on: - Any `rate_limit_exhausted` event (critical) - More than 5 `rate_limit_hit` events in 5 minutes (warning) - Recovery rate below 90% over 1 hour (warning) Log format examples: ``` ✅ Retry successful after 2341ms (1 retries) ⏳ Rate limited (429), retrying in 2s... (attempt 1/3) ❌ RATE LIMIT EXHAUSTED: Failed after 3 retries and 14523ms ```