Files
trading_bot_v4/docs/RATE_LIMIT_MONITORING.md
mindesbunister 03e91fc18d feat: ATR-based trailing stop + rate limit monitoring
MAJOR FIXES:
- ATR-based trailing stop for runners (was fixed 0.3%, now adapts to volatility)
- Fixes runners with +7-9% MFE exiting for losses
- Typical improvement: 2.24x more room (0.3% → 0.67% at 0.45% ATR)
- Enhanced rate limit logging with database tracking
- New /api/analytics/rate-limits endpoint for monitoring

DETAILS:
- Position Manager: Calculate trailing as (atrAtEntry / price × 100) × multiplier
- Config: TRAILING_STOP_ATR_MULTIPLIER=1.5, MIN=0.25%, MAX=0.9%
- Settings UI: Added ATR multiplier controls
- Rate limits: Log hits/recoveries/exhaustions to SystemEvent table
- Documentation: ATR_TRAILING_STOP_FIX.md + RATE_LIMIT_MONITORING.md

IMPACT:
- Runners can now capture big moves (like morning's $172→$162 SOL drop)
- Rate limit visibility prevents silent failures
- Data-driven optimization for RPC endpoint health
2025-11-11 14:51:41 +01:00

5.2 KiB

Rate Limit Monitoring - SQL Queries

Quick Access

# View rate limit analytics via API
curl http://localhost:3001/api/analytics/rate-limits | python3 -m json.tool

# Direct database queries
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4

Common Queries

1. Recent Rate Limit Events (Last 24 Hours)

SELECT 
  "eventType",
  message,
  details,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as time
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '24 hours'
ORDER BY "createdAt" DESC
LIMIT 20;

2. Rate Limit Statistics (Last 7 Days)

SELECT 
  "eventType",
  COUNT(*) as occurrences,
  MIN("createdAt") as first_seen,
  MAX("createdAt") as last_seen
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY "eventType"
ORDER BY occurrences DESC;

3. Rate Limit Pattern by Hour (Find Peak Times)

SELECT 
  EXTRACT(HOUR FROM "createdAt") as hour,
  COUNT(*) as rate_limit_hits,
  COUNT(DISTINCT DATE("createdAt")) as days_affected
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_hit'
  AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY EXTRACT(HOUR FROM "createdAt")
ORDER BY rate_limit_hits DESC;

4. Recovery Time Analysis

SELECT 
  (details->>'retriesNeeded')::int as retries,
  (details->>'totalTimeMs')::int as recovery_ms,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as recovered_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_recovered'
  AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY recovery_ms DESC;

5. Failed Recoveries (Exhausted Retries)

SELECT 
  details->>'errorMessage' as error,
  (details->>'totalTimeMs')::int as failed_after_ms,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as failed_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_exhausted'
  AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY "createdAt" DESC;

6. Rate Limit Health Score (Last 24h)

SELECT 
  COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) as total_hits,
  COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) as recovered,
  COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) as failed,
  CASE 
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) = 0 THEN '✅ HEALTHY'
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) > 0 THEN '🔴 CRITICAL'
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) > 10 THEN '⚠️ WARNING'
    ELSE '✅ HEALTHY'
  END as health_status,
  ROUND(100.0 * COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) / 
        NULLIF(COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END), 0), 1) as recovery_rate
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '24 hours';

What to Watch For

🔴 Critical Alerts

  • rate_limit_exhausted events: Order placement/cancellation failed completely
  • Recovery rate below 80%: System struggling to handle rate limits
  • Multiple exhausted events in short time: RPC endpoint may be degraded

⚠️ Warnings

  • More than 10 rate_limit_hit events per hour: High trading frequency
  • Recovery times > 10 seconds: Backoff delays stacking up
  • Rate limits during specific hours: Identify peak Solana network times

Healthy Patterns

  • 100% recovery rate: All rate limits handled successfully
  • Recovery times 2-4 seconds: Retries working efficiently
  • Zero rate_limit_exhausted events: No failed operations

Optimization Actions

If seeing frequent rate limits:

  1. Increase baseDelay in retryWithBackoff() (currently 2000ms)
  2. Add delay between cancelAllOrders() and placeExitOrders() (currently immediate)
  3. Consider using a faster RPC endpoint (Helius Pro, Triton, etc.)
  4. Batch order operations if possible

If seeing exhausted retries:

  1. Increase maxRetries from 3 to 5
  2. Increase exponential backoff multiplier (currently 2x)
  3. Check RPC endpoint health/status page
  4. Consider implementing circuit breaker pattern

Live Monitoring Commands

# Watch rate limits in real-time
docker logs -f trading-bot-v4 | grep -i "rate limit"

# Count rate limit events today
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
SELECT COUNT(*) FROM \"SystemEvent\" 
WHERE \"eventType\" = 'rate_limit_hit' 
AND DATE(\"createdAt\") = CURRENT_DATE;"

# Check latest rate limit event
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
SELECT * FROM \"SystemEvent\" 
WHERE \"eventType\" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted') 
ORDER BY \"createdAt\" DESC LIMIT 1;"

Integration with Alerts

When implementing automated alerts, trigger on:

  • Any rate_limit_exhausted event (critical)
  • More than 5 rate_limit_hit events in 5 minutes (warning)
  • Recovery rate below 90% over 1 hour (warning)

Log format examples:

✅ Retry successful after 2341ms (1 retries)
⏳ Rate limited (429), retrying in 2s... (attempt 1/3)
❌ RATE LIMIT EXHAUSTED: Failed after 3 retries and 14523ms