trading_bot_v4/docs/RATE_LIMIT_MONITORING.md
mindesbunister 03e91fc18d feat: ATR-based trailing stop + rate limit monitoring
MAJOR FIXES:
- ATR-based trailing stop for runners (was fixed 0.3%, now adapts to volatility)
- Fixes runners with +7-9% MFE exiting for losses
- Typical improvement: 2.24x more room (0.3% → 0.67% at 0.45% ATR)
- Enhanced rate limit logging with database tracking
- New /api/analytics/rate-limits endpoint for monitoring

DETAILS:
- Position Manager: Calculate trailing as (atrAtEntry / price × 100) × multiplier
- Config: TRAILING_STOP_ATR_MULTIPLIER=1.5, MIN=0.25%, MAX=0.9%
- Settings UI: Added ATR multiplier controls
- Rate limits: Log hits/recoveries/exhaustions to SystemEvent table
- Documentation: ATR_TRAILING_STOP_FIX.md + RATE_LIMIT_MONITORING.md

IMPACT:
- Runners can now capture big moves (like morning's $172→$162 SOL drop)
- Rate limit visibility prevents silent failures
- Data-driven optimization for RPC endpoint health
2025-11-11 14:51:41 +01:00


# Rate Limit Monitoring - SQL Queries
## Quick Access
```bash
# View rate limit analytics via API
curl http://localhost:3001/api/analytics/rate-limits | python3 -m json.tool
# Open an interactive psql session for the queries below (-it attaches a terminal)
docker exec -it trading-bot-postgres psql -U postgres -d trading_bot_v4
```
## Common Queries
### 1. Recent Rate Limit Events (Last 24 Hours)
```sql
SELECT
"eventType",
message,
details,
TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as time
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
AND "createdAt" > NOW() - INTERVAL '24 hours'
ORDER BY "createdAt" DESC
LIMIT 20;
```
### 2. Rate Limit Statistics (Last 7 Days)
```sql
SELECT
"eventType",
COUNT(*) as occurrences,
MIN("createdAt") as first_seen,
MAX("createdAt") as last_seen
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY "eventType"
ORDER BY occurrences DESC;
```
### 3. Rate Limit Pattern by Hour (Find Peak Times)
```sql
SELECT
EXTRACT(HOUR FROM "createdAt") as hour,
COUNT(*) as rate_limit_hits,
COUNT(DISTINCT DATE("createdAt")) as days_affected
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_hit'
AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY EXTRACT(HOUR FROM "createdAt")
ORDER BY rate_limit_hits DESC;
```
### 4. Recovery Time Analysis
```sql
SELECT
(details->>'retriesNeeded')::int as retries,
(details->>'totalTimeMs')::int as recovery_ms,
TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as recovered_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_recovered'
AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY recovery_ms DESC;
```
### 5. Failed Recoveries (Exhausted Retries)
```sql
SELECT
details->>'errorMessage' as error,
(details->>'totalTimeMs')::int as failed_after_ms,
TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as failed_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_exhausted'
AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY "createdAt" DESC;
```
### 6. Rate Limit Health Score (Last 24h)
```sql
SELECT
COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) as total_hits,
COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) as recovered,
COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) as failed,
-- Check exhausted retries first so a failed operation is never reported as healthy
CASE
WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) > 0 THEN '🔴 CRITICAL'
WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) > 10 THEN '⚠️ WARNING'
ELSE '✅ HEALTHY'
END as health_status,
ROUND(100.0 * COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) /
NULLIF(COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END), 0), 1) as recovery_rate
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
AND "createdAt" > NOW() - INTERVAL '24 hours';
```
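Scripts and status lines can reuse this classification outside the database. The sketch below defines two hypothetical bash helpers, `health_status` and `recovery_rate`. They are illustrative names, not part of the bot, and they take plain counts of the kind query 6 computes.

The helpers check exhausted retries before anything else, so a failed operation is never reported as healthy. They round the recovery rate to one decimal, matching `ROUND(..., 1)`.

```bash
#!/usr/bin/env bash
# Hypothetical helpers mirroring the health_status and recovery_rate columns
# of query 6. Inputs are plain integer counts from the SystemEvent table.

# health_status HITS FAILED -> CRITICAL, WARNING or HEALTHY
health_status() {
  local hits=$1 failed=$2
  if (( failed > 0 )); then
    echo "CRITICAL"          # at least one rate_limit_exhausted event
  elif (( hits > 10 )); then
    echo "WARNING"           # more than 10 rate_limit_hit events
  else
    echo "HEALTHY"
  fi
}

# recovery_rate HITS RECOVERED -> percentage with one decimal, or "n/a"
# when there were no hits (the SQL NULLIF makes the column NULL then).
recovery_rate() {
  local hits=$1 recovered=$2
  if (( hits == 0 )); then
    echo "n/a"
    return 0
  fi
  # Bash has no floats: compute tenths of a percent, rounding half up.
  local tenths=$(( (1000 * recovered + hits / 2) / hits ))
  echo "$(( tenths / 10 )).$(( tenths % 10 ))"
}
```

To feed the helpers, run a variant of query 6 that selects only the three counts. Pass it through `psql -At -F ' '` (unaligned, tuples-only output with a space separator), then split the line with `read -r hits recovered failed`.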
## What to Watch For
### 🔴 Critical Alerts
- **rate_limit_exhausted** events: Order placement/cancellation failed completely
- Recovery rate below 80%: System struggling to handle rate limits
- Multiple exhausted events within a short time: RPC endpoint may be degraded
### ⚠️ Warnings
- More than 10 rate_limit_hit events per hour: High trading frequency
- Recovery times > 10 seconds: Backoff delays stacking up
- Rate limits clustered in specific hours: may point to peak Solana network times (see query 3)
### ✅ Healthy Patterns
- 100% recovery rate: All rate limits handled successfully
- Recovery times 2-4 seconds: Retries working efficiently
- Zero rate_limit_exhausted events: No failed operations
## Optimization Actions
**If seeing frequent rate limits:**
1. Increase `baseDelay` in `retryWithBackoff()` (currently 2000ms)
2. Add delay between `cancelAllOrders()` and `placeExitOrders()` (currently immediate)
3. Consider an RPC endpoint with higher rate limits (Helius Pro, Triton, etc.)
4. Batch order operations if possible
**If seeing exhausted retries:**
1. Increase `maxRetries` from 3 to 5
2. Increase exponential backoff multiplier (currently 2x)
3. Check RPC endpoint health/status page
4. Consider implementing a circuit breaker pattern
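Before tuning, it helps to see what each knob costs in worst-case waiting time. The minimal sketch below assumes the delay before retry n is `baseDelay × multiplier^(n-1)`. That formula is an assumption, not taken from `retryWithBackoff()` itself. It does match two things in the log examples at the end of this document:

- The `retrying in 2s... (attempt 1/3)` line.
- The roughly 14.5 s reported when retries are exhausted, which is 2 + 4 + 8 = 14 s of waiting plus request time.

`backoff_schedule` is a hypothetical name.

```bash
#!/usr/bin/env bash
# backoff_schedule BASE_DELAY_MS MULTIPLIER MAX_RETRIES
# Hypothetical helper: prints the wait before each retry and the total wait,
# assuming delay(n) = BASE_DELAY_MS * MULTIPLIER^(n-1) for n = 1..MAX_RETRIES.
# Integer multipliers only, since bash arithmetic has no floats.
backoff_schedule() {
  local base_ms=$1 multiplier=$2 max_retries=$3
  local delay=$base_ms total=0 attempt
  for (( attempt = 1; attempt <= max_retries; attempt++ )); do
    echo "attempt ${attempt}/${max_retries}: wait ${delay}ms"
    total=$(( total + delay ))
    delay=$(( delay * multiplier ))   # exponential growth between retries
  done
  echo "total wait: ${total}ms"
}
```

With the current settings, `backoff_schedule 2000 2 3` ends with `total wait: 14000ms`. Raising `maxRetries` to 5 (`backoff_schedule 2000 2 5`) pushes the worst case to 62000ms. The exhausted-retry fixes therefore trade fewer failed operations for much longer worst-case delays.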
## Live Monitoring Commands
```bash
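# Watch rate limits and retry outcomes in real time (merge stderr, which
# docker logs writes separately, before grepping)
docker logs -f trading-bot-v4 2>&1 | grep -iE "rate limit|retry successful"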
# Count rate_limit_hit events today
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
SELECT COUNT(*) FROM \"SystemEvent\"
WHERE \"eventType\" = 'rate_limit_hit'
AND DATE(\"createdAt\") = CURRENT_DATE;"
# Check latest rate limit event
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
SELECT * FROM \"SystemEvent\"
WHERE \"eventType\" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
ORDER BY \"createdAt\" DESC LIMIT 1;"
```
## Integration with Alerts
When implementing automated alerts, trigger on:
- Any `rate_limit_exhausted` event (critical)
- More than 5 `rate_limit_hit` events in 5 minutes (warning)
- Recovery rate below 90% over 1 hour (warning)
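The sketch below implements these three rules in bash. It assumes events arrive as `epoch|eventType` lines, which is what `psql -At` prints for a query selecting `EXTRACT(EPOCH FROM "createdAt")::bigint` and `"eventType"` from `"SystemEvent"`. `evaluate_alerts` is a hypothetical name.

Two simplifications suit a periodic, cron-style check:

- A one-hour lookback applies to exhausted events.
- The hit rule counts "the last 5 minutes" rather than any sliding 5-minute window.

```bash
#!/usr/bin/env bash
# evaluate_alerts NOW_EPOCH  (hypothetical helper)
# Reads "<epoch_seconds>|<eventType>" lines on stdin and prints one line per
# triggered alert rule, or "OK". Assumption: events older than 1 hour are ignored.
evaluate_alerts() {
  local now=$1 ts type age
  local exhausted=0 hits_5m=0 hits_1h=0 recovered_1h=0
  while IFS='|' read -r ts type; do
    [[ -z $ts ]] && continue
    age=$(( now - ts ))
    (( age > 3600 )) && continue
    case $type in
      rate_limit_exhausted) exhausted=$(( exhausted + 1 )) ;;
      rate_limit_recovered) recovered_1h=$(( recovered_1h + 1 )) ;;
      rate_limit_hit)
        hits_1h=$(( hits_1h + 1 ))
        (( age <= 300 )) && hits_5m=$(( hits_5m + 1 ))
        ;;
    esac
  done

  local fired=0
  if (( exhausted > 0 )); then
    echo "CRITICAL: ${exhausted} rate_limit_exhausted event(s) in the last hour"
    fired=1
  fi
  if (( hits_5m > 5 )); then
    echo "WARNING: ${hits_5m} rate_limit_hit events in the last 5 minutes"
    fired=1
  fi
  # Integer form of "recovered / hits < 90%" to avoid floating point.
  if (( hits_1h > 0 && recovered_1h * 100 < hits_1h * 90 )); then
    echo "WARNING: recovery rate $(( recovered_1h * 100 / hits_1h ))% over the last hour"
    fired=1
  fi
  if (( fired == 0 )); then
    echo "OK"
  fi
}
```

Passing `$(date +%s)` as the argument evaluates the rules against the current time. This assumes `createdAt` is stored in UTC, so its epoch value is comparable.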
Log format examples:
```
✅ Retry successful after 2341ms (1 retries)
⏳ Rate limited (429), retrying in 2s... (attempt 1/3)
❌ RATE LIMIT EXHAUSTED: Failed after 3 retries and 14523ms
```
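When the database is unavailable, the same outcomes can be tallied straight from the container logs. The sketch below defines a hypothetical `summarize_retry_logs` helper. It assumes the bot emits the three messages above verbatim; adjust the patterns if the wording changes.

```bash
#!/usr/bin/env bash
# summarize_retry_logs  (hypothetical helper)
# Reads bot log lines on stdin and counts retry outcomes, assuming the
# message formats shown in the log examples above.
summarize_retry_logs() {
  local re_ok='Retry successful after ([0-9]+)ms \(([0-9]+) retries\)'
  local re_429='Rate limited \(429\), retrying in'
  local re_fail='RATE LIMIT EXHAUSTED: Failed after ([0-9]+) retries and ([0-9]+)ms'
  local line ok=0 limited=0 failed=0 slowest_ok=0
  while IFS= read -r line; do
    if [[ $line =~ $re_ok ]]; then
      ok=$(( ok + 1 ))
      # Track the slowest successful recovery in milliseconds.
      (( BASH_REMATCH[1] > slowest_ok )) && slowest_ok=${BASH_REMATCH[1]}
    elif [[ $line =~ $re_429 ]]; then
      limited=$(( limited + 1 ))
    elif [[ $line =~ $re_fail ]]; then
      failed=$(( failed + 1 ))
    fi
  done
  echo "rate_limited_attempts=${limited} recovered=${ok} exhausted=${failed} slowest_recovery_ms=${slowest_ok}"
}
```

For example, `docker logs --since 24h trading-bot-v4 2>&1 | summarize_retry_logs` gives a rough 24-hour summary. Each retry attempt presumably logs its own `Rate limited (429)` line, so that count can exceed the number of affected operations.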