MAJOR FIXES:
- ATR-based trailing stop for runners (was fixed 0.3%, now adapts to volatility)
- Fixes runners with +7-9% MFE exiting for losses
- Typical improvement: 2.24x more room (0.3% → 0.67% at 0.45% ATR)
- Enhanced rate limit logging with database tracking
- New /api/analytics/rate-limits endpoint for monitoring

DETAILS:
- Position Manager: Calculate trailing as (atrAtEntry / price × 100) × multiplier
- Config: TRAILING_STOP_ATR_MULTIPLIER=1.5, MIN=0.25%, MAX=0.9%
- Settings UI: Added ATR multiplier controls
- Rate limits: Log hits/recoveries/exhaustions to SystemEvent table
- Documentation: ATR_TRAILING_STOP_FIX.md + RATE_LIMIT_MONITORING.md

IMPACT:
- Runners can now capture big moves (like morning's $172→$162 SOL drop)
- Rate limit visibility prevents silent failures
- Data-driven optimization for RPC endpoint health

# Rate Limit Monitoring - SQL Queries

## Quick Access
```bash
# View rate limit analytics via API
curl http://localhost:3001/api/analytics/rate-limits | python3 -m json.tool

# Direct database queries (-it attaches a terminal so psql stays interactive)
docker exec -it trading-bot-postgres psql -U postgres -d trading_bot_v4
```

## Common Queries

### 1. Recent Rate Limit Events (Last 24 Hours)
```sql
SELECT
  "eventType",
  message,
  details,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as time
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '24 hours'
ORDER BY "createdAt" DESC
LIMIT 20;
```

### 2. Rate Limit Statistics (Last 7 Days)
```sql
SELECT
  "eventType",
  COUNT(*) as occurrences,
  MIN("createdAt") as first_seen,
  MAX("createdAt") as last_seen
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY "eventType"
ORDER BY occurrences DESC;
```

### 3. Rate Limit Pattern by Hour (Find Peak Times)
```sql
SELECT
  EXTRACT(HOUR FROM "createdAt") as hour,
  COUNT(*) as rate_limit_hits,
  COUNT(DISTINCT DATE("createdAt")) as days_affected
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_hit'
  AND "createdAt" > NOW() - INTERVAL '7 days'
GROUP BY EXTRACT(HOUR FROM "createdAt")
ORDER BY rate_limit_hits DESC;
```

### 4. Recovery Time Analysis
```sql
SELECT
  (details->>'retriesNeeded')::int as retries,
  (details->>'totalTimeMs')::int as recovery_ms,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as recovered_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_recovered'
  AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY recovery_ms DESC;
```

### 5. Failed Recoveries (Exhausted Retries)
```sql
SELECT
  details->>'errorMessage' as error,
  (details->>'totalTimeMs')::int as failed_after_ms,
  TO_CHAR("createdAt", 'MM-DD HH24:MI:SS') as failed_at
FROM "SystemEvent"
WHERE "eventType" = 'rate_limit_exhausted'
  AND "createdAt" > NOW() - INTERVAL '7 days'
ORDER BY "createdAt" DESC;
```

### 6. Rate Limit Health Score (Last 24h)
```sql
SELECT
  COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) as total_hits,
  COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) as recovered,
  COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) as failed,
  CASE
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) = 0 THEN '✅ HEALTHY'
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_exhausted' THEN 1 END) > 0 THEN '🔴 CRITICAL'
    WHEN COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END) > 10 THEN '⚠️ WARNING'
    ELSE '✅ HEALTHY'
  END as health_status,
  ROUND(100.0 * COUNT(CASE WHEN "eventType" = 'rate_limit_recovered' THEN 1 END) /
    NULLIF(COUNT(CASE WHEN "eventType" = 'rate_limit_hit' THEN 1 END), 0), 1) as recovery_rate
FROM "SystemEvent"
WHERE "eventType" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  AND "createdAt" > NOW() - INTERVAL '24 hours';
```
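
The health score logic can also be applied to counts fetched elsewhere (for example from the analytics endpoint). The following is a minimal Python sketch that mirrors the `CASE` order and the `NULLIF` guard of query 6. The function name `health_status` is hypothetical, and the emoji prefixes are dropped from the labels for simplicity.

```python
from typing import Optional, Tuple


def health_status(hits: int, recovered: int, exhausted: int) -> Tuple[str, Optional[float]]:
    """Return (status, recovery_rate_percent), following query 6's CASE order."""
    if hits == 0:
        status = "HEALTHY"       # no rate limits at all in the window
    elif exhausted > 0:
        status = "CRITICAL"      # at least one operation failed completely
    elif hits > 10:
        status = "WARNING"       # many hits, but all were eventually handled
    else:
        status = "HEALTHY"
    # NULLIF(hits, 0) makes the SQL rate NULL when there were no hits;
    # None plays the same role here.
    rate = None if hits == 0 else round(100.0 * recovered / hits, 1)
    return status, rate


if __name__ == "__main__":
    print(health_status(0, 0, 0))    # ('HEALTHY', None)
    print(health_status(12, 12, 0))  # ('WARNING', 100.0)
    print(health_status(4, 3, 1))    # ('CRITICAL', 75.0)
```

Note that Python's `round` and PostgreSQL's `ROUND` on `numeric` can differ by 0.1 on exact ties, so treat the rate as approximate when comparing against the SQL output.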

## What to Watch For

### 🔴 Critical Alerts
- **rate_limit_exhausted** events: Order placement/cancellation failed completely
- Recovery rate below 80%: System struggling to handle rate limits
- Multiple exhausted events in short time: RPC endpoint may be degraded

### ⚠️ Warnings
- More than 10 rate_limit_hit events per hour: High trading frequency
- Recovery times > 10 seconds: Backoff delays stacking up
- Rate limits during specific hours: Identify peak Solana network times

### ✅ Healthy Patterns
- 100% recovery rate: All rate limits handled successfully
- Recovery times 2-4 seconds: Retries working efficiently
- Zero rate_limit_exhausted events: No failed operations

## Optimization Actions

**If seeing frequent rate limits:**
1. Increase `baseDelay` in `retryWithBackoff()` (currently 2000ms)
2. Add delay between `cancelAllOrders()` and `placeExitOrders()` (currently immediate)
3. Consider using a faster RPC endpoint (Helius Pro, Triton, etc.)
4. Batch order operations if possible

**If seeing exhausted retries:**
1. Increase `maxRetries` from 3 to 5
2. Increase exponential backoff multiplier (currently 2x)
3. Check RPC endpoint health/status page
4. Consider implementing circuit breaker pattern
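
Before changing these settings, it helps to see how much waiting each combination implies. The sketch below assumes the wait before retry n is `baseDelay × multiplier^(n-1)`. That formula is an assumption about `retryWithBackoff()`, not taken from its source, and `backoff_delays` is a hypothetical helper name.

```python
from typing import List


def backoff_delays(base_delay_ms: int = 2000, multiplier: float = 2.0,
                   max_retries: int = 3) -> List[int]:
    """Wait before each retry, in milliseconds (assumed exponential schedule)."""
    return [int(base_delay_ms * multiplier ** attempt) for attempt in range(max_retries)]


if __name__ == "__main__":
    for retries in (3, 5):
        delays = backoff_delays(max_retries=retries)
        print(f"maxRetries={retries}: delays={delays} total={sum(delays)}ms")
    # maxRetries=3: delays=[2000, 4000, 8000] total=14000ms
    # maxRetries=5: delays=[2000, 4000, 8000, 16000, 32000] total=62000ms
```

Under this assumption, raising `maxRetries` from 3 to 5 increases the worst-case wait from about 14 seconds to about 62 seconds, which matters if exit orders are blocked while retries run.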

## Live Monitoring Commands

```bash
# Watch rate limits in real-time (2>&1 also captures lines the bot writes to stderr)
docker logs -f trading-bot-v4 2>&1 | grep -i "rate limit"

# Count rate limit events today
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
  SELECT COUNT(*) FROM \"SystemEvent\"
  WHERE \"eventType\" = 'rate_limit_hit'
  AND DATE(\"createdAt\") = CURRENT_DATE;"

# Check latest rate limit event
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "
  SELECT * FROM \"SystemEvent\"
  WHERE \"eventType\" IN ('rate_limit_hit', 'rate_limit_recovered', 'rate_limit_exhausted')
  ORDER BY \"createdAt\" DESC LIMIT 1;"
```

## Integration with Alerts

When implementing automated alerts, trigger on:
- Any `rate_limit_exhausted` event (critical)
- More than 5 `rate_limit_hit` events in 5 minutes (warning)
- Recovery rate below 90% over 1 hour (warning)
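
The three triggers above can be sketched as a single check over recent events. This is a minimal illustration, not the bot's alerting code: the `Event` class and `evaluate_alerts` function are hypothetical, events are assumed to be already loaded from `SystemEvent`, and the one-hour lookback for `rate_limit_exhausted` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Event:
    event_type: str        # 'rate_limit_hit', 'rate_limit_recovered' or 'rate_limit_exhausted'
    created_at: datetime


def evaluate_alerts(events: List[Event], now: datetime) -> List[str]:
    """Return the alert messages triggered by events in the last hour."""
    hour_ago = now - timedelta(hours=1)
    five_min_ago = now - timedelta(minutes=5)
    last_hour = [e for e in events if hour_ago <= e.created_at <= now]

    def count(kind: str, since: datetime) -> int:
        return sum(1 for e in last_hour if e.event_type == kind and e.created_at >= since)

    alerts = []
    if count("rate_limit_exhausted", hour_ago) > 0:
        alerts.append("critical: rate_limit_exhausted event")
    if count("rate_limit_hit", five_min_ago) > 5:
        alerts.append("warning: more than 5 rate_limit_hit events in 5 minutes")
    hits = count("rate_limit_hit", hour_ago)
    recovered = count("rate_limit_recovered", hour_ago)
    # Skip the rate check when there were no hits, to avoid dividing by zero.
    if hits > 0 and 100.0 * recovered / hits < 90.0:
        alerts.append("warning: recovery rate below 90% over 1 hour")
    return alerts


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0, 0)
    events = [Event("rate_limit_hit", now - timedelta(seconds=30 * i)) for i in range(6)]
    events += [Event("rate_limit_recovered", now - timedelta(seconds=30 * i)) for i in range(5)]
    for alert in evaluate_alerts(events, now):
        print(alert)
```

In the usage example, six hits in under three minutes with only five recoveries trigger both warnings.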

Log format examples:
```
✅ Retry successful after 2341ms (1 retries)
⏳ Rate limited (429), retrying in 2s... (attempt 1/3)
❌ RATE LIMIT EXHAUSTED: Failed after 3 retries and 14523ms
```
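
If alerts are driven from container logs rather than the database, these lines can be parsed with regular expressions. The sketch below is written against the three example lines only. The `kind` labels and the `parse_line` helper are hypothetical, and the patterns assume integer delays and the exact wording shown above.

```python
import re
from typing import Dict, Optional, Union

# One pattern per example log line; named groups become integer fields.
PATTERNS = {
    "recovered": re.compile(
        r"Retry successful after (?P<total_ms>\d+)ms \((?P<retries>\d+) retries\)"),
    "hit": re.compile(
        r"Rate limited \(429\), retrying in (?P<delay_s>\d+)s\.\.\. "
        r"\(attempt (?P<attempt>\d+)/(?P<max_retries>\d+)\)"),
    "exhausted": re.compile(
        r"RATE LIMIT EXHAUSTED: Failed after (?P<retries>\d+) retries and (?P<total_ms>\d+)ms"),
}


def parse_line(line: str) -> Optional[Dict[str, Union[str, int]]]:
    """Return the parsed fields for a rate limit log line, or None if it does not match."""
    for kind, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            fields = {name: int(value) for name, value in match.groupdict().items()}
            return {"kind": kind, **fields}
    return None


if __name__ == "__main__":
    print(parse_line("⏳ Rate limited (429), retrying in 2s... (attempt 1/3)"))
    # {'kind': 'hit', 'delay_s': 2, 'attempt': 1, 'max_retries': 3}
```

Because the patterns use `search`, any timestamp or container prefix that `docker logs` adds before the message is ignored.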