DNS Retry Logic for Drift Initialization
Problem Solved
Issue: Trading bot would fail with HTTP 500 "fetch failed" errors when DNS resolution temporarily failed for mainnet.helius-rpc.com, causing:
- n8n workflow failures (missed trades)
- Manual Telegram trades failing
- Container restart failures
Root Cause: DNS lookup errors (EAI_AGAIN) are transient network issues that resolve within seconds, but the bot was treating them as permanent failures.
Solution
Added automatic retry logic to lib/drift/client.ts that:
- Detects transient errors:
  - fetch failed
  - EAI_AGAIN (DNS temporary failure)
  - ENOTFOUND (DNS resolution failed)
  - ETIMEDOUT (connection timeout)
  - ECONNREFUSED (connection refused)
- Retries automatically:
  - Max 3 attempts
  - 2 second delay between attempts
  - Logs each retry for monitoring
- Fails fast on non-transient errors:
  - Authentication errors
  - Invalid configuration
  - Permanent network issues
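The behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in lib/drift/client.ts: only the name retryOperation and its maxRetries/delayMs parameters come from this document, while isTransientError, TRANSIENT_PATTERNS, the signature, and the log wording are assumptions made for the example.

```typescript
// Error fragments treated as transient, taken from the list above.
const TRANSIENT_PATTERNS = [
  "fetch failed",
  "EAI_AGAIN",
  "ENOTFOUND",
  "ETIMEDOUT",
  "ECONNREFUSED",
];

// Hypothetical helper: classify an error by searching its text for a known fragment.
function isTransientError(error: unknown): boolean {
  const text =
    error instanceof Error ? `${error.name}: ${error.message}` : String(error);
  return TRANSIENT_PATTERNS.some((pattern) => text.includes(pattern));
}

// Assumed signature: run `operation` up to `maxRetries` times in total,
// waiting `delayMs` between attempts.
async function retryOperation<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  delayMs = 2000,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Fail fast on non-transient errors, and give up after the last attempt.
      if (!isTransientError(error) || attempt >= maxRetries) {
        throw error;
      }
      console.warn(
        `Attempt ${attempt}/${maxRetries} failed, retrying in ${delayMs}ms`,
      );
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

A caller would wrap each network-bound step, for example `await retryOperation(() => initializeDrift())`, where initializeDrift stands in for whatever async function performs the SDK initialization.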
Example Logs
Success after retry:
🚀 Initializing Drift Protocol client...
⚠️ Drift initialization failed (attempt 1/3): fetch failed
⏳ Retrying in 2000ms...
✅ Drift client subscribed to account updates
✅ Drift service initialized successfully
Permanent failure (after 3 attempts):
🚀 Initializing Drift Protocol client...
⚠️ Drift initialization failed (attempt 1/3): fetch failed
⏳ Retrying in 2000ms...
⚠️ Drift initialization failed (attempt 2/3): fetch failed
⏳ Retrying in 2000ms...
⚠️ Drift initialization failed (attempt 3/3): fetch failed
❌ Failed to initialize Drift service after retries: TypeError: fetch failed
Impact
Before:
- DNS hiccup → 500 error → n8n workflow fails → missed trade
- User must manually retry via Telegram
After:
- DNS hiccup → automatic retry (2s delay) → success → trade executes
- An estimated 99% of transient failures are handled automatically (those that clear within the two 2-second retry delays)
Testing
Deployed: Nov 13, 2025 at 16:02 CET
Commit: 5e826de
Monitor with:
# Check for retry activity (2>&1 also captures log lines written to stderr)
docker logs trading-bot-v4 --since 1h 2>&1 | grep -E "Retrying|retry"
# Count DNS failures (should see retries working)
docker logs trading-bot-v4 --since 1h 2>&1 | grep -c "EAI_AGAIN"
Configuration
Retry parameters in retryOperation():
- maxRetries: 3 attempts (configurable)
- delayMs: 2000ms between retries (configurable)
- Applied to: Drift SDK initialization, subscribe, user account fetch
Related Issues
- Incident: Nov 13, 2025 at 15:55 - n8n workflow failed with "fetch failed"
- Manual recovery: User opened trade via Telegram successfully
- Fix: This retry logic prevents future occurrences
Future Improvements
- Multiple RPC endpoints: Add fallback to public Solana RPC if Helius fails
- Circuit breaker: Temporarily disable Helius if consistent failures detected
- Metrics: Track retry success rate, DNS failure frequency
- Alert on persistent failures: Notify user if all retries fail multiple times in 1 hour
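As a rough illustration of the first improvement (multiple RPC endpoints), the sketch below tries each endpoint in order and returns the first successful connection. Nothing here exists in the codebase yet: withEndpointFallback, the connect callback, and the example URLs are all assumptions made for the example.

```typescript
// Hypothetical helper: try each RPC endpoint in order until one connects.
async function withEndpointFallback<T>(
  endpoints: string[],
  connect: (endpoint: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown = new Error("No RPC endpoints configured");
  for (const endpoint of endpoints) {
    try {
      return await connect(endpoint);
    } catch (error) {
      lastError = error;
      // Log only the host so an API key in the URL's query string is not leaked.
      console.warn(`RPC endpoint ${new URL(endpoint).host} failed, trying next`);
    }
  }
  // Every endpoint failed: surface the most recent error to the caller.
  throw lastError;
}

// Assumed endpoint order: Helius first, public Solana RPC as the fallback.
const RPC_ENDPOINTS = [
  "https://mainnet.helius-rpc.com",
  "https://api.mainnet-beta.solana.com",
];
```

Each connect call could itself be wrapped in the retry logic, so that a short DNS hiccup is retried against the primary endpoint before the fallback is used.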