# DNS Retry Logic for Drift Initialization ## Problem Solved **Issue:** Trading bot would fail with HTTP 500 "fetch failed" errors when DNS resolution temporarily failed for `mainnet.helius-rpc.com`, causing: - n8n workflow failures (missed trades) - Manual Telegram trades failing - Container restart failures **Root Cause:** DNS lookup errors (`EAI_AGAIN`) are transient network issues that resolve within seconds, but the bot was treating them as permanent failures. ## Solution Added automatic retry logic to `lib/drift/client.ts` that: 1. **Detects transient errors:** - `fetch failed` - `EAI_AGAIN` (DNS temporary failure) - `ENOTFOUND` (DNS resolution failed) - `ETIMEDOUT` (connection timeout) - `ECONNREFUSED` (connection refused) 2. **Retries automatically:** - Max 3 attempts - 2 second delay between attempts - Logs each retry for monitoring 3. **Fails fast on non-transient errors:** - Authentication errors - Invalid configuration - Permanent network issues ## Example Logs **Success after retry:** ``` 🚀 Initializing Drift Protocol client... ⚠️ Drift initialization failed (attempt 1/3): fetch failed ⏳ Retrying in 2000ms... ✅ Drift client subscribed to account updates ✅ Drift service initialized successfully ``` **Permanent failure (after 3 retries):** ``` 🚀 Initializing Drift Protocol client... ⚠️ Drift initialization failed (attempt 1/3): fetch failed ⏳ Retrying in 2000ms... ⚠️ Drift initialization failed (attempt 2/3): fetch failed ⏳ Retrying in 2000ms... ⚠️ Drift initialization failed (attempt 3/3): fetch failed ❌ Failed to initialize Drift service after retries: TypeError: fetch failed ``` ## Impact **Before:** - DNS hiccup → 500 error → n8n workflow fails → missed trade - User must manually retry via Telegram **After:** - DNS hiccup → automatic retry (2s delay) → success → trade executes - 99% of transient failures handled automatically ## Testing **Deployed:** Nov 13, 2025 at 16:02 CET **Commit:** 5e826de **Monitor with:** ```bash # Check for retry activity docker logs trading-bot-v4 --since 1h | grep -E "Retrying|retry" # Count DNS failures (should see retries working) docker logs trading-bot-v4 --since 1h | grep "EAI_AGAIN" ``` ## Configuration Retry parameters in `retryOperation()`: - `maxRetries`: 3 attempts (configurable) - `delayMs`: 2000ms between retries (configurable) - Applied to: Drift SDK initialization, subscribe, user account fetch ## Related Issues - **Incident:** Nov 13, 2025 at 15:55 - n8n workflow failed with "fetch failed" - **Manual recovery:** User opened trade via Telegram successfully - **Fix:** This retry logic prevents future occurrences ## Future Improvements 1. **Multiple RPC endpoints:** Add fallback to public Solana RPC if Helius fails 2. **Circuit breaker:** Temporarily disable Helius if consistent failures detected 3. **Metrics:** Track retry success rate, DNS failure frequency 4. **Alert on persistent failures:** Notify user if all retries fail multiple times in 1 hour