# High Availability Setup for Trading Bot v4

## Architecture: Active-Passive Failover

- **Primary Server (Active):** Runs trading bot 24/7
- **Secondary Server (Passive):** Monitors primary, takes over on failure

### Why Active-Passive (Not Active-Active)?

- **Prevents duplicate trades** - CRITICAL for a financial system
- **Single source of truth** - One Position Manager tracking state
- **No split-brain scenarios** - Only one bot executes trades
- **Database consistency** - No conflicting writes

---

## Setup Instructions

### 1. Prerequisites

- **Primary Server:** `root@192.168.1.100` (update in scripts)
- **Secondary Server:** `root@72.62.39.24`

Both servers need:

- Docker & Docker Compose installed
- Trading bot project at `/home/icke/traderv4`
- Same `.env` file (especially `DRIFT_WALLET_PRIVATE_KEY`)
- Same n8n workflows configured

### 2. Initial Sync (Already Done via rsync ✅)

```bash
# From primary server
rsync -avz --exclude 'node_modules' --exclude '.next' \
  /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
```

### 3. Database Synchronization

**Option A: Manual Sync (Simpler, Recommended for Start)**

On primary:

```bash
# --clean --if-exists makes the dump drop existing objects first,
# so it can be restored repeatedly into the secondary's existing database
docker exec trading-bot-postgres pg_dump -U postgres --clean --if-exists trading_bot_v4 > /tmp/trading_bot_backup.sql
rsync -avz /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/
```

On secondary:

```bash
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
```

Run this daily via cron on primary:

```bash
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh
```

**Option B: Streaming Replication (Advanced)**

```bash
# On primary
bash ha-setup/setup-db-replication.sh primary

# On secondary
bash ha-setup/setup-db-replication.sh secondary
```

### 4. Setup Health Monitoring

Make scripts executable:

```bash
chmod +x ha-setup/*.sh
```

**Test healthcheck on both servers:**

```bash
bash ha-setup/healthcheck.sh
# Should output: ✅ HEALTHY: All checks passed
```

### 5. Start Failover Controller (SECONDARY ONLY)

Complete the SSH key setup in step 6 before starting the service; the controller needs SSH access to the primary for its health checks.

**Edit configuration first:**

```bash
nano ha-setup/failover-controller.sh
# Update PRIMARY_HOST with actual IP
# Update SECONDARY_HOST if needed
```

**Run as systemd service:**

```bash
sudo cp ha-setup/trading-bot-ha.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable trading-bot-ha
sudo systemctl start trading-bot-ha
```

**Check status:**

```bash
sudo systemctl status trading-bot-ha
sudo journalctl -u trading-bot-ha -f
```

### 6. SSH Key Setup (Password-less Auth)

Secondary needs SSH access to primary for health checks:

```bash
# On secondary
ssh-keygen -t ed25519 -f /root/.ssh/trading_bot_ha
ssh-copy-id -i /root/.ssh/trading_bot_ha root@192.168.1.100

# Test connection (pass -i because the key is not at a default path)
ssh -i /root/.ssh/trading_bot_ha root@192.168.1.100 "docker ps | grep trading-bot"
```

---

## How It Works

### Normal Operation (Primary Active)

1. **Primary:** Trading bot running, executing trades
2. **Secondary:** Failover controller checks primary every 15s
3. **Secondary:** Bot container STOPPED (passive standby)

### Failover Scenario

1. **Primary fails** (server down, docker crash, API unresponsive)
2. **Secondary detects** 3 consecutive failed health checks (45s)
3. **Telegram alert sent:** "🚨 HA FAILOVER: Primary failed, activating secondary"
4. **Secondary starts** trading bot container
5. **Trading continues** on secondary with same wallet/config

### Recovery Scenario

1. **Primary recovers** (you fix it, restart, etc.)
2. **Secondary detects** primary is healthy again
3. **Secondary stops** its trading bot (returns to standby)
4. **Telegram alert:** "Primary recovered, secondary deactivated"
5. **Primary resumes** as active node

---

## Monitoring & Maintenance

### Check HA Status

**On secondary:**

```bash
# View failover controller logs
sudo journalctl -u trading-bot-ha -f --lines=50

# Check if secondary is active
docker ps | grep trading-bot-v4
```

**On primary:**

```bash
# Run healthcheck manually
bash ha-setup/healthcheck.sh

# Check container status
docker ps | grep trading-bot-v4
```

### Manual Failover Testing

**Simulate primary failure:**

```bash
# On primary, stop trading bot
docker compose stop trading-bot

# Watch secondary logs - should activate within 45s
# On secondary
sudo journalctl -u trading-bot-ha -f
```

**Restore primary:**

```bash
# On primary, restart trading bot
docker compose up -d trading-bot

# Watch secondary - should deactivate within 15s
```

### Database Sync Schedule

**Daily sync from primary to secondary:**

On primary, add to crontab:

```bash
crontab -e
# Add:
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh >> /var/log/trading-bot-db-sync.log 2>&1
```

- **Before a failover:** Secondary holds the last synced DB state (trade history up to 24h old)
- **After a failover:** Secondary continues with its current state, syncs back to primary when recovered

---

## Important Notes

### Financial Safety

- **NEVER run both servers actively** - this would cause duplicate trades and wallet conflicts
- **Failover controller** is designed to keep only one bot active at a time; if both end up running, follow the Split-Brain steps under Troubleshooting
- **Same wallet key** required on both servers
- **Same n8n webhook endpoint** - update TradingView alerts if needed

### Database Consistency

- **Daily sync:** Keeps secondary within 24h of primary
- **Trade history:** May have a gap of up to 24h after failover (acceptable)
- **Position Manager:** Rebuilds state from Drift Protocol on startup
- **No financial loss:** Drift Protocol is source of truth for positions

### Network Requirements

- **Secondary → Primary:** SSH access (port 22) for health checks
- **Both → Internet:** For Drift Protocol, Telegram, n8n webhooks
- **n8n:** Can run on both or centralized (needs webhook routing)

### Testing Recommendations

1. **Week 1:** Run without failover, just monitor health checks
2. **Week 2:** Test manual failover (stop primary, verify secondary takes over)
3. **Week 3:** Test recovery (restart primary, verify secondary stops)
4. **Week 4:** Enable automatic failover for production

---

## Troubleshooting

### Secondary Won't Start After Failover

```bash
# Check logs
docker logs trading-bot-v4

# Check .env file exists
ls -la /home/icke/traderv4/.env

# Check Drift initialization
docker logs trading-bot-v4 | grep "Drift"
```

### Split-Brain (Both Servers Active)

**EMERGENCY - Stop both immediately:**

```bash
# On both servers
docker compose stop trading-bot
```

**Then restart only primary:**

```bash
# On primary only
docker compose up -d trading-bot
```

**Check Drift positions:**

```bash
curl -s http://localhost:3001/api/trading/positions \
  -H "Authorization: Bearer ${API_SECRET_KEY}" | jq .
```

### Health Check False Positives

Adjust thresholds in `failover-controller.sh`:

```bash
CHECK_INTERVAL=30  # Slower checks (reduce network load)
MAX_FAILURES=5     # More tolerant (reduce false failovers)
```

With these values, failover triggers after 150s (5 × 30s) instead of 45s.

---

## Cost Analysis

- **Primary Server:** Always running (existing cost)
- **Secondary Server:** Always running, but mostly idle

**Benefits:**

- **~99.9% uptime target** vs ~95% for a single server
- **~8.8 hours/year** of downtime at 99.9% (failover time), versus ~18 days/year at 95%
- **Financial protection** - an outage costs at most the ~45s failover window of missed trades
- **Peace of mind** - sleep without worrying about server crashes

**Worth it?** YES - For a financial system, redundancy is essential.

---

## Future Enhancements

1. **Geographic redundancy:** Secondary in different datacenter/region
2. **Load balancer:** Route n8n webhooks to active server automatically
3. **Database streaming replication:** Real-time sync (near-zero data loss)
4. **Multi-region:** Three servers (US, EU, Asia) for global coverage
5. **Health dashboard:** Web UI showing HA status and metrics
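
---

## Appendix: Failover Decision Sketch

The failover and recovery rules described under How It Works (3 consecutive failed checks at 15s intervals activate the secondary; one healthy check returns it to standby) can be sketched as a small state machine. This is a minimal illustration, not the contents of `ha-setup/failover-controller.sh`: the function names `decide`, `check_primary` and `run_controller` are hypothetical, and it assumes `healthcheck.sh` exits non-zero when a check fails. Telegram alerts are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the failover decision logic; the real
# ha-setup/failover-controller.sh may be structured differently.

CHECK_INTERVAL=15                     # seconds between health checks
MAX_FAILURES=3                        # consecutive failures before failover (3 x 15s = 45s)
PRIMARY_HOST="root@192.168.1.100"     # placeholder IP from the setup steps
SSH_KEY="/root/.ssh/trading_bot_ha"   # key created in step 6
PROJECT_DIR="/home/icke/traderv4"

failures=0      # consecutive failed checks so far
active=0        # 1 while the secondary's bot container is running
action="none"   # set by decide: activate | deactivate | none

# decide <healthy>: pass 1 if the primary passed the latest check, else 0.
# Updates the counters above and stores the resulting action in $action.
decide() {
  local healthy=$1
  action="none"
  if [ "$healthy" -eq 1 ]; then
    failures=0
    if [ "$active" -eq 1 ]; then
      active=0
      action="deactivate"
    fi
  else
    failures=$((failures + 1))
    if [ "$active" -eq 0 ] && [ "$failures" -ge "$MAX_FAILURES" ]; then
      active=1
      action="activate"
    fi
  fi
  return 0
}

# Assumption: healthcheck.sh exits non-zero when any check fails.
check_primary() {
  ssh -i "$SSH_KEY" -o ConnectTimeout=5 "$PRIMARY_HOST" \
    "bash $PROJECT_DIR/ha-setup/healthcheck.sh" >/dev/null 2>&1
}

# Main loop: probe the primary, then start or stop the local bot as decided.
run_controller() {
  while true; do
    if check_primary; then decide 1; else decide 0; fi
    case "$action" in
      activate)   (cd "$PROJECT_DIR" && docker compose up -d trading-bot) ;;
      deactivate) (cd "$PROJECT_DIR" && docker compose stop trading-bot) ;;
    esac
    sleep "$CHECK_INTERVAL"
  done
}
```

Calling `run_controller` at the end of the script would start the loop. Keeping `decide` free of SSH and Docker calls means the thresholds can be exercised without touching either server, and a single healthy check resets the failure counter so intermittent blips do not accumulate into a failover.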