Created comprehensive HA roadmap with 6 phases:
- Phase 1: Warm standby (CURRENT - manual failover)
- Phase 2: Database replication
- Phase 3: Health monitoring
- Phase 4: Reverse proxy + floating IP
- Phase 5: Automated failover
- Phase 6: Geographic redundancy

Includes:
- Decision gates based on capital and stability
- Cost-benefit analysis
- Scripts for healthcheck, failover, DB sync
- Recommendation to defer full HA until capital > $5k

Secondary server ready at 72.62.39.24 for emergency manual failover.

Related: User concern about system uptime, but full HA complexity not justified at current scale (~$600 capital). Revisit in Q1 2026.
High Availability Setup for Trading Bot v4
Architecture: Active-Passive Failover
Primary Server (Active): Runs trading bot 24/7
Secondary Server (Passive): Monitors primary, takes over on failure
Why Active-Passive (Not Active-Active)?
- Prevents duplicate trades - CRITICAL for financial system
- Single source of truth - One Position Manager tracking state
- No split-brain scenarios - Only one bot executes trades
- Database consistency - No conflicting writes
Setup Instructions
1. Prerequisites
Primary Server: root@192.168.1.100 (update in scripts)
Secondary Server: root@72.62.39.24
Both servers need:
- Docker & Docker Compose installed
- Trading bot project at /home/icke/traderv4
- Same .env file (especially DRIFT_WALLET_PRIVATE_KEY)
- Same n8n workflows configured
2. Initial Sync (Already Done via rsync ✅)
# From primary server
rsync -avz --exclude 'node_modules' --exclude '.next' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
3. Database Synchronization
Option A: Manual Sync (Simpler, Recommended for Start)
On primary:
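# --clean --if-exists adds DROP statements so the dump can be restored over an existing database
docker exec trading-bot-postgres pg_dump -U postgres --clean --if-exists trading_bot_v4 > /tmp/trading_bot_backup.sql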
rsync -avz /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/
On secondary:
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
Run this daily via cron on primary:
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh
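The three manual steps above can be chained into one function, which is roughly what a daily cron script would need to do. This is a minimal sketch only: the contents of the real `ha-setup/sync-db-daily.sh` are not shown in this document, and the function name `sync_db` and the variables `SECONDARY` and `DUMP` are assumptions made for illustration. The `--clean --if-exists` flags are included so that repeated restores replace existing tables instead of colliding with them.

```shell
# Hypothetical sketch of a daily DB sync; the real sync-db-daily.sh may differ.
SECONDARY="root@72.62.39.24"          # secondary server from the Prerequisites section
DUMP="/tmp/trading_bot_backup.sql"    # dump path used in the manual steps

sync_db() {
  # Remote path: rsync to /tmp/ keeps the dump's file name
  remote_dump="/tmp/$(basename "$DUMP")"

  # 1. Dump the database on the primary (this host)
  docker exec trading-bot-postgres \
    pg_dump -U postgres --clean --if-exists trading_bot_v4 > "$DUMP" || return 1

  # 2. Copy the dump to the secondary
  rsync -avz "$DUMP" "$SECONDARY:/tmp/" || return 1

  # 3. Restore on the secondary; the redirect is evaluated on the remote side
  ssh "$SECONDARY" \
    "docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < $remote_dump"
}
```

A script built this way would end with a call to `sync_db` and exit non-zero on failure, so that the cron log shows which step broke.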
Option B: Streaming Replication (Advanced)
# On primary
bash ha-setup/setup-db-replication.sh primary
# On secondary
bash ha-setup/setup-db-replication.sh secondary
4. Setup Health Monitoring
Make scripts executable:
chmod +x ha-setup/*.sh
Test healthcheck on both servers:
bash ha-setup/healthcheck.sh
# Should output: ✅ HEALTHY: All checks passed
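For orientation, a health check of this kind typically verifies that the container is up and that the API answers. The sketch below is an assumption about what such a script could look like, not the contents of `ha-setup/healthcheck.sh`: the function name `healthcheck` and the failure messages are invented here, while the container name, the API endpoint, and the success message are taken from elsewhere in this document.

```shell
# Hypothetical healthcheck sketch; the real healthcheck.sh may check more or different things.
healthcheck() {
  # Check 1: the bot container is running
  if ! docker ps --format '{{.Names}}' | grep -q '^trading-bot-v4$'; then
    echo "❌ UNHEALTHY: container trading-bot-v4 is not running"
    return 1
  fi

  # Check 2: the API answers with a success status within 5 seconds
  if ! curl -sf --max-time 5 -o /dev/null \
       -H "Authorization: Bearer ${API_SECRET_KEY:-}" \
       http://localhost:3001/api/trading/positions; then
    echo "❌ UNHEALTHY: API not responding"
    return 1
  fi

  echo "✅ HEALTHY: All checks passed"
}
```

Returning a non-zero status on failure matters more than the message text, because a controller calling this over SSH would normally act on the exit code.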
5. Start Failover Controller (SECONDARY ONLY)
Edit configuration first:
nano ha-setup/failover-controller.sh
# Update PRIMARY_HOST with actual IP
# Update SECONDARY_HOST if needed
Run as systemd service:
sudo cp ha-setup/trading-bot-ha.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable trading-bot-ha
sudo systemctl start trading-bot-ha
Check status:
sudo systemctl status trading-bot-ha
sudo journalctl -u trading-bot-ha -f
6. SSH Key Setup (Password-less Auth)
Secondary needs SSH access to primary for health checks:
# On secondary
ssh-keygen -t ed25519 -f /root/.ssh/trading_bot_ha
ssh-copy-id -i /root/.ssh/trading_bot_ha root@192.168.1.100
# Test connection (pass the key explicitly; it is not a default identity file)
ssh -i /root/.ssh/trading_bot_ha root@192.168.1.100 "docker ps | grep trading-bot"
How It Works
Normal Operation (Primary Active)
- Primary: Trading bot running, executing trades
- Secondary: Failover controller checks primary every 15s
- Secondary: Bot container STOPPED (passive standby)
Failover Scenario
- Primary fails (server down, docker crash, API unresponsive)
- Secondary detects 3 consecutive failed health checks (45s)
- Telegram alert sent: "🚨 HA FAILOVER: Primary failed, activating secondary"
- Secondary starts trading bot container
- Trading continues on secondary with same wallet/config
Recovery Scenario
- Primary recovers (you fix it, restart, etc.)
- Secondary detects primary is healthy again
- Secondary stops its trading bot (returns to standby)
- Telegram alert: "Primary recovered, secondary deactivated"
- Primary resumes as active node
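The failover and recovery sequences above amount to a small state machine: count consecutive failures, activate the secondary at the threshold, and deactivate it on the first healthy check. The following is a minimal sketch of that logic, not the actual `failover-controller.sh`; the function name `record_check` and the variables `failures`, `secondary_active`, and `ACTION` are hypothetical, while the threshold of 3 comes from the description above.

```shell
# Hypothetical sketch of the failover decision logic.
MAX_FAILURES=3       # consecutive failed checks before failover (from the text)
failures=0           # current run of consecutive failed checks
secondary_active=0   # 1 while the secondary is running the bot
ACTION="none"        # result of the most recent check

# record_check <ok|fail>: update the state and set ACTION to
# "activate", "deactivate", or "none".
record_check() {
  ACTION="none"
  if [ "$1" = "ok" ]; then
    failures=0
    if [ "$secondary_active" -eq 1 ]; then
      secondary_active=0
      ACTION="deactivate"   # primary recovered: return to standby
    fi
  else
    failures=$((failures + 1))
    if [ "$failures" -ge "$MAX_FAILURES" ] && [ "$secondary_active" -eq 0 ]; then
      secondary_active=1
      ACTION="activate"     # primary considered down: start the bot here
    fi
  fi
}
```

In a real loop, each iteration would run the health check against the primary, call `record_check ok` or `record_check fail`, then start or stop the local trading-bot container and send the Telegram alert depending on `ACTION`, sleeping 15 seconds between iterations. The result is kept in a variable rather than printed, because capturing output with `$(...)` would run the function in a subshell and lose the counters.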
Monitoring & Maintenance
Check HA Status
On secondary:
# View failover controller logs
sudo journalctl -u trading-bot-ha -f --lines=50
# Check if secondary is active
docker ps | grep trading-bot-v4
On primary:
# Run healthcheck manually
bash ha-setup/healthcheck.sh
# Check container status
docker ps | grep trading-bot-v4
Manual Failover Testing
Simulate primary failure:
# On primary, stop trading bot
docker compose stop trading-bot
# Watch secondary logs - should activate within 45s
# On secondary
sudo journalctl -u trading-bot-ha -f
Restore primary:
# On primary, restart trading bot
docker compose up -d trading-bot
# Watch secondary - should deactivate within 15s
Database Sync Schedule
Daily sync from primary to secondary:
On primary, add to crontab:
crontab -e
# Add:
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh >> /var/log/trading-bot-db-sync.log 2>&1
Before failover events: Secondary uses last synced DB state (max 24h old trade history)
After failover: Secondary continues with current state, syncs back to primary when recovered
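Since the secondary may be working from a dump that is up to 24 hours old, it can be useful to confirm the dump's age before relying on it. The helper below is a sketch under assumptions: the name `dump_is_fresh` is hypothetical, and it relies on `find -mmin`, which GNU and BSD `find` provide but POSIX does not require.

```shell
# dump_is_fresh <file> [max_age_minutes]
# Succeeds if the file exists and was modified within the limit (default 1440 min = 24h).
dump_is_fresh() {
  # find prints the path only when the file is OLDER than the limit,
  # so empty output means the dump is fresh enough.
  [ -f "$1" ] && [ -z "$(find "$1" -mmin +"${2:-1440}")" ]
}
```

For example, `dump_is_fresh /tmp/trading_bot_backup.sql || echo "DB dump is stale or missing"` could run on the secondary before a planned failover test.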
Important Notes
Financial Safety
- NEVER run both servers actively - would cause duplicate trades and wallet conflicts
- Failover controller ensures only one active at a time
- Same wallet key required on both servers
- Same n8n webhook endpoint - update TradingView alerts if needed
Database Consistency
- Daily sync: Keeps secondary within 24h of primary
- Trade history: May have small gap after failover (acceptable)
- Position Manager: Rebuilds state from Drift Protocol on startup
- No financial loss: Drift Protocol is source of truth for positions
Network Requirements
- Secondary → Primary: SSH access (port 22) for health checks
- Both → Internet: For Drift Protocol, Telegram, n8n webhooks
- n8n: Can run on both or centralized (needs webhook routing)
Testing Recommendations
- Week 1: Run without failover, just monitor health checks
- Week 2: Test manual failover (stop primary, verify secondary takes over)
- Week 3: Test recovery (restart primary, verify secondary stops)
- Week 4: Enable automatic failover for production
Troubleshooting
Secondary Won't Start After Failover
# Check logs
docker logs trading-bot-v4
# Check .env file exists
ls -la /home/icke/traderv4/.env
# Check Drift initialization
docker logs trading-bot-v4 | grep "Drift"
Split-Brain (Both Servers Active)
EMERGENCY - Stop both immediately:
# On both servers
docker compose stop trading-bot
Then restart only primary:
# On primary only
docker compose up -d trading-bot
Check Drift positions:
curl -s http://localhost:3001/api/trading/positions \
-H "Authorization: Bearer ${API_SECRET_KEY}" | jq .
Health Check False Positives
Adjust thresholds in failover-controller.sh:
CHECK_INTERVAL=30 # Slower checks (reduce network load)
MAX_FAILURES=5 # More tolerant (reduce false failovers)
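Raising these thresholds lengthens the time before failover triggers. As a rough guide, and ignoring the time each individual check takes, the detection window is the check interval multiplied by the failure count. The function name `detection_window` is introduced here only for illustration.

```shell
# detection_window <check_interval_seconds> <max_failures>
# Prints the approximate seconds between the first failed check and failover.
detection_window() {
  echo $(( $1 * $2 ))
}

detection_window 15 3   # default settings described in this document: prints 45
detection_window 30 5   # relaxed settings shown above: prints 150
```

With the relaxed values, the bot could be offline for about two and a half minutes before the secondary takes over, which is the trade-off for fewer false failovers.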
Cost Analysis
Primary Server: Always running (existing cost)
Secondary Server: Always running, but mostly idle
Benefits:
- 99.9% uptime vs 95% single server
- ~8.8 hours/year max downtime at 99.9% uptime (failover time), vs ~438 hours/year at 95%
- Financial protection - no missed trades during outages
- Peace of mind - sleep without worrying about server crashes
Worth it? YES - For a financial system, redundancy is essential.
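As a quick check on the uptime figures, yearly downtime follows directly from the uptime percentage: the unavailable fraction multiplied by the 8,760 hours in a 365-day year. The helper name `downtime_hours` is an assumption made for this sketch.

```shell
# downtime_hours <uptime_percent>
# Prints the hours of downtime per 365-day year implied by an uptime percentage.
downtime_hours() {
  LC_ALL=C awk -v up="$1" 'BEGIN { printf "%.2f\n", (100 - up) / 100 * 365 * 24 }'
}

downtime_hours 99.9   # prints 8.76
downtime_hours 95     # prints 438.00
```

By the same arithmetic, a budget of about 4.4 hours per year would correspond to 99.95% uptime rather than 99.9%.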
Future Enhancements
- Geographic redundancy: Secondary in different datacenter/region
- Load balancer: Route n8n webhooks to active server automatically
- Database streaming replication: Real-time sync (0 data loss)
- Multi-region: Three servers (US, EU, Asia) for global coverage
- Health dashboard: Web UI showing HA status and metrics