feat: Automated failover system with certificate sync and DNS monitoring

Certificate Synchronization (COMPLETE):
- Created cert-push-to-hostinger.sh on srvrevproxy02
- Hourly cron job pushes /etc/letsencrypt/ from srvrevproxy02 to Hostinger
- SSH key authentication (id_ed25519_hostinger) configured
- 22MB of Let's Encrypt certificates synced successfully
- Automatic nginx reload on Hostinger after sync
- Log: /var/log/cert-push-hostinger.log

DNS Failover Monitor (READY):
- Python script: dns-failover-monitor.py on Hostinger
- INWX API integration for automatic DNS updates
- Health monitoring every 30s, failover after 3 failures (90s)
- Systemd service with auto-restart
- Setup script: setup-inwx-env.sh for INWX credentials
- Log: /var/log/dns-failover.log

Architecture:
- Primary: srvrevproxy02 (10.0.0.29) - Certificate source
- Secondary: Hostinger (72.62.39.24) - Failover target
- Nginx on Hostinger now uses flow.egonetix.de certificate

Next Steps:
- Run /root/setup-inwx-env.sh on Hostinger
- Enter INWX credentials
- Start monitoring: systemctl start dns-failover
docs/DEPLOY_SECONDARY_MANUAL.md
# Manual Deployment to Secondary Server (Hostinger VPS)

## Quick Start - Deploy Secondary Now

### Step 1: Complete the Code Sync (if not finished)

```bash
# Wait for rsync to complete or run it manually
rsync -avz --delete \
  --exclude 'node_modules' \
  --exclude '.next' \
  --exclude '.git' \
  --exclude 'logs/*' \
  --exclude 'postgres-data' \
  /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
```

### Step 2: Back Up and Sync Database

```bash
# Dump database from primary
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql

# Copy to secondary
scp /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/trading_bot_backup.sql
```
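
A truncated or interrupted copy may restore only part of the data, so it can be worth comparing a checksum of the dump on both hosts before restoring. The following is a minimal sketch, assuming Python 3 is available on both servers; `sha256_of_file` is a hypothetical helper and not part of the project.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (end of file)
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `sha256_of_file("/tmp/trading_bot_backup.sql")` on the primary and on the secondary should give the same digest if the copy is intact.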

### Step 3: Deploy on Secondary

```bash
# SSH to secondary
ssh root@72.62.39.24

cd /home/icke/traderv4

# Start PostgreSQL
docker compose up -d postgres

# Wait until PostgreSQL accepts TCP connections (more reliable than a fixed sleep)
until docker exec trading-bot-postgres pg_isready -h 127.0.0.1 -U postgres; do sleep 1; done

# Restore database
# DROP/CREATE DATABASE go in separate -c flags: psql runs a multi-statement
# -c string in one transaction, and neither command is allowed inside one
docker exec trading-bot-postgres psql -U postgres -c "DROP DATABASE IF EXISTS trading_bot_v4;" -c "CREATE DATABASE trading_bot_v4;"
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql

# Verify database
docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c "SELECT COUNT(*) FROM \"Trade\";"

# Build trading bot
docker compose build trading-bot

# Start trading bot (but keep it inactive - secondary waits in standby)
docker compose up -d trading-bot

# Check logs (Ctrl+C stops following)
docker logs -f trading-bot-v4
```

### Step 4: Verify Everything Works

```bash
# Check all containers running
docker ps

# Should see:
# - trading-bot-v4 (your bot)
# - trading-bot-postgres
# - n8n (already running)

# Test health endpoint
curl http://localhost:3001/api/health

# Check database connection
docker exec trading-bot-postgres psql -U postgres -c "\l"
```

## Ongoing Sync Strategy

### Option A: PostgreSQL Streaming Replication (Best)

**Set up once, then sync continuously in real time (1-2 second lag)**

See `HA_DATABASE_SYNC_STRATEGY.md` for the complete setup guide.

Quick version:

```bash
# On PRIMARY
# A quoted heredoc keeps the "!" in the password away from bash history expansion.
# Replace the example password with your own.
docker exec -i trading-bot-postgres psql -U postgres << 'SQL'
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'ReplPass2024!';
SQL

docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << CONF
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB
CONF"

docker exec trading-bot-postgres bash -c "echo 'host replication replicator 72.62.39.24/32 md5' >> /var/lib/postgresql/data/pg_hba.conf"

docker restart trading-bot-postgres

# On SECONDARY (this deletes the secondary's existing database files)
docker compose down postgres
rm -rf postgres-data/
mkdir -p postgres-data

docker run --rm \
  -v "$(pwd)/postgres-data:/var/lib/postgresql/data" \
  -e PGPASSWORD='ReplPass2024!' \
  postgres:16-alpine \
  pg_basebackup -h <hetzner-ip> -p 5432 -U replicator -D /var/lib/postgresql/data -P -R

docker compose up -d postgres

# Verify
docker exec trading-bot-postgres psql -U postgres -c "SELECT * FROM pg_stat_wal_receiver;"
```
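
To check the 1-2 second lag figure on a running standby, one option is to ask PostgreSQL how long ago the last replayed transaction was committed, using `pg_last_xact_replay_timestamp()`. The sketch below assumes it runs on the secondary with the same container name as above; `parse_lag_seconds` and `replication_lag_seconds` are hypothetical helper names. On an idle primary this value keeps growing even though nothing is missing, so treat it as an upper bound.

```python
import subprocess

# Seconds since the standby last replayed a committed transaction
LAG_QUERY = "SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());"


def parse_lag_seconds(psql_output):
    """Parse `psql -t -A` output into seconds; None means nothing was replayed yet."""
    text = psql_output.strip()
    return float(text) if text else None


def replication_lag_seconds(container="trading-bot-postgres"):
    """Run the lag query inside the standby's container (assumed name) via docker exec."""
    result = subprocess.run(
        ["docker", "exec", container, "psql", "-U", "postgres", "-t", "-A", "-c", LAG_QUERY],
        capture_output=True, text=True, check=True,
    )
    return parse_lag_seconds(result.stdout)
```

A monitoring job could call `replication_lag_seconds()` periodically and alert when the result stays above a few seconds while the primary is known to be writing.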

### Option B: Cron Job Backup (Simple but 6hr lag)

```bash
# On PRIMARY - Create sync script
cat > /root/sync-to-secondary.sh << 'SCRIPT'
#!/bin/bash
set -o pipefail  # a failed pg_dump should fail the whole pipeline
LOG="/var/log/secondary-sync.log"
echo "[$(date)] Starting sync..." >> "$LOG"

# Sync code (same excludes as the initial sync, so the secondary's postgres-data is left alone)
rsync -avz --delete \
  --exclude 'node_modules' --exclude '.next' --exclude '.git' \
  --exclude 'logs/*' --exclude 'postgres-data' \
  /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/ >> "$LOG" 2>&1

# Sync database
# DROP and CREATE use separate -c flags (they cannot share one transaction);
# only the final psql gets -i, so it alone reads the piped dump
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 | \
  ssh root@72.62.39.24 "docker exec trading-bot-postgres psql -U postgres -c 'DROP DATABASE IF EXISTS trading_bot_v4;' -c 'CREATE DATABASE trading_bot_v4;' && docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4" >> "$LOG" 2>&1

echo "[$(date)] Sync complete" >> "$LOG"
SCRIPT

chmod +x /root/sync-to-secondary.sh

# Test it
/root/sync-to-secondary.sh

# Schedule every 6 hours
crontab -e
# Add: 0 */6 * * * /root/sync-to-secondary.sh
```
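
The crontab entry `0 */6 * * *` fires at minute 0 of hours 0, 6, 12 and 18, which is where the 6-hour lag comes from. As an illustration, the sketch below computes the next run after a given moment; `next_sync` is a hypothetical helper that uses naive local time and ignores daylight-saving changes.

```python
from datetime import datetime, timedelta

SYNC_INTERVAL_HOURS = 6  # matches the */6 hour field in the crontab entry


def next_sync(now):
    """Return the next time `0 */6 * * *` fires strictly after `now`."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    candidate = top_of_hour + timedelta(hours=1)
    # Step forward hour by hour until the hour is a multiple of six
    while candidate.hour % SYNC_INTERVAL_HOURS != 0:
        candidate += timedelta(hours=1)
    return candidate
```

A trade recorded just after a run finishes can therefore be missing from the secondary for close to six hours, which is the main trade-off against streaming replication.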

## Health Monitor Setup

Create a health monitor that switches DNS automatically when the primary fails:

```bash
# Create health monitor script (run on laptop or third server)
cat > ~/trading-bot-monitor.py << 'SCRIPT'
#!/usr/bin/env python3
import requests
import time
import os

CLOUDFLARE_API_TOKEN = "your-token"
CLOUDFLARE_ZONE_ID = "your-zone-id"
CLOUDFLARE_RECORD_ID = "your-record-id"

PRIMARY_IP = "hetzner-ip"
SECONDARY_IP = "72.62.39.24"

PRIMARY_URL = f"http://{PRIMARY_IP}:3001/api/health"
SECONDARY_URL = f"http://{SECONDARY_IP}:3001/api/health"

TELEGRAM_BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
TELEGRAM_CHAT_ID = os.getenv("TELEGRAM_CHAT_ID")

current_active = "primary"

def send_telegram(message):
    try:
        url = f"https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage"
        requests.post(url, json={"chat_id": TELEGRAM_CHAT_ID, "text": message}, timeout=10)
    except requests.RequestException:
        pass  # alerts are best-effort; a failed notification must not stop the monitor

def check_health(url):
    try:
        response = requests.get(url, timeout=10)
        return response.status_code == 200
    except requests.RequestException:
        return False

def update_cloudflare_dns(ip):
    url = f"https://api.cloudflare.com/client/v4/zones/{CLOUDFLARE_ZONE_ID}/dns_records/{CLOUDFLARE_RECORD_ID}"
    headers = {"Authorization": f"Bearer {CLOUDFLARE_API_TOKEN}", "Content-Type": "application/json"}
    data = {"type": "A", "name": "flow.egonetix.de", "content": ip, "ttl": 120, "proxied": False}

    try:
        response = requests.put(url, json=data, headers=headers, timeout=10)
        return response.status_code == 200
    except requests.RequestException:
        return False  # retry on the next loop iteration instead of crashing

print("Health monitor started")
send_telegram("🏥 Trading Bot Health Monitor Started")

while True:
    primary_healthy = check_health(PRIMARY_URL)
    secondary_healthy = check_health(SECONDARY_URL)

    print(f"Primary: {'✅' if primary_healthy else '❌'} | Secondary: {'✅' if secondary_healthy else '❌'}")

    if current_active == "primary" and not primary_healthy and secondary_healthy:
        print("FAILOVER: Switching to secondary")
        if update_cloudflare_dns(SECONDARY_IP):
            current_active = "secondary"
            send_telegram(f"🚨 FAILOVER: Primary DOWN, switched to Secondary ({SECONDARY_IP})")

    elif current_active == "secondary" and primary_healthy:
        print("RECOVERY: Switching back to primary")
        if update_cloudflare_dns(PRIMARY_IP):
            current_active = "primary"
            send_telegram(f"✅ RECOVERY: Primary restored ({PRIMARY_IP})")

    time.sleep(30)
SCRIPT

chmod +x ~/trading-bot-monitor.py

# Run in background
nohup python3 ~/trading-bot-monitor.py > ~/monitor.log 2>&1 &
```
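
The loop above switches DNS on the first failed check, so a single dropped request is enough to trigger a failover. The commit message describes a monitor that fails over only after 3 consecutive failures (about 90 seconds at a 30-second interval). A minimal sketch of that debounce follows; `FailureCounter` and its `threshold` parameter are hypothetical names, not part of the script above.

```python
class FailureCounter:
    """Track consecutive failed health checks and signal when a threshold is reached."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.failures = 0

    def record(self, healthy):
        """Record one check result; return True once `threshold` failures occur in a row."""
        # Any healthy result resets the streak
        self.failures = 0 if healthy else self.failures + 1
        return self.failures >= self.threshold
```

In the monitor loop, `counter.record(check_health(PRIMARY_URL))` could stand in for `not primary_healthy`, so that one slow response no longer moves DNS.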

## Verification Checklist

- [ ] Secondary server has all code from primary
- [ ] Secondary has same .env file (same wallet key!)
- [ ] PostgreSQL running on secondary
- [ ] Database restored and contains trades
- [ ] Trading bot built successfully
- [ ] Trading bot starts without errors
- [ ] Health endpoint responds on secondary
- [ ] n8n running on secondary (already was)
- [ ] Sync strategy chosen and configured
- [ ] Health monitor running (if automated failover desired)
- [ ] DNS ready to switch (Cloudflare setup)

## Test Failover

```bash
# 1. Stop primary bot
ssh root@hetzner-ip "cd /home/icke/traderv4 && docker compose stop trading-bot"

# 2. Verify secondary takes over (if health monitor running)
# OR manually update DNS to point to 72.62.39.24

# 3. Send test webhook to secondary
curl -X POST http://72.62.39.24:3001/api/trading/execute \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"test": true}'

# 4. Check logs
ssh root@72.62.39.24 "docker logs --tail=50 trading-bot-v4"

# 5. Restart primary
ssh root@hetzner-ip "cd /home/icke/traderv4 && docker compose start trading-bot"
```
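
For step 2, one way to confirm that the DNS switch actually happened is to resolve the hostname and compare it with the secondary's address. This is a rough sketch using only the standard library; `dns_points_to` is a hypothetical helper, and the result reflects whatever the local resolver has cached, so it may lag behind the change until the record's TTL expires.

```python
import socket


def dns_points_to(hostname, expected_ip, resolve=socket.gethostbyname):
    """Return True if `hostname` currently resolves to `expected_ip` from this machine."""
    try:
        return resolve(hostname) == expected_ip
    except OSError:  # socket.gaierror (lookup failure) is a subclass of OSError
        return False
```

During a failover test, `dns_points_to("flow.egonetix.de", "72.62.39.24")` should become true after the monitor (or a manual change) updates the record.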

## Summary

**Your secondary server is now a full replica:**

- ✅ Same code as primary
- ✅ Same database (snapshot)
- ✅ Same configuration (.env)
- ✅ Ready to take over if primary fails

**Choose sync strategy:**

- 🔄 **PostgreSQL Streaming Replication** - Real-time, 1-2s lag (BEST)
- ⏰ **Cron Job** - Simple, 6-hour lag (OK for testing)

**Enable automated failover:**

- 🤖 Run health monitor script (switches DNS automatically)
- 📱 Sends Telegram alerts on failover/recovery
- ⚡ 30-60 second failover time (resolvers may cache the old record for up to the 120-second TTL)