feat: Automated failover system with certificate sync and DNS monitoring

Certificate Synchronization (COMPLETE):
- Created cert-push-to-hostinger.sh on srvrevproxy02
- Hourly cron job pushes /etc/letsencrypt/ from srvrevproxy02 to Hostinger
- SSH key authentication (id_ed25519_hostinger) configured
- 22MB of Let's Encrypt certificates synced successfully
- Automatic nginx reload on Hostinger after sync
- Log: /var/log/cert-push-hostinger.log

DNS Failover Monitor (READY):
- Python script: dns-failover-monitor.py on Hostinger
- INWX API integration for automatic DNS updates
- Health monitoring every 30s, failover after 3 failures (90s)
- Systemd service with auto-restart
- Setup script: setup-inwx-env.sh for INWX credentials
- Log: /var/log/dns-failover.log

Architecture:
- Primary: srvrevproxy02 (10.0.0.29) - Certificate source
- Secondary: Hostinger (72.62.39.24) - Failover target
- Nginx on Hostinger now uses flow.egonetix.de certificate

Next Steps:
- Run /root/setup-inwx-env.sh on Hostinger
- Enter INWX credentials
- Start monitoring: systemctl start dns-failover
mindesbunister
2025-11-25 16:01:15 +01:00
parent 5d66ecf5ce
commit 0baac4f137
3 changed files with 659 additions and 1 deletions


@@ -102,7 +102,7 @@ services:
     container_name: trading-bot-postgres
     restart: unless-stopped
     ports:
-      - "5432:5432"
+      - "55432:5432"
     environment:
       TZ: Europe/Berlin
       POSTGRES_DB: trading_bot_v4


@@ -0,0 +1,281 @@
# Manual Deployment to Secondary Server (Hostinger VPS)
## Quick Start - Deploy Secondary Now
### Step 1: Complete the Code Sync (if not finished)
```bash
# Wait for rsync to complete or run it manually
rsync -avz --delete \
--exclude 'node_modules' \
--exclude '.next' \
--exclude '.git' \
--exclude 'logs/*' \
--exclude 'postgres-data' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
```
### Step 2: Backup and Sync Database
```bash
# Dump database from primary
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql
# Copy to secondary
scp /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/trading_bot_backup.sql
```
### Step 3: Deploy on Secondary
```bash
# SSH to secondary
ssh root@72.62.39.24
cd /home/icke/traderv4
# Start PostgreSQL
docker compose up -d postgres
# Wait for PostgreSQL to be ready
sleep 10
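# Restore database
# Use one -c per statement: psql sends a multi-statement -c string as a single
# transaction, and DROP DATABASE cannot run inside a transaction block
docker exec trading-bot-postgres psql -U postgres -c "DROP DATABASE IF EXISTS trading_bot_v4;" -c "CREATE DATABASE trading_bot_v4;"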
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
# Verify database
docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c "SELECT COUNT(*) FROM \"Trade\";"
# Build trading bot
docker compose build trading-bot
# Start trading bot (but keep it inactive - secondary waits in standby)
docker compose up -d trading-bot
# Check logs
docker logs -f trading-bot-v4
```
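The fixed `sleep 10` in Step 3 assumes PostgreSQL is ready within ten seconds. A minimal sketch of an alternative polls `pg_isready` inside the container, assuming the tool is on the container's path (it ships with the official `postgres` images). The function name, the default timeout and interval, and the injectable `run`, `sleep`, and `clock` parameters are illustrative assumptions, not part of the deployment.

```python
import subprocess
import time


def wait_for_postgres(container="trading-bot-postgres", timeout=60.0, interval=2.0,
                      run=subprocess.run, sleep=time.sleep, clock=time.monotonic):
    """Poll pg_isready inside the container until it succeeds or the timeout expires."""
    deadline = clock() + timeout
    cmd = ["docker", "exec", container, "pg_isready", "-U", "postgres"]
    while True:
        # pg_isready exits with status 0 once the server accepts connections.
        if run(cmd, capture_output=True).returncode == 0:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Calling `wait_for_postgres()` before the restore step returns `True` as soon as the server accepts connections and `False` if it never does within the timeout, so the restore can be skipped instead of failing halfway.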
### Step 4: Verify Everything Works
```bash
# Check all containers running
docker ps
# Should see:
# - trading-bot-v4 (your bot)
# - trading-bot-postgres
# - n8n (already running)
# Test health endpoint
curl http://localhost:3001/api/health
# Check database connection
docker exec trading-bot-postgres psql -U postgres -c "\l"
```
## Ongoing Sync Strategy
### Option A: PostgreSQL Streaming Replication (Best)
**Set up once, then sync continuously in near real time (1-2 second lag)**
See `HA_DATABASE_SYNC_STRATEGY.md` for complete setup guide.
Quick version:
```bash
# On PRIMARY
docker exec trading-bot-postgres psql -U postgres -c "
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'your-strong-password-here';
"
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << CONF
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64
CONF"
docker exec trading-bot-postgres bash -c "echo 'host replication replicator 72.62.39.24/32 md5' >> /var/lib/postgresql/data/pg_hba.conf"
docker restart trading-bot-postgres
# On SECONDARY
docker compose down postgres
rm -rf postgres-data/
mkdir -p postgres-data
# Use the same replication password that was set on the primary
docker run --rm \
  -v "$(pwd)/postgres-data:/var/lib/postgresql/data" \
  -e PGPASSWORD='your-strong-password-here' \
  postgres:16-alpine \
  pg_basebackup -h hetzner-ip -p 5432 -U replicator -D /var/lib/postgresql/data -P -R
docker compose up -d postgres
# Verify
docker exec trading-bot-postgres psql -U postgres -c "SELECT * FROM pg_stat_wal_receiver;"
```
### Option B: Cron Job Backup (Simple but 6hr lag)
```bash
# On PRIMARY - Create sync script
cat > /root/sync-to-secondary.sh << 'SCRIPT'
#!/bin/bash
LOG="/var/log/secondary-sync.log"
echo "[$(date)] Starting sync..." >> $LOG
# Sync code
rsync -avz --delete \
--exclude 'node_modules' --exclude '.next' --exclude '.git' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/ >> $LOG 2>&1
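# Sync database
# The first docker exec runs without -i so it cannot swallow the piped dump, and each
# statement gets its own -c because DROP DATABASE cannot run inside a transaction block
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 | \
  ssh root@72.62.39.24 "docker exec trading-bot-postgres psql -U postgres -c 'DROP DATABASE IF EXISTS trading_bot_v4;' -c 'CREATE DATABASE trading_bot_v4;' && docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4" >> $LOG 2>&1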
echo "[$(date)] Sync complete" >> $LOG
SCRIPT
chmod +x /root/sync-to-secondary.sh
# Test it
/root/sync-to-secondary.sh
# Schedule every 6 hours
crontab -e
# Add: 0 */6 * * * /root/sync-to-secondary.sh
```
## Health Monitor Setup
Create a health monitor that switches DNS automatically when the primary fails:
```bash
# Create health monitor script (run on laptop or third server)
cat > ~/trading-bot-monitor.py << 'SCRIPT'
#!/usr/bin/env python3
import os
import time

import requests

CLOUDFLARE_API_TOKEN = "your-token"
CLOUDFLARE_ZONE_ID = "your-zone-id"
CLOUDFLARE_RECORD_ID = "your-record-id"
PRIMARY_IP = "hetzner-ip"
SECONDARY_IP = "72.62.39.24"
PRIMARY_URL = f"http://{PRIMARY_IP}:3001/api/health"
SECONDARY_URL = f"http://{SECONDARY_IP}:3001/api/health"
TELEGRAM_BOT_TOKEN = os.getenv("TELEGRAM_BOT_TOKEN")
TELEGRAM_CHAT_ID = os.getenv("TELEGRAM_CHAT_ID")
current_active = "primary"


def send_telegram(message):
    """Send a Telegram alert; network errors are ignored so monitoring keeps running."""
    try:
        url = f"https://api.telegram.org/bot{TELEGRAM_BOT_TOKEN}/sendMessage"
        requests.post(url, json={"chat_id": TELEGRAM_CHAT_ID, "text": message}, timeout=10)
    except requests.RequestException:
        pass


def check_health(url):
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        response = requests.get(url, timeout=10)
        return response.status_code == 200
    except requests.RequestException:
        return False


def update_cloudflare_dns(ip):
    """Point the A record at the given IP; return True on success."""
    url = f"https://api.cloudflare.com/client/v4/zones/{CLOUDFLARE_ZONE_ID}/dns_records/{CLOUDFLARE_RECORD_ID}"
    headers = {"Authorization": f"Bearer {CLOUDFLARE_API_TOKEN}", "Content-Type": "application/json"}
    data = {"type": "A", "name": "flow.egonetix.de", "content": ip, "ttl": 120, "proxied": False}
    try:
        response = requests.put(url, json=data, headers=headers, timeout=10)
        return response.status_code == 200
    except requests.RequestException:
        # A failed API call must not crash the monitor; retry on the next cycle
        return False


print("Health monitor started")
send_telegram("🏥 Trading Bot Health Monitor Started")
while True:
    primary_healthy = check_health(PRIMARY_URL)
    secondary_healthy = check_health(SECONDARY_URL)
    print(f"Primary: {'✅' if primary_healthy else '❌'} | Secondary: {'✅' if secondary_healthy else '❌'}")
    if current_active == "primary" and not primary_healthy and secondary_healthy:
        print("FAILOVER: Switching to secondary")
        if update_cloudflare_dns(SECONDARY_IP):
            current_active = "secondary"
            send_telegram(f"🚨 FAILOVER: Primary DOWN, switched to Secondary ({SECONDARY_IP})")
    elif current_active == "secondary" and primary_healthy:
        print("RECOVERY: Switching back to primary")
        if update_cloudflare_dns(PRIMARY_IP):
            current_active = "primary"
            send_telegram(f"✅ RECOVERY: Primary restored ({PRIMARY_IP})")
    time.sleep(30)
SCRIPT
chmod +x ~/trading-bot-monitor.py
# Run in background
nohup python3 ~/trading-bot-monitor.py > ~/monitor.log 2>&1 &
```
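The monitor script above switches DNS after a single failed check, while the commit message describes failover after 3 consecutive failures. A minimal sketch of that threshold logic follows, kept separate from the HTTP and DNS calls so it can be tested on its own. The class name `FailoverTracker` and its methods are hypothetical and do not appear in the committed scripts.

```python
class FailoverTracker:
    """Decide when to switch DNS based on consecutive failed primary checks."""

    def __init__(self, threshold=3):
        self.threshold = threshold  # consecutive failures required before failover
        self.failures = 0           # current run of failed primary checks
        self.active = "primary"     # which server DNS currently points at

    def observe(self, primary_healthy, secondary_healthy):
        """Record one check and return the desired new target, or None for no change."""
        if primary_healthy:
            self.failures = 0
            return "primary" if self.active == "secondary" else None
        self.failures += 1
        if (self.active == "primary"
                and self.failures >= self.threshold
                and secondary_healthy):
            return "secondary"
        return None

    def confirm(self, target):
        """Call only after the DNS update succeeded, mirroring the script above."""
        self.active = target
```

In the main loop, the result of `observe(...)` would replace the two `if`/`elif` conditions, and `confirm(...)` would be called where the script currently assigns `current_active`. With a 30-second interval and a threshold of 3, a transient blip of one or two checks no longer triggers a DNS switch.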
## Verification Checklist
- [ ] Secondary server has all code from primary
- [ ] Secondary has same .env file (same wallet key!)
- [ ] PostgreSQL running on secondary
- [ ] Database restored and contains trades
- [ ] Trading bot built successfully
- [ ] Trading bot starts without errors
- [ ] Health endpoint responds on secondary
- [ ] n8n running on secondary (already was)
- [ ] Sync strategy chosen and configured
- [ ] Health monitor running (if automated failover desired)
- [ ] DNS ready to switch (Cloudflare setup)
## Test Failover
```bash
# 1. Stop primary bot
ssh root@hetzner-ip "cd /home/icke/traderv4 && docker compose stop trading-bot"
# 2. Verify secondary takes over (if health monitor running)
# OR manually update DNS to point to 72.62.39.24
# 3. Send test webhook to secondary
curl -X POST http://72.62.39.24:3001/api/trading/execute \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-api-key" \
-d '{"test": true}'
# 4. Check logs
ssh root@72.62.39.24 "docker logs --tail=50 trading-bot-v4"
# 5. Restart primary
ssh root@hetzner-ip "cd /home/icke/traderv4 && docker compose start trading-bot"
```
## Summary
**Your secondary server is now a full replica:**
- ✅ Same code as primary
- ✅ Same database (snapshot)
- ✅ Same configuration (.env)
- ✅ Ready to take over if primary fails
**Choose sync strategy:**
- 🔄 **PostgreSQL Streaming Replication** - Real-time, 1-2s lag (BEST)
- **Cron Job** - Simple, 6-hour lag (OK for testing)
**Enable automated failover:**
- 🤖 Run health monitor script (switches DNS automatically)
- 📱 Sends Telegram alerts on failover/recovery
- ⚡ Failure detected within about 30 seconds; clients follow once the 120-second DNS TTL expires
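
The failover time that clients actually see is the detection time plus the DNS TTL. A rough worked estimate follows, assuming the 30-second check interval and 120-second TTL from the monitor script and, optionally, the 3-failure threshold from the commit message. The function name is hypothetical, and the estimate ignores the 10-second request timeout and DNS API latency.

```python
def worst_case_failover_seconds(check_interval, failures_required, dns_ttl):
    """Upper-bound estimate: time to detect the outage plus time for DNS caches to expire."""
    detection = check_interval * failures_required
    return detection + dns_ttl


# Monitor script as written: one failed check triggers the switch.
single_check = worst_case_failover_seconds(30, 1, 120)
# With the 3-failure threshold described in the commit message.
three_checks = worst_case_failover_seconds(30, 3, 120)
print(single_check, three_checks)
```

Under these assumptions the two cases come to 150 and 210 seconds, so the TTL, not the check interval, dominates the outage window.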


@@ -0,0 +1,377 @@
# HA Database Sync Strategy
## Overview
This document explains the database synchronization strategy for the High Availability (HA) setup between Primary (Hetzner) and Secondary (Hostinger) servers.
## Current Setup (Nov 25, 2025)
- **Primary:** Hetzner Proxmox LXC - `/home/icke/traderv4`
- **Secondary:** Hostinger VPS (72.62.39.24) - `/home/icke/traderv4`
- **Database:** PostgreSQL 16-alpine
- **Database Name:** `trading_bot_v4`
## Option 1: PostgreSQL Streaming Replication (RECOMMENDED)
### What is it?
PostgreSQL's native master-slave replication using Write-Ahead Logs (WAL). Secondary automatically receives and applies all changes from primary in near real-time.
### Benefits
- **Real-time:** Changes replicate within 1-2 seconds
- **Native:** Built into PostgreSQL, no external tools
- **Automatic:** Once set up, runs continuously
- **Failover ready:** Secondary can be promoted to master instantly
### Setup Steps
#### 1. Configure Primary (Master)
```bash
# SSH to primary
ssh root@hetzner-ip
# Create replication user
docker exec trading-bot-postgres psql -U postgres -c "
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'your-strong-password-here';
SELECT * FROM pg_user WHERE usename = 'replicator';
"
# Configure postgresql.conf for replication
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/postgresql.conf << CONF
# Replication settings
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64
hot_standby = on
CONF"
# Allow replication connections from secondary
docker exec trading-bot-postgres bash -c "echo 'host replication replicator 72.62.39.24/32 md5' >> /var/lib/postgresql/data/pg_hba.conf"
# Restart PostgreSQL to apply changes
docker restart trading-bot-postgres
# Verify replication user
docker exec trading-bot-postgres psql -U postgres -c "SELECT usename, userepl FROM pg_user WHERE usename = 'replicator';"
```
#### 2. Configure Secondary (Replica)
```bash
# SSH to secondary
ssh root@72.62.39.24
cd /home/icke/traderv4
# Stop PostgreSQL if running
docker compose down postgres
# Backup existing data (if any)
sudo mv postgres-data postgres-data.backup-$(date +%Y%m%d-%H%M%S) || true
# Create base backup from primary
docker run --rm \
-v $(pwd)/postgres-data:/var/lib/postgresql/data \
-e PGPASSWORD='your-strong-password-here' \
postgres:16-alpine \
pg_basebackup -h hetzner-ip -p 5432 -U replicator -D /var/lib/postgresql/data -P -R
# Start PostgreSQL in replica mode
docker compose up -d postgres
# Wait for startup
sleep 10
# Verify replication status
docker exec trading-bot-postgres psql -U postgres -c "SELECT status, receive_start_lsn FROM pg_stat_wal_receiver;"
```
#### 3. Verify Replication
```bash
# On PRIMARY - Check replication status
docker exec trading-bot-postgres psql -U postgres -c "
SELECT
client_addr,
state,
sent_lsn,
write_lsn,
flush_lsn,
replay_lsn,
sync_state
FROM pg_stat_replication;
"
# Should show:
# - client_addr: 72.62.39.24
# - state: streaming
# - sync_state: async
# Test replication - Insert test data on PRIMARY
docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c "
CREATE TABLE IF NOT EXISTS replication_test (id SERIAL PRIMARY KEY, test_time TIMESTAMP);
INSERT INTO replication_test (test_time) VALUES (NOW());
SELECT * FROM replication_test;
"
# Check on SECONDARY (should see same data within 1-2 seconds)
ssh root@72.62.39.24 "docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c 'SELECT * FROM replication_test;'"
# Clean up test
docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c "DROP TABLE replication_test;"
```
### Monitoring Replication
```bash
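# Check replication lag (run on PRIMARY)
# replay_lag comes from pg_stat_replication (NULL when the standby is idle and caught up);
# pg_last_xact_replay_timestamp() is always NULL on a primary, so it cannot be used here
docker exec trading-bot-postgres psql -U postgres -c "
SELECT
  client_addr,
  state,
  pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes,
  EXTRACT(EPOCH FROM replay_lag) AS lag_seconds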
FROM pg_stat_replication;
"
# Healthy values:
# - lag_bytes: < 100KB
# - lag_seconds: < 5 seconds
```
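The healthy values above can be checked mechanically. The following is a minimal sketch, assuming the lag query is run with `psql -t -A` so that each row arrives as `client_addr|lag_bytes|lag_seconds`. Both function names are hypothetical, and the thresholds are the ones listed above (100 KB and 5 seconds).

```python
def parse_lag_rows(psql_output):
    """Parse 'client_addr|lag_bytes|lag_seconds' rows; empty fields become None."""
    rows = []
    for line in psql_output.splitlines():
        line = line.strip()
        if not line:
            continue
        addr, lag_bytes, lag_seconds = line.split("|")
        rows.append({
            "client_addr": addr,
            "lag_bytes": float(lag_bytes) if lag_bytes else None,
            "lag_seconds": float(lag_seconds) if lag_seconds else None,
        })
    return rows


def unhealthy_replicas(rows, max_bytes=100_000, max_seconds=5.0):
    """Return the addresses of replicas whose lag exceeds either threshold."""
    bad = []
    for row in rows:
        # An unknown byte lag (NULL replay_lsn) is treated as unhealthy.
        too_far = row["lag_bytes"] is None or row["lag_bytes"] > max_bytes
        # An unknown time lag is tolerated, since it can be NULL on an idle standby.
        too_old = row["lag_seconds"] is not None and row["lag_seconds"] > max_seconds
        if too_far or too_old:
            bad.append(row["client_addr"])
    return bad
```

An empty result from `parse_lag_rows` means no replica is connected at all, which deserves its own alert rather than being read as zero lag.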
## Option 2: Periodic Backup (FALLBACK)
### What is it?
Scheduled pg_dump backups copied to secondary every 6 hours via cron job.
### Benefits
- **Simple:** Easy to set up and understand
- **No config changes:** Primary runs normally
- **Cross-platform:** Works even with different PostgreSQL versions
### Drawbacks
- **Data loss window:** Up to 6 hours of trades could be lost
- **Not instant:** Secondary is always 0-6 hours behind
### Setup
```bash
# On PRIMARY - Create sync script
cat > /root/sync-database-to-secondary.sh << 'SCRIPT'
#!/bin/bash
LOG="/var/log/secondary-db-sync.log"
echo "[$(date)] Starting database sync..." >> $LOG
# Dump database
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql
if [ $? -eq 0 ]; then
  echo "[$(date)] Database dump successful ($(wc -l < /tmp/trading_bot_backup.sql) lines)" >> $LOG
  # Copy to secondary
  scp /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/trading_bot_backup.sql
  if [ $? -eq 0 ]; then
    echo "[$(date)] Backup copied to secondary" >> $LOG
    # Restore on secondary. Each statement gets its own -c because psql runs a
    # multi-statement -c string as one transaction, which DROP DATABASE does not allow
    ssh root@72.62.39.24 "docker exec trading-bot-postgres psql -U postgres -c 'DROP DATABASE IF EXISTS trading_bot_v4;' -c 'CREATE DATABASE trading_bot_v4;' && docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql"
    if [ $? -eq 0 ]; then
      echo "[$(date)] Database restored on secondary successfully" >> $LOG
    else
      echo "[$(date)] ERROR: Database restore failed" >> $LOG
    fi
  else
    echo "[$(date)] ERROR: Failed to copy backup to secondary" >> $LOG
  fi
else
  echo "[$(date)] ERROR: Database dump failed" >> $LOG
fi
echo "[$(date)] Sync complete" >> $LOG
SCRIPT
chmod +x /root/sync-database-to-secondary.sh
# Test it
/root/sync-database-to-secondary.sh
# Check log
tail /var/log/secondary-db-sync.log
# Setup cron (every 6 hours)
crontab -l > /tmp/crontab.backup
echo "0 */6 * * * /root/sync-database-to-secondary.sh" >> /tmp/crontab.backup
crontab /tmp/crontab.backup
# Verify cron
crontab -l | grep sync-database
```
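Running the cron setup above a second time appends a duplicate entry. The following is a minimal sketch of an idempotent alternative that only adds the line when it is missing. The function name `add_cron_entry` is hypothetical, and the sketch works on the crontab text only; reading and installing it is left to the caller.

```python
def add_cron_entry(current_crontab, entry):
    """Return the crontab text with `entry` appended exactly once."""
    lines = current_crontab.splitlines()
    if entry.strip() not in (line.strip() for line in lines):
        lines.append(entry.strip())
    # crontab expects the file to end with a newline.
    return "\n".join(lines) + "\n"


ENTRY = "0 */6 * * * /root/sync-database-to-secondary.sh"
once = add_cron_entry("", ENTRY)
twice = add_cron_entry(once, ENTRY)
```

Here `once` and `twice` are identical, so the setup step can be repeated safely. The resulting text could be installed by piping it to `crontab -`, with the current contents taken from `crontab -l`.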
## Code Sync (Both Options)
Code needs to be synced separately (n8n workflows, trading bot code, etc.)
```bash
# Create code sync script
cat > /root/sync-code-to-secondary.sh << 'SCRIPT'
#!/bin/bash
LOG="/var/log/secondary-code-sync.log"
echo "[$(date)] Starting code sync..." >> $LOG
rsync -avz --delete \
--exclude 'node_modules' \
--exclude '.next' \
--exclude '.git' \
--exclude 'logs/*' \
--exclude 'postgres-data' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/ >> $LOG 2>&1
# Sync .env file
rsync -avz /home/icke/traderv4/.env root@72.62.39.24:/home/icke/traderv4/.env >> $LOG 2>&1
echo "[$(date)] Code sync complete" >> $LOG
SCRIPT
chmod +x /root/sync-code-to-secondary.sh
# Run daily at 3 AM
crontab -l > /tmp/crontab.backup
echo "0 3 * * * /root/sync-code-to-secondary.sh" >> /tmp/crontab.backup
crontab /tmp/crontab.backup
```
## n8n Workflow Sync
n8n stores workflows in a SQLite database by default, or in PostgreSQL if configured to do so.
### If using SQLite (file-based)
```bash
# Find n8n database location
ssh root@72.62.39.24 "docker exec n8n ls -la /home/node/.n8n/"
# Sync n8n data directory
rsync -avz /path/to/n8n/data/ root@72.62.39.24:/path/to/n8n/data/
```
### If using PostgreSQL for n8n
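If n8n stores its data in the same PostgreSQL instance, streaming replication already covers it, because it replicates the whole cluster. The pg_dump approach in Option 2 copies only `trading_bot_v4`, so an n8n database would need its own dump.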
## Failover Procedure
### Scenario: Primary goes down, need to activate Secondary
#### 1. Promote Secondary to Master (if using streaming replication)
```bash
# SSH to secondary
ssh root@72.62.39.24
# Promote to master (pg_ctl refuses to run as root, so run it as the postgres user)
docker exec -u postgres trading-bot-postgres pg_ctl promote -D /var/lib/postgresql/data
# Verify it's now accepting writes
docker exec trading-bot-postgres psql -U postgres trading_bot_v4 -c "SELECT pg_is_in_recovery();"
# Should return 'f' (false = not in recovery = is master)
# Start trading bot (if not already running)
cd /home/icke/traderv4
docker compose up -d trading-bot
```
#### 2. Update Cloudflare DNS
```bash
# Via health monitor script (automatic)
# OR manually via Cloudflare API/dashboard
# Point flow.egonetix.de to 72.62.39.24
```
#### 3. When Primary recovers
```bash
# Reconfigure primary as new replica
# Then switch DNS back
# Or keep secondary as new primary (depends on data drift)
```
## Comparison
| Feature | Streaming Replication | Periodic Backup |
|---------|----------------------|-----------------|
| **Lag** | 1-2 seconds | 0-6 hours |
| **Setup Complexity** | Medium | Simple |
| **Data Loss Risk** | Minimal (seconds) | High (hours) |
| **Failover Time** | Instant | Minutes |
| **Resource Usage** | Low | Low |
| **Best For** | Production HA | Testing/Dev |
## Recommendation
**Use PostgreSQL Streaming Replication** for production HA setup:
- Real-time sync (1-2 second lag)
- Zero manual intervention needed
- Instant failover capability
- Native PostgreSQL feature (well-tested)
**Fallback to Periodic Backup** only if:
- Streaming replication setup fails
- Network between servers is unreliable
- You're just testing the HA concept
## Monitoring
Add to health monitor script:
```python
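import subprocess

# Check replication lag (run on the primary, where pg_stat_replication is populated)
def check_replication_lag():
    result = subprocess.run([
        "docker", "exec", "trading-bot-postgres",
        "psql", "-U", "postgres", "-t", "-A", "-c",
        "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) FROM pg_stat_replication;"
    ], capture_output=True, text=True)
    rows = result.stdout.split()
    if not rows:
        # No row means no replica is streaming; do not report this as zero lag
        send_telegram("⚠️ REPLICATION DOWN: no replica is streaming")
        return
    lag_bytes = max(int(float(row)) for row in rows)
    if lag_bytes > 100000:  # 100KB
        send_telegram(f"⚠️ HIGH REPLICATION LAG: {lag_bytes/1024:.1f}KB")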
```
## Troubleshooting
### Secondary not connecting
```bash
# Check firewall on primary
# Allow port 5432 from 72.62.39.24
# Check pg_hba.conf on primary
docker exec trading-bot-postgres cat /var/lib/postgresql/data/pg_hba.conf | grep replication
# Check logs on secondary
docker logs trading-bot-postgres
```
### Replication lag increasing
```bash
# Check network between servers
ping 72.62.39.24
# Check disk space on secondary
ssh root@72.62.39.24 "df -h"
# Check WAL sender processes
docker exec trading-bot-postgres psql -U postgres -c "SELECT * FROM pg_stat_replication;"
```
### Replication stopped
```bash
# Restart from base backup
# Follow "Configure Secondary" steps again
```