feat: Add High Availability setup roadmap and scripts

Created comprehensive HA roadmap with 6 phases:
- Phase 1: Warm standby (CURRENT - manual failover)
- Phase 2: Database replication
- Phase 3: Health monitoring
- Phase 4: Reverse proxy + floating IP
- Phase 5: Automated failover
- Phase 6: Geographic redundancy

Includes:
- Decision gates based on capital and stability
- Cost-benefit analysis
- Scripts for healthcheck, failover, DB sync
- Recommendation to defer full HA until capital > $5k

Secondary server ready at 72.62.39.24 for emergency manual failover.

Related: user concern about system uptime; full HA complexity is
not justified at current scale (~$600 capital). Revisit in Q1 2026.
Author: mindesbunister
Date: 2025-11-19 20:52:12 +01:00
parent d28da02089
commit 880aae9a77
7 changed files with 936 additions and 0 deletions

HA_SETUP_ROADMAP.md (new file, 218 lines)

@@ -0,0 +1,218 @@
# High Availability Setup Roadmap
**Status:** 🎯 FUTURE
**Priority:** Medium
**Estimated Effort:** 2-3 days full implementation
**Dependencies:** Stable production system, consistent profitability
---
## Current State (Nov 19, 2025)
**Warm Standby Ready:**
- Secondary server at `root@72.62.39.24` with rsync'd code
- Can manually failover in 10-15 minutes if primary fails
- Single-server operation prevents duplicate trades
**Not Automated:**
- Manual DNS/webhook updates required
- No automatic failover detection
- No reverse proxy/load balancer setup
---
## Phase 1: Warm Standby Maintenance (CURRENT)
**Goal:** Keep secondary server ready for manual failover
**Tasks:**
- [ ] Daily rsync from primary to secondary (automated)
- [ ] Weekly startup test on secondary (verify working)
- [ ] Document manual failover procedure
- [ ] Test database restore on secondary
**Acceptance Criteria:**
- Can start secondary and verify trading bot works within 5 minutes
- Secondary has code/config updated within 24 hours of primary
- Clear runbook for emergency failover
**Timeline:** 1 day setup + ongoing maintenance
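The automated daily rsync task above could be wired up as a single cron entry on the primary. A sketch (the 03:15 schedule, log path, and use of `--delete` are assumptions, not decisions made in this roadmap; cron lines cannot be continued across lines, so it is one long line):

```bash
# /etc/cron.d/trading-bot-code-sync (sketch, on the primary)
15 3 * * * root rsync -az --delete --exclude node_modules --exclude .next /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/ >> /var/log/trading-bot-code-sync.log 2>&1
```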
---
## Phase 2: Database Replication (NEXT)
**Goal:** Zero data loss on failover
**Tasks:**
- [ ] Setup PostgreSQL streaming replication
- [ ] Configure replication user and permissions
- [ ] Test replica lag monitoring
- [ ] Automate replica promotion on failover
**Acceptance Criteria:**
- Secondary database max 5 seconds behind primary
- Trade history preserved during failover
- Automatic replica promotion script tested
**Timeline:** 2-3 days
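The replica-lag acceptance criterion above can be checked from the primary with the standard `pg_stat_replication` view (container and database names as used elsewhere in this repo; `replay_lag` requires PostgreSQL 10+):

```bash
# On the primary: show each standby's streaming state and replay lag
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c \
  "SELECT application_name, client_addr, state, replay_lag
   FROM pg_stat_replication;"
```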
---
## Phase 3: Health Monitoring & Alerts (NEXT)
**Goal:** Know when primary fails, prepare for manual intervention
**Tasks:**
- [ ] Deploy healthcheck script on both servers
- [ ] Setup monitoring dashboard (Grafana/simple webpage)
- [ ] Telegram alerts for primary failures
- [ ] Create failover decision flowchart
**Acceptance Criteria:**
- Telegram alert within 60 seconds of primary failure
- Dashboard shows primary/secondary status
- Clear steps for manual failover documented
**Timeline:** 1-2 days
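A minimal version of the Telegram alert could reuse `ha-setup/healthcheck.sh` from this commit. A sketch of a per-minute cron entry plus a hypothetical `alert-on-failure.sh` wrapper (the wrapper is not part of this commit; Telegram variables come from the project `.env`, as in the other scripts here):

```bash
# /etc/cron.d/trading-bot-healthcheck (sketch): run the wrapper every minute
* * * * * root /home/icke/traderv4/ha-setup/alert-on-failure.sh

# alert-on-failure.sh (hypothetical wrapper):
#!/bin/bash
source /home/icke/traderv4/.env
if ! bash /home/icke/traderv4/ha-setup/healthcheck.sh; then
  curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    -d chat_id="${TELEGRAM_CHAT_ID}" -d text="🚨 Primary healthcheck failed"
fi
```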
---
## Phase 4: Reverse Proxy + Floating IP (FUTURE)
**Goal:** Automatic traffic routing to active server
**Options:**
### Option A: Floating IP (Simplest)
- Use cloud provider's floating IP (DigitalOcean, AWS EIP)
- IP automatically moves between servers
- Requires: Cloud infrastructure, not bare metal
### Option B: DNS-based Failover
- Use DNS provider with health checks (Cloudflare, Route53)
- Automatic DNS updates on failure
- 1-5 minute TTL delay for propagation
### Option C: Reverse Proxy
- HAProxy or nginx in front of both servers
- Health checks route to active server
- Requires: Third server for proxy (single point of failure)
**Tasks:**
- [ ] Evaluate infrastructure options (cloud vs bare metal)
- [ ] Choose failover mechanism (Floating IP vs DNS vs Proxy)
- [ ] Implement automatic traffic routing
- [ ] Test failover scenarios (primary crash, network partition)
**Acceptance Criteria:**
- TradingView webhooks automatically route to active server
- Failover completes within 2 minutes with zero manual intervention
- No duplicate trades during failover window
- n8n workflows continue without reconfiguration
**Timeline:** 3-5 days (depends on option chosen)
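For Option C, the routing logic fits in a few lines of HAProxy config. A sketch for a hypothetical proxy host (the `/api/health` endpoint and the port 8080 bind are assumptions; the bot's real API requires auth):

```
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend tradingview_webhooks
    bind *:8080
    default_backend trading_bot

backend trading_bot
    option httpchk GET /api/health
    # "backup" gives active-passive behavior: secondary only receives
    # traffic once the primary's health check has failed
    server primary 192.168.1.100:3001 check inter 5s fall 3 rise 2
    server secondary 72.62.39.24:3001 check backup
```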
---
## Phase 5: Automated Failover Controller (FUTURE)
**Goal:** Fully autonomous HA system
**Tasks:**
- [ ] Deploy failover controller on secondary
- [ ] Configure automatic container startup on failure detection
- [ ] Implement split-brain prevention
- [ ] Test recovery scenarios (primary comes back online)
- [ ] Setup automatic database sync on recovery
**Acceptance Criteria:**
- Secondary automatically activates within 60 seconds of primary failure
- Primary automatically resumes when recovered
- No manual intervention required for 99% of failures
- Telegram notifications for all state changes
**Timeline:** 2-3 days
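The split-brain prevention task above boils down to one rule: never activate on a single failed signal. A minimal sketch of that decision (the two-witness idea is an illustration, not the implementation shipped in this commit):

```bash
# Decide whether the secondary may take over. Inputs are the verdicts of two
# independent probes of the primary (e.g. SSH healthcheck + external HTTP check).
should_activate() {
  local ssh_view="$1" witness_view="$2"   # each "up" or "down"
  if [ "$ssh_view" = "down" ] && [ "$witness_view" = "down" ]; then
    return 0   # both agree the primary is dead: safe to start trading here
  fi
  return 1     # any disagreement may be a network partition: stay passive
}
```

If only the SSH path is down, staying passive is the safe choice: the alternative is two bots trading the same wallet.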
---
## Phase 6: Geographic Redundancy (DISTANT FUTURE)
**Goal:** Multi-region deployment for global reliability
**Considerations:**
- Secondary in different geographic region (US vs EU)
- Protects against regional outages
- Lower latency for global users
- Requires: More complex routing, higher costs
**Timeline:** 1+ weeks
---
## Decision Gates
**Proceed to Phase 2+ when:**
- Trading system profitable for 3+ consecutive months
- Capital > $10,000 (downtime = significant money loss)
- User frequently unavailable (travel, sleep schedule, etc.)
- Primary server has experienced 2+ unplanned outages
**Stay in Phase 1 when:**
- System still in testing/optimization phase
- User can manually intervene within 30 minutes most of the time
- Capital < $5,000 (manual failover acceptable)
- Primary server stable (99%+ uptime)
---
## Cost-Benefit Analysis
### Current State (Warm Standby)
- **Cost:** ~$10-20/month for secondary server
- **Benefit:** 10-15 min manual failover vs hours of setup from scratch
- **ROI:** Good - cheap insurance
### Full HA (All Phases)
- **Cost:** ~$50-100/month (servers, floating IP, monitoring)
- **Time:** 1-2 weeks of development
- **Benefit:** 99.9% uptime, automatic failover, peace of mind
- **ROI:** Only worth it when trading capital justifies the cost
### Break-Even Point
- Trading $10k+ capital at 15% monthly returns = ~$1,500/month
- 1 hour downtime = ~$2 lost opportunity ($1,500 / ~730 hours)
- 24 hour downtime = ~$50 lost + potential missed exit = $100-500 risk
- HA pays for itself after 1-2 major outages
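The arithmetic above, made explicit (the capital and return figures are the roadmap's assumptions, not measurements):

```bash
capital=10000          # USD, assumed trading capital
monthly_return_pct=15  # assumed monthly return
hours_per_month=730    # ~365*24/12

monthly_profit=$(( capital * monthly_return_pct / 100 ))  # 1500
hourly_loss=$(awk -v p="$monthly_profit" -v h="$hours_per_month" \
  'BEGIN { printf "%.2f", p / h }')                       # ~2 USD/hour
daily_loss=$(awk -v l="$hourly_loss" 'BEGIN { printf "%.0f", l * 24 }')

echo "1h downtime ~= \$${hourly_loss}, 24h ~= \$${daily_loss}"
```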
---
## Current Recommendation (Nov 19, 2025)
**Stay in Phase 1** (Warm Standby) because:
- Capital still under $1,000
- System in active optimization (indicator testing, quality tuning)
- User available for manual intervention most of the time
- Primary server stable
**Revisit in Q1 2026** when:
- Capital reaches $5,000+ (Phase 2 target)
- System proven profitable over 3+ months
- Trading strategy stabilized (v8 indicator validated)
---
## Related Files
- `/home/icke/traderv4/ha-setup/` - HA scripts (created but not deployed)
- `TRADING_GOALS.md` - Financial roadmap (HA aligns with Phase 4-5)
- `OPTIMIZATION_MASTER_ROADMAP.md` - System improvements (HA is infrastructure)
---
## Notes
- **Manual failover is acceptable for now** - 10-15 min downtime won't cause financial loss at current scale
- **Focus on profitability first** - HA is luxury when system isn't making consistent money yet
- **Complexity vs benefit** - Full HA adds operational overhead that may not be worth it yet
- **Revisit quarterly** - As capital grows, HA becomes more important

ha-setup/README.md (new file, 298 lines)

@@ -0,0 +1,298 @@
# High Availability Setup for Trading Bot v4
## Architecture: Active-Passive Failover
**Primary Server (Active):** Runs trading bot 24/7
**Secondary Server (Passive):** Monitors primary, takes over on failure
### Why Active-Passive (Not Active-Active)?
- **Prevents duplicate trades** - CRITICAL for financial system
- **Single source of truth** - One Position Manager tracking state
- **No split-brain scenarios** - Only one bot executes trades
- **Database consistency** - No conflicting writes
---
## Setup Instructions
### 1. Prerequisites
**Primary Server:** `root@192.168.1.100` (update in scripts)
**Secondary Server:** `root@72.62.39.24`
Both servers need:
- Docker & Docker Compose installed
- Trading bot project at `/home/icke/traderv4`
- Same `.env` file (especially DRIFT_WALLET_PRIVATE_KEY)
- Same n8n workflows configured
### 2. Initial Sync (Already Done via rsync ✅)
```bash
# From primary server
rsync -avz --exclude 'node_modules' --exclude '.next' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
```
### 3. Database Synchronization
**Option A: Manual Sync (Simpler, Recommended for Start)**
On primary:
```bash
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql
rsync -avz /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/
```
On secondary:
```bash
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
```
Run this daily via cron on primary:
```bash
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh
```
**Option B: Streaming Replication (Advanced)**
```bash
# On primary
bash ha-setup/setup-db-replication.sh primary
# On secondary
bash ha-setup/setup-db-replication.sh secondary
```
### 4. Setup Health Monitoring
Make scripts executable:
```bash
chmod +x ha-setup/*.sh
```
**Test healthcheck on both servers:**
```bash
bash ha-setup/healthcheck.sh
# Should output: ✅ HEALTHY: All checks passed
```
### 5. Start Failover Controller (SECONDARY ONLY)
**Edit configuration first:**
```bash
nano ha-setup/failover-controller.sh
# Update PRIMARY_HOST with actual IP
# Update SECONDARY_HOST if needed
```
**Run as systemd service:**
```bash
sudo cp ha-setup/trading-bot-ha.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable trading-bot-ha
sudo systemctl start trading-bot-ha
```
**Check status:**
```bash
sudo systemctl status trading-bot-ha
sudo journalctl -u trading-bot-ha -f
```
### 6. SSH Key Setup (Password-less Auth)
Secondary needs SSH access to primary for health checks:
```bash
# On secondary
ssh-keygen -t ed25519 -f /root/.ssh/trading_bot_ha
ssh-copy-id -i /root/.ssh/trading_bot_ha root@192.168.1.100
# Test connection
ssh root@192.168.1.100 "docker ps | grep trading-bot"
```
---
## How It Works
### Normal Operation (Primary Active)
1. **Primary:** Trading bot running, executing trades
2. **Secondary:** Failover controller checks primary every 15s
3. **Secondary:** Bot container STOPPED (passive standby)
### Failover Scenario
1. **Primary fails** (server down, docker crash, API unresponsive)
2. **Secondary detects** 3 consecutive failed health checks (45s)
3. **Telegram alert sent:** "🚨 HA FAILOVER: Primary failed, activating secondary"
4. **Secondary starts** trading bot container
5. **Trading continues** on secondary with same wallet/config
### Recovery Scenario
1. **Primary recovers** (you fix it, restart, etc.)
2. **Secondary detects** primary is healthy again
3. **Secondary stops** its trading bot (returns to standby)
4. **Telegram alert:** "Primary recovered, secondary deactivated"
5. **Primary resumes** as active node
---
## Monitoring & Maintenance
### Check HA Status
**On secondary:**
```bash
# View failover controller logs
sudo journalctl -u trading-bot-ha -f --lines=50
# Check if secondary is active
docker ps | grep trading-bot-v4
```
**On primary:**
```bash
# Run healthcheck manually
bash ha-setup/healthcheck.sh
# Check container status
docker ps | grep trading-bot-v4
```
### Manual Failover Testing
**Simulate primary failure:**
```bash
# On primary, stop trading bot
docker compose stop trading-bot
# Watch secondary logs - should activate within 45s
# On secondary
sudo journalctl -u trading-bot-ha -f
```
**Restore primary:**
```bash
# On primary, restart trading bot
docker compose up -d trading-bot
# Watch secondary - should deactivate within 15s
```
### Database Sync Schedule
**Daily sync from primary to secondary:**
On primary, add to crontab:
```bash
crontab -e
# Add:
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh >> /var/log/trading-bot-db-sync.log 2>&1
```
**Before failover events:** Secondary uses last synced DB state (max 24h old trade history)
**After failover:** Secondary continues with current state, syncs back to primary when recovered
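The sync-back after recovery mirrors the daily sync in the opposite direction. A sketch using the same commands as above (run manually before reactivating the primary; this step is not automated in this commit, and like the daily sync it assumes the target database is empty or recreated first):

```bash
# On the secondary: dump the post-failover state and ship it to the primary
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/failover_state.sql
rsync -avz /tmp/failover_state.sql root@192.168.1.100:/tmp/

# On the primary: restore it BEFORE starting the trading bot again
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/failover_state.sql
```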
---
## Important Notes
### Financial Safety
- **NEVER run both servers actively** - would cause duplicate trades and wallet conflicts
- **Failover controller ensures** only one active at a time
- **Same wallet key** required on both servers
- **Same n8n webhook endpoint** - update TradingView alerts if needed
### Database Consistency
- **Daily sync:** Keeps secondary within 24h of primary
- **Trade history:** May have small gap after failover (acceptable)
- **Position Manager:** Rebuilds state from Drift Protocol on startup
- **No financial loss:** Drift Protocol is source of truth for positions
### Network Requirements
- **Secondary → Primary:** SSH access (port 22) for health checks
- **Both → Internet:** For Drift Protocol, Telegram, n8n webhooks
- **n8n:** Can run on both or centralized (needs webhook routing)
### Testing Recommendations
1. **Week 1:** Run without failover, just monitor health checks
2. **Week 2:** Test manual failover (stop primary, verify secondary takes over)
3. **Week 3:** Test recovery (restart primary, verify secondary stops)
4. **Week 4:** Enable automatic failover for production
---
## Troubleshooting
### Secondary Won't Start After Failover
```bash
# Check logs
docker logs trading-bot-v4
# Check .env file exists
ls -la /home/icke/traderv4/.env
# Check Drift initialization
docker logs trading-bot-v4 | grep "Drift"
```
### Split-Brain (Both Servers Active)
**EMERGENCY - Stop both immediately:**
```bash
# On both servers
docker compose stop trading-bot
```
**Then restart only primary:**
```bash
# On primary only
docker compose up -d trading-bot
```
**Check Drift positions:**
```bash
curl -s http://localhost:3001/api/trading/positions \
-H "Authorization: Bearer ${API_SECRET_KEY}" | jq .
```
### Health Check False Positives
Adjust thresholds in `failover-controller.sh`:
```bash
CHECK_INTERVAL=30 # Slower checks (reduce network load)
MAX_FAILURES=5 # More tolerant (reduce false failovers)
```
---
## Cost Analysis
**Primary Server:** Always running (existing cost)
**Secondary Server:** Always running, but mostly idle
**Benefits:**
- **99.9% uptime** vs ~95% on a single server
- **~9 hours/year** max downtime at 99.9% (vs ~18 days/year at 95%)
- **Financial protection** - no missed trades during outages
- **Peace of mind** - sleep without worrying about server crashes
**Worth it?** YES - For a financial system, redundancy is essential.
---
## Future Enhancements
1. **Geographic redundancy:** Secondary in different datacenter/region
2. **Load balancer:** Route n8n webhooks to active server automatically
3. **Database streaming replication:** Real-time sync (0 data loss)
4. **Multi-region:** Three servers (US, EU, Asia) for global coverage
5. **Health dashboard:** Web UI showing HA status and metrics

ha-setup/failover-controller.sh (new file, 126 lines)

@@ -0,0 +1,126 @@
#!/bin/bash
#
# HA Failover Controller
# Monitors primary server and activates secondary on failure
#
# IMPORTANT: Run this ONLY on SECONDARY server
# Primary should always be active unless failed
#
set -u  # not -e: the helper functions return nonzero on handled failures, which under -e would kill the monitoring loop
PRIMARY_HOST="root@192.168.1.100" # Update with primary IP
SECONDARY_HOST="root@72.62.39.24"
CHECK_INTERVAL=15 # seconds between checks
MAX_FAILURES=3 # failures before failover
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="/home/icke/traderv4"
FAILURE_COUNT=0
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a /var/log/trading-bot-ha.log
}
telegram_notify() {
local message="$1"
# Use the Telegram bot to send notification
if [ -f "${PROJECT_DIR}/.env" ]; then
source "${PROJECT_DIR}/.env"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
-d text="🚨 HA FAILOVER: ${message}" \
-d parse_mode="HTML" > /dev/null
fi
}
check_primary_health() {
# SSH to primary and run healthcheck
ssh -o ConnectTimeout=5 -o BatchMode=yes "${PRIMARY_HOST}" \
"cd ${PROJECT_DIR} && bash ha-setup/healthcheck.sh" &>/dev/null
return $?
}
is_secondary_active() {
docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
return $?
}
start_secondary() {
log "🚀 Starting secondary (failover activation)..."
cd "${PROJECT_DIR}"
docker compose up -d trading-bot
sleep 10
if docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"; then
log "✅ Secondary activated successfully"
telegram_notify "Secondary server activated (primary failed ${MAX_FAILURES} health checks)"
return 0
else
log "❌ Failed to start secondary"
telegram_notify "⚠️ CRITICAL: Secondary failed to start after primary failure!"
return 1
fi
}
stop_secondary() {
log "🛑 Stopping secondary (primary recovered)..."
cd "${PROJECT_DIR}"
docker compose stop trading-bot
if ! is_secondary_active; then
log "✅ Secondary stopped successfully"
telegram_notify "Primary server recovered, secondary deactivated"
return 0
else
log "❌ Failed to stop secondary"
return 1
fi
}
main_loop() {
log "🎯 HA Failover Controller started (Secondary mode)"
log "Monitoring primary: ${PRIMARY_HOST}"
log "Check interval: ${CHECK_INTERVAL}s, Max failures: ${MAX_FAILURES}"
while true; do
if check_primary_health; then
# Primary is healthy
if [ $FAILURE_COUNT -gt 0 ]; then
log "✅ Primary recovered (was at ${FAILURE_COUNT} failures)"
FAILURE_COUNT=0
fi
# If secondary is running, stop it (primary should be active)
if is_secondary_active; then
log "⚠️ Secondary is active but primary is healthy - stopping secondary"
stop_secondary
fi
else
# Primary is unhealthy
FAILURE_COUNT=$((FAILURE_COUNT + 1))
log "❌ Primary health check failed (${FAILURE_COUNT}/${MAX_FAILURES})"
if [ $FAILURE_COUNT -ge $MAX_FAILURES ]; then
if ! is_secondary_active; then
log "🚨 PRIMARY FAILED - Activating secondary..."
telegram_notify "Primary server failed ${MAX_FAILURES} consecutive health checks. Activating secondary..."
start_secondary
else
log " Secondary already active (primary still failing)"
fi
fi
fi
sleep $CHECK_INTERVAL
done
}
# Ensure running as root (needs docker access)
if [ "$EUID" -ne 0 ]; then
log "❌ Must run as root (needs docker and SSH access)"
exit 1
fi
main_loop

ha-setup/healthcheck.sh (new file, 90 lines)

@@ -0,0 +1,90 @@
#!/bin/bash
#
# Trading Bot Health Check Script
# Checks if trading bot is healthy and responding
#
# Usage: ./healthcheck.sh
# Exit codes: 0 = healthy, 1 = unhealthy
set -eu
TRADING_BOT_HOST="${TRADING_BOT_HOST:-localhost:3001}"
# Source API key from .env (simple KEY=value lines only; value must not contain spaces)
if [ -f "/home/icke/traderv4/.env" ]; then
export $(grep "^API_SECRET_KEY=" /home/icke/traderv4/.env | xargs)
fi
: "${API_SECRET_KEY:?API_SECRET_KEY not set - check /home/icke/traderv4/.env}"
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
# Check if container is running
check_container() {
docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
return $?
}
# Check if API is responding
check_api() {
local response
if response=$(curl -s -f -m 5 \
-H "Authorization: Bearer ${API_SECRET_KEY}" \
"http://${TRADING_BOT_HOST}/api/drift/account-summary" 2>/dev/null) \
&& echo "$response" | grep -q '"success":true'; then
return 0
fi
return 1
}
# Check if Position Manager is monitoring (if positions exist)
check_position_manager() {
local logs
logs=$(docker logs --tail=50 trading-bot-v4 2>&1)
# Check for monitoring activity in the recent log tail
if echo "$logs" | grep -q "🔍 Monitoring"; then
return 0
fi
# If no monitoring logs but no positions open, that's OK
if echo "$logs" | grep -q "No positions to monitor"; then
return 0
fi
# If the container started within the current hour, give it time to warm up
# (coarse check: compares the start timestamp's date+hour, UTC, against now)
if docker inspect trading-bot-v4 --format='{{.State.StartedAt}}' | grep -q "$(date -u +%Y-%m-%dT%H)"; then
return 0
fi
return 1
}
# Main health check
main() {
log "Starting health check..."
if ! check_container; then
log "❌ UNHEALTHY: Container not running"
exit 1
fi
if ! check_api; then
log "❌ UNHEALTHY: API not responding"
exit 1
fi
if ! check_position_manager; then
log "⚠️ WARNING: Position Manager may not be monitoring (check logs)"
# Don't fail on this - API working is primary health indicator
fi
log "✅ HEALTHY: All checks passed"
exit 0
}
main "$@"

ha-setup/setup-db-replication.sh (new file, 104 lines)

@@ -0,0 +1,104 @@
#!/bin/bash
#
# Database Replication Setup for PostgreSQL Streaming Replication
# Run on PRIMARY server first, then on SECONDARY
#
set -eu
MODE="${1:-help}"
PRIMARY_IP="192.168.1.100" # Update with primary server IP
SECONDARY_IP="72.62.39.24"
POSTGRES_PASSWORD="your_postgres_password" # Update from .env
REPLICATION_USER="replicator"
REPLICATION_PASSWORD="your_replication_password" # Generate strong password
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
setup_primary() {
log "🔧 Setting up PRIMARY database for replication..."
# Create replication user
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 <<-EOF
-- Create replication user
CREATE USER ${REPLICATION_USER} WITH REPLICATION ENCRYPTED PASSWORD '${REPLICATION_PASSWORD}';
-- Grant necessary privileges
GRANT CONNECT ON DATABASE trading_bot_v4 TO ${REPLICATION_USER};
EOF
log "✅ Replication user created"
# Configure pg_hba.conf for replication
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/pg_hba.conf <<EOF
# Replication connection from secondary
host replication ${REPLICATION_USER} ${SECONDARY_IP}/32 md5
EOF"
# Configure postgresql.conf for replication
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/postgresql.conf <<EOF
# Replication settings
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB
EOF"
log "✅ PostgreSQL configured for replication"
log "⚠️ Restart PostgreSQL container: docker restart trading-bot-postgres"
}
setup_secondary() {
log "🔧 Setting up SECONDARY database (replica)..."
# Stop secondary postgres if running
docker compose stop trading-bot-postgres || true
# Remove old data
log "⚠️ Removing old PostgreSQL data on secondary..."
docker volume rm traderv4_postgres-data || true
# Create base backup from primary, written directly into the (now empty)
# data volume that the secondary's postgres container mounts
log "📦 Creating base backup from primary..."
docker run --rm \
-v traderv4_postgres-data:/var/lib/postgresql/data \
-e PGPASSWORD="${REPLICATION_PASSWORD}" \
postgres:16-alpine \
pg_basebackup -h "${PRIMARY_IP}" -U "${REPLICATION_USER}" \
-D /var/lib/postgresql/data -Fp -Xs -P -R
# -R writes standby.signal + primary_conninfo, so the container starts as a replica
# (you may need to fix ownership of the data dir for the container's postgres user)
log "▶️ Start the replica: docker compose up -d trading-bot-postgres"
log "✅ Secondary database configured as replica"
}
simplified_sync() {
log "📋 Simplified approach: Database sync via pg_dump/restore"
log "Run on PRIMARY to create backup:"
echo " docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql"
echo " rsync -avz /tmp/trading_bot_backup.sql root@${SECONDARY_IP}:/tmp/"
log "Run on SECONDARY to restore:"
echo " docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql"
}
case "$MODE" in
primary)
setup_primary
;;
secondary)
setup_secondary
;;
sync)
simplified_sync
;;
help|*)
echo "Usage: $0 {primary|secondary|sync}"
echo ""
echo "primary - Configure primary DB for replication"
echo "secondary - Configure secondary DB as replica"
echo "sync - Show commands for manual DB sync"
echo ""
echo "⚠️ IMPORTANT: Update IP addresses and passwords in script first!"
exit 1
;;
esac

ha-setup/sync-db-daily.sh (new file, 76 lines)

@@ -0,0 +1,76 @@
#!/bin/bash
#
# Daily Database Sync from Primary to Secondary
# Run on PRIMARY server via cron
#
set -eu
PRIMARY_HOST="localhost"
SECONDARY_HOST="root@72.62.39.24"
PROJECT_DIR="/home/icke/traderv4"
BACKUP_FILE="/tmp/trading_bot_backup_$(date +%Y%m%d_%H%M%S).sql"
LOG_FILE="/var/log/trading-bot-db-sync.log"
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
# Telegram notification
telegram_notify() {
local message="$1"
if [ -f "${PROJECT_DIR}/.env" ]; then
source "${PROJECT_DIR}/.env"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
-d text="📊 DB Sync: ${message}" \
-d parse_mode="HTML" > /dev/null
fi
}
main() {
log "🔄 Starting daily database backup..."
# Create backup
if docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > "$BACKUP_FILE" 2>>"$LOG_FILE"; then
local size=$(du -h "$BACKUP_FILE" | cut -f1)
log "✅ Backup created: $BACKUP_FILE ($size)"
else
log "❌ Backup failed!"
telegram_notify "⚠️ Database backup failed on primary"
exit 1
fi
# Transfer to secondary
log "📤 Transferring to secondary..."
if rsync -avz "$BACKUP_FILE" "${SECONDARY_HOST}:/tmp/" >> "$LOG_FILE" 2>&1; then
log "✅ Transfer complete"
else
log "❌ Transfer failed!"
telegram_notify "⚠️ Database transfer to secondary failed"
exit 1
fi
# Restore on secondary
log "📥 Restoring on secondary..."
if ssh "${SECONDARY_HOST}" "docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/$(basename "$BACKUP_FILE")" >> "$LOG_FILE" 2>&1; then
log "✅ Restore complete on secondary"
else
log "❌ Restore failed on secondary!"
telegram_notify "⚠️ Database restore failed on secondary"
exit 1
fi
# Cleanup old backups (keep last 7 days)
find /tmp -name "trading_bot_backup_*.sql" -mtime +7 -delete
ssh "${SECONDARY_HOST}" "find /tmp -name 'trading_bot_backup_*.sql' -mtime +7 -delete"
log "🎉 Daily sync completed successfully"
# Only notify on first sync of the day or if there were issues
if [ "$(date +%H)" -eq 2 ]; then
telegram_notify "✅ Daily database sync completed"
fi
}
main "$@"

ha-setup/trading-bot-ha.service (new file, 24 lines)

@@ -0,0 +1,24 @@
[Unit]
Description=Trading Bot v4 HA Failover Controller
After=network.target docker.service
Requires=docker.service
[Service]
Type=simple
User=root
WorkingDirectory=/home/icke/traderv4
ExecStart=/bin/bash /home/icke/traderv4/ha-setup/failover-controller.sh
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Logging
SyslogIdentifier=trading-bot-ha
# Security
PrivateTmp=yes
NoNewPrivileges=yes
[Install]
WantedBy=multi-user.target