feat: Add High Availability setup roadmap and scripts

Created comprehensive HA roadmap with 6 phases:
- Phase 1: Warm standby (CURRENT - manual failover)
- Phase 2: Database replication
- Phase 3: Health monitoring
- Phase 4: Reverse proxy + floating IP
- Phase 5: Automated failover
- Phase 6: Geographic redundancy

Includes:
- Decision gates based on capital and stability
- Cost-benefit analysis
- Scripts for healthcheck, failover, DB sync
- Recommendation to defer full HA until capital > $5k

Secondary server ready at 72.62.39.24 for emergency manual failover.

Related: user concern about system uptime; full HA complexity is
not justified at current scale (~$600 capital). Revisit in Q1 2026.
Author: mindesbunister
Date: 2025-11-19 20:52:12 +01:00
parent d28da02089
commit 880aae9a77
7 changed files with 936 additions and 0 deletions

HA_SETUP_ROADMAP.md (new file, 218 lines)

@@ -0,0 +1,218 @@
# High Availability Setup Roadmap
**Status:** 🎯 FUTURE
**Priority:** Medium
**Estimated Effort:** 2-3 days full implementation
**Dependencies:** Stable production system, consistent profitability
---
## Current State (Nov 19, 2025)
**Warm Standby Ready:**
- Secondary server at `root@72.62.39.24` with rsync'd code
- Can manually failover in 10-15 minutes if primary fails
- Single-server operation prevents duplicate trades
**Not Automated:**
- Manual DNS/webhook updates required
- No automatic failover detection
- No reverse proxy/load balancer setup
---
## Phase 1: Warm Standby Maintenance (CURRENT)
**Goal:** Keep secondary server ready for manual failover
**Tasks:**
- [ ] Daily rsync from primary to secondary (automated)
- [ ] Weekly startup test on secondary (verify working)
- [ ] Document manual failover procedure
- [ ] Test database restore on secondary
**Acceptance Criteria:**
- Can start secondary and verify trading bot works within 5 minutes
- Secondary has code/config updated within 24 hours of primary
- Clear runbook for emergency failover
**Timeline:** 1 day setup + ongoing maintenance
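The automated daily rsync task above could be wired up as a single cron entry on the primary. A sketch (the 03:15 schedule, log path, and use of `--delete` are assumptions, not decisions made in this roadmap; cron lines cannot be continued across lines, so it is one long line):

```bash
# /etc/cron.d/trading-bot-code-sync (sketch, on the primary)
15 3 * * * root rsync -az --delete --exclude node_modules --exclude .next /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/ >> /var/log/trading-bot-code-sync.log 2>&1
```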
---
## Phase 2: Database Replication (NEXT)
**Goal:** Zero data loss on failover
**Tasks:**
- [ ] Setup PostgreSQL streaming replication
- [ ] Configure replication user and permissions
- [ ] Test replica lag monitoring
- [ ] Automate replica promotion on failover
**Acceptance Criteria:**
- Secondary database max 5 seconds behind primary
- Trade history preserved during failover
- Automatic replica promotion script tested
**Timeline:** 2-3 days
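The replica-lag acceptance criterion above can be checked from the primary with the standard `pg_stat_replication` view (container and database names as used elsewhere in this repo; `replay_lag` requires PostgreSQL 10+):

```bash
# On the primary: show each standby's streaming state and replay lag
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c \
  "SELECT application_name, client_addr, state, replay_lag
   FROM pg_stat_replication;"
```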
---
## Phase 3: Health Monitoring & Alerts (NEXT)
**Goal:** Know when primary fails, prepare for manual intervention
**Tasks:**
- [ ] Deploy healthcheck script on both servers
- [ ] Setup monitoring dashboard (Grafana/simple webpage)
- [ ] Telegram alerts for primary failures
- [ ] Create failover decision flowchart
**Acceptance Criteria:**
- Telegram alert within 60 seconds of primary failure
- Dashboard shows primary/secondary status
- Clear steps for manual failover documented
**Timeline:** 1-2 days
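A minimal version of the Telegram alert could reuse `ha-setup/healthcheck.sh` from this commit. A sketch of a per-minute cron entry plus a hypothetical `alert-on-failure.sh` wrapper (the wrapper is not part of this commit; Telegram variables come from the project `.env`, as in the other scripts here):

```bash
# /etc/cron.d/trading-bot-healthcheck (sketch): run the wrapper every minute
* * * * * root /home/icke/traderv4/ha-setup/alert-on-failure.sh

# alert-on-failure.sh (hypothetical wrapper):
#!/bin/bash
source /home/icke/traderv4/.env
if ! bash /home/icke/traderv4/ha-setup/healthcheck.sh; then
  curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
    -d chat_id="${TELEGRAM_CHAT_ID}" -d text="🚨 Primary healthcheck failed"
fi
```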
---
## Phase 4: Reverse Proxy + Floating IP (FUTURE)
**Goal:** Automatic traffic routing to active server
**Options:**
### Option A: Floating IP (Simplest)
- Use cloud provider's floating IP (DigitalOcean, AWS EIP)
- IP automatically moves between servers
- Requires: Cloud infrastructure, not bare metal
### Option B: DNS-based Failover
- Use DNS provider with health checks (Cloudflare, Route53)
- Automatic DNS updates on failure
- 1-5 minute TTL delay for propagation
### Option C: Reverse Proxy
- HAProxy or nginx in front of both servers
- Health checks route to active server
- Requires: Third server for proxy (single point of failure)
**Tasks:**
- [ ] Evaluate infrastructure options (cloud vs bare metal)
- [ ] Choose failover mechanism (Floating IP vs DNS vs Proxy)
- [ ] Implement automatic traffic routing
- [ ] Test failover scenarios (primary crash, network partition)
**Acceptance Criteria:**
- TradingView webhooks automatically route to active server
- Failover completes within 2 minutes with zero manual intervention
- No duplicate trades during failover window
- n8n workflows continue without reconfiguration
**Timeline:** 3-5 days (depends on option chosen)
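For Option C, the routing logic fits in a few lines of HAProxy config. A sketch for a hypothetical proxy host (the `/api/health` endpoint and the port 8080 bind are assumptions; the bot's real API requires auth):

```
defaults
    mode http
    timeout connect 5s
    timeout client 30s
    timeout server 30s

frontend tradingview_webhooks
    bind *:8080
    default_backend trading_bot

backend trading_bot
    option httpchk GET /api/health
    # "backup" gives active-passive behavior: secondary only receives
    # traffic once the primary's health check has failed
    server primary 192.168.1.100:3001 check inter 5s fall 3 rise 2
    server secondary 72.62.39.24:3001 check backup
```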
---
## Phase 5: Automated Failover Controller (FUTURE)
**Goal:** Fully autonomous HA system
**Tasks:**
- [ ] Deploy failover controller on secondary
- [ ] Configure automatic container startup on failure detection
- [ ] Implement split-brain prevention
- [ ] Test recovery scenarios (primary comes back online)
- [ ] Setup automatic database sync on recovery
**Acceptance Criteria:**
- Secondary automatically activates within 60 seconds of primary failure
- Primary automatically resumes when recovered
- No manual intervention required for 99% of failures
- Telegram notifications for all state changes
**Timeline:** 2-3 days
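The split-brain prevention task above boils down to one rule: never activate on a single failed signal. A minimal sketch of that decision (the two-witness idea is an illustration, not the implementation shipped in this commit):

```bash
# Decide whether the secondary may take over. Inputs are the verdicts of two
# independent probes of the primary (e.g. SSH healthcheck + external HTTP check).
should_activate() {
  local ssh_view="$1" witness_view="$2"   # each "up" or "down"
  if [ "$ssh_view" = "down" ] && [ "$witness_view" = "down" ]; then
    return 0   # both agree the primary is dead: safe to start trading here
  fi
  return 1     # any disagreement may be a network partition: stay passive
}
```

If only the SSH path is down, staying passive is the safe choice: the alternative is two bots trading the same wallet.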
---
## Phase 6: Geographic Redundancy (DISTANT FUTURE)
**Goal:** Multi-region deployment for global reliability
**Considerations:**
- Secondary in different geographic region (US vs EU)
- Protects against regional outages
- Lower latency for global users
- Requires: More complex routing, higher costs
**Timeline:** 1+ weeks
---
## Decision Gates
**Proceed to Phase 2+ when:**
- Trading system profitable for 3+ consecutive months
- Capital > $10,000 (downtime = significant money loss)
- User frequently unavailable (travel, sleep schedule, etc.)
- Primary server has experienced 2+ unplanned outages
**Stay in Phase 1 when:**
- System still in testing/optimization phase
- User can manually intervene within 30 minutes most of the time
- Capital < $5,000 (manual failover acceptable)
- Primary server stable (99%+ uptime)
---
## Cost-Benefit Analysis
### Current State (Warm Standby)
- **Cost:** ~$10-20/month for secondary server
- **Benefit:** 10-15 min manual failover vs hours of setup from scratch
- **ROI:** Good - cheap insurance
### Full HA (All Phases)
- **Cost:** ~$50-100/month (servers, floating IP, monitoring)
- **Time:** 1-2 weeks of development
- **Benefit:** 99.9% uptime, automatic failover, peace of mind
- **ROI:** Only worth it when trading capital justifies the cost
### Break-Even Point
- Trading $10k+ capital at 15% monthly returns = ~$1,500/month
- 1 hour downtime = ~$2 lost opportunity ($1,500 / ~730 hours)
- 24 hour downtime = ~$50 lost + potential missed exit = $100-500 risk
- HA pays for itself after 1-2 major outages
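The arithmetic above, made explicit (the capital and return figures are the roadmap's assumptions, not measurements):

```bash
capital=10000          # USD, assumed trading capital
monthly_return_pct=15  # assumed monthly return
hours_per_month=730    # ~365*24/12

monthly_profit=$(( capital * monthly_return_pct / 100 ))  # 1500
hourly_loss=$(awk -v p="$monthly_profit" -v h="$hours_per_month" \
  'BEGIN { printf "%.2f", p / h }')                       # ~2 USD/hour
daily_loss=$(awk -v l="$hourly_loss" 'BEGIN { printf "%.0f", l * 24 }')

echo "1h downtime ~= \$${hourly_loss}, 24h ~= \$${daily_loss}"
```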
---
## Current Recommendation (Nov 19, 2025)
**Stay in Phase 1** (Warm Standby) because:
- Capital still under $1,000
- System in active optimization (indicator testing, quality tuning)
- User available for manual intervention most of the time
- Primary server stable
**Revisit in Q1 2026** when:
- Capital reaches $5,000+ (Phase 2 target)
- System proven profitable over 3+ months
- Trading strategy stabilized (v8 indicator validated)
---
## Related Files
- `/home/icke/traderv4/ha-setup/` - HA scripts (created but not deployed)
- `TRADING_GOALS.md` - Financial roadmap (HA aligns with Phase 4-5)
- `OPTIMIZATION_MASTER_ROADMAP.md` - System improvements (HA is infrastructure)
---
## Notes
- **Manual failover is acceptable for now** - 10-15 min downtime won't cause financial loss at current scale
- **Focus on profitability first** - HA is luxury when system isn't making consistent money yet
- **Complexity vs benefit** - Full HA adds operational overhead that may not be worth it yet
- **Revisit quarterly** - As capital grows, HA becomes more important

ha-setup/README.md (new file, 298 lines)

@@ -0,0 +1,298 @@
# High Availability Setup for Trading Bot v4
## Architecture: Active-Passive Failover
**Primary Server (Active):** Runs trading bot 24/7
**Secondary Server (Passive):** Monitors primary, takes over on failure
### Why Active-Passive (Not Active-Active)?
- **Prevents duplicate trades** - CRITICAL for financial system
- **Single source of truth** - One Position Manager tracking state
- **No split-brain scenarios** - Only one bot executes trades
- **Database consistency** - No conflicting writes
---
## Setup Instructions
### 1. Prerequisites
**Primary Server:** `root@192.168.1.100` (update in scripts)
**Secondary Server:** `root@72.62.39.24`
Both servers need:
- Docker & Docker Compose installed
- Trading bot project at `/home/icke/traderv4`
- Same `.env` file (especially DRIFT_WALLET_PRIVATE_KEY)
- Same n8n workflows configured
### 2. Initial Sync (Already Done via rsync ✅)
```bash
# From primary server
rsync -avz --exclude 'node_modules' --exclude '.next' \
/home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
```
### 3. Database Synchronization
**Option A: Manual Sync (Simpler, Recommended for Start)**
On primary:
```bash
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql
rsync -avz /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/
```
On secondary:
```bash
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
```
Run this daily via cron on primary:
```bash
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh
```
**Option B: Streaming Replication (Advanced)**
```bash
# On primary
bash ha-setup/setup-db-replication.sh primary
# On secondary
bash ha-setup/setup-db-replication.sh secondary
```
### 4. Setup Health Monitoring
Make scripts executable:
```bash
chmod +x ha-setup/*.sh
```
**Test healthcheck on both servers:**
```bash
bash ha-setup/healthcheck.sh
# Should output: ✅ HEALTHY: All checks passed
```
### 5. Start Failover Controller (SECONDARY ONLY)
**Edit configuration first:**
```bash
nano ha-setup/failover-controller.sh
# Update PRIMARY_HOST with actual IP
# Update SECONDARY_HOST if needed
```
**Run as systemd service:**
```bash
sudo cp ha-setup/trading-bot-ha.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable trading-bot-ha
sudo systemctl start trading-bot-ha
```
**Check status:**
```bash
sudo systemctl status trading-bot-ha
sudo journalctl -u trading-bot-ha -f
```
### 6. SSH Key Setup (Password-less Auth)
Secondary needs SSH access to primary for health checks:
```bash
# On secondary
ssh-keygen -t ed25519 -f /root/.ssh/trading_bot_ha
ssh-copy-id -i /root/.ssh/trading_bot_ha root@192.168.1.100
# Test connection
ssh root@192.168.1.100 "docker ps | grep trading-bot"
```
---
## How It Works
### Normal Operation (Primary Active)
1. **Primary:** Trading bot running, executing trades
2. **Secondary:** Failover controller checks primary every 15s
3. **Secondary:** Bot container STOPPED (passive standby)
### Failover Scenario
1. **Primary fails** (server down, docker crash, API unresponsive)
2. **Secondary detects** 3 consecutive failed health checks (45s)
3. **Telegram alert sent:** "🚨 HA FAILOVER: Primary failed, activating secondary"
4. **Secondary starts** trading bot container
5. **Trading continues** on secondary with same wallet/config
### Recovery Scenario
1. **Primary recovers** (you fix it, restart, etc.)
2. **Secondary detects** primary is healthy again
3. **Secondary stops** its trading bot (returns to standby)
4. **Telegram alert:** "Primary recovered, secondary deactivated"
5. **Primary resumes** as active node
---
## Monitoring & Maintenance
### Check HA Status
**On secondary:**
```bash
# View failover controller logs
sudo journalctl -u trading-bot-ha -f --lines=50
# Check if secondary is active
docker ps | grep trading-bot-v4
```
**On primary:**
```bash
# Run healthcheck manually
bash ha-setup/healthcheck.sh
# Check container status
docker ps | grep trading-bot-v4
```
### Manual Failover Testing
**Simulate primary failure:**
```bash
# On primary, stop trading bot
docker compose stop trading-bot
# Watch secondary logs - should activate within 45s
# On secondary
sudo journalctl -u trading-bot-ha -f
```
**Restore primary:**
```bash
# On primary, restart trading bot
docker compose up -d trading-bot
# Watch secondary - should deactivate within 15s
```
### Database Sync Schedule
**Daily sync from primary to secondary:**
On primary, add to crontab:
```bash
crontab -e
# Add:
0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh >> /var/log/trading-bot-db-sync.log 2>&1
```
**Before failover events:** Secondary uses last synced DB state (max 24h old trade history)
**After failover:** Secondary continues with current state, syncs back to primary when recovered
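The sync-back after recovery mirrors the daily sync in the opposite direction. A sketch using the same commands as above (run manually before reactivating the primary; this step is not automated in this commit, and like the daily sync it assumes the target database is empty or recreated first):

```bash
# On the secondary: dump the post-failover state and ship it to the primary
docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/failover_state.sql
rsync -avz /tmp/failover_state.sql root@192.168.1.100:/tmp/

# On the primary: restore it BEFORE starting the trading bot again
docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/failover_state.sql
```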
---
## Important Notes
### Financial Safety
- **NEVER run both servers actively** - would cause duplicate trades and wallet conflicts
- **Failover controller ensures** only one active at a time
- **Same wallet key** required on both servers
- **Same n8n webhook endpoint** - update TradingView alerts if needed
### Database Consistency
- **Daily sync:** Keeps secondary within 24h of primary
- **Trade history:** May have small gap after failover (acceptable)
- **Position Manager:** Rebuilds state from Drift Protocol on startup
- **No financial loss:** Drift Protocol is source of truth for positions
### Network Requirements
- **Secondary → Primary:** SSH access (port 22) for health checks
- **Both → Internet:** For Drift Protocol, Telegram, n8n webhooks
- **n8n:** Can run on both or centralized (needs webhook routing)
### Testing Recommendations
1. **Week 1:** Run without failover, just monitor health checks
2. **Week 2:** Test manual failover (stop primary, verify secondary takes over)
3. **Week 3:** Test recovery (restart primary, verify secondary stops)
4. **Week 4:** Enable automatic failover for production
---
## Troubleshooting
### Secondary Won't Start After Failover
```bash
# Check logs
docker logs trading-bot-v4
# Check .env file exists
ls -la /home/icke/traderv4/.env
# Check Drift initialization
docker logs trading-bot-v4 | grep "Drift"
```
### Split-Brain (Both Servers Active)
**EMERGENCY - Stop both immediately:**
```bash
# On both servers
docker compose stop trading-bot
```
**Then restart only primary:**
```bash
# On primary only
docker compose up -d trading-bot
```
**Check Drift positions:**
```bash
curl -s http://localhost:3001/api/trading/positions \
-H "Authorization: Bearer ${API_SECRET_KEY}" | jq .
```
### Health Check False Positives
Adjust thresholds in `failover-controller.sh`:
```bash
CHECK_INTERVAL=30 # Slower checks (reduce network load)
MAX_FAILURES=5 # More tolerant (reduce false failovers)
```
---
## Cost Analysis
**Primary Server:** Always running (existing cost)
**Secondary Server:** Always running, but mostly idle
**Benefits:**
- **99.9% uptime** vs ~95% on a single server
- **~9 hours/year** max downtime at 99.9% (vs ~18 days/year at 95%)
- **Financial protection** - no missed trades during outages
- **Peace of mind** - sleep without worrying about server crashes
**Worth it?** YES - For a financial system, redundancy is essential.
---
## Future Enhancements
1. **Geographic redundancy:** Secondary in different datacenter/region
2. **Load balancer:** Route n8n webhooks to active server automatically
3. **Database streaming replication:** Real-time sync (0 data loss)
4. **Multi-region:** Three servers (US, EU, Asia) for global coverage
5. **Health dashboard:** Web UI showing HA status and metrics

ha-setup/failover-controller.sh (new file, 126 lines)

@@ -0,0 +1,126 @@
#!/bin/bash
#
# HA Failover Controller
# Monitors primary server and activates secondary on failure
#
# IMPORTANT: Run this ONLY on SECONDARY server
# Primary should always be active unless failed
#
set -u  # not -e: the helper functions return nonzero on handled failures, which under -e would kill the monitoring loop
PRIMARY_HOST="root@192.168.1.100" # Update with primary IP
SECONDARY_HOST="root@72.62.39.24"
CHECK_INTERVAL=15 # seconds between checks
MAX_FAILURES=3 # failures before failover
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="/home/icke/traderv4"
FAILURE_COUNT=0
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a /var/log/trading-bot-ha.log
}
telegram_notify() {
local message="$1"
# Use the Telegram bot to send notification
if [ -f "${PROJECT_DIR}/.env" ]; then
source "${PROJECT_DIR}/.env"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
-d text="🚨 HA FAILOVER: ${message}" \
-d parse_mode="HTML" > /dev/null
fi
}
check_primary_health() {
# SSH to primary and run healthcheck
ssh -o ConnectTimeout=5 -o BatchMode=yes "${PRIMARY_HOST}" \
"cd ${PROJECT_DIR} && bash ha-setup/healthcheck.sh" &>/dev/null
return $?
}
is_secondary_active() {
docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
return $?
}
start_secondary() {
log "🚀 Starting secondary (failover activation)..."
cd "${PROJECT_DIR}"
docker compose up -d trading-bot
sleep 10
if docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"; then
log "✅ Secondary activated successfully"
telegram_notify "Secondary server activated (primary failed ${MAX_FAILURES} health checks)"
return 0
else
log "❌ Failed to start secondary"
telegram_notify "⚠️ CRITICAL: Secondary failed to start after primary failure!"
return 1
fi
}
stop_secondary() {
log "🛑 Stopping secondary (primary recovered)..."
cd "${PROJECT_DIR}"
docker compose stop trading-bot
if ! is_secondary_active; then
log "✅ Secondary stopped successfully"
telegram_notify "Primary server recovered, secondary deactivated"
return 0
else
log "❌ Failed to stop secondary"
return 1
fi
}
main_loop() {
log "🎯 HA Failover Controller started (Secondary mode)"
log "Monitoring primary: ${PRIMARY_HOST}"
log "Check interval: ${CHECK_INTERVAL}s, Max failures: ${MAX_FAILURES}"
while true; do
if check_primary_health; then
# Primary is healthy
if [ $FAILURE_COUNT -gt 0 ]; then
log "✅ Primary recovered (was at ${FAILURE_COUNT} failures)"
FAILURE_COUNT=0
fi
# If secondary is running, stop it (primary should be active)
if is_secondary_active; then
log "⚠️ Secondary is active but primary is healthy - stopping secondary"
stop_secondary
fi
else
# Primary is unhealthy
FAILURE_COUNT=$((FAILURE_COUNT + 1))
log "❌ Primary health check failed (${FAILURE_COUNT}/${MAX_FAILURES})"
if [ $FAILURE_COUNT -ge $MAX_FAILURES ]; then
if ! is_secondary_active; then
log "🚨 PRIMARY FAILED - Activating secondary..."
telegram_notify "Primary server failed ${MAX_FAILURES} consecutive health checks. Activating secondary..."
start_secondary
else
log " Secondary already active (primary still failing)"
fi
fi
fi
sleep $CHECK_INTERVAL
done
}
# Ensure running as root (needs docker access)
if [ "$EUID" -ne 0 ]; then
log "❌ Must run as root (needs docker and SSH access)"
exit 1
fi
main_loop

ha-setup/healthcheck.sh (new file, 90 lines)

@@ -0,0 +1,90 @@
#!/bin/bash
#
# Trading Bot Health Check Script
# Checks if trading bot is healthy and responding
#
# Usage: ./healthcheck.sh
# Exit codes: 0 = healthy, 1 = unhealthy
set -eu
TRADING_BOT_HOST="${TRADING_BOT_HOST:-localhost:3001}"
# Source API key from .env (simple KEY=value lines only; value must not contain spaces)
if [ -f "/home/icke/traderv4/.env" ]; then
export $(grep "^API_SECRET_KEY=" /home/icke/traderv4/.env | xargs)
fi
: "${API_SECRET_KEY:?API_SECRET_KEY not set - check /home/icke/traderv4/.env}"
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
# Check if container is running
check_container() {
docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
return $?
}
# Check if API is responding
check_api() {
local response
if response=$(curl -s -f -m 5 \
-H "Authorization: Bearer ${API_SECRET_KEY}" \
"http://${TRADING_BOT_HOST}/api/drift/account-summary" 2>/dev/null) \
&& echo "$response" | grep -q '"success":true'; then
return 0
fi
return 1
}
# Check if Position Manager is monitoring (if positions exist)
check_position_manager() {
local logs
logs=$(docker logs --tail=50 trading-bot-v4 2>&1)
# Check for monitoring activity in the recent log tail
if echo "$logs" | grep -q "🔍 Monitoring"; then
return 0
fi
# If no monitoring logs but no positions open, that's OK
if echo "$logs" | grep -q "No positions to monitor"; then
return 0
fi
# If the container started within the current hour, give it time to warm up
# (coarse check: compares the start timestamp's date+hour, UTC, against now)
if docker inspect trading-bot-v4 --format='{{.State.StartedAt}}' | grep -q "$(date -u +%Y-%m-%dT%H)"; then
return 0
fi
return 1
}
# Main health check
main() {
log "Starting health check..."
if ! check_container; then
log "❌ UNHEALTHY: Container not running"
exit 1
fi
if ! check_api; then
log "❌ UNHEALTHY: API not responding"
exit 1
fi
if ! check_position_manager; then
log "⚠️ WARNING: Position Manager may not be monitoring (check logs)"
# Don't fail on this - API working is primary health indicator
fi
log "✅ HEALTHY: All checks passed"
exit 0
}
main "$@"

ha-setup/setup-db-replication.sh (new file, 104 lines)

@@ -0,0 +1,104 @@
#!/bin/bash
#
# Database Replication Setup for PostgreSQL Streaming Replication
# Run on PRIMARY server first, then on SECONDARY
#
set -eu
MODE="${1:-help}"
PRIMARY_IP="192.168.1.100" # Update with primary server IP
SECONDARY_IP="72.62.39.24"
POSTGRES_PASSWORD="your_postgres_password" # Update from .env
REPLICATION_USER="replicator"
REPLICATION_PASSWORD="your_replication_password" # Generate strong password
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
}
setup_primary() {
log "🔧 Setting up PRIMARY database for replication..."
# Create replication user
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 <<-EOF
-- Create replication user
CREATE USER ${REPLICATION_USER} WITH REPLICATION ENCRYPTED PASSWORD '${REPLICATION_PASSWORD}';
-- Grant necessary privileges
GRANT CONNECT ON DATABASE trading_bot_v4 TO ${REPLICATION_USER};
EOF
log "✅ Replication user created"
# Configure pg_hba.conf for replication
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/pg_hba.conf <<EOF
# Replication connection from secondary
host replication ${REPLICATION_USER} ${SECONDARY_IP}/32 md5
EOF"
# Configure postgresql.conf for replication
docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/postgresql.conf <<EOF
# Replication settings
wal_level = replica
max_wal_senders = 3
wal_keep_size = 64MB
EOF"
log "✅ PostgreSQL configured for replication"
log "⚠️ Restart PostgreSQL container: docker restart trading-bot-postgres"
}
setup_secondary() {
log "🔧 Setting up SECONDARY database (replica)..."
# Stop secondary postgres if running
docker compose stop trading-bot-postgres || true
# Remove old data
log "⚠️ Removing old PostgreSQL data on secondary..."
docker volume rm traderv4_postgres-data || true
# Create base backup from primary, written directly into the (now empty)
# data volume that the secondary's postgres container mounts
log "📦 Creating base backup from primary..."
docker run --rm \
-v traderv4_postgres-data:/var/lib/postgresql/data \
-e PGPASSWORD="${REPLICATION_PASSWORD}" \
postgres:16-alpine \
pg_basebackup -h "${PRIMARY_IP}" -U "${REPLICATION_USER}" \
-D /var/lib/postgresql/data -Fp -Xs -P -R
# -R writes standby.signal + primary_conninfo, so the container starts as a replica
# (you may need to fix ownership of the data dir for the container's postgres user)
log "▶️ Start the replica: docker compose up -d trading-bot-postgres"
log "✅ Secondary database configured as replica"
}
simplified_sync() {
log "📋 Simplified approach: Database sync via pg_dump/restore"
log "Run on PRIMARY to create backup:"
echo " docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql"
echo " rsync -avz /tmp/trading_bot_backup.sql root@${SECONDARY_IP}:/tmp/"
log "Run on SECONDARY to restore:"
echo " docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql"
}
case "$MODE" in
primary)
setup_primary
;;
secondary)
setup_secondary
;;
sync)
simplified_sync
;;
help|*)
echo "Usage: $0 {primary|secondary|sync}"
echo ""
echo "primary - Configure primary DB for replication"
echo "secondary - Configure secondary DB as replica"
echo "sync - Show commands for manual DB sync"
echo ""
echo "⚠️ IMPORTANT: Update IP addresses and passwords in script first!"
exit 1
;;
esac

ha-setup/sync-db-daily.sh (new file, 76 lines)

@@ -0,0 +1,76 @@
#!/bin/bash
#
# Daily Database Sync from Primary to Secondary
# Run on PRIMARY server via cron
#
set -eu
PRIMARY_HOST="localhost"
SECONDARY_HOST="root@72.62.39.24"
PROJECT_DIR="/home/icke/traderv4"
BACKUP_FILE="/tmp/trading_bot_backup_$(date +%Y%m%d_%H%M%S).sql"
LOG_FILE="/var/log/trading-bot-db-sync.log"
log() {
echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}
# Telegram notification
telegram_notify() {
local message="$1"
if [ -f "${PROJECT_DIR}/.env" ]; then
source "${PROJECT_DIR}/.env"
curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
-d chat_id="${TELEGRAM_CHAT_ID}" \
-d text="📊 DB Sync: ${message}" \
-d parse_mode="HTML" > /dev/null
fi
}
main() {
log "🔄 Starting daily database backup..."
# Create backup
if docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > "$BACKUP_FILE" 2>>"$LOG_FILE"; then
local size=$(du -h "$BACKUP_FILE" | cut -f1)
log "✅ Backup created: $BACKUP_FILE ($size)"
else
log "❌ Backup failed!"
telegram_notify "⚠️ Database backup failed on primary"
exit 1
fi
# Transfer to secondary
log "📤 Transferring to secondary..."
if rsync -avz "$BACKUP_FILE" "${SECONDARY_HOST}:/tmp/" >> "$LOG_FILE" 2>&1; then
log "✅ Transfer complete"
else
log "❌ Transfer failed!"
telegram_notify "⚠️ Database transfer to secondary failed"
exit 1
fi
# Restore on secondary
log "📥 Restoring on secondary..."
if ssh "${SECONDARY_HOST}" "docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/$(basename "$BACKUP_FILE")" >> "$LOG_FILE" 2>&1; then
log "✅ Restore complete on secondary"
else
log "❌ Restore failed on secondary!"
telegram_notify "⚠️ Database restore failed on secondary"
exit 1
fi
# Cleanup old backups (keep last 7 days)
find /tmp -name "trading_bot_backup_*.sql" -mtime +7 -delete
ssh "${SECONDARY_HOST}" "find /tmp -name 'trading_bot_backup_*.sql' -mtime +7 -delete"
log "🎉 Daily sync completed successfully"
# Only notify on first sync of the day or if there were issues
if [ "$(date +%H)" -eq 2 ]; then
telegram_notify "✅ Daily database sync completed"
fi
}
main "$@"

ha-setup/trading-bot-ha.service (new file, 24 lines)

@@ -0,0 +1,24 @@
[Unit]
Description=Trading Bot v4 HA Failover Controller
After=network.target docker.service
Requires=docker.service
[Service]
Type=simple
User=root
WorkingDirectory=/home/icke/traderv4
ExecStart=/bin/bash /home/icke/traderv4/ha-setup/failover-controller.sh
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
# Logging
SyslogIdentifier=trading-bot-ha
# Security
PrivateTmp=yes
NoNewPrivileges=yes
[Install]
WantedBy=multi-user.target