diff --git a/HA_SETUP_ROADMAP.md b/HA_SETUP_ROADMAP.md
new file mode 100644
index 0000000..3b8c4cd
--- /dev/null
+++ b/HA_SETUP_ROADMAP.md
@@ -0,0 +1,218 @@
+# High Availability Setup Roadmap
+
+**Status:** 🎯 FUTURE
+**Priority:** Medium
+**Estimated Effort:** 2-3 days full implementation
+**Dependencies:** Stable production system, consistent profitability
+
+---
+
+## Current State (Nov 19, 2025)
+
+✅ **Warm Standby Ready:**
+- Secondary server at `root@72.62.39.24` with rsync'd code
+- Can manually fail over in 10-15 minutes if primary fails
+- Single-server operation prevents duplicate trades
+
+❌ **Not Automated:**
+- Manual DNS/webhook updates required
+- No automatic failover detection
+- No reverse proxy/load balancer setup
+
+---
+
+## Phase 1: Warm Standby Maintenance (CURRENT)
+
+**Goal:** Keep secondary server ready for manual failover
+
+**Tasks:**
+- [ ] Daily rsync from primary to secondary (automated)
+- [ ] Weekly startup test on secondary (verify working)
+- [ ] Document manual failover procedure
+- [ ] Test database restore on secondary
+
+**Acceptance Criteria:**
+- Can start secondary and verify trading bot works within 5 minutes
+- Secondary has code/config updated within 24 hours of primary
+- Clear runbook for emergency failover
+
+**Timeline:** 1 day setup + ongoing maintenance
+
+---
+
+## Phase 2: Database Replication (NEXT)
+
+**Goal:** Zero data loss on failover
+
+**Tasks:**
+- [ ] Set up PostgreSQL streaming replication
+- [ ] Configure replication user and permissions
+- [ ] Test replica lag monitoring
+- [ ] Automate replica promotion on failover
+
+**Acceptance Criteria:**
+- Secondary database max 5 seconds behind primary
+- Trade history preserved during failover
+- Automatic replica promotion script tested
+
+**Timeline:** 2-3 days
+
+---
+
+## Phase 3: Health Monitoring & Alerts (NEXT)
+
+**Goal:** Know when primary fails, prepare for manual intervention
+
+**Tasks:**
+- [ ] Deploy healthcheck script on both servers
+- [ ] Set up monitoring dashboard (Grafana/simple webpage)
+- [ ] Telegram alerts for primary failures
+- [ ] Create failover decision flowchart
+
+**Acceptance Criteria:**
+- Telegram alert within 60 seconds of primary failure
+- Dashboard shows primary/secondary status
+- Clear steps for manual failover documented
+
+**Timeline:** 1-2 days
+
+---
+
+## Phase 4: Reverse Proxy + Floating IP (FUTURE)
+
+**Goal:** Automatic traffic routing to active server
+
+**Options:**
+
+### Option A: Floating IP (Simplest)
+- Use cloud provider's floating IP (DigitalOcean, AWS EIP)
+- IP automatically moves between servers
+- Requires: Cloud infrastructure, not bare metal
+
+### Option B: DNS-based Failover
+- Use DNS provider with health checks (Cloudflare, Route53)
+- Automatic DNS updates on failure
+- 1-5 minute TTL delay for propagation
+
+### Option C: Reverse Proxy
+- HAProxy or nginx in front of both servers
+- Health checks route to active server
+- Requires: Third server for proxy (single point of failure)
+
+**Tasks:**
+- [ ] Evaluate infrastructure options (cloud vs bare metal)
+- [ ] Choose failover mechanism (Floating IP vs DNS vs Proxy)
+- [ ] Implement automatic traffic routing
+- [ ] Test failover scenarios (primary crash, network partition)
+
+**Acceptance Criteria:**
+- TradingView webhooks automatically route to active server
+- Failover completes within 2 minutes with zero manual intervention
+- No duplicate trades during failover window
+- n8n workflows continue without reconfiguration
+
+**Timeline:** 3-5 days (depends on option chosen)
+
+---
+
+## Phase 5: Automated Failover Controller (FUTURE)
+
+**Goal:** Fully autonomous HA system
+
+**Tasks:**
+- [ ] Deploy failover controller on secondary
+- [ ] Configure automatic container startup on failure detection
+- [ ] Implement split-brain prevention
+- [ ] Test recovery scenarios (primary comes back online)
+- [ ] Set up automatic database sync on recovery
+
+**Acceptance Criteria:**
+- Secondary automatically activates within 60 seconds of primary failure
+- Primary automatically resumes when recovered
+- No manual intervention required for 99% of failures
+- Telegram notifications for all state changes
+
+**Timeline:** 2-3 days
+
+---
+
+## Phase 6: Geographic Redundancy (DISTANT FUTURE)
+
+**Goal:** Multi-region deployment for global reliability
+
+**Considerations:**
+- Secondary in different geographic region (US vs EU)
+- Protects against regional outages
+- Lower latency for global users
+- Requires: More complex routing, higher costs
+
+**Timeline:** 1+ weeks
+
+---
+
+## Decision Gates
+
+**Proceed to Phase 2+ when:**
+- Trading system profitable for 3+ consecutive months
+- Capital > $10,000 (downtime = significant money loss)
+- User frequently unavailable (travel, sleep schedule, etc.)
+- Primary server has experienced 2+ unplanned outages
+
+**Stay in Phase 1 when:**
+- System still in testing/optimization phase
+- User can manually intervene within 30 minutes most of the time
+- Capital < $5,000 (manual failover acceptable)
+- Primary server stable (99%+ uptime)
+
+---
+
+## Cost-Benefit Analysis
+
+### Current State (Warm Standby)
+- **Cost:** ~$10-20/month for secondary server
+- **Benefit:** 10-15 min manual failover vs hours of setup from scratch
+- **ROI:** Good - cheap insurance
+
+### Full HA (All Phases)
+- **Cost:** ~$50-100/month (servers, floating IP, monitoring)
+- **Time:** 1-2 weeks of development
+- **Benefit:** 99.9% uptime, automatic failover, peace of mind
+- **ROI:** Only worth it when trading capital justifies the cost
+
+### Break-Even Point
+- If trading $10k+ capital at 15% monthly returns = $1,500/month
+- 1 hour downtime = ~$2 lost opportunity
+- 24 hour downtime = ~$50 lost + potential missed exit = $100-500 risk
+- HA pays for itself after 1-2 major outages
+
+---
+
+## Current Recommendation (Nov 19, 2025)
+
+**Stay in Phase 1** (Warm Standby) because:
+- Capital still under $1,000
+- System in active optimization (indicator testing, quality tuning)
+- User available for manual intervention most of the time
+- Primary server stable
+
+**Revisit in Q1 2026** when:
+- Capital reaches $5,000+ (Phase 2 target)
+- System proven profitable over 3+ months
+- Trading strategy stabilized (v8 indicator validated)
+
+---
+
+## Related Files
+
+- `/home/icke/traderv4/ha-setup/` - HA scripts (created but not deployed)
+- `TRADING_GOALS.md` - Financial roadmap (HA aligns with Phase 4-5)
+- `OPTIMIZATION_MASTER_ROADMAP.md` - System improvements (HA is infrastructure)
+
+---
+
+## Notes
+
+- **Manual failover is acceptable for now** - 10-15 min downtime won't cause financial loss at current scale
+- **Focus on profitability first** - HA is a luxury when the system isn't making consistent money yet
+- **Complexity vs benefit** - Full HA adds operational overhead that may not be worth it yet
+- **Revisit quarterly** - As capital grows, HA becomes more important
diff --git a/ha-setup/README.md b/ha-setup/README.md
new file mode 100644
index 0000000..f92b76c
--- /dev/null
+++ b/ha-setup/README.md
@@ -0,0 +1,298 @@
+# High Availability Setup for Trading Bot v4
+
+## Architecture: Active-Passive Failover
+
+**Primary Server (Active):** Runs trading bot 24/7
+**Secondary Server (Passive):** Monitors primary, takes over on failure
+
+### Why Active-Passive (Not Active-Active)?
+- **Prevents duplicate trades** - CRITICAL for financial system
+- **Single source of truth** - One Position Manager tracking state
+- **No split-brain scenarios** - Only one bot executes trades
+- **Database consistency** - No conflicting writes
+
+---
+
+## Setup Instructions
+
+### 1. Prerequisites
+
+**Primary Server:** `root@192.168.1.100` (update in scripts)
+**Secondary Server:** `root@72.62.39.24`
+
+Both servers need:
+- Docker & Docker Compose installed
+- Trading bot project at `/home/icke/traderv4`
+- Same `.env` file (especially DRIFT_WALLET_PRIVATE_KEY)
+- Same n8n workflows configured
+
+### 2. Initial Sync (Already Done via rsync ✅)
+
+```bash
+# From primary server
+rsync -avz --exclude 'node_modules' --exclude '.next' \
+  /home/icke/traderv4/ root@72.62.39.24:/home/icke/traderv4/
+```
+
+### 3. Database Synchronization
+
+**Option A: Manual Sync (Simpler, Recommended for Start)**
+
+On primary:
+```bash
+docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > /tmp/trading_bot_backup.sql
+rsync -avz /tmp/trading_bot_backup.sql root@72.62.39.24:/tmp/
+```
+
+On secondary:
+```bash
+docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql
+```
+
+Run this daily via cron on primary:
+```bash
+0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh
+```
+
+**Option B: Streaming Replication (Advanced)**
+```bash
+# On primary
+bash ha-setup/setup-db-replication.sh primary
+
+# On secondary
+bash ha-setup/setup-db-replication.sh secondary
+```
+
+### 4. Set Up Health Monitoring
+
+Make scripts executable:
+```bash
+chmod +x ha-setup/*.sh
+```
+
+**Test healthcheck on both servers:**
+```bash
+bash ha-setup/healthcheck.sh
+# Should output: ✅ HEALTHY: All checks passed
+```
+
+### 5. Start Failover Controller (SECONDARY ONLY)
+
+**Edit configuration first:**
+```bash
+nano ha-setup/failover-controller.sh
+# Update PRIMARY_HOST with actual IP
+# Update SECONDARY_HOST if needed
+```
+
+**Run as systemd service:**
+```bash
+sudo cp ha-setup/trading-bot-ha.service /etc/systemd/system/
+sudo systemctl daemon-reload
+sudo systemctl enable trading-bot-ha
+sudo systemctl start trading-bot-ha
+```
+
+**Check status:**
+```bash
+sudo systemctl status trading-bot-ha
+sudo journalctl -u trading-bot-ha -f
+```
+
+### 6. SSH Key Setup (Password-less Auth)
+
+Secondary needs SSH access to primary for health checks:
+
+```bash
+# On secondary
+ssh-keygen -t ed25519 -f /root/.ssh/trading_bot_ha
+ssh-copy-id -i /root/.ssh/trading_bot_ha root@192.168.1.100
+
+# Test connection
+ssh root@192.168.1.100 "docker ps | grep trading-bot"
+```
+
+---
+
+## How It Works
+
+### Normal Operation (Primary Active)
+
+1. **Primary:** Trading bot running, executing trades
+2. **Secondary:** Failover controller checks primary every 15s
+3. **Secondary:** Bot container STOPPED (passive standby)
+
+### Failover Scenario
+
+1. **Primary fails** (server down, docker crash, API unresponsive)
+2. **Secondary detects** 3 consecutive failed health checks (45s)
+3. **Telegram alert sent:** "🚨 HA FAILOVER: Primary failed, activating secondary"
+4. **Secondary starts** trading bot container
+5. **Trading continues** on secondary with same wallet/config
+
+### Recovery Scenario
+
+1. **Primary recovers** (you fix it, restart, etc.)
+2. **Secondary detects** primary is healthy again
+3. **Secondary stops** its trading bot (returns to standby)
+4. **Telegram alert:** "Primary recovered, secondary deactivated"
+5. **Primary resumes** as active node
+
+---
+
+## Monitoring & Maintenance
+
+### Check HA Status
+
+**On secondary:**
+```bash
+# View failover controller logs
+sudo journalctl -u trading-bot-ha -f --lines=50
+
+# Check if secondary is active
+docker ps | grep trading-bot-v4
+```
+
+**On primary:**
+```bash
+# Run healthcheck manually
+bash ha-setup/healthcheck.sh
+
+# Check container status
+docker ps | grep trading-bot-v4
+```
+
+### Manual Failover Testing
+
+**Simulate primary failure:**
+```bash
+# On primary, stop trading bot
+docker compose stop trading-bot
+
+# Watch secondary logs - should activate within 45s
+# On secondary
+sudo journalctl -u trading-bot-ha -f
+```
+
+**Restore primary:**
+```bash
+# On primary, restart trading bot
+docker compose up -d trading-bot
+
+# Watch secondary - should deactivate within 15s
+```
+
+### Database Sync Schedule
+
+**Daily sync from primary to secondary:**
+
+On primary, add to crontab:
+```bash
+crontab -e
+# Add:
+0 2 * * * /home/icke/traderv4/ha-setup/sync-db-daily.sh >> /var/log/trading-bot-db-sync.log 2>&1
+```
+
+**Before failover events:** Secondary uses last synced DB state (max 24h old trade history)
+**After failover:** Secondary continues with current state, syncs back to primary when recovered
+
+---
+
+## Important Notes
+
+### Financial Safety
+
+- **NEVER run both servers actively** - would cause duplicate trades and wallet conflicts
+- **Failover controller ensures** only one active at a time
+- **Same wallet key** required on both servers
+- **Same n8n webhook endpoint** - update TradingView alerts if needed
+
+### Database Consistency
+
+- **Daily sync:** Keeps secondary within 24h of primary
+- **Trade history:** May have small gap after failover (acceptable)
+- **Position Manager:** Rebuilds state from Drift Protocol on startup
+- **No financial loss:** Drift Protocol is source of truth for positions
+
+### Network Requirements
+
+- **Secondary → Primary:** SSH access (port 22) for health checks
+- **Both → Internet:** For Drift Protocol, Telegram, n8n webhooks
+- **n8n:** Can run on both or centralized (needs webhook routing)
+
+### Testing Recommendations
+
+1. **Week 1:** Run without failover, just monitor health checks
+2. **Week 2:** Test manual failover (stop primary, verify secondary takes over)
+3. **Week 3:** Test recovery (restart primary, verify secondary stops)
+4. **Week 4:** Enable automatic failover for production
+
+---
+
+## Troubleshooting
+
+### Secondary Won't Start After Failover
+
+```bash
+# Check logs
+docker logs trading-bot-v4
+
+# Check .env file exists
+ls -la /home/icke/traderv4/.env
+
+# Check Drift initialization
+docker logs trading-bot-v4 | grep "Drift"
+```
+
+### Split-Brain (Both Servers Active)
+
+**EMERGENCY - Stop both immediately:**
+```bash
+# On both servers
+docker compose stop trading-bot
+```
+
+**Then restart only primary:**
+```bash
+# On primary only
+docker compose up -d trading-bot
+```
+
+**Check Drift positions:**
+```bash
+curl -s http://localhost:3001/api/trading/positions \
+  -H "Authorization: Bearer ${API_SECRET_KEY}" | jq .
+```
+
+### Health Check False Positives
+
+Adjust thresholds in `failover-controller.sh`:
+```bash
+CHECK_INTERVAL=30 # Slower checks (reduce network load)
+MAX_FAILURES=5 # More tolerant (reduce false failovers)
+```
+
+---
+
+## Cost Analysis
+
+**Primary Server:** Always running (existing cost)
+**Secondary Server:** Always running, but mostly idle
+
+**Benefits:**
+- **99.9% uptime** vs 95% single server
+- **~8.8 hours/year** max downtime at 99.9% uptime (failover time)
+- **Financial protection** - no missed trades during outages
+- **Peace of mind** - sleep without worrying about server crashes
+
+**Worth it?** YES - For a financial system, redundancy is essential.
+
+---
+
+## Future Enhancements
+
+1. **Geographic redundancy:** Secondary in different datacenter/region
+2. **Load balancer:** Route n8n webhooks to active server automatically
+3. **Database streaming replication:** Real-time sync (0 data loss)
+4. **Multi-region:** Three servers (US, EU, Asia) for global coverage
+5. **Health dashboard:** Web UI showing HA status and metrics
diff --git a/ha-setup/failover-controller.sh b/ha-setup/failover-controller.sh
new file mode 100644
index 0000000..f4d0e42
--- /dev/null
+++ b/ha-setup/failover-controller.sh
@@ -0,0 +1,126 @@
+#!/bin/bash
+#
+# HA Failover Controller
+# Monitors primary server and activates secondary on failure
+#
+# IMPORTANT: Run this ONLY on SECONDARY server
+# Primary should always be active unless failed
+#
+
+set -eu
+
+PRIMARY_HOST="root@192.168.1.100" # Update with primary IP
+SECONDARY_HOST="root@72.62.39.24"
+CHECK_INTERVAL=15 # seconds between checks
+MAX_FAILURES=3 # failures before failover
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_DIR="/home/icke/traderv4"
+FAILURE_COUNT=0
+
+log() {
+    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a /var/log/trading-bot-ha.log
+}
+
+telegram_notify() {
+    local message="$1"
+    # Use the Telegram bot to send notification
+    if [ -f "${PROJECT_DIR}/.env" ]; then
+        source "${PROJECT_DIR}/.env"
+        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
+            -d chat_id="${TELEGRAM_CHAT_ID}" \
+            -d text="🚨 HA FAILOVER: ${message}" \
+            -d parse_mode="HTML" > /dev/null || true # a failed alert must not abort failover under set -e
+    fi
+}
+
+check_primary_health() {
+    # SSH to primary and run healthcheck
+    ssh -o ConnectTimeout=5 -o BatchMode=yes "${PRIMARY_HOST}" \
+        "cd ${PROJECT_DIR} && bash ha-setup/healthcheck.sh" &>/dev/null
+    return $?
+}
+
+is_secondary_active() {
+    docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
+    return $?
+}
+
+start_secondary() {
+    log "🚀 Starting secondary (failover activation)..."
+    cd "${PROJECT_DIR}"
+    docker compose up -d trading-bot
+    sleep 10
+
+    if docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"; then
+        log "✅ Secondary activated successfully"
+        telegram_notify "Secondary server activated (primary failed ${MAX_FAILURES} health checks)"
+        return 0
+    else
+        log "❌ Failed to start secondary"
+        telegram_notify "⚠️ CRITICAL: Secondary failed to start after primary failure!"
+        return 1
+    fi
+}
+
+stop_secondary() {
+    log "🛑 Stopping secondary (primary recovered)..."
+    cd "${PROJECT_DIR}"
+    docker compose stop trading-bot
+
+    if ! is_secondary_active; then
+        log "✅ Secondary stopped successfully"
+        telegram_notify "Primary server recovered, secondary deactivated"
+        return 0
+    else
+        log "❌ Failed to stop secondary"
+        return 1
+    fi
+}
+
+main_loop() {
+    log "🎯 HA Failover Controller started (Secondary mode)"
+    log "Monitoring primary: ${PRIMARY_HOST}"
+    log "Check interval: ${CHECK_INTERVAL}s, Max failures: ${MAX_FAILURES}"
+
+    while true; do
+        if check_primary_health; then
+            # Primary is healthy
+            if [ $FAILURE_COUNT -gt 0 ]; then
+                log "✅ Primary recovered (was at ${FAILURE_COUNT} failures)"
+                FAILURE_COUNT=0
+            fi
+
+            # If secondary is running, stop it (primary should be active)
+            if is_secondary_active; then
+                log "⚠️ Secondary is active but primary is healthy - stopping secondary"
+                stop_secondary || true # retry on the next check instead of exiting under set -e
+            fi
+
+        else
+            # Primary is unhealthy
+            FAILURE_COUNT=$((FAILURE_COUNT + 1))
+            log "❌ Primary health check failed (${FAILURE_COUNT}/${MAX_FAILURES})"
+
+            if [ $FAILURE_COUNT -ge $MAX_FAILURES ]; then
+                if ! is_secondary_active; then
+                    log "🚨 PRIMARY FAILED - Activating secondary..."
+                    telegram_notify "Primary server failed ${MAX_FAILURES} consecutive health checks. Activating secondary..."
+                    start_secondary || true # retry on the next check instead of exiting under set -e
+                else
+                    log "ℹ️ Secondary already active (primary still failing)"
+                fi
+            fi
+        fi
+
+        sleep $CHECK_INTERVAL
+    done
+}
+
+# Ensure running as root (needs docker access)
+if [ "$EUID" -ne 0 ]; then
+    log "❌ Must run as root (needs docker and SSH access)"
+    exit 1
+fi
+
+main_loop
diff --git a/ha-setup/healthcheck.sh b/ha-setup/healthcheck.sh
new file mode 100644
index 0000000..1135960
--- /dev/null
+++ b/ha-setup/healthcheck.sh
@@ -0,0 +1,90 @@
+#!/bin/bash
+#
+# Trading Bot Health Check Script
+# Checks if trading bot is healthy and responding
+#
+# Usage: ./healthcheck.sh
+# Exit codes: 0 = healthy, 1 = unhealthy
+
+set -eu
+
+TRADING_BOT_HOST="${TRADING_BOT_HOST:-localhost:3001}"
+MAX_FAILURES="${MAX_FAILURES:-3}"
+CHECK_INTERVAL="${CHECK_INTERVAL:-10}"
+
+# Source API key from .env
+if [ -f "/home/icke/traderv4/.env" ]; then
+    export $(grep "^API_SECRET_KEY=" /home/icke/traderv4/.env | xargs)
+fi
+
+log() {
+    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
+}
+
+# Check if container is running
+check_container() {
+    docker ps --filter "name=trading-bot-v4" --filter "status=running" | grep -q "trading-bot-v4"
+    return $?
+}
+
+# Check if API is responding
+check_api() {
+    local response
+    response=$(curl -s -f -m 5 \
+        -H "Authorization: Bearer ${API_SECRET_KEY}" \
+        "http://${TRADING_BOT_HOST}/api/drift/account-summary" 2>&1)
+
+    if [ $? -eq 0 ] && echo "$response" | grep -q '"success":true'; then
+        return 0
+    else
+        return 1
+    fi
+}
+
+# Check if Position Manager is monitoring (if positions exist)
+check_position_manager() {
+    local logs
+    logs=$(docker logs --tail=50 trading-bot-v4 2>&1)
+
+    # Check for monitoring activity in the last 50 log lines
+    if echo "$logs" | grep -q "🔍 Monitoring"; then
+        return 0
+    fi
+
+    # If no monitoring logs but no positions open, that's OK
+    if echo "$logs" | grep -q "No positions to monitor"; then
+        return 0
+    fi
+
+    # If container started during the current UTC hour, give it time
+    if docker inspect trading-bot-v4 --format='{{.State.StartedAt}}' | grep -q "$(date -u +%Y-%m-%dT%H)"; then
+        return 0
+    fi
+
+    return 1
+}
+
+# Main health check
+main() {
+    log "Starting health check..."
+
+    if ! check_container; then
+        log "❌ UNHEALTHY: Container not running"
+        exit 1
+    fi
+
+    if ! check_api; then
+        log "❌ UNHEALTHY: API not responding"
+        exit 1
+    fi
+
+    if ! check_position_manager; then
+        log "⚠️ WARNING: Position Manager may not be monitoring (check logs)"
+        # Don't fail on this - API working is primary health indicator
+    fi
+
+    log "✅ HEALTHY: All checks passed"
+    exit 0
+}
+
+main "$@"
diff --git a/ha-setup/setup-db-replication.sh b/ha-setup/setup-db-replication.sh
new file mode 100644
index 0000000..b3cb6dc
--- /dev/null
+++ b/ha-setup/setup-db-replication.sh
@@ -0,0 +1,104 @@
+#!/bin/bash
+#
+# Database Replication Setup for PostgreSQL Streaming Replication
+# Run on PRIMARY server first, then on SECONDARY
+#
+
+set -eu
+
+MODE="${1:-help}"
+PRIMARY_IP="192.168.1.100" # Update with primary server IP
+SECONDARY_IP="72.62.39.24"
+POSTGRES_PASSWORD="your_postgres_password" # Update from .env
+REPLICATION_USER="replicator"
+REPLICATION_PASSWORD="your_replication_password" # Generate strong password
+
+log() {
+    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*"
+}
+
+setup_primary() {
+    log "🔧 Setting up PRIMARY database for replication..."
+
+    # Create replication user (-i is needed so the heredoc reaches psql on stdin)
+    docker exec -i trading-bot-postgres psql -U postgres -d trading_bot_v4 <<-EOF
+    -- Create replication user
+    CREATE USER ${REPLICATION_USER} WITH REPLICATION ENCRYPTED PASSWORD '${REPLICATION_PASSWORD}';
+
+    -- Grant necessary privileges
+    GRANT CONNECT ON DATABASE trading_bot_v4 TO ${REPLICATION_USER};
+EOF
+
+    log "✅ Replication user created"
+
+    # Configure pg_hba.conf for replication
+    docker exec trading-bot-postgres bash -c "cat >> /var/lib/postgresql/data/pg_hba.conf <> /var/lib/postgresql/data/postgresql.conf < /tmp/trading_bot_backup.sql"
+    echo " rsync -avz /tmp/trading_bot_backup.sql ${SECONDARY_IP}:/tmp/"
+    log "Run on SECONDARY to restore:"
+    echo " docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/trading_bot_backup.sql"
+}
+
+case "$MODE" in
+    primary)
+        setup_primary
+        ;;
+    secondary)
+        setup_secondary
+        ;;
+    sync)
+        simplified_sync
+        ;;
+    help|*)
+        echo "Usage: $0 {primary|secondary|sync}"
+        echo ""
+        echo "primary - Configure primary DB for replication"
+        echo "secondary - Configure secondary DB as replica"
+        echo "sync - Show commands for manual DB sync"
+        echo ""
+        echo "⚠️ IMPORTANT: Update IP addresses and passwords in script first!"
+        exit 1
+        ;;
+esac
diff --git a/ha-setup/sync-db-daily.sh b/ha-setup/sync-db-daily.sh
new file mode 100644
index 0000000..3bb93f8
--- /dev/null
+++ b/ha-setup/sync-db-daily.sh
@@ -0,0 +1,76 @@
+#!/bin/bash
+#
+# Daily Database Sync from Primary to Secondary
+# Run on PRIMARY server via cron
+#
+
+set -eu
+
+PRIMARY_HOST="localhost"
+SECONDARY_HOST="root@72.62.39.24"
+PROJECT_DIR="/home/icke/traderv4"
+BACKUP_FILE="/tmp/trading_bot_backup_$(date +%Y%m%d_%H%M%S).sql"
+LOG_FILE="/var/log/trading-bot-db-sync.log"
+
+log() {
+    echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
+}
+
+# Telegram notification
+telegram_notify() {
+    local message="$1"
+    if [ -f "${PROJECT_DIR}/.env" ]; then
+        source "${PROJECT_DIR}/.env"
+        curl -s -X POST "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/sendMessage" \
+            -d chat_id="${TELEGRAM_CHAT_ID}" \
+            -d text="📊 DB Sync: ${message}" \
+            -d parse_mode="HTML" > /dev/null
+    fi
+}
+
+main() {
+    log "🔄 Starting daily database backup..."
+
+    # Create backup
+    if docker exec trading-bot-postgres pg_dump -U postgres trading_bot_v4 > "$BACKUP_FILE" 2>>"$LOG_FILE"; then
+        local size=$(du -h "$BACKUP_FILE" | cut -f1)
+        log "✅ Backup created: $BACKUP_FILE ($size)"
+    else
+        log "❌ Backup failed!"
+        telegram_notify "⚠️ Database backup failed on primary"
+        exit 1
+    fi
+
+    # Transfer to secondary (-z already enables compression)
+    log "📤 Transferring to secondary..."
+    if rsync -avz "$BACKUP_FILE" "${SECONDARY_HOST}:/tmp/" >> "$LOG_FILE" 2>&1; then
+        log "✅ Transfer complete"
+    else
+        log "❌ Transfer failed!"
+        telegram_notify "⚠️ Database transfer to secondary failed"
+        exit 1
+    fi
+
+    # Restore on secondary
+    log "📥 Restoring on secondary..."
+    if ssh "${SECONDARY_HOST}" "docker exec -i trading-bot-postgres psql -U postgres trading_bot_v4 < /tmp/$(basename "$BACKUP_FILE")" >> "$LOG_FILE" 2>&1; then
+        log "✅ Restore complete on secondary"
+    else
+        log "❌ Restore failed on secondary!"
+        telegram_notify "⚠️ Database restore failed on secondary"
+        exit 1
+    fi
+
+    # Cleanup old backups (keep last 7 days)
+    find /tmp -name "trading_bot_backup_*.sql" -mtime +7 -delete
+    ssh "${SECONDARY_HOST}" "find /tmp -name 'trading_bot_backup_*.sql' -mtime +7 -delete"
+
+    log "🎉 Daily sync completed successfully"
+
+    # Only notify on first sync of the day or if there were issues
+    if [ "$(date +%H)" -eq 2 ]; then
+        telegram_notify "✅ Daily database sync completed"
+    fi
+}
+
+main "$@"
diff --git a/ha-setup/trading-bot-ha.service b/ha-setup/trading-bot-ha.service
new file mode 100644
index 0000000..6a8a903
--- /dev/null
+++ b/ha-setup/trading-bot-ha.service
@@ -0,0 +1,24 @@
+[Unit]
+Description=Trading Bot v4 HA Failover Controller
+After=network.target docker.service
+Requires=docker.service
+
+[Service]
+Type=simple
+User=root
+WorkingDirectory=/home/icke/traderv4
+ExecStart=/bin/bash /home/icke/traderv4/ha-setup/failover-controller.sh
+Restart=always
+RestartSec=10
+StandardOutput=journal
+StandardError=journal
+
+# Logging
+SyslogIdentifier=trading-bot-ha
+
+# Security
+PrivateTmp=yes
+NoNewPrivileges=yes
+
+[Install]
+WantedBy=multi-user.target