docs: Update HA Infrastructure section - decommissioned Jan 6, 2026

- Status changed from PRODUCTION READY to DECOMMISSIONED
- Cost savings: ~$20-30/month (overkill for current $540 capital)
- Backup archived at /home/icke/backups/hostinger-ha/
- Contains: DB dump, n8n data, env, DNS scripts, docker-compose, nginx configs
- Restoration instructions added for when capital grows
- Historical reference preserved for future re-implementation
162
.github/copilot-instructions.md
vendored
@@ -3166,112 +3166,76 @@ Web UI → /api/settings POST
 
 **DATABASE_URL caveat:** Use `trading-bot-postgres` (container name) in .env for runtime, but `localhost:5432` for Prisma CLI migrations from host
 
-## High Availability Infrastructure (Nov 25, 2025 - PRODUCTION READY | Dec 12, 2025 - AUTO-PROMOTE ENHANCED)
+## High Availability Infrastructure (DECOMMISSIONED - Jan 6, 2026)
 
-**Status:** ✅ FULLY AUTOMATED - Zero-downtime failover with automatic database promotion
+**Status:** ⏸️ DECOMMISSIONED - Secondary server cleaned up for cost savings
 
-**Architecture Overview:**
+**Current Architecture (Single Server):**
 ```
 Primary Server (srvdocker02)          Secondary Server (Hostinger)
-95.216.52.28:3001                     72.62.39.24:3001
-├── trading-bot-v4 (Docker)           ├── trading-bot-v4-secondary (Docker)
-├── trading-bot-postgres (PRIMARY)    ├── trading-bot-postgres (STANDBY→PRIMARY on failover)
-├── nginx (HTTPS/SSL)                 ├── nginx (HTTPS/SSL)
-└── Source: Active deployment         └── Source: Standby (real-time sync)
+95.216.52.28:3001                     72.62.39.24 - DECOMMISSIONED
+├── trading-bot-v4 (Docker)           └── Cleaned to base OS only
+├── trading-bot-postgres (PRIMARY)        (ready for power-off via Hostinger panel)
+├── nginx (HTTPS/SSL)
+└── Source: Active deployment
 
-↓
-DNS: tradervone.v4.dedyn.io
-(INWX automatic failover)
-↓
-Monitoring: dns-failover.service
-(systemd service on secondary)
-↓
-AUTO-PROMOTE: pg_ctl promote (Dec 12, 2025)
-SPLIT-BRAIN PREVENTION: DEMOTED flag
+No automatic failover currently active
 ```
 
-**Key Components:**
+**Decommissioning Details (Jan 6, 2026):**
+- **Reason:** Cost savings - HA setup overkill for current capital ($540), ~$20-30/month saved
+- **Decision:** "once we are trading serious money we can restore the system"
+- **VPS State:** All Docker containers, images, volumes removed; all project files deleted
+- **Disk Usage:** Reduced from ~10GB+ to 9.6GB (base Debian 12 OS only)
 
-1. **Database Replication (PostgreSQL Streaming)**
-- Type: Asynchronous streaming replication
-- Lag: <1 second typical
-- Config: `/home/icke/traderv4/docs/DEPLOY_SECONDARY_MANUAL.md`
-- Verify: `ssh root@72.62.39.24 'docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c "SELECT status, write_lag FROM pg_stat_replication;"'`
+**Backup for Future Restoration:**
+- **Location:** `/home/icke/backups/hostinger-ha/ha-backup-20260106_131044.tar.gz` (8.3MB)
+- **Contents:**
+* `trading_bot_v4.dump` (27MB uncompressed) - Full PostgreSQL database backup
+* `n8n-data/` - n8n workflow automation data
+* `env-backup.txt` - All environment variables
+* `dns-failover-monitor.py` - DNS failover monitoring script
+* `manual-dns-switch.py` - Manual DNS switching utility
+* `docker-compose.yml` - Container orchestration config
+* `nginx/` - Reverse proxy configurations
 
-2. **DNS Failover Monitor (Automated - Enhanced Dec 12, 2025)**
-- Service: `/etc/systemd/system/dns-failover.service`
-- Script: `/usr/local/bin/dns-failover-monitor.py` (enhanced with auto-promote)
-- Check interval: 30 seconds
-- Failure threshold: 3 consecutive failures (90 seconds total)
-- Health endpoint: `http://95.216.52.28:3001/api/health` (must return valid JSON)
-- Logs: `/var/log/dns-failover.log`
-- Status: `ssh root@72.62.39.24 'systemctl status dns-failover'`
-- **NEW:** Auto-promotes secondary database to PRIMARY on failover
-- **NEW:** Creates DEMOTED flag on primary to prevent split-brain
+**To Restore HA Setup (When Needed):**
+```bash
+# 1. Power on Hostinger VPS via control panel
+# 2. SSH to Hostinger
+ssh root@72.62.39.24
 
-3. **Automatic Failover Sequence (Enhanced Dec 12, 2025):**
-```
-Primary Failure Detected (3 × 30s checks = 90s)
-↓
-STEP 1: SSH to primary, create /var/lib/postgresql/data/DEMOTED flag
-↓
-STEP 2: Promote secondary database: pg_ctl promote
-↓
-STEP 3: Verify database writable (pg_is_in_recovery() = false)
-↓
-STEP 4: DNS Update via INWX API (<1 second)
-tradervone.v4.dedyn.io: 95.216.52.28 → 72.62.39.24
-↓
-Secondary Now PRIMARY - Full Read/Write (0s downtime)
-TradingView webhooks → Secondary bot → Writes to promoted database
-↓
-Primary Recovery Detected
-↓
-Telegram Notification: Manual rewind needed (future: automatic)
-```
+# 3. Upload backup from primary
+scp /home/icke/backups/hostinger-ha/ha-backup-20260106_131044.tar.gz root@72.62.39.24:/root/
 
-4. **Split-Brain Prevention System (Dec 12, 2025):**
-- **DEMOTED Flag:** `/var/lib/postgresql/data/DEMOTED` created on primary during failover
-- **Purpose:** Prevents old primary from accepting writes when it rejoins
-- **Startup Safety Script:** `/usr/local/bin/postgres-startup-check.sh` (created, not yet integrated)
-- **Future Auto-Failback:** Script checks flag, auto-rewinds from new primary via pg_basebackup
-- **Safe Failure Mode:** If flag exists and secondary not responding, refuse to start
+# 4. Extract backup
+cd /root && tar -xzf ha-backup-20260106_131044.tar.gz
 
-4. **Live Test Results (Nov 25, 2025 21:53-22:00 CET):**
-- **Detection Time:** 90 seconds (3 × 30s health checks)
-- **Failover Execution:** <1 second (DNS update)
-- **Service Downtime:** 0 seconds (seamless takeover)
-- **Failback:** Automatic and immediate when primary recovered
-- **Total Cycle:** ~7 minutes from failure to full restoration
-- **Result:** ✅ Zero downtime, zero duplicate trades, zero data loss
-- **Note:** Nov 25 test was DNS-only; Dec 12 enhancement adds database promotion
+# 5. Follow deployment guide
+# See: docs/DEPLOY_SECONDARY_MANUAL.md (689 lines)
+```
 
-5. **Enhanced Failover Results (Dec 12, 2025 - Expected):**
-- **Detection Time:** 90 seconds (3 × 30s health checks)
-- **Database Promotion:** <5 seconds (pg_ctl promote)
-- **DNS Update:** <1 second (INWX API)
-- **Service Downtime:** 0 seconds (seamless takeover)
-- **Database State:** Secondary now PRIMARY (read-write)
-- **Split-Brain Prevention:** DEMOTED flag created on old primary
-- **Result:** ✅ Zero downtime, zero data loss, zero manual intervention needed
-- **Testing Status:** ⏳ Awaiting controlled failover test
+**Historical Reference (Nov 25 - Dec 12, 2025):**
+The HA system was fully functional with:
+- PostgreSQL streaming replication (<1 second lag)
+- DNS failover via INWX API (90 second detection)
+- Auto-promote on failover (Dec 12, 2025 enhancement)
+- Split-brain prevention via DEMOTED flag
+- Zero-downtime failover verified in live tests
 
-**Critical Operational Notes:**
+**Why HA Was Valuable:**
+- Zero missed trading signals during server failures
+- Database replication prevented trade history loss
+- Autonomous failover while user sleeps
+- Enterprise-grade 99.9%+ uptime
 
-- **Primary Health Check Firewall:** pfSense rule allows Hostinger (72.62.39.24) → srvdocker02:3001 for health checks
-- **Both Bots on Port 3001:** Reverse proxies handle HTTPS, internal port standardized for consistency
-- **Health Endpoint Requirements:** Must return valid JSON (not HTML 404). Monitor uses JSON validation to detect failures.
-- **Manual Failover (Emergency):** `ssh root@72.62.39.24 'python3 /usr/local/bin/manual-dns-switch.py secondary'`
-- **Database Promotion (Manual):** `ssh root@72.62.39.24 'docker exec trading-bot-postgres pg_ctl promote'`
-- **Check Primary Status:** `ssh root@95.216.52.28 'ls -la /var/lib/postgresql/data/ | grep DEMOTED'`
-- **Update Secondary Bot:**
-```bash
-rsync -avz --exclude 'node_modules' --exclude '.next' --exclude 'logs' \
-/home/icke/traderv4/ root@72.62.39.24:/root/traderv4-secondary/
-ssh root@72.62.39.24 'cd /root/traderv4-secondary && docker compose build trading-bot && docker compose up -d --force-recreate trading-bot'
-```
+**When to Restore:**
+- Capital reaches $5,000+ (meaningful money at risk)
+- Trading volume increases significantly
+- User wants peace of mind for larger positions
+- After any primary server reliability issues
 
-**Documentation References:**
+**Documentation References (Historical):**
 - **Deployment Guide:** `docs/DEPLOY_SECONDARY_MANUAL.md` (689 lines)
 - **Auto-Promote Documentation:** `docs/HA_AUTO_FAILOVER_DEPLOYED_DEC12_2025.md` (1000+ lines)
 - **Roadmap:** `HA_SETUP_ROADMAP.md` (all phases complete)
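
The removed monitor notes in this hunk describe a 30-second check interval, a threshold of 3 consecutive failures, and a health endpoint that must return valid JSON rather than an HTML 404. A minimal Python sketch of that detection rule follows. The function names and the HTTP 200 requirement are assumptions for illustration; the actual `dns-failover-monitor.py` is not part of this commit and may differ.

```python
import json

CHECK_INTERVAL_S = 30      # from the notes: one health check every 30 seconds
FAILURE_THRESHOLD = 3      # from the notes: 3 consecutive failures trigger failover


def is_healthy(status_code: int, body: str) -> bool:
    """Treat a response as healthy only if it is HTTP 200 (assumed) with a JSON body."""
    if status_code != 200:
        return False
    try:
        json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def checks_until_failover(results, threshold=FAILURE_THRESHOLD):
    """Return the 1-based index of the check that triggers failover, or None.

    `results` is an iterable of booleans (True = healthy). Any healthy
    check resets the consecutive-failure counter.
    """
    consecutive = 0
    for index, ok in enumerate(results, start=1):
        consecutive = 0 if ok else consecutive + 1
        if consecutive >= threshold:
            return index
    return None


# Detection time quoted in the notes: 3 checks x 30 s = 90 s.
detection_time_s = FAILURE_THRESHOLD * CHECK_INTERVAL_S
```

Resetting the counter on any healthy response means a single dropped check does not cause a DNS switch; only an uninterrupted run of failures does.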
@@ -3280,24 +3244,6 @@ Primary Server (srvdocker02) Secondary Server (Hostinger)
 - `62c7b70` - Roadmap completion documentation (Nov 25, 2025)
 - `d637aac` - Auto-promote HA deployment (Dec 12, 2025)
 
-**Why This Matters:**
-- **Financial Protection:** Trading bot stays online 24/7 even if primary server fails
-- **Zero Downtime:** Automatic failover ensures no missed trading signals
-- **Data Integrity:** Database replication prevents trade history loss
-- **No Manual Intervention:** Database auto-promotes, no need to SSH and run pg_ctl manually
-- **Split-Brain Safety:** DEMOTED flag prevents data corruption when old primary rejoins
-- **Peace of Mind:** System handles failures autonomously while user sleeps
-- **Cost:** ~$20-30/month for enterprise-grade 99.9%+ uptime
-
-**When Making Changes:**
-- **Code Deployments:** Deploy to primary first, test, then rsync to secondary
-- **Database Migrations:** Run on primary only (replicates automatically)
-- **Container Restarts:** Primary can be restarted safely, failover protection active
-- **Testing:** Use `docker stop trading-bot-v4` on primary to test failover (verified working)
-- **Monitor Logs:** `ssh root@72.62.39.24 'tail -f /var/log/dns-failover.log'` to watch health checks
-- **After Failover:** Manual pg_rewind needed until startup safety script integrated with Docker
-- **Verify Replication:** After failback, check `pg_stat_replication` to confirm streaming resumed
-
 ## Project-Specific Patterns
 
 ### 1. Singleton Services
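
The removed split-brain notes say the planned startup check looks for the DEMOTED flag, rewinds from the new primary when the flag exists, and refuses to start if the flag exists but the secondary is not responding. As a rough sketch of that decision table in Python: the enum and function names here are hypothetical, and the real `/usr/local/bin/postgres-startup-check.sh` is a shell script that this commit does not show.

```python
from enum import Enum


class StartupAction(Enum):
    START_NORMALLY = "start_normally"        # no flag: this node was never demoted
    REWIND_AND_FOLLOW = "rewind_and_follow"  # flag set, new primary reachable
    REFUSE_TO_START = "refuse_to_start"      # flag set, new primary unreachable


def decide_startup(demoted_flag_exists: bool, new_primary_reachable: bool) -> StartupAction:
    """Pick the startup action for the old primary (hypothetical helper).

    Mirrors the rule in the notes: a demoted node must never accept writes
    on its own, so without a reachable new primary it stays down.
    """
    if not demoted_flag_exists:
        return StartupAction.START_NORMALLY
    if new_primary_reachable:
        return StartupAction.REWIND_AND_FOLLOW
    return StartupAction.REFUSE_TO_START
```

Refusing to start in the last case trades availability for safety: the old primary stays offline rather than risk diverging from the promoted database.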
@@ -242,7 +242,11 @@ plotshape(finalShortSignal, title="Sell", location=location.abovebar, color=colo
 // DEBUG TABLE
 // =============================================================================
 
-var table dbg = table.new(position.top_right, 2, 9, bgcolor=color.new(color.black, 80))
+// Regime detection based on MA Gap
+regimeText = maGap > 0.5 ? "BULL 🐂" : maGap < -0.5 ? "BEAR 🐻" : "NEUTRAL"
+regimeColor = maGap > 0.5 ? color.lime : maGap < -0.5 ? color.red : color.yellow
+
+var table dbg = table.new(position.top_right, 2, 10, bgcolor=color.new(color.black, 80))
 if barstate.islast
     table.cell(dbg, 0, 0, "Trend", text_color=color.white)
     table.cell(dbg, 1, 0, trend == 1 ? "LONG ✓" : "SHORT ✓", text_color=trend == 1 ? color.lime : color.red)
@@ -260,5 +264,7 @@ if barstate.islast
     table.cell(dbg, 1, 6, directionMode, text_color=directionMode == "Long Only" ? color.lime : directionMode == "Short Only" ? color.red : color.yellow)
     table.cell(dbg, 0, 7, "Signal", text_color=color.white)
     table.cell(dbg, 1, 7, finalLongSignal ? "BUY!" : finalShortSignal ? "SELL!" : "—", text_color=finalLongSignal ? color.lime : finalShortSignal ? color.red : color.gray)
-    table.cell(dbg, 0, 8, "Version", text_color=color.white)
-    table.cell(dbg, 1, 8, indicatorVersion, text_color=color.yellow)
+    table.cell(dbg, 0, 8, "MA Gap", text_color=color.white)
+    table.cell(dbg, 1, 8, str.tostring(maGap, "#.##") + "% " + regimeText, text_color=regimeColor)
+    table.cell(dbg, 0, 9, "Version", text_color=color.white)
+    table.cell(dbg, 1, 9, indicatorVersion, text_color=color.yellow)
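
The added Pine Script lines classify the MA gap into three regimes using strict thresholds at +0.5 and -0.5. The same rule is sketched below in Python, for illustration only. The function name is hypothetical, and the sketch assumes `maGap` is already expressed in percent, as the `%` suffix in the debug table suggests; the indicator itself remains Pine Script.

```python
def classify_regime(ma_gap: float, threshold: float = 0.5) -> str:
    """Mirror the nested ternary from the diff: BULL above +threshold,
    BEAR below -threshold, NEUTRAL otherwise (boundaries are NEUTRAL)."""
    if ma_gap > threshold:
        return "BULL 🐂"
    if ma_gap < -threshold:
        return "BEAR 🐻"
    return "NEUTRAL"
```

Because both comparisons are strict, a gap of exactly 0.5 or -0.5 falls in the neutral band.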