feat: Continuous optimization cluster for 2 EPYC servers

- Master controller with job queue and result aggregation
- Worker scripts for parallel backtesting (22 workers per server)
- SQLite database for strategy ranking and performance tracking
- File-based job queue (simple, robust, survives crashes)
- Auto-setup script for both EPYC servers
- Status dashboard for monitoring progress
- Comprehensive deployment guide

Architecture:
- Master: Job generation, worker coordination, result collection
- Worker 1 (pve-nu-monitor01): AMD EPYC 7282, 22 parallel jobs
- Worker 2 (srv-bd-host01): AMD EPYC 7302, 22 parallel jobs
- Total capacity: ~49,000 backtests/day (44 cores @ 70%)

Initial focus: v9 parameter refinement (27 configurations)
Target: Find strategies >$200/1k P&L (current baseline $192/1k)

Files:
- cluster/master.py: Main controller (570 lines)
- cluster/worker.py: Worker execution script (220 lines)
- cluster/setup_cluster.sh: Automated deployment
- cluster/status.py: Real-time status dashboard
- cluster/README.md: Operational documentation
- cluster/DEPLOYMENT.md: Step-by-step deployment guide
2025-11-29 22:34:52 +01:00
parent 2d14f2d5c5
commit 2a8e04fe57
6 changed files with 1382 additions and 0 deletions
--- a/cluster/DEPLOYMENT.md
+++ b/cluster/DEPLOYMENT.md
@@ -0,0 +1,415 @@
+# 🚀 Continuous Optimization Cluster - Deployment Guide
+
+## ⚡ Quick Deploy (5 minutes)
+
+```bash
+cd /home/icke/traderv4/cluster
+
+# 1. Setup both EPYC servers
+./setup_cluster.sh
+
+# 2. Start master controller
+python3 master.py
+
+# 3. Monitor status (separate terminal)
+watch -n 10 'python3 status.py'
+```
+
+---
+
+## 📋 Prerequisites Checklist
+
+- [x] **SSH Access:** Keys configured for both EPYC servers
+  - `root@10.10.254.106` (pve-nu-monitor01)
+  - `root@10.20.254.100` (srv-bd-host01 via monitor01)
+
+- [x] **Python 3.7+** installed on all servers
+
+- [x] **OHLCV Data:** `backtester/data/solusdt_5m.csv` (139,678 rows)
+
+- [ ] **Backtester Code:** In `/home/icke/traderv4/backtester/`
+  - `backtester_core.py`
+  - `v9_moneyline_ma_gap.py`
+  - `moneyline_core.py`
+
+---
+
+## 🏗️ Step-by-Step Setup
+
+### Step 1: Verify Backtester Works Locally
+
+```bash
+cd /home/icke/traderv4/backtester
+
+# Test v9 backtest
+python3 backtester_core.py \
+  --data data/solusdt_5m.csv \
+  --indicator v9 \
+  --flip-threshold 0.6 \
+  --ma-gap 0.35 \
+  --momentum-adx 23 \
+  --output json
+```
+
+**Expected output:**
+```json
+{
+  "pnl": 192.50,
+  "trades": 569,
+  "win_rate": 60.98,
+  "profit_factor": 1.022,
+  "max_drawdown": 1360.58
+}
+```
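A worker-side wrapper around this command might look like the following minimal sketch. The flags and JSON keys are taken from the example above; the function names (`build_command`, `parse_result`, `run_backtest`) are hypothetical and are not claimed to match what `worker.py` does.

```python
import json
import subprocess
from typing import Dict, List

# Keys shown in the expected output of Step 1.
REQUIRED_KEYS = ("pnl", "trades", "win_rate", "profit_factor", "max_drawdown")


def build_command(params: Dict[str, float], data: str = "data/solusdt_5m.csv") -> List[str]:
    """Build the argv for one v9 backtest from a parameter dict."""
    return [
        "python3", "backtester_core.py",
        "--data", data,
        "--indicator", "v9",
        "--flip-threshold", str(params["flip_threshold"]),
        "--ma-gap", str(params["ma_gap"]),
        "--momentum-adx", str(params["momentum_adx"]),
        "--output", "json",
    ]


def parse_result(stdout: str) -> Dict[str, float]:
    """Parse the backtester's JSON output and reject incomplete results."""
    result = json.loads(stdout)
    missing = [key for key in REQUIRED_KEYS if key not in result]
    if missing:
        raise ValueError(f"backtest output is missing keys: {missing}")
    return result


def run_backtest(params: Dict[str, float],
                 cwd: str = "/home/icke/traderv4/backtester") -> Dict[str, float]:
    """Run one backtest and return its parsed metrics (raises if the process fails)."""
    proc = subprocess.run(build_command(params), cwd=cwd,
                          capture_output=True, text=True, check=True)
    return parse_result(proc.stdout)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems, and `check=True` turns a crashed backtest into an exception instead of a silently empty result.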
+
+### Step 2: Deploy to EPYC Servers
+
+```bash
+cd /home/icke/traderv4/cluster
+./setup_cluster.sh
+```
+
+**This will:**
+1. Create `/root/optimization-cluster` on both servers
+2. Install Python venv + pandas/numpy
+3. Copy backtester code
+4. Copy worker.py script
+5. Copy OHLCV data
+6. Verify installation
+
+**Expected output:**
+```
+🚀 Setting up optimization cluster...
+
+Setting up Worker 1 (root@10.10.254.106)...
+  📦 Installing Python packages...
+  📁 Copying backtester modules...
+  📄 Installing worker script...
+  📊 Copying OHLCV data...
+  ✅ Verifying installation...
+✅ Worker 1 (pve-nu-monitor01) setup complete
+
+Setting up Worker 2 (root@10.20.254.100)...
+  📦 Installing Python packages...
+  📁 Copying backtester modules...
+  📄 Installing worker script...
+  📊 Copying OHLCV data...
+  ✅ Verifying installation...
+✅ Worker 2 (srv-bd-host01) setup complete
+
+🎉 Cluster setup complete!
+```
+
+### Step 3: Start Master Controller
+
+```bash
+python3 master.py
+```
+
+**Master will:**
+1. Initialize SQLite database (`strategies.db`)
+2. Generate initial v9 parameter sweep (27 jobs)
+3. Start monitoring loop (60-second intervals)
+4. Assign jobs to idle workers
+5. Collect and rank results
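The 27-job sweep is a 3 × 3 × 3 grid over `flip_threshold`, `ma_gap`, and `momentum_adx`. A minimal sketch of generating such a grid follows; the concrete grid values and the `generate_v9_sweep` name are assumptions for illustration, since this guide does not list the values `master.py` actually uses.

```python
from itertools import product

# Hypothetical grid values (assumption): three candidates per parameter.
FLIP_THRESHOLDS = [0.5, 0.6, 0.7]
MA_GAPS = [0.30, 0.35, 0.40]
MOMENTUM_ADX = [21, 23, 25]


def generate_v9_sweep():
    """Return one parameter dict per point of the flip x ma_gap x adx grid."""
    return [
        {"flip_threshold": flip, "ma_gap": gap, "momentum_adx": adx}
        for flip, gap, adx in product(FLIP_THRESHOLDS, MA_GAPS, MOMENTUM_ADX)
    ]


jobs = generate_v9_sweep()
print(len(jobs))  # 27
```

Adding a fourth parameter with three values would grow the grid to 81 combinations, which is why exhaustive sweeps are usually kept to a few values per dimension.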
+
+**Expected output:**
+```
+🚀 Cluster Master starting...
+📊 Workers: 2
+💾 Database: /home/icke/traderv4/cluster/strategies.db
+📁 Queue: /home/icke/traderv4/cluster/queue
+
+🔧 Generating v9 parameter sweep jobs...
+✅ Created job: v9_moneyline_1701234567890 (priority 1)
+✅ Created job: v9_moneyline_1701234567891 (priority 1)
+...
+✅ Created 27 v9 refinement jobs
+
+============================================================
+🔄 Iteration 1 - 2025-11-29 15:30:00
+
+📤 Assigning v9_moneyline_1701234567890.json to worker1...
+✅ Job started on worker1
+
+📤 Assigning v9_moneyline_1701234567891.json to worker2...
+✅ Job started on worker2
+
+📊 Status: 25 queued | 2 running | 0 completed
+```
+
+### Step 4: Monitor Progress
+
+**Terminal 1 - Master logs:**
+```bash
+cd /home/icke/traderv4/cluster
+python3 master.py
+```
+
+**Terminal 2 - Status dashboard:**
+```bash
+watch -n 10 'python3 status.py'
+```
+
+**Terminal 3 - Queue size:**
+```bash
+watch -n 5 'ls -1 /home/icke/traderv4/cluster/queue/*.json 2>/dev/null | wc -l'
+```
+
+---
+
+## 📊 Understanding Results
+
+### Status Dashboard Output
+
+```
+======================================================================
+🎯 OPTIMIZATION CLUSTER STATUS
+======================================================================
+📅 2025-11-29 15:45:00
+
+📋 Queue: 15 jobs waiting
+   Running: 2
+   Completed: 10
+
+🏆 TOP 5 STRATEGIES:
+----------------------------------------------------------------------
+Rank   Strategy                       P&L/1k       Trades   WR%      PF    
+----------------------------------------------------------------------
+1      v9_flip0.7_ma0.40_adx25       $215.80      587      62.3%    1.18  
+2      v9_flip0.6_ma0.35_adx23       $208.40      569      61.5%    1.12  
+3      v9_flip0.7_ma0.35_adx25       $205.20      601      60.8%    1.09  
+4      v9_flip0.6_ma0.40_adx21       $198.70      553      61.2%    1.07  
+5      v9_flip0.5_ma0.35_adx23       $192.50      569      60.9%    1.02  
+
+📊 BASELINE COMPARISON:
+   v9 baseline: $192.00/1k (current production)
+   Best found: $215.80/1k (+12.4% improvement) ✅
+
+======================================================================
+```
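The improvement figure in the baseline comparison is the relative gain over the baseline P&L. As a worked check, assuming the dashboard computes it this way (the `improvement_pct` helper is hypothetical):

```python
def improvement_pct(candidate_pnl: float, baseline_pnl: float) -> float:
    """Relative improvement of a candidate over the baseline, in percent."""
    if baseline_pnl == 0:
        raise ValueError("baseline P&L must be non-zero")
    return (candidate_pnl - baseline_pnl) / baseline_pnl * 100.0


# Values from the sample dashboard: best found $215.80/1k vs. baseline $192.00/1k.
print(f"{improvement_pct(215.80, 192.00):+.1f}%")  # +12.4%
```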
+
+### Strategy Naming Convention
+
+Format: `{indicator}_{param1}_{param2}_{param3}...`
+
+Example: `v9_flip0.7_ma0.40_adx25`
+- `v9`: Money Line indicator
+- `flip0.7`: flip_threshold = 0.7 (70% EMA flip confirmation)
+- `ma0.40`: ma_gap = 0.40 (MA50-MA200 gap bonus threshold)
+- `adx25`: momentum_adx = 25 (ADX requirement for momentum filter)
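A name in this format can be built from, and parsed back into, its parameters. The sketch below assumes the formatting seen in the example (one decimal for `flip`, two for `ma`, an integer for `adx`); the helper names are hypothetical.

```python
import re

_NAME_RE = re.compile(
    r"^v9_flip(?P<flip>\d+\.\d+)_ma(?P<ma>\d+\.\d+)_adx(?P<adx>\d+)$"
)


def strategy_name(flip_threshold: float, ma_gap: float, momentum_adx: int) -> str:
    """Encode v9 parameters into a strategy name such as v9_flip0.7_ma0.40_adx25."""
    return f"v9_flip{flip_threshold:.1f}_ma{ma_gap:.2f}_adx{momentum_adx:d}"


def parse_strategy_name(name: str) -> dict:
    """Decode a v9 strategy name back into its parameter values."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a v9 strategy name: {name!r}")
    return {
        "flip_threshold": float(match["flip"]),
        "ma_gap": float(match["ma"]),
        "momentum_adx": int(match["adx"]),
    }


print(strategy_name(0.7, 0.40, 25))  # v9_flip0.7_ma0.40_adx25
```

Parsing the name is handy in the manual deployment step, where the winning parameters have to be copied into the TradingView indicator by hand.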
+
+---
+
+## 🔧 Troubleshooting
+
+### Problem: Workers not picking up jobs
+
+**Check worker health:**
+```bash
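+# The [b] bracket keeps pgrep from matching the remote shell that runs this very command
+ssh root@10.10.254.106 'pgrep -f "[b]acktester" || echo IDLE'
+ssh root@10.10.254.106 'ssh root@10.20.254.100 "pgrep -f \"[b]acktester\" || echo IDLE"'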
+```
+
+**View worker logs:**
+```bash
+ssh root@10.10.254.106 'tail -f /root/optimization-cluster/logs/worker.log'
+```
+
+**Restart worker manually:**
+```bash
+ssh root@10.10.254.106 'cd /root/optimization-cluster && python3 worker.py jobs/v9_moneyline_*.json'
+```
+
+### Problem: Jobs stuck in "running" status
+
+**Reset stale jobs (>30 minutes):**
+```bash
+sqlite3 cluster/strategies.db <<EOF
+UPDATE jobs 
+SET status = 'queued', worker_id = NULL, started_at = NULL
+WHERE status = 'running' 
+  AND started_at < datetime('now', '-30 minutes');
+EOF
+```
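The same reset can be scripted, for example from a cron job. This is a minimal sketch using Python's standard `sqlite3` module; it assumes the `jobs` table has the columns used in the SQL above and that `started_at` is stored as UTC text in SQLite's `YYYY-MM-DD HH:MM:SS` format. The `reset_stale_jobs` name is hypothetical.

```python
import sqlite3


def reset_stale_jobs(conn: sqlite3.Connection, max_age_minutes: int = 30) -> int:
    """Re-queue jobs that have been 'running' longer than max_age_minutes.

    Returns the number of jobs that were reset.
    """
    cursor = conn.execute(
        """
        UPDATE jobs
        SET status = 'queued', worker_id = NULL, started_at = NULL
        WHERE status = 'running'
          AND started_at < datetime('now', ?)
        """,
        (f"-{int(max_age_minutes)} minutes",),  # bound as a datetime modifier
    )
    conn.commit()
    return cursor.rowcount


if __name__ == "__main__":
    with sqlite3.connect("cluster/strategies.db") as conn:
        print(f"Reset {reset_stale_jobs(conn)} stale job(s)")
```

Returning the row count makes it easy to log how often jobs stall, which can hint at a worker that keeps crashing.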
+
+### Problem: Database locked
+
+**Kill master process:**
+```bash
+pkill -f 'python3 master.py'
+```
+
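+**Let SQLite recover (do not delete `strategies.db-journal`; it is the rollback journal, not a lock file, and removing it can corrupt the database):**
+```bash
+# With the master stopped, opening the database lets SQLite roll back any interrupted transaction
+sqlite3 cluster/strategies.db 'PRAGMA integrity_check;'
+```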
+
+**Restart master:**
+```bash
+python3 master.py
+```
+
+### Problem: Out of disk space
+
+**Archive old results:**
+```bash
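+cd cluster/results
+mkdir -p ~/backups
+# Chain with && so archive/ is only emptied after the tarball has been moved successfully
+tar -czf archive_$(date +%Y%m%d).tar.gz archive/ \
+  && mv archive_$(date +%Y%m%d).tar.gz ~/backups/ \
+  && rm -rf archive/*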
+```
+
+**Clean worker temp files:**
+```bash
+ssh root@10.10.254.106 'rm -rf /root/optimization-cluster/results/archive/*'
+ssh root@10.10.254.106 'ssh root@10.20.254.100 "rm -rf /root/optimization-cluster/results/archive/*"'
+```
+
+---
+
+## 🎯 Adding Custom Strategies
+
+### Example: Volume Profile Indicator
+
+1. **Implement backtester module:**
+```bash
+vim backtester/volume_profile.py
+```
+
+2. **Add job generation in master.py:**
+```python
+def generate_volume_jobs(self):
+    """Generate volume profile jobs"""
+    for window in [20, 50, 100]:
+        for threshold in [0.6, 0.7, 0.8]:
+            params = {
+                'profile_window': window,
+                'entry_threshold': threshold,
+                'stop_loss_atr': 3.0
+            }
+            
+            self.queue.create_job(
+                'volume_profile',
+                params,
+                PRIORITY_MEDIUM
+            )
+```
+
+3. **Update worker.py to handle new indicator:**
+```python
+def execute(self):
+    if self.job['indicator'] == 'v9_moneyline':
+        result = self.run_v9_backtest()
+    elif self.job['indicator'] == 'volume_profile':
+        result = self.run_volume_backtest()
+    # ...
+```
+
+4. **Deploy updates:**
+```bash
+./setup_cluster.sh  # Redeploy with new code
+```
+
+---
+
+## 📈 Performance Expectations
+
+### Cluster Capacity
+
+**Hardware:**
+- 64 total cores (32 per server)
+- 44 cores @ 70% utilization
+- 22 parallel backtests per server (44 across the cluster)
+
+**Throughput:**
+- ~1.6s per v9 backtest (EPYC 7282/7302)
+- ~49,000 backtests per day
+- ~1.47 million backtests per month
+
+### Optimization Timeline
+
+**Week 1 (v9 refinement):**
+- 27 initial jobs (flip × ma_gap × adx)
+- Expand to 81 jobs (add long_pos, short_pos variations)
+- Target: Find >$200/1k P&L configuration
+
+**Week 2-3 (New indicators):**
+- Volume profile: 27 configurations
+- Order flow: 27 configurations  
+- Market structure: 27 configurations
+- Target: Find >$250/1k P&L strategy
+
+**Week 4+ (Advanced):**
+- Multi-timeframe: 81 configurations
+- ML-based scoring: 100+ hyperparameter combinations
+- Target: Find >$300/1k P&L strategy
+
+---
+
+## 🔒 Safety & Deployment
+
+### Validation Gates
+
+Before deploying a strategy to production:
+
+1. **Trade Count:** ≥700 trades (statistical significance)
+2. **Win Rate:** 63-68% realistic range
+3. **Profit Factor:** ≥1.5 solid edge
+4. **Max Drawdown:** <20% manageable
+5. **Sharpe Ratio:** ≥1.0 risk-adjusted
+6. **Consistency:** Top 3 for 7 days straight
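These gates can be checked mechanically before a strategy is considered for manual deployment. The sketch below is one possible encoding: the dictionary keys (`max_drawdown_pct`, `sharpe`, `days_in_top3`) and the `failed_gates` name are hypothetical, and treating the 63-68% win-rate range as a two-sided check is an interpretation, not something the guide states.

```python
from typing import Dict, List


def failed_gates(stats: Dict[str, float]) -> List[str]:
    """Return the names of the validation gates a strategy does not pass."""
    checks = [
        ("trade_count", stats["trades"] >= 700),
        ("win_rate", 63.0 <= stats["win_rate"] <= 68.0),
        ("profit_factor", stats["profit_factor"] >= 1.5),
        ("max_drawdown", stats["max_drawdown_pct"] < 20.0),
        ("sharpe_ratio", stats["sharpe"] >= 1.0),
        ("consistency", stats["days_in_top3"] >= 7),
    ]
    return [name for name, passed in checks if not passed]


# Trades, win rate, and profit factor come from the sample dashboard's top row;
# the drawdown, Sharpe, and consistency values are made up for illustration.
candidate = {
    "trades": 587, "win_rate": 62.3, "profit_factor": 1.18,
    "max_drawdown_pct": 15.0, "sharpe": 1.2, "days_in_top3": 7,
}
print(failed_gates(candidate))  # ['trade_count', 'win_rate', 'profit_factor']
```

An empty list means every gate passed. Note that the top strategy in the sample dashboard would not clear the gates as written, so ranking first in the sweep is not by itself a reason to deploy.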
+
+### Manual Deployment Process
+
+```bash
+# 1. Review top strategy
+sqlite3 cluster/strategies.db "SELECT * FROM strategies WHERE name = 'v9_flip0.7_ma0.40_adx25'"
+
+# 2. Extract parameters
+# flip_threshold: 0.7
+# ma_gap: 0.40
+# momentum_adx: 25
+
+# 3. Update TradingView indicator
+# Edit moneyline_v9_ma_gap.pinescript
+# Change parameters to winning values
+
+# 4. Update TradingView alerts
+# Verify alerts fire with new parameters
+
+# 5. Monitor first 10 trades
+# Ensure behavior matches backtest
+
+# 6. Full deployment after validation
+```
+
+### Auto-Deployment (Future)
+
+Once you are confident in the system:
+
+1. Master marks top strategy as `status = 'staging'`
+2. User reviews via web dashboard
+3. User clicks "Deploy" button
+4. System auto-updates TradingView via API
+5. Alerts regenerated with new parameters
+6. First 10 trades monitored closely
+
+---
+
+## 📞 Support
+
+- **Main docs:** `/home/icke/traderv4/.github/copilot-instructions.md`
+- **Cluster README:** `/home/icke/traderv4/cluster/README.md`
+- **Backtester docs:** `/home/icke/traderv4/backtester/README.md` (if it exists)
+
+---
+
+**Ready to deploy?**
+
+```bash
+cd /home/icke/traderv4/cluster
+./setup_cluster.sh
+```
+
+Let the machines find better strategies while you sleep! 🚀