mindesbunister 2a8e04fe57 feat: Continuous optimization cluster for 2 EPYC servers
- Master controller with job queue and result aggregation
- Worker scripts for parallel backtesting (22 workers per server)
- SQLite database for strategy ranking and performance tracking
- File-based job queue (simple, robust, survives crashes)
- Auto-setup script for both EPYC servers
- Status dashboard for monitoring progress
- Comprehensive deployment guide

Architecture:
- Master: Job generation, worker coordination, result collection
- Worker 1 (pve-nu-monitor01): AMD EPYC 7282, 22 parallel jobs
- Worker 2 (srv-bd-host01): AMD EPYC 7302, 22 parallel jobs
- Total capacity: ~49,000 backtests/day (44 cores @ 70%)

Initial focus: v9 parameter refinement (27 configurations)
Target: Find strategies >$200/1k P&L (current baseline $192/1k)

Files:
- cluster/master.py: Main controller (570 lines)
- cluster/worker.py: Worker execution script (220 lines)
- cluster/setup_cluster.sh: Automated deployment
- cluster/status.py: Real-time status dashboard
- cluster/README.md: Operational documentation
- cluster/DEPLOYMENT.md: Step-by-step deployment guide
2025-11-29 22:34:52 +01:00

# 🚀 Continuous Optimization Cluster - Deployment Guide
## ⚡ Quick Deploy (5 minutes)
```bash
cd /home/icke/traderv4/cluster
# 1. Setup both EPYC servers
./setup_cluster.sh
# 2. Start master controller
python3 master.py
# 3. Monitor status (separate terminal)
watch -n 10 'python3 status.py'
```
---
## 📋 Prerequisites Checklist
- [x] **SSH Access:** Keys configured for both EPYC servers
  - `root@10.10.254.106` (pve-nu-monitor01)
  - `root@10.20.254.100` (srv-bd-host01, reached via pve-nu-monitor01)
- [x] **Python 3.7+** installed on all servers
- [x] **OHLCV Data:** `backtester/data/solusdt_5m.csv` (139,678 rows)
- [ ] **Backtester Code:** In `/home/icke/traderv4/backtester/`
  - `backtester_core.py`
  - `v9_moneyline_ma_gap.py`
  - `moneyline_core.py`
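Before running setup, a small local preflight can confirm the checklist items that live on the master machine. This is a minimal sketch, assuming the directory layout listed above and that the OHLCV CSV has exactly one header line; `missing_prerequisites` and `count_data_rows` are hypothetical helpers, not part of the cluster scripts.

```python
from pathlib import Path
from typing import List

# Files the checklist expects, relative to the project root.
REQUIRED_FILES = [
    "backtester/backtester_core.py",
    "backtester/v9_moneyline_ma_gap.py",
    "backtester/moneyline_core.py",
    "backtester/data/solusdt_5m.csv",
]


def missing_prerequisites(root: Path) -> List[str]:
    """Return the required files that do not exist under `root`."""
    return [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]


def count_data_rows(csv_path: Path) -> int:
    """Count data rows in the OHLCV CSV, assuming a single header line."""
    with csv_path.open() as f:
        return max(sum(1 for _ in f) - 1, 0)
```

Usage: `missing_prerequisites(Path("/home/icke/traderv4"))` should return an empty list, and `count_data_rows(...)` on `backtester/data/solusdt_5m.csv` can be compared with the row count noted in the checklist.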
---
## 🏗️ Step-by-Step Setup
### Step 1: Verify Backtester Works Locally
```bash
cd /home/icke/traderv4/backtester
# Test v9 backtest
python3 backtester_core.py \
--data data/solusdt_5m.csv \
--indicator v9 \
--flip-threshold 0.6 \
--ma-gap 0.35 \
--momentum-adx 23 \
--output json
```
**Expected output:**
```json
{
  "pnl": 192.50,
  "trades": 569,
  "win_rate": 60.98,
  "profit_factor": 1.022,
  "max_drawdown": 1360.58
}
```
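Rather than eyeballing the JSON, the local run can be compared with the expected values programmatically. A sketch, assuming `backtester_core.py` prints exactly one JSON object to stdout when called with `--output json` (as the example output suggests); `run_v9_backtest` and `mismatches` are hypothetical helper names.

```python
import json
import math
import subprocess
from typing import Dict, List

# Reference values from the expected output above (flip 0.6, ma_gap 0.35, adx 23).
EXPECTED: Dict[str, float] = {
    "pnl": 192.50,
    "trades": 569,
    "win_rate": 60.98,
    "profit_factor": 1.022,
    "max_drawdown": 1360.58,
}


def run_v9_backtest(flip: float, ma_gap: float, adx: int) -> dict:
    """Run the backtester CLI with the flags shown above and parse its JSON output."""
    cmd = [
        "python3", "backtester_core.py",
        "--data", "data/solusdt_5m.csv",
        "--indicator", "v9",
        "--flip-threshold", str(flip),
        "--ma-gap", str(ma_gap),
        "--momentum-adx", str(adx),
        "--output", "json",
    ]
    out = subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
    return json.loads(out)


def mismatches(result: dict, expected: Dict[str, float] = EXPECTED,
               rel_tol: float = 1e-3) -> List[str]:
    """Return the metric names that are missing or differ beyond `rel_tol`."""
    bad = []
    for key, want in expected.items():
        got = result.get(key)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            bad.append(key)
    return bad
```

Run from `/home/icke/traderv4/backtester`, `mismatches(run_v9_backtest(0.6, 0.35, 23))` should return an empty list if the local setup reproduces the reference result.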
### Step 2: Deploy to EPYC Servers
```bash
cd /home/icke/traderv4/cluster
./setup_cluster.sh
```
**This will:**
1. Create `/root/optimization-cluster` on both servers
2. Install Python venv + pandas/numpy
3. Copy backtester code
4. Copy worker.py script
5. Copy OHLCV data
6. Verify installation
**Expected output:**
```
🚀 Setting up optimization cluster...
Setting up Worker 1 (root@10.10.254.106)...
📦 Installing Python packages...
📁 Copying backtester modules...
📄 Installing worker script...
📊 Copying OHLCV data...
✅ Verifying installation...
✅ Worker 1 (pve-nu-monitor01) setup complete
Setting up Worker 2 (root@10.20.254.100)...
📦 Installing Python packages...
📁 Copying backtester modules...
📄 Installing worker script...
📊 Copying OHLCV data...
✅ Verifying installation...
✅ Worker 2 (srv-bd-host01) setup complete
🎉 Cluster setup complete!
```
### Step 3: Start Master Controller
```bash
python3 master.py
```
**Master will:**
1. Initialize SQLite database (`strategies.db`)
2. Generate initial v9 parameter sweep (27 jobs)
3. Start monitoring loop (60-second intervals)
4. Assign jobs to idle workers
5. Collect and rank results
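The 27 initial jobs form a full 3 × 3 × 3 grid over flip threshold, MA gap and ADX. The sketch below generates such a grid and writes it to a file-based queue, one JSON file per job. The flip and ADX values are taken from the strategy names in the results table further down; the third `ma_gap` value (0.30) is a placeholder, and the job file layout and the `enqueue` helper are hypothetical rather than master.py's actual format.

```python
import itertools
import json
import os
import time
from pathlib import Path
from typing import List

# Illustrative grid; ma_gap 0.30 is a placeholder for the real third value.
FLIP_THRESHOLDS = [0.5, 0.6, 0.7]
MA_GAPS = [0.30, 0.35, 0.40]
MOMENTUM_ADX = [21, 23, 25]


def v9_sweep() -> List[dict]:
    """Every combination of the three parameters (3 x 3 x 3 = 27)."""
    return [
        {"flip_threshold": f, "ma_gap": m, "momentum_adx": a}
        for f, m, a in itertools.product(FLIP_THRESHOLDS, MA_GAPS, MOMENTUM_ADX)
    ]


def enqueue(queue_dir: Path, indicator: str, params: dict,
            priority: int, seq: int) -> Path:
    """Write one job file atomically: write a temp file, then rename it.

    os.replace() is an atomic rename within one filesystem, so a crash never
    leaves a half-written *.json file for a worker to pick up. `seq` keeps
    IDs unique when several jobs are created in the same millisecond.
    """
    job_id = f"{indicator}_{int(time.time() * 1000) + seq}"
    job = {"id": job_id, "indicator": indicator,
           "params": params, "priority": priority}
    final = queue_dir / f"{job_id}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(job))
    os.replace(tmp, final)
    return final
```

Writing to a temporary name and renaming is what lets a plain directory act as a crash-tolerant queue: readers globbing `*.json` only ever see complete files.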
**Expected output:**
```
🚀 Cluster Master starting...
📊 Workers: 2
💾 Database: /home/icke/traderv4/cluster/strategies.db
📁 Queue: /home/icke/traderv4/cluster/queue
🔧 Generating v9 parameter sweep jobs...
✅ Created job: v9_moneyline_1701234567890 (priority 1)
✅ Created job: v9_moneyline_1701234567891 (priority 1)
...
✅ Created 27 v9 refinement jobs
============================================================
🔄 Iteration 1 - 2025-11-29 15:30:00
📤 Assigning v9_moneyline_1701234567890.json to worker1...
✅ Job started on worker1
📤 Assigning v9_moneyline_1701234567891.json to worker2...
✅ Job started on worker2
📊 Status: 25 queued | 2 running | 0 completed
```
### Step 4: Monitor Progress
**Terminal 1 - Master logs:**
```bash
cd /home/icke/traderv4/cluster
python3 master.py
```
**Terminal 2 - Status dashboard:**
```bash
cd /home/icke/traderv4/cluster
watch -n 10 'python3 status.py'
```
**Terminal 3 - Queue size:**
```bash
watch -n 5 'ls -1 /home/icke/traderv4/cluster/queue/*.json 2>/dev/null | wc -l'
```
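The same counts the master prints (`queued | running | completed`) can also be read straight from the database. A sketch, assuming the `jobs` table has the `status` column used in the troubleshooting query below and that the status values match the master log; `job_counts` and `format_counts` are hypothetical helpers.

```python
import sqlite3
from typing import Dict


def job_counts(db_path: str) -> Dict[str, int]:
    """Return {status: count} from the jobs table."""
    conn = sqlite3.connect(db_path, timeout=10)
    try:
        rows = conn.execute(
            "SELECT status, COUNT(*) FROM jobs GROUP BY status"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


def format_counts(counts: Dict[str, int]) -> str:
    """Render counts in the same order as the master's status line."""
    return " | ".join(
        f"{counts.get(s, 0)} {s}" for s in ("queued", "running", "completed")
    )
```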
---
## 📊 Understanding Results
### Status Dashboard Output
```
======================================================================
🎯 OPTIMIZATION CLUSTER STATUS
======================================================================
📅 2025-11-29 15:45:00
📋 Queue: 15 jobs waiting
Running: 2
Completed: 10
🏆 TOP 5 STRATEGIES:
----------------------------------------------------------------------
Rank Strategy P&L/1k Trades WR% PF
----------------------------------------------------------------------
1 v9_flip0.7_ma0.40_adx25 $215.80 587 62.3% 1.18
2 v9_flip0.6_ma0.35_adx23 $208.40 569 61.5% 1.12
3 v9_flip0.7_ma0.35_adx25 $205.20 601 60.8% 1.09
4 v9_flip0.6_ma0.40_adx21 $198.70 553 61.2% 1.07
5 v9_flip0.5_ma0.35_adx23 $192.50 569 60.9% 1.02
📊 BASELINE COMPARISON:
v9 baseline: $192.00/1k (current production)
Best found: $215.80/1k (+12.4% improvement) ✅
======================================================================
```
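The ranking and baseline comparison amount to sorting by P&L per $1k and computing a percentage change against the $192.00 baseline. A sketch with result fields assumed to be named `name` and `pnl`; the helpers are illustrative, not status.py's code.

```python
from typing import List

BASELINE_PNL_PER_1K = 192.00  # current production v9 baseline


def top_strategies(results: List[dict], n: int = 5) -> List[dict]:
    """Sort results by P&L per $1k, best first, and keep the top n."""
    return sorted(results, key=lambda r: r["pnl"], reverse=True)[:n]


def improvement_pct(best: float, baseline: float = BASELINE_PNL_PER_1K) -> float:
    """Relative improvement over the baseline, in percent."""
    return (best - baseline) / baseline * 100
```

For the dashboard above, `round(improvement_pct(215.80), 1)` gives `12.4`, matching the "+12.4% improvement" line.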
### Strategy Naming Convention
Format: `{indicator}_{param1}_{param2}_{param3}...`
Example: `v9_flip0.7_ma0.40_adx25`
- `v9`: Money Line indicator
- `flip0.7`: flip_threshold = 0.7 (70% EMA flip confirmation)
- `ma0.40`: ma_gap = 0.40 (MA50-MA200 gap bonus threshold)
- `adx25`: momentum_adx = 25 (ADX requirement for momentum filter)
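Names and parameters can be converted in both directions, which is also handy for the "extract parameters" step of a manual deployment. A sketch, assuming the formatting implied by the example (one decimal for `flip`, two for `ma`, integer `adx`); these helpers are illustrative and not taken from master.py.

```python
import re

NAME_RE = re.compile(
    r"^(?P<ind>v\d+)_flip(?P<flip>[\d.]+)_ma(?P<ma>[\d.]+)_adx(?P<adx>\d+)$"
)


def strategy_name(indicator: str, flip: float, ma_gap: float, adx: int) -> str:
    """Build a name such as 'v9_flip0.7_ma0.40_adx25'."""
    return f"{indicator}_flip{flip:.1f}_ma{ma_gap:.2f}_adx{adx:d}"


def parse_strategy_name(name: str) -> dict:
    """Recover the parameters from a v9-style strategy name."""
    m = NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised strategy name: {name!r}")
    return {
        "indicator": m["ind"],
        "flip_threshold": float(m["flip"]),
        "ma_gap": float(m["ma"]),
        "momentum_adx": int(m["adx"]),
    }
```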
---
## 🔧 Troubleshooting
### Problem: Workers not picking up jobs
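**Check worker health** (the `[b]` stops `pgrep -f` from matching the remote shell, whose own command line contains the search pattern):
```bash
ssh root@10.10.254.106 "pgrep -f '[b]acktester' || echo IDLE"
ssh root@10.10.254.106 "ssh root@10.20.254.100 \"pgrep -f '[b]acktester' || echo IDLE\""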
```
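Worker 2 is only reachable through pve-nu-monitor01, which is why its commands are nested and the quoting gets awkward. When scripting health checks, building the argument list programmatically with `shlex.quote` avoids hand-escaping. A sketch; the host addresses come from this guide, the helpers are hypothetical. OpenSSH's `-J` (ProxyJump) option is an alternative to nesting, but it authenticates to worker 2 from the local machine, so it only works if the local key is authorized there; the nested form relies on monitor01's key instead.

```python
import shlex
import subprocess
from typing import List, Optional

MONITOR01 = "root@10.10.254.106"  # worker 1, also the jump host
HOST01 = "root@10.20.254.100"     # worker 2, only reachable via MONITOR01

# "[b]acktester" still matches "backtester", but not the literal text
# "[b]acktester" in the remote shell's own command line.
HEALTH_CMD = "pgrep -f '[b]acktester' || echo IDLE"


def ssh_argv(target: str, remote_cmd: str, via: Optional[str] = None) -> List[str]:
    """Build argv that runs remote_cmd on target, optionally hopping through `via`."""
    if via is None:
        return ["ssh", target, remote_cmd]
    # The jump host's shell parses this string once, so quote the inner
    # command to hand it to the second ssh intact.
    return ["ssh", via, f"ssh {target} {shlex.quote(remote_cmd)}"]


def worker_busy(target: str, via: Optional[str] = None) -> bool:
    """True if a backtester process is running on the worker."""
    result = subprocess.run(ssh_argv(target, HEALTH_CMD, via),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip() != "IDLE"
```

Usage: `worker_busy(MONITOR01)` for worker 1 and `worker_busy(HOST01, via=MONITOR01)` for worker 2.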
**View worker logs:**
```bash
ssh root@10.10.254.106 'tail -f /root/optimization-cluster/logs/worker.log'
```
**Restart worker manually:**
```bash
ssh root@10.10.254.106 'cd /root/optimization-cluster && python3 worker.py jobs/v9_moneyline_*.json'
```
### Problem: Jobs stuck in "running" status
**Reset stale jobs (>30 minutes):**
```bash
sqlite3 cluster/strategies.db <<EOF
UPDATE jobs
SET status = 'queued', worker_id = NULL, started_at = NULL
WHERE status = 'running'
AND started_at < datetime('now', '-30 minutes');
EOF
```
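The same reset can be run from Python with the threshold as a parameter. One caveat: SQLite's `datetime('now')` returns UTC, so the comparison is only correct if `started_at` is stored as a UTC `YYYY-MM-DD HH:MM:SS` string. How master.py stores it is an assumption here, as is the helper name.

```python
import sqlite3


def requeue_stale_jobs(conn: sqlite3.Connection, minutes: int = 30) -> int:
    """Move jobs that have been 'running' longer than `minutes` back to 'queued'.

    Returns the number of jobs reset. Assumes started_at holds UTC timestamps
    in SQLite's 'YYYY-MM-DD HH:MM:SS' format, since datetime('now') is UTC.
    """
    cur = conn.execute(
        """
        UPDATE jobs
        SET status = 'queued', worker_id = NULL, started_at = NULL
        WHERE status = 'running'
          AND started_at < datetime('now', ?)
        """,
        (f"-{int(minutes)} minutes",),
    )
    conn.commit()
    return cur.rowcount
```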
### Problem: Database locked
**Kill master process:**
```bash
pkill -f 'python3 master.py'
```
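**Check the database (do not delete `strategies.db-journal`):**
That file is SQLite's rollback journal, not a lock file. Deleting it while it holds an unfinished transaction (a "hot" journal) can corrupt the database; with the master stopped, the next connection rolls that transaction back automatically. The check below should print `ok`:
```bash
sqlite3 cluster/strategies.db 'PRAGMA integrity_check;'
```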
**Restart master:**
```bash
python3 master.py
```
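"Database is locked" typically means one connection held a write lock longer than another was willing to wait. Python's `sqlite3.connect(..., timeout=...)` sets how long a connection retries before raising `sqlite3.OperationalError` (the default is 5 seconds), so scripts that touch `strategies.db` can wait instead of failing. A sketch; the `open_db` helper is hypothetical. Optionally, WAL journal mode lets readers such as status.py proceed while a writer is active; whether master.py already enables it is not stated in this guide.

```python
import sqlite3


def open_db(path: str, wait_seconds: float = 30.0, wal: bool = False) -> sqlite3.Connection:
    """Open the database, retrying for up to wait_seconds when another
    connection holds the lock (Python's default is 5 seconds)."""
    conn = sqlite3.connect(path, timeout=wait_seconds)
    if wal:
        # WAL mode is persistent: once set, it stays set in the database file.
        conn.execute("PRAGMA journal_mode=WAL")
    return conn
```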
### Problem: Out of disk space
**Archive old results:**
```bash
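cd cluster/results
ARCHIVE="archive_$(date +%Y%m%d).tar.gz"
mkdir -p ~/backups
# Delete the originals only if the archive was created and moved successfully
tar -czf "$ARCHIVE" archive/ && mv "$ARCHIVE" ~/backups/ && rm -rf archive/*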
```
**Clean worker temp files:**
```bash
ssh root@10.10.254.106 'rm -rf /root/optimization-cluster/results/archive/*'
ssh root@10.10.254.106 'ssh root@10.20.254.100 "rm -rf /root/optimization-cluster/results/archive/*"'
```
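Where the deletion should depend on the archive actually containing every file, the archive step can also be done in Python. A sketch; `archive_and_clear` is a hypothetical helper, the directory names follow the commands above, and an archive with the same date is overwritten.

```python
import shutil
import tarfile
import time
from pathlib import Path


def archive_and_clear(results_dir: Path, backup_dir: Path) -> Path:
    """Compress results_dir/archive into backup_dir, then empty it.

    The originals are removed only after the tarball has been written and
    re-read, so a failed or partial archive never loses data.
    """
    src = results_dir / "archive"
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"archive_{time.strftime('%Y%m%d')}.tar.gz"
    with tarfile.open(target, "w:gz") as tar:
        tar.add(src, arcname="archive")
    # Re-open the archive and confirm every regular file made it in.
    with tarfile.open(target, "r:gz") as tar:
        archived = {m.name for m in tar.getmembers() if m.isfile()}
    expected = {f"archive/{p.relative_to(src).as_posix()}"
                for p in src.rglob("*") if p.is_file()}
    if not expected <= archived:
        raise RuntimeError(f"archive {target} is incomplete; originals kept")
    for child in src.iterdir():
        if child.is_dir():
            shutil.rmtree(child)
        else:
            child.unlink()
    return target
```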
---
## 🎯 Adding Custom Strategies
### Example: Volume Profile Indicator
1. **Implement backtester module:**
```bash
vim backtester/volume_profile.py
```
2. **Add job generation in master.py:**
```python
def generate_volume_jobs(self):
    """Generate volume profile jobs"""
    # 3 windows x 3 thresholds = 9 jobs
    for window in [20, 50, 100]:
        for threshold in [0.6, 0.7, 0.8]:
            params = {
                'profile_window': window,
                'entry_threshold': threshold,
                'stop_loss_atr': 3.0
            }
            self.queue.create_job(
                'volume_profile',
                params,
                PRIORITY_MEDIUM
            )
```
3. **Update worker.py to handle new indicator:**
```python
def execute(self):
    if self.job['indicator'] == 'v9_moneyline':
        result = self.run_v9_backtest()
    elif self.job['indicator'] == 'volume_profile':
        result = self.run_volume_backtest()
    # ...
```
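As more indicators are added, a dispatch table keeps `execute()` flat and turns an unknown indicator into an explicit error, rather than leaving `result` unassigned (unless the elided branches already handle that case). A sketch; the two runner functions are placeholders standing in for the real `run_v9_backtest` / `run_volume_backtest` methods.

```python
from typing import Callable, Dict


def run_v9_backtest(params: dict) -> dict:
    """Placeholder for the real v9 runner."""
    return {"indicator": "v9_moneyline", "params": params}


def run_volume_backtest(params: dict) -> dict:
    """Placeholder for the real volume-profile runner."""
    return {"indicator": "volume_profile", "params": params}


# Map each job's 'indicator' field to the function that backtests it.
RUNNERS: Dict[str, Callable[[dict], dict]] = {
    "v9_moneyline": run_v9_backtest,
    "volume_profile": run_volume_backtest,
}


def execute(job: dict) -> dict:
    """Run the backtest for one job, failing loudly on unknown indicators."""
    indicator = job["indicator"]
    runner = RUNNERS.get(indicator)
    if runner is None:
        raise ValueError(f"no runner registered for indicator {indicator!r}")
    return runner(job.get("params", {}))
```

Adding a new strategy then means writing one runner and adding one entry to `RUNNERS`.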
4. **Deploy updates:**
```bash
./setup_cluster.sh # Redeploy with new code
```
---
## 📈 Performance Expectations
### Cluster Capacity
**Hardware:**
- 64 total cores (32 per server)
- 44 cores @ 70% utilization
- 44 parallel backtests (22 per server)
**Throughput:**
- ~1.6s per v9 backtest (EPYC 7282/7302)
- ~49,000 backtests per day
- ~1.47 million backtests per month
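The throughput figures are tied together by one relation: backtests per day = parallel slots × 86,400 s ÷ seconds per backtest. A sketch for sanity-checking them. With 44 slots at ~1.6 s each the formula yields about 2.4 million per day, far above 49,000; the 49,000 figure instead corresponds to one completed backtest every ~1.76 s across the whole cluster (about 78 s per backtest per slot). Which reading is right depends on how the 1.6 s was measured, which this guide does not state.

```python
SECONDS_PER_DAY = 86_400


def backtests_per_day(slots: int, seconds_per_backtest: float) -> float:
    """Daily throughput with `slots` backtests running concurrently."""
    return slots * SECONDS_PER_DAY / seconds_per_backtest


def implied_seconds_per_backtest(per_day: float, slots: int) -> float:
    """Invert the formula: per-slot runtime needed to reach `per_day`."""
    return slots * SECONDS_PER_DAY / per_day
```

For example, `implied_seconds_per_backtest(49_000, 44)` returns about 77.6, and 49,000 × 30 days gives the 1.47 million monthly figure.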
### Optimization Timeline
**Week 1 (v9 refinement):**
- 27 initial jobs (flip × ma_gap × adx)
- Expand to 81 jobs (add long_pos, short_pos variations)
- Target: Find >$200/1k P&L configuration
**Week 2-3 (New indicators):**
- Volume profile: 27 configurations
- Order flow: 27 configurations
- Market structure: 27 configurations
- Target: Find >$250/1k P&L strategy
**Week 4+ (Advanced):**
- Multi-timeframe: 81 configurations
- ML-based scoring: 100+ hyperparameter combinations
- Target: Find >$300/1k P&L strategy
---
## 🔒 Safety & Deployment
### Validation Gates
Before deploying a strategy to production:
1. **Trade Count:** ≥700 trades (statistical significance)
2. **Win Rate:** 63-68% realistic range
3. **Profit Factor:** ≥1.5 solid edge
4. **Max Drawdown:** <20% manageable
5. **Sharpe Ratio:** ≥1.0 risk-adjusted
6. **Consistency:** Top 3 for 7 days straight
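Encoding the gates as a single check helps apply them consistently. A sketch: the thresholds are the ones listed above, but the field names `max_drawdown_pct` and `sharpe` and the `days_in_top3` input are assumptions, since the backtester output in Step 1 reports drawdown in dollars and no Sharpe ratio.

```python
from typing import List


def failed_gates(result: dict, days_in_top3: int) -> List[str]:
    """Return the names of the validation gates `result` fails (empty = pass)."""
    checks = {
        "trade_count": result["trades"] >= 700,
        "win_rate": 63.0 <= result["win_rate"] <= 68.0,
        "profit_factor": result["profit_factor"] >= 1.5,
        "max_drawdown": result["max_drawdown_pct"] < 20.0,
        "sharpe": result["sharpe"] >= 1.0,
        "consistency": days_in_top3 >= 7,
    }
    return [name for name, ok in checks.items() if not ok]
```

Applied to the current dashboard leader (587 trades, 62.3% win rate, PF 1.18), this check fails the trade-count, win-rate and profit-factor gates whatever drawdown and Sharpe values are assumed.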
### Manual Deployment Process
```bash
# 1. Review top strategy
sqlite3 cluster/strategies.db "SELECT * FROM strategies WHERE name = 'v9_flip0.7_ma0.40_adx25'"
# 2. Extract parameters
# flip_threshold: 0.7
# ma_gap: 0.40
# momentum_adx: 25
# 3. Update TradingView indicator
# Edit moneyline_v9_ma_gap.pinescript
# Change parameters to winning values
# 4. Update TradingView alerts
# Verify alerts fire with new parameters
# 5. Monitor first 10 trades
# Ensure behavior matches backtest
# 6. Full deployment after validation
```
### Auto-Deployment (Future)
Once the system has proven reliable:
1. Master marks top strategy as `status = 'staging'`
2. User reviews via web dashboard
3. User clicks "Deploy" button
4. System auto-updates TradingView via API
5. Alerts regenerated with new parameters
6. First 10 trades monitored closely
---
## 📞 Support
- **Main docs:** `/home/icke/traderv4/.github/copilot-instructions.md`
- **Cluster README:** `/home/icke/traderv4/cluster/README.md`
- **Backtester docs:** `/home/icke/traderv4/backtester/README.md` (if present)
---
**Ready to deploy?**
```bash
cd /home/icke/traderv4/cluster
./setup_cluster.sh
```
Let the machines find better strategies while you sleep! 🚀