- Master controller with job queue and result aggregation
- Worker scripts for parallel backtesting (22 workers per server)
- SQLite database for strategy ranking and performance tracking
- File-based job queue (simple, robust, survives crashes)
- Auto-setup script for both EPYC servers
- Status dashboard for monitoring progress
- Comprehensive deployment guide
Architecture:
- Master: Job generation, worker coordination, result collection
- Worker 1 (pve-nu-monitor01): AMD EPYC 7282, 22 parallel jobs
- Worker 2 (srv-bd-host01): AMD EPYC 7302, 22 parallel jobs
- Total capacity: ~49,000 backtests/day (44 cores @ 70%)
Initial focus: v9 parameter refinement (27 configurations)
Target: Find strategies >$200/1k P&L (current baseline $192/1k)
Files:
- cluster/master.py: Main controller (570 lines)
- cluster/worker.py: Worker execution script (220 lines)
- cluster/setup_cluster.sh: Automated deployment
- cluster/status.py: Real-time status dashboard
- cluster/README.md: Operational documentation
- cluster/DEPLOYMENT.md: Step-by-step deployment guide
🚀 Continuous Optimization Cluster - Deployment Guide
⚡ Quick Deploy (5 minutes)
cd /home/icke/traderv4/cluster
# 1. Setup both EPYC servers
./setup_cluster.sh
# 2. Start master controller
python3 master.py
# 3. Monitor status (separate terminal)
watch -n 10 'python3 status.py'
📋 Prerequisites Checklist
- SSH Access: Keys configured for both EPYC servers
  - root@10.10.254.106 (pve-nu-monitor01)
  - root@10.20.254.100 (srv-bd-host01, via monitor01)
- Python 3.7+ installed on all servers
- OHLCV Data: backtester/data/solusdt_5m.csv (139,678 rows)
- Backtester Code: In /home/icke/traderv4/backtester/
  - backtester_core.py
  - v9_moneyline_ma_gap.py
  - moneyline_core.py
🏗️ Step-by-Step Setup
Step 1: Verify Backtester Works Locally
cd /home/icke/traderv4/backtester
# Test v9 backtest
python3 backtester_core.py \
--data data/solusdt_5m.csv \
--indicator v9 \
--flip-threshold 0.6 \
--ma-gap 0.35 \
--momentum-adx 23 \
--output json
Expected output:
{
  "pnl": 192.50,
  "trades": 569,
  "win_rate": 60.98,
  "profit_factor": 1.022,
  "max_drawdown": 1360.58
}
Step 2: Deploy to EPYC Servers
cd /home/icke/traderv4/cluster
./setup_cluster.sh
This will:
- Create /root/optimization-cluster on both servers
- Install Python venv + pandas/numpy
- Copy backtester code
- Copy worker.py script
- Copy OHLCV data
- Verify installation
Expected output:
🚀 Setting up optimization cluster...
Setting up Worker 1 (root@10.10.254.106)...
📦 Installing Python packages...
📁 Copying backtester modules...
📄 Installing worker script...
📊 Copying OHLCV data...
✅ Verifying installation...
✅ Worker 1 (pve-nu-monitor01) setup complete
Setting up Worker 2 (root@10.20.254.100)...
📦 Installing Python packages...
📁 Copying backtester modules...
📄 Installing worker script...
📊 Copying OHLCV data...
✅ Verifying installation...
✅ Worker 2 (srv-bd-host01) setup complete
🎉 Cluster setup complete!
Step 3: Start Master Controller
python3 master.py
Master will:
- Initialize SQLite database (strategies.db)
- Generate initial v9 parameter sweep (27 jobs)
- Start monitoring loop (60-second intervals)
- Assign jobs to idle workers
- Collect and rank results
Expected output:
🚀 Cluster Master starting...
📊 Workers: 2
💾 Database: /home/icke/traderv4/cluster/strategies.db
📁 Queue: /home/icke/traderv4/cluster/queue
🔧 Generating v9 parameter sweep jobs...
✅ Created job: v9_moneyline_1701234567890 (priority 1)
✅ Created job: v9_moneyline_1701234567891 (priority 1)
...
✅ Created 27 v9 refinement jobs
============================================================
🔄 Iteration 1 - 2025-11-29 15:30:00
📤 Assigning v9_moneyline_1701234567890.json to worker1...
✅ Job started on worker1
📤 Assigning v9_moneyline_1701234567891.json to worker2...
✅ Job started on worker2
📊 Status: 25 queued | 2 running | 0 completed
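For orientation, here is a minimal sketch of how a 3 × 3 × 3 sweep could be written into a file-based queue. It is not the code in master.py. The grid values are partly assumptions: the flip and ADX values appear in the sample dashboard, but the third ma_gap value (0.30) is a placeholder, and the job fields (`id`, `indicator`, `params`, `priority`) are hypothetical. Writing to a temp file and renaming it is one common way to keep a crash from leaving a half-written job in the queue.

```python
import itertools
import json
import os
import tempfile
from pathlib import Path

# Assumed grid: only some of these values are confirmed by the sample output;
# ma_gap 0.30 is a placeholder, not taken from master.py.
V9_GRID = {
    "flip_threshold": [0.5, 0.6, 0.7],
    "ma_gap": [0.30, 0.35, 0.40],
    "momentum_adx": [21, 23, 25],
}


def write_job_atomically(queue_dir: Path, job: dict) -> Path:
    """Write one job file via temp file + rename, so readers never see a partial JSON."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    final_path = queue_dir / f"{job['id']}.json"
    fd, tmp_name = tempfile.mkstemp(dir=queue_dir, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(job, handle)
    os.replace(tmp_name, final_path)  # atomic when both paths are on one filesystem
    return final_path


def generate_v9_sweep(queue_dir: Path, base_id: int, priority: int = 1) -> list:
    """Create one queued job per parameter combination (3 x 3 x 3 = 27)."""
    names = list(V9_GRID)
    paths = []
    for offset, values in enumerate(itertools.product(*V9_GRID.values())):
        job = {
            "id": f"v9_moneyline_{base_id + offset}",  # hypothetical ID scheme
            "indicator": "v9_moneyline",
            "params": dict(zip(names, values)),
            "priority": priority,
        }
        paths.append(write_job_atomically(queue_dir, job))
    return paths
```

Because the temp files end in .tmp, a watcher that only globs *.json never picks up a job that is still being written.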
Step 4: Monitor Progress
Terminal 1 - Master logs:
cd /home/icke/traderv4/cluster
python3 master.py
Terminal 2 - Status dashboard:
watch -n 10 'python3 status.py'
Terminal 3 - Queue size (run from /home/icke/traderv4, since the path is relative):
watch -n 5 'ls -1 cluster/queue/*.json 2>/dev/null | wc -l'
📊 Understanding Results
Status Dashboard Output
======================================================================
🎯 OPTIMIZATION CLUSTER STATUS
======================================================================
📅 2025-11-29 15:45:00
📋 Queue: 15 jobs waiting
Running: 2
Completed: 10
🏆 TOP 5 STRATEGIES:
----------------------------------------------------------------------
Rank  Strategy                  P&L/1k    Trades  WR%     PF
----------------------------------------------------------------------
1     v9_flip0.7_ma0.40_adx25   $215.80   587     62.3%   1.18
2     v9_flip0.6_ma0.35_adx23   $208.40   569     61.5%   1.12
3     v9_flip0.7_ma0.35_adx25   $205.20   601     60.8%   1.09
4     v9_flip0.6_ma0.40_adx21   $198.70   553     61.2%   1.07
5     v9_flip0.5_ma0.35_adx23   $192.50   569     60.9%   1.02
📊 BASELINE COMPARISON:
v9 baseline: $192.00/1k (current production)
Best found: $215.80/1k (+12.4% improvement) ✅
======================================================================
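The baseline comparison is plain relative improvement. A small sketch of the arithmetic, using three rows copied from the sample dashboard (the function and variable names are made up for illustration and are not taken from status.py):

```python
# Rows copied from the sample dashboard: (strategy, P&L per $1k).
SAMPLE_RESULTS = [
    ("v9_flip0.6_ma0.35_adx23", 208.40),
    ("v9_flip0.7_ma0.40_adx25", 215.80),
    ("v9_flip0.5_ma0.35_adx23", 192.50),
]
BASELINE_PNL = 192.00  # v9 baseline shown in the dashboard


def improvement_pct(pnl: float, baseline: float = BASELINE_PNL) -> float:
    """Relative improvement over the baseline, in percent."""
    return (pnl - baseline) / baseline * 100.0


# Rank by P&L, best first, then compare the leader against the baseline.
ranked = sorted(SAMPLE_RESULTS, key=lambda row: row[1], reverse=True)
best_name, best_pnl = ranked[0]
print(f"{best_name}: {improvement_pct(best_pnl):+.1f}%")  # v9_flip0.7_ma0.40_adx25: +12.4%
```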
Strategy Naming Convention
Format: {indicator}_{param1}_{param2}_{param3}...
Example: v9_flip0.7_ma0.40_adx25
- v9: Money Line indicator
- flip0.7: flip_threshold = 0.7 (70% EMA flip confirmation)
- ma0.40: ma_gap = 0.40 (MA50-MA200 gap bonus threshold)
- adx25: momentum_adx = 25 (ADX requirement for momentum filter)
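If you need to go from a strategy name back to its parameters (for example when copying a winner into TradingView), the convention is regular enough to parse. A minimal sketch, assuming only the three v9 prefixes listed here; `parse_strategy_name` and `format_strategy_name` are illustrative helpers, not functions from the cluster code:

```python
import re

# Prefix -> full parameter name, following the naming convention in this guide.
PREFIXES = {"flip": "flip_threshold", "ma": "ma_gap", "adx": "momentum_adx"}
_TOKEN = re.compile(r"^([a-z]+)(\d+(?:\.\d+)?)$")


def parse_strategy_name(name: str) -> dict:
    """Split 'v9_flip0.7_ma0.40_adx25' into the indicator and its parameters."""
    indicator, *tokens = name.split("_")
    params = {}
    for token in tokens:
        match = _TOKEN.match(token)
        if not match or match.group(1) not in PREFIXES:
            raise ValueError(f"unrecognised token {token!r} in {name!r}")
        prefix, raw = match.groups()
        # Values with a decimal point become floats, the rest integers.
        params[PREFIXES[prefix]] = float(raw) if "." in raw else int(raw)
    return {"indicator": indicator, "params": params}


def format_strategy_name(indicator: str, flip_threshold: float,
                         ma_gap: float, momentum_adx: int) -> str:
    """Inverse of parse_strategy_name for the v9 parameter set."""
    return f"{indicator}_flip{flip_threshold:.1f}_ma{ma_gap:.2f}_adx{momentum_adx:d}"
```

Unknown prefixes raise an error rather than being skipped, so a typo in a name cannot silently drop a parameter.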
🔧 Troubleshooting
Problem: Workers not picking up jobs
Check worker health:
ssh root@10.10.254.106 'pgrep -f backtester || echo IDLE'
ssh root@10.10.254.106 'ssh root@10.20.254.100 "pgrep -f backtester || echo IDLE"'
View worker logs:
ssh root@10.10.254.106 'tail -f /root/optimization-cluster/logs/worker.log'
Restart worker manually:
ssh root@10.10.254.106 'cd /root/optimization-cluster && python3 worker.py jobs/v9_moneyline_*.json'
Problem: Jobs stuck in "running" status
Reset stale jobs (>30 minutes):
sqlite3 cluster/strategies.db <<EOF
UPDATE jobs
SET status = 'queued', worker_id = NULL, started_at = NULL
WHERE status = 'running'
AND started_at < datetime('now', '-30 minutes');
EOF
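The same reset can be done from Python if you want it in a script or a cron job. This is a sketch built around the SQL above. It assumes `started_at` is stored as UTC text in SQLite's default `YYYY-MM-DD HH:MM:SS` format (what `datetime('now')` produces), and the function name is hypothetical.

```python
import sqlite3


def reset_stale_jobs(conn: sqlite3.Connection, max_age_minutes: int = 30) -> int:
    """Requeue jobs that have been 'running' longer than max_age_minutes.

    Returns the number of jobs that were reset.
    """
    cursor = conn.execute(
        """
        UPDATE jobs
        SET status = 'queued', worker_id = NULL, started_at = NULL
        WHERE status = 'running'
          AND started_at < datetime('now', ?)
        """,
        (f"-{max_age_minutes} minutes",),  # bound as a datetime() modifier
    )
    conn.commit()
    return cursor.rowcount
```

Usage would look like `reset_stale_jobs(sqlite3.connect("cluster/strategies.db"))`, run while the master is idle or stopped to avoid lock contention.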
Problem: Database locked
Kill master process:
pkill -f 'python3 master.py'
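Check that no other process still has the database open (do not delete strategies.db-journal: it is SQLite's rollback journal, not a lock file, and removing it while a transaction is incomplete can corrupt the database):
fuser cluster/strategies.db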
Restart master:
python3 master.py
Problem: Out of disk space
Archive old results:
cd cluster/results
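mkdir -p ~/backups
# Only delete the originals if the archive was created and moved successfully
tar -czf archive_$(date +%Y%m%d).tar.gz archive/ \
  && mv archive_$(date +%Y%m%d).tar.gz ~/backups/ \
  && rm -rf archive/*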
Clean worker temp files:
ssh root@10.10.254.106 'rm -rf /root/optimization-cluster/results/archive/*'
ssh root@10.10.254.106 'ssh root@10.20.254.100 "rm -rf /root/optimization-cluster/results/archive/*"'
🎯 Adding Custom Strategies
Example: Volume Profile Indicator
- Implement backtester module:
vim backtester/volume_profile.py
- Add job generation in master.py:
def generate_volume_jobs(self):
    """Generate volume profile jobs"""
    for window in [20, 50, 100]:
        for threshold in [0.6, 0.7, 0.8]:
            params = {
                'profile_window': window,
                'entry_threshold': threshold,
                'stop_loss_atr': 3.0
            }
            self.queue.create_job(
                'volume_profile',
                params,
                PRIORITY_MEDIUM
            )
- Update worker.py to handle new indicator:
def execute(self):
    if self.job['indicator'] == 'v9_moneyline':
        result = self.run_v9_backtest()
    elif self.job['indicator'] == 'volume_profile':
        result = self.run_volume_backtest()
    # ...
- Deploy updates:
./setup_cluster.sh # Redeploy with new code
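As more indicators get added, the if/elif chain in worker.py will keep growing. One possible alternative, shown here only as a sketch and not as how worker.py is written, is a small registry that maps the job's indicator name to a handler. The handler bodies below are placeholders, and every name except the two indicator strings is hypothetical.

```python
from typing import Callable, Dict

# Registry: indicator name -> function that runs the backtest for it.
BACKTEST_HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def register(indicator: str):
    """Decorator that adds a backtest function to the registry."""
    def decorator(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        BACKTEST_HANDLERS[indicator] = func
        return func
    return decorator


@register("v9_moneyline")
def run_v9_backtest(params: dict) -> dict:
    # Placeholder: a real handler would call into the backtester here.
    return {"indicator": "v9_moneyline", "params": params}


@register("volume_profile")
def run_volume_backtest(params: dict) -> dict:
    # Placeholder: a real handler would call into the backtester here.
    return {"indicator": "volume_profile", "params": params}


def execute(job: dict) -> dict:
    """Dispatch a job to its handler, failing loudly on unknown indicators."""
    try:
        handler = BACKTEST_HANDLERS[job["indicator"]]
    except KeyError:
        raise ValueError(f"unknown indicator: {job['indicator']!r}") from None
    return handler(job["params"])
```

With this shape, adding an indicator means adding one decorated function, and a job for an indicator that was never deployed to the worker fails immediately instead of being silently ignored.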
📈 Performance Expectations
Cluster Capacity
Hardware:
- 64 total hardware threads (32 per server; the EPYC 7282 and 7302 are each 16-core/32-thread parts)
- 44 threads in use at 70% utilization
- 22 parallel backtests per server
Throughput:
- ~1.6s per v9 backtest (EPYC 7282/7302)
- ~49,000 backtests per day
- ~1.47 million backtests per month
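The worker count and the monthly figure follow from simple arithmetic, sketched below. The daily throughput is taken as given from this guide rather than derived from the per-backtest time, and the constant names are made up for illustration.

```python
import math

TOTAL_CPUS = 64             # 32 per server, two servers
TARGET_UTILIZATION = 0.70   # leave headroom for the OS and other services
BACKTESTS_PER_DAY = 49_000  # stated estimate, taken as given

# 64 * 0.70 = 44.8, rounded down to whole workers.
parallel_workers = math.floor(TOTAL_CPUS * TARGET_UTILIZATION)
workers_per_server = parallel_workers // 2

# A 30-day month at the stated daily rate.
backtests_per_month = BACKTESTS_PER_DAY * 30

print(parallel_workers, workers_per_server, backtests_per_month)  # 44 22 1470000
```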
Optimization Timeline
Week 1 (v9 refinement):
- 27 initial jobs (flip × ma_gap × adx)
- Expand to 81 jobs (add long_pos, short_pos variations)
- Target: Find >$200/1k P&L configuration
Week 2-3 (New indicators):
- Volume profile: 27 configurations
- Order flow: 27 configurations
- Market structure: 27 configurations
- Target: Find >$250/1k P&L strategy
Week 4+ (Advanced):
- Multi-timeframe: 81 configurations
- ML-based scoring: 100+ hyperparameter combinations
- Target: Find >$300/1k P&L strategy
🔒 Safety & Deployment
Validation Gates
Before deploying strategy to production:
- Trade Count: ≥700 trades (statistical significance)
- Win Rate: 63-68% realistic range
- Profit Factor: ≥1.5 solid edge
- Max Drawdown: <20% manageable
- Sharpe Ratio: ≥1.0 risk-adjusted
- Consistency: Top 3 for 7 days straight
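These gates are easy to encode so that a candidate is checked the same way every time. A minimal sketch, assuming drawdown is expressed as a percentage and that Sharpe ratio and days-in-top-3 are available per strategy (the sample dashboard does not show them); the class and field names are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class StrategyStats:
    trades: int
    win_rate: float          # percent
    profit_factor: float
    max_drawdown_pct: float  # percent (assumed unit)
    sharpe: float
    days_in_top3: int        # consecutive days ranked in the top 3


def failed_gates(stats: StrategyStats) -> list:
    """Return the names of the validation gates this strategy does not pass."""
    checks = [
        ("trade_count", stats.trades >= 700),
        ("win_rate", 63.0 <= stats.win_rate <= 68.0),
        ("profit_factor", stats.profit_factor >= 1.5),
        ("max_drawdown", stats.max_drawdown_pct < 20.0),
        ("sharpe_ratio", stats.sharpe >= 1.0),
        ("consistency", stats.days_in_top3 >= 7),
    ]
    return [name for name, passed in checks if not passed]
```

For example, the top row of the sample dashboard (587 trades, 62.3% win rate, profit factor 1.18) would fail the trade count, win rate, and profit factor gates whatever its other numbers are, so it would not yet qualify for production.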
Manual Deployment Process
# 1. Review top strategy
sqlite3 cluster/strategies.db "SELECT * FROM strategies WHERE name = 'v9_flip0.7_ma0.40_adx25'"
# 2. Extract parameters
# flip_threshold: 0.7
# ma_gap: 0.40
# momentum_adx: 25
# 3. Update TradingView indicator
# Edit moneyline_v9_ma_gap.pinescript
# Change parameters to winning values
# 4. Update TradingView alerts
# Verify alerts fire with new parameters
# 5. Monitor first 10 trades
# Ensure behavior matches backtest
# 6. Full deployment after validation
Auto-Deployment (Future)
Once confident in system:
- Master marks top strategy as status = 'staging'
- User reviews via web dashboard
- User clicks "Deploy" button
- System auto-updates TradingView via API
- Alerts regenerated with new parameters
- First 10 trades monitored closely
📞 Support
- Main docs: /home/icke/traderv4/.github/copilot-instructions.md
- Cluster README: /home/icke/traderv4/cluster/README.md
- Backtester docs: /home/icke/traderv4/backtester/README.md (if it exists)
Ready to deploy?
cd /home/icke/traderv4/cluster
./setup_cluster.sh
Let the machines find better strategies while you sleep! 🚀