New Features:
- Distributed coordinator orchestrates 2x AMD EPYC 16-core servers
- 64 total cores processing 12M parameter combinations (70% CPU limit)
- Worker1 (pve-nu-monitor01): Direct SSH access at 10.10.254.106
- Worker2 (bd-host01): 2-hop SSH through worker1 (10.20.254.100)
- Web UI at /cluster shows real-time status and AI recommendations
- API endpoint /api/cluster/status serves cluster metrics
- Auto-refresh every 30s with top strategies and actionable insights

Files Added:
- cluster/distributed_coordinator.py (510 lines) - Main orchestrator
- cluster/distributed_worker.py (271 lines) - Worker1 script
- cluster/distributed_worker_bd_clean.py (275 lines) - Worker2 script
- cluster/monitor_bd_host01.sh - Monitoring script
- app/api/cluster/status/route.ts (274 lines) - API endpoint
- app/cluster/page.tsx (258 lines) - Web UI
- cluster/CLUSTER_SETUP.md - Complete setup and access documentation

Technical Details:
- SQLite database tracks chunk assignments
- 10,000 combinations per chunk (1,195 total chunks)
- multiprocessing.Pool with 70% CPU limit (22 cores per EPYC)
- SSH/SCP for deployment and result collection
- Handles 2-hop SSH for bd-host01 access
- Results in CSV format with top strategies ranked

Access Documentation:
- Worker1: ssh root@10.10.254.106
- Worker2: ssh root@10.10.254.106 "ssh root@10.20.254.100"
- Web UI: http://localhost:3001/cluster
- See CLUSTER_SETUP.md for complete guide

Status: Deployed and operational
Continuous Optimization Cluster
24/7 automated strategy optimization across 2 EPYC servers (64 cores total).
🏗️ Architecture
Master (your local machine)
↓ Job Queue (file-based)
↓
Worker 1: pve-nu-monitor01 (22 workers @ 70% CPU)
Worker 2: srv-bd-host01 (22 workers @ 70% CPU)
↓
Results Database (SQLite)
↓
Top Strategies (auto-deployment ready)
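The file-based job queue in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the actual master.py implementation: the queue directory, the priority-prefixed filename scheme, and the claim-by-rename convention are all assumptions.

```python
import json
import uuid
from pathlib import Path

QUEUE_DIR = Path("cluster/queue")  # assumed queue location

def create_job(indicator_type, params, priority=2, queue_dir=QUEUE_DIR):
    """Write a job as a JSON file into the queue directory."""
    queue_dir.mkdir(parents=True, exist_ok=True)
    job = {
        "id": uuid.uuid4().hex,
        "indicator_type": indicator_type,
        "params": params,
        "priority": priority,
    }
    # Priority prefix makes a plain lexicographic sort a priority sort.
    path = queue_dir / f"{priority}_{job['id']}.json"
    path.write_text(json.dumps(job))
    return path

def claim_next_job(worker_id, queue_dir=QUEUE_DIR):
    """Claim the highest-priority queued job by renaming its file."""
    for path in sorted(queue_dir.glob("*.json")):
        claimed = path.with_suffix(f".{worker_id}.running")
        try:
            path.rename(claimed)  # rename is atomic on POSIX
            return json.loads(claimed.read_text())
        except FileNotFoundError:
            continue  # another worker claimed it first
    return None
```

The rename-to-claim trick means two workers polling the same directory cannot both pick up one job: whichever rename lands second raises FileNotFoundError and moves on.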
🚀 Quick Start
1. Setup Cluster
cd /home/icke/traderv4/cluster
chmod +x setup_cluster.sh
./setup_cluster.sh
This will:
- Create /root/optimization-cluster on both EPYC servers
- Install Python dependencies (pandas, numpy)
- Copy backtester code and OHLCV data
- Install worker scripts
2. Start Master Controller
python3 master.py
Master will:
- Generate initial job queue (v9 parameter sweep: 27 combinations)
- Monitor both workers every 60 seconds
- Assign jobs to idle workers
- Collect and rank results
- Display top performers
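The initial v9 sweep (27 combinations) is a product of three values per knob. The parameter names come from this document; the specific values below are placeholders to show the shape of the sweep.

```python
from itertools import product

# Hypothetical sweep values: 3 x 3 x 3 = 27 combinations, as in the doc.
FLIP_THRESHOLDS = [0.5, 0.6, 0.7]
MA_GAPS = [0.25, 0.35, 0.45]
MOMENTUM_ADX = [20, 23, 26]

def v9_parameter_sweep():
    """Yield one params dict per combination of the three v9 knobs."""
    for flip, gap, adx in product(FLIP_THRESHOLDS, MA_GAPS, MOMENTUM_ADX):
        yield {"flip_threshold": flip, "ma_gap": gap, "momentum_adx": adx}
```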
3. Monitor Progress
Terminal 1 - Master logs:
cd /home/icke/traderv4/cluster
python3 master.py
Terminal 2 - Job queue:
watch -n 5 'ls -1 cluster/queue/*.json 2>/dev/null | wc -l'
Terminal 3 - Results:
watch -n 10 'sqlite3 cluster/strategies.db "SELECT name, pnl_per_1k, trade_count, win_rate FROM strategies ORDER BY pnl_per_1k DESC LIMIT 5"'
📊 Database Schema
strategies table
- name: Strategy identifier (e.g., "v9_flip0.6_ma0.35_adx23")
- indicator_type: Indicator family (v9_moneyline, volume_profile, etc.)
- params: JSON parameter configuration
- pnl_per_1k: Performance metric ($ PnL per $1k capital)
- trade_count: Total trades in backtest
- win_rate: Percentage of winning trades
- profit_factor: Gross profit / gross loss
- max_drawdown: Largest peak-to-trough decline
- status: pending/running/completed/deployed
jobs table
- job_file: Filename in queue directory
- priority: 1 (high), 2 (medium), 3 (low)
- worker_id: Which worker is processing the job
- status: queued/running/completed/failed
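The two tables can be initialised with a small sqlite3 helper. Only the column names above are documented; the column types, defaults, and the started_at column (referenced by the stale-job reset in Troubleshooting) are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS strategies (
    name            TEXT PRIMARY KEY,
    indicator_type  TEXT,
    params          TEXT,              -- JSON blob
    pnl_per_1k      REAL,
    trade_count     INTEGER,
    win_rate        REAL,
    profit_factor   REAL,
    max_drawdown    REAL,
    status          TEXT DEFAULT 'pending'
);
CREATE TABLE IF NOT EXISTS jobs (
    job_file    TEXT PRIMARY KEY,
    priority    INTEGER DEFAULT 2,
    worker_id   TEXT,
    status      TEXT DEFAULT 'queued',
    started_at  TEXT                   -- used by the stale-job reset
);
"""

def init_db(path="cluster/strategies.db"):
    """Open (or create) the strategies database with both tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    conn.commit()
    return conn
```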
🎯 Job Priorities
Priority 1 (HIGH): Known good strategies
- v9 refinements (flip_threshold, ma_gap, momentum_adx)
- Proven concepts with minor tweaks
Priority 2 (MEDIUM): New concepts
- Volume profile integration
- Order flow analysis
- Market structure detection
Priority 3 (LOW): Experimental
- ML-based indicators
- Neural network predictions
- Complex multi-timeframe logic
📈 Adding New Strategies
Example: Test volume profile indicator
from cluster.master import ClusterMaster

master = ClusterMaster()

# Add volume profile jobs
for profile_window in [20, 50, 100]:
    for entry_threshold in [0.6, 0.7, 0.8]:
        params = {
            'profile_window': profile_window,
            'entry_threshold': entry_threshold,
            'stop_loss_atr': 3.0,
        }
        master.queue.create_job(
            'volume_profile',
            params,
            priority=2,  # MEDIUM priority
        )
🔒 Safety Features
- Resource Limits: Each worker respects 70% CPU cap
- Memory Management: 4GB per worker, prevents OOM
- Disk Monitoring: Auto-cleanup old results when space low
- Error Recovery: Failed jobs automatically requeued
- Manual Approval: Top strategies wait for user deployment
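The 70% CPU cap translates directly into a multiprocessing.Pool size: 70% of 32 logical cores is 22 workers, matching the per-server worker counts above. A minimal sketch (the helper name and the run_backtest callable are hypothetical):

```python
import os

CPU_LIMIT = 0.70  # fraction of each host's cores a worker may use

def pool_size(total_cores=None, limit=CPU_LIMIT):
    """Number of Pool workers under the CPU cap (e.g. 32 cores -> 22)."""
    total = total_cores if total_cores is not None else (os.cpu_count() or 1)
    return max(1, int(total * limit))

# Usage (run_backtest and jobs are hypothetical):
# from multiprocessing import Pool
# with Pool(processes=pool_size()) as pool:
#     results = pool.map(run_backtest, jobs)
```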
🏆 Auto-Deployment Gates
Strategy must pass ALL checks before auto-deployment:
- Trade Count: Minimum 700 trades (statistical significance)
- Win Rate: 63-68% realistic range
- Profit Factor: ≥1.5 (solid edge)
- Max Drawdown: <20% manageable risk
- Sharpe Ratio: ≥1.0 risk-adjusted returns
- Consistency: Top 3 in rolling 7-day window
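The per-strategy gates can be expressed as one boolean check over a result row. Field names mirror the strategies schema, though sharpe_ratio is assumed to be available even though the documented schema does not list it; the consistency gate (top 3 in a rolling 7-day window) needs history and is left out of this sketch.

```python
def passes_deployment_gates(s):
    """True only if a strategy result clears every per-strategy gate."""
    return (
        s["trade_count"] >= 700          # statistical significance
        and 63.0 <= s["win_rate"] <= 68.0  # realistic range
        and s["profit_factor"] >= 1.5    # solid edge
        and s["max_drawdown"] < 20.0     # manageable risk
        and s["sharpe_ratio"] >= 1.0     # risk-adjusted returns
    )
```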
📋 Operational Commands
View Queue Status
ls -lh cluster/queue/
Check Worker Health
ssh root@10.10.254.106 'pgrep -f backtester'
ssh root@10.10.254.106 'ssh root@10.20.254.100 "pgrep -f backtester"'
View Top 10 Strategies
sqlite3 cluster/strategies.db <<EOF
SELECT
name,
printf('$%.2f', pnl_per_1k) as pnl,
trade_count as trades,
printf('%.1f%%', win_rate) as wr,
printf('%.2f', profit_factor) as pf
FROM strategies
WHERE status = 'completed'
ORDER BY pnl_per_1k DESC
LIMIT 10;
EOF
Force Job Priority
# Make specific job high priority
sqlite3 cluster/strategies.db "UPDATE jobs SET priority = 1 WHERE job_file LIKE '%v9_flip0.7%'"
Restart Master (safe)
# Ctrl+C in master.py terminal
# Jobs remain in queue, workers continue
# Restart: python3 master.py
🔧 Troubleshooting
Workers not picking up jobs
# Check worker logs
ssh root@10.10.254.106 'tail -f /root/optimization-cluster/logs/worker.log'
Jobs stuck in "running"
# Reset stale jobs (>30 min)
sqlite3 cluster/strategies.db <<EOF
UPDATE jobs
SET status = 'queued', worker_id = NULL
WHERE status = 'running'
AND started_at < datetime('now', '-30 minutes');
EOF
Disk space low
# Archive old results
cd cluster/results
tar -czf archive_$(date +%Y%m%d).tar.gz archive/
mv archive_$(date +%Y%m%d).tar.gz ~/backups/
rm -rf archive/*
📈 Expected Performance
Current baseline (v9): $192 P&L per $1k capital
Cluster capacity:
- 64 cores total (44 cores @ 70% utilization)
- ~22 parallel backtests per server (44 cluster-wide)
- ~1.6s per backtest (v9 on EPYC)
- ~49,000 backtests per day
Optimization potential:
- Test 100,000+ parameter combinations per week
- Discover strategies beyond manual optimization
- Continuous adaptation to market regime changes
🎯 Roadmap
Phase 1 (Week 1): v9 refinement
- Exhaustive parameter sweep
- Find optimal flip_threshold, ma_gap, momentum_adx
- Target: >$200/1k P&L
Phase 2 (Week 2-3): Volume integration
- Volume profile entries
- Order flow imbalance detection
- Target: >$250/1k P&L
Phase 3 (Week 4+): Advanced concepts
- Multi-timeframe confirmation
- Market structure analysis
- ML-based signal quality scoring
- Target: >$300/1k P&L
📞 Contact
Questions? Check copilot-instructions.md or ask in main project chat.