trading_bot_v4/cluster/README.md
mindesbunister 2a8e04fe57 feat: Continuous optimization cluster for 2 EPYC servers
- Master controller with job queue and result aggregation
- Worker scripts for parallel backtesting (22 workers per server)
- SQLite database for strategy ranking and performance tracking
- File-based job queue (simple, robust, survives crashes)
- Auto-setup script for both EPYC servers
- Status dashboard for monitoring progress
- Comprehensive deployment guide

Architecture:
- Master: Job generation, worker coordination, result collection
- Worker 1 (pve-nu-monitor01): AMD EPYC 7282, 22 parallel jobs
- Worker 2 (srv-bd-host01): AMD EPYC 7302, 22 parallel jobs
- Total capacity: ~49,000 backtests/day (44 cores @ 70%)

Initial focus: v9 parameter refinement (27 configurations)
Target: Find strategies >$200/1k P&L (current baseline $192/1k)

Files:
- cluster/master.py: Main controller (570 lines)
- cluster/worker.py: Worker execution script (220 lines)
- cluster/setup_cluster.sh: Automated deployment
- cluster/status.py: Real-time status dashboard
- cluster/README.md: Operational documentation
- cluster/DEPLOYMENT.md: Step-by-step deployment guide
2025-11-29 22:34:52 +01:00


# Continuous Optimization Cluster
24/7 automated strategy optimization across 2 EPYC servers (64 cores total).
## 🏗️ Architecture
```
Master (your local machine)
    ↓ Job Queue (file-based)
Worker 1: pve-nu-monitor01 (22 workers @ 70% CPU)
Worker 2: srv-bd-host01 (22 workers @ 70% CPU)
    ↓ Results
Results Database (SQLite)
    ↓ Ranking
Top Strategies (auto-deployment ready)
```
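The "simple, robust, survives crashes" property of a file-based queue comes from JSON files plus atomic renames: a crash leaves unclaimed jobs on disk, and a rename race has exactly one winner. A minimal sketch of the pattern — `QUEUE_DIR`, `create_job`, and `claim_next_job` are illustrative names, not the actual `master.py` API:

```python
import json
import os
import time
import uuid

QUEUE_DIR = "cluster/queue"  # assumed layout; actual paths live in master.py

def create_job(indicator_type, params, priority=2):
    """Write a job as one JSON file; an atomic rename makes it visible."""
    job = {
        "id": uuid.uuid4().hex,
        "indicator_type": indicator_type,
        "params": params,
        "priority": priority,
        "created_at": time.time(),
    }
    os.makedirs(QUEUE_DIR, exist_ok=True)
    tmp = os.path.join(QUEUE_DIR, f".{job['id']}.tmp")
    final = os.path.join(QUEUE_DIR, f"p{priority}_{job['id']}.json")
    with open(tmp, "w") as f:
        json.dump(job, f)
    os.rename(tmp, final)  # atomic on the same filesystem
    return final

def claim_next_job(worker_id, claimed_dir="cluster/claimed"):
    """Claim the best job by renaming it out of the queue; losers retry."""
    os.makedirs(claimed_dir, exist_ok=True)
    for name in sorted(os.listdir(QUEUE_DIR)):  # p1_* sorts before p2_*
        if not name.endswith(".json"):
            continue  # skip in-flight .tmp files
        src = os.path.join(QUEUE_DIR, name)
        dst = os.path.join(claimed_dir, f"{worker_id}_{name}")
        try:
            os.rename(src, dst)  # only one worker wins the rename
            return dst
        except FileNotFoundError:
            continue  # another worker claimed it first
    return None
```

Because priority is encoded in the filename prefix, a plain lexicographic sort doubles as priority ordering.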
## 🚀 Quick Start
### 1. Setup Cluster
```bash
cd /home/icke/traderv4/cluster
chmod +x setup_cluster.sh
./setup_cluster.sh
```
This will:
- Create `/root/optimization-cluster` on both EPYC servers
- Install Python dependencies (pandas, numpy)
- Copy backtester code and OHLCV data
- Install worker scripts
### 2. Start Master Controller
```bash
python3 master.py
```
Master will:
- Generate initial job queue (v9 parameter sweep: 27 combinations)
- Monitor both workers every 60 seconds
- Assign jobs to idle workers
- Collect and rank results
- Display top performers
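The initial 27-combination sweep is a 3 × 3 × 3 grid over the three v9 knobs. A sketch of that generation step — the sweep values here are hypothetical (the real grid lives in `master.py`); only the naming scheme follows the `v9_flip…_ma…_adx…` convention used in the database:

```python
from itertools import product

# Hypothetical sweep values; the real 3 x 3 x 3 grid lives in master.py.
FLIP_THRESHOLDS = [0.5, 0.6, 0.7]
MA_GAPS = [0.30, 0.35, 0.40]
MOMENTUM_ADX = [20, 23, 26]

def v9_sweep():
    """Yield all 27 v9 parameter combinations with a stable strategy name."""
    for flip, gap, adx in product(FLIP_THRESHOLDS, MA_GAPS, MOMENTUM_ADX):
        yield {
            "name": f"v9_flip{flip}_ma{gap}_adx{adx}",
            "params": {
                "flip_threshold": flip,
                "ma_gap": gap,
                "momentum_adx": adx,
            },
        }
```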
### 3. Monitor Progress
**Terminal 1 - Master logs:**
```bash
cd /home/icke/traderv4/cluster
python3 master.py
```
**Terminal 2 - Job queue:**
```bash
watch -n 5 'ls -1 cluster/queue/*.json 2>/dev/null | wc -l'
```
**Terminal 3 - Results:**
```bash
watch -n 10 'sqlite3 cluster/strategies.db "SELECT name, pnl_per_1k, trade_count, win_rate FROM strategies ORDER BY pnl_per_1k DESC LIMIT 5"'
```
## 📊 Database Schema
### strategies table
- `name`: Strategy identifier (e.g., "v9_flip0.6_ma0.35_adx23")
- `indicator_type`: Indicator family (v9_moneyline, volume_profile, etc.)
- `params`: JSON parameter configuration
- `pnl_per_1k`: Performance metric ($ PnL per $1k capital)
- `trade_count`: Total trades in backtest
- `win_rate`: Percentage winning trades
- `profit_factor`: Gross profit / gross loss
- `max_drawdown`: Largest peak-to-trough decline
- `status`: pending/running/completed/deployed
### jobs table
- `job_file`: Filename in queue directory
- `priority`: 1 (high), 2 (medium), 3 (low)
- `worker_id`: Which worker is processing
- `status`: queued/running/completed/failed
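The field lists above translate into roughly the following DDL. This is a sketch inferred from those lists, not a dump of the real `strategies.db` — column types, `CHECK` constraints, and the `started_at` column (queried by the stale-job reset in Troubleshooting) are assumptions:

```python
import sqlite3

# Schema sketch inferred from the documented field lists; types and CHECK
# constraints are assumptions, not a dump of the real strategies.db.
SCHEMA = """
CREATE TABLE IF NOT EXISTS strategies (
    name           TEXT PRIMARY KEY,
    indicator_type TEXT,
    params         TEXT,    -- JSON parameter configuration
    pnl_per_1k     REAL,
    trade_count    INTEGER,
    win_rate       REAL,
    profit_factor  REAL,
    max_drawdown   REAL,
    status         TEXT CHECK (status IN ('pending','running','completed','deployed'))
);
CREATE TABLE IF NOT EXISTS jobs (
    job_file   TEXT PRIMARY KEY,
    priority   INTEGER CHECK (priority IN (1, 2, 3)),
    worker_id  TEXT,
    status     TEXT CHECK (status IN ('queued','running','completed','failed')),
    started_at TEXT    -- assumed column; the stale-job reset queries it
);
"""

conn = sqlite3.connect(":memory:")  # point at cluster/strategies.db in practice
conn.executescript(SCHEMA)
```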
## 🎯 Job Priorities
**Priority 1 (HIGH):** Known good strategies
- v9 refinements (flip_threshold, ma_gap, momentum_adx)
- Proven concepts with minor tweaks
**Priority 2 (MEDIUM):** New concepts
- Volume profile integration
- Order flow analysis
- Market structure detection
**Priority 3 (LOW):** Experimental
- ML-based indicators
- Neural network predictions
- Complex multi-timeframe logic
## 📈 Adding New Strategies
### Example: Test volume profile indicator
```python
from cluster.master import ClusterMaster

master = ClusterMaster()

# Add volume profile jobs (3 x 3 = 9 combinations)
for profile_window in [20, 50, 100]:
    for entry_threshold in [0.6, 0.7, 0.8]:
        params = {
            'profile_window': profile_window,
            'entry_threshold': entry_threshold,
            'stop_loss_atr': 3.0,
        }
        master.queue.create_job(
            'volume_profile',
            params,
            priority=2,  # MEDIUM priority
        )
```
## 🔒 Safety Features
1. **Resource Limits:** Each worker respects 70% CPU cap
2. **Memory Management:** 4GB per worker, prevents OOM
3. **Disk Monitoring:** Auto-cleanup old results when space low
4. **Error Recovery:** Failed jobs automatically requeued
5. **Manual Approval:** Top strategies wait for user deployment
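The 22-worker figure falls out of the 70% CPU cap: each EPYC box exposes 32 hardware threads, and 32 × 0.70 rounds down to 22 (44 across both, matching the "44 cores @ 70%" capacity figure). A sketch of that arithmetic — `worker_slots` is a hypothetical helper, and `worker.py` may size its pool differently:

```python
import math

def worker_slots(hw_threads, cpu_fraction=0.70):
    """Workers to run on a host so steady-state load stays near the CPU cap.
    Illustrative arithmetic only, not the actual worker.py sizing logic."""
    return math.floor(hw_threads * cpu_fraction)
```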
## 🏆 Auto-Deployment Gates
Strategy must pass ALL checks before auto-deployment:
1. **Trade Count:** Minimum 700 trades (statistical significance)
2. **Win Rate:** 63-68% realistic range
3. **Profit Factor:** ≥1.5 (solid edge)
4. **Max Drawdown:** <20% manageable risk
5. **Sharpe Ratio:** ≥1.0 risk-adjusted returns
6. **Consistency:** Top 3 in rolling 7-day window
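Expressed as a single predicate, the checklist looks like the sketch below. Field names follow the strategies table where possible; `sharpe_ratio` and `rolling_7d_rank` are assumed names, since neither appears in the documented schema:

```python
def passes_deployment_gates(s):
    """All six gates must pass. sharpe_ratio and rolling_7d_rank are
    assumed field names; they are not in the documented strategies schema."""
    return (
        s["trade_count"] >= 700            # statistical significance
        and 63.0 <= s["win_rate"] <= 68.0  # realistic range, in percent
        and s["profit_factor"] >= 1.5      # solid edge
        and s["max_drawdown"] < 20.0       # manageable risk, in percent
        and s["sharpe_ratio"] >= 1.0       # risk-adjusted returns
        and s["rolling_7d_rank"] <= 3      # top 3 in rolling 7-day window
    )
```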
## 📋 Operational Commands
### View Queue Status
```bash
ls -lh cluster/queue/
```
### Check Worker Health
```bash
# Worker 1 (direct)
ssh root@10.10.254.106 'pgrep -f backtester'
# Worker 2 (via worker 1 as jump host)
ssh root@10.10.254.106 'ssh root@10.20.254.100 "pgrep -f backtester"'
```
### View Top 10 Strategies
```bash
sqlite3 cluster/strategies.db <<EOF
SELECT
  name,
  printf('$%.2f', pnl_per_1k) AS pnl,
  trade_count AS trades,
  printf('%.1f%%', win_rate) AS wr,
  printf('%.2f', profit_factor) AS pf
FROM strategies
WHERE status = 'completed'
ORDER BY pnl_per_1k DESC
LIMIT 10;
EOF
```
### Force Job Priority
```bash
# Make specific job high priority
sqlite3 cluster/strategies.db "UPDATE jobs SET priority = 1 WHERE job_file LIKE '%v9_flip0.7%'"
```
### Restart Master (safe)
```bash
# Ctrl+C in master.py terminal
# Jobs remain in queue, workers continue
# Restart: python3 master.py
```
## 🔧 Troubleshooting
### Workers not picking up jobs
```bash
# Check worker logs
ssh root@10.10.254.106 'tail -f /root/optimization-cluster/logs/worker.log'
```
### Jobs stuck in "running"
```bash
# Reset stale jobs (>30 min)
sqlite3 cluster/strategies.db <<EOF
UPDATE jobs
SET status = 'queued', worker_id = NULL
WHERE status = 'running'
AND started_at < datetime('now', '-30 minutes');
EOF
```
### Disk space low
```bash
# Archive old results
cd cluster/results
tar -czf archive_$(date +%Y%m%d).tar.gz archive/
mv archive_$(date +%Y%m%d).tar.gz ~/backups/
rm -rf archive/*
```
## 📈 Expected Performance
**Current baseline (v9):** $192 P&L per $1k capital
**Cluster capacity:**
- 64 cores total (44 cores @ 70% utilization)
- ~44 parallel backtests (22 per server)
- ~1.6s per backtest (v9 on EPYC)
- **~49,000 backtests per day**
**Optimization potential:**
- Test 100,000+ parameter combinations per week
- Discover strategies beyond manual optimization
- Continuous adaptation to market regime changes
## 🎯 Roadmap
**Phase 1 (Week 1):** v9 refinement
- Exhaustive parameter sweep
- Find optimal flip_threshold, ma_gap, momentum_adx
- Target: >$200/1k P&L
**Phase 2 (Week 2-3):** Volume integration
- Volume profile entries
- Order flow imbalance detection
- Target: >$250/1k P&L
**Phase 3 (Week 4+):** Advanced concepts
- Multi-timeframe confirmation
- Market structure analysis
- ML-based signal quality scoring
- Target: >$300/1k P&L
## 📞 Contact
Questions? Check copilot-instructions.md or ask in main project chat.