Distributed Continuous Optimization Cluster

24/7 automated strategy discovery across 2 EPYC servers (64 cores total). Explores the full indicator/parameter space to find the best-performing trading approach.

🏗️ Architecture

Three-Component Distributed System:

  1. Coordinator (distributed_coordinator.py) - Master orchestrator running on srvdocker02

    • Defines parameter grid (14 dimensions, ~500k combinations)
    • Splits work into chunks (e.g., 10,000 combos per chunk; see the chunking sketch after this list)
    • Deploys worker script to EPYC servers via SSH/SCP
    • Assigns chunks to idle workers dynamically
    • Collects CSV results and imports to SQLite database
    • Tracks progress (completed/running/pending chunks)
  2. Worker (distributed_worker.py) - Runs on EPYC servers

    • Integrates with existing /home/comprehensive_sweep/backtester/ infrastructure
    • Uses proven simulator.py vectorized engine and MoneyLineInputs class
    • Loads chunk spec (start_idx, end_idx from total parameter grid)
    • Generates parameter combinations via itertools.product()
    • Runs multiprocessing sweep with mp.cpu_count() workers
    • Saves results to CSV (same format as comprehensive_sweep.py)
  3. Monitor (exploration_status.py) - Real-time status dashboard

    • SSH worker health checks (active distributed_worker.py processes)
    • Chunk progress tracking (total/completed/running/pending)
    • Top 10 strategies leaderboard (P&L, trades, WR, PF, DD)
    • Best configuration details (full parameters)
    • Watch mode for continuous monitoring (30s refresh)
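
Splitting the grid into chunks (step 2 of the coordinator list above) reduces to index arithmetic over the flattened parameter grid. A minimal sketch of the idea, with illustrative names rather than the actual distributed_coordinator.py API:

def split_into_chunks(total_combos: int, chunk_size: int):
    """Yield (start_idx, end_idx) chunk specs covering [0, total_combos)."""
    for start in range(0, total_combos, chunk_size):
        yield start, min(start + chunk_size, total_combos)

# Example: 500,000 combinations at 10,000 per chunk -> 50 chunk specs
chunks = list(split_into_chunks(500_000, 10_000))
assert len(chunks) == 50 and chunks[0] == (0, 10_000)

Each spec is all a worker needs, because workers regenerate combinations deterministically from the indices (see the Parameter Space section).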

Infrastructure:

  • Worker 1: pve-nu-monitor01 (10.10.254.106) - EPYC 7282 32 threads, 62GB RAM
  • Worker 2: pve-srvmon01 (10.20.254.100, reached via 2-hop SSH through Worker 1; see the sketch below) - EPYC 7302 32 threads, 31GB RAM
  • Combined: 64 cores, ~108,000 backtests/day capacity (proven: 65,536 in 29h)
  • Existing Backtester: /home/comprehensive_sweep/backtester/ with simulator.py, indicators/, data/
  • Data: solusdt_5m.csv - Binance 5-minute OHLCV (Nov 2024 - Nov 2025)
  • Database: exploration.db SQLite with strategies/chunks/phases tables
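
Because Worker 2 is only reachable through Worker 1, every command for it is a nested SSH call. A hedged sketch of how such a health check can be issued from Python (the actual commands in exploration_status.py may differ):

import subprocess

def worker_alive(two_hop: bool = False, timeout: int = 10) -> bool:
    """True if distributed_worker.py processes exist on the target host."""
    check = "pgrep -f distributed_worker.py"
    if two_hop:
        # Worker 2 (10.20.254.100) sits behind Worker 1 (10.10.254.106)
        cmd = ["ssh", "root@10.10.254.106", f"ssh root@10.20.254.100 '{check}'"]
    else:
        cmd = ["ssh", "root@10.10.254.106", check]
    # pgrep exits 0 only when matches are found; SSH propagates that status.
    # Raises subprocess.TimeoutExpired if the connection hangs.
    result = subprocess.run(cmd, capture_output=True, timeout=timeout)
    return result.returncode == 0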

🚀 Quick Start

1. Run Test Sweep

Verify the system works before large-scale deployment:

cd /home/icke/traderv4/cluster

# Modify distributed_coordinator.py temporarily (lines 120-135)
# Reduce parameter ranges to 2-3 values per dimension
# Total: ~500-1000 combinations for testing

# Run test
python3 distributed_coordinator.py --chunk-size 100

# Monitor in separate terminal
python3 exploration_status.py --watch

Expected: 5-10 chunks complete in 30-60 minutes, all results in exploration.db

Verify:

  • SSH commands execute successfully
  • Worker script deploys to /home/comprehensive_sweep/backtester/scripts/
  • CSV results appear in cluster/distributed_results/
  • Database populated with strategies (check with sqlite3 exploration.db "SELECT COUNT(*) FROM strategies")
  • Monitoring dashboard shows accurate worker/chunk status

2. Run Full v9 Parameter Sweep

After test succeeds, explore full parameter space:

cd /home/icke/traderv4/cluster

# Restore full parameter ranges in distributed_coordinator.py
# Total: the full 14-dimension grid (exact count in the Parameter Space section below)

# Start exploration (runs in background)
nohup python3 distributed_coordinator.py --chunk-size 10000 > sweep.log 2>&1 &

# Monitor progress
python3 exploration_status.py --watch
# OR
watch -n 60 'python3 exploration_status.py'

# Check logs
tail -f sweep.log

Expected Results:

  • Duration: ~3.5 hours with 64 cores
  • Find 5-10 configurations with P&L > $250/1k (baseline: $192/1k)
  • Quality filters: 700+ trades, 50-70% WR, PF ≥ 1.2

3. Query Top Strategies

# Top 20 performers
sqlite3 cluster/exploration.db <<EOF
SELECT 
    params_json,
    printf('$%.2f', pnl_per_1k) as pnl,
    trades,
    printf('%.1f%%', win_rate * 100) as wr,
    printf('%.2f', profit_factor) as pf,
    printf('%.1f%%', max_drawdown * 100) as dd,
    DATE(tested_at) as tested
FROM strategies
WHERE trades >= 700 
  AND win_rate >= 0.50 
  AND win_rate <= 0.70
  AND profit_factor >= 1.2
ORDER BY pnl_per_1k DESC
LIMIT 20;
EOF

📊 Parameter Space (14 Dimensions)

v9 Money Line Configuration:

ParameterGrid(
    flip_thresholds=[0.4, 0.5, 0.6, 0.7],           # EMA flip confirmation (4 values)
    ma_gaps=[0.20, 0.30, 0.40, 0.50],                # MA50-MA200 convergence bonus (4 values)
    adx_mins=[18, 21, 24, 27],                       # ADX requirement for momentum filter (4 values)
    long_pos_maxs=[60, 65, 70, 75],                  # Price position for LONG momentum (4 values)
    short_pos_mins=[20, 25, 30, 35],                 # Price position for SHORT momentum (4 values)
    cooldowns=[1, 2, 3, 4],                          # Bars between signals (4 values)
    position_sizes=[1.0],                            # Full position (1 value fixed)
    tp1_multipliers=[1.5, 2.0, 2.5],                 # TP1 as ATR multiple (3 values)
    tp2_multipliers=[3.0, 4.0, 5.0],                 # TP2 as ATR multiple (3 values)
    sl_multipliers=[2.0, 3.0, 4.0],                  # SL as ATR multiple (3 values)
    tp1_close_percents=[0.5, 0.6, 0.7, 0.75],       # TP1 close % (4 values)
    trailing_multipliers=[1.0, 1.5, 2.0],            # Trailing stop multiplier (3 values)
    vol_mins=[0.8, 1.0, 1.2],                        # Minimum volume ratio (3 values)
    max_bars_list=[100, 150, 200]                    # Max bars in position (3 values)
)

# Total: 4×4×4×4×4×4×1×3×3×3×4×3×3×3 = 11,943,936 combinations
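
Because itertools.product() enumerates combinations in a fixed, deterministic order, a worker can reconstruct exactly its slice of the grid from (start_idx, end_idx) alone. A minimal sketch of that mechanism (three dimensions shown for brevity; not the literal distributed_worker.py code):

from itertools import islice, product

grid = {
    "flip_threshold": [0.4, 0.5, 0.6, 0.7],
    "tp1_multiplier": [1.5, 2.0, 2.5],
    "sl_multiplier": [2.0, 3.0, 4.0],
}

def combos_for_chunk(grid: dict, start_idx: int, end_idx: int):
    """Yield parameter dicts for combinations start_idx..end_idx-1."""
    keys = list(grid)
    all_combos = product(*(grid[k] for k in keys))  # deterministic order
    for values in islice(all_combos, start_idx, end_idx):
        yield dict(zip(keys, values))

# A worker assigned chunk [12, 24) regenerates exactly those 12 combos
for params in combos_for_chunk(grid, 12, 24):
    print(params)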

🎯 Quality Filters

Applied to all strategy results:

  • Minimum trades: 700+ (statistical significance)
  • Win rate range: 50-70% (realistic, avoids overfitting)
  • Profit factor: ≥ 1.2 (solid edge)
  • Max drawdown: Tracked but no hard limit (informational)

Why these filters:

  • Trade count validates statistical robustness
  • WR range prevents curve-fitting (>70% suggests overfitting; ≤50% is no better than a coin flip)
  • PF threshold ensures strategy has actual edge
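
The same filters can be applied in pandas to a raw worker CSV before it ever reaches SQL. A sketch assuming the CSV exposes columns named like the strategies table (trades, win_rate, profit_factor, pnl_per_1k):

import pandas as pd

df = pd.read_csv("cluster/distributed_results/worker1_chunk_0.csv")
passing = df[
    (df["trades"] >= 700)
    & df["win_rate"].between(0.50, 0.70)    # decimal win rate, as stored
    & (df["profit_factor"] >= 1.2)
].sort_values("pnl_per_1k", ascending=False)
print(f"{len(passing)}/{len(df)} configs pass the quality filters")
print(passing.head(10))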

📈 Expected Results

Current Baseline (v9 default parameters):

  • P&L: $192 per $1k capital
  • Trades: ~700
  • Win Rate: ~61%
  • Profit Factor: ~1.4

Optimization Goals:

  • Target: >$250/1k P&L (30% improvement)
  • Stretch: >$300/1k P&L (56% improvement)
  • Expected: Find 5-10 configurations meeting quality filters with P&L > $250/1k

Why achievable:

  • 500k combinations vs 27 tested in narrow sweep
  • Full parameter space exploration vs limited grid
  • Proven infrastructure (65,536 backtests completed successfully)

🔄 Continuous Exploration Roadmap

Phase 1: v9 Money Line Parameter Optimization (~500k combos, 3.5h)

  • Status: READY TO RUN
  • Goal: Find optimal flip_threshold, ma_gap, momentum filters
  • Expected: >$250/1k P&L

Phase 2: RSI Divergence Integration (~100k combos, 45min)

  • Add RSI divergence detection
  • Combine with v9 momentum filter
  • Parameters: RSI lookback, divergence strength threshold
  • Goal: Catch trend reversals early

Phase 3: Volume Profile Analysis (~200k combos, 1.5h)

  • Volume profile zones (POC, VAH, VAL)
  • Order flow imbalance detection
  • Parameters: Profile window, entry threshold, confirmation bars
  • Goal: Better entry timing

Phase 4: Multi-Timeframe Confirmation (~150k combos, 1h)

  • 5min + 15min + 1H alignment
  • Higher timeframe trend filter
  • Parameters: Timeframes to use, alignment strictness
  • Goal: Reduce false signals

Phase 5: Hybrid Indicators (~50k combos, 30min)

  • Combine best performers from Phases 1-4
  • Test cross-strategy synergy
  • Goal: Break $300/1k barrier

Phase 6: ML-Based Optimization (~100k+ combos, 1h+)

  • Feature engineering from top strategies
  • Gradient boosting / random forest
  • Genetic algorithm parameter tuning
  • Goal: Discover non-obvious patterns

📁 File Structure

cluster/
├── distributed_coordinator.py    # Master orchestrator (650 lines)
├── distributed_worker.py         # Worker script (350 lines)
├── exploration_status.py         # Monitoring dashboard (200 lines)
├── exploration.db                # SQLite results database
├── distributed_results/          # CSV results from workers
│   ├── worker1_chunk_0.csv
│   ├── worker1_chunk_1.csv
│   └── worker2_chunk_0.csv
└── README.md                     # This file

/home/comprehensive_sweep/backtester/  (on EPYC servers)
├── simulator.py                  # Core vectorized engine
├── indicators/
│   ├── money_line.py             # MoneyLineInputs class
│   └── ...
├── data/
│   └── solusdt_5m.csv            # Binance 5-minute OHLCV
├── scripts/
│   ├── comprehensive_sweep.py    # Original multiprocessing sweep
│   └── distributed_worker.py     # Deployed by coordinator
└── .venv/                        # Python 3.11.2, pandas, numpy

💾 Database Schema

strategies table

CREATE TABLE strategies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    phase_id INTEGER,                -- Which exploration phase (1=v9, 2=RSI, etc.)
    params_json TEXT NOT NULL,       -- JSON parameter configuration
    pnl_per_1k REAL,                 -- Performance metric ($ PnL per $1k)
    trades INTEGER,                  -- Total trades in backtest
    win_rate REAL,                   -- Decimal win rate (0.61 = 61%)
    profit_factor REAL,              -- Gross profit / gross loss
    max_drawdown REAL,               -- Largest peak-to-trough decline (decimal)
    tested_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (phase_id) REFERENCES phases(id)
);
CREATE INDEX idx_strategies_pnl ON strategies(pnl_per_1k DESC);
CREATE INDEX idx_strategies_trades ON strategies(trades);

chunks table

CREATE TABLE chunks (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    phase_id INTEGER,
    worker_id TEXT,                  -- 'worker1' or 'worker2'
    start_idx INTEGER,               -- Start index in parameter grid
    end_idx INTEGER,                 -- End index (exclusive)
    total_combos INTEGER,            -- Total in this chunk
    status TEXT DEFAULT 'pending',   -- pending/running/completed/failed
    assigned_at TIMESTAMP,
    completed_at TIMESTAMP,
    result_file TEXT,                -- Path to CSV result file
    FOREIGN KEY (phase_id) REFERENCES phases(id)
);
CREATE INDEX idx_chunks_status ON chunks(status);

phases table

CREATE TABLE phases (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,              -- 'v9_optimization', 'rsi_divergence', etc.
    description TEXT,
    total_combinations INTEGER,      -- Total parameter combinations
    started_at TIMESTAMP,
    completed_at TIMESTAMP
);
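
A minimal sketch of the import path from a worker CSV into this schema (the param_ column prefix and phase id are assumptions for illustration; the coordinator's real import logic may differ):

import json
import sqlite3
import pandas as pd

conn = sqlite3.connect("cluster/exploration.db")
df = pd.read_csv("cluster/distributed_results/worker1_chunk_0.csv")

rows = []
for rec in df.to_dict(orient="records"):
    # Assumed layout: parameter columns share a "param_" prefix (hypothetical)
    params = {k: v for k, v in rec.items() if k.startswith("param_")}
    rows.append((
        1,  # phase_id: 1 = v9 optimization (assumed)
        json.dumps(params),
        float(rec["pnl_per_1k"]),
        int(rec["trades"]),
        float(rec["win_rate"]),
        float(rec["profit_factor"]),
        float(rec["max_drawdown"]),
    ))

conn.executemany(
    "INSERT INTO strategies (phase_id, params_json, pnl_per_1k, trades, "
    "win_rate, profit_factor, max_drawdown) VALUES (?, ?, ?, ?, ?, ?, ?)",
    rows,
)
conn.commit()
conn.close()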

🔧 Troubleshooting

SSH Connection Issues

Symptom: "Connection refused" or timeout errors

Solutions:

# Test Worker 1 connectivity
ssh root@10.10.254.106 'echo "Worker 1 OK"'

# Test Worker 2 (2-hop) connectivity
ssh root@10.10.254.106 'ssh root@10.20.254.100 "echo Worker 2 OK"'

# Check SSH keys
ssh-add -l

# Verify authorized_keys on workers
ssh root@10.10.254.106 'cat ~/.ssh/authorized_keys'

Path/Import Errors on Workers

Symptom: "ModuleNotFoundError" or "FileNotFoundError"

Solutions:

# Verify backtester exists on Worker 1
ssh root@10.10.254.106 'ls -lah /home/comprehensive_sweep/backtester/'

# Check Python environment
ssh root@10.10.254.106 'cd /home/comprehensive_sweep/backtester && source .venv/bin/activate && python --version'

# Verify data file
ssh root@10.10.254.106 'ls -lh /home/comprehensive_sweep/backtester/data/solusdt_5m.csv'

# Check distributed_worker.py deployment
ssh root@10.10.254.106 'ls -lh /home/comprehensive_sweep/backtester/scripts/distributed_worker.py'

Worker Processes Stuck/Hung

Symptom: exploration_status.py shows "running" but no progress

Solutions:

# Check worker processes
ssh root@10.10.254.106 'ps aux | grep distributed_worker'

# Check worker CPU usage (should be near 100% on 32 cores)
ssh root@10.10.254.106 'top -bn1 | head -20'

# Kill hung worker (coordinator will reassign chunk)
ssh root@10.10.254.106 'pkill -f distributed_worker.py'

# Check worker logs
ssh root@10.10.254.106 'tail -50 /home/comprehensive_sweep/backtester/scripts/worker_*.log'

Database Locked/Corrupt

Symptom: "database is locked" errors

Solutions:

# Check for stale locks
cd /home/icke/traderv4/cluster
fuser exploration.db

# Backup and rebuild
cp exploration.db exploration.db.backup
sqlite3 exploration.db "VACUUM;"

# Verify integrity
sqlite3 exploration.db "PRAGMA integrity_check;"

Results Not Importing

Symptom: CSVs in distributed_results/ but database empty

Solutions:

# Check CSV format
head -20 cluster/distributed_results/worker1_chunk_0.csv

# Manual import test
python3 -c "
import pandas as pd

df = pd.read_csv('cluster/distributed_results/worker1_chunk_0.csv')
print(f'Loaded {len(df)} results')
print(df.columns.tolist())
print(df.head())
"

# Check coordinator logs for import errors
grep -i "error\|exception" sweep.log | tail -20

Performance Tuning

Chunk Size Trade-offs

Small chunks (1,000-5,000):

  • Better load balancing
  • Faster feedback loop
  • More SSH/SCP overhead
  • More database writes

Large chunks (10,000-20,000):

  • Less overhead
  • Fewer database transactions
  • Less granular progress tracking
  • Wasted work if chunk fails

Recommended: 10,000 combos per chunk (good balance; e.g., a 500,000-combination phase splits into 50 chunks)

Worker Concurrency

Current: Uses mp.cpu_count() (32 workers per EPYC)

To reduce CPU load:

# In distributed_worker.py line ~280
# Change from:
workers = mp.cpu_count()
# To:
workers = int(mp.cpu_count() * 0.7)  # 70% utilization (22 workers)

Database Optimization

For large result sets (>100k strategies):

# Add indexes if queries slow
sqlite3 cluster/exploration.db <<EOF
CREATE INDEX IF NOT EXISTS idx_strategies_phase ON strategies(phase_id);
CREATE INDEX IF NOT EXISTS idx_strategies_wr ON strategies(win_rate);
CREATE INDEX IF NOT EXISTS idx_strategies_pf ON strategies(profit_factor);
ANALYZE;
EOF

Best Practices

  1. Always test with small chunk first (100-1000 combos) before full sweep
  2. Monitor regularly with exploration_status.py --watch during runs
  3. Backup database before major changes: cp exploration.db exploration.db.backup
  4. Review top strategies after each phase completion
  5. Archive old results if disk space low (CSV files can be deleted after import)
  6. Validate quality filters - adjust if too strict/lenient based on results
  7. Check worker logs if progress stalls: ssh root@10.10.254.106 'tail -f /home/comprehensive_sweep/backtester/scripts/worker_*.log'

🔗 Integration with Production Bot

After finding top strategy:

  1. Extract parameters from the database (a Python loading sketch follows this list):
sqlite3 cluster/exploration.db <<EOF
SELECT params_json FROM strategies 
WHERE id = (SELECT id FROM strategies ORDER BY pnl_per_1k DESC LIMIT 1);
EOF
  2. Update TradingView indicator (workflows/trading/moneyline_v9_ma_gap.pinescript):

    • Set flip_threshold, ma_gap, momentum_adx, etc. to optimal values
    • Test in replay mode with historical data
  3. Update bot configuration (.env file):

    • Adjust MIN_SIGNAL_QUALITY_SCORE if needed
    • Update position sizing if strategy has different risk profile
  4. Forward test (50-100 trades) before increasing capital:

    • Use SOLANA_POSITION_SIZE=10 (10% of capital)
    • Monitor win rate, P&L, drawdown
    • If metrics match backtest ± 10%, increase to full size
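
To pull the winning configuration into Python instead of the sqlite3 shell (step 1 above), something like this works; column names follow the schema in this README:

import json
import sqlite3

conn = sqlite3.connect("cluster/exploration.db")
row = conn.execute(
    "SELECT params_json, pnl_per_1k FROM strategies "
    "ORDER BY pnl_per_1k DESC LIMIT 1"
).fetchone()
conn.close()

params = json.loads(row[0])  # dict of the optimal parameters
print(f"Best P&L: ${row[1]:.2f} per $1k")
for name, value in sorted(params.items()):
    print(f"  {name} = {value}")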

📚 Support & Documentation

  • Main project docs: /home/icke/traderv4/.github/copilot-instructions.md (5,181 lines)
  • Trading goals: TRADING_GOALS.md (8-phase $106→$100k+ roadmap)
  • v9 indicator: INDICATOR_V9_MA_GAP_ROADMAP.md
  • Optimization roadmaps: SIGNAL_QUALITY_OPTIMIZATION_ROADMAP.md, POSITION_SCALING_ROADMAP.md
  • Adaptive leverage: ADAPTIVE_LEVERAGE_SYSTEM.md

🚀 Future Enhancements

Potential additions:

  1. Genetic Algorithm Optimization - Breed top performers, test offspring (toy sketch after this list)
  2. Bayesian Optimization - Guide search toward promising parameter regions
  3. Web Dashboard - Real-time browser-based monitoring (Flask/FastAPI)
  4. Telegram Alerts - Notify when exceptional strategies found (P&L > threshold)
  5. Walk-Forward Analysis - Test strategies on rolling time windows
  6. Multi-Asset Support - Extend to ETH, BTC, other Drift markets
  7. Auto-Deployment - Push top strategies to production after validation
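
As a flavor of what enhancement #1 might look like, here is a toy crossover/mutation step over parameter dicts (purely illustrative; nothing like this exists in the cluster yet):

import random

GRID = {  # per-parameter allowed values, mirroring the Phase 1 grid
    "flip_threshold": [0.4, 0.5, 0.6, 0.7],
    "tp1_multiplier": [1.5, 2.0, 2.5],
    "sl_multiplier": [2.0, 3.0, 4.0],
}

def breed(parent_a: dict, parent_b: dict, mutation_rate: float = 0.1) -> dict:
    """Uniform crossover of two top performers with per-gene mutation."""
    child = {}
    for key, choices in GRID.items():
        child[key] = random.choice([parent_a[key], parent_b[key]])
        if random.random() < mutation_rate:
            child[key] = random.choice(choices)  # mutate within allowed values
    return child

a = {"flip_threshold": 0.5, "tp1_multiplier": 2.0, "sl_multiplier": 3.0}
b = {"flip_threshold": 0.6, "tp1_multiplier": 1.5, "sl_multiplier": 4.0}
print(breed(a, b))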

Questions? Check main project documentation or ask in development chat.

Ready to start? Run test sweep first: python3 cluster/distributed_coordinator.py --chunk-size 100