docs: Add comprehensive v11 test sweep documentation and deployment script

Co-authored-by: mindesbunister <32161838+mindesbunister@users.noreply.github.com>
2025-12-06 19:18:37 +00:00
parent 4599afafaa
commit 73887ac4f3
2 changed files with 352 additions and 0 deletions
--- a/cluster/V11_TEST_SWEEP_README.md
+++ b/cluster/V11_TEST_SWEEP_README.md
@@ -0,0 +1,286 @@
 # V11 Test Parameter Sweep
 Fast validation sweep for the v11 "Money Line All Filters" indicator, run before the full 65,536-combination optimization.
 ## Overview
 **Purpose:** Verify that the v11 indicator produces valid backtest data with varied P&L before committing to the 30-35 hour full sweep.
 **Test Size:** 256 combinations (8 parameters, 2 values each: 2^8)  
 **Expected Runtime:** 6-25 minutes  
 **Workers:** 2 × 27 cores (85% CPU limit)  
 **Output:** CSV files + SQLite database
 ## Critical v11 Fix
 **v9 Bug:** Filters were calculated but NOT applied to signals (broken logic)  
 **v11 Fix:** ALL filters must pass for signal generation (lines 271-272 from pinescript)
 ```python
 # v11: Filters actually applied
 if adx_ok and volume_ok and rsi_ok and pos_ok and entry_buffer_ok:
    signals.append(...)  # Signal only fires when ALL conditions met
 ```
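The difference between the two versions can be sketched as a pair of small predicates. This is a minimal illustration, not the indicator's actual code: the filter names come from the snippet above, while `FilterState`, `signal_v9`, `signal_v11`, and the `trend_flipped` flag are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterState:
    """Per-bar filter results; field names mirror the snippet above."""
    adx_ok: bool
    volume_ok: bool
    rsi_ok: bool
    pos_ok: bool
    entry_buffer_ok: bool


def signal_v9(trend_flipped: bool, filters: FilterState) -> bool:
    # v9 behaviour as described: filters are computed but never consulted.
    return trend_flipped


def signal_v11(trend_flipped: bool, filters: FilterState) -> bool:
    # v11 behaviour: every filter must pass before a signal fires.
    return trend_flipped and all(
        (filters.adx_ok, filters.volume_ok, filters.rsi_ok,
         filters.pos_ok, filters.entry_buffer_ok)
    )


# A bar where the trend flips but ADX is too weak.
weak_trend = FilterState(adx_ok=False, volume_ok=True, rsi_ok=True,
                         pos_ok=True, entry_buffer_ok=True)
print(signal_v9(True, weak_trend), signal_v11(True, weak_trend))
```

With one failing filter, the v9-style predicate still fires while the v11-style predicate does not, which is the behaviour the test sweep is meant to confirm.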
 ## Test Parameter Grid
 8 parameters × 2 values each = 256 combinations:
 | Parameter | Values | Purpose |
 |-----------|--------|---------|
 | `flip_threshold` | 0.5, 0.6 | % price must move to flip trend |
 | `adx_min` | 18, 21 | Minimum ADX for trend strength |
 | `long_pos_max` | 75, 80 | Max price position for longs (%) |
 | `short_pos_min` | 20, 25 | Min price position for shorts (%) |
 | `vol_min` | 0.8, 1.0 | Minimum volume ratio |
 | `entry_buffer_atr` | 0.15, 0.20 | ATR buffer beyond Money Line |
 | `rsi_long_min` | 35, 40 | RSI minimum for longs |
 | `rsi_short_max` | 65, 70 | RSI maximum for shorts |
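As a rough sketch, the 256 combinations can be enumerated with `itertools.product`. The parameter names and values are taken from the table above; the `TEST_GRID` and `build_combinations` names and the enumeration order are assumptions, and the real worker may index combinations differently.

```python
from itertools import product

# Values copied from the parameter table; two per parameter.
TEST_GRID = {
    "flip_threshold": [0.5, 0.6],
    "adx_min": [18, 21],
    "long_pos_max": [75, 80],
    "short_pos_min": [20, 25],
    "vol_min": [0.8, 1.0],
    "entry_buffer_atr": [0.15, 0.20],
    "rsi_long_min": [35, 40],
    "rsi_short_max": [65, 70],
}


def build_combinations(grid):
    """Return one dict per point of the Cartesian product of all values."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*grid.values())]


combos = build_combinations(TEST_GRID)
print(len(combos))
print(combos[0])
```

The list has 2^8 = 256 entries, and the first entry uses the first value of every parameter.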
 ## Worker Configuration
 ### Worker 1 (pve-nu-monitor01)
 - **Host:** root@10.10.254.106
 - **Cores:** 27 (85% of 32 threads)
 - **Availability:** 24/7 no restrictions
 - **Workspace:** /home/comprehensive_sweep
 ### Worker 2 (pve-srvmon01)
 - **Host:** root@10.20.254.100 (via Worker 1 SSH hop)
 - **Cores:** 27 (85% of 32 threads)
 - **Availability:** 
  - **Weekdays (Mon-Fri):** 6 PM - 8 AM only (nights)
  - **Weekends (Sat-Sun):** 24/7 at 85%
  - **Office hours (Mon-Fri 8am-6pm):** DISABLED
 - **Workspace:** /home/backtest_dual/backtest
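The Worker 2 schedule above could be expressed as a single check. This is a sketch under assumptions: the function name `worker2_available` is hypothetical, the time is taken in whatever timezone the caller passes in, and the boundaries are treated as "disabled from 08:00 up to but not including 18:00", which the README does not spell out.

```python
from datetime import datetime


def worker2_available(now: datetime) -> bool:
    """Return True outside Mon-Fri office hours (08:00-18:00)."""
    is_weekday = now.weekday() < 5        # Monday=0 ... Friday=4
    in_office_hours = 8 <= now.hour < 18  # assumed boundary handling
    return not (is_weekday and in_office_hours)


print(worker2_available(datetime(2025, 12, 5, 10, 0)))   # Friday morning
print(worker2_available(datetime(2025, 12, 5, 18, 0)))   # Friday evening
print(worker2_available(datetime(2025, 12, 6, 10, 0)))   # Saturday morning
```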
 ### Expected Performance
 **Test sweep (256 combos):**
 - Worker 1 only (weekday daytime): ~25 minutes
 - Both workers (nights/weekends): ~12-15 minutes
 **Full sweep (65,536 combos) - after test passes:**
 - Optimal start: Friday 6 PM
 - Completion: ~30-35 hours after start (Sunday morning for a Friday 6 PM start)
 ## Usage
 ### Step 1: Deploy to Cluster
 ```bash
 # On local machine
 cd /home/icke/traderv4
 rsync -avz --exclude 'node_modules' --exclude '.next' cluster/ root@10.10.254.106:/home/comprehensive_sweep/
 rsync -avz backtester/ root@10.10.254.106:/home/comprehensive_sweep/backtester/
 ```
 ### Step 2: Launch Test Sweep
 ```bash
 # SSH to Worker 1
 ssh root@10.10.254.106
 # Navigate to workspace
 cd /home/comprehensive_sweep
 # Launch coordinator
 bash run_v11_test_sweep.sh
 ```
 ### Step 3: Monitor Progress
 ```bash
 # Watch coordinator logs
 tail -f coordinator_v11_test.log
 # Check database status
 sqlite3 exploration.db "SELECT id, status, assigned_worker FROM v11_test_chunks"
 # Check completion
 sqlite3 exploration.db "SELECT COUNT(*) FROM v11_test_strategies"
 # Expected: 256
 # View top results
 sqlite3 exploration.db "SELECT params, pnl, total_trades FROM v11_test_strategies ORDER BY pnl DESC LIMIT 10"
 ```
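The same checks can be scripted from Python with the standard `sqlite3` module. The sketch below runs against an in-memory stand-in so it is self-contained; the `sweep_progress` helper, the trimmed-down table definitions, and the `'completed'` and `'worker1'` values are assumptions for illustration, not values confirmed by the coordinator.

```python
import sqlite3


def sweep_progress(conn, expected_total=256):
    """Summarise how many strategies are stored and what each chunk is doing."""
    done = conn.execute("SELECT COUNT(*) FROM v11_test_strategies").fetchone()[0]
    chunks = conn.execute(
        "SELECT id, status, assigned_worker FROM v11_test_chunks ORDER BY id"
    ).fetchall()
    return {
        "done": done,
        "expected": expected_total,
        "percent": 100.0 * done / expected_total,
        "chunks": chunks,
    }


# In-memory stand-in for exploration.db with only the columns queried above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE v11_test_chunks (id TEXT PRIMARY KEY, status TEXT, assigned_worker TEXT);
CREATE TABLE v11_test_strategies (
    id INTEGER PRIMARY KEY AUTOINCREMENT, chunk_id TEXT, pnl REAL);
""")
conn.execute(
    "INSERT INTO v11_test_chunks VALUES ('v11_test_chunk_0000', 'completed', 'worker1')"
)
conn.executemany(
    "INSERT INTO v11_test_strategies (chunk_id, pnl) VALUES (?, ?)",
    [("v11_test_chunk_0000", float(i)) for i in range(128)],
)
progress = sweep_progress(conn)
print(progress["done"], progress["percent"])
```

Against the real database, replace `":memory:"` with the path to `exploration.db` and skip the setup statements.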
 ## Output Files
 ### CSV Results
 Location: `cluster/v11_test_results/`
 Files:
 - `v11_test_chunk_0000_results.csv` (128 combinations)
 - `v11_test_chunk_0001_results.csv` (128 combinations)
 Format:
 ```csv
 flip_threshold,adx_min,long_pos_max,short_pos_min,vol_min,entry_buffer_atr,rsi_long_min,rsi_short_max,pnl,win_rate,profit_factor,max_drawdown,total_trades
 0.5,18,75,20,0.8,0.15,35,65,245.32,58.3,1.245,125.40,48
 ...
 ```
 ### Database Tables
 **v11_test_chunks:**
 ```sql
 CREATE TABLE v11_test_chunks (
    id TEXT PRIMARY KEY,
    start_combo INTEGER,
    end_combo INTEGER,
    total_combos INTEGER,
    status TEXT,
    assigned_worker TEXT,
    started_at INTEGER,
    completed_at INTEGER
 );
 ```
 **v11_test_strategies:**
 ```sql
 CREATE TABLE v11_test_strategies (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    chunk_id TEXT,
    params TEXT,
    pnl REAL,
    win_rate REAL,
    profit_factor REAL,
    max_drawdown REAL,
    total_trades INTEGER,
    FOREIGN KEY (chunk_id) REFERENCES v11_test_chunks(id)
 );
 ```
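To illustrate how the two chunks might be seeded into `v11_test_chunks`, here is a small sketch. The chunk ID format is inferred from the CSV file names, and the `plan_chunks` helper, the exclusive `end_combo` convention, and the `'pending'` status are assumptions; the launch script may do this differently.

```python
import sqlite3


def plan_chunks(total_combos, chunk_size):
    """Split [0, total_combos) into consecutive ranges (end is exclusive)."""
    chunks = []
    for index, start in enumerate(range(0, total_combos, chunk_size)):
        end = min(start + chunk_size, total_combos)
        chunks.append((f"v11_test_chunk_{index:04d}", start, end, end - start))
    return chunks


conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE v11_test_chunks (
    id TEXT PRIMARY KEY,
    start_combo INTEGER,
    end_combo INTEGER,
    total_combos INTEGER,
    status TEXT,
    assigned_worker TEXT,
    started_at INTEGER,
    completed_at INTEGER
)
""")
conn.executemany(
    "INSERT INTO v11_test_chunks (id, start_combo, end_combo, total_combos, status) "
    "VALUES (?, ?, ?, ?, 'pending')",  # 'pending' is a hypothetical status value
    plan_chunks(256, 128),
)
stored = conn.execute(
    "SELECT id, start_combo, end_combo, total_combos FROM v11_test_chunks ORDER BY id"
).fetchall()
print(stored)
```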
 ## Verification Steps
 After the sweep completes (~6-25 minutes):
 ```bash
 # 1. Check output files exist
 ls -lh cluster/v11_test_results/
 # Expected: 2 CSV files
 # 2. Verify database has all strategies
 sqlite3 cluster/exploration.db "SELECT COUNT(*) FROM v11_test_strategies"
 # Expected: 256
 # 3. Check for varied PnL (NOT all zeros like v9 bug)
 head -10 cluster/v11_test_results/v11_test_chunk_0000_results.csv
 # Should show different PnL values
 # 4. View top 5 results
 sqlite3 cluster/exploration.db "SELECT params, pnl, total_trades FROM v11_test_strategies ORDER BY pnl DESC LIMIT 5"
 # Should show PnL > $0 and trades > 0
 # 5. Check coordinator logs
 tail -100 cluster/coordinator_v11_test.log
 # Should show "V11 TEST SWEEP COMPLETE!"
 ```
 ## Success Criteria
 ✅ Completes in <30 minutes  
 ✅ CSV files have 256 data rows total (excluding header rows)  
 ✅ PnL values are varied (not all zeros)  
 ✅ Database has 256 strategies  
 ✅ Top result shows PnL > $0 and trades > 0  
 ✅ Worker 2 respected office hours (if applicable)
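Three of these criteria (row count, varied PnL, positive top result) can be checked mechanically from the CSV output. The following is a minimal sketch that takes CSV text as input; `check_results` is a hypothetical helper, and the second sample row is invented for the demo (only the first row appears in the README).

```python
import csv
import io


def check_results(csv_texts, expected_rows=256):
    """Evaluate the row-count, varied-PnL, and top-result criteria."""
    rows = []
    for text in csv_texts:
        # DictReader consumes the header line, so only data rows are counted.
        rows.extend(csv.DictReader(io.StringIO(text)))
    pnls = [float(row["pnl"]) for row in rows]
    best = max(rows, key=lambda row: float(row["pnl"]), default=None)
    return {
        "row_count_ok": len(rows) == expected_rows,
        "pnl_varied": len(set(pnls)) > 1,
        "top_result_ok": best is not None
        and float(best["pnl"]) > 0
        and int(best["total_trades"]) > 0,
    }


HEADER = ("flip_threshold,adx_min,long_pos_max,short_pos_min,vol_min,"
          "entry_buffer_atr,rsi_long_min,rsi_short_max,pnl,win_rate,"
          "profit_factor,max_drawdown,total_trades\n")
# First row is the README's sample; the second is invented for the demo.
sample = HEADER + ("0.5,18,75,20,0.8,0.15,35,65,245.32,58.3,1.245,125.40,48\n"
                   "0.6,18,75,20,0.8,0.15,35,65,0.0,0.0,0.0,0.0,0\n")
report = check_results([sample], expected_rows=2)
print(report)
```

For a real run, read both chunk files and pass their contents as the list, leaving `expected_rows` at 256.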
 ## Telegram Notifications
 The bot sends up to three kinds of notification:
 1. **Start:** When coordinator launches
   - Shows available workers
   - Worker 2 status (active or office hours)
 2. **Completion:** When all chunks finish
   - Duration in minutes
   - Total strategies tested
   - Links to results
 3. **Failure:** If coordinator crashes
   - Premature stop notification
 ## Troubleshooting
 ### Worker 2 Not Starting (Weekday Daytime)
 **Expected:** Worker 2 is disabled Mon-Fri 8am-6pm for office hours  
 **Action:** Wait until 6 PM, or start on a weekend, for full two-worker speed
 ### No Signals Generated (All Zero PnL)
 **Symptom:** All PnL values are 0.0  
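 **Cause:** Either the filters are too strict and block every signal, or the v11 filter logic is still broken  
 **Action:** This is exactly what the test sweep is meant to catch. If v11 produces all zeros like v9 did, do not run the full sweep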
 ### SSH Timeout on Worker 2
 **Symptom:** Worker 2 fails to deploy  
 **Cause:** SSH hop connection issue  
 **Action:** 
 ```bash
 # Test connection manually
 ssh -o ProxyJump=root@10.10.254.106 root@10.20.254.100 'hostname'
 ```
 ### Database Locked
 **Symptom:** SQLite error "database is locked"  
 **Cause:** Coordinator still running  
 **Action:**
 ```bash
 # Find coordinator PID (the [v] keeps grep from matching its own process)
 ps aux | grep '[v]11_test_coordinator'
 # Kill gracefully
 kill <PID>
 ```
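If the coordinator should keep running, one alternative for inspection scripts is to open the database read-only with a longer busy timeout, so a query waits for the writer instead of failing right away. This is a sketch of that idea using the standard `sqlite3` module; the `open_readonly` helper is hypothetical, and a reader can still time out if the writer holds its lock longer than the timeout.

```python
import os
import sqlite3
import tempfile


def open_readonly(db_path, timeout_seconds=30.0):
    """Open an SQLite file read-only; wait up to timeout_seconds on locks."""
    return sqlite3.connect(
        f"file:{db_path}?mode=ro", uri=True, timeout=timeout_seconds
    )


# Demo against a throwaway database file standing in for exploration.db.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "exploration.db")
    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE v11_test_strategies (pnl REAL)")
    writer.execute("INSERT INTO v11_test_strategies VALUES (1.5)")
    writer.commit()
    writer.close()

    reader = open_readonly(path)
    count = reader.execute("SELECT COUNT(*) FROM v11_test_strategies").fetchone()[0]
    try:
        reader.execute("INSERT INTO v11_test_strategies VALUES (2.0)")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True  # read-only connections reject writes
    reader.close()

print(count, write_blocked)
```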
 ## Next Steps After Test Passes
 1. **User verifies data quality:**
   - PnL values varied (not all zeros)
   - Top results show positive P&L
   - Trade counts > 0
 2. **If test PASSES:**
   - Create full 65,536-combo sweep coordinator
   - 4 values per parameter across 8 parameters (4^8 = 65,536 combinations)
   - Start Friday 6 PM for optimal weekend utilization
   - Complete ~30-35 hours later (by Sunday morning)
 3. **If test FAILS (all zeros):**
   - v11 filters may still be broken
   - Debug indicator logic
   - Compare with pinescript lines 271-272
   - Don't run full sweep until fixed
 ## Architecture
 ```
 run_v11_test_sweep.sh
    ├── Initializes database (2 chunks)
    └── Launches v11_test_coordinator.py
            ├── Worker 1 (always available)
            │   └── v11_test_worker.py (27 cores)
            │       └── backtester/v11_moneyline_all_filters.py
            └── Worker 2 (office hours aware)
                └── v11_test_worker.py (27 cores)
                    └── backtester/v11_moneyline_all_filters.py
 ```
 ## Files
 | File | Purpose | Lines |
 |------|---------|-------|
 | `run_v11_test_sweep.sh` | Launch script | 52 |
 | `v11_test_coordinator.py` | Orchestrates sweep | 384 |
 | `v11_test_worker.py` | Processes chunks | 296 |
 | `backtester/v11_moneyline_all_filters.py` | Indicator logic | 335 |
 ## References
 - **Pinescript:** `workflows/trading/moneyline_v11_all_filters.pinescript`
 - **v9 Bug:** Filters calculated but not applied (lines 271-272 broken)
 - **v9 Coordinator:** `cluster/v9_advanced_coordinator.py` (reference pattern)
 - **Math Utils:** `backtester/math_utils.py` (ATR, ADX, RSI)
 - **Simulator:** `backtester/simulator.py` (backtest engine)
--- a/cluster/deploy_v11_test.sh
+++ b/cluster/deploy_v11_test.sh
@@ -0,0 +1,66 @@
 #!/bin/bash
 # V11 Test Sweep - Quick Deployment Script
 # Syncs files to EPYC cluster and verifies setup
 set -e
 echo "================================================================"
 echo "V11 TEST SWEEP - DEPLOYMENT TO EPYC CLUSTER"
 echo "================================================================"
 echo ""
 # Configuration
 WORKER1_HOST="root@10.10.254.106"
 WORKER1_WORKSPACE="/home/comprehensive_sweep"
 LOCAL_CLUSTER="cluster"
 LOCAL_BACKTESTER="backtester"
 echo "📦 Step 1: Sync cluster scripts to Worker 1..."
 rsync -avz --progress \
  --exclude '.venv' \
  --exclude '__pycache__' \
  --exclude '*.pyc' \
  --exclude 'exploration.db' \
  --exclude '*.log' \
  --exclude '*_results' \
  ${LOCAL_CLUSTER}/v11_test_coordinator.py \
  ${LOCAL_CLUSTER}/v11_test_worker.py \
  ${LOCAL_CLUSTER}/run_v11_test_sweep.sh \
  ${LOCAL_CLUSTER}/V11_TEST_SWEEP_README.md \
  ${WORKER1_HOST}:${WORKER1_WORKSPACE}/
 echo ""
 echo "📦 Step 2: Sync v11 indicator to Worker 1..."
 rsync -avz --progress \
  --exclude '__pycache__' \
  --exclude '*.pyc' \
  ${LOCAL_BACKTESTER}/v11_moneyline_all_filters.py \
  ${WORKER1_HOST}:${WORKER1_WORKSPACE}/backtester/
 echo ""
 echo "📦 Step 3: Verify math_utils exists on Worker 1..."
 ssh ${WORKER1_HOST} "test -f ${WORKER1_WORKSPACE}/backtester/math_utils.py && echo '✓ math_utils.py found' || { echo '✗ math_utils.py missing!'; exit 1; }"
 echo ""
 echo "📦 Step 4: Verify data file exists on Worker 1..."
 ssh ${WORKER1_HOST} "test -f ${WORKER1_WORKSPACE}/data/solusdt_5m.csv && echo '✓ data/solusdt_5m.csv found' || { echo '✗ data/solusdt_5m.csv missing!'; exit 1; }"
 echo ""
 echo "📦 Step 5: Make scripts executable on Worker 1..."
 ssh ${WORKER1_HOST} "chmod +x ${WORKER1_WORKSPACE}/run_v11_test_sweep.sh ${WORKER1_WORKSPACE}/v11_test_coordinator.py ${WORKER1_WORKSPACE}/v11_test_worker.py"
 echo ""
 echo "================================================================"
 echo "✅ DEPLOYMENT COMPLETE"
 echo "================================================================"
 echo ""
 echo "To start test sweep, run:"
 echo "  ssh ${WORKER1_HOST}"
 echo "  cd ${WORKER1_WORKSPACE}"
 echo "  bash run_v11_test_sweep.sh"
 echo ""
 echo "To monitor progress:"
 echo "  tail -f ${WORKER1_WORKSPACE}/coordinator_v11_test.log"
 echo ""
 echo "Expected runtime: 6-25 minutes"
 echo "================================================================"