diff --git a/cluster/WORKER2_TIME_RESTRICTION.md b/cluster/WORKER2_TIME_RESTRICTION.md
new file mode 100644
index 0000000..3714a6d
--- /dev/null
+++ b/cluster/WORKER2_TIME_RESTRICTION.md
@@ -0,0 +1,271 @@
+# Worker2 Time Restriction - Noise Constraint Management
+
+**Date:** December 4, 2025
+**Issue:** Node 2 (bd-host01) generates excessive noise during office hours
+**Solution:** Time-restricted scheduling (19:00 - 06:00 only)
+
+---
+
+## Problem
+
+Worker2 (bd-host01 / 10.20.254.100) is an EPYC 16-core server that generates significant noise when running parameter sweeps at full load. This is disruptive during office hours (06:00 - 19:00).
+
+---
+
+## Solution Implemented
+
+### Time-Based Worker Scheduling
+
+**Configuration in `v9_advanced_coordinator.py`:**
+
+```python
+WORKERS = {
+    'worker1': {
+        'host': 'root@10.10.254.106',
+        'workspace': '/home/comprehensive_sweep',
+        # No time restriction - runs 24/7
+    },
+    'worker2': {
+        'host': 'root@10.20.254.100',
+        'workspace': '/home/backtest_dual/backtest',
+        'ssh_hop': 'root@10.10.254.106',
+        'time_restricted': True,  # Enable time-based control
+        'allowed_start_hour': 19,  # 7 PM
+        'allowed_end_hour': 6,  # 6 AM
+    }
+}
+```
+
+### Logic Implementation
+
+```python
+def is_worker_allowed_to_run(worker_name: str) -> bool:
+    """Check if worker is allowed to run based on time restrictions"""
+    worker = WORKERS[worker_name]
+
+    # If no time restriction, always allowed
+    if not worker.get('time_restricted', False):
+        return True
+
+    # Check current hour (local time)
+    current_hour = datetime.now().hour
+    start_hour = worker['allowed_start_hour']
+    end_hour = worker['allowed_end_hour']
+
+    # Handle time range that crosses midnight (e.g., 19:00 - 06:00)
+    if start_hour > end_hour:
+        allowed = current_hour >= start_hour or current_hour < end_hour
+    else:
+        allowed = start_hour <= current_hour < end_hour
+
+    return allowed
+```
+
+### Coordinator Integration
+
+The coordinator now checks time restrictions before assigning work:
+
+```python
+# Assign work to idle workers
+for worker_name in WORKERS.keys():
+    # Check if worker is allowed to run (time restrictions)
+    if not is_worker_allowed_to_run(worker_name):
+        if iteration % 10 == 0:  # Log every 10 iterations to avoid spam
+            print(f"⏰ {worker_name} not allowed (office hours, noise restriction)")
+        continue
+
+    # ... continue with worker assignment ...
+```
+
+---
+
+## Operating Hours
+
+| Worker | Hours | Status | Reason |
+|--------|-------|--------|--------|
+| **Worker1** | 24/7 | Always active | No noise constraint |
+| **Worker2** | 19:00 - 06:00 | Time-restricted | Noise during office hours |
+
+**Worker2 Schedule:**
+- **ACTIVE:** 7:00 PM - 6:00 AM (11 hours/day)
+- **IDLE:** 6:00 AM - 7:00 PM (13 hours/day)
+
+---
+
+## Impact on Sweep Performance
+
+### Before Time Restriction
+- **Worker1:** 32 cores, 24/7 = 768 core-hours/day
+- **Worker2:** 32 cores, 24/7 = 768 core-hours/day
+- **Total:** 1,536 core-hours/day
+
+### After Time Restriction
+- **Worker1:** 32 cores, 24/7 = 768 core-hours/day
+- **Worker2:** 32 cores, 11h/day = 352 core-hours/day
+- **Total:** 1,120 core-hours/day
+
+**Performance Impact:** ~27% reduction in daily core-hours (worker2 drops to 45.8% of its previous capacity, a 54.2% reduction)
+
+### Sweep Progress Impact
+- **Chunks completed:** 63 / 1,693 (3.7%)
+- **Chunks pending:** 1,629
+- **Estimated completion time:**
+  - Old: ~40 days (both workers 24/7)
+  - New: ~54 days (worker2 time-restricted)
+  - **Delta:** +14 days
+
+**Acceptable trade-off:** Quiet office hours > slightly longer sweep time
+
+---
+
+## Verification
+
+### Test Current Time Restriction (Dec 4, 14:11)
+```bash
+cd /home/icke/traderv4/cluster
+python3 -c "
+from datetime import datetime
+current_hour = datetime.now().hour
+allowed = current_hour >= 19 or current_hour < 6
+print(f'Current hour: {current_hour}')
+print(f'Worker2 allowed: {allowed}')
+"
+```
+
+**Output:**
+```
+Current hour: 14
+Worker2 allowed: False ✅ Correct (office hours)
+```
+
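The verification one-liner only exercises the current hour. As a minimal sketch, assuming the same wrap-around rule as `is_worker_allowed_to_run`, the window check can be written with the hour passed in explicitly, which makes it possible to test the boundaries and to re-derive the core-hour figures from the impact section. The helper names `hour_in_window` and `active_hours_per_day` are hypothetical and do not exist in `v9_advanced_coordinator.py`.

```python
def hour_in_window(hour: int, start_hour: int, end_hour: int) -> bool:
    """Return True if `hour` lies in [start_hour, end_hour), wrapping past midnight."""
    if start_hour > end_hour:  # window crosses midnight, e.g. 19 -> 6
        return hour >= start_hour or hour < end_hour
    return start_hour <= hour < end_hour


def active_hours_per_day(start_hour: int, end_hour: int) -> int:
    """Count the whole hours of a day that fall inside the window."""
    return sum(hour_in_window(h, start_hour, end_hour) for h in range(24))


CORES_PER_WORKER = 32  # per-worker figure used in the impact section

worker2_hours = active_hours_per_day(19, 6)
before = 2 * CORES_PER_WORKER * 24                      # both workers 24/7
after = CORES_PER_WORKER * (24 + worker2_hours)         # worker2 restricted
reduction_pct = 100 * (before - after) / before

if __name__ == "__main__":
    for hour in (5, 6, 18, 19, 23, 0):
        print(f"{hour:02d}:00 allowed={hour_in_window(hour, 19, 6)}")
    print(worker2_hours, before, after, round(reduction_pct, 1))
```

Note that the window is half-open: 19:00 is allowed and 06:00 is not. With equal start and end hours this rule yields an empty window, so a 24-hour allowance is better expressed by leaving `time_restricted` unset.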
+### Monitor Coordinator Logs
+```bash
+cd /home/icke/traderv4/cluster
+tail -f v9_advanced_coordinator.log | grep "⏰"
+```
+
+**Expected output during office hours:**
+```
+⏰ worker2 not allowed (office hours, noise restriction)
+```
+
+---
+
+## Fixed Issues
+
+### Stuck Chunk Problem (Dec 2 - Dec 4)
+
+**Issue:** Chunk 14 assigned to worker2 on Dec 2 at 15:14, never completed
+- Database showed: `status='running'`
+- Reality: No processes running on worker2
+- Impact: Blocked new work assignment to worker2 for 46+ hours
+
+**Resolution:**
+```sql
+UPDATE v9_advanced_chunks
+SET status='pending', assigned_worker=NULL
+WHERE id='v9_advanced_chunk_0014';
+```
+
+Chunk 14 now available for reassignment during worker2's active hours (19:00-06:00).
+
+---
+
+## Manual Overrides
+
+### Temporarily Disable Time Restriction
+If needed for urgent sweeps, modify coordinator:
+
+```python
+# In WORKERS['worker2'], comment out time restriction:
+'worker2': {
+    # 'time_restricted': True,  # TEMPORARILY DISABLED
+    'allowed_start_hour': 19,
+    'allowed_end_hour': 6,
+}
+```
+
+Then restart coordinator.
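Before restarting the coordinator it may be worth confirming that no chunk is left in `running` for a worker that has no live processes, as happened with chunk 14. The following is a minimal sketch that generalizes the manual `UPDATE` from the stuck-chunk fix to every `running` chunk of one worker. It assumes only the three columns that appear in this document (`id`, `status`, `assigned_worker`); the real table in `exploration.db` likely has more. The function name `reset_running_chunks` and the demo rows other than chunk 14 are hypothetical.

```python
import sqlite3


def reset_running_chunks(conn: sqlite3.Connection, worker_name: str) -> int:
    """Return a worker's 'running' chunks to 'pending' and report how many changed.

    Only call this after confirming the worker has no active processes,
    otherwise a chunk that is genuinely in progress would be handed out twice.
    """
    cur = conn.execute(
        "UPDATE v9_advanced_chunks "
        "SET status='pending', assigned_worker=NULL "
        "WHERE assigned_worker=? AND status='running'",
        (worker_name,),
    )
    conn.commit()
    return cur.rowcount


# Demo against an in-memory database with an assumed, simplified schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE v9_advanced_chunks "
    "(id TEXT PRIMARY KEY, status TEXT, assigned_worker TEXT)"
)
conn.executemany(
    "INSERT INTO v9_advanced_chunks VALUES (?, ?, ?)",
    [
        ("v9_advanced_chunk_0014", "running", "worker2"),    # the stuck chunk
        ("v9_advanced_chunk_0015", "running", "worker1"),    # made-up demo row
        ("v9_advanced_chunk_0016", "completed", "worker2"),  # made-up demo row
    ],
)

reset_count = reset_running_chunks(conn, "worker2")
rows = conn.execute(
    "SELECT id, status, assigned_worker FROM v9_advanced_chunks ORDER BY id"
).fetchall()

if __name__ == "__main__":
    print(reset_count)
    for row in rows:
        print(row)
```

Against the real database the connection would be opened with `sqlite3.connect("exploration.db")` from the `cluster` directory instead of `":memory:"`.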
+
+### Adjust Operating Hours
+To change allowed hours (e.g., narrow to 8 PM - 5 AM):
+
+```python
+'worker2': {
+    'time_restricted': True,
+    'allowed_start_hour': 20,  # 8 PM
+    'allowed_end_hour': 5,  # 5 AM
+}
+```
+
+---
+
+## Monitoring Commands
+
+### Check Worker2 Status
+```bash
+# Check if worker2 has active processes
+ssh root@10.10.254.106 "ssh root@10.20.254.100 'ps aux | grep v9_advanced_worker | grep -v grep | wc -l'"
+
+# Check worker2 assignments in database
+cd /home/icke/traderv4/cluster
+sqlite3 exploration.db "SELECT COUNT(*) FROM v9_advanced_chunks WHERE assigned_worker='worker2' AND status='running';"
+```
+
+### Check Time Restriction Status
+```bash
+cd /home/icke/traderv4/cluster
+sqlite3 exploration.db "
+SELECT
+  assigned_worker,
+  COUNT(*) as chunks,
+  SUM(CASE WHEN status='completed' THEN 1 ELSE 0 END) as completed,
+  SUM(CASE WHEN status='running' THEN 1 ELSE 0 END) as running
+FROM v9_advanced_chunks
+WHERE assigned_worker IS NOT NULL
+GROUP BY assigned_worker;
+"
+```
+
+---
+
+## Expected Behavior
+
+### During Office Hours (06:00 - 19:00)
+- Worker1: ✅ Processing chunks
+- Worker2: ⏸️ Idle (time restriction active)
+- Coordinator logs: "⏰ worker2 not allowed (office hours, noise restriction)"
+
+### During Off Hours (19:00 - 06:00)
+- Worker1: ✅ Processing chunks
+- Worker2: ✅ Processing chunks (if available)
+- Both workers: Full 32-core utilization
+
+---
+
+## Files Modified
+
+- `cluster/v9_advanced_coordinator.py` - Added time restriction logic
+- `cluster/exploration.db` - Reset stuck chunk 14
+- `cluster/WORKER2_TIME_RESTRICTION.md` - This documentation
+
+---
+
+## Future Improvements
+
+1. **Dynamic hour adjustment** via environment variables
+2. **Holiday/weekend override** (allow 24/7 on non-work days)
+3. **Load-based throttling** (reduce cores instead of full stop)
+4. **SMS alerts** when worker2 transitions active/idle
+
+---
+
+## Contact
+
+For adjustments to worker2 operating hours or noise constraint issues, update the configuration in `v9_advanced_coordinator.py` and restart the coordinator.
+
+**Current Status (Dec 4, 2025):**
+- ✅ Time restriction implemented
+- ✅ Stuck chunk 14 resolved
+- ✅ Worker1 processing continuously
+- ⏸️ Worker2 waiting for 19:00 (off-hours start)
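Future improvements 1 and 2 could be prototyped along the following lines. This is a sketch under stated assumptions, not part of the current coordinator: the environment variable names `WORKER2_START_HOUR` and `WORKER2_END_HOUR`, and the functions `load_window` and `is_allowed`, are hypothetical. Weekends are detected with `datetime.weekday()`; public holidays are not handled and would need a separate calendar source.

```python
import os
from datetime import datetime


def load_window(env=None, default_start=19, default_end=6):
    """Read the allowed window from (hypothetical) environment variables."""
    if env is None:
        env = os.environ
    start = int(env.get("WORKER2_START_HOUR", default_start))
    end = int(env.get("WORKER2_END_HOUR", default_end))
    if not (0 <= start <= 23 and 0 <= end <= 23):
        raise ValueError("hours must be in the range 0-23")
    return start, end


def is_allowed(now: datetime, start_hour: int, end_hour: int,
               weekends_unrestricted: bool = True) -> bool:
    """Apply the time window, optionally lifting it on Saturday and Sunday."""
    if weekends_unrestricted and now.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return True
    hour = now.hour
    if start_hour > end_hour:  # window crosses midnight
        return hour >= start_hour or hour < end_hour
    return start_hour <= hour < end_hour


if __name__ == "__main__":
    start, end = load_window()
    # Thursday, Dec 4, 2025 at 14:11 (the time of the verification run)
    print(is_allowed(datetime(2025, 12, 4, 14, 11), start, end))
    # Saturday, Dec 6, 2025 at 14:00
    print(is_allowed(datetime(2025, 12, 6, 14, 0), start, end))
```

Passing `now` in as a parameter, rather than calling `datetime.now()` inside the function, keeps the check testable for any date and hour.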