trading_bot_v4

Author	SHA1	Message	Date
mindesbunister	4c36fa2bc3	docs: Major documentation reorganization + ENV variable reference Documentation Structure: - Created docs/ subdirectory organization (analysis/, architecture/, bugs/, cluster/, deployments/, roadmaps/, setup/, archived/) - Moved 68 root markdown files to appropriate categories - Root directory now clean (only README.md remains) - Total: 83 markdown files now organized by purpose New Content: - Added comprehensive Environment Variable Reference to copilot-instructions.md - 100+ ENV variables documented with types, defaults, purpose, notes - Organized by category: Required (Drift/RPC/Pyth), Trading Config (quality/leverage/sizing), ATR System, Runner System, Risk Limits, Notifications, etc. - Includes usage examples (correct vs wrong patterns) File Distribution: - docs/analysis/ - Performance analyses, blocked signals, profit projections - docs/architecture/ - Adaptive leverage, ATR trailing, indicator tracking - docs/bugs/ - CRITICAL_.md, FIXES_.md bug reports (7 files) - docs/cluster/ - EPYC setup, distributed computing docs (3 files) - docs/deployments/ - _COMPLETE.md, DEPLOYMENT_.md status (12 files) - docs/roadmaps/ - All ROADMAP.md strategic planning files (7 files) - docs/setup/ - TradingView guides, signal quality, n8n setup (8 files) - docs/archived/2025_pre_nov/ - Obsolete verification checklist (1 file) Key Improvements: - ENV variable reference: Single source of truth for all configuration - Common Pitfalls #68-71: Already complete, verified during audit - Better findability: Category-based navigation vs 68 files in root - Preserves history: All files git mv (rename), not copy/delete - Zero broken functionality: Only documentation moved, no code changes Verification: - 83 markdown files now in docs/ subdirectories - Root directory cleaned: 68 files → 0 files (except README.md) - Git history preserved for all moved files - Container running: trading-bot-v4 (no restart needed) Next Steps: - Create README.md files in each docs subdirectory - Add navigation index - Update main README.md with new structure - Consolidate duplicate deployment docs - Archive truly obsolete files (old SQL backups) See: docs/analysis/CLEANUP_PLAN.md for complete reorganization strategy	2025-12-04 08:29:59 +01:00
mindesbunister	db33af9f17	fix: Stop button database reset + UI state display (DATABASE-FIRST ARCHITECTURE) CRITICAL FIXES: 1. Stop button now resets database FIRST (before pkill) - Database cleanup happens even if coordinator crashed - Prevents stale 'running' chunks blocking restart - Uses Node.js sqlite library (not CLI - Docker compatible) 2. UI enhancement - 4-state display - ⚡ Processing (running > 0) - ⏳ Pending (pending > 0, running = 0) - ✅ Complete (all completed) - ⏸️ Idle (no work queued) [NEW] - Shows pending chunk count when present TECHNICAL DETAILS: - Replaced sqlite3 CLI calls with proper Node.js API - Fixed permissions: chown 1001:1001 cluster/ for container write - Database-first logic: reset → pkill → verify - Detailed logging for each operation step FILES CHANGED: - app/api/cluster/control/route.ts (database operations refactored) - app/cluster/page.tsx (4-state UI display) VERIFIED: - Stop button successfully reset 3 'running' chunks → 'pending' - UI correctly shows Idle state after Stop - Container logs show detailed operation flow - Database operations work in Docker environment DEPLOYMENT: - Container rebuilt with fixed code - Tested with real stale database (3 running chunks) - All operations working correctly	2025-12-01 11:34:47 +01:00
mindesbunister	5d07fbbd28	critical: Fix EPYC cluster start button - database cleanup before start Problem: - Start button showed 'already running' when cluster wasn't actually running - Database had stale chunks in 'running' state from crashed/killed coordinator - Control endpoint checked process but not database state Solution: 1. Reset stale 'running' chunks to 'pending' before starting coordinator 2. Verify coordinator not running before starting (prevent duplicates) 3. Add database cleanup to stop action as well (prevent future stale states) 4. Enhanced error reporting with coordinator log output Changes: - app/api/cluster/control/route.ts - Added database cleanup in start action (reset running chunks) - Added process check before start (prevent duplicates) - Added database cleanup in stop action (cleanup orphaned state) - Added coordinator log output on start failure - Improved error messages and logging Impact: - Start button now works correctly even after unclean coordinator shutdown - Prevents false 'already running' reports - Automatic cleanup of stale database state - Better error diagnostics Verified: - Container rebuilt and restarted successfully - Cluster status shows 'idle' after database cleanup - Ready for user to test start button functionality	2025-12-01 08:28:05 +01:00
mindesbunister	cc56b72df2	fix: Database-first cluster status detection + Stop button clarification CRITICAL FIX (Nov 30, 2025): - Dashboard showed 'idle' despite 22+ worker processes running - Root cause: SSH-based worker detection timing out - Solution: Check database for running chunks FIRST Changes: 1. app/api/cluster/status/route.ts: - Query exploration database before SSH detection - If running chunks exist, mark workers 'active' even if SSH fails - Override worker status: 'offline' → 'active' when chunks running - Log: '✅ Cluster status: ACTIVE (database shows running chunks)' - Database is source of truth, SSH only for supplementary metrics 2. app/cluster/page.tsx: - Stop button ALREADY EXISTS (conditionally shown) - Shows Start when status='idle', Stop when status='active' - No code changes needed - fixed by status detection Result: - Dashboard now shows 'ACTIVE' with 2 workers (correct) - Workers show 'active' status (was 'offline') - Stop button automatically visible when cluster active - System resilient to SSH timeouts/network issues Verified: - Container restarted: Nov 30 21:18 UTC - API tested: Returns status='active', activeWorkers=2 - Logs confirm: Database-first logic working - Workers confirmed running: 22+ processes on worker1, workers on worker2	2025-11-30 22:23:01 +01:00
mindesbunister	b77282b560	feat: Add EPYC cluster distributed sweep with web UI New Features: - Distributed coordinator orchestrates 2x AMD EPYC 16-core servers - 64 total cores processing 12M parameter combinations (70% CPU limit) - Worker1 (pve-nu-monitor01): Direct SSH access at 10.10.254.106 - Worker2 (bd-host01): 2-hop SSH through worker1 (10.20.254.100) - Web UI at /cluster shows real-time status and AI recommendations - API endpoint /api/cluster/status serves cluster metrics - Auto-refresh every 30s with top strategies and actionable insights Files Added: - cluster/distributed_coordinator.py (510 lines) - Main orchestrator - cluster/distributed_worker.py (271 lines) - Worker1 script - cluster/distributed_worker_bd_clean.py (275 lines) - Worker2 script - cluster/monitor_bd_host01.sh - Monitoring script - app/api/cluster/status/route.ts (274 lines) - API endpoint - app/cluster/page.tsx (258 lines) - Web UI - cluster/CLUSTER_SETUP.md - Complete setup and access documentation Technical Details: - SQLite database tracks chunk assignments - 10,000 combinations per chunk (1,195 total chunks) - Multiprocessing.Pool with 70% CPU limit (22 cores per EPYC) - SSH/SCP for deployment and result collection - Handles 2-hop SSH for bd-host01 access - Results in CSV format with top strategies ranked Access Documentation: - Worker1: ssh root@10.10.254.106 - Worker2: ssh root@10.10.254.106 "ssh root@10.20.254.100" - Web UI: http://localhost:3001/cluster - See CLUSTER_SETUP.md for complete guide Status: Deployed and operational	2025-11-30 13:02:18 +01:00
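The database-first pattern that runs through the cluster commits above (reset stale 'running' chunks to 'pending', then derive the dashboard state from chunk counts) can be sketched roughly as follows. This is an illustration only, not the repository's code: the commits state that the real logic lives in app/api/cluster/control/route.ts and uses a Node.js sqlite library, and the table name `chunks`, the column `status`, and every function name here are assumptions made for the example.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory database with a hypothetical `chunks` table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, status TEXT NOT NULL)")
    rows = [("running",)] * 3 + [("completed",)] * 2
    conn.executemany("INSERT INTO chunks (status) VALUES (?)", rows)
    conn.commit()
    return conn


def reset_stale_chunks(conn):
    """Move chunks left in 'running' back to 'pending'; return how many rows changed."""
    cur = conn.execute("UPDATE chunks SET status = 'pending' WHERE status = 'running'")
    conn.commit()
    return cur.rowcount


def cluster_state(conn):
    """Derive a four-state status from chunk counts alone (no SSH or process check)."""
    counts = dict(conn.execute("SELECT status, COUNT(*) FROM chunks GROUP BY status"))
    total = sum(counts.values())
    if counts.get("running", 0) > 0:
        return "processing"
    if counts.get("pending", 0) > 0:
        return "pending"
    if total > 0 and counts.get("completed", 0) == total:
        return "complete"
    # Assumption: anything else (including an empty table) is shown as idle.
    return "idle"


if __name__ == "__main__":
    conn = make_demo_db()
    print(cluster_state(conn))       # state while chunks are marked running
    print(reset_stale_chunks(conn))  # number of stale chunks reset
    print(cluster_state(conn))       # state after the reset
```

In the stop flow described in db33af9f17, a reset like this would run before the coordinator process is killed, so the cleanup still happens if the coordinator has already crashed.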

5 Commits