mindesbunister
4c36fa2bc3
docs: Major documentation reorganization + ENV variable reference
...
**Documentation Structure:**
- Created docs/ subdirectory organization (analysis/, architecture/, bugs/,
cluster/, deployments/, roadmaps/, setup/, archived/)
- Moved 68 root markdown files to appropriate categories
- Root directory now clean (only README.md remains)
- Total: 83 markdown files now organized by purpose
**New Content:**
- Added comprehensive Environment Variable Reference to copilot-instructions.md
- 100+ ENV variables documented with types, defaults, purpose, notes
- Organized by category: Required (Drift/RPC/Pyth), Trading Config (quality/
leverage/sizing), ATR System, Runner System, Risk Limits, Notifications, etc.
- Includes usage examples (correct vs wrong patterns)
**File Distribution:**
- docs/analysis/ - Performance analyses, blocked signals, profit projections
- docs/architecture/ - Adaptive leverage, ATR trailing, indicator tracking
- docs/bugs/ - CRITICAL_*.md, FIXES_*.md bug reports (7 files)
- docs/cluster/ - EPYC setup, distributed computing docs (3 files)
- docs/deployments/ - *_COMPLETE.md, DEPLOYMENT_*.md status (12 files)
- docs/roadmaps/ - All *ROADMAP*.md strategic planning files (7 files)
- docs/setup/ - TradingView guides, signal quality, n8n setup (8 files)
- docs/archived/2025_pre_nov/ - Obsolete verification checklist (1 file)
**Key Improvements:**
- ENV variable reference: Single source of truth for all configuration
- Common Pitfalls #68-71: Already complete, verified during audit
- Better findability: Category-based navigation vs 68 files in root
- Preserves history: All files git mv (rename), not copy/delete
- Zero broken functionality: Only documentation moved, no code changes
**Verification:**
- 83 markdown files now in docs/ subdirectories
- Root directory cleaned: 68 files → 0 files (except README.md)
- Git history preserved for all moved files
- Container running: trading-bot-v4 (no restart needed)
**Next Steps:**
- Create README.md files in each docs subdirectory
- Add navigation index
- Update main README.md with new structure
- Consolidate duplicate deployment docs
- Archive truly obsolete files (old SQL backups)
See: docs/analysis/CLEANUP_PLAN.md for complete reorganization strategy
2025-12-04 08:29:59 +01:00
mindesbunister
db33af9f17
fix: Stop button database reset + UI state display (DATABASE-FIRST ARCHITECTURE)
...
CRITICAL FIXES:
1. Stop button now resets database FIRST (before pkill)
- Database cleanup happens even if coordinator crashed
- Prevents stale 'running' chunks blocking restart
- Uses Node.js sqlite library (not CLI - Docker compatible)
2. UI enhancement - 4-state display
- ⚡ Processing (running > 0)
- ⏳ Pending (pending > 0, running = 0)
- ✅ Complete (all completed)
- ⏸️ Idle (no work queued) [NEW]
- Shows pending chunk count when present
TECHNICAL DETAILS:
- Replaced sqlite3 CLI calls with proper Node.js API
- Fixed permissions: chown 1001:1001 cluster/ for container write
- Database-first logic: reset → pkill → verify
- Detailed logging for each operation step
FILES CHANGED:
- app/api/cluster/control/route.ts (database operations refactored)
- app/cluster/page.tsx (4-state UI display)
VERIFIED:
- Stop button successfully reset 3 'running' chunks → 'pending'
- UI correctly shows Idle state after Stop
- Container logs show detailed operation flow
- Database operations work in Docker environment
DEPLOYMENT:
- Container rebuilt with fixed code
- Tested with real stale database (3 running chunks)
- All operations working correctly
2025-12-01 11:34:47 +01:00
mindesbunister
cc56b72df2
fix: Database-first cluster status detection + Stop button clarification
...
CRITICAL FIX (Nov 30, 2025):
- Dashboard showed 'idle' despite 22+ worker processes running
- Root cause: SSH-based worker detection timing out
- Solution: Check database for running chunks FIRST
Changes:
1. app/api/cluster/status/route.ts:
- Query exploration database before SSH detection
- If running chunks exist, mark workers 'active' even if SSH fails
- Override worker status: 'offline' → 'active' when chunks running
- Log: '✅ Cluster status: ACTIVE (database shows running chunks)'
- Database is source of truth, SSH only for supplementary metrics
2. app/cluster/page.tsx:
- Stop button ALREADY EXISTS (conditionally shown)
- Shows Start when status='idle', Stop when status='active'
- No code changes needed - fixed by status detection
Result:
- Dashboard now shows 'ACTIVE' with 2 workers (correct)
- Workers show 'active' status (was 'offline')
- Stop button automatically visible when cluster active
- System resilient to SSH timeouts/network issues
Verified:
- Container restarted: Nov 30 21:18 UTC
- API tested: Returns status='active', activeWorkers=2
- Logs confirm: Database-first logic working
- Workers confirmed running: 22+ processes on worker1, workers on worker2
2025-11-30 22:23:01 +01:00
mindesbunister
8a3141e793
feat: Add cluster page navigation
...
- Add EPYC Cluster card to landing page (first position, purple/pink gradient)
- Add back button to cluster page (animated left arrow, links to dashboard)
- Update landing page grid layout (lg:grid-cols-3 xl:grid-cols-4 for 7 cards)
- Complete bidirectional navigation: dashboard ↔ cluster monitoring
Navigation features:
- Cluster card: 🖥️ icon, "Monitor distributed parameter exploration" description
- Back button: Animated hover effect (arrow slides left, color transitions)
- Responsive grid: 2 cols (mobile), 3 cols (tablet), 4 cols (desktop)
- Consistent styling with existing navigation cards
2025-11-30 13:18:03 +01:00
mindesbunister
b77282b560
feat: Add EPYC cluster distributed sweep with web UI
...
New Features:
- Distributed coordinator orchestrates 2x AMD EPYC 16-core servers
- 64 total cores processing 12M parameter combinations (70% CPU limit)
- Worker1 (pve-nu-monitor01): Direct SSH access at 10.10.254.106
- Worker2 (bd-host01): 2-hop SSH through worker1 (10.20.254.100)
- Web UI at /cluster shows real-time status and AI recommendations
- API endpoint /api/cluster/status serves cluster metrics
- Auto-refresh every 30s with top strategies and actionable insights
Files Added:
- cluster/distributed_coordinator.py (510 lines) - Main orchestrator
- cluster/distributed_worker.py (271 lines) - Worker1 script
- cluster/distributed_worker_bd_clean.py (275 lines) - Worker2 script
- cluster/monitor_bd_host01.sh - Monitoring script
- app/api/cluster/status/route.ts (274 lines) - API endpoint
- app/cluster/page.tsx (258 lines) - Web UI
- cluster/CLUSTER_SETUP.md - Complete setup and access documentation
Technical Details:
- SQLite database tracks chunk assignments
- 10,000 combinations per chunk (1,195 total chunks)
- Multiprocessing.Pool with 70% CPU limit (22 cores per EPYC)
- SSH/SCP for deployment and result collection
- Handles 2-hop SSH for bd-host01 access
- Results in CSV format with top strategies ranked
Access Documentation:
- Worker1: ssh root@10.10.254.106
- Worker2: ssh root@10.10.254.106 "ssh root@10.20.254.100 "
- Web UI: http://localhost:3001/cluster
- See CLUSTER_SETUP.md for complete guide
Status: Deployed and operational
2025-11-30 13:02:18 +01:00