User Request: Replace blind 2-hour restart timer with smart monitoring that only restarts when accountUnsubscribe errors actually occur
Changes:
. Health Monitor (NEW):
- Created lib/monitoring/drift-health-monitor.ts
- Tracks accountUnsubscribe errors in 30-second sliding window
- Triggers container restart via flag file when 50+ errors detected
- Prevents unnecessary restarts when SDK healthy
. Drift Client:
- Removed blind scheduleReconnection() and 2-hour timer
- Added interceptWebSocketErrors() to catch SDK errors
- Patches console.error to monitor for accountUnsubscribe patterns
- Starts health monitor after successful initialization
- Removed unused reconnect() method and reconnectTimer field
. Health API (NEW):
- GET /api/drift/health - Check current error count and health status
- Returns: healthy boolean, errorCount, threshold, message
- Useful for external monitoring and debugging
Impact:
- System only restarts when actual memory leak detected
- Prevents unnecessary downtime every 2 hours
- More targeted response to SDK issues
- Better operational stability
Files:
- lib/monitoring/drift-health-monitor.ts (NEW - 165 lines)
- lib/drift/client.ts (removed timer, added error interception)
- app/api/drift/health/route.ts (NEW - health check endpoint)
Testing:
- Health monitor starts on initialization: ✅
- API endpoint returns healthy status: ✅
- No blind reconnection scheduled: ✅