Co-authored-by: mindesbunister <32161838+mindesbunister@users.noreply.github.com>
This commit is contained in:
@@ -1108,3 +1108,327 @@ npm test tests/integration/position-manager/monitoring-verification.test.ts # S
|
||||
**Next Agent:** Please read this COMPLETELY before starting work
|
||||
**User Expectation:** Permanent fixes, not temporary patches
|
||||
**Critical Priority:** Stop the "risk management vanished" pattern once and for all
|
||||
|
||||
---
|
||||
|
||||
## ✅ FIXES IMPLEMENTED (Dec 9, 2025 - PR #X)
|
||||
|
||||
**Status:** COMPLETE - All four bugs fixed with validation, error handling, and comprehensive tests
|
||||
|
||||
### Bug #76 Fix: Stop-Loss Placement Validation
|
||||
|
||||
**File: `lib/drift/orders.ts`**
|
||||
- Added expected order count calculation: `2 + (useDualStops ? 2 : 1)`
|
||||
- Wrapped each SL placement type (TRIGGER_LIMIT, TRIGGER_MARKET, soft/hard) in try/catch
|
||||
- Added explicit error messages: `throw new Error('Stop loss placement failed: ...')`
|
||||
- Added validation after all orders placed:
|
||||
```typescript
|
||||
if (signatures.length < expectedCount) {
|
||||
return {
|
||||
success: false,
|
||||
error: `MISSING EXIT ORDERS: Expected ${expectedCount}, got ${signatures.length}`,
|
||||
signatures
|
||||
}
|
||||
}
|
||||
```
|
||||
- Enhanced logging: "🔄 Executing SL placement..." before each order type
|
||||
- Returns partial signatures on failure for debugging
|
||||
|
||||
**File: `app/api/trading/execute/route.ts`**
|
||||
- Added signature count validation after `placeExitOrders()` returns:
|
||||
```typescript
|
||||
const expectedCount = config.useDualStops ? 4 : 3
|
||||
if (exitOrderSignatures.length < expectedCount) {
|
||||
console.error(`❌ CRITICAL: Missing exit orders!`)
|
||||
logCriticalError('MISSING_EXIT_ORDERS', { ... })
|
||||
}
|
||||
```
|
||||
- Logs via `logCriticalError()` with full context (symbol, tradeId, expected vs actual)
|
||||
- Continues with trade creation but flags position as needing verification
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ SL placement failures throw explicit errors (no silent failure)
|
||||
- ✅ Function returns `success: false` when signatures missing
|
||||
- ✅ Execute endpoint logs CRITICAL error when missing signatures
|
||||
- ✅ Persistent logger captures failure details for post-mortem
|
||||
- ✅ User notified of unprotected positions
|
||||
|
||||
**Tests Added:**
|
||||
- `tests/integration/orders/exit-orders-validation.test.ts` (13 test cases)
|
||||
- Tests single stop system (3 orders expected)
|
||||
- Tests dual stop system (4 orders expected)
|
||||
- Tests failure when SL/soft/hard placement fails
|
||||
- Tests validation logic catches missing signatures
|
||||
|
||||
---
|
||||
|
||||
### Bug #77 Fix: Position Manager Monitoring Verification
|
||||
|
||||
**File: `lib/trading/position-manager.ts` - `addTrade()`**
|
||||
- Added verification after `startMonitoring()` call:
|
||||
```typescript
|
||||
if (this.activeTrades.size > 0 && !this.isMonitoring) {
|
||||
const errorMsg = `CRITICAL: Failed to start monitoring! ...`
|
||||
await logCriticalError('MONITORING_START_FAILED', { ... })
|
||||
throw new Error(errorMsg)
|
||||
}
|
||||
```
|
||||
- Logs to persistent file with trade IDs, symbols, and state
|
||||
- Throws exception to prevent silent failure (Position Manager MUST monitor or fail loudly)
|
||||
|
||||
**File: `lib/trading/position-manager.ts` - `startMonitoring()`**
|
||||
- Enhanced logging before/during/after:
|
||||
```typescript
|
||||
logger.log(` Active trades: ${this.activeTrades.size}`)
|
||||
logger.log(` Symbols: ${symbols.join(', ')}`)
|
||||
logger.log(` Current isMonitoring: ${this.isMonitoring}`)
|
||||
logger.log(`📡 Calling priceMonitor.start()...`)
|
||||
// ... after start ...
|
||||
logger.log(` isMonitoring flag set to: ${this.isMonitoring}`)
|
||||
```
|
||||
- Wrapped `priceMonitor.start()` in try/catch with persistent error logging
|
||||
- Re-throws errors so caller knows monitoring failed
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ If monitoring fails to start, exception thrown (not silent)
|
||||
- ✅ Logs show exact state: active trades, symbols, isMonitoring flag
|
||||
- ✅ Persistent logger captures failure for post-mortem
|
||||
- ✅ System cannot enter "fake monitoring" state (logs say monitoring but isn't)
|
||||
|
||||
**Tests Validated:**
|
||||
- `tests/integration/position-manager/monitoring-verification.test.ts` (already existed)
|
||||
- Tests isMonitoring flag set to true after addTrade()
|
||||
- Tests priceMonitor.start() actually called
|
||||
- Tests errors bubble up from priceMonitor.start()
|
||||
|
||||
---
|
||||
|
||||
### Bug #78 Fix: Safe Orphan Removal
|
||||
|
||||
**File: `lib/trading/position-manager.ts` - `removeTrade()`**
|
||||
- Query Drift for current position size BEFORE canceling orders:
|
||||
```typescript
|
||||
const driftPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
|
||||
|
||||
if (driftPosition && Math.abs(driftPosition.size) >= 0.01) {
|
||||
console.warn(`⚠️ SAFETY CHECK: Position still open on Drift (size: ${driftPosition.size})`)
|
||||
console.warn(` Skipping order cancellation to avoid removing active position protection`)
|
||||
this.activeTrades.delete(tradeId) // Just remove from tracking
|
||||
return
|
||||
}
|
||||
```
|
||||
- Only cancel orders if Drift confirms position closed (size ≈ 0)
|
||||
- On error, err on side of caution - don't cancel orders
|
||||
- Logs to persistent file when skipping cancellation for safety
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ removeTrade() checks Drift before canceling orders
|
||||
- ✅ If Drift shows open position → skip cancel, just remove from map
|
||||
- ✅ If Drift shows closed position → safe to cancel orders
|
||||
- ✅ On Drift query error → skip cancel (safety first)
|
||||
- ✅ Multiple positions on same symbol protected from orphan cleanup
|
||||
|
||||
**Tests Added:**
|
||||
- `tests/integration/position-manager/safe-orphan-removal.test.ts` (13 test cases)
|
||||
- Tests canceling when Drift confirms closed (size = 0)
|
||||
- Tests NOT canceling when Drift shows open (size >= 0.01)
|
||||
- Tests removing from tracking even when skipping cancellation
|
||||
- Tests safety on Drift query errors
|
||||
- Tests multiple positions on same symbol scenario
|
||||
|
||||
---
|
||||
|
||||
### Bug #80 Fix: Retry Loop Cooldown Enforcement
|
||||
|
||||
**File: `lib/monitoring/drift-state-verifier.ts`**
|
||||
- Added in-memory cooldown tracking:
|
||||
```typescript
|
||||
private recentCloseAttempts: Map<string, number> = new Map()
|
||||
private readonly COOLDOWN_MS = 5 * 60 * 1000 // 5 minutes
|
||||
```
|
||||
- Check in-memory map FIRST (fast path):
|
||||
```typescript
|
||||
const lastAttemptTime = this.recentCloseAttempts.get(mismatch.symbol)
|
||||
if (lastAttemptTime && (Date.now() - lastAttemptTime) < this.COOLDOWN_MS) {
|
||||
console.log(`⏸️ COOLDOWN ACTIVE: ${remainingCooldown}s remaining`)
|
||||
return // Skip retry
|
||||
}
|
||||
```
|
||||
- ALSO check database for persistence across restarts
|
||||
- Record attempt time BEFORE calling `closePosition()` to prevent race conditions:
|
||||
```typescript
|
||||
const attemptTime = Date.now()
|
||||
this.recentCloseAttempts.set(mismatch.symbol, attemptTime)
|
||||
const result = await closePosition(...)
|
||||
```
|
||||
- Keep cooldown even on failure to prevent spam
|
||||
- Log cooldown state with remaining time and map contents
|
||||
|
||||
**Expected Behavior:**
|
||||
- ✅ First close attempt allowed immediately
|
||||
- ✅ Subsequent attempts blocked for 5 minutes
|
||||
- ✅ Logs show cooldown status and remaining time
|
||||
- ✅ Cooldown persists across container restarts (database)
|
||||
- ✅ Prevents retry loop from repeatedly stripping protection
|
||||
- ✅ Clear visibility into cooldown state for monitoring
|
||||
|
||||
**Tests Added:**
|
||||
- `tests/integration/drift-state-verifier/cooldown-enforcement.test.ts` (12 test cases)
|
||||
- Tests allowing first close attempt
|
||||
- Tests blocking retry within 5-minute cooldown
|
||||
- Tests allowing retry after cooldown expires
|
||||
- Tests logging remaining cooldown time
|
||||
- Tests database persistence of cooldown
|
||||
- Tests recording attempt even on failure
|
||||
|
||||
---
|
||||
|
||||
## Verification Steps for Production
|
||||
|
||||
### 1. Deploy and Monitor Initial Behavior
|
||||
```bash
|
||||
# Deploy new code
|
||||
docker compose build trading-bot
|
||||
docker compose up -d --force-recreate trading-bot
|
||||
|
||||
# Verify container running new code
|
||||
docker logs trading-bot-v4 | grep "Server starting" | head -1
|
||||
git log -1 --format='%ai'
|
||||
# Container timestamp must be NEWER than commit
|
||||
|
||||
# Watch for enhanced logging
|
||||
docker logs -f trading-bot-v4 | grep -E "(CRITICAL|MONITORING|Executing SL|COOLDOWN)"
|
||||
```
|
||||
|
||||
### 2. Test Exit Order Placement
|
||||
```bash
|
||||
# Open test position via Telegram
|
||||
# Watch logs for:
|
||||
# - "📊 Expected 3 exit orders total (TP1 + TP2 + single stop)"
|
||||
# - "🔄 Executing SL trigger-market placement..."
|
||||
# - "✅ All 3 exit orders placed successfully"
|
||||
|
||||
# Check database
|
||||
docker exec trading-bot-postgres psql -U postgres -d trading_bot_v4 -c \
|
||||
"SELECT slOrderTx, softStopOrderTx, hardStopOrderTx FROM \"Trade\" WHERE id='...';"
|
||||
|
||||
# ALL fields should be populated (not NULL)
|
||||
```
|
||||
|
||||
### 3. Verify Position Manager Monitoring
|
||||
```bash
|
||||
# After opening position, check logs for:
|
||||
# - "📡 Calling priceMonitor.start()..."
|
||||
# - "✅ Position monitoring active"
|
||||
# - " isMonitoring flag set to: true"
|
||||
# - "✅ Monitoring verification passed: isMonitoring=true"
|
||||
|
||||
# If monitoring fails, should see:
|
||||
# - "❌ CRITICAL: Failed to start monitoring!"
|
||||
# - Exception thrown (container logs show error)
|
||||
```
|
||||
|
||||
### 4. Test Safe Orphan Removal
|
||||
```bash
|
||||
# Trigger orphan detection by manually closing position on Drift UI
|
||||
# Wait for orphan detection to run (10 min interval)
|
||||
|
||||
# Watch logs for:
|
||||
# - "✅ Drift position confirmed closed (size: 0)"
|
||||
# - " Safe to cancel remaining orders"
|
||||
# OR
|
||||
# - "⚠️ SAFETY CHECK: Position still open on Drift"
|
||||
# - " Skipping order cancellation to avoid removing active position protection"
|
||||
```
|
||||
|
||||
### 5. Monitor Retry Loop Cooldown
|
||||
```bash
|
||||
# If Drift state mismatch detected:
|
||||
# Watch logs for:
|
||||
# - "🔄 Retrying close for SOL-PERP..."
|
||||
# - "🚀 Proceeding with close attempt..."
|
||||
# - "📝 Cooldown recorded: SOL-PERP → 2025-12-09T22:30:00.000Z"
|
||||
|
||||
# On subsequent attempt within 5 minutes:
|
||||
# - "⏸️ COOLDOWN ACTIVE: Last attempt 120s ago"
|
||||
# - "⏳ Must wait 180s more before retry (5min cooldown)"
|
||||
# - "📊 Cooldown map state: SOL-PERP:120000ms"
|
||||
```
|
||||
|
||||
### 6. Check Health Monitor Integration
|
||||
```bash
|
||||
# Health monitor should now detect missing SL orders immediately
|
||||
docker logs -f trading-bot-v4 | grep "NO STOP LOSS"
|
||||
|
||||
# If SL missing:
|
||||
# - "🚨 CRITICAL: Position {id} missing SL order"
|
||||
# - Shows symbol, size, ALL null SL fields
|
||||
# - Alerts every 30 seconds until fixed
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Success Metrics
|
||||
|
||||
**Before Fixes:**
|
||||
- ❌ 4+ incidents of vanishing SL orders ($1,000+ losses)
|
||||
- ❌ Silent failures (no errors, no alerts)
|
||||
- ❌ Position Manager logs "monitoring" but isn't
|
||||
- ❌ Orphan cleanup removes active position orders
|
||||
- ❌ Retry loop repeatedly strips protection (no cooldown)
|
||||
|
||||
**After Fixes:**
|
||||
- ✅ SL placement failures throw explicit errors
|
||||
- ✅ Missing signatures logged to persistent file
|
||||
- ✅ Position Manager throws exception if monitoring fails
|
||||
- ✅ Orphan cleanup checks Drift before canceling
|
||||
- ✅ Retry loop respects 5-minute cooldown
|
||||
- ✅ 36 new test cases validating all fixes
|
||||
- ✅ Enhanced logging for production monitoring
|
||||
- ✅ Clear visibility into system state
|
||||
|
||||
**Expected Impact:**
|
||||
- 🎯 Zero incidents of vanishing SL orders
|
||||
- 🎯 Immediate detection when orders fail to place
|
||||
- 🎯 No false "monitoring" states (monitor or fail loudly)
|
||||
- 🎯 Active positions protected from orphan cleanup
|
||||
- 🎯 No retry loops stripping protection
|
||||
- 🎯 User confidence restored in risk management system
|
||||
|
||||
---
|
||||
|
||||
## Developer Checklist for Future Changes
|
||||
|
||||
When modifying risk management code:
|
||||
|
||||
**Before Committing:**
|
||||
- [ ] Add try/catch around all order placement calls
|
||||
- [ ] Validate return values before declaring success
|
||||
- [ ] Log to persistent file for CRITICAL failures
|
||||
- [ ] Add tests for failure scenarios (not just success paths)
|
||||
- [ ] Update documentation in `.github/copilot-instructions.md`
|
||||
|
||||
**During Testing:**
|
||||
- [ ] Test actual order placement (not just mocks)
|
||||
- [ ] Verify ALL order signatures returned (count them!)
|
||||
- [ ] Check database fields populated (not NULL)
|
||||
- [ ] Monitor logs for error messages
|
||||
- [ ] Confirm Position Manager actually monitoring
|
||||
|
||||
**Production Deployment:**
|
||||
- [ ] Verify container timestamp newer than commit
|
||||
- [ ] Watch logs for enhanced error messages
|
||||
- [ ] Test with real position (small size)
|
||||
- [ ] Monitor for 24-48 hours before declaring success
|
||||
- [ ] User approval before considering "done"
|
||||
|
||||
**Remember:** In real money trading systems, "looks correct" ≠ "verified with real data"
|
||||
|
||||
---
|
||||
|
||||
**Fixes Implemented:** Dec 9, 2025
|
||||
**Author:** AI Agent (Copilot)
|
||||
**Pull Request:** #X
|
||||
**Status:** ✅ COMPLETE - Ready for production deployment
|
||||
**Next Step:** Deploy to production → Monitor → Validate → User approval
|
||||
|
||||
|
||||
Reference in New Issue
Block a user