docs: Document position close verification fix (Common Pitfall #47)

- Added comprehensive documentation for close verification gap bug
- Real incident: 6 hours unmonitored exposure after close confirmation
- Root cause: Transaction confirmed ≠ Drift state propagated (5-10s delay)
- Fix: 5s wait + verification + needsVerification flag for Position Manager
- Prevents premature database 'closed' marking while position still open
- TypeScript interface updated: ClosePositionResult.needsVerification
- Deployed: Nov 16, 2025 09:28:20 CET
- Commits: c607a66 (logic), b23dde0 (interface)
mindesbunister
2025-11-16 10:31:23 +01:00
parent b23dde057b
commit 84f40f3e15

@@ -1949,6 +1949,108 @@ trade.realizedPnL += actualRealizedPnL // NOT: result.realizedPnL from SDK
- **Git commit:** 7129cbf "fix: Add 99% safety buffer for 100% position sizing"
- **Lesson:** When integrating with DEX protocols, never use 100% of resources - always leave safety margin for protocol-level calculations
47. **Position close verification gap - 6 hours unmonitored (CRITICAL - Fixed Nov 16, 2025):**
- **Symptom:** Close transaction confirmed on-chain, database marked "SL closed", but position stayed open on Drift for 6+ hours unmonitored
- **Root Cause:** On-chain transaction confirmation does not mean Drift's internal state has updated; propagation lags by 5-10 seconds
- **Real incident (Nov 16, 02:51 CET):**
* Trailing stop triggered at 02:51:57
* Close transaction confirmed on-chain ✅
* Position Manager immediately queried Drift → still showed open (stale state)
* Ghost detection eventually marked it "closed" in database
* But position actually stayed open on Drift until 08:51 restart
* **6 hours unprotected** - no monitoring, no TP/SL backup, only orphaned on-chain orders
- **Why dangerous:**
* Database said "closed" so container restarts wouldn't restore monitoring
* Position exposed to unlimited risk if price moved against it
* Only saved by luck (container restart at 08:51 detected orphaned position)
* Startup validator caught mismatch: "CRITICAL: marked as CLOSED in DB but still OPEN on Drift"
- **Impact:** Every trailing stop or SL exit vulnerable to this race condition
- **Fix (2-layer verification):**
```typescript
// In lib/drift/orders.ts closePosition() (line ~634):
if (params.percentToClose === 100) {
  console.log('🗑️ Position fully closed, cancelling remaining orders...')
  await cancelAllOrders(params.symbol)

  // CRITICAL: Verify position actually closed on Drift
  // Transaction confirmed ≠ Drift state updated immediately
  console.log('⏳ Waiting 5s for Drift state to propagate...')
  await new Promise(resolve => setTimeout(resolve, 5000))

  const verifyPosition = await driftService.getPosition(marketConfig.driftMarketIndex)
  if (verifyPosition && Math.abs(verifyPosition.size) >= 0.01) {
    console.error(`🔴 CRITICAL: Close confirmed BUT position still exists!`)
    console.error(` Transaction: ${txSig}, Drift size: ${verifyPosition.size}`)
    // Return success but flag that monitoring should continue
    return {
      success: true,
      transactionSignature: txSig,
      closePrice: oraclePrice,
      closedSize: sizeToClose,
      realizedPnL,
      needsVerification: true, // Flag for Position Manager
    }
  }
  console.log('✅ Position verified closed on Drift')
}

// In lib/trading/position-manager.ts executeExit() (line ~1206):
if ((result as any).needsVerification) {
  console.log(`⚠️ Close confirmed but position still exists on Drift`)
  console.log(` Keeping ${trade.symbol} in monitoring until Drift confirms closure`)
  console.log(` Ghost detection will handle final cleanup once Drift updates`)
  // Keep monitoring - don't mark closed yet
  return
}
```
- **Behavior now:**
* Close transaction confirmed → wait 5 seconds
* Query Drift to verify position actually gone
* If it still exists: keep monitoring, log a critical error, and wait for ghost detection
* If verified closed: Proceed with database update and cleanup
* Ghost detection becomes safety net, not primary close mechanism
- **Prevents:** Premature database "closed" marking while position still open on Drift
- **TypeScript interface:** Added `needsVerification?: boolean` to ClosePositionResult interface
- **Git commits:** c607a66 (verification logic), b23dde0 (TypeScript interface fix)
- **Deployed:** Nov 16, 2025 09:28:20 CET
- **Lesson:** In DEX trading, always verify state changes actually propagated before updating local state
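The verification flow described in item 47 can be sketched as a small self-contained module. This is a minimal sketch, not the code from c607a66 or b23dde0: the `ClosePositionResult` fields are inferred from the return object in the excerpt and the real interface may differ, while `PositionReader`, `verifyClosed`, `shouldMarkClosed`, and the injectable `sleep` parameter are hypothetical names introduced here so the logic can run without a Drift connection.

```typescript
// Shape inferred from the return object in the closePosition() excerpt;
// the real interface may carry more fields.
interface ClosePositionResult {
  success: boolean
  transactionSignature?: string
  closePrice?: number
  closedSize?: number
  realizedPnL?: number
  needsVerification?: boolean // true when Drift still reports an open position
}

// Hypothetical stand-in for driftService.getPosition(); resolves to null when flat.
type PositionReader = () => Promise<{ size: number } | null>

const MIN_OPEN_SIZE = 0.01 // same threshold as the excerpt

// Wait for Drift state to propagate, then report whether the position is gone.
async function verifyClosed(
  readPosition: PositionReader,
  delayMs: number = 5000,
  sleep: (ms: number) => Promise<void> = ms =>
    new Promise<void>(resolve => setTimeout(resolve, ms)),
): Promise<boolean> {
  await sleep(delayMs)
  const position = await readPosition()
  const stillOpen = position !== null && Math.abs(position.size) >= MIN_OPEN_SIZE
  return !stillOpen
}

// Only a successful close that is not flagged for verification may be
// written to the database as closed; anything else stays in monitoring.
function shouldMarkClosed(result: ClosePositionResult): boolean {
  return result.success && !result.needsVerification
}
```

Injecting the reader and the sleep function is a choice made for this sketch only; it lets the "still open after the wait" path be exercised in a unit test without waiting five real seconds.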
46. **100% position sizing causes InsufficientCollateral (Fixed Nov 16, 2025):**
- **Symptom:** Bot configured for 100% position size gets InsufficientCollateral errors, but the Drift UI can open a position of nearly the same size
- **Root Cause:** Drift's margin calculation includes fees, slippage buffers, and rounding - exact 100% leaves no room
- **Error details:**
```
Program log: total_collateral=85547535 ($85.55)
Program log: margin_requirement=85583087 ($85.58)
Error: InsufficientCollateral (shortage: $0.03)
```
- **Real incident (Nov 16, 01:50 CET):**
* Collateral: $85.55
* Bot tries: $1,283.21 notional (100% × 15x leverage)
* Drift UI works: $1,282.57 notional (has internal safety buffer)
* Difference: $0.64 causes rejection
- **Impact:** Bot cannot trade at full capacity despite account leverage correctly set to 15x
- **Fix:** Apply 99% safety buffer automatically when user configures 100% position size
```typescript
// In config/trading.ts calculateActualPositionSize (line ~272):
let percentDecimal = configuredSize / 100

// CRITICAL: Safety buffer for 100% positions
if (configuredSize >= 100) {
  percentDecimal = 0.99
  console.log(`⚠️ Applying 99% safety buffer for 100% position`)
}

const calculatedSize = freeCollateral * percentDecimal
// $85.55 × 99% = $84.69 (leaves $0.86 for fees/slippage)
```
- **Result:** $84.69 × 15x = $1,270.35 notional (well within margin requirements)
- **User experience:** Transparent - bot logs "Applying 99% safety buffer" when triggered
- **Why Drift UI works:** Has internal safety calculations that bot must replicate externally
- **Math proof:** 1% buffer on $85 = $0.85 safety margin (covers typical fees of $0.03-0.10)
- **Git commit:** 7129cbf "fix: Add 99% safety buffer for 100% position sizing"
- **Lesson:** When integrating with DEX protocols, never use 100% of resources - always leave safety margin for protocol-level calculations
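The arithmetic in item 46 can be checked with a standalone version of the sizing rule. This is a sketch under stated assumptions: `sizeWithSafetyBuffer` is a hypothetical helper, not the actual `calculateActualPositionSize` from config/trading.ts, and the collateral and leverage values are the ones quoted in the incident.

```typescript
// Hypothetical standalone version of the sizing rule; the real function in
// config/trading.ts may take different parameters.
function sizeWithSafetyBuffer(freeCollateral: number, configuredSizePercent: number): number {
  // Cap at 99% so fees, slippage buffers, and rounding still fit in collateral.
  const percentDecimal = configuredSizePercent >= 100 ? 0.99 : configuredSizePercent / 100
  return freeCollateral * percentDecimal
}

const collateral = 85.55 // USD, from the incident
const leverage = 15

const size = sizeWithSafetyBuffer(collateral, 100)
const buffer = collateral - size
const notional = size * leverage

console.log(`size: $${size.toFixed(2)}`)     // size: $84.69
console.log(`buffer: $${buffer.toFixed(2)}`) // buffer: $0.86
console.log(`notional: $${notional.toFixed(2)}`)
```

The resulting notional is roughly $1,270, below the $1,282.57 that the Drift UI was able to open, so the order stays inside the margin requirement. Multiplying the unrounded size by the leverage differs by a few cents from multiplying the rounded $84.69.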
47. **Position close verification gap - 6 hours unmonitored (CRITICAL - Fixed Nov 16, 2025):**
- **Symptom:** Close transaction confirmed on-chain, database marked "SL closed", but position stayed open on Drift for 6+ hours unmonitored
- **Root Cause:** On-chain transaction confirmation does not mean Drift's internal state has updated; propagation lags by 5-10 seconds