# n8n Upgrade Lessons Learned - December 3, 2025
## Summary
**CONCLUSION: DO NOT UPGRADE n8n beyond 1.19.4 until critical bugs are fixed**

Attempted to upgrade n8n from 1.19.4 to resolve a Switch v1 incompatibility. Tested 4 versions (1.30.1, 1.90.3, 1.122.4, 1.123.0); all failed with critical regressions. Spent 4+ hours debugging, creating 14+ database backups and 4 test workflows. **Recommendation: Stay on 1.19.4.**

---
## Timeline
### 10:10 AM - Initial State
- n8n 1.19.4 working
- Telegram bot @mortimer_assi_bot returning 500 error on `/start` command
- Cause: Switch v1 syntax incompatible ("Could not find property option")
- Clean database backup created: `database.sqlite.backup_before_incremental_upgrade_20251203_101040`
### 10:30 AM - Upgrade Attempt #1: 1.30.1
- Upgraded via incremental path: 1.19.4 → 1.22.6 → 1.30.1
- Dockerfile changed to `FROM n8nio/n8n:1.30.1`
- Switch v3.2 syntax fixed manually
- **Problem**: Telegram credential lost during migration
### 11:00 AM - Testing 1.30.1
- Re-added Telegram credential (ID: Csk5cg4HtaSqP5jJ)
- Created "Mortimer Bot" workflow with Telegram Trigger v1
- **Problem**: Telegram Trigger polling doesn't start
- Evidence: No "Adding triggers and pollers" log message
- Zero executions despite workflow active=1
### 12:00 PM - Upgrade Attempt #2: 1.90.3
- Switched to 1.90.3 hoping for better stability
- **Problem**: Switch/IF nodes don't trigger downstream
- Workflows activate but never execute subsequent nodes
### 1:00 PM - Upgrade Attempt #3: 1.123.0
- Tried latest version 1.123.0
- **Problem**: Container crashed immediately (OOM kill)
- Database corruption suspected
- Rolled back to 1.30.1
### 2:00 PM - Webhook Approach
- Abandoned Telegram Trigger, switched to webhooks
- Created "Mortimer Working" workflow
- Set Telegram webhook: `https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4`
- **Problem**: Webhook returns 404 "not registered"
### 2:30 PM - Debugging Webhook Issues
- Database shows active=1
- Logs show "=> Started"
- n8n API returns {"active": true}
- User toggled workflow off/on in UI
- **Still 404 errors**
### 3:00 PM - Investigation
- Tested 15+ different fixes (database toggles, restarts, API calls)
- Discovered webhooks only register after manual execution in editor
- Telegram Trigger polling completely broken in 1.30.1
- User frustrated: "i dont understand why these strange problems occure. that seems to be not normal"
### 3:30 PM - Version Research
- Checked GitHub releases: Latest stable = 1.122.4
- Checked Docker Hub: Latest = 1.123.1
- User had already tried 1.122.4, which crashed the system
- **Decision: Rollback to 1.19.4 required**
---
## Issues Found by Version
### n8n 1.19.4 ✅ (WORKING)
- Telegram Trigger polling: ✅ Works
- Webhook registration: ✅ Works
- Switch v1 syntax: ✅ Works
- Stability: ✅ Excellent
- **Limitation**: Limited to Switch v1 syntax (which works correctly); n8n.io templates written for Switch v3.2 must be rewritten
### n8n 1.30.1 ❌ (BROKEN)
- Telegram Trigger polling: ❌ **Never initializes**
- No "Adding triggers and pollers" log
- TelegramTrigger.node.js exists but doesn't run
- Credentials encrypted correctly
- Zero executions despite active=1
- Webhook registration: ❌ **Requires manual UI execution**
- Workflow shows "=> Started"
- Database active=1
- Returns 404 "not registered"
- Only works after opening in editor and executing
- Switch v3 breaking change: ⚠️ **Manual migration required**
- v1 syntax throws "Could not find property option"
- Must update to v3.2 format manually
- No automatic migration
### n8n 1.90.3 ❌ (BROKEN)
- Switch/IF nodes: ❌ **Don't trigger downstream**
- Workflows activate
- Nodes execute
- But subsequent nodes never run
- Different regression than 1.30.1
### n8n 1.122.4 / 1.123.0 ❌ (CRASHES)
- Container: ❌ **OOM kill on startup**
- Database: ❌ **Corruption suspected**
- Logs: ❌ **Container doesn't stay running**
---
## Technical Details
### Switch v3.2 Migration
**Old v1 format (1.19.4):**
```json
{
  "typeVersion": 1,
  "options": {
    "rules": [
      {
        "value": {
          "conditions": [
            {
              "leftValue": "={{ $json.message.text }}",
              "rightValue": "/start",
              "operator": "equal"
            }
          ]
        }
      }
    ]
  }
}
```
**New v3.2 format (1.30.1+):**
```json
{
  "typeVersion": 3.2,
  "rules": {
    "values": [
      {
        "conditions": {
          "conditions": [
            {
              "leftValue": "={{ $json.message.text }}",
              "rightValue": "/start",
              "operator": {
                "type": "string",
                "operation": "equals"
              }
            }
          ]
        }
      }
    ]
  }
}
```
**Key Changes:**
1. `options.rules` → `rules.values`
2. `operator: "equal"` → `operator: {type: "string", operation: "equals"}`
3. Nested `conditions.conditions` array structure
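
Change 2 is easy to miss when porting rules by hand. A minimal bash sketch of that operator rename, assuming string comparisons only. `switch_v3_operator` is a hypothetical helper, not part of n8n. Apart from the `equal` → `equals` pair shown above, the operation names in its table are assumptions, so check them in the node UI before relying on them.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a Switch v1 string operator name into the
# v3.2 operator object. Only "equal" -> "equals" comes from this document;
# the other names are assumptions and should be checked in the n8n editor.
switch_v3_operator() {
  local v1_op="$1" v3_op
  case "$v1_op" in
    equal)       v3_op="equals" ;;
    notEqual)    v3_op="notEquals" ;;
    contains)    v3_op="contains" ;;
    notContains) v3_op="notContains" ;;
    startsWith)  v3_op="startsWith" ;;
    endsWith)    v3_op="endsWith" ;;
    regex)       v3_op="regex" ;;
    *)
      # Fail loudly instead of guessing an unknown operator
      echo "switch_v3_operator: unknown v1 operator '$v1_op'" >&2
      return 1
      ;;
  esac
  printf '{"type":"string","operation":"%s"}\n' "$v3_op"
}
```

For example, `switch_v3_operator equal` prints `{"type":"string","operation":"equals"}`, which is the value for the new `operator` key.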
### Telegram Trigger Polling Failure
**Expected behavior (1.19.4):**
```
Starting n8n...
Loading workflows...
- "Mortimer Bot" (ID: 70c37130...)
=> Started
=> Adding triggers and pollers
=> Telegram polling started for chat...
```
**Actual behavior (1.30.1):**
```
Starting n8n...
Loading workflows...
- "Mortimer Bot" (ID: 70c37130...)
=> Started
[No polling initialization]
```
**Verification:**
```bash
# Check if node exists
docker exec n8n find /usr -name "*Telegram*.node.js"
# Output: /usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/Telegram/TelegramTrigger.node.js
# Check credential
sqlite3 database.sqlite "SELECT id, name, type, LENGTH(data) FROM credentials_entity WHERE type = 'telegramApi'"
# Output: Csk5cg4HtaSqP5jJ|Telegram Bot|telegramApi|128
# Check whether Telegram has a webhook registered for the bot
curl "https://api.telegram.org/bot<token>/getWebhookInfo"
# Output: {"ok":true,"result":{"url":"",...,"pending_update_count":0}} <- empty url = no webhook registered
# Check executions
sqlite3 database.sqlite "SELECT COUNT(*) FROM execution_entity WHERE workflowId = '70c37130...'"
# Output: 0 <- never ran
```
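**Suspected root cause**: n8n 1.30.1 never activates TelegramTrigger nodes, which points to a bug in the trigger registration system. n8n's Telegram Trigger does not long-poll. On activation it registers a webhook with Telegram (`setWebhook`), so the empty `url` returned by `getWebhookInfo` means the trigger never registered. This matches the webhook registration failure described in the next section.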
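
`getWebhookInfo` is the only check that comes from Telegram rather than from n8n, so it is worth scripting. The sketch below is minimal. It assumes the Bot API's usual `{"ok":true,"result":{"url":...}}` response shape and matches it with `grep`, so `jq` is not needed. `telegram_webhook_registered` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read a getWebhookInfo JSON response on stdin.
# Exit 0 = webhook URL set, 1 = no webhook, 2 = the API call itself failed.
# Uses grep instead of a JSON parser; assumes "url" appears once in the body.
telegram_webhook_registered() {
  local body
  body=$(cat)
  if ! printf '%s' "$body" | grep -q '"ok": *true'; then
    echo "telegram_webhook_registered: API call failed: $body" >&2
    return 2
  fi
  # An empty "url" means Telegram has no webhook for this bot
  if printf '%s' "$body" | grep -q '"url": *""'; then
    return 1
  fi
  return 0
}

# Example (TELEGRAM_BOT_TOKEN is an assumed environment variable):
#   curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getWebhookInfo" \
#     | telegram_webhook_registered; echo "status: $? (0 = set, 1 = none, 2 = API error)"
```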
### Webhook Registration Failure
**Test workflow:**
```json
{
  "name": "Mortimer Working",
  "nodes": [
    {
      "type": "n8n-nodes-base.webhook",
      "webhookId": "fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4",
      "parameters": {
        "httpMethod": "POST",
        "path": "fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4"
      }
    },
    {
      "type": "n8n-nodes-base.telegram",
      "parameters": {
        "chatId": "={{ $json.body.message.chat.id }}",
        "text": "Bot working!"
      }
    }
  ],
  "active": true
}
```
**Attempted fixes:**
1. Database: `UPDATE workflow_entity SET active = 0; UPDATE workflow_entity SET active = 1;`
2. Docker: `docker restart n8n` (5+ times)
3. API: `curl -X POST "${N8N_URL}/api/v1/workflows/aab68dba.../activate"`
4. UI toggle: User deactivated/reactivated (confirmed)
5. Manual execution: Open in editor → "Execute workflow" button
**Result**: Only #5 works (manual execution in editor).

**Test:**
```bash
# Telegram-style update body; n8n's Webhook node exposes it as $json.body
curl -X POST "https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4" \
-H "Content-Type: application/json" \
-d '{"message":{"text":"test","chat":{"id":579304651}}}'
# Output: {"code":404,"message":"The requested webhook \"POST fff67fbd...\" is not registered"}
```
**Suspected root cause**: n8n 1.30.1 does not register production webhooks when a workflow is activated, which points to a bug in the runtime webhook manager.

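
When re-running this test after each fix attempt, it helps to tell "not registered" 404s apart from other failures. The sketch below is minimal and assumes the 404 wording shown above stays the same across versions. `check_n8n_webhook` is a hypothetical helper. It takes the status code and body as arguments, so it can be tried without a live instance.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify one n8n webhook response.
# $1 = HTTP status code, $2 = response body.
# Prints a one-word verdict; exits non-zero unless the webhook accepted the call.
check_n8n_webhook() {
  local code="$1" body="$2"
  if [ "$code" = "404" ] && printf '%s' "$body" | grep -q 'is not registered'; then
    echo "not-registered"
    return 1
  fi
  if [ "$code" -ge 200 ] && [ "$code" -lt 300 ]; then
    echo "ok"
    return 0
  fi
  echo "error-$code"
  return 2
}

# Example against the live instance (curl appends the status code on its own line):
#   resp=$(curl -s -w '\n%{http_code}' -X POST \
#     "https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4" \
#     -H "Content-Type: application/json" -d '{"message":{"text":"test","chat":{"id":579304651}}}')
#   check_n8n_webhook "${resp##*$'\n'}" "${resp%$'\n'*}"
```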
---
## Workflows Created During Session
1. **Mortimer Bot** (70c37130-84e0-4349-9251-2d82e4db2d64)
- Telegram Trigger v1 → Switch v3 → Telegram Send
- Status: Deactivated (polling doesn't work)
- Executions: 0
2. **Mortimer Bot v2** (cc3069c6-3c99-424d-85c9-191d5baa3bf6)
- Telegram Trigger v1.1 with webhookId
- Status: Deactivated (polling doesn't work)
- Executions: 0
3. **Simple Telegram Router** (2ba483e8-ed62-4b88-b649-cd38550ab8aa)
- Webhook → Telegram Send
- Status: Deleted ("Workflow has no owner" error)
4. **Mortimer Working** (aab68dba-1ccd-451d-9685-479096f4be51)
- Webhook (fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4) → Telegram Send
- Telegram webhook set to this URL
- Status: Active in database, 404 on requests
- Executions: 0
---
## Database Backups Created
```bash
ls -lh /home/icke/n8n/database.sqlite.backup*
```
**Key backups:**
- `database.sqlite.backup_before_incremental_upgrade_20251203_101040` - **CLEAN 1.19.4 STATE**
- `database.sqlite.backup_before_restore_20251203_141649` - Before rollback attempt
- 12+ additional backups during debugging
**Size:** ~9.7 MB each

---
## Credentials Status
**Telegram Bot (mortimer_assi_bot):**
- ID: Csk5cg4HtaSqP5jJ (re-added manually during session)
- Token: `<redacted>`
- Status: ✅ Working

**OpenAI API:**
- ID: GPzxHwwXxeZg07o5 (openAiApi) - ✅ Working
- ID: MATuNdkZclq5ISbr (httpHeaderAuth) - ❌ Invalid key

**IMAP Email:**
- ID: BntHPR3YbFD5jAIM
- Status: ✅ Working
---
## Recommendations
### Immediate Action
**Rollback to n8n 1.19.4:**
```bash
# Stop container
docker stop n8n
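# Keep the current (already migrated) database before overwriting it
cp /home/icke/n8n/database.sqlite \
/home/icke/n8n/database.sqlite.backup_before_restore_$(date +%Y%m%d_%H%M%S)
# Restore clean database
cp /home/icke/n8n/database.sqlite.backup_before_incremental_upgrade_20251203_101040 \
/home/icke/n8n/database.sqlite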
# Edit Dockerfile
cd /home/icke/compose_files
# Change: FROM n8nio/n8n:1.30.1
# To: FROM n8nio/n8n:1.19.4
# Rebuild and restart
docker compose build n8n
docker compose up -d n8n
# Verify version
docker exec n8n n8n --version
# Expected: 1.19.4
# Test bot
# Send "/start" to @mortimer_assi_bot
# Should receive welcome message
```
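The manual Dockerfile edit above is easy to get wrong under pressure. The sketch below is minimal and rewrites the `FROM n8nio/n8n:<tag>` line, assuming the Dockerfile contains exactly one such line. `pin_n8n_version` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: point the "FROM n8nio/n8n:<tag>" line of a Dockerfile at
# a given version. Keeps the previous file as <Dockerfile>.bak.
pin_n8n_version() {
  local dockerfile="$1" version="$2" tmp
  if ! grep -q '^FROM n8nio/n8n:' "$dockerfile"; then
    echo "pin_n8n_version: no 'FROM n8nio/n8n:' line in $dockerfile" >&2
    return 1
  fi
  cp "$dockerfile" "$dockerfile.bak" || return 1
  tmp=$(mktemp) || return 1
  # Replace only the image tag; anything after it (e.g. "AS base") is kept
  if ! sed "s|^FROM n8nio/n8n:[^[:space:]]*|FROM n8nio/n8n:${version}|" "$dockerfile" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  # Copy back with cat so the Dockerfile keeps its original permissions
  cat "$tmp" > "$dockerfile"
  rm -f "$tmp"
  # Show the resulting line for a quick visual check
  grep '^FROM n8nio/n8n:' "$dockerfile"
}
```

Typical use would be `pin_n8n_version ./Dockerfile 1.19.4` from the compose directory, followed by the `docker compose build` and `up -d` steps above. The Dockerfile path is an assumption about the compose layout. Writing through a temp file instead of `sed -i` keeps the helper working with both GNU and BSD `sed`.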
### Long-term Strategy
1. **Stay on 1.19.4** until n8n fixes critical bugs
2. **Monitor GitHub issues** for fixes:
- Telegram Trigger polling not initializing
- Webhooks not registering on activation
- Switch v3 migration documentation
3. **Test future versions** on separate instance:
```bash
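# Create a separate data directory holding a copy of the production database
mkdir -p /home/icke/n8n-test
cp /home/icke/n8n/database.sqlite /home/icke/n8n-test/database.sqlite
# Run test container on another host port. It needs the production encryption
# key to decrypt the copied credentials. Telegram allows one webhook per bot,
# so don't activate Telegram workflows here with the production bot token.
docker run -d --name n8n-test \
-p 5679:5678 \
-v /home/icke/n8n-test:/home/node/.n8n \
-e N8N_ENCRYPTION_KEY="${N8N_ENCRYPTION_KEY}" \
n8nio/n8n:1.x.x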
# Verify all features work
# - Telegram Trigger polling
# - Webhook registration
# - Switch nodes
# - All executions succeed
```
4. **Before any upgrade:**
- Export all workflows via UI
- Backup database with date
- Document all credentials
- Test rollback procedure
- Keep old version Docker image cached
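
The backups in this session all follow a `database.sqlite.backup_<label>_<YYYYmmdd_HHMMSS>` naming pattern. The sketch below is a minimal helper that produces the same names and verifies the copy. It assumes n8n is stopped first, because copying a SQLite file while it is being written can capture an inconsistent state. `backup_n8n_db` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy an n8n SQLite database to
# <db>.backup_<label>_<YYYYmmdd_HHMMSS>, matching this document's backup names.
# Stop the n8n container first so the file is not being written during the copy.
backup_n8n_db() {
  local db="$1" label="$2" dest
  [ -f "$db" ] || { echo "backup_n8n_db: $db not found" >&2; return 1; }
  dest="$db.backup_${label}_$(date +%Y%m%d_%H%M%S)"
  if [ -e "$dest" ]; then
    echo "backup_n8n_db: $dest already exists" >&2
    return 1
  fi
  # -p keeps timestamps and permissions of the original file
  cp -p "$db" "$dest" || return 1
  # Only report success if the copy is byte-identical
  cmp -s "$db" "$dest" || { echo "backup_n8n_db: copy differs" >&2; return 1; }
  echo "$dest"
}
```

For example, `docker stop n8n && backup_n8n_db /home/icke/n8n/database.sqlite before_upgrade` prints the path of the new backup.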
### Never Do Again
- ❌ Don't upgrade n8n in production without testing
- ❌ Don't assume newer versions are stable
- ❌ Don't trust incremental upgrade paths
- ❌ Don't skip database backups
- ❌ Don't delete old Docker images until verified
- ❌ Don't modify production during work hours
- ❌ Don't assume API activation works like UI
- ❌ Don't trust container logs alone (check executions)
### Always Do
- ✅ Test upgrades on separate instance
- ✅ Create timestamped database backups
- ✅ Export all workflows before changes
- ✅ Verify credentials after migration
- ✅ Check execution_entity table for actual runs
- ✅ Monitor n8n GitHub issues before upgrading
- ✅ Keep documentation of working configurations
- ✅ Test all triggers/webhooks after changes
- ✅ Verify end-to-end functionality
- ✅ Have rollback plan ready
---
## Key Insights
1. **n8n version numbers don't indicate stability**
- Latest ≠ stable
- 1.122.4 crashed worse than 1.30.1
- No clear upgrade path documented
2. **Database migrations are one-way**
- Cannot downgrade without backup
- Schema changes break older versions
- Must backup before ANY upgrade
3. **Trigger systems are fragile**
- Telegram Trigger completely broken in 1.30.1
- Webhooks require manual registration
- No errors logged, silent failures
4. **Testing is mandatory**
- "Worked on template workflow" ≠ "works in my instance"
- UI shows active ≠ actually running
- Database active=1 ≠ webhooks registered
5. **Community templates use newer syntax**
- n8n.io templates show Switch v3.2
- This forced upgrade attempt
- But v3.2 works fine once migrated manually
---
## Questions for n8n Team
1. Why doesn't Telegram Trigger polling initialize in 1.30.1?
2. Why don't webhooks register on workflow activation?
3. Why do Switch/IF nodes not trigger in 1.90.3?
4. Why does 1.122.4 crash with OOM on startup?
5. Is there an official migration guide for Switch v1 → v3?
6. What is the recommended upgrade path from 1.19.4?
7. Are there automated tests for trigger systems?
8. Why don't logs show webhook registration failures?
---
## User Quotes

> "ok. stay on the current version" - User initially agreeing not to upgrade

> "i dont get it. can i interact with mortimer now or not?" - Confusion about bot status

> "hm ok. the creds for the mortimer are gone. well, thank you....not" - Frustration with credential loss

> "man i dont care what you do. just make this thing work" - Extreme frustration at multiple failures

> "i dont understand why these strange problems occure. that seems to be not normal" - Questioning software quality

> "well what is the latest stable version?" - Final question after 4 hours of debugging

- **User patience level: Exhausted**
- **User trust in n8n: Damaged**
- **User trust in upgrade process: Zero**
---
## Success Criteria for Next Attempt
Before attempting any future upgrade:
1. ✅ n8n GitHub issues show bugs fixed
2. ✅ Community reports successful upgrades
3. ✅ Release notes mention trigger/webhook fixes
4. ✅ Test instance runs for 48+ hours
5. ✅ All workflows execute successfully
6. ✅ Zero 404 webhook errors
7. ✅ Telegram Trigger shows polling logs
8. ✅ Switch nodes trigger downstream
9. ✅ User approves upgrade plan
10. ✅ Rollback tested and ready

**Until then: STAY ON 1.19.4**

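
One way to make this rule hard to bypass by accident is a guard in whatever script performs upgrades. The sketch below is minimal and assumes GNU `sort -V` for version ordering. `version_gt`, `guard_n8n_version` and the `ALLOW_N8N_UPGRADE` override variable are hypothetical names.

```bash
#!/usr/bin/env bash
# Last version verified to work in this setup (per this document)
PINNED_N8N_VERSION="1.19.4"

# version_gt A B: succeed if version A sorts strictly after version B.
# Relies on sort -V (GNU coreutils) for natural version ordering.
version_gt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Hypothetical guard: refuse versions newer than the pinned one unless
# ALLOW_N8N_UPGRADE=1 is set once the success criteria above are met.
guard_n8n_version() {
  local target="$1"
  if version_gt "$target" "$PINNED_N8N_VERSION" && [ "${ALLOW_N8N_UPGRADE:-0}" != "1" ]; then
    echo "Refusing n8n $target: pinned to $PINNED_N8N_VERSION (set ALLOW_N8N_UPGRADE=1 to override)" >&2
    return 1
  fi
  return 0
}

# Example, at the top of an upgrade script:
#   guard_n8n_version 1.30.1 || exit 1
```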
---
- **Session Date:** December 3, 2025
- **Session Duration:** ~4 hours
- **Workflows Created:** 4
- **Database Backups:** 14+
- **n8n Versions Tested:** 4 (1.30.1, 1.90.3, 1.122.4, 1.123.0)
- **Successful Versions:** 0
- **Status:** Rollback to 1.19.4 recommended
- **Next Steps:** Document learnings, continue another day