mortimer/N8N_UPGRADE_LESSONS_2025-12-03.md
2025-12-19 09:54:03 +01:00


n8n Upgrade Lessons Learned - December 3, 2025

Summary

CONCLUSION: DO NOT UPGRADE n8n beyond 1.19.4 until critical bugs are fixed

Attempted to upgrade n8n from 1.19.4 to resolve a Switch v1 incompatibility. Tested 4 versions (1.30.1, 1.90.3, 1.122.4, 1.123.0); all failed with critical regressions. Spent 4+ hours debugging and created 14+ database backups and 4 test workflows. Recommendation: Stay on 1.19.4.


Timeline

10:10 AM - Initial State

  • n8n 1.19.4 working
  • Telegram bot @mortimer_assi_bot returning 500 error on /start command
  • Cause: Switch v1 syntax incompatible ("Could not find property option")
  • Clean database backup created: database.sqlite.backup_before_incremental_upgrade_20251203_101040

10:30 AM - Upgrade Attempt #1: 1.30.1

  • Upgraded via incremental path: 1.19.4 → 1.22.6 → 1.30.1
  • Dockerfile changed to FROM n8nio/n8n:1.30.1
  • Switch nodes migrated to v3.2 syntax manually
  • Problem: Telegram credential lost during migration

11:00 AM - Testing 1.30.1

  • Re-added Telegram credential (ID: Csk5cg4HtaSqP5jJ)
  • Created "Mortimer Bot" workflow with Telegram Trigger v1
  • Problem: Telegram Trigger polling doesn't start
  • Evidence: No "Adding triggers and pollers" log message
  • Zero executions despite workflow active=1

12:00 PM - Upgrade Attempt #2: 1.90.3

  • Switched to 1.90.3 hoping for better stability
  • Problem: Switch/IF nodes don't trigger downstream
  • Workflows activate but never execute subsequent nodes

1:00 PM - Upgrade Attempt #3: 1.123.0

  • Tried latest version 1.123.0
  • Problem: Container crashed immediately (OOM kill)
  • Database corruption suspected
  • Rolled back to 1.30.1

2:00 PM - Webhook Approach

  • Abandoned Telegram Trigger, switched to webhooks
  • Created "Mortimer Working" workflow
  • Set Telegram webhook: https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4
  • Problem: Webhook returns 404 "not registered"

2:30 PM - Debugging Webhook Issues

  • Database shows active=1
  • Logs show "=> Started"
  • n8n API returns {"active": true}
  • User toggled workflow off/on in UI
  • Still 404 errors

3:00 PM - Investigation

  • Tested 15+ different fixes (database toggles, restarts, API calls)
  • Discovered webhooks only register after manual execution in editor
  • Telegram Trigger polling completely broken in 1.30.1
  • User frustrated: "i dont understand why these strange problems occure. that seems to be not normal"

3:30 PM - Version Research

  • Checked GitHub releases: Latest stable = 1.122.4
  • Checked Docker Hub: Latest = 1.123.1
  • User already tried 1.122.4 → crashed system
  • Decision: Rollback to 1.19.4 required

Issues Found by Version

n8n 1.19.4 (WORKING)

  • Telegram Trigger polling: Works
  • Webhook registration: Works
  • Switch v1 syntax: Works
  • Stability: Excellent
  • Limitation: Only Switch v1 syntax is available (newer n8n.io templates use v3.2), but v1 works correctly

n8n 1.30.1 (BROKEN)

  • Telegram Trigger polling: Never initializes

    • No "Adding triggers and pollers" log
    • TelegramTrigger.node.js exists but doesn't run
    • Credentials encrypted correctly
    • Zero executions despite active=1
  • Webhook registration: Requires manual UI execution

    • Workflow shows "=> Started"
    • Database active=1
    • Returns 404 "not registered"
    • Only works after opening in editor and executing
  • Switch v3 breaking change: ⚠️ Manual migration required

    • v1 syntax throws "Could not find property option"
    • Must update to v3.2 format manually
    • No automatic migration

n8n 1.90.3 (BROKEN)

  • Switch/IF nodes: Don't trigger downstream
    • Workflows activate
    • Nodes execute
    • But subsequent nodes never run
    • Different regression than 1.30.1

n8n 1.122.4 / 1.123.0 (CRASHES)

  • Container: OOM kill on startup
  • Database: Corruption suspected
  • Logs: Container doesn't stay running

Technical Details

Switch v3.2 Migration

Old v1 format (1.19.4):

{
  "typeVersion": 1,
  "options": {
    "rules": [
      {
        "value": {
          "conditions": [
            {
              "leftValue": "={{ $json.message.text }}",
              "rightValue": "/start",
              "operator": "equal"
            }
          ]
        }
      }
    ]
  }
}

New v3.2 format (1.30.1+):

{
  "typeVersion": 3.2,
  "rules": {
    "values": [
      {
        "conditions": {
          "conditions": [
            {
              "leftValue": "={{ $json.message.text }}",
              "rightValue": "/start",
              "operator": {
                "type": "string",
                "operation": "equals"
              }
            }
          ]
        }
      }
    ]
  }
}

Key Changes:

  1. options.rules → rules.values
  2. operator: "equal" → operator: {type: "string", operation: "equals"}
  3. Nested conditions.conditions array structure
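The key changes above can be applied mechanically when rewriting a v1 rule by hand. This is a minimal sketch, not an n8n tool: switch_v32_rule is a hypothetical helper that emits one entry for rules.values in the v3.2 shape shown above, assuming each rule is a single string-equality check and that the values contain no double quotes or backslashes (no JSON escaping is done).

```shell
#!/usr/bin/env bash
# Sketch: emit one Switch v3.2 rule entry (an object for rules.values)
# from a v1-style leftValue/rightValue pair.
# Assumptions: single string "equals" condition; values need no JSON escaping.
switch_v32_rule() {
  local left="$1" right="$2"
  cat <<EOF
{
  "conditions": {
    "conditions": [
      {
        "leftValue": "${left}",
        "rightValue": "${right}",
        "operator": {
          "type": "string",
          "operation": "equals"
        }
      }
    ]
  }
}
EOF
}

# Usage: the /start rule from the example above
switch_v32_rule '={{ $json.message.text }}' '/start'
```

The output mirrors the abridged v3.2 example above; n8n may add further fields of its own (for example a combinator) when the node is saved in the editor.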

Telegram Trigger Polling Failure

Expected behavior (1.19.4):

Starting n8n...
Loading workflows...
   - "Mortimer Bot" (ID: 70c37130...)
     => Started
     => Adding triggers and pollers
     => Telegram polling started for chat...

Actual behavior (1.30.1):

Starting n8n...
Loading workflows...
   - "Mortimer Bot" (ID: 70c37130...)
     => Started
[No polling initialization]

Verification:

# Check if node exists
docker exec n8n find /usr -name "*Telegram*.node.js"
# Output: /usr/local/lib/node_modules/n8n/node_modules/n8n-nodes-base/dist/nodes/Telegram/TelegramTrigger.node.js

# Check credential
sqlite3 database.sqlite "SELECT id, name, type, LENGTH(data) FROM credentials_entity WHERE type = 'telegramApi'"
# Output: Csk5cg4HtaSqP5jJ|Telegram Bot|telegramApi|128

# Check Telegram-side webhook (n8n's Telegram Trigger registers one via setWebhook on activation)
curl "https://api.telegram.org/bot<token>/getWebhookInfo"
# Output: {"url": "", "pending_update_count": 0}  <- no webhook registered with Telegram

# Check executions
sqlite3 database.sqlite "SELECT COUNT(*) FROM execution_entity WHERE workflowId = '70c37130...'"
# Output: 0  <- never ran

Suspected root cause: n8n 1.30.1 does not initialize TelegramTrigger nodes when the workflow is activated, which points to a bug in trigger registration.
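The getWebhookInfo check above can be scripted so it is easy to rerun after every restart. This is a minimal sketch: telegram_webhook_url is a hypothetical helper that pulls the url field out of the Bot API response with sed instead of jq, which assumes a single url key with no escaped quotes in its value. An empty result means Telegram currently has no webhook on file for the bot.

```shell
#!/usr/bin/env bash
# Sketch: print the webhook URL Telegram has on file for the bot.
# Reads a getWebhookInfo JSON response on stdin and prints an empty line
# when the url field is empty. sed is used instead of jq (assumption:
# one "url" key, no escaped quotes inside its value).
telegram_webhook_url() {
  sed -n 's/.*"url"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Usage (TELEGRAM_BOT_TOKEN is a hypothetical variable holding the bot token):
#   url="$(curl -s "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getWebhookInfo" | telegram_webhook_url)"
#   if [ -z "$url" ]; then echo "no webhook registered"; else echo "webhook: $url"; fi
```

The real Bot API wraps the fields as {"ok":true,"result":{...}}; the sed expression matches the url key at any nesting level, so both the full response and the abridged output above work.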

Webhook Registration Failure

Test workflow (abridged: node names, positions, credentials and connections omitted):

{
  "name": "Mortimer Working",
  "nodes": [
    {
      "type": "n8n-nodes-base.webhook",
      "webhookId": "fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4",
      "parameters": {
        "httpMethod": "POST",
        "path": "fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4"
      }
    },
    {
      "type": "n8n-nodes-base.telegram",
      "parameters": {
        "chatId": "={{ $json.body.message.chat.id }}",
        "text": "Bot working!"
      }
    }
  ],
  "active": true
}

Attempted fixes:

  1. Database: UPDATE workflow_entity SET active = 0; UPDATE workflow_entity SET active = 1;
  2. Docker: docker restart n8n (5+ times)
  3. API: curl -X POST "${N8N_URL}/api/v1/workflows/aab68dba.../activate"
  4. UI toggle: User deactivated/reactivated (confirmed)
  5. Manual execution: Open in editor → "Execute workflow" button

Result: Only #5 works (manual execution in editor)

Test:

curl -X POST "https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4" \
  -H "Content-Type: application/json" \
  -d '{"message":{"text":"test","chat":{"id":579304651}}}'

# Output: {"code":404,"message":"The requested webhook \"POST fff67fbd...\" is not registered"}

Suspected root cause: n8n 1.30.1 does not register production webhooks when a workflow is activated, which points to a fault in the runtime webhook manager.
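The 404 test above can be repeated with a probe that reports only the HTTP status. This is a sketch assuming curl is installed; probe_webhook and classify_webhook_status are hypothetical helper names. The payload puts the Telegram Update at the top level of the request body, because Telegram POSTs the Update object itself and n8n's Webhook node exposes it under $json.body. The probe targets the production path (/webhook/...); n8n serves editor test executions on a separate /webhook-test/... path, so only the production URL shows whether activation registered the webhook.

```shell
#!/usr/bin/env bash
# Sketch: probe an n8n production webhook and classify the HTTP status.

# Map an HTTP status code to a short verdict.
classify_webhook_status() {
  case "$1" in
    2??) echo "registered" ;;
    404) echo "not-registered" ;;
    000) echo "unreachable" ;;   # curl reports 000 when no HTTP response arrived
    *)   echo "error-$1" ;;
  esac
}

# POST a minimal fake Telegram Update to the URL and print the verdict.
probe_webhook() {
  local url="$1" code
  code="$(curl -s -o /dev/null -w '%{http_code}' -X POST "$url" \
    -H 'Content-Type: application/json' \
    -d '{"message":{"text":"test","chat":{"id":579304651}}}')"
  classify_webhook_status "$code"
}

# Usage:
#   probe_webhook "https://flow.egonetix.de/webhook/fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4"
```

A 2xx response means the workflow actually ran, so on the Mortimer Working workflow a successful probe also sends "Bot working!" to that chat.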


Workflows Created During Session

  1. Mortimer Bot (70c37130-84e0-4349-9251-2d82e4db2d64)

    • Telegram Trigger v1 → Switch v3 → Telegram Send
    • Status: Deactivated (polling doesn't work)
    • Executions: 0
  2. Mortimer Bot v2 (cc3069c6-3c99-424d-85c9-191d5baa3bf6)

    • Telegram Trigger v1.1 with webhookId
    • Status: Deactivated (polling doesn't work)
    • Executions: 0
  3. Simple Telegram Router (2ba483e8-ed62-4b88-b649-cd38550ab8aa)

    • Webhook → Telegram Send
    • Status: Deleted ("Workflow has no owner" error)
  4. Mortimer Working (aab68dba-1ccd-451d-9685-479096f4be51)

    • Webhook (fff67fbd-73f4-43b1-9860-a6bbf0b9e9e4) → Telegram Send
    • Telegram webhook set to this URL
    • Status: Active in database, 404 on requests
    • Executions: 0

Database Backups Created

ls -lh /home/icke/n8n/database.sqlite.backup*

Key backups:

  • database.sqlite.backup_before_incremental_upgrade_20251203_101040 - CLEAN 1.19.4 STATE
  • database.sqlite.backup_before_restore_20251203_141649 - Before rollback attempt
  • 12+ additional backups during debugging

Size: ~9.7 MB each
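Backups like these can be made with one helper so the naming stays consistent with the existing files (database.sqlite.backup_<label>_<YYYYMMDD_HHMMSS>). This is a minimal sketch; backup_n8n_db is a hypothetical name, and it assumes n8n has been stopped first, since copying a SQLite file while it is being written can produce an inconsistent copy. With n8n still running, the sqlite3 CLI's .backup command is an alternative that takes a consistent snapshot.

```shell
#!/usr/bin/env bash
# Sketch: timestamped copy of the n8n SQLite database.
# Assumption: n8n is stopped (docker stop n8n) so the file is not being written.
backup_n8n_db() {
  local db="$1" label="$2" dest
  if [ ! -f "$db" ]; then
    echo "no such database: $db" >&2
    return 1
  fi
  dest="${db}.backup_${label}_$(date +%Y%m%d_%H%M%S)"
  # Never overwrite a backup taken within the same second
  if [ -e "$dest" ]; then
    echo "backup already exists: $dest" >&2
    return 1
  fi
  cp -p "$db" "$dest" || return 1
  echo "$dest"
}

# Usage:
#   docker stop n8n
#   backup_n8n_db /home/icke/n8n/database.sqlite before_upgrade
#   docker start n8n
```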


Credentials Status

Telegram Bot (mortimer_assi_bot):

  • ID: Csk5cg4HtaSqP5jJ (re-added manually during session)
  • Token: stored in the credential above (redacted here)
  • Status: Working

OpenAI API:

  • ID: GPzxHwwXxeZg07o5 (openAiApi) - Working
  • ID: MATuNdkZclq5ISbr (httpHeaderAuth) - Invalid key

IMAP Email:

  • ID: BntHPR3YbFD5jAIM
  • Status: Working

Recommendations

Immediate Action

Rollback to n8n 1.19.4:

# Stop container
docker stop n8n

# Restore clean database
cp /home/icke/n8n/database.sqlite.backup_before_incremental_upgrade_20251203_101040 \
   /home/icke/n8n/database.sqlite

# Edit Dockerfile
cd /home/icke/compose_files
# Change: FROM n8nio/n8n:1.30.1
# To:     FROM n8nio/n8n:1.19.4

# Rebuild and restart
docker compose build n8n
docker compose up -d n8n

# Verify version
docker exec n8n n8n --version
# Expected: 1.19.4

# Test bot
# Send "/start" to @mortimer_assi_bot
# Should receive welcome message
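The manual "Edit Dockerfile" step of the rollback can be scripted. This is a minimal sketch: pin_n8n_version is a hypothetical helper, and it assumes the Dockerfile has a single FROM n8nio/n8n:<tag> line with nothing after the tag (a trailing AS stage name would be dropped). It writes to a temporary file and moves it into place, which avoids the GNU/BSD differences of sed -i.

```shell
#!/usr/bin/env bash
# Sketch: rewrite the n8n base-image tag in a Dockerfile.
pin_n8n_version() {
  local dockerfile="$1" version="$2"
  if ! grep -q '^FROM n8nio/n8n:' "$dockerfile"; then
    echo "no 'FROM n8nio/n8n:' line in $dockerfile" >&2
    return 1
  fi
  # Replace the whole FROM line, then swap the edited copy into place
  sed "s|^FROM n8nio/n8n:.*|FROM n8nio/n8n:${version}|" "$dockerfile" > "${dockerfile}.tmp" \
    && mv "${dockerfile}.tmp" "$dockerfile"
}

# Usage (run from the directory holding the n8n Dockerfile; location assumed):
#   pin_n8n_version ./Dockerfile 1.19.4
#   docker compose build n8n && docker compose up -d n8n
#   [ "$(docker exec n8n n8n --version)" = "1.19.4" ] || echo "version mismatch" >&2
```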

Long-term Strategy

  1. Stay on 1.19.4 until n8n fixes critical bugs

  2. Monitor GitHub issues for fixes:

    • Telegram Trigger polling not initializing
    • Webhooks not registering on activation
    • Switch v3 migration documentation
  3. Test future versions on separate instance:

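    # Create a test data directory holding a copy of the production database
    mkdir -p /home/icke/n8n-test
    cp /home/icke/n8n/database.sqlite /home/icke/n8n-test/database.sqlite
    
    # Run test container on a separate host port (5679 is an arbitrary free port)
    # N8N_ENCRYPTION_KEY must match production so the copied credentials decrypt
    docker run -d --name n8n-test \
      -p 5679:5678 \
      -v /home/icke/n8n-test:/home/node/.n8n \
      -e N8N_ENCRYPTION_KEY="$N8N_ENCRYPTION_KEY" \
      n8nio/n8n:1.x.x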
    
    # Verify all features work
    # - Telegram Trigger polling
    # - Webhook registration
    # - Switch nodes
    # - All executions succeed
    
  4. Before any upgrade:

    • Export all workflows via UI
    • Backup database with date
    • Document all credentials
    • Test rollback procedure
    • Keep old version Docker image cached
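The "stay on 1.19.4" rule and the requirement that the user approves any upgrade plan can be enforced before an image tag is deployed. This is a minimal sketch under stated assumptions: version order comes from sort -V (GNU coreutils), and N8N_PINNED_VERSION, N8N_UPGRADE_APPROVED, version_gt and n8n_version_allowed are hypothetical names, not n8n settings.

```shell
#!/usr/bin/env bash
# Sketch: block n8n image tags newer than the known-good version
# unless an explicit approval flag is set.
N8N_PINNED_VERSION="${N8N_PINNED_VERSION:-1.19.4}"

# Succeeds if version $1 sorts strictly after version $2 (sort -V ordering).
version_gt() {
  [ "$1" != "$2" ] && [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

# Succeeds if the candidate tag may be deployed.
n8n_version_allowed() {
  local candidate="$1"
  if version_gt "$candidate" "$N8N_PINNED_VERSION" && [ "${N8N_UPGRADE_APPROVED:-no}" != "yes" ]; then
    echo "blocked: $candidate is newer than pinned $N8N_PINNED_VERSION" >&2
    return 1
  fi
  return 0
}

# Usage:
#   n8n_version_allowed 1.30.1 || exit 1
```

With the defaults, n8n_version_allowed 1.19.4 passes and 1.30.1 is blocked until N8N_UPGRADE_APPROVED=yes is set; sort -V also orders 1.9.x below 1.19.x correctly, which a plain string comparison would not.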

Never Do Again

  • Don't upgrade n8n in production without testing
  • Don't assume newer versions are stable
  • Don't trust incremental upgrade paths
  • Don't skip database backups
  • Don't delete old Docker images until verified
  • Don't modify production during work hours
  • Don't assume API activation works like UI activation
  • Don't trust container logs alone (check executions)

Always Do

  • Test upgrades on a separate instance
  • Create timestamped database backups
  • Export all workflows before changes
  • Verify credentials after migration
  • Check the execution_entity table for actual runs
  • Monitor n8n GitHub issues before upgrading
  • Keep documentation of working configurations
  • Test all triggers/webhooks after changes
  • Verify end-to-end functionality
  • Have a rollback plan ready


Key Insights

  1. n8n version numbers don't indicate stability

    • Latest ≠ stable
    • 1.122.4 crashed worse than 1.30.1
    • No clear upgrade path documented
  2. Database migrations are one-way

    • Cannot downgrade without backup
    • Schema changes break older versions
    • Must backup before ANY upgrade
  3. Trigger systems are fragile

    • Telegram Trigger completely broken in 1.30.1
    • Webhooks require manual registration
    • No errors logged, silent failures
  4. Testing is mandatory

    • "Worked on template workflow" ≠ "works in my instance"
    • UI shows active ≠ actually running
    • Database active=1 ≠ webhooks registered
  5. Community templates use newer syntax

    • n8n.io templates show Switch v3.2
    • This forced upgrade attempt
    • But v3.2 works fine once migrated manually

Questions for n8n Team

  1. Why doesn't Telegram Trigger polling initialize in 1.30.1?
  2. Why don't webhooks register on workflow activation?
  3. Why do Switch/IF nodes not trigger in 1.90.3?
  4. Why does 1.122.4 crash with OOM on startup?
  5. Is there an official migration guide for Switch v1 → v3?
  6. What is the recommended upgrade path from 1.19.4?
  7. Are there automated tests for trigger systems?
  8. Why don't logs show webhook registration failures?

User Quotes

"ok. stay on the current version" - User accepting to not upgrade initially

"i dont get it. can i interact with mortimer now or not?" - Confusion about bot status

"hm ok. the creds for the mortimer are gone. well, thank you....not" - Frustration with credential loss

"man i dont care what you do. just make this thing work" - Extreme frustration at multiple failures

"i dont understand why these strange problems occure. that seems to be not normal" - Questioning software quality

"well what is the latest stable version?" - Final question after 4 hours debugging

User patience level: Exhausted
User trust in n8n: Damaged
User trust in upgrade process: Zero


Success Criteria for Next Attempt

Before attempting any future upgrade:

  1. n8n GitHub issues show bugs fixed
  2. Community reports successful upgrades
  3. Release notes mention trigger/webhook fixes
  4. Test instance runs for 48+ hours
  5. All workflows execute successfully
  6. Zero 404 webhook errors
  7. Telegram Trigger shows polling logs
  8. Switch nodes trigger downstream
  9. User approves upgrade plan
  10. Rollback tested and ready

Until then: STAY ON 1.19.4


Session Date: December 3, 2025
Session Duration: ~4 hours
Workflows Created: 4
Database Backups: 14+
n8n Versions Tested: 4 (1.30.1, 1.90.3, 1.122.4, 1.123.0)
Successful Versions: 0
Status: Rollback to 1.19.4 recommended
Next Steps: Document learnings, continue another day