Docker Infrastructure Improvement Roadmap
Generated: November 11, 2025
Status: Planning Phase
Total Services: 39 running containers
Overview
This roadmap addresses critical issues, security vulnerabilities, and operational improvements identified in the Docker Compose infrastructure. The plan is divided into 4 phases, prioritizing quick wins and critical security issues first.
Phase 1: Quick Wins (Low Risk, High Impact)
Estimated Time: 2-4 hours
Risk Level: Low
Downtime: Minimal
Tasks
- Change the N8N password from "changeme" to a secure password
  - File: n8n.yml
  - Impact: Critical security fix
  - Downtime: < 1 minute
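- Add healthchecks to critical services
  - Bitwarden (password manager)
  - Gitea (code repository)
  - N8N (automation)
  - Synapse (Matrix server)
  - MariaDB instances
  - Benefit: Faster failure detection and better monitoring. Note that standalone Docker does not restart a container merely because it is unhealthy; automatic restarts need an orchestrator or a restart helper.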
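As a rough sketch of such a healthcheck for one of the MariaDB instances, the fragment below assumes the official mariadb image, which bundles a healthcheck.sh helper. The service name and timing values are illustrative and not taken from the existing compose files.

```yaml
services:
  mariadb-example:            # hypothetical service name
    image: mariadb:10.11
    healthcheck:
      # healthcheck.sh ships with the official mariadb image (assumption: that image is used).
      # Data directories initialized by older images may first need the healthcheck user created.
      test: ["CMD", "healthcheck.sh", "--connect", "--innodb_initialized"]
      interval: 30s           # illustrative timings, tune per service
      timeout: 5s
      retries: 3
      start_period: 60s
```

Once a healthcheck is defined, `docker compose ps` reports the health state alongside each container's status.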
- Enable Loki logging for the remaining 15 services
  - Services missing logging: element-web, telegram-bridge, whatsapp-bridge, piper, whisper, gitea, coturn, trading-bot, postgres, and others
  - Benefit: Centralized log management
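One possible shape for this, assuming logs are shipped with the Loki Docker driver plugin (the roadmap does not say which mechanism the 24 already-covered services use), is a shared logging block. The Loki URL and option values below are placeholders.

```yaml
# Reusable logging block. YAML anchors only work within a single compose file,
# so each stack file would need its own copy.
x-loki-logging: &loki-logging
  driver: loki                # assumes the plugin is installed under the alias "loki"
  options:
    loki-url: "http://loki.example.internal:3100/loki/api/v1/push"   # hypothetical endpoint
    loki-retries: "5"
    loki-batch-size: "400"

services:
  gitea:
    logging: *loki-logging
  coturn:
    logging: *loki-logging
```

If the existing services ship logs a different way (for example through a log collector container), the same pattern should be copied from one of them instead.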
- Add depends_on to multi-container stacks
  - Blog → mysql-blog
  - Helferlein → mysql-helferlein
  - Traccar → mysql-traccar
  - Zabbix components
  - Matrix bridges → Synapse
  - Benefit: Proper startup order
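A minimal sketch of the Blog → mysql-blog dependency follows. It uses the long depends_on form so that the blog waits for a healthy database rather than just a started container. The healthcheck command is an assumption based on the mysqladmin client found in the MySQL image.

```yaml
services:
  blog:
    depends_on:
      mysql-blog:
        condition: service_healthy   # requires the healthcheck defined below
    # The short form "depends_on: [mysql-blog]" only orders startup and does not wait for readiness.

  mysql-blog:
    healthcheck:
      # mysqladmin ping exits 0 as soon as the server answers, even without credentials
      test: ["CMD", "mysqladmin", "ping", "-h", "localhost"]
      interval: 15s
      timeout: 5s
      retries: 10
```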
Phase 2: Security Hardening (Medium Risk)
Estimated Time: 4-8 hours
Risk Level: Medium
Downtime: 5-10 minutes per service
Tasks
- Move passwords to environment files
  - Create the /home/icke/env_files/ directory structure
  - Move passwords from compose files to .env files:
    - blog.yml → eccmts42*
    - nextcloud.yml → eccmts42*
    - helferlein.yml → eccmts42*
    - traccar.yml → eccmts42*
    - wallabag.yml → eccmts42*
    - zabbix.yml → eccmts42*
    - firefly.yml → firefly_secure_password_123
    - matamo.yml → matomo
    - n8n.yml → new secure password
  - Update .gitignore to exclude .env files
  - Document password locations in a separate secure file
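The move could look roughly like this for blog.yml. The file name blog.env and the variable name are assumptions: MYSQL_ROOT_PASSWORD is what the official MySQL image reads, but the existing compose file may use other variables.

```yaml
# blog.yml (fragment)
services:
  mysql-blog:
    env_file:
      - /home/icke/env_files/blog.env   # hypothetical file name inside the planned directory
    # The password no longer appears in the compose file.
    # blog.env would contain a line such as:
    #   MYSQL_ROOT_PASSWORD=<new-strong-password>
```

Because the old values have been stored in the compose files, it is probably worth rotating them during the move rather than copying them over unchanged.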
- Move admin tokens to secrets
  - Bitwarden admin token → env file
  - Firefly cron token → env file
  - Coturn static auth secret → config file
- Create dedicated networks for isolated services
  - Element-web (currently no network)
  - Telegram-bridge (currently no network)
  - Whatsapp-bridge (currently no network)
  - Piper (currently no network)
  - Whisper (currently no network)
  - Coturn (currently no network)
- Remove services from the shared default network
  - Services on compose_files_default:
    - n8n → dedicated network
    - plex → dedicated network
    - whisper → dedicated network
    - unifi → dedicated network
    - synapse + bridges → shared matrix network
    - piper → dedicated network
    - coturn → can stay (needs to be accessible)
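The two network tasks above can be sketched as follows, shown in a single file for brevity. The network names are assumptions, and with one compose file per stack the shared matrix network would have to be declared as external in the bridge files.

```yaml
services:
  n8n:
    networks:
      - n8n                 # dedicated network, nothing else attached

  synapse:
    networks:
      - matrix
  telegram-bridge:
    networks:
      - matrix              # bridges reach Synapse by service name on this network
  whatsapp-bridge:
    networks:
      - matrix

networks:
  n8n: {}
  matrix:
    name: matrix            # fixed name so other compose files can join it via "external: true"
```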
- Remove deprecated links: directives (7 instances)
  - blog.yml
  - helferlein.yml
  - traccar.yml
  - zabbix.yml
  - Replace with network aliases and depends_on
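A hedged before/after sketch for blog.yml follows. The original links entry and the alias name mysql are assumptions; the real alias should match whatever hostname the application is currently configured to use.

```yaml
services:
  blog:
    # before (assumed original form):
    #   links:
    #     - "mysql-blog:mysql"
    depends_on:
      - mysql-blog
    networks:
      - blog

  mysql-blog:
    networks:
      blog:
        aliases:
          - mysql           # keeps the old link alias resolvable (assumed name)

networks:
  blog: {}
```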
- Review and fix user permissions
  - Plex: Change from UID=0 to a proper user
  - Jellyfin: Change from UID=0 to a proper user
  - Verify other services aren't running as root unnecessarily
Phase 3: Stability & Reliability Improvements (Medium-High Risk)
Estimated Time: 8-16 hours
Risk Level: Medium-High
Downtime: 10-30 minutes per service
Tasks
- Remove container_name from all services (54 instances)
  - Use compose project naming with network aliases instead
  - Prevents stale endpoint issues after docker system prune
  - Priority services:
    - bitwarden.yml
    - blog.yml
    - gitea.yml
    - jellyfin.yml
    - plex.yml
    - synapse.yml
    - n8n.yml
    - unifi.yml
    - zabbix.yml (multiple containers)
    - firefly.yml (multiple containers)
    - Element-web, bridges (all)
    - Trading bot components
  - Note: Nextcloud already fixed ✅
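As an illustration of project naming replacing container_name, the sketch below uses gitea.yml. The service name server, the project name, and the alias are assumptions.

```yaml
name: gitea                 # top-level project name; containers become gitea-<service>-<index>

services:
  server:                   # hypothetical service name
    # container_name: gitea   <- removed
    networks:
      gitea:
        aliases:
          - gitea           # only needed if other containers addressed the old container name

networks:
  gitea: {}
```

Services are already resolvable by their service name on a shared network, so the alias matters only where the old container_name differs from the service name.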
- Remove static IP addresses (16 instances)
  - bitwarden.yml → use DNS aliases
  - blog.yml → use DNS aliases
  - jellyfin.yml → use DNS aliases
  - zabbix.yml → use DNS aliases
  - Replace with network aliases for service discovery
- Add resource limits to all services
  - Template (adjust per service):

        deploy:
          resources:
            limits:
              memory: 1G
              cpus: '0.5'
            reservations:
              memory: 256M

  - Priority services to limit:
    - Plex (media server - high memory)
    - Jellyfin (media server - high memory)
    - N8N (automation - can grow)
    - Nextcloud (web app - high memory)
    - Synapse (Matrix - high memory)
    - MySQL/MariaDB instances
    - Zabbix server
  - Less critical services: 512M limits
- Standardize compose file format
  - Remove version: declarations (deprecated in current compose spec)
  - Use consistent YAML formatting
  - Add comments for complex configurations
- Add volume backup labels/annotations
  - Label critical data volumes:
    - Bitwarden data
    - Gitea data
    - Nextcloud data
    - Database volumes
    - N8N workflows
  - Prepare for automated backup solutions
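For named volumes, labels can be declared in the top-level volumes section. The volume name and label keys below are hypothetical, since no backup tool has been chosen yet and each tool defines its own label scheme.

```yaml
volumes:
  bitwarden-data:           # hypothetical named volume
    labels:
      backup.enabled: "true"      # hypothetical label keys
      backup.schedule: "daily"
```

Labels are applied when a volume is created, so existing volumes may need to be recreated to pick them up. Data kept in bind mounts under /home/icke/ cannot carry volume labels, so there the label would have to go on the service instead.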
Phase 4: Software Upgrades (High Risk)
Estimated Time: 4-8 hours
Risk Level: High
Downtime: 30-60 minutes per service
Recommendation: Test in development first
Tasks
- Upgrade EOL MySQL 5.7 to MariaDB 10.11+
  - Blog (mysql-blog)
    - Backup database
    - Export data
    - Switch to MariaDB
    - Import data
    - Test thoroughly
  - Helferlein (mysql-helferlein)
    - Same process as blog
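The "Switch to MariaDB" and "Import data" steps might look like the sketch below. It assumes the official mariadb image, which imports .sql files from /docker-entrypoint-initdb.d when it starts on an empty data directory. All host paths and the env file name are hypothetical.

```yaml
services:
  mysql-blog:
    image: mariadb:10.11
    env_file:
      - /home/icke/env_files/blog.env                         # hypothetical; root password and database name
    volumes:
      - /home/icke/blog/mariadb:/var/lib/mysql                # hypothetical new, empty data directory
      - /home/icke/blog/dump:/docker-entrypoint-initdb.d:ro   # hypothetical directory holding the exported .sql
```

Exporting only the application database, not the MySQL system schema, is likely the safer choice, and the old MySQL 5.7 data directory should stay untouched until the import has been verified.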
- Upgrade Zabbix 6.4 → 7.0+
  - Current: zabbix/zabbix-server-mysql:6.4-ubuntu-latest
  - Target: zabbix/zabbix-server-mysql:7.0-alpine-latest
  - Steps:
    - Read Zabbix 7.0 migration guide
    - Backup Zabbix database
    - Update images in zabbix.yml
    - Test web UI and agents
- Pin :latest tags to specific versions
  - Services currently using :latest:
    - Synapse
    - Element-web
    - Jellyfin
    - Gitea
    - Telegram-bridge
    - Whatsapp-bridge
    - And others
  - Benefit: Predictable updates, easier rollback
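One way to pin versions without scattering them through the compose files is variable interpolation, sketched below. The image names and variable names are assumptions. The versions would live in the project-level .env file that Compose reads for interpolation, which is separate from the per-service env files.

```yaml
services:
  gitea:
    # Fails fast with the given message if the variable is unset or empty
    image: gitea/gitea:${GITEA_VERSION:?set GITEA_VERSION in .env}

  synapse:
    image: matrixdotorg/synapse:${SYNAPSE_VERSION:?set SYNAPSE_VERSION in .env}
```

A rollback then amounts to restoring the previous value in .env and recreating the container.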
- Consider N8N database backend migration
  - Current: File-based storage
  - Recommended: PostgreSQL for better performance
  - Would require N8N reconfiguration
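The reconfiguration mostly comes down to n8n's database environment variables. In the sketch below the service name n8n-postgres, the database and user names, the Postgres tag, and the env file paths are all assumptions.

```yaml
services:
  n8n:
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: n8n-postgres      # hypothetical service name
      DB_POSTGRESDB_PORT: "5432"
      DB_POSTGRESDB_DATABASE: n8n
      DB_POSTGRESDB_USER: n8n
    env_file:
      - /home/icke/env_files/n8n.env        # hypothetical; would hold DB_POSTGRESDB_PASSWORD
    depends_on:
      - n8n-postgres

  n8n-postgres:
    image: postgres:16                      # illustrative tag
    env_file:
      - /home/icke/env_files/n8n-postgres.env   # hypothetical; POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DB
```

Existing workflows and credentials are not carried over by this change alone, so they would need to be exported before the switch and imported afterwards.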
- Review Unifi duplicate mount
  - Currently mounts /home/icke/unifi to both /config and /data
  - Clean up redundant configuration
Critical Services Priority List
Fix these services first due to security/stability concerns:
- N8N (automation) - Weak password, no network isolation
- Bitwarden (passwords) - Exposed admin token
- Gitea (code repo) - No healthcheck, no dedicated network
- Blog/Helferlein - EOL MySQL version
- Synapse + Bridges - Network architecture needs improvement
- Services on compose_files_default - Need network isolation
Statistics
- Total Services: 39 running containers
- Services with container_name: 54 instances
- Services with hardcoded passwords: 20+ instances
- Services using deprecated links: 7 instances
- Services with static IPs: 16 instances
- Services with Loki logging: 24/39 (62%)
- Services with healthchecks: 2/39 (5%)
- Services with resource limits: 1/39 (3%)
- Services using old MySQL 5.7: 2 instances
- Shared networks: 13 custom networks (some overloaded)
Implementation Notes
Before Starting Any Phase
- Full system backup
  - Back up all /home/icke/ directories
  - Export all databases
  - Document current working state
- Create rollback plan
  - Keep old compose files as .yml.backup
  - Document current container states
  - Test rollback procedure
- Schedule maintenance window
  - Notify users of potential downtime
  - Choose low-traffic time period
  - Have monitoring ready
Testing Strategy
- Test changes on one service first
- Monitor for 24 hours
- Apply to similar services in batches
- Keep previous configs for quick rollback
Success Criteria
- All services start successfully
- No stale endpoint errors after docker system prune
- All services accessible via their original URLs/ports
- Logs flowing to Loki
- Healthchecks reporting healthy status
Maintenance Schedule Recommendation
- Phase 1: Can be done immediately, low risk
- Phase 2: Schedule over 2-3 weekends
- Phase 3: One service per weekend, monitor for a week
- Phase 4: Full maintenance window, test environment first
Additional Recommendations
Future Improvements (Not in Roadmap)
- Consider Traefik/Nginx Proxy Manager for unified reverse proxy
- Implement automated backup solution (Duplicati, Restic, etc.)
- Add Prometheus monitoring for metrics collection
- Consider Watchtower for automated updates (carefully configured)
- Create Docker Swarm or K8s cluster for HA (if needed)
- Implement secrets management (Vault, Docker Secrets)
- Add CI/CD pipeline for compose file validation
Documentation
- Document network architecture diagram
- Create service dependency map
- Maintain service inventory with versions
- Document backup and restore procedures
- Create runbooks for common issues
Progress Tracking
Use this section to track completion:
Phase 1: [ ] 0/4 major tasks
Phase 2: [ ] 0/6 major tasks
Phase 3: [ ] 0/5 major tasks
Phase 4: [ ] 0/5 major tasks
Overall Progress: 0%
Notes & Decisions
Document any decisions or deviations from this roadmap here:
- 2025-11-11: Roadmap created based on infrastructure analysis
- 2025-11-11: Nextcloud fixed (removed container_name, added dedicated network)
Last Updated: 2025-11-11
Next Review: After Phase 1 completion