# Network Scanning and Visualization Tool - Architecture Design

## Executive Summary

This document outlines the architecture for a network scanning and visualization tool that discovers hosts on a local network, collects network information, and presents it through an interactive web interface with Visio-style diagrams.

## 1. Technology Stack

### Backend

- **Language**: Python 3.10+
  - Rich ecosystem for network tools
  - Excellent library support
  - Cross-platform compatibility
  - Easy integration with system tools
- **Web Framework**: FastAPI
  - Modern, fast async support
  - Built-in WebSocket support for real-time updates
  - Automatic API documentation
  - Type hints for better code quality
- **Network Scanning**:
  - `python-nmap` - Python wrapper for nmap
  - `scapy` - Packet manipulation (fallback, requires privileges)
  - `socket` library - Basic connectivity checks (no root needed)
  - `netifaces` - Network interface enumeration
- **Service Detection**:
  - `python-nmap` with service/version detection
  - Custom banner grabbing for common ports
  - `shodan` (optional) for service fingerprinting

### Frontend

- **Framework**: React 18+ with TypeScript
  - Component-based architecture
  - Strong typing for reliability
  - Large ecosystem
  - Excellent performance
- **Visualization**:
  - **Primary**: `react-flow` or `xyflow`
    - Modern, maintained library
    - Built for interactive diagrams
    - Great performance with many nodes
    - Drag-and-drop, zoom, pan built-in
  - **Alternative**: D3.js with `d3-force` for force-directed graphs
  - **Export**: `html2canvas` + `jsPDF` for PDF export
- **UI Framework**:
  - Material-UI (MUI) or shadcn/ui
  - Responsive design
  - Professional appearance
- **State Management**:
  - Zustand or Redux Toolkit
  - WebSocket integration for real-time updates

### Data Storage

- **Primary**: SQLite
  - No separate server needed
  - Perfect for single-user/small team
  - Easy backup (single file)
  - Fast for this use case
- **ORM**: SQLAlchemy
  - Powerful query builder
  - Migration support with
Alembic
  - Type-safe with Pydantic models
- **Cache**: Redis (optional)
  - Cache scan results
  - Rate limiting
  - Session management

### Deployment

- **Development**:
  - Docker Compose for easy setup
  - Hot reload for both frontend and backend
- **Production**:
  - Single Docker container or native install
  - Nginx as reverse proxy
  - systemd service file

## 2. High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                         Web Browser                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │  Dashboard   │  │   Network    │  │   Settings   │       │
│  │              │  │   Diagram    │  │              │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└───────────────────────┬─────────────────────────────────────┘
                        │ HTTP/WebSocket
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                       FastAPI Backend                       │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  REST API Endpoints                  │   │
│  │          /scan, /hosts, /topology, /export           │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  WebSocket Handler                   │   │
│  │        (Real-time scan progress and updates)         │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                 Business Logic Layer                 │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐      │   │
│  │  │  Scanner   │  │  Topology  │  │  Exporter  │      │   │
│  │  │  Manager   │  │  Analyzer  │  │            │      │   │
│  │  └────────────┘  └────────────┘  └────────────┘      │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                   Scanning Engine                    │   │
│  │  ┌────────────┐  ┌────────────┐  ┌────────────┐      │   │
│  │  │    Nmap    │  │   Socket   │  │  Service   │      │   │
│  │  │  Scanner   │  │  Scanner   │  │  Detector  │      │   │
│  │  └────────────┘  └────────────┘  └────────────┘      │   │
│  └──────────────────────────────────────────────────────┘   │
│  ┌──────────────────────────────────────────────────────┐   │
│  │                  Data Access Layer                   │   │
│  │          (SQLAlchemy ORM + Pydantic Models)          │   │
│  └──────────────────────────────────────────────────────┘   │
└───────────────────────┬─────────────────────────────────────┘
                        ▼
┌─────────────────────────────────────────────────────────────┐
│                       SQLite Database                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐     │
│  │  Hosts   │  │  Ports   │  │  Scans   │  │ Topology │     │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘     │
└─────────────────────────────────────────────────────────────┘
```

### Component Responsibilities

#### Frontend Components

1. **Dashboard**: Overview, scan statistics, recently discovered hosts
2. **Network Diagram**: Interactive visualization with zoom/pan/drag
3. **Host Details**: Detailed view of individual hosts
4. **Scan Manager**: Configure and trigger scans
5. **Settings**: Network ranges, scan profiles, preferences

#### Backend Components

1. **Scanner Manager**: Orchestrates scanning operations, manages scan queue
2. **Topology Analyzer**: Detects relationships and connections between hosts
3. **Exporter**: Generates PDF, PNG, JSON exports
4. **WebSocket Handler**: Pushes real-time updates to clients

## 3. Network Scanning Approach

### Scanning Strategy (No Root Required)

#### Phase 1: Host Discovery

```python
# Primary method: TCP connect scan to common ports (no root;
# a true SYN scan needs raw sockets and therefore root)
# Target ports: 22, 80, 443, 445, 3389, 8080
# Method: socket connect() with timeout
# Parallelization: ThreadPoolExecutor with ~50 workers
```

**Advantages**:
- No root required
- Reliable on most networks
- Fast with parallelization

**Implementation**:

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def check_host(ip: str, ports: tuple[int, ...] = (22, 80, 443)) -> bool:
    """Return True if any of the given TCP ports accepts a connection."""
    for port in ports:
        try:
            # The context manager closes the socket even if connect fails
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
                sock.settimeout(1)
                # connect_ex returns 0 on success instead of raising
                if sock.connect_ex((ip, port)) == 0:
                    return True
        except OSError:
            continue
    return False
```

#### Phase 2: Port Scanning (with nmap fallback)

**Option A: Without Root (Preferred)**

```python
# Use python-nmap with -sT (TCP connect scan)
# Or implement custom TCP connect scanner
nmap_args = "-sT -p 1-1000 --open -T4"
```

**Option B: With Root (Better accuracy)**

```python
# Use nmap with SYN scan
nmap_args = "-sS -p 1-65535 --open -T4"
```

**Scanning Profiles**:

1. **Quick Scan**: Top 100 ports, 254 hosts in ~30 seconds
2. **Standard Scan**: Top 1000 ports, ~2-3 minutes
3. **Deep Scan**: All 65535 ports, ~15-20 minutes
4. **Custom**: User-defined port ranges

#### Phase 3: Service Detection

```python
import socket

# Service version detection
nmap_args += " -sV"

# OS detection (requires root, optional)
# nmap_args += " -O"

# Custom banner grabbing for common services
def grab_banner(ip: str, port: int) -> str:
    """Return the first bytes a service sends, or "" if nothing arrives."""
    try:
        with socket.create_connection((ip, port), timeout=3) as sock:
            banner = sock.recv(1024)
    except OSError:  # covers refused connections and timeouts
        return ""
    return banner.decode('utf-8', errors='ignore')
```

#### Phase 4: DNS Resolution

```python
import socket
from typing import Optional

def resolve_hostname(ip: str) -> Optional[str]:
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        # No reverse DNS record for this address
        return None
```

### Connection Detection

**Unprivileged Methods** (no root needed):

1. **Traceroute Analysis**: Detect gateway/routing paths
2. **TTL Analysis**: Group hosts by TTL to infer network segments
3. **Response Time**: Measure latency patterns
4. **Port Patterns**: Hosts with similar open ports are likely on the same segment
5. **ARP Cache**: Parse the operating system's ARP table for MAC addresses (reading the table does not need root)

**Privileged Methods** (require root):

1. **Packet Sniffing**: Capture traffic with scapy

**Recommended Approach**:

```python
import netifaces

# Detect default gateway
def get_default_gateway():
    gws = netifaces.gateways()
    # 'default' maps address family -> (gateway_ip, interface);
    # it is empty when the machine has no default route
    default = gws.get('default', {}).get(netifaces.AF_INET)
    return default[0] if default else None

# Infer topology based on scanning data
def infer_topology(hosts):
    gateway = get_default_gateway()
    topology = {
        'gateway': gateway,
        'segments': [],
        'connections': []
    }
    # Group hosts by response characteristics
    # Connect hosts to gateway
    # Detect server-client relationships (open ports)
    return topology
```

### Safety Considerations

1. **Rate Limiting**: Max 50 concurrent connections, 1-2 second delays
2. **Timeout Control**: 1-3 second socket timeouts
3. **Scan Scope**: Only scan RFC1918 private ranges by default
4. **User Consent**: Clear warnings about network scanning
5. **Logging**: Comprehensive audit trail

## 4. Visualization Strategy

### Graph Layout

**Primary Algorithm**: Force-Directed Layout
- **Library**: D3-force or react-flow's built-in layouts
- **Advantages**: Natural, organic appearance; automatic spacing
- **Best for**: Networks with < 100 nodes

**Alternative Algorithms**:

1. **Hierarchical (Layered)**: Gateway at top, subnets in layers
2. **Circular**: Hosts arranged in circles by subnet
3. **Grid**: Organized grid layout for large networks

### Visual Design

#### Node Representation

```javascript
{
  id: string,
  type: 'gateway' | 'server' | 'workstation' | 'device' | 'unknown',
  position: { x, y },
  data: {
    ip: string,
    hostname: string,
    openPorts: number[],
    services: Service[],
    status: 'online' | 'offline' | 'scanning'
  }
}
```

**Visual Properties**:
- **Shape**:
  - Gateway: Diamond
  - Server: Cylinder/Rectangle
  - Workstation: Monitor icon
  - Device: Circle
- **Color**:
  - By status (green=online, red=offline, yellow=scanning)
  - Or by type
- **Size**: Proportional to number of open ports
- **Labels**: IP + hostname (if available)

#### Edge Representation

```javascript
{
  id: string,
  source: string,
  target: string,
  type: 'network' | 'service',
  data: {
    latency: number,
    bandwidth: number // if detected
  }
}
```

**Visual Properties**:
- **Width**: Connection strength/frequency
- **Color**: Connection type
- **Style**: Solid for confirmed, dashed for inferred
- **Animation**: Pulse effect for active scanning

### Interactive Features

1. **Node Interactions**:
   - Click: Show host details panel
   - Hover: Tooltip with quick info
   - Drag: Reposition (sticky after drop)
   - Double-click: Focus/isolate node
2. **Canvas Interactions**:
   - Pan: Click and drag background
   - Zoom: Mouse wheel or pinch
   - Minimap: Overview navigator
   - Selection: Lasso or box select
3. **Controls**:
   - Layout algorithm selector
   - Filter by: type, status, ports
   - Search/highlight hosts
   - Export button
   - Refresh/rescan

### React-Flow Implementation Example

```typescript
import React, { useEffect, useState } from 'react';
import ReactFlow, { Node, Edge, Controls, MiniMap, Background } from 'reactflow';
import 'reactflow/dist/style.css';

const NetworkDiagram: React.FC = () => {
  const [nodes, setNodes] = useState<Node[]>([]);
  const [edges, setEdges] = useState<Edge[]>([]);

  useEffect(() => {
    // Fetch topology from API
    fetch('/api/topology')
      .then(r => r.json())
      .then(data => {
        setNodes(data.nodes);
        setEdges(data.edges);
      });
  }, []);

  // Minimal render using the imported components;
  // React Flow needs a parent element with an explicit size
  return (
    <div style={{ width: '100%', height: '100vh' }}>
      <ReactFlow nodes={nodes} edges={edges} fitView>
        <Controls />
        <MiniMap />
        <Background />
      </ReactFlow>
    </div>
  );
};
```

## 5. Data Model

### Database Schema

```sql
-- SQLite enforces foreign keys only after: PRAGMA foreign_keys = ON;
-- SQLite has no inline INDEX clause, so indexes are created separately.

-- Scans table: Track scanning operations
CREATE TABLE scans (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    started_at TIMESTAMP NOT NULL,
    completed_at TIMESTAMP,
    scan_type VARCHAR(50),      -- 'quick', 'standard', 'deep', 'custom'
    network_range VARCHAR(100), -- '192.168.1.0/24'
    status VARCHAR(20),         -- 'running', 'completed', 'failed'
    hosts_found INTEGER DEFAULT 0,
    ports_scanned INTEGER DEFAULT 0,
    error_message TEXT
);

-- Hosts table: Discovered network hosts
CREATE TABLE hosts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    ip_address VARCHAR(45) NOT NULL UNIQUE, -- Support IPv4 and IPv6
    hostname VARCHAR(255),
    mac_address VARCHAR(17),
    first_seen TIMESTAMP NOT NULL,
    last_seen TIMESTAMP NOT NULL,
    status VARCHAR(20),      -- 'online', 'offline'
    os_guess VARCHAR(255),
    device_type VARCHAR(50), -- 'gateway', 'server', 'workstation', etc.
    vendor VARCHAR(255),     -- Based on MAC OUI lookup
    notes TEXT
);
CREATE INDEX idx_ip ON hosts (ip_address);
CREATE INDEX idx_status ON hosts (status);
CREATE INDEX idx_last_seen ON hosts (last_seen);

-- Ports table: Open ports for each host
CREATE TABLE ports (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    host_id INTEGER NOT NULL,
    port_number INTEGER NOT NULL,
    protocol VARCHAR(10) DEFAULT 'tcp', -- 'tcp', 'udp'
    state VARCHAR(20),                  -- 'open', 'closed', 'filtered'
    service_name VARCHAR(100),
    service_version VARCHAR(255),
    banner TEXT,
    first_seen TIMESTAMP NOT NULL,
    last_seen TIMESTAMP NOT NULL,
    FOREIGN KEY (host_id) REFERENCES hosts(id) ON DELETE CASCADE,
    UNIQUE(host_id, port_number, protocol)
);
CREATE INDEX idx_host_port ON ports (host_id, port_number);

-- Connections table: Detected relationships between hosts
CREATE TABLE connections (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    source_host_id INTEGER NOT NULL,
    target_host_id INTEGER NOT NULL,
    connection_type VARCHAR(50), -- 'gateway', 'same_subnet', 'service'
    confidence FLOAT,            -- 0.0 to 1.0
    detected_at TIMESTAMP NOT NULL,
    last_verified TIMESTAMP,
    metadata JSON,               -- Additional connection details
    FOREIGN KEY (source_host_id) REFERENCES hosts(id) ON DELETE CASCADE,
    FOREIGN KEY (target_host_id) REFERENCES hosts(id) ON DELETE CASCADE
);
CREATE INDEX idx_source ON connections (source_host_id);
CREATE INDEX idx_target ON connections (target_host_id);

-- Scan results: Many-to-many relationship
CREATE TABLE scan_hosts (
    scan_id INTEGER NOT NULL,
    host_id INTEGER NOT NULL,
    FOREIGN KEY (scan_id) REFERENCES scans(id) ON DELETE CASCADE,
    FOREIGN KEY (host_id) REFERENCES hosts(id) ON DELETE CASCADE,
    PRIMARY KEY (scan_id, host_id)
);

-- Settings table: Application configuration
CREATE TABLE settings (
    key VARCHAR(100) PRIMARY KEY,
    value TEXT NOT NULL,
    updated_at TIMESTAMP NOT NULL
);
```

### Pydantic Models (API)

```python
from pydantic import BaseModel, ConfigDict
from datetime import datetime
from typing import Optional, List

# Optional fields default to None explicitly: in Pydantic v2 an
# Optional[...] annotation without a default is still a required field.

class PortInfo(BaseModel):
    port_number: int
    protocol: str = "tcp"
    state: str
    service_name: Optional[str] = None
    service_version: Optional[str] = None
    banner: Optional[str] = None

class HostBase(BaseModel):
    ip_address: str
    hostname: Optional[str] = None
    mac_address: Optional[str] = None

class HostCreate(HostBase):
    pass

class Host(HostBase):
    # Allow construction from SQLAlchemy ORM objects
    model_config = ConfigDict(from_attributes=True)

    id: int
    first_seen: datetime
    last_seen: datetime
    status: str
    device_type: Optional[str] = None
    os_guess: Optional[str] = None
    vendor: Optional[str] = None
    ports: List[PortInfo] = []

class Connection(BaseModel):
    id: int
    source_host_id: int
    target_host_id: int
    connection_type: str
    confidence: float

class TopologyNode(BaseModel):
    id: str
    type: str
    position: dict
    data: dict

class TopologyEdge(BaseModel):
    id: str
    source: str
    target: str
    type: str

class Topology(BaseModel):
    nodes: List[TopologyNode]
    edges: List[TopologyEdge]

class ScanConfig(BaseModel):
    network_range: str
    scan_type: str = "quick"
    port_range: Optional[str] = None
    include_service_detection: bool = True

class ScanStatus(BaseModel):
    scan_id: int
    status: str
    progress: float  # 0.0 to 1.0
    hosts_found: int
    current_host: Optional[str] = None
```

## 6. Security and Ethical Considerations

### Legal and Ethical

1. **Authorized Access Only**:
   - Display prominent warning on first launch
   - Require explicit confirmation to scan
   - Default to scanning only local subnet
   - Log all scanning activities
2. **Privacy**:
   - Don't store sensitive data (passwords, traffic content)
   - Encrypt database if storing on shared systems
   - Clear privacy policy
3. **Network Impact**:
   - Rate limiting to prevent network disruption
   - Respect exclusion lists for hosts and ranges that must not be scanned
   - Provide "stealth mode" with slower scans

### Application Security

1. **Authentication** (if multi-user):

   ```python
   # JWT-based authentication
   # Or simple API key for single-user
   ```

2. **Input Validation**:

   ```python
   import ipaddress

   def validate_network_range(network: str) -> bool:
       try:
           net = ipaddress.ip_network(network)
           # Only allow private ranges. Note that is_private also covers
           # loopback and link-local blocks, not just RFC 1918.
           return net.is_private
       except ValueError:
           return False
   ```

3. **Command Injection Prevention**:

   ```python
   # Never use shell=True
   # Sanitize all inputs to nmap
   import ipaddress
   import subprocess

   def safe_nmap_scan(target: str) -> str:
       # Validate target: it must parse as a single IP address
       try:
           ipaddress.ip_address(target)
       except ValueError:
           raise ValueError("Invalid target") from None
       # Use subprocess safely: arguments go in a list, never through a shell
       cmd = ["nmap", "-sT", target]
       result = subprocess.run(cmd, capture_output=True, text=True)
       return result.stdout
   ```

4. **API Security**:
   - CORS configuration for production
   - Rate limiting on scan endpoints
   - Request validation with Pydantic
   - HTTPS in production
5. **File System Security**:
   - Restrict database file permissions (600)
   - Validate export file paths
   - Limit export file sizes

### Deployment Security

1. **Docker Security**:

   ```dockerfile
   # Run as non-root user (appuser must be created earlier in the Dockerfile)
   USER appuser
   # Drop unnecessary capabilities
   # No --privileged flag unless explicitly needed for root scans
   ```

2. **Network Isolation**:
   - Run in Docker network
   - Expose only necessary ports
   - Use reverse proxy (nginx)
3. **Updates**:
   - Keep dependencies updated
   - Regular security audits
   - Dependabot/Renovate integration

## 7. Implementation Roadmap

### Phase 1: Core Scanning (Week 1-2)
- [ ] Basic host discovery (socket-based)
- [ ] SQLite database setup
- [ ] Simple CLI interface
- [ ] Store scan results

### Phase 2: Enhanced Scanning (Week 2-3)
- [ ] Integrate python-nmap
- [ ] Service detection
- [ ] Port scanning profiles
- [ ] DNS resolution

### Phase 3: Backend API (Week 3-4)
- [ ] FastAPI setup
- [ ] REST endpoints for scans, hosts
- [ ] WebSocket for real-time updates
- [ ] Basic topology inference

### Phase 4: Frontend Basics (Week 4-5)
- [ ] React setup with TypeScript
- [ ] Dashboard with host list
- [ ] Scan configuration UI
- [ ] Host detail view

### Phase 5: Visualization (Week 5-6)
- [ ] React-flow integration
- [ ] Force-directed layout
- [ ] Interactive node/edge rendering
- [ ] Real-time updates via WebSocket

### Phase 6: Polish (Week 6-7)
- [ ] Export functionality (PDF, PNG, JSON)
- [ ] Advanced filters and search
- [ ] Settings and preferences
- [ ] Error handling and validation

### Phase 7: Deployment (Week 7-8)
- [ ] Docker containerization
- [ ] Documentation
- [ ] Security hardening
- [ ] Testing and bug fixes

## 8. Technology Justification

### Why Python?
- **Proven**: Industry standard for network tools
- **Libraries**: Excellent support for network operations
- **Maintainability**: Readable, well-documented
- **Community**: Large community for troubleshooting

### Why FastAPI?
- **Performance**: On par with Node.js and Go according to FastAPI's own benchmarks
- **Modern**: Async/await support out of the box
- **Type Safety**: Leverages Python type hints
- **Documentation**: Auto-generated OpenAPI docs

### Why React + TypeScript?
- **Maturity**: Battle-tested in production
- **TypeScript**: Catches errors at compile time
- **Ecosystem**: Vast library ecosystem
- **Performance**: Virtual DOM, efficient updates

### Why react-flow?
- **Purpose-Built**: Designed for interactive diagrams
- **Performance**: Handles 1000+ nodes smoothly
- **Features**: Built-in zoom, pan, minimap, selection
- **Customization**: Easy to style and extend

### Why SQLite?
- **Simplicity**: No separate database server
- **Performance**: Fast for this use case
- **Portability**: Single file, easy backup
- **Reliability**: Well-tested, stable

## 9. Alternative Architectures Considered

### Alternative 1: Electron Desktop App
**Pros**: Native OS integration, no web server

**Cons**: Larger bundle size, more complex deployment

**Verdict**: Web-based is more flexible

### Alternative 2: Go Backend
**Pros**: Better performance, single binary

**Cons**: Fewer network libraries, steeper learning curve

**Verdict**: Python's ecosystem wins for this use case

### Alternative 3: Vue.js Frontend
**Pros**: Simpler learning curve, good performance

**Cons**: Smaller ecosystem, fewer diagram libraries

**Verdict**: React's ecosystem is more mature

### Alternative 4: Cytoscape.js Visualization
**Pros**: Powerful graph library, many layouts

**Cons**: Steeper learning curve, heavier bundle

**Verdict**: react-flow is more modern and easier

## 10. Monitoring and Observability

### Logging Strategy

```python
import logging
from logging.handlers import RotatingFileHandler

# Structured logging
logger = logging.getLogger("network_scanner")
logger.setLevel(logging.INFO)  # the default level (WARNING) would drop INFO events
handler = RotatingFileHandler(
    "scanner.log",
    maxBytes=10*1024*1024,  # 10MB
    backupCount=5
)
handler.setFormatter(
    logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
)
logger.addHandler(handler)

# Log levels:
# INFO: Scan started/completed, hosts discovered
# WARNING: Timeouts, connection errors
# ERROR: Critical failures
# DEBUG: Detailed scanning operations
```

### Metrics to Track
- Scan duration
- Hosts discovered per scan
- Average response time per host
- Error rates
- Database size growth

## 11. Future Enhancements

1. **Advanced Features**:
   - Vulnerability scanning (integrate with CVE databases)
   - Network change detection and alerting
   - Historical trend analysis
   - Automated scheduling
2. **Integrations**:
   - Import/export to other tools (Nessus, Wireshark)
   - Webhook notifications
   - API for external tools
3. **Visualization**:
   - 3D network visualization
   - Heat maps for traffic/activity
   - Time-lapse replay of network changes
4. **Scalability**:
   - Support for multiple subnets
   - Distributed scanning with agents
   - PostgreSQL for larger deployments

---

## Quick Start Command Summary

```bash
# Install dependencies
pip install fastapi uvicorn python-nmap sqlalchemy pydantic netifaces

# Frontend (MUI needs the Emotion packages as peer dependencies)
npx create-react-app network-scanner --template typescript
cd network-scanner
npm install reactflow @mui/material @emotion/react @emotion/styled axios

# Run development
uvicorn main:app --reload  # Backend
npm start                  # Frontend

# Docker deployment
docker-compose up
```

---

**Document Version**: 1.0
**Last Updated**: December 4, 2025
**Author**: ArchAgent
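
The "Network change detection and alerting" item under Future Enhancements could be prototyped by comparing two scan snapshots. The sketch below is illustrative only: the `Snapshot` shape (IP address mapped to a set of open TCP ports), the `ScanDiff` container, and the `diff_scans` function are hypothetical names, not part of the design above, and a real implementation would read both snapshots from the `hosts` and `ports` tables.

```python
from dataclasses import dataclass, field

# Hypothetical snapshot shape: IP address -> set of open TCP ports.
Snapshot = dict[str, set[int]]


@dataclass
class ScanDiff:
    """Differences between two scan snapshots (illustrative container)."""
    new_hosts: list[str] = field(default_factory=list)
    missing_hosts: list[str] = field(default_factory=list)
    opened_ports: dict[str, set[int]] = field(default_factory=dict)
    closed_ports: dict[str, set[int]] = field(default_factory=dict)


def diff_scans(previous: Snapshot, current: Snapshot) -> ScanDiff:
    """Compare two snapshots and report host and port changes."""
    diff = ScanDiff()
    # Hosts present in only one of the two snapshots (sorted as plain strings)
    diff.new_hosts = sorted(current.keys() - previous.keys())
    diff.missing_hosts = sorted(previous.keys() - current.keys())
    # For hosts seen in both scans, compare their open-port sets
    for ip in previous.keys() & current.keys():
        opened = current[ip] - previous[ip]
        closed = previous[ip] - current[ip]
        if opened:
            diff.opened_ports[ip] = opened
        if closed:
            diff.closed_ports[ip] = closed
    return diff


if __name__ == "__main__":
    before = {"192.168.1.1": {80, 443}, "192.168.1.10": {22}}
    after = {"192.168.1.1": {80, 8080}, "192.168.1.20": {22}}
    print(diff_scans(before, after))
```

Keeping the comparison as a pure function over plain dictionaries means it can be unit-tested without touching the network or the database; alerting (for example the webhook notifications listed under Integrations) would then consume the resulting `ScanDiff`.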