Overview
A State Wide Area Network (SWAN) is the secure backbone that interconnects state, district, and block government offices into a Closed User Group (CUG). It delivers e-governance services reliably and at scale.
Designed as a tiered network with Points of Presence (PoPs) at the State Headquarters (SHQ), District Headquarters (DHQ), and Block Headquarters (BHQ), a SWAN underpins critical workloads. Typical examples include citizen service delivery, treasury systems, public safety coordination, telemedicine, and education.
For State CIOs and network leaders, a well-governed SWAN improves uptime and reduces last-mile risk. It also enforces compliance and controls total cost of ownership (TCO).
This guide provides architecture patterns, bandwidth formulas, SLA targets, security blueprints, compliance mappings, procurement criteria, and a migration playbook. Use it to move from concept to contract and operations with confidence.
Definition and scope of a State Wide Area Network
A State Wide Area Network (SWAN) is a dedicated, state-run or state-managed CUG network. It connects government entities across the SHQ–DHQ–BHQ hierarchy.
It provides secure IP connectivity for data, voice (VoIP), video conferencing, and application access. This connectivity spans departments, state data center(s), and GovCloud.
Unlike the public internet, a SWAN is engineered for deterministic performance and strong identity and encryption controls. It also enforces policy-aligned data residency.
The scope typically includes:
- Core and aggregation backbones at SHQ and DHQs, with access PoPs at BHQs and selected field offices.
- Last-mile links via optical fiber (OFC), leased lines/MPLS VPNs, RF, VSAT, or cellular (4G/5G), chosen per terrain and risk.
- Managed services (e.g., NOC/SOC), QoS enforcement for voice/video, and interconnects to state data centers and GovCloud.
Where does SWAN fit versus SD-WAN or MPLS? SWAN is the governance and service construct for a statewide CUG. MPLS VPNs and SD-WAN are technologies or service abstractions used inside it.
Many modern SWANs combine MPLS backbones with SD-WAN at the edge. This approach enables path selection, application-aware routing, and faster rollout while preserving compliance and CUG security.
Core objectives and policy context (NeGP, NIC)
SWANs were championed under the National e-Governance Plan (NeGP). The goal is to provide states a secure, shared network utility for digital public services.
MeitY’s program guidance defines SWAN’s role in connecting all administrative tiers and fostering reuse across departments. See the MeitY SWAN program for policy background and objectives.
In many states, the National Informatics Centre (NIC) operates or jointly operates parts of the network. This is common where states adopt the NIC implementation model.
A mature SWAN aligns to key outcomes. These include availability for critical services, an auditable security posture, and interoperability with state data centers and GovCloud (e.g., MeghRaj). It also delivers cost transparency that enables departmental chargeback.
Governance typically includes a nodal agency, a Project Management Unit (PMU), a Third Party Auditor (TPA), and service integrators or telcos. These parties are bound by SLAs and penalties.
Core architecture: PoPs and tiered topology (SHQ/DHQ/BHQ)
The SWAN topology is hierarchical to balance scalability and fault containment. The SHQ hosts the core with redundant routers, firewalls, key management, and DCI to primary and DR data centers.
DHQs aggregate BHQs and district applications. They often host caching, VoIP or media anchors, and local security controls.
BHQs serve as access PoPs for block offices, police stations, schools, PHCs, and kiosks.
Vertical links run SHQ↔DHQ↔BHQ. Horizontal links (DHQ↔DHQ or BHQ↔BHQ) are selectively enabled for resilience and low-latency collaboration.
Dual-homing critical DHQs to separate SHQ cores, or to metro aggregation rings, reduces single points of failure. IPv6 dual-stack is recommended statewide. It future-proofs addressing and supports modern services without NAT complexity.
PoP roles, redundancy, and inter-PoP routing choices
Each PoP has distinct responsibilities. SHQ provides core services such as routing, DNS, identity, PKI/HSM, SIEM/SOAR, and DDoS protection. It also manages DCI and central internet breakout policies.
DHQs aggregate traffic and implement local QoS. They sometimes host edge compute and maintain district-level identity caches and logging.
BHQs focus on secure access, QoS marking, and last-mile termination. SD-WAN CPEs can abstract underlay heterogeneity at these edges.
For inter-PoP routing, combine interior protocols (e.g., OSPF or IS-IS) with route summarization toward the core. Use eBGP for service-provider or internet edges.
MPLS L3VPNs or segment routing offer deterministic paths and fast reroute in the backbone. SD-WAN overlays at BHQ or DHQ edges enhance application-aware path selection across diverse underlays.
Resilience comes from dual uplinks and provider diversity. Ring or mesh backbone topologies contain failures, while Graceful Restart and Bidirectional Forwarding Detection (BFD) speed convergence. Where available, use MACsec on fiber backbones to secure L2 links without IP overhead.
Access technologies and last‑mile selection by terrain and risk
Last-mile selection drives both uptime and cost. In urban and semi-urban corridors, OFC on diverse paths offers the highest throughput and the lowest MTTR.
In mixed or rugged terrain, combine leased-line/MPLS, licensed RF, and 4G/5G for rapid deployment and diversity. In remote, forested, or island blocks, VSAT or emerging LEO satellite can deliver baseline connectivity while you work toward terrestrial upgrades.
Match last-mile options to each BHQ’s geography and hazards such as flood, landslide, or road cuts. Consider service criticality and user impact.
Where fiber exists, negotiate two physically segregated routes. Aim for separate ducts, bridges, and telco POPs.
For non-fiber primaries, consider RF or 4G/5G as primary with VSAT/LEO as backup. This guarantees a floor of service for critical voice and data.
Decision criteria: availability, MTTR, diversity, and cost
Choose last-mile per site using a simple scorecard you can operationalize and audit:
- Availability target and MTTR: Can the medium meet 99.5–99.9% site uptime with MTTR ≤8 hours? Fiber often beats RF/cellular on MTTR when paths are diverse.
- Physical and provider diversity: Are routes and providers truly independent (separate trenches, POPs, power grids)?
- Throughput and latency: Can it sustain QoS targets for voice/video (e.g., <150 ms one-way voice per ITU-T G.114)?
- Time-to-deploy and permitting: Will RF licensing, right-of-way, or tower works delay rollout?
- TCO: Do recurring charges and field support fit the 3–5 year budget window, including spares and SLAs?
Score options 1–5 on each criterion. Pick the top configuration per site.
Enforce a minimum of two independent last-mile paths for high-criticality BHQs and all DHQs.
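As an illustration, the scorecard can be operationalized in a few lines of Python. The weights, criteria keys, and per-site scores below are placeholders to calibrate with your PMU and TPA, not prescribed values:

```python
# Hypothetical last-mile scorecard: weights and scores are illustrative,
# not prescribed; calibrate them with your PMU and TPA.
CRITERIA_WEIGHTS = {
    "availability_mttr": 0.30,
    "diversity": 0.25,
    "throughput_latency": 0.20,
    "time_to_deploy": 0.10,
    "tco": 0.15,
}

def score_option(scores: dict[str, int]) -> float:
    """Weighted score for one last-mile option; each criterion is rated 1-5."""
    return sum(CRITERIA_WEIGHTS[c] * scores[c] for c in CRITERIA_WEIGHTS)

# Example: compare OFC vs 4G/5G for a hypothetical BHQ site.
ofc = {"availability_mttr": 5, "diversity": 4, "throughput_latency": 5,
       "time_to_deploy": 2, "tco": 3}
cellular = {"availability_mttr": 3, "diversity": 3, "throughput_latency": 3,
            "time_to_deploy": 5, "tco": 4}

for name, option in [("OFC", ofc), ("4G/5G", cellular)]:
    print(f"{name}: {score_option(option):.2f}")
```

Keeping the scorecard as auditable data rather than judgment calls also gives the TPA a concrete artifact to verify at award and at renewal.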
Bandwidth planning benchmarks and QoS for voice, video, and data
Right-sizing bandwidth avoids congestion and unplanned Opex. Size by application mix and concurrency. Then enforce QoS to protect voice, video, and critical apps.
A workable sizing model is:
- Voice: calls_concurrent × per_call_bw. Use ~90 kbps for G.711 or ~30 kbps for G.729, including IP overhead.
- Video conferencing: streams_concurrent × 1.2–2.5 Mbps per HD stream (downsize to 512–768 kbps for SD).
- Data: concurrent_users × 150–300 kbps for general productivity and e-services; raise for heavy transactional or GIS workloads.
- Overhead and growth: add 20–30% headroom for bursts and telemetry; plan 20–30% YoY growth or per your digital roadmap.
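A minimal sketch of this sizing model, using the codec and per-user figures above as defaults; the concurrency inputs are site-specific assumptions:

```python
def size_link_mbps(voice_calls: int, video_streams: int, data_users: int,
                   voice_kbps: float = 30.0,   # G.729 incl. overhead; ~90 for G.711
                   video_mbps: float = 1.5,    # per HD stream; 0.5-0.77 for SD
                   data_kbps: float = 200.0,   # per concurrent data user
                   headroom: float = 0.30) -> float:
    """Return required last-mile bandwidth in Mbps, with burst headroom."""
    voice = voice_calls * voice_kbps / 1000
    video = video_streams * video_mbps
    data = data_users * data_kbps / 1000
    return (voice + video + data) * (1 + headroom)

# Reproduces the worked BHQ example in the next subsection:
# 38 concurrent G.729 calls, 4 HD streams, 375 active data users.
print(f"{size_link_mbps(38, 4, 375):.1f} Mbps")  # ~106.8 -> round up to 150 Mbps
```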
QoS classes keep service outcomes predictable. Mark VoIP as EF (Expedited Forwarding), video as AF41/AF42, and critical apps as AF31–AF33 with per-hop policing and shaping; see IETF RFC 2474 DSCP.
Target class-based SLOs. Voice latency ≤100 ms one-way within the state, jitter ≤20 ms, and loss ≤0.3%. Video latency ≤150 ms and loss ≤0.5%. Critical data loss ≤0.1% where feasible.
Enforce trust boundaries at BHQs with ingress remarking. Maintain shaping at DHQs, and validate class utilization in NOC dashboards.
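For illustration, the class plan and targets can be captured as a small policy table the NOC can validate against. DSCP code points follow RFC 2474; the SLO figures restate the targets above, and the helper function is a hypothetical convenience, not a standard API:

```python
# Class plan as data: DSCP values per RFC 2474; SLOs restate the targets above.
QOS_CLASSES = {
    "voice":    {"dscp": 46, "phb": "EF",   "latency_ms": 100, "jitter_ms": 20, "loss_pct": 0.3},
    "video":    {"dscp": 34, "phb": "AF41", "latency_ms": 150, "loss_pct": 0.5},
    "critical": {"dscp": 26, "phb": "AF31", "loss_pct": 0.1},
}

def slo_met(cls: str, measured: dict) -> bool:
    """Check measured stats against the class SLO (absent metrics pass)."""
    slo = QOS_CLASSES[cls]
    return all(measured.get(k, 0) <= v for k, v in slo.items()
               if k not in ("dscp", "phb"))

print(slo_met("voice", {"latency_ms": 82, "jitter_ms": 12, "loss_pct": 0.1}))  # True
```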
Example calculations by tier (SHQ/DHQ/BHQ) and growth models
Consider a district with 1,500 users at BHQ-linked offices, of whom 25% are concurrently active (375). Assume 10% make VoIP calls simultaneously (38 calls, G.729 ≈30 kbps), and 4 concurrent HD video streams (1.5 Mbps each):
- Voice: 38 × 30 kbps ≈ 1.14 Mbps
- Video: 4 × 1.5 Mbps = 6 Mbps
- Data: 375 × 200 kbps = 75 Mbps
- Subtotal: ~82.14 Mbps; with 30% headroom ≈ 107 Mbps. Round to a 150 Mbps last-mile to manage bursts and telemetry.
At DHQ aggregating five such BHQs with traffic diversity (not all peak at once), apply a diversification factor (e.g., 0.6–0.7): 5 × 107 Mbps × 0.65 ≈ 348 Mbps. Provision 500–1000 Mbps at DHQ, depending on local apps.
At SHQ, aggregate all DHQs. Add data-center east–west, internet egress, and DR replication. For a mid-sized state (30 districts), core uplinks of 10–40 Gbps are typical. Size DCI for replication RPO (e.g., 2–5 Gbps per critical system).
For growth, assume 25% YoY. A 500 Mbps DHQ link grows to roughly 0.98 Gbps after three years (about 1.22 Gbps after four). Plan upgrade cadence and optical capacity accordingly.
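A short sketch of the aggregation and compound-growth arithmetic used in this subsection; the diversification factor and growth rate remain planning assumptions:

```python
def dhq_aggregate_mbps(bhq_links_mbps: list[float], diversity: float = 0.65) -> float:
    """Aggregate BHQ demand at a DHQ, discounted because peaks do not coincide."""
    return sum(bhq_links_mbps) * diversity

def grown_capacity_mbps(base_mbps: float, yoy_growth: float, years: int) -> float:
    """Compound year-on-year traffic growth."""
    return base_mbps * (1 + yoy_growth) ** years

print(f"DHQ aggregate: {dhq_aggregate_mbps([107] * 5):.0f} Mbps")                   # ~348
print(f"500 Mbps after 3 yrs @25%: {grown_capacity_mbps(500, 0.25, 3):.0f} Mbps")   # ~977
```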
Security architecture and compliance for SWANs
Treat SWAN as critical infrastructure. Enforce zero trust principles, encrypt in transit, segment aggressively, and centralize visibility.
Use IPsec between PoPs where the underlay is untrusted. Use MACsec on fiber backbones and TLS 1.3 for application flows.
Anchor keys in HSM-backed PKI. Rotate keys and certificates via automated workflows. Restrict crypto to FIPS-validated modules for sensitive systems.
Identity is the new perimeter. Deploy strong MFA for admins and 802.1X/NAC at PoPs. Enforce privilege segmentation via RBAC and just-in-time access.
Microsegmentation limits blast radius. Use VRFs per department or sector, ACLs and security groups per application, and layer-7 firewalls at SHQ and DHQ egress points.
DDoS defenses combine upstream scrubbing, BGP Flowspec, and on-box rate limiting. Mandate centralized logging (netflow/IPFIX, syslog, DNS logs) to a SIEM. Use SOAR-driven playbooks for rapid containment.
Back this with continuous vulnerability management and secure config baselines. Run periodic red/blue exercises to validate readiness.
Compliance mapping: CERT-In/MeitY, ISO 27001, and NIST
Map SWAN controls to frameworks your auditors reference:
- CERT-In/MeitY: Implement incident reporting and log retention per the CERT-In 2022 Directions (e.g., reporting within 6 hours and maintaining logs for 180 days within India). Align with MeitY’s e-governance security expectations for data residency and auditability.
- ISO 27001: Build your ISMS around ISO/IEC 27001:2022; map SWAN to Annex A control themes (e.g., access control, cryptography, operations, communications, supplier relationships, incident response) and maintain risk treatment plans, SoA, and internal audit cycles.
- NIST: Use NIST SP 800-53 Rev. 5 as a control catalog for AC (access control), AU (audit), CM (configuration), CP (contingency), IR (incident response), SC (system and communications), and SI (systems and information integrity) to validate defense-in-depth across the network stack.
Create an audit-ready checklist. Include network diagrams with data flows, VRF/ACL policies, crypto inventories and key lifecycles, and identity or NAC configurations. Add logging coverage and retention proofs, incident runbooks, vendor SLAs and TPRM artifacts, and test evidence such as DR drills, patch KPIs, and vulnerability reports.
Service and contracting models: PPP/BOOT vs NIC
States typically implement SWAN via two models. In PPP/BOOT, a private partner builds, owns, operates, and transfers the network after the concession term. The partner delivers services against SLAs with payment linked to performance.
This model accelerates rollout and transfers certain risks such as delivery and operations. It can also lock in costs for the concession term, so it demands strong contract management and TPA oversight.
In the NIC model, the state and NIC co-design and operate SWAN using a blend of government procurement and NIC services. It can improve policy alignment and reuse of national platforms. It requires stronger in-house capacity and may stretch timelines.
Choose the model that fits your capabilities and urgency. PPP/BOOT works well when rapid expansion and predictable Opex are critical and the state has a mature PMU/TPA. NIC-led models suit states emphasizing public sector control. They leverage NIC’s backbone and integrate closely with national stacks while investing in internal skills.
Decision matrix and accountability structures
Align model choice to four dimensions: delivery speed, internal capability, budget flexibility, and risk tolerance. If speed and Opex predictability outrank control, lean PPP/BOOT. If control and alignment top the list, lean NIC.
Regardless of model, clarify accountability:
- Owner: Principal Secretary (IT) with a nodal agency for policy and funding.
- PMU: Procurement, vendor coordination, SLA governance, and roadmap.
- Service Integrator(s)/Telcos: Build and operate under SLAs; expose telemetry.
- TPA: Independent SLA verification, security audits, and billing validation.
- NOC/SOC: 24×7 monitoring, change/incident/problem management linked to ITSM.
RACI these roles early. Publish the structure to all stakeholders and bind it into contracts and runbooks.
Service level management: SLAs, QoS monitoring, and audits
Set SLAs that reflect service outcomes for citizens and departments. Distinguish backbone versus last-mile targets. Define objective measurement methods.
As a baseline, consider backbone availability ≥99.95% (monthly). Set DHQ last-mile at ≥99.5–99.9% and BHQ last-mile at ≥99.0–99.5%, depending on terrain and diversity.
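To make these targets concrete, availability percentages convert to monthly downtime budgets as follows (a simple conversion, assuming a 30-day month):

```python
def monthly_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Allowed downtime per month for a given availability target."""
    return days * 24 * 60 * (1 - availability_pct / 100)

for target in (99.95, 99.9, 99.5, 99.0):
    print(f"{target}% -> {monthly_downtime_minutes(target):.0f} min/month")
# 99.95% -> 22 min; 99.9% -> 43 min; 99.5% -> 216 min; 99.0% -> 432 min
```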
For performance, target state-internal latency ≤50 ms (median) and ≤80 ms (95th percentile). Keep jitter ≤20 ms for voice and packet loss ≤0.3% in the EF class.
Define time-to-respond (TTR) and MTTR by severity. For example, Sev-1 TTR ≤15 minutes, MTTR ≤4 hours for SHQ/DHQ, and ≤8 hours for BHQs.
Implement continuous QoS monitoring. Use synthetic probes for voice and video. Apply flow-based analytics for class utilization and schedule independent TPA audits.
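One way to implement the probe side, shown as a minimal sketch: it derives median latency, an RFC 3550-style smoothed jitter estimate, and loss from one cycle of one-way delay samples. The thresholds default to the targets above; the measurement transport (e.g., TWAMP) is assumed rather than shown:

```python
def evaluate_probe_cycle(delays_ms: list,
                         latency_slo_ms: float = 50.0,
                         jitter_slo_ms: float = 20.0,
                         loss_slo_pct: float = 0.3) -> dict:
    """Evaluate one synthetic-probe cycle; None entries mark lost probes.
    delays_ms holds one-way delay samples (e.g., from a TWAMP sender)."""
    received = [d for d in delays_ms if d is not None]
    loss_pct = 100 * (len(delays_ms) - len(received)) / len(delays_ms)
    if not received:
        return {"loss_pct": 100.0, "slo_ok": False}
    jitter = 0.0  # RFC 3550-style smoothed interarrival jitter
    for prev, cur in zip(received, received[1:]):
        jitter += (abs(cur - prev) - jitter) / 16
    median = sorted(received)[len(received) // 2]
    return {"median_ms": median, "jitter_ms": round(jitter, 2),
            "loss_pct": round(loss_pct, 2),
            "slo_ok": (median <= latency_slo_ms and jitter <= jitter_slo_ms
                       and loss_pct <= loss_slo_pct)}

# Example: a cycle of 500 probes with one loss and mild delay variation.
cycle = [23.0, 25.5, 22.8, None] + [24.0] * 496
print(evaluate_probe_cycle(cycle))
```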
Penalties should scale with impact. Include availability shortfalls, chronic congestion in priority classes, and delayed incident response. Use service credits, liquidated damages, and, for repeated misses, step-in and termination rights.
Publish monthly SLA scorecards to department heads to sustain governance focus.
NOC workflows, observability stack, and incident MTTR
Operational excellence sustains SLAs. Standardize the NOC workflow: detect (telemetry and alerts) → triage (classify and correlate) → diagnose (path, device, or app layer) → remediate (change, failover, provider ticket) → recover (validate SLOs) → RCA (problem management).
Use SNMPv3 where devices lack modern telemetry. Prefer streaming telemetry (gNMI/NETCONF) for real-time health. Collect NetFlow/IPFIX for traffic insights.
Integrate syslog, DNS, and firewall logs into a SIEM. Automate common playbooks through SOAR. Consider AIOps for anomaly detection and noisy alert suppression.
Set MTTR targets by tier. SHQ/DHQ ≤4 hours for Sev-1 and BHQ ≤8 hours. Track mean time to acknowledge (MTTA) and change failure rate.
Enforce maintenance windows with stakeholder notifications. A uniform device baseline, a golden config library, and tested rollback procedures reduce incident duration and change risk.
Disaster recovery and resilience: RPO/RTO and failover patterns
Design for failure and practice for speed. For critical applications, aim for RPO 0–15 minutes and RTO 15–60 minutes.
Use synchronous or near-synchronous replication between primary and DR data centers via DCI. Network failover patterns include active-active SHQ cores with ECMP and dual-homed DHQs with diverse uplinks. Apply policy-based failover for internet breakouts.
Use route dampening and BFD to avoid flapping. Keep DNS TTLs low (e.g., 60–300 seconds) for fast service re-pointing.
Resilience starts at the last mile. Enforce physical diversity (different ducts, bridges, power), provider diversity, and technology diversity (e.g., fiber + RF/4G/LEO) for critical PoPs.
Run DR drills at least semi-annually. Include regional isolation, control-plane failure, and last-mile outages. Track resilience KPIs such as failover time, data loss during failover, and success rate of DR runbooks. Fold lessons learned into design and contracts.
Cloud and data center integration for government workloads
A modern SWAN integrates cleanly with state data centers and GovCloud while preserving segmentation and data residency.
For DCI, use dual diverse paths. Options include MPLS/E-line with EVPN/VXLAN or dark fiber with DWDM. Keep VRF-consistent policies end-to-end.
For GovCloud (e.g., MeghRaj (GI Cloud) by MeitY), establish private connectivity via MPLS, VPN, or partner exchanges. Extend identity and PKI. Enforce egress filtering and CASB controls for SaaS.
SASE can complement SWAN. It extends secure access to roaming officials and field sites beyond BHQs while centralizing policy.
Keep data residency central. Ensure logs, keys, and regulated data remain within jurisdiction. Bind this in vendor contracts.
Publish an application placement policy. Define what runs on-prem DC, what runs in GovCloud, and the network or security prerequisites for each. Include east–west segmentation and backup/DR topologies.
Funding, budgeting, and TCO/ROI models
Budget with transparency across CapEx and OpEx horizons. CapEx includes PoP build (space, racks, power/UPS, cooling), network hardware and licensing, security stacks, optical gear, and initial last-mile installs.
OpEx covers last-mile rentals, backbone bandwidth, NOC/SOC operations, spares and logistics, energy, software subscriptions, TPA audits, and training. Express TCO as a per-PoP and per-user cost to support chargeback or showback to departments.
A simple TCO formula for a district over five years:
- TCO_5yr = CapEx_PoP + 60 × (Circuit_Opex + NOC/SOC + Energy + Facilities + Licenses + TPA_share)
- Unit cost per PoP per month = TCO_5yr ÷ 60
- Unit cost per user per month = TCO_5yr ÷ (total_users × 60)
Illustrative example (assumptions only): DHQ CapEx ₹90L; monthly OpEx ₹8L (last-mile and backbone ₹5L; NOC/SOC ₹1.5L; facilities/energy ₹0.8L; licenses/TPA ₹0.7L). TCO_5yr ≈ ₹90L + 60×₹8L = ₹570L. Per-PoP ≈ ₹9.5L/month. With 2,000 users, ≈ ₹475/user/month.
Use your tendered rates and staffing plans to compute precise figures. Run sensitivity (e.g., ±30% on last-mile) to bound risk.
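The arithmetic is easy to script so finance can rerun it with tendered rates; this sketch is preloaded with the illustrative figures above (all amounts in ₹ lakh):

```python
def tco_5yr_lakh(capex: float, monthly_opex: float) -> float:
    """Five-year TCO: CapEx plus 60 months of OpEx (amounts in Rs lakh)."""
    return capex + 60 * monthly_opex

capex, opex = 90.0, 8.0            # illustrative DHQ figures from the example
tco = tco_5yr_lakh(capex, opex)    # 570 lakh
per_pop_month = tco / 60           # 9.5 lakh/month
per_user_month = per_pop_month * 100_000 / 2_000   # Rs 475/user/month (lakh -> Rs)
print(tco, per_pop_month, per_user_month)
```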
Sample TCO worksheet and sensitivity analysis
Build your worksheet around these line items so finance and procurement can validate assumptions:
- CapEx: network gear, security appliances, optics, racks/PDUs/UPS, site readiness.
- Recurring: last-mile/backbone, NOC/SOC staffing or MSP, licenses/subscriptions, facilities/energy, sparing/logistics, TPA audits.
- One-time services: design/integration, migration, training, documentation.
Stress-test with three levers: (1) last-mile cost ±30% (market/terrain variance), (2) traffic growth +20–35% YoY (digital program acceleration), and (3) energy cost +10–20%. Present best, base, and worst cases to the steering committee before final award.
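A compact way to present these cases is to encode the levers as scenario multipliers. The sketch below reuses the illustrative DHQ figures and applies two of the three levers directly; the traffic-growth lever feeds the circuit line the same way:

```python
# Scenario multipliers per the levers above; base figures reuse the
# illustrative DHQ example (all in Rs lakh): CapEx 90; monthly OpEx split
# last-mile 5.0, energy 0.8, other 2.2.
SCENARIOS = {
    "best":  {"last_mile": 0.70, "energy": 1.00},
    "base":  {"last_mile": 1.00, "energy": 1.00},
    "worst": {"last_mile": 1.30, "energy": 1.20},
}
CAPEX, OPEX = 90.0, {"last_mile": 5.0, "energy": 0.8, "other": 2.2}

for name, mult in SCENARIOS.items():
    monthly = (OPEX["last_mile"] * mult["last_mile"]
               + OPEX["energy"] * mult["energy"] + OPEX["other"])
    print(f"{name}: monthly OpEx {monthly:.2f}L, TCO_5yr {CAPEX + 60 * monthly:.0f}L")
# best ~480L, base 570L, worst ~670L over five years
```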
Procurement and vendor evaluation criteria
An RFP that bakes in governance yields better outcomes. Structure it in sections:
- Background and scope.
- Reference architecture and security blueprint.
- Compliance (CERT-In/MeitY, ISO 27001), data residency, and audit requirements.
- SLAs/SLOs with measurement methods and TPA verification.
- Technical specifications: routing, QoS, IPv6, multicast, SD-WAN interworking.
- Observability and ITSM integration.
- DR and resilience.
- Staffing and knowledge transfer.
- Detailed pricing templates for CapEx/OpEx.
Score vendors across several areas. Assess technical merit (architecture fit, resilience, security) and operational approach (NOC/SOC runbooks, tooling, MTTR track record). Review compliance posture (certifications, auditability), delivery capability (rollout plan, local presence, spares), and commercials (TCO transparency, unit rates). Evaluate governance (reporting, TPA cooperation, openness of interfaces).
Include penalty regimes such as service credits and LDs. Add benchmarking or right-to-audit, step-in or termination, data ownership and exit plans, and a change-control process with price caps. Favor open standards and avoid lock-in. Require visibility via streaming telemetry and access to configuration templates.
Migration roadmap from MPLS WAN to SWAN
Migrate in phases to minimize downtime and risk while preserving citizen services:
- Discover and plan: inventory circuits/devices/apps, map dependencies, define QoS classes and address plans, and agree cutover windows and rollback criteria.
- Build the overlay: deploy SHQ/DHQ cores, security, and monitoring; pilot SD-WAN/CPE at a few BHQs; validate QoS and telemetry.
- Parallel run: dual-home pilot sites to MPLS and SWAN; mirror policies; run acceptance tests (voice/video, critical apps).
- Phased cutover: district-by-district or cluster-by-cluster, during low-traffic windows; pre-stage configs and verify health checks before and after.
- Stabilize and optimize: monitor KPIs for 2–4 weeks per wave; tune QoS, fix address conflicts, and decommission legacy circuits progressively.
Risk controls include change freezes on adjacent systems during cutovers. Keep on-site spares and backout scripts. Ensure real-time coordination among NOC, field teams, and providers.
Maintain a running risk register and update it after each wave with lessons learned.
KPIs and governance dashboards
Govern what you expect to improve. Track a core set of KPIs on a monthly operations dashboard and review quarterly at the steering committee:
- Availability: backbone, DHQ, BHQ (by class and geography).
- Performance: latency, jitter, packet loss (overall and per QoS class).
- Capacity: link utilization, class utilization, top talkers, and headroom.
- Incidents and changes: MTTA/MTTR, incident counts by severity, change failure rate.
- Security: blocked threats, time-to-contain, patch/vulnerability KPIs, log coverage.
- Compliance and audits: TPA findings closed on time, DR drill outcomes.
Set thresholds and early-warning triggers (e.g., EF class utilization >70% sustained). Assign owners per KPI and time-box corrective actions.
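One simple way to encode a "sustained" trigger, assuming five-minute utilization samples; the window length and threshold are policy choices, not fixed values:

```python
from collections import deque

class SustainedThresholdAlarm:
    """Fires when every sample in a sliding window exceeds the threshold."""
    def __init__(self, threshold_pct: float = 70.0, window: int = 6):
        self.threshold = threshold_pct
        self.samples: deque[float] = deque(maxlen=window)

    def observe(self, utilization_pct: float) -> bool:
        self.samples.append(utilization_pct)
        return (len(self.samples) == self.samples.maxlen
                and all(s > self.threshold for s in self.samples))

# Example: EF-class utilization sampled every 5 minutes; the alarm fires
# only after 30 sustained minutes above 70%, filtering transient spikes.
alarm = SustainedThresholdAlarm()
for u in [65, 72, 74, 71, 75, 73, 76]:
    if alarm.observe(u):
        print(f"EF class early warning: {u}% sustained above 70%")
```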
Publish simple, human-readable dashboards to department heads. This sustains executive attention and reinforces a culture of accountability.