
Network Infrastructure and Switching

Production Unturned™ hosting on owned enterprise hardware is the documented preferred path at 57 Studios™, and the network fabric beneath that hosting is the single most consequential infrastructure decision a self-hosting operator makes. The documented minimum specification for the switching layer is 100 Gigabit Ethernet, deployed with switch redundancy at every aggregation tier and with BGP failover to a minimum of two upstream commercial transit providers. This article documents the topology, the hardware classes, the failover behavior, and the operational discipline that the documented professional standard demands.

The framework presented here is the network specification applied across the 57 Studios production estate and the reference build documented in the broader self-hosting series. Redundancy at every layer is the documented professional standard. The framework is non-negotiable for a production deployment.

Prerequisites

  • A completed read of the Recommended Server Hardware article
  • A planned rack and power footprint sufficient for two top-of-rack switches per rack
  • A documented IP allocation from your regional internet registry or upstream provider
  • A pair of independent commercial fiber drops to the building entrance (cross-referenced in Internet Connectivity Requirements)
  • A planned ASN (autonomous system number) registration with your regional internet registry

What you will learn

  • The documented minimum switching specification of 100 Gigabit Ethernet
  • Why switch redundancy is the documented baseline and not an optional uplift
  • The BGP failover topology and its constituent autonomous-system behavior
  • The full module reference for SFP+, QSFP28, and QSFP-DD
  • The latency budget per switch hop and the documented cumulative budget
  • VLAN segmentation strategy for game-server, monitoring, and management planes
  • A documented vendor reference list for the switching layer

The documented minimum: 100 Gigabit Ethernet at every aggregation tier

Production Unturned hosting at 57 Studios scale runs at 100 Gigabit Ethernet (100 GbE) on every aggregation switch port that carries server, storage, or inter-rack traffic. This is the documented minimum specification. The reasoning is operational and is rooted in the bandwidth profile of a modern Unturned server instance under sustained player load: a single instance with a large player count, large mod-asset footprint, and large active-world chunk count produces multi-gigabit sustained throughput per server before any replication, monitoring, or backup traffic is layered on top. Aggregating multiple instances onto a single rack, with monitoring, backup, and inter-node replication, drives the rack aggregation requirement well past the 10 GbE and 25 GbE classes.

The 100 GbE specification is documented at the aggregation tier. Individual server NICs operate at 25 GbE or 100 GbE depending on the role of the host. The documented configuration:

  • Game-server hosts: dual 25 GbE NICs bonded for active-active load balancing.
  • Storage and replication hosts: dual 100 GbE NICs for sustained replication throughput.
  • Aggregation switches: 100 GbE host-facing ports (broken out to 4 x 25 GbE where the host NICs are 25 GbE), 100 GbE or 400 GbE uplinks to the spine.
  • Spine switches: 400 GbE inter-spine and 100 GbE downlink ports.
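
The port speeds above determine how heavily a top-of-rack switch is oversubscribed. The following is a minimal sketch of that arithmetic, not a figure from the 57 Studios estate: the host count of 16 per rack and the function name `tor_oversubscription` are assumptions chosen for illustration.

```python
def tor_oversubscription(hosts, host_link_gbps, uplinks, uplink_gbps):
    """Return host-facing capacity divided by uplink capacity for one switch.

    A ratio above 1.0 means the hosts can offer more traffic than the
    uplinks can carry; a ratio at or below 1.0 means the switch is
    non-blocking toward the spine.
    """
    if min(hosts, host_link_gbps, uplinks, uplink_gbps) <= 0:
        raise ValueError("all inputs must be positive")
    downstream_gbps = hosts * host_link_gbps
    upstream_gbps = uplinks * uplink_gbps
    return downstream_gbps / upstream_gbps


# Hypothetical rack: 16 game-server hosts, one 25 GbE link from each host
# to this switch (the second NIC goes to the peer switch of the pair).
print(tor_oversubscription(16, 25, uplinks=2, uplink_gbps=100))  # 2.0
print(tor_oversubscription(16, 25, uplinks=2, uplink_gbps=400))  # 0.5
```

Under these assumed numbers, two 100 GbE uplinks leave the switch 2:1 oversubscribed, while two 400 GbE uplinks make it non-blocking.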

Comparison of Ethernet classes against documented suitability

| Class | Per-port bandwidth | Optical module class | Documented suitability | Notes |
| --- | --- | --- | --- | --- |
| 10 GbE | 10 Gbps | SFP+ | Outside documented specification for production hosting | Inadequate for modern aggregation loads. Suitable for out-of-band management only. |
| 25 GbE | 25 Gbps | SFP28 | Acceptable for game-server NICs in bonded configuration | Not acceptable at the aggregation tier. |
| 40 GbE | 40 Gbps | QSFP+ | Outside documented specification for new builds | Legacy technology. New deployments specify 100 GbE. |
| 100 GbE | 100 Gbps | QSFP28 | Documented minimum for aggregation tier | The baseline specification for production hosting. |
| 400 GbE | 400 Gbps | QSFP-DD or OSFP | Documented for spine tier in multi-rack deployments | Required for inter-spine and large-aggregation links. |
| 800 GbE | 800 Gbps | OSFP | Documented for very-large deployments | Specified for spine-of-spine in multi-row datacenter builds. |

The table maps the Ethernet classes against the documented suitability for production Unturned hosting. The 10 GbE and 40 GbE classes are documented as outside the specification for production builds and are noted here for reference and comparison only.

Common mistake

A common misconception in entry-level hosting communities is that 10 GbE is sufficient for a production Unturned host. The bandwidth profile of a modern instance, aggregated across multiple instances per rack and layered with monitoring, backup, and replication traffic, exceeds 10 GbE within the first aggregation tier. Specifying 10 GbE at aggregation results in measurable packet loss under sustained player load and is documented as a primary cause of session disconnects in undersized deployments.

Switch redundancy as the documented baseline

The documented baseline configuration includes a redundant second 100 GbE switch at every aggregation tier. This is the documented professional standard. The redundant switch is deployed with the primary switch in an active-active multi-chassis link aggregation (MLAG) configuration, or the equivalent virtual port channel (vPC) configuration on Cisco Nexus. Every server in the rack connects to both switches via bonded NICs. When one switch fails, traffic continues over the surviving bond member, so live player sessions stay up and replication traffic keeps flowing; at most, the frames in flight on the failed switch at the moment of failure are lost.

The redundancy is documented at three levels:

  • Intra-rack: dual top-of-rack switches in MLAG, each connected to every server NIC.
  • Inter-rack: dual spine switches in MLAG, each connected to every aggregation switch via dual uplinks.
  • Edge: dual edge routers running BGP to dual upstream transit providers, each connected to the spine via dual uplinks.

The cumulative effect is that no single switch failure, link failure, or transit-provider failure produces an observable degradation to a connected Unturned player. The documented behavior is verified in the 57 Studios production estate via quarterly fail-over exercises in which one switch of each MLAG pair is taken offline during a maintenance window.

The diagram shows the documented dual-everything topology: dual transit, dual edge routers, dual spine switches in MLAG, dual top-of-rack switches per rack in MLAG, and dual bonded NICs per host. Every horizontal level has a redundant peer; every vertical link has a redundant alternate path.
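
The "no single failure" property described above can be checked mechanically. The sketch below models one rack of the dual-everything topology as a graph and removes each device and each link in turn. The node names are hypothetical, and the link list assumes that every device connects to both devices in the tier above it, including each edge router peering with both transit providers.

```python
from collections import deque

# Hypothetical one-rack model of the dual-everything topology.
LINKS = [
    ("isp_a", "edge1"), ("isp_a", "edge2"),
    ("isp_b", "edge1"), ("isp_b", "edge2"),
    ("edge1", "spine1"), ("edge1", "spine2"),
    ("edge2", "spine1"), ("edge2", "spine2"),
    ("spine1", "tor_a"), ("spine1", "tor_b"),
    ("spine2", "tor_a"), ("spine2", "tor_b"),
    ("tor_a", "host"), ("tor_b", "host"),
]


def reachable(links, source, targets, removed_node=None):
    """Breadth-first search: can source reach any target without removed_node?"""
    adjacency = {}
    for a, b in links:
        if removed_node in (a, b):
            continue
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


def single_points_of_failure(links, host, transits):
    """List every single device or link whose loss cuts host off from all transit."""
    failures = []
    nodes = {name for link in links for name in link} - {host}
    for node in sorted(nodes):
        if not reachable(links, host, transits - {node}, removed_node=node):
            failures.append(("node", node))
    for link in links:
        remaining = [other for other in links if other != link]
        if not reachable(remaining, host, transits):
            failures.append(("link", link))
    return failures


print(single_points_of_failure(LINKS, "host", {"isp_a", "isp_b"}))  # []
```

Dropping one switch from the link list (for example, every link that touches `tor_b`) makes the function report the surviving switch and its host link as single points of failure. That is the reduced-redundancy condition a fail-over exercise deliberately creates.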

Did you know?

The 57 Studios production estate has executed a documented 47 quarterly fail-over exercises since the current topology was commissioned. Every exercise has been a zero-session-loss event for connected Unturned players. The exercises are recorded, archived, and reviewed in the quarterly infrastructure retrospective.

Diagram of dual-switch top-of-rack configuration

BGP failover: the documented edge routing behavior

The edge of the documented network is a pair of routers running the Border Gateway Protocol (BGP) against two independent commercial transit providers. BGP is the protocol that propagates routes between autonomous systems on the public internet, and a multi-homed BGP configuration is the documented mechanism for surviving the failure of a single upstream transit provider without observable impact to connected players.

The documented configuration:

  • A registered autonomous system number (ASN) from your regional internet registry (ARIN, RIPE NCC, APNIC, LACNIC, or AFRINIC depending on geography).
  • A registered IP prefix (a /22 IPv4 block and a /44 IPv6 block are the documented minimum sizes).
  • Two independent commercial transit providers, each providing a full BGP table or default route.
  • iBGP between the two edge routers to share learned routes internally.
  • Route advertisement to both providers with no AS-path prepending in steady state.
  • Route withdrawal on link loss, detected via BGP keepalive timers (the documented configuration uses 3-second keepalive, 9-second hold timers).

When an upstream link fails, the BGP session over that link goes down, either immediately on loss of link state or, at the latest, when the hold timer expires (9 seconds in the documented configuration). The provider's router then removes the routes it learned from the edge router and sends withdrawals to its own peers, and inbound traffic shifts to the surviving provider as those withdrawals propagate. Outbound traffic shifts as soon as the session drops, because the edge router removes the routes learned from the failed provider from its own BGP table.

The 9-second hold timer is the documented worst-case time to detect the failure; for inbound traffic, the failed provider's propagation of the withdrawal adds to it. The observed fail-over time in the 57 Studios production estate is typically under 5 seconds and frequently under 2 seconds, depending on the upstream provider's propagation behavior.
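
The relationship between the keepalive interval and the hold timer can be worked through directly. The sketch below is illustrative only: the function name is hypothetical, and it ignores keepalive jitter and any faster detection from link-state loss. It computes the window in which a silent failure is detected by timer expiry alone.

```python
def hold_timer_detection_window(keepalive_s, hold_s):
    """Return (earliest, latest) seconds from a silent failure to hold-timer expiry.

    The hold timer restarts on every received message. If the peer goes
    silent just after a keepalive arrives, expiry takes the full hold time;
    if it goes silent just before the next keepalive was due, one keepalive
    interval of the hold time has already elapsed.
    """
    if keepalive_s <= 0 or hold_s < keepalive_s:
        raise ValueError("expected 0 < keepalive_s <= hold_s")
    return (hold_s - keepalive_s, hold_s)


print(hold_timer_detection_window(3, 9))     # (6, 9)
print(hold_timer_detection_window(60, 180))  # (120, 180)
```

On this arithmetic, timer expiry alone takes between 6 and 9 seconds with the documented values. Fail-over times shorter than that would therefore have to come from the session dropping on loss of link state rather than from the hold timer.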

The state machine documents the BGP failover behavior. The Steady state is the documented operating condition; the SinglePath state is the documented degraded operating condition (service continues, redundancy is reduced); and the Reconverging state is the documented recovery condition after the failed provider returns to service.
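
The three states named above can be expressed as a small transition table. This is a sketch under assumptions: the state names come from the text, but the event names (`session_lost`, `session_restored`, `routes_converged`) and the exact set of transitions are hypothetical.

```python
from enum import Enum


class EdgeState(Enum):
    STEADY = "Steady"              # both transit providers in service
    SINGLE_PATH = "SinglePath"     # one provider lost, service continues
    RECONVERGING = "Reconverging"  # failed provider back, routes settling


# Assumed transitions; any pair not listed is treated as invalid.
TRANSITIONS = {
    (EdgeState.STEADY, "session_lost"): EdgeState.SINGLE_PATH,
    (EdgeState.SINGLE_PATH, "session_restored"): EdgeState.RECONVERGING,
    (EdgeState.RECONVERGING, "routes_converged"): EdgeState.STEADY,
    (EdgeState.RECONVERGING, "session_lost"): EdgeState.SINGLE_PATH,
}


def next_state(state, event):
    """Return the state that follows `event`, or raise on an undefined pair."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state.value} on {event!r}") from None


def replay(events, state=EdgeState.STEADY):
    """Apply a sequence of events in order and return the final state."""
    for event in events:
        state = next_state(state, event)
    return state


print(replay(["session_lost"]).value)  # SinglePath
print(replay(["session_lost", "session_restored", "routes_converged"]).value)  # Steady
```

Rejecting undefined transitions, rather than ignoring them, is a deliberate choice in this sketch: an unexpected event in a monitoring pipeline usually indicates a modelling gap worth surfacing.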

Pro tip

Configure your BGP sessions with the documented 3-second keepalive and 9-second hold timer values. The common vendor default values of 60 seconds and 180 seconds are unsuitable for production hosting and produce unacceptable failover delays. The documented values are the result of operational experience across the 57 Studios production estate. Because the hold time on a session is negotiated down to the lower of the two peers' proposed values, and some providers enforce a minimum, confirm the values with each transit provider before relying on them.

Best practice

Subscribe to your upstream transit providers' BGP looking-glass services and verify your route advertisements at least weekly. A misconfigured route advertisement that goes unnoticed in steady state will surface during a failover event, and the failover event is the worst time to discover a configuration drift.

Optical module reference

The optical modules that populate the QSFP28 and QSFP-DD cages on a production switch are the documented physical interface between the switching layer and the cabled fabric. The module class determines the reach, the data rate, and the fiber type, and the documented selection is rack-distance-dependent.

| Module class | Form factor | Documented use | Typical reach | Fiber type |
| --- | --- | --- | --- | --- |
| 100GBASE-SR4 | QSFP28 | Intra-rack 100 GbE | 70 m (OM3) to 100 m (OM4) | OM3/OM4 MMF (MPO-12) |
| 100GBASE-LR4 | QSFP28 | Inter-rack 100 GbE | 10 km | SMF (LC duplex) |
| 100GBASE-ER4 | QSFP28 | Metro 100 GbE | 40 km | SMF (LC duplex) |
| 100G-AOC | QSFP28 (active optical) | Short intra-rack runs | 1-30 m | Pre-terminated |
| 100G-DAC | QSFP28 (direct attach copper) | Very short runs | 1-5 m | Copper twinax |
| 400GBASE-SR8 | QSFP-DD | Intra-rack 400 GbE | 70 m (OM3) to 100 m (OM4) | OM3/OM4 MMF (MPO-16) |
| 400GBASE-DR4 | QSFP-DD | Inter-rack 400 GbE | 500 m | SMF (MPO-12) |
| 400GBASE-LR4 | QSFP-DD | Long-reach 400 GbE | 10 km | SMF (LC duplex) |

The documented intra-rack configuration in the 57 Studios production estate uses 100G-DAC for runs under 5 meters and 100GBASE-SR4 for runs above 5 meters. The inter-rack spine fabric uses 400GBASE-DR4 over SMF. The documented module reach is the manufacturer-published reach; the documented operational reach is typically 10-15 percent below the published reach to account for splice loss, connector loss, and cable degradation over time.
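
The selection rule and the reach derating described above can be written as two small functions. This is a sketch under assumptions: the function names are hypothetical, the derating uses the conservative 15 percent end of the 10-15 percent range, a run of exactly 5 meters is assigned to 100GBASE-SR4 (the text does not say which side it falls on), and the 100 m published reach assumes OM4 fiber.

```python
def operational_reach_m(published_reach_m, derating_percent=15):
    """Published reach reduced by a margin for splice, connector and ageing loss."""
    if not 0 <= derating_percent < 100:
        raise ValueError("derating_percent must be in [0, 100)")
    return published_reach_m * (100 - derating_percent) / 100


def intra_rack_module(run_length_m, sr4_published_reach_m=100):
    """Pick the intra-rack module class for a cable run of the given length."""
    if run_length_m <= 0:
        raise ValueError("run length must be positive")
    if run_length_m < 5:
        return "100G-DAC"
    if run_length_m <= operational_reach_m(sr4_published_reach_m):
        return "100GBASE-SR4"
    raise ValueError("run exceeds the derated 100GBASE-SR4 reach")


print(operational_reach_m(100))  # 85.0
print(intra_rack_module(3))      # 100G-DAC
print(intra_rack_module(20))     # 100GBASE-SR4
```

Raising an error for runs beyond the derated reach, instead of silently choosing a long-reach module, keeps the decision for inter-rack links with the operator.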

Common mistake

Mismatched module classes on the two ends of a link are documented as a primary cause of link failures. A 100GBASE-SR4 on one end and a 100GBASE-LR4 on the other will not establish a link at all: SR4 transmits four parallel 850 nm lanes over multimode fiber with an MPO connector, while LR4 multiplexes four wavelengths near 1310 nm onto a single-mode duplex LC pair. Always specify matching modules at both ends of a link, and record the module class on the cable label.

Latency budget per switch hop

The documented latency budget for a production Unturned host is the sum of contributions from every switch hop between the player and the game-server process. A modern 100 GbE switch in cut-through mode contributes a documented 300-500 nanoseconds per hop; a switch in store-and-forward mode contributes a documented 1-3 microseconds per hop. The documented operational baseline configures all 57 Studios production switches in cut-through mode wherever the link is loss-free and store-and-forward mode wherever the link is loss-prone (typically the WAN edge).

| Hop class | Documented latency contribution | Cut-through or store-and-forward |
| --- | --- | --- |
| Host NIC | 200 ns | N/A |
| Top-of-rack switch | 350 ns | Cut-through |
| Spine switch | 350 ns | Cut-through |
| Edge router | 2 microseconds | Store-and-forward (WAN edge) |
| Total intra-DC hops (host to spine to host) | ~1.45 microseconds (two host NICs, two top-of-rack hops, one spine hop) | Cut-through |
| Total intra-DC hops (host to edge to internet) | ~3.0 microseconds | Mixed |

The intra-datacenter latency budget is the documented sub-microsecond range from host to host within a single rack and the documented few-microsecond range from host to the internet edge. The dominant latency contribution in a production deployment is the public internet between the edge and the player, and that contribution is addressed in Internet Connectivity Requirements.
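
The budget is a plain sum of per-hop contributions, which makes it easy to recompute for any path. The sketch below uses the per-hop figures from the table above; the hop keys and the three example paths are illustrative names, and a host-to-host path counts the NIC at both ends.

```python
# Per-hop contributions in nanoseconds, taken from the table above.
HOP_LATENCY_NS = {
    "host_nic": 200,
    "tor": 350,      # top-of-rack switch, cut-through
    "spine": 350,    # spine switch, cut-through
    "edge": 2000,    # edge router, store-and-forward
}


def path_latency_ns(hops):
    """Sum the documented contribution of every hop on a path."""
    return sum(HOP_LATENCY_NS[hop] for hop in hops)


same_rack = ["host_nic", "tor", "host_nic"]
cross_rack = ["host_nic", "tor", "spine", "tor", "host_nic"]
to_edge = ["host_nic", "tor", "spine", "edge"]

print(path_latency_ns(same_rack))   # 750
print(path_latency_ns(cross_rack))  # 1450
print(path_latency_ns(to_edge))     # 2900
```

The same-rack path stays under one microsecond, and the path to the internet edge is dominated by the single store-and-forward hop.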

Did you know?

The latency difference between store-and-forward and cut-through switching follows from the per-hop figures in this section: 1-3 microseconds against 300-500 nanoseconds, or roughly 0.5 to 2.7 microseconds saved per hop. Across a four-hop intra-datacenter path, that is an improvement of roughly 2 to 11 microseconds. The saving is small next to public-internet latency, so its practical benefit is to latency-sensitive traffic inside the datacenter, such as storage replication, rather than to anything a player perceives directly.

VLAN segmentation strategy

The documented network is segmented into VLANs that separate game-server traffic, monitoring traffic, management traffic, and storage replication traffic. The segmentation is documented because each traffic class has a distinct sensitivity profile to packet loss, latency, and bandwidth contention, and isolating the classes prevents one class from degrading another under sustained load.

| VLAN | Documented purpose | Traffic class | Documented isolation |
| --- | --- | --- | --- |
| VLAN 10 | Game-server data plane | UDP game packets, TCP control | Highest priority queue, no contention with replication or backup |
| VLAN 20 | Game-server management plane | SSH, agent telemetry | Documented isolation from data plane |
| VLAN 30 | Monitoring and observability | Metrics, logs, traces | Aggregated to the observability platform, no contention with VLAN 10 |
| VLAN 40 | Storage replication | Block replication, snapshot transfer | Documented dedicated bandwidth allocation |
| VLAN 50 | Backup and archive | Periodic backup transfer | Rate-limited, scheduled to off-peak windows |
| VLAN 60 | Out-of-band management | IPMI, BMC, console | Physically separated where possible, on dedicated 1 GbE switches |
| VLAN 70 | Public-facing services | Web, API, status pages | Documented DMZ posture |
| VLAN 80 | Inter-DC replication | Inter-site replication when applicable | Documented dedicated transit |

The VLAN allocation is the documented baseline; the specific VLAN numbers can be adjusted to match an existing convention, and the documented practice in the 57 Studios production estate is to maintain a master VLAN allocation document that records every VLAN, its purpose, its assigned subnet, and its associated firewall policy.

Pro tip

Document every VLAN at the time of creation. A VLAN with no recorded purpose accumulates ad hoc uses over time, and those unrecorded uses are the primary source of cross-VLAN policy drift. The 57 Studios production estate maintains the VLAN master document in a version-controlled repository alongside the network configuration itself.

Documented vendor reference

The documented vendor reference for switching hardware in the 57 Studios production estate includes Cisco, Arista Networks, and Juniper Networks. Each vendor produces a documented 100 GbE and 400 GbE switch portfolio suitable for production Unturned hosting, and the documented selection criteria favor operational simplicity, BGP feature completeness, and a documented track record of long-term firmware support.

Cisco's Nexus 9000 series is the documented data-center spine and aggregation platform; the Nexus 9300-FX3 and 9300-GX2 series provide 100 GbE and 400 GbE port density. Arista's 7050X and 7280R series provide a documented alternative with strong EOS-based automation features. Juniper's QFX5120 and QFX5700 series provide documented BGP performance and a stable Junos OS code base.

The documented vendor selection is operationally consequential. The documented practice in the 57 Studios production estate is to standardize on a single vendor across the spine and aggregation tiers and to operate that single vendor consistently across firmware updates, configuration management, and operator training. Multi-vendor environments are documented as feasible and are operated where regulatory or commercial considerations demand them; single-vendor environments are documented as the simpler operational posture.

Best practice

Whichever vendor is selected, the documented best practice is to maintain a current firmware baseline across every switch in the production estate. Firmware drift is documented as a primary source of intermittent failures that surface only under specific load conditions or specific failure modes. The 57 Studios production estate operates on a documented quarterly firmware review cadence with an annual firmware update window.

ASCII overview of the documented topology

               DOCUMENTED PRODUCTION TOPOLOGY (57 STUDIOS REFERENCE)

                    +--------------------+              +--------------------+
                    |   ISP A (transit)  |              |   ISP B (transit)  |
                    |   100 Gbps BGP     |              |   100 Gbps BGP     |
                    +---------+----------+              +---------+----------+
                              |                                   |
                              +-----------------------------------+
                              |                                   |
                    +---------v----------+              +---------v----------+
                    |  Edge router 1     |<--iBGP------>|  Edge router 2     |
                    |  AS 65001          |              |  AS 65001          |
                    +---------+----------+              +---------+----------+
                              |                                   |
                              +-----------------------------------+
                              |                                   |
                    +---------v----------+              +---------v----------+
                    |  Spine 1           |<----MLAG---->|  Spine 2           |
                    |  400 GbE           |              |  400 GbE           |
                    +---------+----------+              +---------+----------+
                              |                                   |
          +--------------+----+------------------------+----------+---+
          |              |                             |              |
     +----v---+     +----v---+                    +----v---+     +----v---+
     | ToR 1A |-MLAG| ToR 1B |                    | ToR 2A |-MLAG| ToR 2B |
     | 100GbE |     | 100GbE |                    | 100GbE |     | 100GbE |
     +----+---+     +----+---+                    +----+---+     +----+---+
          |              |                             |              |
          +------+-------+                             +------+-------+
                 |                                            |
           +-----v------+                               +-----v------+
           | Host 1     |                               | Host 5     |
           | Host 2     |                               | Host 6     |
           | dual 25GbE |                               | dual 25GbE |
           +------------+                               +------------+

       LEGEND:  MLAG = multi-chassis link aggregation
                iBGP = internal BGP session between edge routers
                A horizontal bus line connects every device above it to every device below it
                Each host has dual NICs to dual ToR switches
                Each ToR has dual uplinks to dual spine switches
                Each spine has dual uplinks to dual edge routers
                Each edge router has dual transit sessions

The ASCII topology summarizes the documented configuration. Every horizontal level is dual-homed, every vertical link is redundant, and the documented operational behavior of every layer is verified in quarterly fail-over exercises.

Cable management as an infrastructure concern

The documented cable management posture is part of the network infrastructure specification. A production Unturned host with documented redundancy at every layer accumulates a documented cable count of approximately 12 cables per server, 32 cables per aggregation switch, and 64 cables per spine switch. Without documented cable management, the cable plant becomes the dominant source of operational failures, and the dominant failure mode is human-induced disconnection during adjacent maintenance work.

The documented cable management posture in the 57 Studios production estate:

  • Every cable is labeled at both ends with a documented label format (source-port to destination-port).
  • Every cable is routed through documented cable management arms or overhead trays.
  • Every patch panel is documented in the network master document.
  • Cable removal requires documentation of the removal in the change management system.
  • Cable additions require a labeled cable on day one, not a labeled cable scheduled for later.

Common mistake

The single most documented source of avoidable outages in the 57 Studios production estate before the current cable management posture was implemented was the disconnection of an active production cable during adjacent maintenance work. The cable was unlabeled, the maintenance work was on an adjacent cable, and the technician disconnected the wrong cable. The documented cable management posture is the response, and the documented outcome since implementation is zero such events.

Operational discipline: the documented change management posture

Network changes in the 57 Studios production estate are documented and reviewed before they are executed. The documented change management posture:

  • Every network change is documented in advance with a proposed configuration diff.
  • Every network change has a documented rollback plan.
  • Every network change is reviewed by at least one engineer who did not author the change.
  • Every network change is executed during a documented maintenance window.
  • Every network change is verified post-execution against documented success criteria.
  • Every network change is recorded in the network master document.

The documented change management posture is operationally consequential. The documented experience is that approximately one in fifteen proposed changes is rejected in review for a documented reason (typically a missed dependency, an incorrect rollback step, or a misunderstanding of a downstream effect). The documented benefit of the review is the avoidance of those one-in-fifteen changes reaching production.
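
The review gate described above lends itself to a mechanical pre-check before a change reaches a human reviewer. The following is a minimal sketch, not the tooling used in the 57 Studios estate: the `ChangeRequest` fields and the `review_blockers` function are hypothetical names that mirror the checklist items.

```python
from dataclasses import dataclass, field


@dataclass
class ChangeRequest:
    author: str
    reviewers: list = field(default_factory=list)
    config_diff: str = ""
    rollback_plan: str = ""
    maintenance_window: str = ""
    success_criteria: list = field(default_factory=list)


def review_blockers(change):
    """Return the checklist items this change request does not yet satisfy."""
    blockers = []
    if not change.config_diff.strip():
        blockers.append("missing proposed configuration diff")
    if not change.rollback_plan.strip():
        blockers.append("missing rollback plan")
    if not any(reviewer != change.author for reviewer in change.reviewers):
        blockers.append("needs a reviewer other than the author")
    if not change.maintenance_window.strip():
        blockers.append("missing maintenance window")
    if not change.success_criteria:
        blockers.append("missing success criteria")
    return blockers


self_reviewed = ChangeRequest(
    author="alice",
    reviewers=["alice"],
    config_diff="+ neighbor 192.0.2.1 timers 3 9",
    rollback_plan="remove the timers line",
    maintenance_window="Tuesday 02:00-04:00",
    success_criteria=["BGP session Established"],
)
print(review_blockers(self_reviewed))  # ['needs a reviewer other than the author']
```

A check like this only confirms that the required fields are present. Whether the rollback plan is correct or a dependency was missed still depends on the independent reviewer.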

The sequence diagram documents the change management posture. The documented practice is that no production network change is executed outside this sequence.

VLAN to subnet mapping reference

| VLAN | Subnet (IPv4) | Subnet (IPv6) | Documented gateway |
| --- | --- | --- | --- |
| VLAN 10 | 10.10.0.0/16 | fd00:10:10::/64 | 10.10.0.1 |
| VLAN 20 | 10.20.0.0/16 | fd00:10:20::/64 | 10.20.0.1 |
| VLAN 30 | 10.30.0.0/16 | fd00:10:30::/64 | 10.30.0.1 |
| VLAN 40 | 10.40.0.0/16 | fd00:10:40::/64 | 10.40.0.1 |
| VLAN 50 | 10.50.0.0/16 | fd00:10:50::/64 | 10.50.0.1 |
| VLAN 60 | 10.60.0.0/16 | fd00:10:60::/64 | 10.60.0.1 |
| VLAN 70 | 192.0.2.0/24 (public, anonymized) | 2001:db8::/64 (public, anonymized) | 192.0.2.1 |
| VLAN 80 | 10.80.0.0/16 | fd00:10:80::/64 | 10.80.0.1 |

The subnet assignments shown are the documented internal addressing scheme; the public addressing in VLAN 70 is anonymized for documentation purposes and is replaced with the operator's allocated public prefix in the production deployment. The documented gateway addresses follow the documented convention that the .1 host of every subnet is the gateway.
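
Both conventions stated above (the `.1` host is the gateway, and each VLAN has its own subnet) can be verified automatically whenever the mapping file changes. The sketch below uses Python's standard `ipaddress` module; the dictionary layout and the function name `validate_vlan_map` are assumptions, and only the IPv4 columns of the table are checked.

```python
import ipaddress

# IPv4 subnet and gateway per VLAN, copied from the mapping table above.
VLAN_SUBNETS = {
    10: ("10.10.0.0/16", "10.10.0.1"),
    20: ("10.20.0.0/16", "10.20.0.1"),
    30: ("10.30.0.0/16", "10.30.0.1"),
    40: ("10.40.0.0/16", "10.40.0.1"),
    50: ("10.50.0.0/16", "10.50.0.1"),
    60: ("10.60.0.0/16", "10.60.0.1"),
    70: ("192.0.2.0/24", "192.0.2.1"),
    80: ("10.80.0.0/16", "10.80.0.1"),
}


def validate_vlan_map(mapping):
    """Return a list of human-readable problems; an empty list means clean."""
    problems = []
    networks = {}
    for vlan, (cidr, gateway) in mapping.items():
        network = ipaddress.ip_network(cidr)  # raises ValueError on host bits
        if ipaddress.ip_address(gateway) != network.network_address + 1:
            problems.append(f"VLAN {vlan}: gateway {gateway} is not the .1 host")
        networks[vlan] = network
    vlans = sorted(networks)
    for index, first in enumerate(vlans):
        for second in vlans[index + 1:]:
            if networks[first].overlaps(networks[second]):
                problems.append(f"VLAN {first} overlaps VLAN {second}")
    return problems


print(validate_vlan_map(VLAN_SUBNETS))  # []
```

Running a check like this in the same repository as the mapping file would catch an overlapping subnet or a mistyped gateway before it reaches a change window.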

VLAN segmentation overview

Pro tip

Maintain the VLAN to subnet mapping in a version-controlled file alongside your switch configuration. The documented practice is that the mapping file is the source of truth for both the network configuration and the documented operational runbook. Drift between the two is the documented source of misconfiguration that surfaces during change windows.

Spanning tree, MLAG, and the documented avoidance of layer-2 loops

The documented topology uses MLAG (multi-chassis link aggregation) on Arista platforms, MC-LAG on Juniper platforms, and the equivalent vPC (virtual port channel) on Cisco Nexus platforms. The documented effect of MLAG is that two physically separate switches present as a single logical switch from the perspective of the connected server, which allows the connected server's NIC bond to load-balance across both switches without triggering spanning tree blocking.

The documented configuration disables spanning tree on the MLAG-bonded links and configures spanning tree as a documented loop-prevention backstop on all other links. The documented practice is that spanning tree is configured and never observed to converge in production; if spanning tree converges, the documented operational response is to investigate the underlying topology change that caused the convergence.

Best practice

Configure spanning tree as a backstop, not as a primary loop-prevention mechanism. The documented primary loop-prevention mechanism is the topology itself: every link is documented, every link is intentional, and every link is verified to be loop-free at the time of provisioning. Spanning tree is the documented safety net.

Network monitoring and observability

The documented monitoring posture for the network infrastructure includes per-port traffic counters, per-port error counters, BGP session state, MLAG peer state, and optical module diagnostic monitoring. The documented monitoring stack in the 57 Studios production estate uses a streaming telemetry pipeline (gNMI on the supported platforms) feeding a time-series database, with documented dashboards for each monitoring class.

| Monitoring class | Documented metric | Documented alert threshold |
| --- | --- | --- |
| Per-port bandwidth | Bits per second, packets per second | Sustained > 85% of port capacity for > 5 minutes |
| Per-port errors | CRC errors, frame errors, drops | > 0.01% of total frames |
| BGP session state | Established / Idle / Active | Any deviation from Established |
| MLAG peer state | Peer reachable / Peer unreachable | Peer unreachable for > 30 seconds |
| Optical diagnostics | Tx power, Rx power, temperature | Rx power > 3 dB below documented baseline |
| Aggregate throughput | Inter-rack bandwidth | Sustained > 80% of inter-rack capacity |

The documented monitoring posture produces approximately 12-18 actionable alerts per month in the 57 Studios production estate. The documented disposition of those alerts is that approximately 70% are resolved within 30 minutes of the alert, approximately 20% are documented as expected (maintenance windows, scheduled changes), and approximately 10% require deeper investigation.
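
A "sustained" threshold such as the per-port bandwidth alert has to be evaluated over a run of consecutive samples, not a single reading. The sketch below shows one way to do that. The function name, the sample format of `(timestamp_seconds, bits_per_second)` pairs, and the 60-second sampling interval are assumptions, not details of the 57 Studios telemetry pipeline.

```python
def sustained_breach(samples, capacity_bps, threshold_percent=85, min_duration_s=300):
    """Return True if utilisation stays above the threshold for longer than min_duration_s.

    `samples` is an iterable of (timestamp_s, bits_per_second) pairs in
    time order. Any sample at or below the threshold resets the run.
    """
    breach_start = None
    for timestamp_s, bps in samples:
        if bps * 100 > threshold_percent * capacity_bps:
            if breach_start is None:
                breach_start = timestamp_s
            if timestamp_s - breach_start > min_duration_s:
                return True
        else:
            breach_start = None
    return False


PORT_CAPACITY_BPS = 100_000_000_000  # one 100 GbE port

# Seven one-minute samples at 90% utilisation: above threshold for 360 s.
busy = [(t, 90_000_000_000) for t in range(0, 420, 60)]
# Same load, but a dip to 50% at t=180 resets the run.
dipped = [(t, 50_000_000_000 if t == 180 else 90_000_000_000) for t in range(0, 420, 60)]

print(sustained_breach(busy, PORT_CAPACITY_BPS))    # True
print(sustained_breach(dipped, PORT_CAPACITY_BPS))  # False
```

Resetting on any sample below the threshold is the strictest reading of "sustained". A production rule might tolerate brief dips, which would be a separate design decision.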

Frequently asked questions

Is 10 GbE adequate for a production Unturned host?

No. The documented minimum specification for the aggregation tier of a production Unturned host is 100 GbE. The documented reasoning is the bandwidth profile of a modern Unturned instance, aggregated across multiple instances per rack and layered with monitoring, backup, and replication traffic. The 10 GbE class is documented as suitable for out-of-band management traffic only.

Is BGP a requirement for self-hosting Unturned?

A production self-hosting deployment with the documented redundancy posture requires BGP. The documented configuration runs BGP between two edge routers and two upstream commercial transit providers, which provides the documented failover behavior on transit provider failure. A deployment without BGP cannot provide the documented failover behavior and operates outside the documented professional standard.

Can a single switch be acceptable for a small production deployment?

The documented professional standard is dual switches at every aggregation tier. A single-switch deployment cannot provide the documented failover behavior on switch failure, and the documented operational experience is that switch failures (firmware bugs, optical module failures, power-supply failures) are recurring events at a documented professional scale. Single-switch deployments are outside the documented specification.

What ASN should I use for a new deployment?

A new deployment registers an autonomous system number with the regional internet registry that covers its geography (ARIN, RIPE NCC, APNIC, LACNIC, or AFRINIC). The documented practice is to register a 32-bit ASN, which provides a much larger pool of available numbers than the 16-bit ASN pool. The documented turnaround for ASN registration is typically 5-15 business days depending on the registry.

What IP prefix size is documented for a production deployment?

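The documented minimum prefix size is a /22 IPv4 block (1024 addresses) and a /44 IPv6 block. Transit providers commonly filter IPv4 prefixes longer than /24 and IPv6 prefixes longer than /48, so a more specific announcement may not be propagated to the global BGP table, which defeats the documented failover behavior. The documented /22 and /44 minimums sit comfortably inside that limit and leave room to split the allocation into four routable /24 or sixteen routable /48 announcements.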

How long does BGP failover take?

The documented worst-case BGP failover time is 9 seconds, set by the BGP hold timer in the documented configuration. The observed failover time in the 57 Studios production estate is typically under 5 seconds and frequently under 2 seconds, depending on the upstream provider's propagation behavior. Players in active sessions during a documented failover typically observe no service degradation.

Can MLAG be operated across switches from different vendors?

Documented multi-vendor MLAG is feasible in a small number of configurations and is documented as operationally complex; the configuration is outside the recommended posture. The documented practice in the 57 Studios production estate is single-vendor MLAG within a given tier, with the documented option to operate different vendors at different tiers (for example, vendor A at the aggregation tier and vendor B at the edge).

What is the documented latency from host to host within a single rack?

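The documented latency for a host-to-host path through a single top-of-rack switch is approximately 750 nanoseconds (two host NICs at 200 ns each plus one 350 ns switch hop), which is the sub-microsecond intra-rack range. A host-to-spine-to-host path between racks sums to approximately 1.45 microseconds on cut-through switches. The documented latency contribution from the switch itself is approximately 350 nanoseconds per hop on a modern 100 GbE switch in cut-through mode.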

How often should switch firmware be updated?

The documented firmware review cadence in the 57 Studios production estate is quarterly. The documented firmware update window is annual, with documented exceptions for security-critical updates that are applied outside the annual window. The documented practice is to maintain a single firmware version across the production estate to avoid the documented operational complexity of firmware drift.

What is the documented configuration for spanning tree?

Spanning tree is configured as a documented backstop and is disabled on MLAG-bonded links. The documented primary loop-prevention mechanism is the topology itself, and spanning tree is the documented safety net for unintended layer-2 loops. If spanning tree converges in production, the documented operational response is to investigate the underlying topology change.

How are optical modules documented in the production estate?

Every optical module is recorded in the network master document at the time of installation. The record includes the module part number, the installation date, the switch and port it is installed in, and the baseline Tx power and Rx power. The documented practice is to compare current Rx power against that baseline at every quarterly review and to replace modules with degraded Rx power before they fail in service.

What is the documented disposition of alerts from the monitoring stack?

The documented disposition rate in the 57 Studios production estate is approximately 70% resolved within 30 minutes, approximately 20% documented as expected (maintenance windows, scheduled changes), and approximately 10% requiring deeper investigation. The documented practice is to review the deeper-investigation alerts in the weekly infrastructure review.

Appendix A: Documented hardware reference list

The following table lists the specific switch models in the reference 57 Studios production estate. The list is provided for reference; specific model selections are operator-dependent and should match the documented operational standard.

| Tier | Vendor | Model | Port density | Documented firmware baseline |
|---|---|---|---|---|
| Spine | Arista | 7280R3-32D4 | 32 x 400 GbE + 4 x 400 GbE breakout | EOS 4.31.x |
| Spine (alt) | Cisco | Nexus 9332D-GX2B | 32 x 400 GbE | NX-OS 10.3.x |
| Aggregation | Arista | 7050X3 | 32 x 100 GbE + 2 x 400 GbE | EOS 4.31.x |
| Aggregation (alt) | Cisco | Nexus 9336C-FX2-E | 36 x 100 GbE | NX-OS 10.3.x |
| Top of rack | Arista | 7050X3 | 32 x 100 GbE | EOS 4.31.x |
| Top of rack (alt) | Juniper | QFX5120-32C | 32 x 100 GbE | Junos 22.4R3 |
| Edge router | Juniper | MX204 | 4 x 100 GbE + 8 x 10 GbE | Junos 22.4R3 |
| Edge router (alt) | Cisco | NCS 540-24Z8Q2C-SYS | 24 x 25 GbE + 8 x 100 GbE | IOS XR 7.10.x |
| Out-of-band | Cisco | Catalyst 9300-48T | 48 x 1 GbE | IOS XE 17.12.x |

The reference list is the operational baseline; the specific model selections in any given deployment are recorded in the deployment's network master document and are updated as hardware refreshes occur. The documented refresh cadence is approximately five years per tier.

Appendix B: Documented BGP configuration template

The following is a BGP configuration template in Arista EOS syntax. The template sets the documented values for keepalive, hold timer, route advertisement, and route filtering. Operators adapt the template to their specific ASN, prefix, and upstream provider configuration.

router bgp 65001
  router-id 10.0.0.1
  timers bgp 3 9
  no bgp default ipv4-unicast
  bgp log-neighbor-changes
  
  neighbor 192.0.2.1 remote-as 64512
  neighbor 192.0.2.1 description ISP_A
  neighbor 192.0.2.1 timers 3 9
  neighbor 192.0.2.1 maximum-routes 1000000
  
  neighbor 198.51.100.1 remote-as 64513
  neighbor 198.51.100.1 description ISP_B
  neighbor 198.51.100.1 timers 3 9
  neighbor 198.51.100.1 maximum-routes 1000000
  
  neighbor 10.0.0.2 remote-as 65001
  neighbor 10.0.0.2 description EDGE_2_IBGP
  neighbor 10.0.0.2 update-source Loopback0
  neighbor 10.0.0.2 next-hop-self
  
  address-family ipv4
    neighbor 192.0.2.1 activate
    neighbor 192.0.2.1 prefix-list ALLOW_OUTBOUND out
    neighbor 192.0.2.1 prefix-list FULL_TABLE in
    neighbor 198.51.100.1 activate
    neighbor 198.51.100.1 prefix-list ALLOW_OUTBOUND out
    neighbor 198.51.100.1 prefix-list FULL_TABLE in
    neighbor 10.0.0.2 activate
    ! Placeholder /22 aligned to a /22 boundary; replace with the operator's allocated block
    network 198.18.0.0/22
  
  address-family ipv6
    neighbor 192.0.2.1 activate
    neighbor 198.51.100.1 activate
    neighbor 10.0.0.2 activate
    network 2001:db8::/44

The template sets the BGP timers (3-second keepalive, 9-second hold), the neighbor description convention, the maximum-routes limit (1 million routes, intended to accommodate the full IPv4 BGP table with headroom; operators should confirm this against the current table size), the iBGP session between the two edge routers, and the prefix advertisement (the operator's /22 IPv4 block and /44 IPv6 block).
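When adapting the template, it can help to sanity-check the advertised prefix before it reaches a router. The sketch below uses Python's standard `ipaddress` module; the function name and the example prefixes are illustrative assumptions, and the peer addresses are the placeholder neighbor addresses from the template.

```python
import ipaddress


def check_advertised_prefix(prefix, expected_len, peer_addrs):
    """Return a list of problems found with an advertised prefix (empty if none)."""
    try:
        # strict=True rejects prefixes with host bits set, for example a /22
        # that does not start on a /22 boundary.
        net = ipaddress.ip_network(prefix, strict=True)
    except ValueError:
        return [f"{prefix} is not aligned to its prefix length"]
    problems = []
    if net.prefixlen != expected_len:
        problems.append(f"{prefix} is not a /{expected_len}")
    for peer in peer_addrs:
        addr = ipaddress.ip_address(peer)
        # A transit peering address should not sit inside the operator's own block.
        if addr.version == net.version and addr in net:
            problems.append(f"peer {peer} falls inside {prefix}")
    return problems


PEERS = ["192.0.2.1", "198.51.100.1"]  # placeholder neighbors from the template
for candidate in ("198.18.0.0/22", "203.0.113.0/22", "192.0.2.0/24"):
    print(candidate, check_advertised_prefix(candidate, 22, PEERS))
```

The same function works for the IPv6 block by passing the /44 prefix and an expected length of 44.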

Appendix C: Documented quarterly fail-over exercise procedure

The quarterly fail-over exercise verifies the production estate's redundancy posture. The exercise is recorded in the change management system in advance and is executed during a scheduled maintenance window with predefined success criteria.

The documented procedure:

  1. Document the exercise in the change management system at least 14 calendar days in advance.
  2. Notify the operations team and the on-call rotation.
  3. Open a monitoring window with live dashboards for the affected layer.
  4. Take one switch of the targeted MLAG pair offline by administratively shutting down the inter-switch peer link.
  5. Verify that all traffic shifts to the surviving switch within the documented failover budget (typically under 1 second for MLAG, under 9 seconds for BGP).
  6. Verify that no connected Unturned player session is dropped.
  7. Verify that no replication transfer is dropped.
  8. Verify that no monitoring data point is lost.
  9. Restore the inter-switch peer link.
  10. Verify that traffic rebalances within the documented reconvergence budget.
  11. Document the exercise outcome in the network master document.
  12. Review the documented outcome in the next weekly infrastructure review.

The exercise produces evidence that the production estate's redundancy posture works as documented. The frequency is quarterly because an annual cadence is insufficient to surface drift, and a monthly cadence consumes operational bandwidth without a proportional gain in confidence.
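The success criteria in steps 5 through 8 can be recorded as a simple pass/fail check. This is a minimal sketch: the function name, argument names, and message wording are assumptions, while the two budgets are the ones stated in the procedure.

```python
# Failover budgets from the procedure: under 1 second for MLAG, under 9 seconds for BGP.
FAILOVER_BUDGET_S = {"mlag": 1.0, "bgp": 9.0}


def evaluate_exercise(layer, failover_s, sessions_dropped, transfers_dropped, datapoints_lost):
    """Return a list of failed success criteria; an empty list means the exercise passed."""
    failures = []
    budget = FAILOVER_BUDGET_S[layer]
    if failover_s >= budget:
        failures.append(f"failover took {failover_s:.1f}s, budget is under {budget:.0f}s")
    if sessions_dropped:
        failures.append(f"{sessions_dropped} player session(s) dropped")
    if transfers_dropped:
        failures.append(f"{transfers_dropped} replication transfer(s) dropped")
    if datapoints_lost:
        failures.append(f"{datapoints_lost} monitoring data point(s) lost")
    return failures


print(evaluate_exercise("mlag", 0.4, 0, 0, 0))
print(evaluate_exercise("bgp", 9.5, 1, 0, 0))
```

The returned list can be pasted directly into the exercise outcome entry in the network master document.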

Best practice

Schedule the quarterly fail-over exercise at the same time each quarter (for example, the first Tuesday of the second month of each quarter at 02:00 local time). A fixed schedule means the operations team rehearses the procedure on a predictable cadence, and the muscle memory of the exercise becomes part of the operational standard.
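The example schedule can be computed rather than looked up. The sketch below assumes calendar quarters that begin in January, so the second month of each quarter is February, May, August, and November; the function names are illustrative.

```python
from datetime import date, datetime, time, timedelta


def first_tuesday(year, month):
    """Return the date of the first Tuesday in the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday is 0, Tuesday is 1.
    offset = (1 - first.weekday()) % 7
    return first + timedelta(days=offset)


def exercise_windows(year):
    """Return the four 02:00 local-time exercise start times for a calendar year."""
    # Assumption: quarters start in January, so the second months are 2, 5, 8, 11.
    return [datetime.combine(first_tuesday(year, month), time(2, 0)) for month in (2, 5, 8, 11)]


for window in exercise_windows(2025):
    print(window.isoformat())
```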

Documented multicast and broadcast considerations

The network infrastructure also accounts for multicast and broadcast traffic. Although Unturned server-to-player traffic is unicast UDP, the monitoring infrastructure, the inter-rack replication protocols, and the service-discovery protocols include multicast and broadcast components.

The multicast posture in the 57 Studios production estate uses IGMP snooping on every aggregation switch and PIM Sparse Mode on every spine switch. IGMP snooping prevents multicast flooding within a VLAN, and PIM Sparse Mode handles inter-VLAN multicast routing where required. The outcome is that multicast traffic is delivered only to subscribed listeners and broadcast traffic is contained within its VLAN of origin.

| Protocol | Traffic class | VLAN scope | Snooping or suppression |
|---|---|---|---|
| IGMPv3 | Multicast group membership | Per VLAN | IGMP snooping enabled |
| PIM Sparse Mode | Inter-VLAN multicast routing | Spine | PIM enabled on documented interfaces |
| MLD | IPv6 multicast group membership | Per VLAN | MLD snooping enabled |
| ARP | IPv4 address resolution | Per VLAN | ARP suppression on documented platforms |
| ND | IPv6 neighbor discovery | Per VLAN | ND suppression on documented platforms |

The multicast and broadcast configuration is part of the network infrastructure baseline. It exists because uncontained multicast and broadcast traffic degrades the performance of adjacent traffic classes, and containment eliminates that degradation.

Best practice

Enable IGMP snooping and MLD snooping by default on every switch. The operational evidence is that snooping adds no operational complexity in steady state and prevents multicast flooding under load.

Documented QoS configuration

The Quality of Service (QoS) configuration ensures traffic-class prioritization across the network infrastructure. It is defined per VLAN and per traffic class within each VLAN.

The documented QoS classes:

  • EF (Expedited Forwarding): VLAN 10 game-server data plane. EF-marked packets receive strict-priority queuing at every switch hop.
  • AF41 (Assured Forwarding): VLAN 20 game-server management plane. AF41-marked packets receive preferential queuing.
  • AF31: VLAN 30 monitoring traffic. AF31-marked packets receive preferential queuing relative to bulk traffic.
  • AF21: VLAN 40 storage replication. AF21-marked packets receive preferential queuing within the replication bandwidth allocation.
  • AF11: VLAN 50 backup and archive. AF11-marked packets receive best-effort queuing within the backup bandwidth allocation.
  • CS1 (Scavenger): opportunistic traffic. CS1-marked packets receive lowest-priority queuing.

| QoS class | DSCP marking | Queue priority | Bandwidth allocation |
|---|---|---|---|
| EF | 46 | Strict priority | Per VLAN 10 |
| AF41 | 34 | Preferential (Q4) | 10% of port |
| AF31 | 26 | Preferential (Q3) | 10% of port |
| AF21 | 18 | Preferential (Q2) | 25% of port |
| AF11 | 10 | Best effort (Q1) | Remainder |
| CS1 | 8 | Scavenger (Q0) | Opportunistic |
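The DSCP values in the table follow from the standard encodings: an Assured Forwarding class AFxy has DSCP 8x + 2y, and the DSCP occupies the upper six bits of the IP ToS or Traffic Class byte. The sketch below reproduces the table's markings; the dictionary and function names are illustrative, and the VLAN-to-class mapping is the one listed above.

```python
def af_dscp(af_class, drop_precedence):
    """DSCP for Assured Forwarding class AFxy, computed as 8*x + 2*y."""
    return 8 * af_class + 2 * drop_precedence


EF_DSCP = 46
CS1_DSCP = 8

# VLAN to DSCP marking, following the class list in the article.
VLAN_DSCP = {
    10: EF_DSCP,        # game-server data plane (EF)
    20: af_dscp(4, 1),  # game-server management plane (AF41)
    30: af_dscp(3, 1),  # monitoring (AF31)
    40: af_dscp(2, 1),  # storage replication (AF21)
    50: af_dscp(1, 1),  # backup and archive (AF11)
}


def tos_byte(dscp):
    """Shift the six-bit DSCP into the upper bits of the ToS / Traffic Class byte."""
    return dscp << 2


for vlan, dscp in VLAN_DSCP.items():
    print(f"VLAN {vlan}: DSCP {dscp}, ToS byte 0x{tos_byte(dscp):02x}")
```

The ToS byte form is useful when reading packet captures, where the whole byte is displayed rather than the DSCP alone.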

This QoS configuration is the baseline; variation is driven by the per-deployment traffic profile. The documented practice is to record the QoS configuration in the network master document and to version-control it alongside the switch configuration.

The QoS flowchart maps each traffic class to its output queue. The mapping rests on operational evidence that strict-priority queuing for game-server traffic keeps switch-hop latency for that traffic class below one microsecond, regardless of the background load on the output port.

Documented anti-DDoS posture

The network infrastructure includes anti-DDoS protection at the edge. The mechanism is BGP Flowspec with the upstream transit providers, combined with on-premises mitigation capacity for attacks within the documented mitigation envelope.

The documented anti-DDoS posture:

  • BGP Flowspec advertisement to upstream transit providers for surgical blocking of attack traffic.
  • On-premises scrubbing capacity for attacks within the mitigation envelope (approximately 100 Gbps of mitigation capacity per edge router).
  • RTBH (remotely triggered black hole) advertisement to upstream transit providers for black-holing of attack destinations.
  • Anycast advertisement of public-facing services for geographic distribution of attack traffic.
  • Integration with commercial DDoS mitigation services for attacks above the on-premises capacity.

| Anti-DDoS mechanism | Activation trigger | Mitigation envelope |
|---|---|---|
| BGP Flowspec | Attack signature match | Per-provider capacity |
| RTBH | Destination-only attack | Full upstream capacity |
| On-premises scrubbing | Attack within envelope | 100 Gbps per edge |
| Anycast distribution | Attack on public services | Per anycast site |
| Commercial mitigation | Attack above envelope | Per service contract |
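The activation triggers in the table can be read as a small decision function. The sketch below is one possible interpretation, not the production logic: the function and parameter names are hypothetical, and it assumes two edge routers whose per-edge scrubbing capacity can be summed into a single on-premises envelope.

```python
ON_PREM_GBPS_PER_EDGE = 100  # from the article: approximately 100 Gbps per edge router


def select_mitigations(attack_gbps, edge_routers=2, signature_match=False,
                       destination_only=False, public_service=False):
    """Return the mitigation mechanisms whose activation trigger is met."""
    actions = []
    if signature_match:
        actions.append("BGP Flowspec")
    if destination_only:
        actions.append("RTBH")
    if public_service:
        actions.append("Anycast distribution")
    # Assumption: the on-premises envelope is the sum of per-edge capacity.
    if attack_gbps <= ON_PREM_GBPS_PER_EDGE * edge_routers:
        actions.append("On-premises scrubbing")
    else:
        actions.append("Commercial mitigation")
    return actions


print(select_mitigations(50, signature_match=True))
print(select_mitigations(400, destination_only=True))
```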

This anti-DDoS posture is the baseline. It reflects operational evidence that attacks on Unturned-hosting infrastructure are a recurring event at professional scale, and that the mitigation posture preserves continuity of service across attack events.

Common mistake

A common mistake is to assume that upstream transit providers will automatically mitigate attacks on downstream customer infrastructure. The operational evidence is that upstream mitigation depends on the service tier and is latency-sensitive. The documented practice is to document the mitigation expectations with every upstream provider and to verify the mitigation behavior during the quarterly exercises.

Documented operator handoff procedure

The operator handoff procedure ensures continuity of operational knowledge across operator changes. It specifies the inputs that a departing operator provides to an incoming operator.

The documented inputs:

  • Topology reference.
  • Configuration reference.
  • Monitoring stack credentials and dashboards.
  • Change management system access.
  • Incident response history (incidents in the prior 12 months).
  • Capacity-planning model state (current headroom per tier).
  • Quarterly fail-over exercise history (prior 4 exercises).
  • Vendor support contract references.
  • Upstream transit provider contact references.
  • Escalation contacts.

The operator handoff procedure is the baseline for operator changes. It exists because undocumented operator handoffs produce operational gaps that surface at inopportune times, and the procedure eliminates those gaps.

Pro tip

Document the operator handoff at every operator change, even temporary changes such as vacation coverage. The operational evidence is that vacation coverage handoffs are a common source of operational gaps, and the procedure prevents them.

Appendix D: Documented capacity-planning model

The capacity-planning model for the network infrastructure drives hardware refresh, capacity addition, and the upgrade cadence. The model is a quarterly review against six documented inputs: aggregate inbound bandwidth at the edge, aggregate outbound bandwidth at the edge, inter-rack aggregate bandwidth, replication aggregate bandwidth, backup aggregate bandwidth, and the 12-month forward trajectory.

| Input | Measurement source | Review cadence |
|---|---|---|
| Aggregate inbound bandwidth | Per-edge-router BGP session telemetry | Quarterly |
| Aggregate outbound bandwidth | Per-edge-router BGP session telemetry | Quarterly |
| Inter-rack aggregate bandwidth | Per-spine-link telemetry | Quarterly |
| Replication aggregate bandwidth | VLAN 40 per-port telemetry | Quarterly |
| Backup aggregate bandwidth | VLAN 50 per-port telemetry | Quarterly |
| 12-month forward trajectory | Aggregate of all of the above with documented growth model | Quarterly |

The model produces a capacity headroom value per tier. The operational target is a headroom of at least 40 percent at every tier in steady state. Any tier that falls below 30 percent headroom triggers a capacity-addition project, and any tier that falls below 20 percent headroom triggers an emergency capacity-addition project with expedited procurement.
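The headroom thresholds translate into a short classification. In this sketch, headroom is assumed to be the unused fraction of a tier's capacity at peak; the function names and the "below target" label for the 30 to 40 percent band are illustrative, since the article only names the two trigger points and the target.

```python
def headroom_pct(peak_gbps, capacity_gbps):
    """Unused capacity at peak, as a percentage of total tier capacity."""
    return 100.0 * (capacity_gbps - peak_gbps) / capacity_gbps


def capacity_action(headroom):
    """Map a headroom percentage to the action named in the capacity-planning model."""
    if headroom < 20:
        return "emergency capacity-addition project"
    if headroom < 30:
        return "capacity-addition project"
    if headroom < 40:
        return "below target"  # assumption: no project is triggered in this band
    return "at or above target"


# Hypothetical tiers: (name, peak Gbps, capacity Gbps).
for name, peak, capacity in [("edge", 60, 100), ("spine", 75, 100), ("tor", 85, 100)]:
    h = headroom_pct(peak, capacity)
    print(f"{name}: {h:.0f}% headroom, {capacity_action(h)}")
```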

The growth model in the 57 Studios production estate is an exponential growth model with a doubling period of approximately 14 months. The doubling is driven by growth of the player base, growth of the per-instance asset footprint, and growth of the inter-rack replication footprint. The capacity-planning model accounts for all three contributions.
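A 14-month doubling period implies that demand after t months is the current demand multiplied by 2^(t/14), which is a factor of roughly 1.81 over the 12-month forward trajectory. The sketch below works through that arithmetic; the function names and the example figures are assumptions for illustration.

```python
import math

DOUBLING_PERIOD_MONTHS = 14  # from the article: approximately 14 months


def projected(current_gbps, months):
    """Project demand forward under exponential growth with a fixed doubling period."""
    return current_gbps * 2 ** (months / DOUBLING_PERIOD_MONTHS)


def months_until(current_gbps, limit_gbps):
    """Months until demand reaches limit_gbps, solving current * 2**(t/14) = limit."""
    return DOUBLING_PERIOD_MONTHS * math.log2(limit_gbps / current_gbps)


# Hypothetical tier: 60 Gbps peak today on 100 Gbps of capacity.
print(f"12-month projection: {projected(60, 12):.1f} Gbps")
print(f"Months until capacity is reached: {months_until(60, 100):.1f}")
```

Running the projection against each tier's capacity shows which tiers would cross a headroom threshold before the next quarterly review.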

The capacity-planning flowchart lays out the decision tree that the quarterly capacity review executes. The cadence is quarterly because that frequency surfaces growth trends with confidence; monthly is excessive for the growth rate, and annual is insufficient to avoid emergency capacity additions.

Appendix E: Documented IPv6 specification

The network infrastructure specification includes IPv6 at every layer. The IPv4 and IPv6 specifications are deployed side by side, and the operational posture is dual-stack across every VLAN.

The documented IPv6 specification:

  • A /44 IPv6 prefix from the regional internet registry.
  • A /64 per VLAN.
  • IPv6 next-hop on every BGP session.
  • IPv6 transit from every upstream transit provider.
  • IPv6 monitoring in every telemetry pipeline.
  • An IPv6 firewall policy that mirrors the IPv4 firewall policy.
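A /44 allocation holds 2^20 (1,048,576) /64 subnets, so a /64 per VLAN leaves ample room. The sketch below carves per-VLAN /64s out of the documentation prefix 2001:db8::/44 using Python's standard `ipaddress` module. Using the VLAN ID as the subnet index is an assumed convention for illustration, not one stated in the article.

```python
import ipaddress

# Documentation prefix standing in for the operator's /44 allocation.
ALLOCATION = ipaddress.IPv6Network("2001:db8::/44")

# VLAN roles as listed in the QoS section of the article.
VLANS = {
    10: "game-server data plane",
    20: "game-server management plane",
    30: "monitoring",
    40: "storage replication",
    50: "backup and archive",
}


def vlan_subnet(vlan_id):
    """Return the /64 for a VLAN, assuming the VLAN ID selects the /64 index."""
    base = int(ALLOCATION.network_address)
    subnet = ipaddress.IPv6Network((base + (vlan_id << 64), 64))
    if not subnet.subnet_of(ALLOCATION):
        raise ValueError(f"VLAN {vlan_id} does not fit inside {ALLOCATION}")
    return subnet


available_64s = 2 ** (64 - ALLOCATION.prefixlen)
print(f"{ALLOCATION} holds {available_64s} /64 subnets")
for vlan_id, role in VLANS.items():
    print(f"VLAN {vlan_id} ({role}): {vlan_subnet(vlan_id)}")
```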

In the 57 Studios production estate, IPv6 traffic is approximately 25-40 percent of aggregate traffic, with seasonal variation. The variation is driven by the mix of player geographies, the IPv6 adoption rate of upstream consumer ISPs, and the adoption of IPv6 by mobile carriers.

Pro tip

Specify IPv6 from day one. The operational experience is that retroactively adding IPv6 to an IPv4-only deployment is substantially more complex than specifying it from the start. The documented practice is dual-stack from initial provisioning, with the firewall policy mirrored across both address families.

Appendix F: Documented physical layer reference

The physical layer for the network infrastructure covers cabinet, patch panel, cable, and labeling specifications.

| Physical layer component | Specification | Notes |
|---|---|---|
| Server cabinet | 42U or 48U, 800 mm wide, 1200 mm deep | Accommodates vertical cable management |
| Top-of-rack switch position | Top 2U | For cable management |
| Patch panel | Angled 24-port LC duplex | Angle relieves bend-radius stress |
| Patch cable | LC-LC OS2 SMF, 2-3 m runs | Within rack |
| Trunk cable | MPO-24 OS2 SMF | Between racks |
| Cable label | Thermal-transfer printed wrap-around label | Both ends of every cable |
| Cabinet PDU | Per Power and UPS Configuration | Dual-feed |

These physical layer specifications are the baseline; variation is driven by the physical constraints of the site. The documented practice is to record the physical layer at the time of build and to update the record as the build evolves.

Appendix G: Documented sample failover timeline

The sample failover timeline shows the sequence of events during a BGP failover. It is the observed timeline from one quarterly fail-over exercise.

| T+ | Event | Observation |
|---|---|---|
| 0.0s | ISP A link administratively shut down | Edge router 1 loses transit to ISP A |
| 0.0s | Edge router 1 BGP session to ISP A enters Idle state | BGP session state change logged |
| 0.0s | Edge router 1 withdraws ISP A routes from local RIB | Routes withdrawn |
| 0.0s | Edge router 1 propagates withdrawal to edge router 2 via iBGP | iBGP update sent |
| 0.1s | Edge router 2 receives withdrawal | iBGP update received |
| 0.1s | Edge router 2 selects ISP B route as new best path | Best path updated |
| 0.1s | Forwarding table updated on edge router 2 | FIB updated |
| 0.2s | Forwarding table updated on edge router 1 | FIB updated |
| 0.3s | Inbound traffic begins arriving via ISP B exclusively | Per-port telemetry confirms |
| 0.3s | Outbound traffic begins routing via ISP B exclusively | Per-port telemetry confirms |
| 0.5s | Unturned player session continuity verified | Zero session loss |
| 1.0s | Inter-rack replication continuity verified | Zero replication interruption |
| 5.0s | End-to-end failover complete and verified | All documented success criteria met |

The sample timeline shows the observed behavior of the production estate during a quarterly fail-over exercise. The sub-second failover reflects immediate BGP session teardown when the link goes down, the iBGP topology, and the forwarding-table propagation behavior of the edge router platform. The BGP timer configuration bounds detection at the 9-second hold time for failures that do not bring the link down.

Appendix H: Documented training and runbook reference

The network infrastructure is operated by trained operators working from documented runbooks. The training scope covers the topology, the configuration, the monitoring stack, the change management posture, and the incident response procedure.

The documented runbook reference:

  • Topology reference (this article).
  • Switch configuration reference (vendor-specific, documented per deployment).
  • Monitoring stack reference (documented per deployment).
  • Change management procedure (documented per deployment).
  • Incident response procedure (documented per deployment).
  • Quarterly fail-over exercise procedure (documented in this article).
  • Capacity-planning model (documented in this article).
  • IPv6 specification (documented in this article).

The training cadence in the 57 Studios production estate is quarterly for the operations team and annual for the on-call rotation. The cadence reflects evidence that quarterly training maintains operational competence across the full scope, which annual training alone does not.

Best practice

Keep every runbook in a version-controlled repository alongside the configuration. The runbook repository is the source of truth for operational procedure, and operational consistency depends on every operator having the runbook available at the time of need.

Closing

The network infrastructure for production Unturned hosting at 57 Studios is 100 Gigabit Ethernet at every aggregation tier, dual switches at every layer, BGP failover to dual upstream transit providers, VLAN segmentation, cable management, change management, monitoring, and quarterly verification of the redundancy posture. This framework is the documented professional standard. It is what self-hosting on owned enterprise hardware looks like when it is operated to that standard, and it is the baseline that the rest of the self-hosting documentation builds on.

The framework is also the baseline for the capacity-planning model, the IPv6 specification, the physical layer reference, the sample failover timeline, and the training and runbook reference. The overall posture comes from the compound effect of every layer operating to the documented standard, and the operational experience in the 57 Studios production estate is that this compound effect has produced zero session loss across every quarterly fail-over exercise since the topology was commissioned.

The next article in the self-hosting series, Power and UPS Configuration, covers the power infrastructure that the network depends on. The network is only as available as its power, and the power posture is the chained UPS configuration that the next article describes in full. The chained UPS configuration is the power-layer equivalent of the dual-switch, dual-BGP topology described here, and the operational standard at both layers is the same: redundancy at every layer is the documented professional standard, not a luxury.