Why I love long-distance Layer 2

A growing number of providers offer “layer 2” links under a variety of names. You may have heard of E-LAN, E-Line, and TLS (Transparent LAN Service) connections, to name a few. Most implementations I have seen use a simple routed network configuration, but these links can unlock so much more.

In the simplest terms, you can think of a layer 2 link as a VERY long network cable between two or more sites. You are in control of all VLAN tagging and QoS policies on your own equipment.
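
To make "in control of all VLAN tagging and QoS" a bit more concrete, here is a minimal Python sketch of the 4-byte 802.1Q tag your own switches add to each frame before it crosses the link. The function names are made up for illustration; the field layout (TPID 0x8100, then a 3-bit priority, 1-bit DEI, and 12-bit VLAN ID) is the standard one. In practice your switch builds this for you, so treat this purely as a picture of what is on the wire.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks a frame as 802.1Q-tagged


def build_vlan_tag(vlan_id: int, priority: int = 0, dei: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag (hypothetical helper, for illustration)."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be between 1 and 4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority (PCP) must be between 0 and 7")
    if dei not in (0, 1):
        raise ValueError("DEI must be 0 or 1")
    # TCI layout: 3 bits priority | 1 bit DEI | 12 bits VLAN ID
    tci = (priority << 13) | (dei << 12) | vlan_id
    return struct.pack("!HH", TPID_8021Q, tci)


def parse_vlan_tag(tag: bytes) -> tuple:
    """Return (priority, dei, vlan_id) from a 4-byte 802.1Q tag."""
    tpid, tci = struct.unpack("!HH", tag)
    if tpid != TPID_8021Q:
        raise ValueError("not an 802.1Q tag")
    return tci >> 13, (tci >> 12) & 1, tci & 0x0FFF


if __name__ == "__main__":
    # Example: VLAN 100 (say, a server network) marked with priority 5.
    print(build_vlan_tag(100, priority=5).hex())
```

Both the VLAN ID and the priority bits are set on your side of the handoff, which is why adding a VLAN or changing a QoS marking does not need a carrier ticket.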

  • Remove the need to work with the carrier for most infrastructure changes. When the carrier provisions the circuit, you will generally be left with an Ethernet handoff at both sites. In this basic configuration the circuit behaves like a simple switchport with an untagged VLAN, extending whatever VLAN you plug it into. When you need higher-level configurations, such as multiple VLANs traversing the circuit or additional routers or firewalls between the sites, the changes are completely in your control. There’s no need to track down your (or your customer’s) carrier support line. There’s no need to schedule a change. There’s no need to wait for a call-back. You simply make the changes on your own equipment as if the other site were in the room next door. Anyone who has needed to coordinate changes to an MPLS or P2P circuit for simple routing updates should appreciate being able to make configuration changes instantly, potentially with no downtime.
  • Stretch those VLANs to massively speed up Disaster Recovery and make it more flexible. A “stretched” VLAN is the same broadcast domain extended between multiple sites. You may stretch multiple VLANs across that same Ethernet handoff to enable non-routed connectivity to your server network, your hypervisor network, your SAN network, ANY network! This lets critical resources use the same IPs at your recovery site that they use at your production site.
    • A simplified DR plan without a stretched VLAN may look like this:
      • Power down servers at your production site.
      • Synchronize all remaining changes/datastores/VMs.
      • Power on resources at disaster recovery site.
      • Update IPs on all resources to coincide with DR subnets.
      • Restart all resources in necessary order, usually starting with your AD and DNS servers.
      • Update associated DNS records.
      • Verify that replication of key records, such as Active Directory and DNS entries, is successful and timely.
      • Manually verify that client systems and critical applications recognize these changes, possibly flushing DNS caches by hand.
    • Now let’s look at what happens if we implement a stretched VLAN:
      • Power down servers at your production site.
      • Synchronize all remaining changes/datastores/VMs.
      • Power on resources at disaster recovery site.
  • Fail over what you want, when you want. When you are working in a stretched VLAN you can choose the servers or VMs that you want to fail over one at a time. There’s no need to move all 10 file servers, or all 3 domain controllers, or all 50 RDS servers in the same maintenance window. Start with your redundant servers first. Power them off, replicate changes, power them back on. All servers retain their existing IP configuration and network paths. All clients still access those same resources through their same route tables.
  • Remove tedious application changes and client testing. As I’ve stated many times, your resources get to keep their IPs! This is tremendous in terms of time to recovery. Many of the headaches I’ve seen can be avoided this way:
    • That legacy application that has resources defined in it by IP address for some reason.
    • Application licensing schemes that may require an update for a new IP.
    • IT Monitoring systems such as SolarWinds Orion require no changes to accurately report the current state of the environment as resources come back online.
    • SIEM systems will not need to be notified of the changes of IPs or the flow of data.
    • Basically, if I have a continuous ping running to a server, fail over that server, and see the ping come back, I’m pretty much done with the testing of that individual server (assuming that it can handle a reboot without intervention).
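
The continuous-ping check in that last point is easy to script. Below is a rough Python sketch under a few assumptions: the function names are hypothetical, `ping_once` shells out to a Linux-style `ping` (`-c` for count, `-W` for timeout in seconds; other platforms use different flags), and "done" is taken to mean the server was seen reachable, then unreachable, then reachable again at the same IP. It is a starting point, not a full validation of the server.

```python
import subprocess
import time
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 1) -> bool:
    """Send one ICMP echo via the system ping (Linux-style flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def failover_confirmed(samples: Iterable[bool]) -> bool:
    """True once the samples show reachable -> unreachable -> reachable."""
    stage = 0  # 0: waiting for up, 1: waiting for down, 2: waiting for up again
    for up in samples:
        if stage == 0 and up:
            stage = 1
        elif stage == 1 and not up:
            stage = 2
        elif stage == 2 and up:
            return True
    return False


def watch_failover(
    host: str,
    probe: Callable[[str], bool] = ping_once,
    interval_s: float = 1.0,
    max_probes: int = 600,
) -> bool:
    """Probe the host until a full up/down/up cycle is seen or probes run out."""
    samples = []
    for _ in range(max_probes):
        samples.append(probe(host))
        if failover_confirmed(samples):
            return True
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    # 192.0.2.10 is a documentation address; substitute the server's real IP.
    ok = watch_failover("192.0.2.10")
    print("failover confirmed" if ok else "no up/down/up cycle observed")
```

Start it before powering the server down at production. Because the IP never changes in a stretched VLAN, the same target address works before and after the move.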

In a future post I will cover the advantages and pitfalls of doing DR over a layer 2 link in more depth. I’ll point out how to deal with hypervisor outages, circuit outages, and entire site outages.
