Overlay networking - what took so long?

Emulex
By Rahul Shah, Product Marketing Manager, Emulex
Friday, 13 June, 2014



Virtualisation’s success story has been built on delivering on its initial value propositions: optimising hardware utilisation, reducing server sprawl and maximising return on server hardware investments. This was accomplished by abstracting and virtualising a server’s computing resources (CPU and memory) so they could be shared among multiple virtualised application workloads. To date, 40 million virtual machines (VMs) have been deployed - testimony to the technology’s acceptance.

Non-disruptive VM migration technology was introduced in 2003, bringing with it the promise of IT agility - built on the foundation of VM mobility and flexible VM placement. The scope and capability of this technology have since expanded, making it better and faster - an ongoing 10-year development process to fully realise the potential of using virtualisation the way it was intended.

Virtualisation’s networking challenges

However, despite this continuing development of VM migration-related capabilities, networking challenges continue to fester. These include:

  • Onerous networking reconfigurations resulting from VM mobility
  • Limitations to scale-out of virtualisation beyond Layer 2 boundaries
  • Inadequate virtual LAN (VLAN) IDs for scaling secure private networks

As a result, the promise of true IT agility remains only partially fulfilled. For example, users either deal with a new type of server sprawl - virtual server sprawl - or spend up to $1800 per migration to reconfigure multiple network elements after each VM migration event. Anecdotal evidence suggests that users have stayed put, embracing workload virtualisation for hardware cost reductions but not progressing to true agility.

As a corollary, VM migrations and inter-VM communications are typically restricted to host servers in a single rack, or a handful of racks, all belonging to a single Layer 2 (L2) subnet, because VM traffic requires L2 adjacency.

Finally, the 12-bit VLAN ID space of 4096 values (only 4094 of which are usable, with two reserved) stunts the ability to add more secure, isolated user groups in a private cloud infrastructure - or tenants in a public/hybrid cloud.

Overlay networking to the rescue

The arrival and deployment of overlay networking is finally allowing IT managers to unleash the full potential of virtualisation and deliver true IT agility - hitherto limited to compute and, to a lesser degree, storage infrastructure. Unsurprisingly, a VMware CIO survey indicates that extending virtualisation to networking and storage is one of the top IT priorities for 2015.

As an analogy, the 10-year evolution from DOS in 1985 to Windows 95 arguably did more to fulfil its mission of enhancing the user interface than data centre virtualisation did to deliver unrestricted IT agility between 2003 and around 2012.

Overlay networking essentially builds a virtual L2 network on top of a Layer 3 (L3) network - hence the word “overlay”. Traffic from a VM is mapped to this virtual network, its packets are encapsulated in a MAC-in-IP format, and the encapsulated packets are then routed over the existing infrastructure.
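To make the idea concrete, the short Python sketch below illustrates MAC-in-IP encapsulation under a few simplifying assumptions: a bare 20-byte outer IPv4 header with the checksum left at zero, and none of the outer Ethernet, UDP or GRE layers a real overlay would add. The function name and addresses are purely illustrative.

import struct

def encapsulate_mac_in_ip(inner_frame: bytes, outer_src: str, outer_dst: str) -> bytes:
    """Wrap a VM's L2 (Ethernet) frame as the payload of a new outer IPv4 packet.

    Illustrative only: the checksum is zeroed and the outer Ethernet and
    UDP/GRE layers that a real overlay would add are omitted.
    """
    total_length = 20 + len(inner_frame)
    outer_ip_header = struct.pack(
        "!BBHHHBBH4s4s",
        (4 << 4) | 5,          # version 4, 20-byte header (IHL = 5)
        0,                     # DSCP/ECN
        total_length,          # total length = header + payload
        0, 0,                  # identification, flags/fragment offset
        64,                    # TTL
        17,                    # protocol (17 = UDP here; a GRE-based overlay would use 47)
        0,                     # header checksum (omitted in this sketch)
        bytes(int(o) for o in outer_src.split(".")),
        bytes(int(o) for o in outer_dst.split(".")),
    )
    return outer_ip_header + inner_frame   # the VM's frame rides as opaque payload

# The physical (underlay) network routes on the outer header only; the inner
# frame's MAC addresses are never seen by the switches and routers in between.
inner = bytes(64)                          # placeholder 64-byte Ethernet frame
packet = encapsulate_mac_in_ip(inner, "192.0.2.1", "192.0.2.2")
assert len(packet) == 20 + len(inner)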

Two proposals for building overlay networks - essentially de facto ‘standards’, given their widespread industry support - have been submitted to the Internet Engineering Task Force (IETF): Network Virtualisation using Generic Routing Encapsulation (NVGRE), supported by Microsoft (starting with Windows Server 2012), and Virtual eXtensible Local Area Network (VXLAN), supported by VMware (starting with vSphere 5.1). Both are designed to enable the fluid movement of virtual workloads across infrastructure for large-scale and cloud-scale VM deployments.

  • NVGRE encapsulates Ethernet L2 frames in a GRE packet. GRE is an existing protocol, first proposed in 1994.
  • VXLAN encapsulates Ethernet L2 frames in a UDP packet. One benefit of the VXLAN standard is that it also defines some control plane functionality.

Both proposals encapsulate Ethernet L2 frames into an IP packet and insert a new 24-bit virtual network identifier (VNI). This identifier space supports more than 16 million L2 segments - a significant increase in scalability over the 4094-VLAN-ID limit mentioned earlier.
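As an illustration of where each proposal carries that 24-bit identifier, the sketch below builds both headers in simplified form - the GRE header used by NVGRE, whose key field holds a 24-bit Virtual Subnet ID (VSID) plus an 8-bit FlowID, and the 8-byte VXLAN header, whose VNI sits between reserved fields - and then checks the identifier arithmetic against the VLAN limit. Reserved and optional fields are zeroed and the outer IP/UDP layers are omitted, so this is indicative rather than wire-complete.

import struct

def nvgre_header(vsid: int, flow_id: int = 0) -> bytes:
    """Simplified NVGRE encapsulation header (GRE with the key-present bit set).

    The 4-byte GRE key carries the 24-bit Virtual Subnet ID plus an 8-bit FlowID;
    protocol type 0x6558 indicates that an L2 Ethernet frame follows.
    """
    flags_and_version = 0x2000              # only the K (key present) bit set
    return struct.pack("!HHI", flags_and_version, 0x6558, (vsid << 8) | flow_id)

def vxlan_header(vni: int) -> bytes:
    """Simplified 8-byte VXLAN header, carried inside a UDP datagram.

    One flags byte (0x08 = VNI present), three reserved bytes, the 24-bit VNI
    and a final reserved byte.
    """
    return bytes([0x08, 0, 0, 0]) + vni.to_bytes(3, "big") + bytes([0])

inner_frame = bytes(64)                     # placeholder inner Ethernet frame
segment_id = 5001                           # one of ~16 million possible IDs
nvgre_payload = nvgre_header(segment_id) + inner_frame   # carried directly in IP
vxlan_payload = vxlan_header(segment_id) + inner_frame   # carried in UDP

assert len(nvgre_header(segment_id)) == 8 and len(vxlan_header(segment_id)) == 8

# The scalability claim in one line of arithmetic: 24 bits of identifier space
# versus the 4094 usable IDs of the 12-bit VLAN space.
assert 2**24 == 16_777_216 and 2**12 - 2 == 4094

Whichever encapsulation is used, both formats devote 24 bits to the segment identifier, which is where the jump from roughly four thousand to more than 16 million isolated networks comes from.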

The question one asks is: what took so long to address a core issue around VM mobility?

There is nothing inherently advanced about overlay networking technology that couldn’t have been delivered earlier. Its two key building blocks - tunnelling and MAC-in-IP encapsulation - could have been brought to market years sooner and alleviated the problems discussed above.

For example, Cisco’s overlay transport virtualisation (OTV) achieved the similar goal of extending Layer 2 domains over Layer 3 - albeit for wide area networks (WANs) - in Cisco switches as far back as 2009, and a comparable industry standardisation effort around virtual private LAN service (VPLS) was proposed to the IETF back in 2006!

That said, as the old saying goes, the deployment of overlay networking from 2012 is a case of “better late than never”.

VM mobility - all’s well that ends well?

Does the arrival of overlay networking technology mean that all the VM mobility-related challenges are now resolved and that we can draw the curtain on the issue of IT agility? Certainly, the two popular overlay networking formats are gaining traction inside densely virtualised data centres, enabling the full value of virtualisation technology to be extracted while also enabling large scale-out networks.

However, solving the problem of VM mobility and networking reconfiguration has spawned another. Implementing overlay networking in software imposes a server ‘CPU tax’, taking away the very resource that enables workload consolidation through virtualisation!

Careful consideration of the server’s network adapter (NIC) choice mitigates this problem.

Recommendation: Make server networking I/O selection a strategic decision. Most leading NICs include a suite of TCP/IP offloads that minimise server CPU utilisation, enabling higher virtualisation density and maximising returns on server investments. However, overlay encapsulation ‘breaks’ these TCP/IP offloads on adapters that lack explicitly engineered overlay networking offload support. Such adapters can increase CPU utilisation by as much as 50%, drastically impacting server efficiency and VM scalability.

Selecting a NIC platform that explicitly supports overlay networking offloads future-proofs your data centre as you expand your virtualisation efforts and embark on the journey towards a private or hybrid cloud infrastructure.

You’ve waited 10 years to make virtualisation a truly effective IT agility tool - don’t wait longer by making a bad NIC selection.

