Quantifying SD-WAN’s reliability over multiple circuits

18 December 2020

Compared to traditional wide area networks (WANs), SD-WAN offers many benefits, such as better performance, enhanced visibility and control, and increased agility. Most notably, SD-WAN is known for its improved reliability, which it achieves by actively using multiple transport networks at the same time and dynamically routing traffic around network failures when they occur. Most of us have an intuitive understanding of why this improves the reliability of the network, but the mathematics that proves this can be surprising.

Laying out the calculations

When considering the reliability of a WAN, the critical measure is the network availability at each location. You can think of availability in two different ways: historically and predictively.

The historical availability of a network service over a set period can be calculated by dividing the amount of time the service was “up” by the total amount of time measured. In simple terms, the predicted availability of a network service is the probability that the service will be available when it’s needed. We can estimate this by taking the average uptime between outages and dividing it by that figure plus the average time taken to restore service after an outage. Putting it mathematically: availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair/restore.

The closer the resulting number is to 1.0 (or 100% when expressed as a percentage), the more reliable the expected service. For example, the MTBF of a typical broadband circuit might be around 600 hours (25 days). The MTTR for broadband can be lengthy, approximately 12 hours. Dividing the MTBF by the MTBF plus the MTTR (600 / 612) yields an expected availability of about 98%. On average, this equates to downtime of 14.1 hours in a 30-day month.

For many businesses, broadband’s potential downtime is unacceptable, prompting them to procure more reliable (and more expensive) services such as MPLS or DIA over carrier-grade Ethernet. The MTBF for these services might be 2,400 hours (100 days), and the MTTR might be 4 hours, yielding an expected availability of about 99.8%. On average, this equates to downtime of 1.2 hours in a month, a significant improvement over broadband.
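The two single-circuit examples above can be reproduced with a few lines of Python. This is a minimal sketch rather than a definitive tool: the function names are illustrative, and the 720-hour (30-day) month is an assumption chosen because it matches the downtime figures quoted in the text.

```python
# Assumption: a 30-day month (720 hours), which matches the figures in the text.
HOURS_PER_MONTH = 30 * 24


def predicted_availability(mtbf_hours, mttr_hours):
    """Expected availability: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def monthly_downtime_hours(availability, hours_per_month=HOURS_PER_MONTH):
    """Average hours per month that the service is expected to be down."""
    return (1.0 - availability) * hours_per_month


# (MTBF, MTTR) in hours, taken from the examples in the text.
circuits = {"broadband": (600, 12), "MPLS/DIA": (2400, 4)}

for name, (mtbf, mttr) in circuits.items():
    a = predicted_availability(mtbf, mttr)
    print(f"{name}: availability {a:.2%}, "
          f"downtime {monthly_downtime_hours(a):.1f} h/month")
```

Run as written, this prints roughly 98.04% and 14.1 hours for broadband, and 99.83% and 1.2 hours for MPLS/DIA.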

But with SD-WAN, the expected availability can be improved further, due to its ability to utilise multiple network services simultaneously. If those services are diverse and independent of one another, a location is entirely out of service only when all the underlying services are down at the same time.

To calculate the expected availability of a location served by two independent networks, you first multiply together their individual probabilities of not being available (i.e., the figures by which each falls short of 1.0). This gives the probability that both services are unavailable at the same time. Subtracting that result from 1.0 produces the probability that at least one service is available: availability = 1 - (1 - A1) × (1 - A2), where A1 and A2 are the availabilities of the two services.
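The same calculation extends naturally to any number of independent services. The sketch below is one way to express it; the function name is hypothetical, the 0.9 inputs are purely illustrative values rather than figures from this article, and the result only holds under the independence assumption described above.

```python
from math import prod  # available in Python 3.8+


def combined_availability(availabilities):
    """Probability that at least one of several independent services is up.

    Multiplies together each service's probability of being down, then
    subtracts that product from 1.0.
    """
    return 1.0 - prod(1.0 - a for a in availabilities)


# Illustrative values only: two services that are each up 90% of the time.
print(f"{combined_availability([0.9, 0.9]):.2%}")
```

Two services that are each down 10% of the time are both down only 1% of the time, so the combination is available about 99% of the time. If the services share a failure mode (the same duct, exchange or power feed, for example), the real figure will be lower than this formula suggests.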

Now consider a site served by two independent and diverse broadband services (e.g., one cable and one 4G cellular, a combination frequently seen in the retail and hospitality industries). Using the broadband availability from the earlier example, the expected availability of the combination would be 99.96%. On average, this works out to only 16.6 minutes of downtime per month, substantially better than the 1.2 hours expected from a single carrier-grade MPLS or DIA service.

Similarly, consider a site with one MPLS or DIA over telco-grade Ethernet and one cable broadband, a configuration commonly seen in hybrid WANs. Using the corresponding availability values from the earlier calculations, the expected availability of this combination would be 99.9967%. On average, this works out as only 1.4 minutes per month of downtime, a figure small enough to satisfy all but the most demanding enterprises. These calculations demonstrate how using multiple independent network services at once, as SD-WAN does, improves the reliability of the network.
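Both dual-circuit figures can be checked with a short script. As before, this is a sketch under stated assumptions: the helper names are illustrative, the MTBF and MTTR inputs are the example values used earlier in this article, a month is taken to be 30 days, and the two services are assumed to fail independently.

```python
# Assumption: a 30-day month, i.e. 43,200 minutes.
MINUTES_PER_MONTH = 30 * 24 * 60


def availability(mtbf_hours, mttr_hours):
    """Expected availability of one service: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def at_least_one_up(a1, a2):
    """Availability of a site served by two independent services."""
    return 1.0 - (1.0 - a1) * (1.0 - a2)


broadband = availability(600, 12)  # example broadband figures from the text
carrier = availability(2400, 4)    # example MPLS/DIA figures from the text

combinations = [
    ("broadband + broadband", broadband, broadband),
    ("MPLS/DIA + broadband", carrier, broadband),
]

for label, a1, a2 in combinations:
    site = at_least_one_up(a1, a2)
    downtime_minutes = (1.0 - site) * MINUTES_PER_MONTH
    print(f"{label}: {site:.4%} available, "
          f"{downtime_minutes:.1f} min/month down")
```

The output is approximately 99.9616% and 16.6 minutes for the dual-broadband site, and 99.9967% and 1.4 minutes for the hybrid site, in line with the figures above.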

Optimising costs with SD-WAN

SD-WAN’s ability to incorporate low-cost broadband services as part of its underlay without compromising reliability can help businesses reduce the overall cost per Mb of their WANs. Many that have historically relied on higher cost telco-grade services like DIA and MPLS are replacing or augmenting those services with lower-cost broadband.

Businesses with strict uptime requirements are replacing expensive secondary telco-grade networks with economical, diverse broadband at a fraction of the cost per Mb, with negligible impact on site availability. These cost savings, along with the reduction of costly downtime, are critical drivers in the business case to adopt SD-WAN.

Today, with the rise of cloud computing, digital business and an increasingly dispersed workforce, enterprise networks have had to evolve. Businesses need a reliable network to ensure that their employees, sites, customers and partners around the world can stay connected. As a result, many are turning toward SD-WAN to make this possible. SD-WAN not only ensures the performance of critical applications but also helps to simplify the management of these increasingly complex enterprise networks.

By Richard Vidil, VP sales engineering at GTT