Preparing for a second coronavirus wave

21 September 2020

Casey Zandbergen, head of global strategy, ITRS Group

Covid-19 has placed massive pressure on IT infrastructure, with managers having to respond quickly to increased volumes and changing consumption patterns, while simultaneously being weighed down by large and complex infrastructure. While the sector hasn’t fallen to its knees, there has been a notable increase in the number of outages. With the Covid-19 situation constantly evolving, network providers need to be prepared for a second wave or face customer losses.

As the world shifted to working from home, IT infrastructure everywhere was placed under enormous stress. The obvious problem was volume; home broadband networks had never seen this amount of traffic before, as entire households were home, utilising bandwidth for both work and entertainment. Another problem was unpredictable consumption patterns. Before Covid-19, providers could predict what services were going to be used and when, but with blurred lines now existing between work and life, the patterns became less predictable.

This shift has been difficult for IT providers to handle, with network outages increasing by 42% between mid-February and mid-April. Why have providers struggled? IT networks are particularly vulnerable, having become increasingly complex over the past couple of decades. At the same time, providers cannot scale quickly enough to match demand, because their capacity is finite and cannot be increased at the drop of a hat.

There’s no doubt that providers face serious challenges, but they must be proactive to minimise the number of outages.

The answer to this problem is operational resilience, which allows providers to both minimise outages and reduce the severity of an outage if one occurs. Through achieving operational resilience, IT network providers can minimise downtime, boost customer satisfaction, maintain reputation and minimise financial loss.

For operational risk to be identified and reduced, network and IT managers must ensure that the software solutions they are using are sophisticated and operate in real time. There are three essential pillars at the heart of operational resilience for IT managers to consider: synthetic monitoring, performance monitoring and capacity planning.

Synthetic monitoring

The first essential pillar to get right is synthetic monitoring. A trap that many providers fall into is not being able to see their systems from the users’ perspective. It is therefore imperative to track the external health of applications and infrastructure in real time, so that providers can spot and fix problems right away rather than being alerted to them through angry rants on Twitter.
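
As a rough illustration of the idea, the sketch below runs a single synthetic check in Python: it requests a page the way an outside user would, times the response and flags the result if the request fails or is too slow. It is a minimal sketch rather than a description of any particular monitoring product. The names probe, http_fetch and ProbeResult, the example URL and the two-second latency limit are all assumptions made for illustration.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProbeResult:
    url: str
    ok: bool
    latency_s: float
    detail: str


def http_fetch(url: str, timeout_s: float) -> int:
    """Request the URL as an outside user would and return the HTTP status."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def probe(
    url: str,
    fetch: Callable[[str, float], int] = http_fetch,
    timeout_s: float = 5.0,
    max_latency_s: float = 2.0,  # assumed acceptable response time
) -> ProbeResult:
    """Run one synthetic check and classify the outcome."""
    start = time.monotonic()
    try:
        status = fetch(url, timeout_s)
    except Exception as exc:  # DNS failure, timeout, refused connection, HTTP error
        return ProbeResult(url, False, time.monotonic() - start, f"error: {exc}")
    latency = time.monotonic() - start
    if status != 200:
        return ProbeResult(url, False, latency, f"unexpected status {status}")
    if latency > max_latency_s:
        return ProbeResult(url, False, latency, "too slow")
    return ProbeResult(url, True, latency, "ok")


if __name__ == "__main__":
    # Hypothetical endpoint; substitute a page real users actually load.
    print(probe("https://example.com/"))
```

In practice a check like this would be run on a schedule from several locations outside the provider's own network, so that a failure seen by users is also seen by the probe.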

Performance monitoring

Once it’s been established that the end user is able to make the initial connection to data centres, providers must ensure that they are able to track the internal health of their entire IT estate. With consumption patterns having changed due to Covid-19, it’s important for providers to be able to monitor their entire system and collect data across the whole estate, which will grant them complete end-to-end visibility into any performance issues.

A trap that many IT managers fall into is utilising monitoring tools that give them a glimpse of their IT estate every few minutes, which they then average out over a period of time. Averaging can mask short-lived problems, which increases the potential for outages; managers need to be using tools that provide insight into the status of their IT estate at any single point in time. This makes it easier to anticipate problems and act quickly when issues do occur, stopping them from impacting the business and their clients.
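
The difference between an averaged view and a point-in-time view can be shown with a few lines of Python. The figures below are invented for illustration: one minute of per-second CPU readings for a single hypothetical host, with a five-second spike to 98%, and an assumed alert threshold of 90%.

```python
from statistics import mean

# Hypothetical per-second CPU utilisation (%) for one host over one minute.
samples = [35.0] * 60
samples[40:45] = [98.0] * 5  # a five-second saturation spike

THRESHOLD = 90.0  # assumed alert level


def averaged_view(readings):
    """What a tool reports if it only keeps the average over the window."""
    return mean(readings)


def polled_view(readings, every):
    """What a tool sees if it only looks once every `every` seconds."""
    return readings[::every]


def breaches(readings, threshold):
    """Point-in-time view: every (second, value) at or above the threshold."""
    return [(t, v) for t, v in enumerate(readings) if v >= threshold]


print(averaged_view(samples))        # 40.25, well below the threshold
print(polled_view(samples, 30))      # [35.0, 35.0], the spike is never seen
print(breaches(samples, THRESHOLD))  # five readings at 98.0, seconds 40 to 44
```

Both the averaged figure and the 30-second poll suggest a healthy host, while the per-second view shows five seconds at saturation.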

Capacity planning

Once IT managers have a firm grip on the present, it’s important that they plan for the future. One thing that the virus has taught us is that it’s important to be prepared for anything. All networks have a limit to how much traffic they can cope with at one time, yet many firms don’t know what that limit is. Capacity planning grants IT managers three key benefits. First, the ability to report what is occurring in real time on the IT estate and calculate present headroom. Second, the ability to identify potential pinch points existing within the estate. Third, the ability to predict future outcomes on an IT estate if the present configuration were to continue as it is.

The best capacity planning solutions allow for the modelling and stress testing of a variety of worst-case scenarios to help firms better predict what their systems can and cannot withstand. For example, a firm whose IT estate has experienced 4x usual demand can model what 6x or 8x normal volumes might look like.
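
A very simplified version of that kind of modelling is sketched below in Python. It assumes that load on every component grows in direct proportion to demand, which real estates rarely do, and the component names, capacities and baseline loads are invented for illustration. Headroom here means the share of a component's capacity left unused at a given demand multiplier.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    capacity: float       # maximum sustainable load, e.g. requests per second
    baseline_load: float  # load under normal (1x) demand


def headroom(component, multiplier=1.0):
    """Fraction of capacity still free at `multiplier` times normal demand.

    Assumes load scales linearly with demand. A negative value means the
    component would be overloaded.
    """
    return 1.0 - (component.baseline_load * multiplier) / component.capacity


def pinch_points(estate, multiplier):
    """Names of components that would run out of capacity in this scenario."""
    return [c.name for c in estate if headroom(c, multiplier) < 0]


# Hypothetical estate; all figures are made up.
estate = [
    Component("edge-router", capacity=1000, baseline_load=100),
    Component("auth-service", capacity=500, baseline_load=70),
    Component("database", capacity=800, baseline_load=150),
]

for scenario in (4, 6, 8):
    print(f"{scenario}x demand: pinch points = {pinch_points(estate, scenario)}")
```

With these made-up figures the estate copes with 4x demand, the database becomes the pinch point at 6x, and both the database and the authentication service are overloaded at 8x.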

IT providers face an extremely difficult situation, with changing customer demands on the one hand and an infrastructure that is difficult to change quickly on the other. Despite there being no quick fix, there are actions that they can take to regain a sense of control. At a time of fragile customer loyalty, providers who neglect operational resilience will be unprepared for a second wave and could face customer losses.
