How to avoid the top five IT infrastructure monitoring mistakes

11 December 2019

Mark Banfield

Mark Banfield, chief revenue officer at LogicMonitor

Most businesses know that they need to monitor their IT infrastructure to ensure uptime and deliver a seamless customer experience.

Unfortunately, five common mistakes made by IT teams worldwide directly contribute to costly outages and brownouts. The top five IT monitoring mistakes made by enterprises are:

1. Relying on individuals and manual processes rather than comprehensive monitoring software. Humans and manual processes can be error-prone and may overlook important tasks or information.

2. Resolving an issue once but then failing to put early warning processes in place to stop it from recurring.

3. Not addressing alert overload. Frequent alert storms cause administrators to miss critical alerts amid the noise.

4. Using disparate monitoring systems for servers, software and storage. Even if each system is highly functional, having multiple systems contributes to a longer mean time to resolution (MTTR).

5. Not monitoring the monitoring system itself, which can also fail.

To avoid these common mistakes, businesses should implement a comprehensive infrastructure monitoring solution that covers all IT systems and limits human configuration. Not only does automation save time, it makes the monitoring process – and therefore the IT infrastructure monitored – much more reliable.

What should companies look for in a monitoring solution?

An intelligent monitoring solution should examine all systems continually for modifications and should automatically add new volumes, interfaces, load balancer VIPs and database instances. It should also scan subnets and instantly add new machines or instances to the monitoring process so that nothing is missed.
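As a rough illustration of the discovery step, the Python sketch below compares the addresses that answered a subnet scan with the addresses already being monitored and returns the ones that still need to be added. The function name and the example addresses are hypothetical, and the scan itself is assumed to have happened elsewhere; a real product would do far more than this.

```python
import ipaddress


def new_hosts(subnet, responding, monitored):
    """Return addresses inside `subnet` that responded but are not yet monitored.

    `responding` and `monitored` are iterables of IP address strings.
    How `responding` was gathered (ping sweep, ARP table, cloud API) is
    outside this sketch.
    """
    net = ipaddress.ip_network(subnet)
    found = {ipaddress.ip_address(a) for a in responding}
    known = {ipaddress.ip_address(a) for a in monitored}
    # Keep only unmonitored addresses that belong to the scanned subnet,
    # sorted numerically rather than as text.
    return [str(a) for a in sorted(a for a in found - known if a in net)]


if __name__ == "__main__":
    print(new_hosts(
        "10.0.0.0/24",
        responding=["10.0.0.10", "10.0.0.2", "192.168.1.5"],
        monitored=["10.0.0.2"],
    ))
```

Running a comparison like this on a schedule is one way to make sure a newly provisioned machine does not sit unmonitored until someone remembers to add it by hand.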

Businesses should also shop around for a unified monitoring system that provides detailed information to help IT teams examine the root cause of a service issue. This allows early warning thresholds to be created to prevent problems from recurring. Advance warnings create a window of time in which to address and resolve an issue before it leads to a full-blown IT outage.
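To make the idea of a warning window concrete, here is a minimal Python sketch that estimates how many hours remain before a steadily growing metric, such as disk usage, reaches a limit. It assumes simple linear growth between the oldest and newest samples; the function name and the figures are illustrative only and do not describe how any particular product calculates its forecasts.

```python
def hours_until_limit(samples, limit):
    """Estimate hours until a metric reaches `limit`.

    `samples` is a list of (hours, value) pairs, oldest first.
    Returns None when there is too little data or the metric is not growing.
    """
    if len(samples) < 2:
        return None
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    if t1 <= t0:
        return None
    rate = (v1 - v0) / (t1 - t0)  # units of value per hour
    if rate <= 0:
        return None  # flat or shrinking: no projected breach
    return max(0.0, (limit - v1) / rate)


if __name__ == "__main__":
    # Disk usage rose from 70% to 80% over ten hours; the limit is 95%.
    print(hours_until_limit([(0, 70.0), (10, 80.0)], limit=95.0))
```

An early warning threshold can then be expressed as a time budget, for example alerting whenever the estimate drops below the time the team needs to add capacity.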

IT teams should also consider enabling intelligent alert filtering, classification and escalation to help avoid alert overload. It is important to not only distinguish clearly between warnings and critical alert levels, but also to route the right alerts to the right people. Putting procedures into place that ensure all alerts are acknowledged, resolved and cleared will help IT teams avoid the top five monitoring mistakes.
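The following Python sketch shows one simple way filtering, classification and routing could fit together: readings below the warning threshold are dropped, duplicates are suppressed, and each remaining alert is sent to a destination based on its severity. The thresholds, metric names and destination names are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    resource: str
    metric: str
    value: float


# Hypothetical (warning, critical) thresholds per metric.
THRESHOLDS = {"cpu_percent": (80.0, 95.0), "disk_percent": (85.0, 95.0)}
# Hypothetical destinations per severity.
ROUTES = {"warning": "ops-queue", "critical": "on-call-pager"}


def classify(alert):
    """Return 'critical', 'warning', or None if the reading needs no alert."""
    limits = THRESHOLDS.get(alert.metric)
    if limits is None:
        return None
    warn, crit = limits
    if alert.value >= crit:
        return "critical"
    if alert.value >= warn:
        return "warning"
    return None


def route(alerts):
    """Group alerts by destination, dropping noise and duplicates."""
    routed = {}
    seen = set()
    for alert in alerts:
        severity = classify(alert)
        key = (alert.resource, alert.metric, severity)
        if severity is None or key in seen:
            continue  # below threshold, or already raised at this severity
        seen.add(key)
        routed.setdefault(ROUTES[severity], []).append(alert)
    return routed


if __name__ == "__main__":
    readings = [
        Alert("db1", "cpu_percent", 97.0),
        Alert("db1", "cpu_percent", 98.0),   # duplicate critical, suppressed
        Alert("web1", "disk_percent", 88.0),
        Alert("web1", "cpu_percent", 10.0),  # healthy, filtered out
    ]
    for destination, items in route(readings).items():
        print(destination, [a.resource for a in items])
```

Four readings become two notifications here, one for the pager and one for the queue, which is the kind of reduction that keeps a critical alert from being lost in an alert storm.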

Finally, although IT infrastructure monitoring can help prevent a huge range of issues, even the best monitoring solution can run into problems. Businesses can minimise their risk of outages by configuring a health check of their monitoring system from a location outside its reach, or by selecting a monitoring solution with its own checks that’s hosted in a separate location. When customer satisfaction is on the line, it never hurts to put measures in place to minimise risk.
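A health check of this kind can be very small. The Python sketch below, meant to run on a machine outside the monitoring system's own environment, requests a status page and treats anything other than an HTTP 200 response as a failure. The URL is a placeholder, and whether a given monitoring product exposes such an endpoint is an assumption to verify against its documentation.

```python
import urllib.error
import urllib.request


def monitoring_is_healthy(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return True only if the monitoring system answers with HTTP 200.

    `opener` can be swapped out in tests; by default it is
    urllib.request.urlopen from the standard library.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, DNS failure, timeout, HTTP error or bad URL.
        return False


if __name__ == "__main__":
    # Placeholder address: replace with the real status endpoint.
    url = "http://monitoring.example.invalid/health"
    if not monitoring_is_healthy(url):
        print("Monitoring system unreachable: notify the on-call engineer")
```

The important design choice is where the script runs: if it shares a network, power supply or host with the monitoring system, the same fault can take out both.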
