Joining the dots

17 July 2018

When asked exactly what should be monitored on the network, many specialist vendors say the answer is simple: everything. For example, ExtraHop claims its platform uses real-time analytics to auto discover and classify every asset in the enterprise, map all connections and dependencies, and monitor traffic flow at up to 100Gbps.

Many network managers have a lot more than usual on their plates at present. 

As well as trying to lasso networks that seem to be carrying a non-stop supply of traffic fed by an ever-growing number of devices (both authorised and unauthorised), there’s the never-ending quest to keep on top of security issues and to facilitate ongoing digital transformations and other projects, all while dealing with day-to-day problems and trying to please the taskmasters.

When asked to identify the current pain points when it comes to monitoring LANs, many of the companies that specialise in network performance analytics that we spoke to agreed that the move to cloud-based services is certainly making things trickier.

For instance, Ian Hameroff, director of product marketing at ExtraHop, says the complexity and surface area of today’s enterprise networks are growing at a rapid rate, and it’s not only what’s behind the firewall that’s critical: “It’s also the fact that every organisation is now a hybrid enterprise with a healthy mix of on-premises, edge, and cloud investments. 

“NetOps professionals often find themselves metaphorically stuck watching all of their haystacks grow, while they struggle to know where to begin when the business expects them to quickly find the needle if there’s a security, user experience, performance, or compliance issue.”

Steve Brown, director of solutions marketing at VIAVI Solutions, says the decision to move networks and applications to hybrid clouds or maintain on-premise services is typically made outside of the network team. He points out that this creates visibility, control and resolution challenges. “The movement to the cloud means rethinking how you view, assess and troubleshoot performance and user experience. These same visibility challenges can be seen as the network pushes to a more software-defined model, and the question is whether visibility and performance management is built-in during deployment or bolted on after it.” 

Jay Botelho, director of products at Savvius (which has now been acquired by LiveAction – see News, p4) agrees that the accelerating move towards private/public cloud application deployment is leading to significantly decreased network and application visibility. 

He says: “Most network performance management and diagnostics (NPMD) solutions on the market today were designed and optimised for ‘traditional’ on-premise, data-centre-style deployments. These provide a good view of data going to and from the cloud (private or public) but are completely blind to traffic within the cloud, hiding critical data needed to troubleshoot problems like poor application performance.”

For Edmund Cartwright, sales and marketing director at Highlight, Wi-Fi is probably the biggest pain point when it comes to LAN monitoring. Unlike switched and wired networks, which tend to be predictable and stable, he says Wi-Fi is “highly susceptible” to interference and signal strength problems, despite increasingly being the path of choice for people working on-site. “A wide range of vendors with no clear market leader, and rapidly changing management interfaces as people scramble to develop their offerings, makes monitoring Wi-Fi a real challenge.”

Enterprise cloud networking specialist Aerohive also regards the oft-reported complaint that the Wi-Fi simply isn’t working as a typical problem. “Any networking professional out there will instantly understand the frustration behind this complaint because it provides little context,” says the company’s product marketing director Mathew Edwards. “In actuality, it could be a number of issues – incorrect credentials, DHCP server out of leases, unresponsive DNS server, account lockout in AD, WAN-related issues, etc. It could also be something Wi-Fi related – poor signal strength, low S/N ratio, incompatible encryption, and much more.”

Edwards continues by saying that once enough information has been gathered, the network manager begins the troubleshooting process. “You might check the DHCP server, AD account, Wi-Fi conditions in that area… the list goes on. All that time and effort is spent attempting to resolve a simple issue for a single client.”
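To see why even a “simple” complaint is costly, consider a minimal sketch of that triage loop in Python. The checks below are placeholders of our own, each standing in for what would in practice be a separate, time-consuming investigation.

```python
# Toy sketch of the manual triage loop described above: run a series of checks
# for one client and report what fails. Each check is a stand-in for a separate
# investigation (querying the DHCP server, checking AD, a site survey, etc.).

def check_dhcp_leases():      return True    # e.g. is the DHCP pool exhausted?
def check_ad_account():       return True    # e.g. is the account locked out?
def check_dns_response():     return True    # e.g. does a lookup complete promptly?
def check_wifi_conditions():  return False   # e.g. are RSSI/SNR acceptable at the user's desk?

CHECKS = [
    ("DHCP leases available", check_dhcp_leases),
    ("AD account not locked", check_ad_account),
    ("DNS responding", check_dns_response),
    ("Wi-Fi conditions acceptable", check_wifi_conditions),
]

for name, check in CHECKS:
    print(f"{name}: {'OK' if check() else 'FAIL'}")
```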

And as Edwards goes on to highlight, when you’re dealing with an organisation with hundreds or thousands of user complaints, each with varying degrees of complexity, the IT team suddenly ends up with a major crisis to resolve.

That’s assuming that they can get all the data they need to resolve the issue in the first place. Botelho believes another pain point in monitoring LANs lies in the depth of information available in today’s monitoring systems. He says NPMD solutions are increasingly relying on flow-based data to provide networking monitoring information. Although this is very useful and in most cases plentiful, he says it often lacks sufficient detail to truly determine the root cause of a problem.

Clearly then, and in the words of Paessler’s UK&I head Martin Hodgson, network performance has never been more crucial, whether it’s at the core, edge or WAN. He adds: “It’s important to consider and prepare for the influx of new devices and the extension of responsibility to items outside the network manager’s usual scope, such as CCTV, IoT and BMS, for example.”

Paracetamol for the pain

According to Mark Boggia, partner for learning and development at Nexthink, having an end-to-end monitoring solution that centres on the end user experience is critical. He says this essentially represents a customer-first approach and, importantly, is not presumptive.

“When you’re a hammer, everything looks like a nail. So, when you’re just using a network monitoring solution, you tend to assume that you’re having network problems, when in practice it could be something as mundane as running out of disk space, an application crashing, or an altogether different and relatively minor issue.”

Boggia believes network monitoring needs to be positioned as a component of a service that a user is consuming. “At a metric level, we typically see latency and slow response times as good indicators, which typically could also be due to related issues such as how shared storage has been configured in the environment and not the network itself. For this reason, it’s important to start with a holistic approach that puts the end user first.”

Brown also highlights the importance of prioritising people as well as processes. He says the questions that need to be asked here include whether there is active communication and decision-making between all the IT stakeholders when new initiatives are rolled out, and whether the server teams are working with the application and network teams when migrating to virtual or cloud environments to ensure the service functions as expected. “Being transparent and having active discussions during the roll out of initiatives before implementing configuration and network changes can reduce unexpected performance problems.”

Next, Brown says there should be performance solutions in place that can be used by any team from the network to support positive end user experiences in hybrid environments. “This is where visibility, ease of use, and accurate analysis are key to providing teams with the right information to prioritise and solve performance issues.”

Of course, the subject of data security is never far away from all things networking, and especially when it comes to monitoring. While new data rules and legislation may be a hassle for IT teams to implement, they also present a good opportunity for them to re-assess all aspects of the enterprise network infrastructure.

For instance, Paessler’s Hodgson says: “As many vendors have used GDPR to focus on single elements of threat and exposure, now is the time that you really need to know your entire stack – from the physical aspect of the network upwards. And, of course, where additional levels of protection have been drafted in, these add an additional layer of complexity, something that the IT team need to monitor and develop.”

Dan Payerle Barrera, global product manager for data cable testers at IDEAL Networks, points out that network protection technologies like IEEE 802.1x, which requires any device to log in to the network at Layer 2, have unfortunately proven difficult to implement and manage. As a result, he reckons many organisations are abandoning these solutions. “This means that network managers need to be continuously monitoring their networks for unauthorised network devices. Many tools exist to accomplish such tasks, ranging from PC/mobile device applications to handheld testers and dedicated network security ‘boxes’.”

Barrera says handheld testers provide a simple way to connect to a network, either using a wired or wireless link, and scan all the devices, generating a list that can be compared to previous scans or a list of known MAC addresses. 
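As a purely illustrative sketch (the data structures are assumptions, not any particular tester’s export format), comparing a fresh scan against an approved MAC list can be as simple as a set lookup:

```python
# Hypothetical sketch: compare a fresh device scan against a list of approved
# MAC addresses, flagging anything unknown. The scan_results list stands in for
# whatever a handheld tester or scanning tool exports.

APPROVED_MACS = {
    "aa:bb:cc:00:11:22",   # office printer
    "aa:bb:cc:00:11:23",   # meeting-room AP
}

scan_results = [
    {"mac": "aa:bb:cc:00:11:22", "ip": "192.168.1.20"},
    {"mac": "de:ad:be:ef:00:01", "ip": "192.168.1.77"},  # not on the approved list
]

def find_unknown_devices(scan, approved):
    """Return every scanned device whose MAC is not in the approved set."""
    return [d for d in scan if d["mac"].lower() not in approved]

for device in find_unknown_devices(scan_results, APPROVED_MACS):
    print(f"Unauthorised device? MAC {device['mac']} at {device['ip']}")
```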

But he goes on to warn that as most networks are configured to have some level of segregation between the wireless and wired portions, a network scanner that connects via wireless only may not detect intruders on the wired LAN. 

Barrera also warns that MAC level security is not 100 per cent secure. “Software tools allow network intruders to spoof another MAC address. Spoofing alters the data frames sent by an attacker’s computer, changing its MAC address to match that of an approved device that is already on the network.

“For increased protection, network managers can go so far as to track the association between MAC addresses (permanent hardware ID) and IP addresses (temporary software address). They can then be alerted when a device on the network is using an unauthorised combination of MAC and IP addresses.”
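A hedged sketch of that MAC/IP association check might look like the following, where the expected bindings (for example, from DHCP reservations) and the observed pairs are illustrative placeholders:

```python
# Hypothetical sketch of the MAC/IP association check described above: keep a
# table of expected MAC-to-IP bindings and alert when an observed pairing does
# not match, which may indicate a spoofed address.

EXPECTED_BINDINGS = {
    "aa:bb:cc:00:11:22": "192.168.1.20",
    "aa:bb:cc:00:11:23": "192.168.1.21",
}

observed_pairs = [
    ("aa:bb:cc:00:11:22", "192.168.1.20"),   # matches the expected binding
    ("aa:bb:cc:00:11:23", "192.168.1.99"),   # same MAC, unexpected IP -> possible spoofing
]

def check_bindings(pairs, expected):
    """Return alert messages for unknown MACs or unexpected MAC/IP combinations."""
    alerts = []
    for mac, ip in pairs:
        known_ip = expected.get(mac)
        if known_ip is None:
            alerts.append(f"Unknown MAC {mac} seen with IP {ip}")
        elif known_ip != ip:
            alerts.append(f"MAC {mac} expected at {known_ip} but seen with {ip}")
    return alerts

for alert in check_bindings(observed_pairs, EXPECTED_BINDINGS):
    print(alert)
```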

Monitor everything

So what exactly needs to be monitored on the network? 

Aerohive’s Edwards says it’s firstly essential to understand users and their associated devices. This will be a by-product of UID-based authentication (PPSK or 802.1x, for example). 

Secondly, he says managers need the information that relates to a client’s connection and journey on the network – IP address, connected SSID (or switch/port for wired), location, and connection health (for wireless, this could be S/N ratio, RSSI, data rates, frequency, etc.). 

Thirdly, something along the lines of a packet capture tool will be needed to record network traffic, interpret that information, and then relay it for the purposes of monitoring and diagnoses.
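By way of a rough sketch, and assuming illustrative field names rather than any vendor’s schema, the second and third points boil down to keeping a per-client connection record against which captured traffic and health data can be correlated:

```python
# A minimal, illustrative per-client connection record of the kind described
# above. Field names and health thresholds are assumptions, not a vendor schema.

from dataclasses import dataclass

@dataclass
class ClientConnection:
    user: str
    mac: str
    ip: str
    ssid: str            # or switch/port for a wired client
    rssi_dbm: int        # received signal strength
    snr_db: int          # signal-to-noise ratio
    data_rate_mbps: float

    def health(self) -> str:
        """Crude health rating based on commonly quoted Wi-Fi thresholds."""
        if self.rssi_dbm < -70 or self.snr_db < 20:
            return "poor"
        if self.rssi_dbm < -60 or self.snr_db < 25:
            return "fair"
        return "good"

sally = ClientConnection("sally", "aa:bb:cc:00:11:24", "10.0.5.17",
                         "Corp-WiFi", rssi_dbm=-72, snr_db=18, data_rate_mbps=24.0)
print(sally.user, sally.health())   # -> sally poor
```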

Brown’s advice is to have visibility into performance wherever services are hosted and users reside. Hodgson echoes this view and says that when asked what needs to be monitored, the answer is simple: everything. 

“That’s the first enquiry we get when we discuss our product with an administrator. You want to be able to monitor the physical aspects, such as whether a device is responding, the status of the hardware within it, and how it’s performing. 

“Then you want to step this up a level and see a management view of what’s going on. Flow-based views are a good example. Following this, you would need to get the full instrumentation of your OS, virtual and application stacks. For most of our users, the ability to dynamically change in an instant what their focus is, is compelling.”

Savvius’ Botelho also agrees that you need to monitor “everything and everywhere”, adding that it is important to collect as much data as you can and to find a solution that can do just that. 

“In the past, solutions only focused on a specific type of data, such as just packet, just flow, or just SNMP. But today’s modern solutions can collect data from all three sources simultaneously, aggregating the data into a single view that indicates overall device health (SNMP), general network behaviour (flow), and specific details for root cause analysis (packets). 

“And the more sources of this data, the better. More measurement points lead to a more complete network view, and significantly better metrics, especially for KPIs like network and application latency.”
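A minimal sketch of that aggregation, assuming placeholder collectors for each of the three sources, might merge them into one per-device view like this:

```python
# Illustrative-only sketch of aggregating the three data sources listed above
# (SNMP device health, flow records, packet-level detail) into one per-device
# view. The input dictionaries stand in for whatever each collector returns.

snmp_health   = {"core-sw1": {"cpu_pct": 35, "if_errors": 0}}
flow_summary  = {"core-sw1": {"top_talker": "10.0.5.17", "bps": 420_000_000}}
packet_detail = {"core-sw1": {"avg_app_latency_ms": 180, "retransmissions": 1243}}

def merged_view(device):
    """Combine all three sources for a device into a single record."""
    return {
        "device": device,
        "health": snmp_health.get(device, {}),           # SNMP
        "behaviour": flow_summary.get(device, {}),       # flow
        "root_cause_detail": packet_detail.get(device, {}),  # packets
    }

print(merged_view("core-sw1"))
```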

ExtraHop’s Hameroff points out that while everything needs to be monitored, not all packets are created equal. He believes every organisation needs to identify and understand which assets are most critical to user experiences and the delivery of the business, and then get a firm grip on the dependencies that make those experiences possible. 

“No organisation can afford to have tunnel vision, and having a way to effectively know how to balance what’s most critical from a performance and security perspective can look like trying to score a goal from outside the stadium,” says Hameroff. “You may know where the target is, but the chances of getting the ball to bulge the net seems nearly impossible.” 

So how do you get the “net to bulge”, especially when – as VIAVI’s Brown says – engineers often don’t know what they need until they are knee-deep in assessing a problem? 

“Network monitoring should capture all wire data and then automate the process of helping the engineer decide what data they need to solve the issue,” he says. “The network monitoring platform should be smart enough to actively flag issues, provide guidance to the correct root causes, and offer the granularity to solve the issue.”

Brown reckons that many tools currently on the market provide “very limited” perspectives to address only infrastructure, system, or application optimisation. From this, he says the engineer cannot assess user experience or contextually understand service performance.

So what should network managers look for when it comes to choosing a network monitoring platform? Hameroff “strongly” encourages them to find a visibility solution that offers three things: scalability; real-time, definitive insights (not just more alerts or log entries that require data scientists to uncover anything actionable); and a solution that can provide a fluid, simplified investigatory workflow. “It should be one that starts by giving you the needle instead of asking you to start with the haystack, i.e., not beginning the investigation from a pile of potentially unrelated packets.”

He goes on to warn readers to be wary of solutions that only look at Level 2 to Level 4. “User experiences are not solely built on whether packets can flow between hosts. It’s critical to have the context of the entire conversation, from L2 through L7. Otherwise, you’ll only be able to confirm that a conversation took place – you will not know if the conversation was in the right language or actually accomplished the expected outcome if you just consider the network flows.”

Highlight’s Cartwright says some network and managed service providers are beginning to invest in SaaS tools that are “truly multi-tenanted” and designed to be used by the provider and the enterprise network manager. He says this allows both the users of a service like Wi-Fi and the provider delivering it to see a common picture and collaborate on fixing both short- and long-term issues. So for Cartwright, finding a provider that is focused on enterprise customer service through ensuring visibility and transparency is crucial to overcoming the challenges noted earlier.

IDEAL Networks says a handheld tester, such as its LanXPLORER PRO for example, can detect issues where two different devices are using the same IP address or where two IP addresses are coming from the same MAC. Both are indicators of an unauthorised device on the network.

Meanwhile, IDEAL Networks’ Barrera says that when managers are choosing testers, they should opt for models that are in-line or dual port. “Users are often disappointed when they plug their network monitoring tester into a switch port and see a low level of network traffic. By design, only special broadcast traffic or traffic intended for a specific network device is sent to a physical network port. Therefore, it is impossible to see all network traffic by simply plugging into a switch.”

According to Barrera, in-line or dual port testers are designed to sit silently between any two points in the network, like a switch and a router, and monitor the packets going back and forth. “It is with this type of connection that all devices and the total bandwidth can be monitored. Think of it as a police checkpoint. Very few vehicles can be inspected if a checkpoint is positioned on a rural road, where it will only encounter the vehicles travelling to and from the houses along it. For maximum effectiveness, a checkpoint needs to be positioned between two main thoroughfares.”

Paessler’s Hodgson says managers should consider the following when choosing a monitoring platform: “Agentless is essential; the concept of having to install and maintain anything other than the fewest possible number of software components is a given.”

He adds that you should also look for support for standard protocols, flexibility of deployment options, and inclusion of all features without the need to purchase enhanced elements.

Edwards adds a caveat to Hodgson’s latter point when he says: “Avoid solutions that provide all the bells and whistles but require you to be a vendor-specific expert to actually translate and dig through all the information.”

He also says that as user complaints about the network are often raised hours or even days after the event, managers should use a platform that provides historical insight, enabling them to view the information for a relevant point in time. 

Edwards continues by saying that at a fundamental level, you should look for a platform with some kind of packet capture capability. “Taking things to the next level, you want something with an automated and proactive means to relay this information. I cannot stress enough the difference this makes. Rather than waiting for user complaints, you’ll be informed of detected issues before anyone has a chance to bring it to you. You’ll identify issues that may otherwise go undetected.” 

More artificial than intelligent?

On the subject of automation, there has been much talk of artificial intelligence of late – will this have an impact on network monitoring?

“AI and machine learning are the latest buzzwords and they are being used in the context of monitoring and troubleshooting,” says Edwards. “Be careful you don’t get burnt by AI-driven tools that help with basic tasks but lack any real value when it comes to getting to the bottom of complex items.”

Hameroff is also sceptical: “Unfortunately, we’ve entered an era where just about everyone speaks of ‘AI’ as the answer. However, we’d encourage your readers to be suspicious of overly bold claims that cannot be backed up by tangible substance. There’s a lot of ‘AI-washing’ rhetoric out there.”

But he is far more upbeat about the concept of advanced machine learning and says this will have a sizeable impact on network monitoring and will become a necessity. 

“All NetOp pros have lived with the metaphorical ‘alert cannon’ created by trying to craft the perfect trigger or alert based on some known condition that may take place on the network. 

“But, our networks and user experiences are too dynamic and the surface is way too large for any human-driven process. Machine learning (true machine learning, not just basic pattern matching) is critical to proactively squelching the noise, getting the signal, and also making it immediately actionable whether it impacts the performance or security of a user experience. There’s truly no other way to keep up with the petabytes of analytical data that can be extracted from even the most common enterprise networks.”
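As a rough, hypothetical illustration of the difference between a hand-crafted trigger and a learned baseline, even a simple statistical model flags only what drifts well outside a metric’s own recent history:

```python
# A rough, illustrative example of baseline-driven alerting rather than a fixed
# threshold: flag a metric only when it sits well outside its own recent
# history, which is one simple way to "squelch the noise".

from statistics import mean, stdev

def is_anomalous(history, value, sigmas=3.0):
    """Flag value if it is more than `sigmas` standard deviations from the
    mean of recent samples. Needs a handful of samples to be meaningful."""
    if len(history) < 10:
        return False
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(value - mu) > sigmas * sd

latency_ms = [21, 22, 20, 23, 21, 22, 20, 21, 23, 22]   # recent baseline
print(is_anomalous(latency_ms, 24))   # False - within normal variation
print(is_anomalous(latency_ms, 95))   # True  - well outside the baseline
```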

VIAVI’s Brown agrees that at this stage it is more appropriate to talk about adaptive machine learning: “For example, in performance management solutions we have the ability to take disparate measurements and data sets that contribute to user experience and, based on algorithms that understand what’s normal and acceptable within our customers’ environment, present a single user experience score along with a plain-worded explanation of where failures are occurring.”

He’s also keen to point out that adaptive machine learning doesn’t replace the human: “Its aim is to empower any level of engineer to solve more issues in less time by eliminating the issue of deciding where to begin troubleshooting. Right now, engineers are drowning in data and too many key performance indicators, and they don’t know where to begin troubleshooting. Think of it as ‘analysis paralysis’. Through adaptive machine learning, easy-to-read analysis and performance visualisations, and ‘three-clicks-to-fix’ workflows, our solutions guide the engineer to the right answer and resolution.” 

Others are not so dismissive of AI. For instance, Nexthink’s Boggia says it will drive more predictive type use cases in all areas, including network monitoring where it can be used to forecast potential outages or other problems.

Aerohive’s Edwards supports this view. He says AI affects more than just monitoring and is a “really exciting” evolution. “Bringing AI to network monitoring has the potential to revolutionise the way we manage our networks. Imagine a platform that doesn’t just provide technical information relating to a problem but plainly states exactly why a problem is occurring and how to fix it (better yet, resolves the issue automatically).

“The ultimate goal is for networks to be AI-driven, aided by machine learning. From a monitoring perspective, AI-driven networks provide information that is genuinely insightful. No longer are you told ‘you have five authentication issues’; but that: ‘Sally has entered her password incorrectly five times, this is because her password expired yesterday, would you like me to send a one-time password to Sally’s mobile number?’.”

Savvius’ Botelho also believes AI has significant potential in the network monitoring space, especially with NPMD solutions that can use the technology to process huge volumes of information from multiple sources representing large numbers of flows, devices and endpoints. 

However, he also says that vendors have been promising predictive network analysis (which is essentially what AI is expected to provide) for well over a decade, and it has yet to be delivered. “Although AI could get us there, every network is unique, and one network’s problem is another network’s baseline. It takes a tremendous amount of data over a long period of time to truly get to a point where AI can begin to predict what is truly anomalous on a network.”

Future monitoring 

So apart from the potential of AI and machine learning, what else does the future hold when it comes to innovations in network monitoring?

Hameroff says that as enterprise networks continue to grow in complexity, scale and surface area, as well as transform into policy-driven and orchestrated fabrics that effectively abstract Layer 2 to Layer 4, it will become increasingly important to avoid approaching them from a “very narrow, potentially myopic” scope.

“IT professionals will soon find themselves in a very different place if they fail to recognise that the success of digital business initiatives is predicated on both delivering a compelling, differentiated and highly satisfactory customer experience, and establishing and maintaining trust – all without compromising the organisation’s agility, operational efficacy, or scalability.”

Nexthink’s Boggia concurs, saying that integration or consolidation into monitoring platforms that allow multiple perspectives to be considered is increasingly important. 

“It is not enough to have just one answer from one tool and another from a different one, and to have to do all of the joining together and thinking for yourself.

“We are seeing a shift away from traditional SLAs as the measure of service, and an increasing trend towards xLAs where the user experience itself is monitored.”

Botelho echoes this last point as he concludes: “End-user experience will become the most important metric used to evaluate the performance of a network. As enterprises move into the cloud with SaaS vendors, QoE is the unifying metric that must be satisfied to ensure a truly high-performance network.”