World faces ‘biggest internet outage in history’

19 July 2024

TV channels, transport networks and banks around the world have been knocked offline in a massive outage causing Windows computers to suddenly shut down.

Users have reported that their Windows computers are suddenly showing the “blue screen of death” and entering recovery mode. The issue seems mostly to affect PCs running Windows 10.

A software problem with CrowdStrike's Falcon Sensor is reportedly to blame for the outage. “CrowdStrike is aware of reports of crashes on Windows related to the Falcon Sensor,” the company said on its website.

Microsoft has issued a statement saying that it is taking 'mitigation actions': "Our services are still seeing continuous improvements while we continue to take mitigation actions. We still expect that users will continue to see gradual relief as we continue to mitigate the issue."

In the UK, National Rail warned commuters to expect delays and cancellations because of the IT outage. The organisation highlighted a “network wide IT issue affecting a number of train operators services this morning.”

Avanti West Coast, Gatwick Express, Great Northern, Great Western Railway, Hull Trains, London Northwestern, Lumo, Northern, Southern, Thameslink, TransPennine and West Midlands were all impacted.

Some train operators are unable to access driver diagrams at certain locations, leading to potential short-notice train cancellations. Other key systems, including real-time customer information platforms, are also affected.

Govia Thameslink Railway (GTR), the parent company of Southern, Thameslink, Gatwick Express and Great Northern, said that it is experiencing widespread IT issues across its entire network and cannot access driver diagrams at certain locations, which could lead to short-notice cancellations, particularly on the Thameslink and Great Northern networks.

On its website, Ryanair urged passengers to arrive at airports three hours early, blaming a “third party IT issue, which is outside Ryanair’s control and affect all airlines operating across the network.”

As of 10.30am, more than 1,000 flights have been cancelled globally.

GP surgeries have said they are unable to access patient records or book appointments as they cannot access the EMIS Web system, the most widely used clinical system for primary care in the UK. The Wilmslow Health Centre in Cheshire wrote on X that "all practices in the UK using the NHS commissioned GP computer system EMIS are currently without access to their IT systems. This is beyond the control of GP surgeries."

In Australia, banks have reported issues with online services and transactions, and customers at supermarkets have been unable to buy their weekly shop as checkouts have been hit by the IT outage.

In the US, there are reports that 911 emergency lines are down in several states. Alaska, Arizona, Indiana, Minnesota, New Hampshire and Ohio have reportedly been hit by the global IT outages.

Troy Hunt, an Australian computer expert who runs the website ‘HaveIBeenPwned’, said: “I don’t think it’s too early to call it: this will be the largest IT outage in history. The financial impact of this is already hard to fathom.”

Industry reaction

The global IT sector was quick to respond with statements.

"Even in July, the current event looks set to be one of the most significant cyber issues of 2024. The damage to business processes at the global level is dramatic. The glitch is due to a software update of CrowdStrike's EDR product. This is a product that runs with high privileges and protects endpoints. A malfunction in this can, as we are seeing in the current incident, cause the operating system to crash," says Omer Grossman, CIO at CyberArk.

There are two main issues on the agenda, says Grossman. The first is how customers get back online and regain continuity of business processes. Because the endpoints have crashed with the Blue Screen of Death, they cannot be updated remotely, so the problem must be solved manually, endpoint by endpoint. This process is expected to take days.

"The second is around what caused the malfunction? The range of possibilities ranges from human error - for instance a developer who downloaded an update without sufficient quality control - to the complex and intriguing scenario of a deep cyberattack, prepared ahead of time and involving an attacker activating a 'doomsday command' or 'kill switch.' CrowdStrike's analysis and updates in the coming days will be of the utmost interest," adds Grossman.

According to Jake Moore, global cybersecurity advisor at ESET, “many people are quick to suspect a cyberattack or find similarities to Netflix’s 'Leave The World Behind' but this can often add to the confusion. It highlights the importance of these services and the millions of people they serve. Businesses must test their updates and infrastructure and have multiple fail-safes in place, however large the company is. But, as is often the case, it is simply impossible to simulate the size and magnitude of the issue in a safe environment without testing the actual network. The inconvenience caused by the loss of access to services for thousands of people serves as a reminder of our dependence on Big Tech in running our daily lives and businesses. Upgrades and maintenance can make systems and networks more vulnerable to small errors, which can have wide-reaching consequences as demonstrated today. Another aspect of this incident relates to 'diversity' in the use of large-scale IT infrastructure. This applies to critical systems like operating systems (OSes), cybersecurity products and other globally deployed (scaled) applications. Where diversity is low, a single technical incident, not to mention a security issue, can lead to global-scale outages with subsequent knock-on effects.”

“The CrowdStrike outage goes to show the need for diligent quality assurance when issuing an update but it also serves as a warning about the risk associated with relying on single providers and complex cyber ecosystems. While a fix has been issued, those with additional security in place are finding it difficult to roll back their systems and get them back up and running,” agreed Jack Porter, public sector specialist at Logpoint. “Long term this has the potential to see such software dependencies regarded as an additional risk. Large cybersecurity vendors may now be included with the likes of digital service providers such as AWS, Microsoft and Google services as key suppliers by insurance companies as this has illustrated the devastating impact a security software failure can have.”

Meanwhile, virtualDCS’s CEO, Richard May, and CTO, John Murray, say the outage will not be fixed quickly and that recovery could take weeks.

“It appears that the issue is not the infamous ‘blue screen of death’, but rather a boot loop preventing machines from starting up. This raises questions about why so many other infrastructure elements are also affected,” said May. “From my perspective, Microsoft might be at fault here. It seems CrowdStrike is blocking the operating system from starting. Without booting up, these machines can’t connect to the network, meaning they’ll all need manual fixes – potentially taking weeks. There’s also been an Azure outage, though there’s no concrete evidence linking the two issues. Despite speculation, I don’t think they’re connected. A major concern is that people might start disabling CrowdStrike to get their systems running, which could leave them vulnerable to cyberattacks. This situation might be exploited, creating significant risks.”

Ultimately, "this Microsoft IT outage demonstrates the need for more robust and resilient solutions, so that when these issues do arise, they can be resolved quickly without causing such widespread customer chaos and security risk," says Mark Jow, security evangelist, Gigamon. "Preparedness is key - every IT and security vendor must have a robust system in place across its software development lifecycle to test upgrades before they are rolled out to ensure that there are no flaws within the updates."
