CrowdStrike: what happened?

At the end of last week, a software update issued by IT security firm CrowdStrike caused widespread, major functionality problems for IT users around the world. CrowdStrike had released an update for its Falcon sensor software, a critical endpoint (IT device) security solution. Shortly after the update was deployed, users began reporting major problems with their computer systems.

Who was affected? 

The outage affected thousands of organisations around the globe that rely on Microsoft operating systems protected by CrowdStrike software. These include hospitals, doctors’ surgeries, retailers, airports, rail operators, news and media outlets and financial institutions. Many of these organisations had no access at all to their computer systems, meaning no access to vital information and processes such as medical records and payments. According to Microsoft’s estimates, the incident affected 8.5 million Windows devices around the globe.

What now? 

In the immediate aftermath, CrowdStrike recommended that affected users roll back to a previous stable version of the software while a fix was being developed. Within a few days, the firm released a patch to address the performance and stability issues. In a statement, CrowdStrike CEO George Kurtz said: “This is not a security incident or cyber attack. The issue has been identified, isolated, and a fix has been deployed.” The issue affected only Windows-based users of CrowdStrike; other operating systems such as Mac were not affected by the outage.

Compatibility issues

The update to CrowdStrike’s Falcon sensor exposed compatibility issues with certain versions of Microsoft Windows. This led to functionality problems with devices, including high CPU (central processing unit) usage and system instability. Many users were met with a blank blue screen when they tried to log in on Friday morning. High CPU usage is linked to long loading times and can cause the computer to repeatedly crash. It is usually caused when a computer has to work too hard, perhaps due to running too many apps at one time, or running a very high-intensity app. 

Damage control

CrowdStrike and Microsoft had to collaborate closely to diagnose the root cause of the problem and develop a patch to resolve the compatibility issues quickly. Both firms provided guidance to their mutual customers on how to mitigate the issues temporarily, such as rolling back to a previous version of the Falcon sensor, or applying specific Windows updates until the patch was available.

Were financial institutions affected? 

A number of financial institutions reported issues due to the outage, but the impact seems to be minimal so far. Charles Schwab posted on its website on Friday: “Due to a third-party, global, industry-wide issue, certain online functionality may be intermittently slow or unavailable. We’re actively monitoring the issue. Phone services may be disrupted and hold times may be longer than usual.” Other banks reportedly impacted by the outage included Wells Fargo, TD Bank, Barclays and Metro Bank. Barclays said on Friday that all of its services were “operating as normal at this time other than our digital investing platform Smart Investor, where customers are currently unable to manage their account in the app, Online Banking or over the phone.” Payments systems provided by Visa were also affected, with many supermarkets and other retailers unable to take card payments on Friday.

What can we learn from this? 

The incident highlighted the interdependencies between security software and operating system components, emphasising the need for thorough compatibility testing and cooperation between software vendors and operating system developers to resolve conflicts like this swiftly. 

“This incident provides a timely reminder of the risks of digitisation and the continual drive for everything to go into the Cloud,” said Mike Finlay, CEO of RiskBusiness. “It is also an interesting case study for why the European Union (EU) has published its Digital Operational Resiliency Act (DORA) and why it is good risk management practice to have alternate arrangements in the event of such incidents.”

In the EU, both the Network and Information Security Directive 2 (NIS2) and DORA (which will be implemented for financial institutions in January 2025) require regulated organisations to take appropriate steps to manage cyber risk within their own organisations – and also in their supply chain. 

IT concentration risk

IT concentration risk in particular is a key focus of DORA. This refers to organisations having an over-reliance on just a handful of critical vendors or suppliers around the world. “DORA will require financial services firms to assess their own internal IT concentration risk before entering into IT contracts,” says William White from Bristows LLP. “In particular, DORA focuses on the need for firms to have an understanding of how subcontracting affects concentration risk: a business may have two suppliers providing a similar service, but if they both rely on the same cloud provider then there may be a hidden single point of failure.

“DORA will also allow the European Supervisory Authorities (ESAs) to designate certain IT service providers as ‘critical ICT third-party service providers’ and establishes an oversight regime for them. A number of factors are to be taken into consideration, including the impact that an outage in the provider’s systems would have on the financial system given the number of financial entities relying on them. The legislation makes clear that it is targeted primarily at the major cloud vendors, but the CrowdStrike incident demonstrates that concentration on a small number of software vendors at the on premises and endpoint level can be just as risky as concentration on the cloud hyperscalers.”
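The “hidden single point of failure” White describes can be illustrated with a short sketch. It is purely illustrative: the supplier names and the dependency map are hypothetical, and DORA does not prescribe this or any other method. The idea is to trace each direct supplier through its subcontractors and flag any underlying provider that two or more of them share.

```python
from collections import defaultdict

# Hypothetical supply-chain map: each supplier lists the providers it relies on.
# All names are invented for illustration and do not come from the article.
DEPENDENCIES = {
    "payments-vendor-a": ["cloud-x"],
    "payments-vendor-b": ["cloud-x"],
    "endpoint-security": ["os-vendor"],
    "cloud-x": [],
    "os-vendor": [],
}


def transitive_dependencies(supplier, dependencies):
    """Return every provider reachable from `supplier`, including subcontractors."""
    seen = set()
    stack = list(dependencies.get(supplier, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(dependencies.get(current, []))
    return seen


def shared_points_of_failure(direct_suppliers, dependencies):
    """Map each underlying provider to the direct suppliers that rely on it,
    keeping only providers shared by two or more direct suppliers."""
    users = defaultdict(set)
    for supplier in direct_suppliers:
        for provider in transitive_dependencies(supplier, dependencies):
            users[provider].add(supplier)
    return {p: sorted(s) for p, s in users.items() if len(s) >= 2}


hidden = shared_points_of_failure(
    ["payments-vendor-a", "payments-vendor-b"], DEPENDENCIES
)
print(hidden)  # {'cloud-x': ['payments-vendor-a', 'payments-vendor-b']}
```

In this made-up example, the two payments vendors look like independent alternatives, yet both depend on the same cloud provider, so an outage there would take out both at once.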

Michael Veale, a computer science expert and associate professor at University College London, says the incident will raise awareness about the vulnerabilities created by our reliance on technology, and third-party vendors in particular. “Modern software development is all about giving centralised, remote control of what code is running on your device to companies like CrowdStrike and Microsoft,” he told the i newspaper. “When this goes wrong, as we have seen, everything goes wrong. We’re building a dangerous digital monoculture, and like agricultural monocultures, these are very susceptible to blight and disease.”

CrowdStrike confirmed this morning that a “significant number of devices” that were impacted by the incident are now back up and running. However, it did not confirm how many devices were still down. It is still to be determined who will be liable for the losses incurred, but the event is widely believed to be the largest IT outage in history.
