CrowdStrike outage and global software’s single-point failure problem

Large-scale attacks on corporate IT are becoming more frequent. This is neither unusual nor unexpected as companies spend billions on cyber defenses, preparing for an asymmetric war against hackers who can wreak havoc with just a few lines of code.

But Friday’s biggest-ever IT outage, caused not by a malicious attack but by a bug in a CrowdStrike software update pushed to computers running Microsoft’s operating system, illustrates a type of technological threat that is also on the rise but receives less attention than hacking: a massive domino effect in which an error in one part of a system causes technological disaster across an industry, function or interconnected communications network.

Earlier this year, AT&T suffered a nationwide outage due to a technical update, and last year the FAA experienced an outage when a routine update went wrong after one person replaced a critical file (the FAA now has backup systems in place to ensure that this never happens again).

“Even in the case of routine patching and updates, they’re happening more frequently,” Chad Sweet, co-founder and CEO of The Chertoff Group and a former chief of staff at the Department of Homeland Security, told CNBC on Friday.

Some digital billboards in New York City’s Times Square displayed blue screens and others went completely black on July 19, 2024, during a global communications outage caused by CrowdStrike, a company that provides cybersecurity services to U.S. technology company Microsoft.

Selcuk Acari | Anadolu | Getty Images

Single-point-of-failure risk is something companies need to plan for and protect against. No software is released that doesn’t later require patches or updates, and security best practices exist to cover ongoing maintenance after a product ships, Sweet said.

Companies that The Chertoff Group works with are taking a hard look at software development and update standards in the wake of the CrowdStrike outage. Sweet pointed to a set of protocols already offered by the government, the Secure Software Development Framework (SSDF), that could provide an indication of what to expect in the market as Congress begins to take a closer look at the issue. Congressional scrutiny is likely after a string of recent incidents, from AT&T to the FAA to CrowdStrike, that show the far-reaching impacts these types of technical failures can have on civilian life and the operation of critical infrastructure.

“Businesses need to be prepared,” Sweet said.

Aneesh Chopra, chief strategy officer at Arcadia and a former White House chief technology officer, told CNBC on Friday that critical sectors like energy, banking, healthcare and aviation have separate regulations monitoring risk, and measures may be unique in the most heavily regulated sectors. But the question now for any business leader is, “If the system goes down, what’s the second plan? You’re going to see a lot more scenario planning. If that’s not job number one, outlining the scenarios is job number two or third,” he said.

Chopra noted that unlike many issues in Washington, there is bipartisan work on issues of critical infrastructure and systemic risk, and that technical standards are a “hallmark” of the U.S. system. There may be efforts underway now designed to “improve competition” as a means to strengthen accountability, he said.

“If there was a mechanism for updates in a more open and competitive way, there might be pressure to make sure it’s done in a meticulous way,” Chopra said.

Sweet said that will inevitably lead to concerns in the business community about the risks of overregulation. While there’s no way to know for sure right now whether CrowdStrike could have operated with a more open process that would have allowed for detection of single points of failure, he said it’s a legitimate question.

The best way to avoid overregulation, according to Sweet, is to look to market-based mechanisms like those in the insurance industry. “The short answer is, ‘Let the free market do its thing through something like the insurance industry, which rewards good actors with lower premiums,'” he said.

Sweet also said more companies, like his clients, should embrace the idea of “antifragile” organizations, a term coined by risk analyst Nassim Nicholas Taleb. “It’s not just organizations that are resilient after disruptions, but organizations that thrive, innovate, and outperform their competitors,” he said. In his view, any law or regulation would be hard-pressed to address both malicious attacks and technology updates that have unintended consequences.

“This is certainly a wake-up call,” Chopra said.
