Cyber resilience: key takeaways from a global IT outage
It’s now been a few weeks since one of the industry’s largest IT outages affected airlines, hospitals and businesses worldwide. We have all seen the reports on the impact. But what is also very evident about this event is that it could easily have happened to any software provider.
CrowdStrike is one of the most revered cyber vendors in our industry, but at the end of the day it is a software provider. Software is written by humans, and neither software nor humans are perfect. As a community we spend a lot of time looking to detect and prevent events caused by bad actors. We see here that even when no malicious actor is involved, an error in a critical application or related process can cause widespread impact.
No application is invulnerable
That a software package could introduce a quality issue in an update or misconfiguration is not, in and of itself, news. Veracode reported in 2023 that 70% of software applications they examined still had at least one identifiable flaw five years after shipping. Also, Synopsys found that vulnerabilities existed in as many as 92% of applications tested.
I would go as far as to say this number should really be 100%.
The issue is complexity
As we have described in our Cyber Resilience Risk Index 2024 Report, the complexity we face in this industry is enormous. For example, there are hundreds of variants of Windows X, and over 100 applications installed on every computer, each with its own patches, fixes and updates, running on a variety of networks and connecting to a vast number of peripherals. There is not a test matrix on the planet that will get you to perfection. And even if an application were perfect the day it shipped, it changes over time through usage, updates, combinations with other applications, upgrades and patches, all of which can introduce new flaws.
Where you have complexity, you will find risk
I am in no way absolving technology vendors of their obligation to deliver quality products, especially in mission-critical applications. Clearly, security and quality from design to delivery are crucial. However, there is value in understanding the reality of this situation: this is not purely a software quality or update process problem. Where you have complexity, you will find risk, and where there is risk, your resilience plan is as critical to your business continuity as your detection and prevention plan.
What can we responsibly do?
As we reflect on the aftermath of this historic global outage, what can we responsibly do across our industry to better mitigate these types of events?
- Technology providers: Technology providers should continue striving for quality and security right from design, incorporating resilience strategies into customer success plans and roadmaps. In addition, by enabling applications to remediate problems automatically and maintain their own health, providers can help organisations address issues responsibly.
- Enterprise customers: End-user organisations need to implement resilience strategies in their environments by conducting thorough tabletop exercises that extend to business continuity and disaster recovery (BCDR) plans. They should also utilise built-in capabilities already within the devices they own to remediate or restore devices promptly.
- Shared responsibility: Managing risk requires partnership and collaboration. All parties should leverage tools that enhance resilience today and work together to eliminate complexity over time.
- Help, don’t harm: In a world of tightening budgets and increasing competition, it’s tempting to point fingers when something goes wrong; it’s harder to find a productive way to assist those affected. The harder path, however, is the one that will lead us all to a more prosperous outcome: in the interconnected world of hardware and software, we all depend on one another’s success.
Cyber resilience is critical in our complex digital world
As organisations clean up from the latest event, the key takeaway is the critical need to invest in cyber resilience in our highly complex digital world. Whatever the next event may be, will you be ready with a plan and the tools required to restore your business and get users back online quickly, safely and effectively?