Besides CrowdStrike's announcement in this blog post that the cause was apparently a bug, I also find the following section (see quote) interesting. The list of activities that need improving in order to prevent incidents as consequential as the one on July 19, 2024 is not exactly short, and it seems CrowdStrike still has some homework to do before its product can be used in KRITIS (critical infrastructure) sectors without making the risk treatment plan unmanageably long.
Quote:
How Do We Prevent This From Happening Again?
Software Resiliency and Testing:
- Improve Rapid Response Content testing by using testing types such as:
  - Local developer testing
  - Content update and rollback testing
  - Stress testing, fuzzing and fault injection
  - Stability testing
  - Content interface testing
- Add additional validation checks to the Content Validator for Rapid Response Content. A new check is in process to guard against this type of problematic content from being deployed in the future.
- Enhance existing error handling in the Content Interpreter.
Rapid Response Content Deployment:
- Implement a staggered deployment strategy for Rapid Response Content in which updates are gradually deployed to larger portions of the sensor base, starting with a canary deployment.
- Improve monitoring for both sensor and system performance, collecting feedback during Rapid Response Content deployment to guide a phased rollout.
- Provide customers with greater control over the delivery of Rapid Response Content updates by allowing granular selection of when and where these updates are deployed.
- Provide content update details via release notes, which customers can subscribe to.
In addition to this preliminary Post Incident Review, CrowdStrike is committed to publicly releasing the full Root Cause Analysis once the investigation is complete.
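The staggered, canary-first rollout mentioned in the quote can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not CrowdStrike's implementation: the host list, the stage fractions, the crash-rate metric, the threshold and the `deploy`/`rollback` callbacks are all hypothetical. Each stage extends the update to a larger cumulative share of hosts and only continues if the monitored crash rate of the already-updated hosts stays below the threshold; otherwise the update is rolled back and the rollout stops.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    stages: Sequence[float],                         # hypothetical cumulative fractions, e.g. [0.01, 0.1, 0.5, 1.0]
    deploy: Callable[[str], None],                   # hypothetical: push the update to one host
    rollback: Callable[[str], None],                 # hypothetical: revert the update on one host
    crash_rate: Callable[[Sequence[str]], float],    # hypothetical monitoring metric for updated hosts
    max_crash_rate: float = 0.001,                   # assumed health threshold
) -> bool:
    """Deploy in growing stages, starting with a small canary group.

    Returns True if the update reached every host, False if the health
    gate failed and the update was rolled back.
    """
    deployed: list[str] = []
    for fraction in stages:
        # Cumulative number of hosts that should have the update after this stage.
        target = max(1, round(len(hosts) * fraction))
        batch = hosts[len(deployed):target]
        for host in batch:
            deploy(host)
        deployed.extend(batch)

        # Health gate: stop and revert if the updated hosts look unhealthy.
        if crash_rate(deployed) > max_crash_rate:
            for host in reversed(deployed):
                rollback(host)
            return False
    return len(deployed) == len(hosts)
```

With a 1 % canary stage, a faulty update on 1,000 hosts would be halted after reaching 10 machines instead of the whole fleet. One caveat of this simplification: a real rollback only helps if the affected hosts still boot and can receive it.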