The events of 2025 have challenged long-held assumptions about cloud resilience. Many organisations had implemented widely accepted best practices, including multi-availability-zone architectures, multi-region deployments, and cloud-native backup solutions, yet still experienced prolonged disruption.
In each case, the underlying issue was not a failure of compute or storage. Instead, failures occurred in control-plane services such as DNS, identity, and edge routing. These components, often assumed to be inherently resilient, have emerged as critical single points of failure.
Control-Plane Failures: The AWS DynamoDB Outage
In October 2025, AWS’s us-east-1 region experienced a major outage originating from a DNS failure affecting DynamoDB. An automation issue caused incorrect DNS records to be published, preventing clients from resolving the DynamoDB regional endpoint.
Although the DNS issue itself was resolved within hours, the impact persisted far longer. Several AWS services depend on DynamoDB for state management and coordination, and during the outage these systems accumulated significant backlogs.
As a result:
- EC2 instance launches were impaired well after DNS recovery
- Network Load Balancer health checks continued to fail
- Customer applications experienced intermittent failures during the recovery period
This incident illustrates a key resilience challenge: restoring a control-plane dependency does not immediately restore service. Recovery is constrained by the time required to reconcile accumulated state across dependent systems.
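One rough way to reason about this recovery lag is a simple queue model: while the dependency is down, work keeps arriving but is not processed, and once it returns the backlog drains only at the margin between processing capacity and ongoing arrivals. The sketch below is illustrative only. The function name and every rate in it are hypothetical and are not figures from the AWS incident.

```python
def backlog_drain_seconds(outage_seconds: float,
                          arrival_rate: float,
                          service_rate: float) -> float:
    """Estimate how long it takes to clear work queued during an outage.

    Assumes a constant arrival rate (items/s) during and after the outage,
    and a constant service rate (items/s) once the dependency is restored.
    """
    if service_rate <= arrival_rate:
        # No spare capacity: the backlog never shrinks.
        return float("inf")
    backlog = outage_seconds * arrival_rate
    return backlog / (service_rate - arrival_rate)


# Hypothetical numbers: a 2-hour outage, 100 requests/s arriving,
# and 125 requests/s of processing capacity after recovery.
drain = backlog_drain_seconds(2 * 3600, arrival_rate=100, service_rate=125)
print(f"Backlog clears about {drain / 3600:.1f} hours after the dependency returns")
```

Under these assumed numbers the dependent system needs several times the length of the outage itself to catch up, which is why recovery time objectives based only on "time to fix the root cause" can be misleading.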
Identity Availability and Edge Dependencies: Azure Front Door
Later in October, just as the industry was recovering from the AWS incident, Microsoft experienced a global outage. It was triggered by a configuration error in Azure Front Door, the edge service used to route traffic to many Microsoft platforms.
The misconfiguration propagated rapidly, causing widespread routing failures. Although Microsoft Entra ID remained largely operational at the backend, users and administrators were unable to reliably access identity services.
The impact included:
- Failed sign-ins to Microsoft 365 services
- Reduced availability of the Azure Portal
- Authentication failures across consumer and enterprise applications
From a customer perspective, the distinction between an edge failure and an identity failure is irrelevant. If identity services are unreachable, the cloud environment is effectively unavailable.
Implications for Security and Risk Leaders
Taken together, these incidents demonstrate that control-plane and supply-chain dependencies now represent material resilience risks. Traditional approaches focused solely on infrastructure redundancy are no longer sufficient.
Security and risk leaders should consider the following priorities:
Design for control-plane failure
Resilience planning should explicitly account for the loss of DNS, identity, and management planes, not only regional infrastructure.
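As one small illustration of designing for DNS loss, a client can remember the last address that resolved successfully and fall back to it when resolution fails. This is a minimal sketch, not a recommended production pattern: the class name `LastKnownGoodResolver` and the hostnames are hypothetical, the cache is in-memory only, and a cached address is not guaranteed to still be valid.

```python
import socket
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve a hostname to IP address strings using the OS resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class LastKnownGoodResolver:
    """Wraps a resolver and serves the last successful answer on failure."""

    def __init__(self, resolve: Resolver = system_resolver) -> None:
        self._resolve = resolve
        self._cache: Dict[str, List[str]] = {}

    def lookup(self, hostname: str) -> List[str]:
        try:
            addresses = self._resolve(hostname)
        except OSError:
            # socket.gaierror is a subclass of OSError.
            addresses = []
        if addresses:
            # Fresh answer: remember it for future failures.
            self._cache[hostname] = addresses
            return addresses
        if hostname in self._cache:
            # Resolution failed or returned nothing: use the stale answer.
            return self._cache[hostname]
        raise LookupError(f"no live or cached address for {hostname}")
```

The resolver function is injected so that a failure can be simulated in a resilience test without touching real DNS. Whether serving stale addresses is acceptable depends on the service; the point is only that the failure mode is decided in advance rather than discovered during an outage.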
Establish identity break-glass mechanisms
Emergency access paths should exist that do not rely exclusively on a single cloud identity provider or edge service.
Treat vendor updates as high-risk changes
Critical agent and platform updates should follow staged rollout, testing, and approval processes, with the ability to halt or roll back rapidly.
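The staged-rollout idea can be sketched as a loop over deployment rings that stops as soon as a ring fails its health check. Everything here is an assumption for illustration: the `Ring` type, the `deploy`, `healthy`, and `rollback` callbacks, and the failure threshold are hypothetical and would map onto whatever deployment tooling an organisation actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Ring:
    name: str
    hosts: Sequence[str]


def staged_rollout(rings: Sequence[Ring],
                   deploy: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   rollback: Callable[[str], None],
                   max_failure_ratio: float = 0.0) -> List[str]:
    """Deploy ring by ring and halt at the first unhealthy ring.

    Returns the names of the rings that completed successfully.
    Later rings are never touched once a halt occurs.
    """
    completed: List[str] = []
    for ring in rings:
        for host in ring.hosts:
            deploy(host)
        failures = sum(1 for host in ring.hosts if not healthy(host))
        if ring.hosts and failures / len(ring.hosts) > max_failure_ratio:
            # Halt: undo the failing ring, newest host first.
            for host in reversed(ring.hosts):
                rollback(host)
            return completed
        completed.append(ring.name)
    return completed
```

This sketch rolls back only the ring that failed. Whether earlier rings that passed their checks should also be rolled back is a policy decision the code leaves open.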
Map and test systemic dependencies
Dependency mapping should include control-plane services, edge routing, and third-party security tools, and these dependencies should be incorporated into resilience testing.
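A dependency map becomes more useful once it can answer "what breaks if this component is unavailable?". The sketch below walks a small, entirely hypothetical dependency graph to compute that set, including indirect dependencies. The service names and edges are made up for illustration and do not describe any real provider's architecture.

```python
from typing import Dict, List, Set

# Hypothetical map: each service lists what it directly depends on.
DEPENDS_ON: Dict[str, List[str]] = {
    "web-app": ["identity", "database"],
    "batch-jobs": ["database"],
    "admin-portal": ["identity", "edge-routing"],
    "identity": ["edge-routing", "dns"],
    "database": ["dns"],
    "edge-routing": [],
    "dns": [],
}


def transitive_dependencies(service: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return everything a service depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(graph.get(service, []))
    while stack:
        dep = stack.pop()
        if dep not in seen:  # the seen set also guards against cycles
            seen.add(dep)
            stack.extend(graph.get(dep, []))
    return seen


def blast_radius(component: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return the services that would be affected if a component failed."""
    return {s for s in graph if component in transitive_dependencies(s, graph)}


for component in ("dns", "edge-routing"):
    print(component, "->", sorted(blast_radius(component, DEPENDS_ON)))
```

In this made-up graph, `batch-jobs` has no direct link to DNS yet still falls inside its blast radius through the database, which is the kind of indirect dependency that resilience tests should exercise.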
Final Thoughts
Cloud resilience in 2025 is no longer defined by the number of regions or availability zones an organisation can deploy. It is defined by how well the organisation can operate when control planes, identity services, or critical suppliers fail.
The cloud remains a powerful platform, but it should not be assumed to function as a backup strategy in its own right. Designing for failure at the control-plane and supply-chain level is now a fundamental requirement of modern security and resilience planning.
