Building for Resilience: Why the AWS Outage had Minimal Impact on Square Systems

When Amazon Web Services (AWS) experienced a major outage on October 20th, 2025, many businesses across the globe saw widespread service disruptions. Sellers using Square, however, remained largely operational. Since then, a number of our sellers, particularly enterprise partners, have reached out asking how we were able to maintain service continuity during such a large-scale event.

We’re sharing this post to provide transparency into what happened, how our systems responded, and the steps we’ve taken over the past few years to improve reliability. Events like this aren’t unexpected — they’re the kind of scenario we design for to ensure our sellers can keep operating even when major providers experience downtime. While this incident demonstrated the strength of those investments, we also recognise that no platform is immune to disruptions and there’s always more work to do.

What happened

On October 20th, 2025, Amazon Web Services (AWS), one of the world’s largest cloud providers, experienced a major outage in one of its data regions. This caused widespread disruptions for many businesses across industries.

At Square, we use AWS to power much of our infrastructure. However, as part of our ongoing effort to build as reliable a system as possible, we made a strategic decision to operate our most critical systems, such as Payments, Square Point of Sale, Login, and Authentication, across multiple AWS regions to help protect against regional failures. This multi-regional design allowed us, in many cases, to automatically redirect operations to an alternate region, limiting the overall impact of the outage.
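For readers curious what redirecting to an alternate region can look like in practice, here is a minimal sketch. It is an illustration only, not Square’s actual implementation: the region names, the send function, and the RegionUnavailable error are all hypothetical. The idea is simply to try regions in priority order and return the first successful response.

```python
from typing import Callable, Optional, Sequence


class RegionUnavailable(Exception):
    """Hypothetical error a caller raises when a region cannot serve a request."""


def call_with_failover(regions: Sequence[str],
                       send: Callable[[str], str]) -> str:
    """Try each region in priority order and return the first success."""
    last_error: Optional[Exception] = None
    for region in regions:
        try:
            return send(region)
        except RegionUnavailable as err:
            last_error = err  # remember the failure and try the next region
    raise RuntimeError("all regions unavailable") from last_error


# Hypothetical stand-in for a real network call: the primary region is down.
def fake_send(region: str) -> str:
    if region == "region-a":
        raise RegionUnavailable(region)
    return f"handled by {region}"


result = call_with_failover(["region-a", "region-b"], fake_send)
print(result)  # handled by region-b
```

A production system would typically make this decision at the load-balancing or DNS layer and base it on health checks rather than per-request errors; the loop above only shows the principle.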

When the outage occurred, our monitoring flagged the issue immediately, triggering a rapid resolution effort. Our on-call team restored these services within about 30 minutes. During this time, sellers in Ireland, the UK and the European Union (EU) experienced a brief period of 20 to 30 minutes of intermittent checkout errors, while sellers who had our offline payments feature enabled experienced minimal disruption to transactions. It’s important to note that the errors were intermittent: no one was blocked, and a second attempt was likely to go through.

While we built this regional redundancy and protection for many of our critical systems, we do not have it for all of them. Some non-payment systems, like phone support and refunds, relied solely on the affected AWS region and therefore were not fully restored until AWS brought its systems back. These are opportunities we are pursuing to further protect our systems against disruptions like this.

The AWS outage reinforced the importance of the underlying infrastructure choices we’ve made to strengthen resilience and reliability across our platform. While no system is immune to large-scale cloud disruptions, our investment in multi-regional infrastructure, combined with continued investment in offline payments, local fallback capabilities, resiliency against vendor outages, and faster response times, ensured that while much of the internet was struggling, Square sellers stayed open for business.

The Square approach to reliability

Reliability isn’t a single feature; it’s an ongoing commitment. Every incident, no matter how small, offers lessons that inform how we strengthen our systems and processes. We’ll continue to invest in the architecture, testing, and tools that help sellers keep their business running, even when the unexpected happens.

Enhanced monitoring: We maintain comprehensive monitoring that alerts on-call teams to any potential disruptions to critical seller workflows, such as payments and authentication.

Architectural safeguards: As with our multi-regional design, we routinely identify and test ways to increase the resilience of our systems against incidents like regional or third-party failures.

Coordinated response: During an outage, cross-functional teams rapidly identify the impact and implement fixes as quickly as possible to restore performance.

Offline Payments: Square allows sellers to take payments offline during an outage, minimising operational impact. If a seller enabled offline payments before an outage, they are proactively switched to an offline session during a disruption; sellers can also enable this functionality during an outage. We’re actively expanding our offline capabilities and working to bring offline reliability to more tools, including the Kitchen Display System (KDS).
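To make the offline idea concrete, the sketch below shows a generic store-and-forward pattern: a payment that cannot reach the backend is kept locally and resubmitted once connectivity returns. This is an assumption-laden illustration, not a description of how Square Point of Sale is built; OfflineQueue, FakeBackend, and the payment fields are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OfflineQueue:
    """Hypothetical store-and-forward queue for payments taken offline."""
    submit: Callable[[dict], None]  # raises ConnectionError when unreachable
    pending: List[dict] = field(default_factory=list)

    def take_payment(self, payment: dict) -> str:
        """Submit a payment, or store it locally if the backend is down."""
        try:
            self.submit(payment)
            return "processed"
        except ConnectionError:
            self.pending.append(payment)
            return "stored offline"

    def flush(self) -> int:
        """Resubmit stored payments and keep any that still fail."""
        remaining, sent = [], 0
        for payment in self.pending:
            try:
                self.submit(payment)
                sent += 1
            except ConnectionError:
                remaining.append(payment)
        self.pending = remaining
        return sent


class FakeBackend:
    """Stand-in for a payment service that can be switched on and off."""
    def __init__(self) -> None:
        self.up = False

    def submit(self, payment: dict) -> None:
        if not self.up:
            raise ConnectionError("backend unreachable")


backend = FakeBackend()
queue = OfflineQueue(submit=backend.submit)
status = queue.take_payment({"id": "p1", "amount": 500})  # backend is down
backend.up = True                                         # service restored
sent = queue.flush()
print(status, sent, len(queue.pending))  # stored offline 1 0
```

A real offline mode also has to handle concerns this sketch leaves out, such as durable local storage, duplicate protection, and limits on how long a stored payment stays valid.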

Keeping sellers running smoothly during moments like this takes planning, testing, and collaboration across many teams at Square.

Our platform includes configuration tools, monitoring capabilities, and preventative controls that minimise disruptions.