Commerce save · Pattern story

The Stripe webhook regression. Rolled back in six minutes.

A 5.7.0 patch to a widely-used WooCommerce payment gateway plugin broke webhook signature verification on a subset of stores. Orders stopped reaching processing. Carts kept filling. Customers checked out, paid, and saw the wrong confirmation page.

The setup

A small fleet of WooCommerce stores. Each one running the same gateway plugin and the same caching plugin. Different themes, different products, same payment infrastructure underneath. The plugin's maintainer shipped a 5.7.0 patch overnight. It went live on the half of the fleet that auto-updates patches without an approval gate.

What broke

The patch tightened webhook signature verification in a way that rejected Stripe's standard webhook headers when the site sat behind a specific caching plugin. Symptoms looked like this:

The site itself looked healthy. Uptime monitors reported green. There were no PHP errors. There was no error log entry that a polling dashboard would have surfaced. The thing that broke is the thing only a live checkout would have caught.

What the agent did

00:00
Plugin 5.7.0 ships. Two stores in the fleet pull it on their nightly window.
00:14
Synthetic checkout fails on store A. Payment intent succeeds, order stays pending. Agent confirms on a second pass to filter noise.
00:16
Agent identifies the change inside the rollback window: the gateway plugin 5.6.4 → 5.7.0. Snapshot present from pre-update.
00:18
Rollback applied on store A. Synthetic checkout passes. Order completes. State restored.
00:19
Cross-fleet network signal fires: rollback on 5.7.0 detected. Every other site in the fleet has the upgrade flagged with a hold.
00:22
Store B's nightly update window opens. The upgrade is blocked before it ships. Single message to Slack: "Held pending plugin update on store B due to network signal."
00:30
All affected stores back to a working state. Audit feed has the diff, the rollback, the network signal, and the hold on the remaining sites.

What would have happened without the agent

The stores would have stayed in the broken state until a customer complaint reached support, until support traced it to the order being stuck, until an engineer was paged, until the plugin update was suspected and the gateway logs were checked. Median time-to-detect on this class of regression in a dashboard-only setup is six to nine hours.

Why this is the agent's job, not the dashboard's

A dashboard shows you the state of the world. If nothing surfaces an error, the dashboard stays green. What broke here was not a state problem. It was a flow problem. The only way to know is to run the flow. That's what an agent does and a dashboard can't.