AI Trends & Insights

Why AI Is Killing the A/B Test

A/B testing was the best tool available for optimising email. AI has made it the slowest one.

Shrestha Ghosal

June 18, 2026 · 8 min read

A/B testing email campaigns became standard practice because the alternative was guessing. Send two versions, wait for significance, pick the winner, move on. It felt rigorous. For a long time, it was the most systematic approach available.

AI has changed what systematic looks like. The logic that made A/B testing the default (you need a controlled variable and an adequate sample size to learn anything) still holds. What's changed is that AI can run that process across dozens of variables simultaneously, on live data, without the two-week wait. For DTC brands managing a send calendar and a retention programme at the same time, that's a meaningful shift.

This post covers what's actually happening to email A/B testing as AI optimisation becomes more accessible, where the old approach still holds, and what this means for how ecommerce brands should think about testing going forward.

What A/B testing was actually solving


Before getting into what AI changes, it's worth being clear on what A/B testing was doing well.

At its core, A/B testing gave email marketers a way to make decisions based on evidence rather than instinct. Subject line A vs subject line B. Send time Tuesday vs Thursday. One CTA vs two. The framework was borrowed from clinical trial methodology: hold everything constant, change one variable, observe the outcome.

For ecommerce email, this worked reasonably well when:

List sizes were large enough to reach statistical significance quickly
The variable being tested had a meaningful enough effect to show up in the data
The team had time to run tests sequentially and apply learnings before the next send

The problem is that most DTC brands fail to meet at least one of those conditions. A brand with 20,000 subscribers testing subject lines on segmented sends can need weeks of repeated tests to reach significance. By the time the result is clean, the send has passed, the season has shifted, and the learning applies to a context that no longer exists.

A commonly cited rule of thumb is that a 50/50 subject line test needs at least 5,000 recipients per variant to reach 95% statistical significance. The real minimum depends on your baseline open rate and on the size of the lift you want to detect. Brands under 50,000 subscribers often can't run clean tests on anything but their largest sends.
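A per-variant minimum like 5,000 can be traced back to the standard normal-approximation sample-size formula for comparing two proportions. The sketch below applies that formula. The 20% baseline open rate and the target lifts are hypothetical. The 95% confidence and 80% power settings are conventional defaults we have assumed, not anything a specific platform enforces.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(p_control: float, p_variant: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Recipients needed in EACH arm of a 50/50 test (two-sided, two proportions)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = p_variant - p_control
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical: 20% baseline open rate, hoping to detect a lift to 22%.
print(sample_size_per_variant(0.20, 0.22))
# A smaller lift (20% -> 21%) needs roughly four times as many recipients.
print(sample_size_per_variant(0.20, 0.21))
```

Under these assumptions, a two-point lift needs about 6,500 recipients per variant. That is in the same range as the rule of thumb. Halving the lift you want to detect roughly quadruples the requirement, which is why small lists struggle.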

What AI does instead


AI optimisation in email doesn't replace the underlying logic of testing. It compresses the timeline and expands the variable set simultaneously.

Instead of sending two versions and waiting, AI models analyse past send data, engagement patterns, purchase behaviour, and segment-level signals to predict which version will perform better before the send goes out. And rather than testing one variable at a time, the model can assess combinations: this subject line with this send time for this segment, against that subject line with that send time for a different segment.
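One simple way to picture "predict before the send" is to score every subject line and send time combination, per segment, from historical results, then pick the best-scoring combination for each segment. The sketch below uses additive smoothing, which shrinks low-volume results toward the segment average. It is a generic illustration, not how Klaviyo or any particular vendor scores content. The data, segment names and `PRIOR_WEIGHT` are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical results: (segment, subject_line, send_time, sends, opens).
HISTORY = [
    ("first_time_buyer", "subject_a", "tue_09", 4000, 880),
    ("first_time_buyer", "subject_b", "tue_09", 4000, 960),
    ("first_time_buyer", "subject_b", "thu_18", 40, 12),
    ("vip", "subject_a", "thu_18", 1200, 420),
    ("vip", "subject_b", "thu_18", 1200, 372),
]

# Hypothetical: number of pseudo-sends pulling small samples toward the segment average.
PRIOR_WEIGHT = 500


def best_combo_per_segment(history):
    """Return {segment: (subject_line, send_time)} with the highest smoothed open rate."""
    totals = defaultdict(lambda: [0, 0])  # segment -> [sends, opens]
    for seg, _, _, sends, opens in history:
        totals[seg][0] += sends
        totals[seg][1] += opens

    best = {}
    for seg, subject, send_time, sends, opens in history:
        seg_rate = totals[seg][1] / totals[seg][0]
        # Smoothed open rate: low-volume combinations are shrunk toward the segment mean.
        score = (opens + PRIOR_WEIGHT * seg_rate) / (sends + PRIOR_WEIGHT)
        if seg not in best or score > best[seg][1]:
            best[seg] = ((subject, send_time), score)
    return {seg: combo for seg, (combo, _) in best.items()}


print(best_combo_per_segment(HISTORY))
```

Ranked by raw open rate, the 40-send Thursday combination (30%) would win for first-time buyers. Smoothing stops that small sample from overriding 4,000 sends of evidence. The two segments also end up with different winners, which a single global A/B result would hide.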

The practical output looks different depending on the platform. In Klaviyo, Smart Send Time already uses historical engagement data to choose when a campaign goes out. Predictive analytics estimates profile-level metrics such as predicted customer lifetime value and churn risk. More advanced AI layers, sitting on top of the platform, can take that further into content recommendations, subject line scoring, and segment-level performance forecasting.

For a fashion brand running four campaigns a month, this means the optimisation that used to require a dedicated testing calendar now runs continuously in the background. Each send informs the next one. The learning compounds without a human having to extract and apply it manually.

If you're still running manual A/B tests, start by auditing which variables you're actually testing and whether your list is large enough to reach significance within a single send window. For most brands under 30,000 subscribers, subject line tests on anything other than a full-list campaign are likely to be underpowered, and so inconclusive.
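A quick way to run that audit is to flip the sample-size question around and ask what is the smallest lift each send could actually detect. The sketch below uses a common approximation that assumes both arms share the baseline variance. The audience sizes, the 20% open rate, and the 95% confidence and 80% power defaults are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_lift(recipients: int, baseline: float,
                            alpha: float = 0.05, power: float = 0.80) -> float:
    """Smallest absolute lift a 50/50 split of `recipients` can reliably detect.

    Approximation: both arms are assumed to share the baseline variance.
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    per_variant = recipients // 2
    return z_sum * sqrt(2 * baseline * (1 - baseline) / per_variant)


# Hypothetical send audiences for one month, with an assumed 20% open rate.
sends = {"full_list": 30_000, "vip_segment": 4_000, "winback_segment": 1_500}
for name, size in sends.items():
    lift = minimum_detectable_lift(size, baseline=0.20)
    print(f"{name}: needs a lift of about {lift * 100:.1f} points to register")
```

Compare the printed lift for each send with the lifts your subject line tests typically produce. If the lifts you typically see are smaller than the printed figure, a test on that send is unlikely to be conclusive.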

Where A/B testing still makes sense


AI optimisation isn't a reason to abandon structured testing entirely. There are scenarios where a controlled A/B test is still the right tool.

Structural decisions. When you're testing something that changes the architecture of an email (a single CTA vs two CTAs, plain text vs designed, one section vs three), the decision has long-term implications that go beyond performance on a single send. A proper test with a clean holdout is worth running for decisions like these.
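A clean holdout depends on stable, effectively random assignment. Each subscriber should land in the same group for the whole test, and group membership must not depend on anything that also drives engagement. A common generic technique is to hash the subscriber ID together with the test name. The sketch below does that. The function name, IDs and 10% holdout share are hypothetical, and how you would feed the groups into your email platform will vary.

```python
import hashlib


def assign_arm(subscriber_id: str, test_name: str, holdout_share: float = 0.10) -> str:
    """Deterministically place a subscriber in 'holdout', 'A' or 'B'.

    Hashing the test name together with the ID gives a stable, effectively random
    bucket per test, so the same person always lands in the same arm.
    """
    digest = hashlib.sha256(f"{test_name}:{subscriber_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    if bucket < holdout_share:
        return "holdout"
    # Split the remaining subscribers evenly between the two variants.
    return "A" if bucket < holdout_share + (1 - holdout_share) / 2 else "B"


print(assign_arm("subscriber-001", "two_cta_vs_one_cta"))
```

Including the test name in the hash means each new structural test reshuffles subscribers. The same people don't end up in the holdout every time.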

New segments or audiences. When you're entering a new segment where you have limited historical data, AI models have less to work with. Running a structured test on a cold audience gives the model something to learn from faster.

Offer testing. Testing whether 10% off outperforms free shipping, or whether a bundle discount outperforms a single product discount, has commercial implications beyond email performance. These decisions benefit from a clean test rather than an AI recommendation built on historical patterns that may not reflect current pricing strategy.
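Reading a clean offer test usually comes down to checking whether the difference in conversion rate between the two offers is larger than chance would explain. The sketch below uses a two-proportion z-test, one standard way to do that. The conversion counts are hypothetical, and the 10% off vs free shipping labels simply mirror the example above.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical offer test: 10% off (A) vs free shipping (B), 10,000 recipients each.
z, p = two_proportion_z_test(conv_a=300, n_a=10_000, conv_b=360, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

For offers, conversion rate is only part of the commercial picture. The two offers cost different amounts, so margin per recipient is worth comparing alongside it.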

Brand voice changes. If you're shifting your tone, updating your visual language, or changing how you talk about your products, a structured test gives you a clear before-and-after rather than a gradual drift that's harder to attribute.

Don't use AI optimisation as a reason to stop measuring. The risk with automated systems is that teams stop asking why something performed the way it did. AI can tell you what works. It takes a human to ask whether what works is what the brand should be doing long-term.

What this means for your Klaviyo testing setup

For most DTC brands running on Klaviyo, the practical shift looks like this.

Manual A/B testing on subject lines and send times becomes less valuable as features like Smart Send Time and predictive analytics take over more of those decisions. The time your team spent setting up split tests, waiting for results, and logging learnings can be redirected.

What deserves more attention is the upstream work that makes AI optimisation perform better: clean segmentation, accurate behavioural data, and a send calendar that gives the model enough signal to work with. A brand sending one campaign a month doesn't give AI much to optimise. A brand with a structured send cadence, segmented by purchase behaviour and lifecycle stage, gives the model a rich data set to learn from.

The second shift is in how teams think about personalisation. A/B testing is inherently a one-to-many decision: one subject line wins, everyone gets it. AI personalisation flips that. The right subject line for a first-time buyer in your wellness segment may be completely different from the right one for a VIP customer in your beauty segment, and the model can handle both simultaneously without a human having to segment and write five separate versions manually.

Our post "How to use AI to turn one campaign into five without sending more" walks through how that personalisation workflow runs in practice, from brief to segmented send.

The brands seeing the strongest lift from AI-assisted optimisation are the ones that had a clean testing foundation to begin with. If your historical send data is messy, your segments are outdated, or your send cadence is inconsistent, AI optimisation will surface those problems rather than solve them.

The deeper shift: from testing to learning


The real change AI brings to ecommerce email testing isn't speed, though the speed matters. It's the shift from episodic testing to continuous learning.

Traditional A/B testing is episodic. You run a test, you get a result, you apply the learning, and then you run the next test. Each cycle is separate. The learnings don't automatically feed into the next decision unless a human carries them forward.

AI optimisation treats every send as a data point in an ongoing model. A wellness brand doing $5M in annual revenue might send 48 campaigns in a year. Under a manual testing framework, that's maybe 12 clean tests if the team is disciplined. Under an AI model, all 48 sends contribute to a continuously improving prediction of what works for which subscriber under which conditions.
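Thompson sampling, a standard multi-armed bandit technique, is one common way to let every send feed the next decision. We have swapped it in here as an illustration. It is not a description of how Klaviyo or any particular vendor works. The class name, subject line labels, list size and "true" open rates are hypothetical and are used only to simulate results.

```python
import random


class SubjectLineBandit:
    """Thompson sampling over subject lines, updated after every send.

    Each option keeps a Beta(opens + 1, non-opens + 1) posterior over its open rate.
    """

    def __init__(self, options, seed=None):
        self.stats = {option: [1, 1] for option in options}  # [alpha, beta]
        self.rng = random.Random(seed)

    def choose(self) -> str:
        # Draw a plausible open rate for each option and pick the highest draw.
        draws = {o: self.rng.betavariate(a, b) for o, (a, b) in self.stats.items()}
        return max(draws, key=draws.get)

    def update(self, option: str, sends: int, opens: int) -> None:
        self.stats[option][0] += opens
        self.stats[option][1] += sends - opens


# Simulated year of 48 sends to 2,000 recipients, with hypothetical true open rates.
TRUE_RATES = {"urgency": 0.18, "benefit": 0.22, "question": 0.20}
sim = random.Random(7)
bandit = SubjectLineBandit(TRUE_RATES, seed=42)
picks = []
for _ in range(48):
    choice = bandit.choose()
    opens = sum(sim.random() < TRUE_RATES[choice] for _ in range(2_000))
    bandit.update(choice, 2_000, opens)
    picks.append(choice)

print({o: picks.count(o) for o in TRUE_RATES})
```

Each send tightens the posterior for the option it used. Over time the bandit shifts most sends toward the strongest subject line while still occasionally exploring the others. Production systems typically also condition on segment and other context, which this sketch leaves out.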

For a beauty brand with a 30-day repurchase window, that compounding matters. The model gets sharper as the send history grows. The recommendations in month six are better than the recommendations in month one, not because anyone reviewed the data manually, but because the system updated itself.

That's the actual shift in how email optimisation works when AI is running the feedback loop. The question for most brands isn't whether to make the transition. The question is how well their current data and segmentation infrastructure supports it.

If you want to assess where your retention programme stands and where AI optimisation could make the biggest difference, book a free call with our team and we'll walk through your current setup.

Tags: AI, A/B testing, email optimisation, Klaviyo, ecommerce email testing, retention marketing
