How to Test Email Subject Lines Without Wasting Your List

Most brands skip subject line testing because they're afraid of wasting their list. Here's how to test without the waste.

Arpit Mehar

June 16, 2026 · 9 min read

You want to test subject lines. You know testing moves the needle. Open rates are everything. But you also know you only have so many people on your list. You can't split-test to every subscriber or you'll fatigue them.

So you don't test at all. You wing it. You write a subject line, send it, hope it works.

Most brands are in this position. They know testing matters. They're afraid of wasting their list. So they skip testing and guess.

The solution is smart testing. Not big tests. Strategic tests. Tests that move metrics without burning people out.

This post walks through the framework.

Why subject line testing matters but brands skip it

Subject lines drive 35-50% of the variance in open rates. That's huge. Same email, different subject line, completely different results.

A wellness brand we know tested two subject lines:

Version A: "Your magnesium routine"
Version B: "Why you're tired (and how magnesium helps)"

Version A: 18% open rate. Version B: 24% open rate.

Same email. Same segment. Same send time. Different subject line. 33% lift in opens.

Scaled across their list, that's thousands of extra opens per month. Thousands of extra chances for repeat purchase.
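The 33% figure is a relative lift: 6 percentage points gained on an 18% base. Here is a minimal Python sketch of that arithmetic. The function names are illustrative, and the 50,000 monthly sends is a made-up volume, not a number from the brand.

```python
def absolute_lift(control_rate: float, variant_rate: float) -> float:
    """Difference in open rate, as a fraction (0.06 means 6 percentage points)."""
    return variant_rate - control_rate


def relative_lift(control_rate: float, variant_rate: float) -> float:
    """Improvement relative to the control's open rate (0.33 means 33%)."""
    return (variant_rate - control_rate) / control_rate


def extra_opens(sends: int, control_rate: float, variant_rate: float) -> float:
    """Additional opens gained by sending the variant instead of the control."""
    return sends * absolute_lift(control_rate, variant_rate)


if __name__ == "__main__":
    # Open rates from the example above; 50_000 sends is a hypothetical volume.
    print(f"absolute lift: {absolute_lift(0.18, 0.24) * 100:.0f} points")
    print(f"relative lift: {relative_lift(0.18, 0.24):.0%}")
    print(f"extra opens:   {extra_opens(50_000, 0.18, 0.24):.0f}")
```

Keeping the two kinds of lift separate matters later: a "5% lift" can mean 5 percentage points or 5% of the baseline, and the two need very different sample sizes.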

So why don't brands test?

Because testing means sending different versions of the same email to different slices of your list. Brands see that as more sends. More sends mean list fatigue. And you need enough list size to split the test without burning people out.

Most brands think: "If I test, I'm wasting emails on people who could've gotten my best subject line." So they guess. They skip testing. They leave money on the table.

The real answer is smarter testing. Not bigger tests. Strategic tests that move metrics without the waste.

The constraint: why you can't test on your whole list

Here's the challenge: email fatigue is real.

If you send someone five emails a week, they unsubscribe. If you send them two, they stay engaged. There's a frequency sweet spot.

Every email you send costs list health. Every test you run takes list capacity.

Let's say you have a 10,000-person list. You want to test subject lines. You could split the list 50-50 (5,000 per version). You send one email. You test.

But now you've used up your weekly email quota. You can't send your replenishment flow or your promotional email. You've sacrificed sends to test.

The smarter approach: test only on the segment that needs testing. Don't test on your whole list. Test on the email you know performs best or worst. Test on the send where testing gives you the highest ROI.

The A/B test framework that works

Here's the framework:

You have one email to send. Let's say a replenishment reminder. You want to test the subject line.

Instead of sending to your whole 10,000-person list with a 50-50 split, you do this:

Segment 1 (8,000 people): Get your current best subject line (the one you've tested before). This is your control. You know this performs. You're not wasting their slot on an experiment.

Segment 2 (2,000 people): Get your new test subject line. This is your experiment. You're testing if the new one beats the control.

You send both versions. Same email. Different subject lines. Different segments.

Why this works: You're not splitting your best audience 50-50. You're using 80% on your proven winner and 20% on the experiment. If the experiment wins, great. You learned something. If it loses, you didn't waste 50% of your list.

This requires having a "current best" subject line already. Which means you need to have tested before. The first test is different (you might do 50-50). But after that, this framework is cleaner.

Control group gets your proven winner. Test group gets your experiment. This way, you're not risking half your list on an unproven subject line.
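The framework doesn't say how to pick who lands in each segment. The sketch below assumes random assignment, which is the usual way to keep the two groups comparable. The function name, the `test_share` parameter, and the subscriber IDs are all illustrative.

```python
import random


def split_control_test(subscribers, test_share=0.2, seed=42):
    """Randomly assign subscribers to a control group and a smaller test group.

    Shuffling first matters: taking "the first 2,000 rows" would bias the test
    toward however the list happens to be sorted (for example, oldest signups).
    """
    if not 0 < test_share < 1:
        raise ValueError("test_share must be between 0 and 1")
    shuffled = list(subscribers)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    test_size = round(len(shuffled) * test_share)
    return shuffled[test_size:], shuffled[:test_size]  # (control, test)


if __name__ == "__main__":
    # Hypothetical subscriber IDs standing in for a 10,000-person list.
    subscriber_ids = [f"sub_{i}" for i in range(10_000)]
    control, test = split_control_test(subscriber_ids)
    print(len(control), len(test))
```

Most email platforms can do a random split for you. The point of the sketch is only that the 20% should be a random sample, not a hand-picked segment.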

Sample sizing: how many people do you actually need

You don't need a massive list to test. You need the right size.

The question is: how many people in the test group do you need to see if the new subject line is actually better, or if the difference is just luck?

Here's the math simplified:

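For an email with a baseline 20% open rate, you need about 1,000 people per test group to reliably detect a 5-point lift (20% to 25%, which is a 25% relative lift). "Reliably" here means 80% power at a 95% confidence level, the conventional settings for this kind of calculation.

For 10% open rate emails, you need roughly 2,500 people per group to detect the same 25% relative lift (10% to 12.5%).

For 30% open rate emails (like welcome sequences), you need around 600 per group (30% to 37.5%).

In general: for the same relative lift, the higher your baseline open rate, the fewer people you need to test. The lower your baseline, the more people you need.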
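The figures above are close to what the standard two-proportion sample size formula gives when you look for a 25% relative lift (for example 20% to 25%) at 95% confidence with 80% power. Those settings are an assumption here, chosen because they reproduce the numbers in this section. The sketch uses only Python's standard library.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(baseline, relative_lift, confidence=0.95, power=0.80):
    """Approximate subscribers needed in EACH group of a two-sided test.

    Uses the normal-approximation formula for comparing two proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


if __name__ == "__main__":
    for baseline in (0.10, 0.20, 0.30):
        n = sample_size_per_group(baseline, relative_lift=0.25)
        print(f"{baseline:.0%} baseline: about {n} per group")
```

Smaller lifts need much larger groups, because the required size grows with the inverse square of the difference you want to detect. Try `relative_lift=0.10` to see how quickly the numbers climb.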

If you have a 5,000-person list, you can't realistically test emails with 10% open rates: you'd need 2,500 people per group, which means committing the entire list to a 50-50 split. But you can test welcome sequences or high-engagement segments.

If you have a 20,000-person list, you have room to test almost anything.

A food brand we know has a 15,000-person list. Their replenishment emails get 25% open rates. They need about 800 people per test group to call a winner. So they test with two groups of 1,000 each. That leaves 13,000 for their control group. Easy.

The math is simple but powerful: bigger lists can test more. Smaller lists need to be strategic about what they test.

If your list is under 5,000, test on high-engagement segments only. Don't test on low-engagement emails where you'd need 2,500+ people per group.
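A rough feasibility check, using the per-group sizes quoted in this section and an 80/20 split like the one in the framework above. The lookup table, function name, and `test_share` parameter are illustrative. Applying the per-group number to the smaller test group is a conservative reading, since a larger control group only adds statistical power.

```python
# Per-group sizes quoted in this post, keyed by baseline open rate.
REQUIRED_PER_GROUP = {0.10: 2500, 0.20: 1000, 0.25: 800, 0.30: 600}


def can_test(list_size, baseline, test_share=0.2):
    """Return True if the test slice of the list is big enough to call a winner.

    test_share is the fraction of the list that gets the experimental subject
    line; 0.2 mirrors the 8,000 / 2,000 split described earlier.
    """
    test_group = round(list_size * test_share)
    return test_group >= REQUIRED_PER_GROUP[baseline]


if __name__ == "__main__":
    for list_size in (5_000, 20_000):
        for baseline in sorted(REQUIRED_PER_GROUP):
            verdict = "ok" if can_test(list_size, baseline) else "too small"
            print(f"list {list_size:>6}, {baseline:.0%} open rate: {verdict}")
```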

Choosing what to test vs. what to skip

Not every email is worth testing.

Some emails have high baseline engagement. Testing here is high-value. You'll find winners fast.

Some emails have low engagement. Testing here wastes list. Even if you find a 10% relative improvement, the baseline is so low that the absolute gain is tiny: a 10% open rate becomes 11%.

Test these emails:

Welcome sequences. 25-35% open rates. Testing wins big.

Post-purchase automation. 20-30% open rates. High-value testing.

Replenishment reminders. 20-40% open rates depending on category. High-value testing.

Browse abandonment. 15-25% open rates. Worth testing.

Skip testing on these:

Re-engagement/win-back emails. 8-12% open rates. Even a 10% relative improvement is only about one percentage point in absolute terms. The people receiving these emails are already at risk of leaving. Don't experiment on them.

Promotional blasts. If a send already carries a high unsubscribe risk, don't test. Just send your best guess.

Low-engagement segments. If a segment has 5% open rates, testing is inefficient. Fix the segment first (improve relevance, send frequency). Then test.

What to test:

Subject line length. Short vs. long.

Personalization vs. generic. "Hi John" vs. no name.

Question vs. statement. "Ready to reorder?" vs. "Time to restock."

Curiosity vs. benefit. "We have a surprise" vs. "Save 20% today."

Emoji vs. no emoji. The same subject line, such as "Your order is here," with and without an emoji.

What NOT to test:

Misspellings. Never.

Spam trigger words. Never. This tanks sender reputation.

Brand name variations. Use one brand name consistently.

Multiple variables at once. Test one thing. Not subject line length AND personalization together.

Pick one variable per test. Change only that variable. Everything else stays the same. This is how you isolate what actually moved the needle.
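One way to enforce the one-variable rule is to tag each subject line with its attributes and confirm that exactly one tag differs before you send. The sketch below is illustrative: the attribute names are hypothetical, and you would label your own subject lines.

```python
def changed_variables(control: dict, variant: dict) -> list:
    """List the attributes whose values differ between two subject lines."""
    keys = sorted(set(control) | set(variant))
    return [k for k in keys if control.get(k) != variant.get(k)]


def is_clean_test(control: dict, variant: dict) -> bool:
    """A clean test changes exactly one attribute and nothing else."""
    return len(changed_variables(control, variant)) == 1


if __name__ == "__main__":
    # Hypothetical attribute tags for three subject lines.
    control = {"length": "short", "personalized": False, "emoji": False}
    good = {"length": "short", "personalized": True, "emoji": False}
    bad = {"length": "long", "personalized": True, "emoji": False}
    print(changed_variables(control, good), is_clean_test(control, good))
    print(changed_variables(control, bad), is_clean_test(control, bad))
```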

The testing sequence: test one thing at a time

Here's how to build a testing calendar:

Month 1: Test subject line length for welcome emails. Short (6-8 words) vs. long (10-15 words).

Month 2: Test personalization on replenishment reminders. "Hi John, time to restock" vs. "Time to restock."

Month 3: Test questions vs. statements on post-purchase emails. "How are you loving this?" vs. "We'd love your feedback."

Month 4: Test emoji on browse abandonment. Subject with emoji vs. without.

You're building a knowledge base. Each month, you test one variable on one email. You learn what works. You apply the winner to future sends. You move on.

After a few months, you have a compound effect. You've optimized welcome sequences, replenishment, post-purchase, and browse abandonment. Your overall open rates can end up 15-20% higher than when you started.

This is the power of sequential testing. Not all at once. One thing at a time. Month after month.

A fashion brand we know did this for six months. They tested subject line length, personalization, questions vs. statements, emoji, curiosity gaps, and benefit statements. By month six, their average open rate was up 18%. Same size list. Same segments. Just better subject lines.

Reading the results: statistical significance explained

You sent the test. You have results. How do you know if the winner is actually better or just lucky?

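Statistical significance tells you how likely it is that the gap you saw reflects a real difference and not random noise. A loose way to think about it: if I did this test 100 times, would the same winner come out ahead most of the time?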

If yes, the winner is significant. Keep using it.

If no, you don't know if it's a real win or random variance. Keep testing.

Most email platforms (Klaviyo, Omnisend) calculate this for you. They tell you if a winner is statistically significant or not.

The threshold is usually 80% confidence. That means you have 80% certainty the winner is real. (Some platforms use 90% or 95%.)

If your results show 75% confidence, the winner isn't proven yet. Send another test.

If your results show 85% confidence, the winner is real. Use it going forward.

A wellness brand tested two subject lines on 1,200 people per group. Results:

Version A: 22% open rate
Version B: 25% open rate

Difference: 3 percentage points. But was it luck?

Klaviyo said 82% confidence. The winner is real. They started using Version B for all replenishment emails going forward.

Three months later, their replenishment emails were consistently opening 2-3 percentage points higher because of that one test.

Statistical significance = proof that your test winner is real, not luck. Usually 80%+ confidence is the threshold. If you don't hit that threshold, keep testing.
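If your platform doesn't report significance, a textbook two-proportion z-test gives a rough check. This sketch is not how Klaviyo or Omnisend compute their figures. Platforms use their own methods, so the result will generally differ from the confidence number a platform shows. The open counts are derived from the 1,200-per-group example above (22% and 25% of 1,200).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_test(opens_a, sent_a, opens_b, sent_b):
    """Two-sided z-test for a difference in open rates.

    Returns (z, p_value). A small p-value means a gap this large would be
    unlikely if both subject lines truly performed the same.
    """
    rate_a, rate_b = opens_a / sent_a, opens_b / sent_b
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    z = (rate_b - rate_a) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


if __name__ == "__main__":
    # 264 and 300 opens out of 1,200 sends each.
    z, p = two_proportion_test(264, 1200, 300, 1200)
    print(f"z = {z:.2f}, p = {p:.3f}")
```

A common convention is to call a result significant when the p-value is below 0.05. A looser bar, like the 80% threshold mentioned above, accepts more risk that the winner is noise.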

Real examples: subject lines that moved the needle

A fashion brand tested "Your winter collection has arrived" vs. "20% off new arrivals (this week only)."

First one: 19% open rate. Second one: 26% open rate.

Result: 37% lift. They switched all promotional emails to benefit-forward subject lines.

A wellness brand tested "Quick question: how's your sleep?" vs. "Your sleep is suffering. Here's why."

First one: 23% open rate. Second one: 19% open rate.

Result: Question-based won. They stopped using negative framing.

A food brand tested "Your meal kit is ready" vs. "Dinner is waiting (20 minutes to prep)."

First one: 28% open rate. Second one: 32% open rate.

Result: Time urgency won. Now all post-purchase reminders include prep time.

The pattern: benefits, questions, and urgency outperform generic or negative subject lines. But you have to test to know what works for YOUR audience.

Testing subject lines doesn't have to waste your list. It's about choosing the right emails to test, using control groups for your proven winners, and testing one variable at a time.

Start with your highest-engagement emails. Welcome sequences, post-purchase, replenishment. Test on these. You'll see wins fast. Build momentum. Then expand to other emails.

If you want help setting up a testing calendar for your email program or want guidance on what to test first, let's talk. We run email testing for dozens of D2C brands and we know which tests move metrics most. See how we've helped brands across wellness, fashion, and food and beverage improve open rates through strategic testing.
