Split Testing: Interpreting An Example

We talked about split testing a while back. However, I didn’t have a sample split test to refer you to at the time.

So, I went back and found one. Let’s take a look at a split test, what was varied, and what we might infer from our results.

The following stats are from a message that we sent out to our AWeber Test Drive subscribers to inform them about a new article on our website. Open percentages for each appear in the right-hand column.


The complete subjects were:

  • Learn How to Get More Customers from Free Downloads
  • {!firstname_fix} Learns How to Convert Free Downloads to Customers
  • Converting More Free Downloads to Paid Customers
  • Conversion Secrets for Free Downloads to Paid Customers

Looking at the open rate statistics, we see that the message with the subject “Conversion Secrets for Free Downloads to Paid Customers” garnered the best open rate, at 20.6%.

So What Do We Learn From This?

First of all, all four messages were sent at the same time, so differences in send date and time did not contribute to the difference in open rates. Also, the content of the messages is identical, so any effects due to content filtering would be based on the subject only, which is what we’re testing.

The use of the word “Secrets” may have contributed to a greater open rate by implying that the information in the message is not widely known, and is valuable due to that scarcity.

I attribute the success of the next-best subject to personalization.

Including the recipient’s first name didn’t get us as high an open rate as using the word “Secrets,” but it did get a better open rate than using neither “Secrets” nor personalization.

A future message might use a subject that includes both personalization and a psychological trigger such as the word “Secrets” to maximize open rates.


  1. Steve Seltzer

    11/8/2006 2:53 pm

    Be careful when split testing that your sample size is valid.

    As a real eye-opener, try split testing identical pages. The results will NOT be evenly distributed.

    This means that you want to see an ever-increasing difference between your test pages, and you want the sample size to be sufficiently large (I use at least 250 hits for at least one of the pages).

    I’m not sure how much confidence I would give to Justin’s example as the percentage differences between the four groups are so small. The critical question is: How large was the sample size?
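The point about identical pages can be checked with a quick simulation. The following is a minimal sketch, not anyone's actual test: the function name, the 20% underlying rate, and the seeds are all assumptions chosen for illustration. Both simulated pages share exactly the same true rate, so any gap between them is pure chance.

```python
import random


def simulate_aa_test(visitors_per_page, true_rate, seed):
    """Simulate two IDENTICAL pages and return their observed conversion counts.

    Both pages use the same true_rate, so differences are random noise only.
    All parameters are hypothetical inputs for illustration.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(2):
        # Each visitor converts independently with probability true_rate.
        count = sum(1 for _ in range(visitors_per_page) if rng.random() < true_rate)
        counts.append(count)
    return counts


# Five repeated runs at 250 hits per page (the minimum suggested above).
for seed in range(5):
    a, b = simulate_aa_test(visitors_per_page=250, true_rate=0.20, seed=seed)
    print(f"run {seed}: page A {a / 250:.1%}, page B {b / 250:.1%}")
```

Running this a few times typically shows the two "pages" landing a point or more apart even though nothing differs between them, which is why a small gap on its own says little.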

  2. Justin Premick

    11/8/2006 3:03 pm


    You’re absolutely correct in stating that a sample size must be sufficiently large before we start testing and drawing conclusions. After all, if you send a message to 10 people, and 2 of them open it, that’s a 20% open rate, but it’s still only 2 opens.

    The sample size in this test is large enough to draw statistically meaningful conclusions. While I can’t disclose precise numbers, I can tell you that each message in the split test above was sent to and opened by thousands of subscribers.
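To see why 2 opens out of 10 tells us so little, it helps to put a range around the open rate. Here is a rough sketch using the Wilson score interval, which is one common way to do this and not necessarily what was used for the test above. The function name is made up, and the 2,000-of-10,000 figures are hypothetical stand-ins, since the real counts were not disclosed.

```python
import math


def wilson_interval(opens, sent, z=1.96):
    """Approximate 95% Wilson score interval for an open rate (z=1.96)."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    p = opens / sent
    denom = 1 + z**2 / sent
    center = (p + z**2 / (2 * sent)) / denom
    half_width = z * math.sqrt(p * (1 - p) / sent + z**2 / (4 * sent**2)) / denom
    return center - half_width, center + half_width


# Same 20% open rate, very different certainty (second pair is hypothetical).
for opens, sent in [(2, 10), (2000, 10000)]:
    low, high = wilson_interval(opens, sent)
    print(f"{opens}/{sent}: plausible open rate {low:.1%} to {high:.1%}")
```

With 10 recipients the plausible range runs from roughly 6% to 51%; with 10,000 it narrows to within about a point of 20%.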

  3. Robert

    11/13/2006 5:28 am

    Justin!

    You are in the business of selling, babe!

    Try to become a direct marketing genius!

    How to Get More Customers from Free Downloads
    {!firstname_fix} How to Convert Free Downloads to Customers
    Here people just think “old shit, we use it all the time, nothing new to us,” while most of them don’t actually do it.

    How to Convert More Prospects to Paid Customers by Simply Doing One Easy Thing All People Can Do, but Most Don’t
    (Yes, you got it! By using downloads.)

    Conversion Secrets to Get More Paid Customers Than You Can Possibly Handle
    (Does this sound great?)

    Read all the Dan Kennedy stuff you can get.

  4. Steve Shepherd

    11/14/2006 6:26 pm


    I was looking at a pretty cool PHP split testing product that uses the Taguchi method, and its developer talked about how to measure whether a difference is statistically significant.

    He used the square root of the sample size as the value that determines whether a difference is significant.

    i.e., the square root of 100 is 10, which is a 10% difference.
    However, at 2,400 samples you only need a 2% difference for it to be significant.

    The bigger the sample the easier it is to get a significant result.
    Hope this helps.

    Steve Shepherd
    founder of theexclusive.info website
    PS. I use aweber all the time and it is FANTASTIC!!
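The arithmetic in that rule of thumb is easy to sketch. This is one reading of the comment above (required difference = square root of the sample size, divided by the sample size), with a made-up function name. It is a shortcut for building intuition, not a formal significance test.

```python
import math


def sqrt_rule_threshold(sample_size):
    """Return sqrt(n) / n: the fractional difference the rule of thumb asks for."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return math.sqrt(sample_size) / sample_size


# The two sample sizes mentioned in the comment, plus a larger one.
for n in (100, 2400, 10000):
    print(f"{n} samples: need about a {sqrt_rule_threshold(n):.1%} difference")
```

For 100 samples this gives 10%, and for 2,400 it gives about 2%, matching the figures quoted in the comment.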

  5. Justin Premick

    11/15/2006 11:24 am


    Thanks for that.

    I can’t really comment on how accurate that method is compared to using standard deviations, but it does seem a lot quicker/easier to use than standard deviations, so if you’ve found it to be accurate, I say run with it!
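For comparison, the standard-deviation approach can be sketched as a two-proportion z-test. This is a minimal example under stated assumptions: the function name is invented, and every count below is hypothetical, because the real send and open numbers for this split test were not disclosed.

```python
import math


def two_proportion_z_test(opens_a, sent_a, opens_b, sent_b):
    """Return (z, two-sided p-value) for the difference between two open rates."""
    p_a = opens_a / sent_a
    p_b = opens_b / sent_b
    # Pooled rate: assumes both subjects share one true open rate (null hypothesis).
    pooled = (opens_a + opens_b) / (sent_a + sent_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / sent_a + 1 / sent_b))
    if std_err == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (p_a - p_b) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 20.6% vs 19.0% open rates, 5,000 recipients each.
z, p = two_proportion_z_test(1030, 5000, 950, 5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A p-value below 0.05 is the conventional cutoff for calling a difference significant. With these made-up numbers the result sits close to that line, which shows how much the answer depends on the actual counts.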

  6. Sten

    11/16/2006 12:41 pm

    I’ve created a small script that computes a value so you can know whether the difference is significant or not in an A/B split test like you’re talking about.

    It is free and available here:

    (Explanation on that page.)


  7. Mo

    1/18/2007 7:58 pm

    I saw an article titled "16 Tests (and Results) to Improve Email Response Rates" that also discusses using testing to find the best format. The results it shares are interesting to check out: http://www.marketingsherpa.com/sample.cfm?ident=29840

  8. Gordon

    6/9/2007 11:11 am

    Interesting article and I like the idea of testing …. testing …. testing ….

    But I have to say, with my cynic’s hat on, I’m not all that impressed by the different open percentages. They’re all remarkably close to each other, and I’m pretty certain they’re not different enough to get anywhere near statistical significance.

    I’m not knocking the idea – just doubting the stats before everyone goes crazy including the word “Secrets” in all their postings.