Google Website Optimizer Case Study: Daily Burn, 20%+ Improvement


This post will show exactly how one start-up improved their homepage conversion rate (visitor to sign-up flow) more than 20%, then 16% again, with a few simple changes and Google Website Optimizer.

After reading this, you will know more about split-testing than 90%+ of the consultants who get paid to do it…

There are a few advanced concepts, but don’t be intimidated; just use what you can and ignore the rest.

Along with Founders Fund (Dave McClure), Garrett Camp (CEO, StumbleUpon), and others, I am an investor in Daily Burn, one of the premier diet and exercise tracking sites.

After investing, my first priorities were introducing them to Jamie Siminoff, who taught them how to purchase the domain name for DailyBurn (Jamie’s method is described here), and looking at their conversion rates for the homepage and the sign-up process (sign-up flow to completion of sign-up). This post covers the homepage, since the sign-up process cannot happen without it.

The first step was simple: remove paradox of choice issues.

Below is the homepage prior to tweaking. The bottom of the screen–the “fold”–was right around the second user under the running calorie counter.



Offering two options instead of six, for example, can increase sales 300% or more, as seen in the print advertising example of Joe Sugarman from The 4-Hour Workweek. Joe was, at one time, the highest-paid copywriter in the world, and one of his tenets was: fewer options for the consumer.

DailyBurn (DB) was two founders at that point in our conversation, so instead of suggesting time-consuming redesigns, I proposed a few cuts of HTML, temporarily eliminating as much as possible that distracted from the most valuable click: the sign-up button.

Here is the homepage after reducing from 25 above-the-fold options to 5 options and raising the media credibility indicators. Note the removal of a horizontal navigation bar. The “fold” now ends just under the “Featured On”:

The results?

Test 1 Conversion Rates: Original (24.4%), Simplified (29.6%), Observed Improvement (21.1%)
Test 2 Conversion Rates: Original (18.9%), Simplified (22.7%), Observed Improvement (19.8%)

Conclusion: Simplified design improved conversion by an average of 20.45%.
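
For anyone who wants to check the arithmetic, here is a minimal sketch of how the "observed improvement" and the average are calculated. The function name is my own, not something from GWO. Note that recomputing from the rounded conversion rates shown above gives slightly different figures (about 21.3% and 20.1%) than the reported ones, presumably because the reported numbers were calculated from unrounded data.

```python
def relative_improvement(original_rate, new_rate):
    """Relative lift of new_rate over original_rate, in percent."""
    return (new_rate - original_rate) / original_rate * 100

# Conversion rates (in percent) as reported for the two tests above.
tests = {
    "Test 1": (24.4, 29.6),
    "Test 2": (18.9, 22.7),
}

for name, (original, simplified) in tests.items():
    lift = relative_improvement(original, simplified)
    print(f"{name}: {lift:.1f}% relative improvement (from rounded rates)")

# The conclusion above averages the two reported improvements.
reported_improvements = [21.1, 19.8]
average = sum(reported_improvements) / len(reported_improvements)
print(f"Average of reported improvements: {average:.2f}%")
```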

To further optimize the homepage, I then introduced them to Trevor Claiborne on the Google Website Optimizer (GWO) team, as I felt DB would make a compelling before-and-after example for the product. Trevor then introduced DB and me to David Booth at one of GWO’s top integration and testing firms, WebShare Design.

Why not just use Google Analytics?

David will address this in some detail at the end of this post, but here are the three benefits that Google Website Optimizer (GWO) offers over Google Analytics (GA):

- GWO offers integrated statistics – is new version B better by chance or better because it’s better?
- GWO splits traffic – half of traffic runs to A, half of traffic runs to B (if A/B test); it also ensures, using cookies, that a returning visitor will see the same variation
- GWO really tracks visitors – GA works on the idea of a session (a person bounces around on the site for a bit and leaves, which is considered a “session”; if they return, that is generally a new session); GWO uses unique visitors (no matter how many visits, they’re counted as one visitor, assuming they don’t delete cookies). On a fundamental level, it’s the difference between visits and visitors. This is critically important for determining if your result is statistically valid, as ten people and ten visits by one person are not the same.
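
To make the last two points concrete, here is a rough sketch of sticky traffic splitting and of counting visitors rather than visits. This is not how GWO is implemented (it relies on cookies and its own scripts); hashing a visitor ID is simply a stand-in that gives the same person the same variation every time. All names and the sample log are hypothetical.

```python
import hashlib
from collections import defaultdict

VARIATIONS = ["A", "B"]  # hypothetical A/B test

def assign_variation(visitor_id, variations=VARIATIONS):
    """Deterministically map a visitor ID (e.g. a cookie value) to a variation."""
    digest = hashlib.sha256(visitor_id.encode("utf-8")).hexdigest()
    return variations[int(digest, 16) % len(variations)]

def summarize(page_views):
    """page_views: iterable of (visitor_id, converted) pairs, one per visit."""
    visits = defaultdict(int)
    visitors = defaultdict(set)
    converters = defaultdict(set)
    for visitor_id, converted in page_views:
        variation = assign_variation(visitor_id)
        visits[variation] += 1
        visitors[variation].add(visitor_id)
        if converted:
            converters[variation].add(visitor_id)
    return {
        v: {
            "visits": visits[v],
            "visitors": len(visitors[v]),
            "conversions": len(converters[v]),
        }
        for v in visits
    }

# Ten visits by one person: ten visits, but only one visitor.
log = [("visitor-1", False)] * 9 + [("visitor-1", True)]
print(summarize(log))
```

A session-based count would treat those ten visits as ten data points; a visitor-based count treats them as one, which is what the significance math needs.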

GA can do a lot of what GWO does, but you need to do a lot of custom work and intricate number crunching to make it work.

Enter Google Website Optimizer

The following is a report of the WebShare / Gyminee Website Optimizer landing page test, and includes a description of the test that was run as well as analysis of the test results. This report was authored by David Booth, to whom, and to whose team, DB and I owe a debt of gratitude. I’ve included my (Tim’s) notes in brackets [ ]. Don’t be concerned if some of the graphics are hard to read, as the text explains the findings.

1. Test Description

The landing page identified for this test was:
http://www.gyminee.com

This A/B/C test included three distinct page versions, including the original (control) homepage as well as two variations designed with conversion marketing best practices in mind:

Original (control)

[same as simplified version above]

Variation B

Variation C

2. Test Results and Analysis

During the first run of the experiment the test saw ~7,500 unique visitors and just under 2,000 conversions over the course of about 2 weeks. When the experiment was concluded, both variations B and C had outperformed the original version, and specifically Version B left little statistical doubt that it had substantially increased the likelihood that a visitor would convert, or sign up for the Gyminee service.



We can see from the analysis of the data that Variation B had a large and significant effect on improving conversion rate. The winning version outperformed the original by 12.7%, with a statistical confidence level of better than 98%. [This means there is less than a 2% likelihood that you would duplicate these results by chance, which can also be called a p-value of <0.02]

Interesting to note is that the B version, which does not have a “take a tour” button, nor horizontal navigation bar, performed a few percentage points better than their current, more polished design which does offer both.

A follow-up experiment was then launched in order to provide more data and ensure that these results were repeatable. The follow-up experiment was conducted as an A/B experiment between the original and Variation B, and ran for approximately 1 week, over which time almost 6,000 unique visitors and ~1,400 conversions were recorded.

The results of this follow-up experiment showed that Variation B outperformed the original by 16.2%, with a statistical confidence level of better than 99%.

Further analysis concludes the following:

* The absolute difference in conversion rates between Variation B and the original during the test was 3.7%.
* During the test, Variation B’s conversion rate was 16.17% greater than that of the Original design.
* The p-value used in these calculations was <0.01, corresponding to a confidence level of >99%.
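
One common way to arrive at numbers like these is a two-proportion z-test. The sketch below is an illustration, not the calculation GWO or WebShare actually ran. The visitor and conversion counts are hypothetical round numbers picked to give roughly the 3.7-point absolute difference reported above; they are not the real test data.

```python
from math import sqrt, erfc

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_error
    # erfc(|z| / sqrt(2)) equals the two-sided tail area of the standard normal.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value

# Hypothetical counts: 3,000 visitors per version.
original_conversions, original_visitors = 687, 3000      # 22.9%
variation_b_conversions, variation_b_visitors = 798, 3000  # 26.6%

z, p_value = two_proportion_z_test(
    original_conversions, original_visitors,
    variation_b_conversions, variation_b_visitors,
)
absolute_diff = (variation_b_conversions / variation_b_visitors
                 - original_conversions / original_visitors)
relative_diff = absolute_diff / (original_conversions / original_visitors)

print(f"absolute difference: {absolute_diff:.1%}")
print(f"relative difference: {relative_diff:.1%}")
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With counts in this range, the p-value falls below 0.01, which is in line with the >99% confidence level quoted above.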

The Bottom Line: The results of this experiment were extremely successful.

To put these test results in plain terms, there is a 98% chance that the true difference between the conversion rates of these versions is between 7.8% (1.8% raw) and 24.5% (5.6% raw).

3. Supporting Analysis (A/B/C Test Only)



A Pearson Chi Square test answers the question: “Out of all the combinations, is any one combination better than another?”

The values here tell us that with >95% confidence, at least one variation was statistically better than another. This further validates the conclusions drawn by Google Website Optimizer.
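
As a rough illustration of what this test does, here is a minimal Pearson chi-square calculation for a three-variation test. The counts are hypothetical, chosen only to resemble a test with about 7,500 visitors and 2,000 conversions; they are not the actual Gyminee numbers. The p-value shortcut used here is exact only for 2 degrees of freedom, which is what a 3x2 table has.

```python
from math import exp

def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if conversion were independent of the variation.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Hypothetical counts: [conversions, non-conversions] per variation.
table = [
    [630, 1870],  # Original
    [710, 1790],  # Variation B
    [660, 1840],  # Variation C
]

chi2, dof = pearson_chi_square(table)
# For exactly 2 degrees of freedom, the upper tail is exp(-chi2 / 2).
p_value = exp(-chi2 / 2)
print(f"chi-square = {chi2:.2f}, dof = {dof}, p = {p_value:.3f}")
```

A p-value under 0.05 here says only that at least one variation differs from the others; it does not say which one, so pairwise comparisons are still needed.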

Was Version C statistically better than the Original?

At an acceptable level of statistical confidence, it was not. However, had we continued to run this test for a longer time period, it is very likely that we would have eventually proven that it was indeed better than the original with >95% statistical confidence. The estimated sample size needed to prove this would have been an additional ~21,000 unique visitors (~7,000 for each variation).

The table below shows you the various sample sizes you would need at different confidence levels to show different relative improvements [Tim: this is my favorite table in this analysis]:
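
A table of this kind can be approximated with the standard sample-size formula for comparing two proportions. The sketch below is not how WebShare produced their figures. It assumes a 25% baseline conversion rate, 80% statistical power, and a two-sided test, none of which are stated in the report, so treat the output as ballpark only.

```python
from math import ceil
from statistics import NormalDist

def visitors_per_variation(baseline_rate, relative_lift,
                           confidence=0.95, power=0.80):
    """Approximate visitors needed per variation to detect a relative lift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - (1 - confidence) / 2)  # two-sided test
    z_beta = normal.inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

BASELINE = 0.25                   # assumed baseline conversion rate
LIFTS = (0.05, 0.10, 0.20)        # relative improvements to detect
CONFIDENCE_LEVELS = (0.90, 0.95, 0.99)

print("confidence  " + "  ".join(f"{lift:>6.0%} lift" for lift in LIFTS))
for confidence in CONFIDENCE_LEVELS:
    sizes = [visitors_per_variation(BASELINE, lift, confidence) for lift in LIFTS]
    print(f"{confidence:>10.0%}  " + "  ".join(f"{n:>11,}" for n in sizes))
```

The pattern matters more than the exact numbers: halving the improvement you want to detect roughly quadruples the traffic you need, and higher confidence levels push the requirement up further.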

Was Version B statistically better than Version C?

We can be approximately 94.1% certain that Version B is also better than Version C. After applying a Bonferroni correction for the test set, we would still be >90% confident that Version B is better than Version C. The p-value for these calculations is 0.059.

Recommendations:

As Version C did test well, and we believe would have eventually proven itself better than the Original, it is very likely that certain elements of Version C resonated well with visitors to the Gyminee website.

To continue down this path of testing, we would recommend using the winning Version B as a test page for a multivariate experiment. In this experiment, we would suggest testing certain page elements from Version C in the framework of Version B.

Additionally, as testing only covered the homepage, we would highly suggest performing testing on the form found at:
https://www.dailyburn.com/signup

Many concepts such as calls to action, layout, design, contrast, point of action assurances, forms & error handling, and more could be used to increase the likelihood that a user enters information and submits the form.

Lastly, it may be beneficial to begin running tests where the conversion is measured as the paid upgrade. As this conversion rate is much lower than the free sign-up, it should be understood that all other things held equal these tests could take significantly longer to run to completion.

Google Website Optimizer vs. Google Analytics – Parting Thoughts

From David Booth, whose team performed and compiled the above:

1) GA doesn’t have any capability of doing statistical analysis to compare two groups (and it’s not meant to), but it can collect all the data you would need with the best of them. GWO records data very differently and is not meant as (and should never be used as) an analytics package. It runs the stats for you and tells you when you have a statistically significant difference between variations/combinations, but is limited to a single goal or test.

2) The real beauty is to integrate GWO with GA – this gives you the best of both worlds by letting each tool do what it was built to do. You can use GWO to create the test, split traffic, and crunch the numbers for your primary goal, and you can then pull the data out of GA on anything you have configured and run the numbers in a stats package like JMP or Minitab. A very useful case for this is an ecommerce purchase: GWO can tell you if one version / combination was more likely to get an ecommerce purchase (binary – they either purchase or they don’t), while GA data can record things like revenue, and running a different statistical analysis can tell you if one version was more likely to make you more money.
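
As one example of such a "different statistical analysis," here is a sketch of a permutation test on revenue per visitor. David does not say which test he would use, so this is just one assumption-light option, and the revenue figures are made up for illustration. Each list holds one number per unique visitor, with 0 for visitors who did not buy.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(revenue_a, revenue_b, n_permutations=2000, seed=0):
    """Return (observed difference in mean revenue, two-sided p-value)."""
    rng = random.Random(seed)
    observed = mean(revenue_b) - mean(revenue_a)
    pooled = list(revenue_a) + list(revenue_b)
    n_a = len(revenue_a)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the version made no difference,
        # any split of the pooled revenue is equally likely.
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, extreme / n_permutations

# Hypothetical revenue per visitor (in dollars) for 100 visitors per version.
revenue_a = [0] * 90 + [20] * 10           # 10 purchases at $20
revenue_b = [0] * 88 + [20] * 8 + [60] * 4  # 12 purchases, some larger

observed, p_value = permutation_test(revenue_a, revenue_b)
print(f"difference in revenue per visitor: ${observed:.2f}, p = {p_value:.3f}")
```

The point of the example is that two versions can have similar purchase rates and still differ in revenue, which a binary conversion goal alone will not show.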

###


Posted on: August 12, 2009.



118 comments on “Google Website Optimizer Case Study: Daily Burn, 20%+ Improvement”

  1. Great post.

    Although, two quick points.

    Correct me if I am wrong, but Website Optimizer is using Z-scores not Chi Squares? Personally, I prefer Confidence Intervals because it is easier to explain to people and for people who don’t know statistics to see what is going on. Of course, these are three ways to do the same thing–find statistically significant differences. Although, like you showed in your own Chi Square, if it is anything but 2X2, you have to do another step to find which comparison is statistically significant. So, it isn’t always the most efficient way to get results.

    The second point has to do with your comment regarding sample size and significance. I see this all the time with people talking about A/B testing and multivariate testing. While it is true that adding more samples will provide significance, it kind of defeats the purpose. Sure, if you only had a couple hundred samples, then sure, adding more people would be beneficial. But at a certain point, adding more participants simply waters down the results. Adding 21,000 respondents is just crazy. The p-value still tells you something if the result isn’t significant. Mainly, that the signal-noise ratio is weak. No result is still a result, i.e., you can’t reject the null hypothesis. Most likely, once you hit about 1,500 samples, you’ve pretty much reached a point of diminishing returns. Adding 7,000 more samples for each variation simply gives the illusion of significance when none really exists.


  2. It’s really fascinating to see the results of this split test – thanks for sharing. I’m intrigued that since the test the homepage has been switched to a design that performs less well than version B – does it generate less sales but more revenue? Or is it a branding decision that made a negligible impact on the bottom line so was ultimately worth pursuing?


  3. Indeed some very valid points – I am going to forward this to my SEO guy – (he hates it when I do that) – but education is everything when it comes to online marketing.


  4. As a psychology researcher I can say without question that the conclusions drawn from this testing are junk. To conclude that, in your words,

    “had we continued to run this test for a longer time period, it is very likely that we would have eventually proven that it was indeed better than the original with >95% statistical confidence.”

    is lazy and unscientific. It’s also unethical of you to present your findings in the way that you do when they don’t prove what you claim. It’s pretty obvious that any measured effects are likely random, or that you have confounds at work. The fact that one could easily remove cookies and re-participate or see different variations should also call into question such things as,

    -Who were the participants?
    -How were they recruited?
    -How much information about the study did they have prior to participating?

    It’s entirely possible that there are other variations of sign up pages which would yield significant results but you’ll never find them conducting your “research” this way.

    You can drop statistical terms all you want, the fact is the design of this and much of A/B testing is broken from a methodological standpoint. No amount of futzing with the numbers is going to change that. Futzing with the numbers will only mislead your readers.


  5. @edward Do some psychological research on why you’re a dick?

    The stats seem legit + the methodology behind it.

    Thanks for the article!


  6. Great article on split-testing Tim, very informative. I completely agree that it is vital to integrate Google Website Optimizer and Google Analytics as this will give you the complete picture of the results you receive and is also useful for long term analysis.
