Transcript
This transcript was autogenerated.
Hello everyone. My name is Bill Code and my talk is The Art and Science of A/B Test Development. What I'll be covering is a brief introduction of who I am, an introduction to the A/B testing process, A/B test developer strategies, some developer tips, what happens post launch and how to iterate, and some parting advice.
So currently I am a front-end optimization developer at Lovevery, which is a company that offers stage-based play kits and digital products for early learning. Prior to this I worked at several different conversion rate optimization agencies as a developer.
Some things that I like. I am a
very amateur home cook, big fan of electronic music
and getting into student filmmaking. I love video games and am also a huge hockey fan. So that's enough about me.
Let's talk about what A/B testing actually is. In its simplest form, it's a method of showing several different variations of a web page or application to visitors at random, and comparing which variant converts better. We also call these tests experiments. You may also know this process as split testing. It's the same thing: we're splitting the traffic of a website between the different variations that we create. And in this example here, we have one version of a website, which is the control, or how the website exists in its natural state, versus a variation one, where we've changed
that header section to a different design. And then through the use of an A/B testing platform, we will launch this experience, split the traffic, and then measure the difference between these variations.
So with experiments, you'll need some basic roles at your organization.
The first would be your lead optimizer. This is someone who's actually the point person for the A/B testing efforts and is
managing a testing roadmap. You'll of course need a UX designer to
research the problems or opportunities to test into.
And this person is designing those new experiences for
your developer who's actually creating and deploying the code.
And lastly, you'll need a digital analyst to interpret the results and build recommendations
based on those test outcomes. So we see that optimization and experimentation are really a collective effort. And after sharing more about the A/B testing process, I'll of course be explaining the intricacies of the developer role
here. So what platform do we want to A/B test with? Well, there's a lot of them, and you can see here this long list, and honestly, at their core they all do the same thing. They all have their pros and cons. But for the purposes of this talk, we'll be looking at some screenshots of Optimizely, because in my opinion they probably have the best UI and are the most straightforward. So we'll be seeing screenshots from Optimizely in this case, but there are plenty of tools for A/B testing out there. So with client-side testing
you will want to implement your main platform snippet as high up in the document as possible and ensure it's synchronous. The reason for this is that you don't want your users to see a glimpse of the original content and then, if they're bucketed into a different variation, see that original content change over to their variation content. This is what's known as flicker, and it degrades the user experience. So it's very important that this script is implemented synchronously. And yes, this is render blocking, but we are changing the way that the page is rendering with this script, so it is warranted. But because of this, we want to make sure that whatever code we're deploying for our variations in our A/B test is as performant and as small as possible. Also, a mistake that beginners make here is that they deploy this snippet through a tag manager, which automatically makes it async. I think really the only exception for doing that is if you are only A/B testing elements that are below the visual fold.
But in general it's not recommended. So for this talk I have a website that has Optimizely implemented so we can actually take a look at what's going on. If we inspect with devtools, we can see in the head that we have our snippet right here, and then in the console we get a global optimizely object we can play around with, and it shows us data on the experiments that we're running.
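Just as a rough sketch of the kind of thing you can poke at in the console (the exact method names depend on which version of the Optimizely Web snippet the site runs, so treat these calls as an assumption and check the docs for your version):

```javascript
// Assumes the Optimizely Web snippet has loaded and exposes window.optimizely.
// The "state" API calls below are my recollection of the current Web API;
// verify them against the docs for your snippet version.
var state = window.optimizely.get('state');

// Which experiments is this visitor currently in?
console.log(state.getActiveExperimentIds());

// A map of experiment id to the variation this visitor was bucketed into.
console.log(state.getVariationMap());
```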
But let's switch back over to the slides for a second. So on your ecommerce website, your team may have discovered a problem or opportunity, validated with data, that you want to test with A/B testing. And here we have an example website that's selling some apparel, and we have a hypothesis statement: I believe if we add a mini cart to the header, we'll make it easier for users to check out. If I'm right, we'll see an increase in order conversion rate. So we want to create an A/B test where we're showing a mini cart on this page. Right now it's just a sort of static cart icon in the top right. But what happens if we add a little mini cart where, when we hover over it, we see an order summary? Will that perform better than this current control version? So once you've come up with that hypothesis,
you'll want to formalize things into a requirements document.
This is an example here, where we have Experiment 1: Add Mini Cart as the title of our A/B test, and we'll restate that hypothesis. We'll show the problem that we're addressing. We'll list the device the A/B test should run on, where it should run, and what key metrics we're tracking. Also, every A/B test has a duration, which is the length of time the test should run until statistical significance is reached, and then there are the visual requirements. So on the left is our control, the way the page exists in its natural state, versus our variation one. And you can see that we're coding that little mini cart to appear on cart-icon hover. We'll list some dev specs here for the developer so they know exactly what they're doing. There will be some user QA stories here so that your QA personnel can figure out exactly what needs to be done on the new variation, and some key metrics to track. So having this test plan in place makes it really easy for you as a developer to understand exactly what you need to code for this A/B test.
And when it comes to the way these things execute, they generally work the same across platforms. Essentially, the A/B testing snippet will load. It will check if the user is on a URL that is targeted within the A/B test, and if so, it will then check if the user is in a certain audience that you've set up, something like a specific browser or a desktop device. If they're in that audience segment that you've set up within the test, it will then check if they're in the proper traffic allocation for the experiment, because not all experiments have to be set to 100% of traffic. Some experiments, maybe to mitigate some risk, might only be set to 10% of traffic, and within that 10% there will be a 50/50 split between v0 and v1. So if you're in that 10% traffic allocation, the platform will determine which variation you should see. And then any experiment code that you've written, let's say it's v1, will execute for that user. So that's the order of events, roughly sketched in code below.
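This isn't how any particular platform actually implements it; it's just my rough mental model of that order of events, sketched with a made-up experiment object and helpers:

```javascript
// A simplified mental model of client-side bucketing, not real platform code.
// The experiment object and user shape here are made up for illustration.
var experiment = {
  urlSubstring: 'fatcatitude.com',  // substring URL targeting
  audience: function (user) { return user.device === 'desktop'; },
  trafficAllocation: 1.0,           // 1.0 = 100% of matching visitors
  variations: [
    { name: 'v0', run: function () { /* control: change nothing */ } },
    { name: 'v1', run: function () { console.log('v1 variation code runs'); } }
  ]
};

function runExperiment(exp, user) {
  // 1. Is the user on a URL the experiment targets?
  if (window.location.href.indexOf(exp.urlSubstring) === -1) return;
  // 2. Is the user in the audience you set up (e.g. desktop)?
  if (!exp.audience(user)) return;
  // 3. Is the user inside the traffic allocation (e.g. only 10% of visitors)?
  if (user.bucketValue >= exp.trafficAllocation) return;
  // 4. Split eligible users 50/50 between v0 and v1 and run that variation's code.
  var variation = (user.bucketValue / exp.trafficAllocation) < 0.5
    ? exp.variations[0]
    : exp.variations[1];
  variation.run();
}

// In a real platform the bucket value is a sticky hash of the visitor id,
// so the same visitor always sees the same variation; here it's just random.
runExperiment(experiment, { device: 'desktop', bucketValue: Math.random() });
```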
Now, when it comes to dev strategy, there's a number of ways to code these A/B tests, but it always depends on the requirements,
so we'll take a look at those now. So coming up with the dev strategy is going to depend on whether you are an in-house developer or a third-party developer. If you're an in-house developer, you'll have source code access, which will make your job a lot easier.
In this case, it's possible to do hide/show tests where, let's say, you're creating the addition of a new component somewhere. You can build that new component into your code base in a hidden state, and then in your v1 in Optimizely just put a CSS rule in to show that new component, something like the sketch below.
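As a minimal hide/show sketch, assuming your team shipped the new component hidden behind a hypothetical class like js-new-banner:

```javascript
// v1 variation code for a hide/show test. Assumes the new component already
// exists in the page source with a made-up class of "js-new-banner" and is
// hidden by default; the variation simply reveals it.
var style = document.createElement('style');
style.textContent = '.js-new-banner { display: block !important; }';
document.head.appendChild(style);
```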
There is also total control over deployment, so you know exactly when to deploy your A/B test code, and you can sync it up with your normal site deploys. You also have the ability to do more redirect-type tests. So if you're testing a completely different redesign of, let's say, a product detail page, you can code that new product detail page on a completely different URL, and then within Optimizely you can just do a redirect to that new page.
If you're working as a third-party developer, you usually don't have access to the source code, so all changes have to be made based on what you see with the site in front of you. A lot of times you're searching for global functionality that you can manipulate, and it's a little more risky because there's unknown context. You may not know if the client website is going to deploy a change that removes some dependency that your A/B test code was using, and that will break your A/B test. So really there's a big difference between the two, and that's going to determine your strategy for how you actually code these A/B tests. In this talk, I'm going to focus a little bit more on the third-party dev aspect of it, just because there's a lot more challenge to it, and I think it makes for a much more interesting talk than simply coding up an alternative version of a page and then doing a redirect test to it.
So moving on, now we'll take a quick look at the setup within Optimizely for the A/B test. If we go to our A/B test in Optimizely, we'll see that we have a spot for variations. We're just going to have our original and our variation one here. The targeting is where the A/B test should run. So if we're referencing our test plan, this A/B test is going to run sitewide, because this little cart icon where we're adding the mini cart is available sitewide, so we would do a substring match for fatcatitude.com. The audience, referencing our test plan again, will be desktop. Optimizely has a really easy way to create audiences with these sort of drop-downs you just drag and drop. You can do a number of different audience conditions, the platform, the location, there's a whole array of these things, but in our case we just need desktop.
We'll jump now to the metrics. This is where we want to add certain goals, such as clicks or visits to pages; you can also create custom goals in this section. Then there's shared code, which is any code that should run before any variation code. Usually this is some sort of bucketing code for Google Analytics or Adobe SiteCatalyst, whatever analytics platform the site is using. And then there's the traffic allocation. This is the portion of visitors meeting the audience conditions that are eligible for the experiment. We'll keep it at 100%, at a 50/50 split. Just to make that shared code idea a bit more concrete, a rough sketch of it is below.
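This assumes the site runs Google Analytics through a tag manager and exposes the usual window.dataLayer array; the event and field names are made up for illustration:

```javascript
// Shared code runs before any variation code, so it's a good spot to tell
// your analytics tool which experiment and variation the visitor landed in.
// In real shared code you'd read the variation from the platform's API
// rather than hard-coding it; this just shows the shape of the thing.
var experimentName = 'exp-001-add-mini-cart'; // hypothetical
var variationName = 'v1';                     // placeholder for illustration

window.dataLayer = window.dataLayer || [];
window.dataLayer.push({
  event: 'ab_test_bucketed',
  experimentName: experimentName,
  variationName: variationName
});
```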
And then there's just a number of other options. You can schedule the test to go live at a certain point, and it also gives you some API names to make your code a little bit more dynamic. But we'll jump now to the coding strategy: how to actually code this test.
So we'll go into one of these variations, our v1, and it has a WYSIWYG editor, but we're never going to use that. We're going to write actual code here; there are just certain things you can't do with the WYSIWYG. If we go into this editor here, we see that there's some code, and next we're going to walk through some of it. So technically you could write all of your code in a single file and then copy and paste it into the Optimizely editor, but you'll see that it becomes unwieldy the larger the experiment is. Just as an example, it might look something like the sketch below, where you have an IIFE so as to not pollute the global namespace, you add your CSS styles by way of string concatenation, and then you add your HTML with functions to create that mini cart, a function to save to local storage so that the data persists across the different pages, and the event listeners.
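Here's a stripped-down sketch of that single-file style. The selectors, storage key, and markup are all made up for this example site, so treat it as an illustration of the structure rather than the real test code:

```javascript
(function () {
  // Everything lives in an IIFE so we don't pollute the global namespace.
  var STORAGE_KEY = 'exp001MiniCart'; // hypothetical localStorage key

  // CSS added by way of string concatenation, injected into the head.
  var css = '' +
    '.exp001-mini-cart { display: none; position: absolute; right: 0; }' +
    '.header-cart:hover .exp001-mini-cart { display: block; }';
  var style = document.createElement('style');
  style.textContent = css;
  document.head.appendChild(style);

  // Read whatever cart items we've saved so far.
  function getItems() {
    try {
      return JSON.parse(localStorage.getItem(STORAGE_KEY)) || [];
    } catch (e) {
      return [];
    }
  }

  // Persist items so the mini cart survives navigation between pages.
  function saveItem(item) {
    var items = getItems();
    items.push(item);
    localStorage.setItem(STORAGE_KEY, JSON.stringify(items));
  }

  // Build (or rebuild) the mini cart markup on the cart icon.
  function renderMiniCart() {
    var cartIcon = document.querySelector('.header-cart'); // made-up site selector
    if (!cartIcon) return;
    var miniCart = cartIcon.querySelector('.exp001-mini-cart');
    if (!miniCart) {
      miniCart = document.createElement('ul');
      miniCart.className = 'exp001-mini-cart';
      cartIcon.appendChild(miniCart);
    }
    miniCart.innerHTML = getItems().map(function (item) {
      return '<li>' + item.name + ': ' + item.price + '</li>';
    }).join('');
  }

  // Event listener: save and re-render whenever a product is added to the cart.
  document.addEventListener('click', function (event) {
    if (event.target.matches('.add-to-cart')) { // made-up site selector
      saveItem({ name: event.target.dataset.name, price: event.target.dataset.price });
      renderMiniCart();
    }
  });

  renderMiniCart();
})();
```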
So as you can see, this isn't a huge experiment, but for bigger ones you really want something like a build tool so that all of these files and concerns can be separated and compiled into a single build. So now we'll look at a build tool and a defined structure for our A/B test code, and you'll see that this is a much more preferred way of doing development. Essentially this is a webpack config file where we can use all of the latest and greatest bundling options: we can use Sass, we can import HTML, we can minify, we can transpile, we can use node packages, do all those fancy things, which is really nice. And if we look at our entry file, which is our v1.js, we see that we can import our CSS and attach it directly to the head this way, and import those functions. It's a lot cleaner, a lot nicer than that single-file build.
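As a rough sketch of what that config might look like, assuming style-loader, css-loader, and babel-loader are installed (your real config will differ):

```javascript
// webpack.config.js - a minimal sketch, not the exact config from the talk.
const path = require('path');

module.exports = {
  mode: 'production',               // minify the bundle we paste into the platform
  entry: { v1: './src/v1.js' },     // one entry per variation
  output: {
    filename: '[name].bundle.js',
    path: path.resolve(__dirname, 'dist')
  },
  module: {
    rules: [
      // lets the entry file import its CSS and have it injected into the head
      { test: /\.css$/, use: ['style-loader', 'css-loader'] },
      // transpile modern JS so older browsers in the test can run it
      { test: /\.js$/, exclude: /node_modules/, use: 'babel-loader' }
    ]
  }
};
```

And the v1.js entry file would just import its CSS and the mini cart functions at the top, the way I described a moment ago.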
And if we pull up the terminal and we run the webpack command, it'll run some processes, and when it's ready it will spit out a single build for us, our v1 bundle here, and now we can just copy and paste this directly into Optimizely. At this point we want to test our code just before we paste it into Optimizely. This is a single page app, so even if we go to another page, we're still within the same context. And this code has watchers to check if any other items are being added; it's checking local storage and then creating that HTML list when we're hovering over the cart icon. And if we go to another page, we'll add another product, we'll hover over, and we see that populate in the mini cart, which is great. Of course, if we refresh, we won't see that mini cart on hover anymore; we'd have to re-enter our code into the console.
So what we want to do now is paste this code into the Optimizely editor so that Optimizely can give us a preview link and we can have a persistent experience. So if we go back over to Optimizely, we'll go back to our variations, we'll hit edit, we'll go to the code editor, and we'll paste in our build, hit save, and apply.
And now to create a preview link, we go to API names and we want to copy the ID of the variation, and we can use this in a special query parameter: optimizely_x equals that ID, and then optimizely_token equals public.
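So with a made-up variation ID just for illustration, the preview URL ends up looking something like https://www.fatcatitude.com/?optimizely_x=12345678901&optimizely_token=public, where you'd swap in the real ID you copied from the API names panel.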
And so with this URL you can essentially share the experience, with your code running, with any of the stakeholders, UX designers, or QA, to make sure the experience is looking and functioning as expected. So if we add those query
parameters, add a product to the cart, we see that
the experience is showing. And so we'll go back to the home page with
those params, hover over the cart. We still see the mini cart,
so this way we can share this experience with anyone.
Now, once your experiment preview has been QA'd and approved, it's time to set the test live. And after the test is live, the platform will continue to collect data and randomly bucket users into either v0, v1, v2, and so on. Those users will continue to see their experience until they clear their cookies, and this whole time the experiment platform is collecting results. Now, interpreting those results is typically more of an optimizer or analyst responsibility, but this is what it would look like after the test has run and reached statistical significance.
Now what if your test lost? Well, it's not all
in vain. You've probably learned something and you've probably
saved time and money by not permanently implementing
something that wasn't going to work. And negative test results often lead to new testing ideas that
you can continue to iterate upon. And you really do need to have that
culture of experiments at your company and that
trust and that ownership to continue doing this until you find a winner.
Now if your test won, that's great. You can temporarily set the v1 experience in Optimizely to 100%
of visitors. Now, you only want to do this for a short amount
of time. It's just generally not good to have what are supposed to
be these somewhat ephemeral tests running
for such a long time. You want to immediately get this into your
main core development team's roadmap so that it can
be a permanent change. So now I'd like to share a quote from a friend
and former coworker of mine. His name is Aaron Montana and I think
this beautifully captures the essence of a B test development.
Historically, software development has leaned heavily on architecture analogies.
Strong foundations build good houses, maintenance is vital,
etc. Experimentation, however, is the art of building sandcastles.
Beautiful structures, but complete facades intended to delight,
facilitate learning, and be washed away by the tide.
It's so poetic, but it's so true. I mean, what we're doing here is
we want to make quick, iterative experiments. So velocity is key,
because what we care about are those insights so we can
make decisions on things we want to productionize or
things that just don't work. So thanks Aaron for this.
So now I'll share some general developer tips for A/B test development. The first is to avoid concurrent tests on the same page. Not only will this reduce complexity, but if you have many A/B tests running on the same page, it can also be hard to attribute results, because you're not sure if the results you're seeing are caused by a combination of variations from different experiments on the same page. So in general, to reduce complexity, just keep it to a few per page.
Also, namespace your classes, IDs, and global variables. A/B test development is a separate work stream from your regular site development, so anything you can do to reduce any kind of conflict there is important. Also, always QA on actual devices; we know that actual devices can perform differently than emulators, so that's important. Check logged-in and logged-out states when you're developing the code; anything dynamic should be carefully considered and accounted for in your code. And confirm the control state when bugs are found. If there's any bug on your site, stakeholders may not know where it's coming from, and the first thing you might want to do is opt out of your A/B testing platform and see if the bug still exists. If it does, it's not coming from your A/B test. That's usually a good first step.
Here are some mistakes that beginners make.
One is writing CSS inline. As we saw before, we have modern tooling now to write Sass, and we can compile that into a single build that we can add. And a lot of the platforms have a separate space for CSS, so we should take advantage of that. Anything to keep our code clean and organized. Also, here are a few tools
that I think are particularly important when it comes to single page applications, where you're seeing those virtual page changes and components are re-rendering, so your code really has to run at certain triggers. Mutation observers are great for that: you can attach your mutation observer to a certain section or element, watch the subtree, and then when there's a mutation you can run a callback. Intersection observers are great for when you want to detect if an element has entered the viewport. There's also a technique for XHR override, where you're basically overriding the prototype of the XHR request and putting in your custom callback for when that request completes. And we can also use polling functions to check for a custom condition and then run a custom callback. A rough sketch of a couple of these helpers is below.
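These are just illustrative sketches; all of the selectors are made up for the example site:

```javascript
// Mutation observer: wait for a component to (re)render inside a container,
// then run variation code once. Selectors here are hypothetical.
var container = document.querySelector('#app');
if (container) {
  var observer = new MutationObserver(function () {
    var cartIcon = document.querySelector('.header-cart');
    if (cartIcon && !cartIcon.dataset.exp001Done) {
      cartIcon.dataset.exp001Done = 'true';
      console.log('cart icon rendered, run variation code here');
    }
  });
  observer.observe(container, { childList: true, subtree: true });
}

// Polling function: check for a custom condition on an interval, run a callback
// when it's true, and give up after a while so it doesn't spin forever.
function pollFor(condition, callback, interval, maxTries) {
  var tries = 0;
  var timer = setInterval(function () {
    if (condition()) {
      clearInterval(timer);
      callback();
    } else if (++tries >= maxTries) {
      clearInterval(timer);
    }
  }, interval);
}

pollFor(
  function () { return !!document.querySelector('.header-cart'); },
  function () { console.log('cart icon found, run variation code here'); },
  100, // ms between checks
  50   // give up after ~5 seconds
);

// Intersection observer: detect when an element scrolls into the viewport.
var target = document.querySelector('.reviews-section'); // made-up selector
if (target) {
  new IntersectionObserver(function (entries, io) {
    if (entries[0].isIntersecting) {
      console.log('element entered the viewport');
      io.disconnect();
    }
  }).observe(target);
}
```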
Here are some query parameters. These are specific to the Optimizely platform, but you can log what audience segments you're in and any warnings or errors, and there's a query parameter to disable or opt out of Optimizely altogether.
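As a rough example, and double-check the exact names against the current Optimizely docs because these are from memory, they look something like https://www.fatcatitude.com/?optimizely_log=debug to turn on console logging, or https://www.fatcatitude.com/?optimizely_disable=true to opt out of the snippet entirely.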
And as we saw before, we have a query parameter to preview experiences in Optimizely. They also have a great debugging extension as well, which is great for your non-developers to see what's going on. So really, A/B testing is a communal activity, and
everyone across your organization has great ideas and
different perspectives and different sources of data. So I encourage
you to have a way for those people to
submit their ideas. One really simple way you could do this is a Google Form: ask people if they have any ideas, and then
you can create a roadmap and prioritize them and figure out
what would work best for your organization.
Lastly, some parting advice. Definitely keep a hypothesis library,
some centralized repository where you're keeping track of
all the ideas that you want to test and the outcome,
the development effort that goes into it, and the priority.
Small iterative changes are usually best.
If you're doing a test where you're testing a completely
different product page design,
maybe you're adding certain features to the page. Well,
even if that A/B test has a positive result, you may not know what specifically caused it. If you test small changes, then when there's a positive or negative result, it can be directly attributed to that one change. Having enough traffic is important too: you need a certain level of traffic for your test to reach statistical significance.
So keep that in mind and always be testing. This should be part of your
company's culture. I think there's a lot you can learn
from this, and it's fun too. And I think sharing the results of your A/B tests with your coworkers is also a great activity. So that's the end of my talk. Here are
some places to find me. Thanks very much for listening and
I'll see you next time.