Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Hi everyone. First of all, I would like to give a big shout
            
            
            
              out to all the organizers of Conf 42. Thanks a
            
            
            
              lot for this smooth conduct. It has been a very fantastic,
            
            
            
              fantastic experience for me. Amazing job, guys.
            
            
            
              Hi, thanks for joining me. In this talk,
            
            
            
              I'll be talking about experimentation platform at scale.
            
            
            
              I'm Hrithik Shwastav. I work as a lead software engineer at
            
            
            
              Kojek in data engineering team, currently in a stream called experimentation.
            
            
            
              In this talk, I'll be walking you through my experience
            
            
            
              of building a b testing platform at Gojek Tech. Challenges which
            
            
            
              we faced, right. And the changes in architecture which we did to cater
            
            
            
              the new throughput. The new scale, which is right now
            
            
            
              is around 2.1 or 2.2 million
            
            
            
              rpm. Let's proceed. Yeah, so this
            
            
            
              is the title of my talk, a B testing platform at scale.
            
            
            
              Let's move to the next slide. Yeah, so the
            
            
            
              agenda of this talk will be like. I'll be explaining about
            
            
            
              what a b testing is, and then we will
            
            
            
              talk about the objective, what the audience of this
            
            
            
              talk can learn from this. Then I'll be explaining about the litmus,
            
            
            
              the name of a b testing platform at Gojek Tech.
            
            
            
              Give a light on the old architecture from where we started
            
            
            
              building the platform. Then we will talk about the
            
            
            
              current scale or the scale, how the number of
            
            
            
              experiments, number of clients, the adoption experimentation
            
            
            
              has increased in kosher. And then we will talk about the new architecture,
            
            
            
              how we have made changes in the new architecture to cater the
            
            
            
              new scale. So what is a b testing? A B
            
            
            
              testing is basically a randomized experimentation process
            
            
            
              wherein two or more versions of features are shown to
            
            
            
              different segment of users at the same time. After running the
            
            
            
              experiment for some days or weeks, we analyze
            
            
            
              the data. And then this data helps us determine
            
            
            
              which version leaves the maximum impact and drive the business
            
            
            
              metrics. Let's for an example,
            
            
            
              understand this concept with an example. The allocation
            
            
            
              algorithm, right? The allocation algorithm is
            
            
            
              used for allocating drivers to customers. Suppose you
            
            
            
              have an existing algorithm which is allocating drivers to customers.
            
            
            
              Now you have a hypothesis that by changing few of the
            
            
            
              parameters in your allocation algorithm,
            
            
            
              you will be able to allocate your driver efficiently
            
            
            
              to your customers, which will impact your business metric, which is
            
            
            
              booking conversion metric. And this metric will
            
            
            
              increase by making this change in the algorithm. So what you will
            
            
            
              do, instead of directly rolling this out algorithm to production,
            
            
            
              which can be very critical and
            
            
            
              dangerous, because you do not know whether how this will perform in
            
            
            
              production. Right? So you will be running an experiment with the existing algorithm
            
            
            
              and the new algorithm which you are proposing, you will run this experiment for
            
            
            
              a few weeks and collect the data, and then you will analyze the data and
            
            
            
              see for which variation the booking conversion rate has
            
            
            
              increased. If it is a new variation, then you will be rolling that out
            
            
            
              to all the users in a staggered
            
            
            
              manner. If it is not, then you are sure that whatever
            
            
            
              you thought was not right and the current algorithm is performing better
            
            
            
              than the suggestion the new algorithm.
            
            
            
              Right? So let's move to the next slide.
            
            
            
              Yeah, so these days a
            
            
            
              lot of companies are investing in experimentation. Not only
            
            
            
              big giants, even small startups are also
            
            
            
              doing experimentation on daily basis. So whatever
            
            
            
              apps which you are using right now, these days,
            
            
            
              right, it must be running at least few
            
            
            
              hundreds of experiments on each app.
            
            
            
              Because this makes sense as well, right? Because it is
            
            
            
              the data which is leading you to make the decision, right? Not just that
            
            
            
              your manager or some high paying people are coming and saying that
            
            
            
              this is how we should do it and this will perform better and we are
            
            
            
              rolling this out to all the users. This decision is
            
            
            
              backend by data, right? So I'll quote one of
            
            
            
              the quote by Jeff Bezos. What he said
            
            
            
              is that if you double the number of experiments per year, you are going
            
            
            
              to double your inventiveness. He also
            
            
            
              knew that how critical experimentation is for
            
            
            
              a particular form. So Amazon
            
            
            
              also must be running pretty huge number of
            
            
            
              experiments for its different,
            
            
            
              different services.
            
            
            
              Let's move to the next slide. Yeah, Litmus. What is
            
            
            
              Litmus? So Litmus is basically the name
            
            
            
              of experimentation platform at Gojek
            
            
            
              Tech. Actually has two
            
            
            
              core features. One is a b testing, which we call experiment,
            
            
            
              where you come and define your experiment with
            
            
            
              two or more variants and you can do the
            
            
            
              A B testing. Second one is a staggered rollout, right? Once you
            
            
            
              have the result of your experiment, you can promote
            
            
            
              your winning variant to a release, like rolling that
            
            
            
              out to all the users in a staggered way. So that is
            
            
            
              called release. These are the two
            
            
            
              features currently supported by Litmus. The third one is that
            
            
            
              it also supports the targeting rules. I am purposefully
            
            
            
              mentioning this here because if you see here, right,
            
            
            
              we have a kind of rules like app version newer than 1.23.12.
            
            
            
              Sorry. So all the users who will have
            
            
            
              the app version newer than this will be part of this experiments.
            
            
            
              So these are some rules which we support.
            
            
            
              So I'm mentioning this because for each experiments
            
            
            
              we need to pass the rules for the experiments, right? Which makes
            
            
            
              this application as a cpu intensive. I want to highlight this
            
            
            
              word because it will be used in
            
            
            
              the system design. Also, it will help
            
            
            
              us design the system better because we know the expectation from this applications,
            
            
            
              right? So these are the key
            
            
            
              features supported by Litmus, which is the A
            
            
            
              B testing platform of Kojek. So what is
            
            
            
              the objective of this talk? Right. So I'll
            
            
            
              be walking you through my experience of building the
            
            
            
              litmus from scratch, to also
            
            
            
              the changes which we made to the system to cater the
            
            
            
              hydropole. So what are the non functional
            
            
            
              requirements which we will need to build the system?
            
            
            
              So these are the core non functional requirements. First is the low latency.
            
            
            
              Low latency is required because
            
            
            
              if you see, right, experimentation is a new
            
            
            
              call in your flow, right? Suppose if you are running an
            
            
            
              algorithm for driver allocation, right now you have only one algorithm,
            
            
            
              so you know which algorithm to use. If you are running
            
            
            
              an experiment, you added one more branching
            
            
            
              there, right. You will talk to the experimentation service, get to know which
            
            
            
              algorithm to use and then you will use that algorithm.
            
            
            
              So this low latency is very critical here because it is
            
            
            
              going to impact the overall flow, right. Second one
            
            
            
              is the highly available, which I think is given for all the systems.
            
            
            
              The system has to be highly available because it
            
            
            
              is being used in lot of flows,
            
            
            
              a high throughput. A high throughput in the sense, because in
            
            
            
              Gojek pretty much every services are using this
            
            
            
              platform for running experiments. So we are expected to get the
            
            
            
              high throughput, weak consistency.
            
            
            
              Basically the data should be
            
            
            
              eventually consistent. Like if anyone making some changes
            
            
            
              in the experiments, it should eventually be reflected on the client side.
            
            
            
              Yeah. These are the objectives. Let's go to
            
            
            
              the architecture, how we started.
            
            
            
              So let's look into the initial architecture, how the idea
            
            
            
              of litmus started, how we build the
            
            
            
              version one of Litmus.
            
            
            
              So we have this Litmus server, right? So Litmus server is basically
            
            
            
              a b testing platform. The application which have all the
            
            
            
              functional requirements like supporting experiments,
            
            
            
              releases rules, et cater,
            
            
            
              which is backed by postgres. We have a primary database
            
            
            
              postgres. Right. So how the initial
            
            
            
              flow was, we had a portal actually. So the
            
            
            
              experimenters can come to portal, can create experiments,
            
            
            
              update their created experiments. Basically can do any kind
            
            
            
              of crud on their experiments. So how the flow was.
            
            
            
              Whenever they are making any changes on the portal, portal will talk to the litmus
            
            
            
              server and litmus server will reflect that changes to the DB.
            
            
            
              And we have clients. Clients are basically the,
            
            
            
              you can say microservices or mobile clients, any clients which
            
            
            
              needs to talk to experimentation platform to decide which variants
            
            
            
              to use, which variation of an experiment to use.
            
            
            
              So how does this flow work?
            
            
            
              Clients talks to the Litmus server and litmus
            
            
            
              server talks to the DB, fetches all the valid experiments
            
            
            
              and return those to clients. So this is the initial
            
            
            
              architecture. You must
            
            
            
              be thinking that this is pretty knife and there are a
            
            
            
              couple of limitations here. Yeah, and we will be talking about
            
            
            
              those limitations in the next slide. Let's go to next
            
            
            
              slide. Yeah, so limitations. First one is the
            
            
            
              single cluster handling the request of read and write.
            
            
            
              Right? So what is the problem here?
            
            
            
              So the problem is the read throughput and
            
            
            
              write throughput are very different for litmus.
            
            
            
              The read throughput is around six to seven x larger
            
            
            
              than the write throughput. Right. So the tuning,
            
            
            
              the scaling required for the read will be different
            
            
            
              from the write. Right. So if we are using a single cluster,
            
            
            
              and suppose read is using
            
            
            
              a lot of resources, it might affect the write.
            
            
            
              Suppose if there is some outage happened on that cluster,
            
            
            
              read and write, both will be affected by those. So this is
            
            
            
              one of the potential problem in this
            
            
            
              whole architecture. The second one is primary database
            
            
            
              being used for read and write. So we
            
            
            
              have a primary and recovery
            
            
            
              database. Primary is basically which
            
            
            
              will handle all the writes and read ports. Other replicas
            
            
            
              are basically where you can read the data. If we had the
            
            
            
              read clients directly read from the
            
            
            
              application server through directly read through
            
            
            
              replica database via litmus application,
            
            
            
              it will be better in the performance that even if the
            
            
            
              primary database is down, they can read through the
            
            
            
              replica database, which is like reading the data is a critical flow
            
            
            
              here because a lot of the microservices will be talking to
            
            
            
              the experimentation platform at
            
            
            
              API level, right? So even if the primary database is
            
            
            
              not performing well or running some kind of maintenance,
            
            
            
              they can read from the recovery
            
            
            
              database. The third one is a single deployment.
            
            
            
              Single deployment as in like if you are making any changes
            
            
            
              related to write APIs, the deployment will
            
            
            
              go for both the read and write. Right. If you are making any changes
            
            
            
              related to read APIs,
            
            
            
              like any optimization you have done, you will have to make the deployment.
            
            
            
              And that deployment time deployment will affect both read and write,
            
            
            
              right? So the separation for
            
            
            
              read and write is not here, which we will talk about in the next slide.
            
            
            
              So what we did next, we separated the read and write
            
            
            
              cluster to scale it
            
            
            
              better because the throughput difference was quite
            
            
            
              large. So let's go to the architecture.
            
            
            
              We have Litmus write servers,
            
            
            
              we have litmus read servers,
            
            
            
              we have primary database, and we have replica,
            
            
            
              which will be replicating all
            
            
            
              the data from primary on
            
            
            
              its site. So how the portal will
            
            
            
              be talking right now, portal will talk to the write servers.
            
            
            
              And similar to the previous fashion, the write
            
            
            
              server will make those changes in the DB and
            
            
            
              all the clients now will be talking to read servers and
            
            
            
              read servers will be talking, fetching the data from replica and returning
            
            
            
              them to claims. This is how the system will look like
            
            
            
              once we segregate the read and write clusters.
            
            
            
              Let's go to the pros and cons of this architecture.
            
            
            
              Pros read and write cluster can
            
            
            
              scale independently, which is correct, right. If your
            
            
            
              read throughput is increasing insanely, you can just
            
            
            
              scale your read cluster. And even
            
            
            
              if your read cluster is for
            
            
            
              some time affecting the response time,
            
            
            
              right. Because of the high throughput, your write cluster or
            
            
            
              your portal will not face any kind of issue unless
            
            
            
              the issue is on the database side.
            
            
            
              Decoupled deployment the deployments will be faster,
            
            
            
              right? Because the read changes and the write changes
            
            
            
              are independent raw, right. The deployment on read is not affecting
            
            
            
              the write and
            
            
            
              vice versa. Because anytime on a high
            
            
            
              throughput service, if you are doing a deployment on day, right, so it will affect
            
            
            
              all the clients that time. So if we have a read and write client separately,
            
            
            
              we will be able to deploy the write cluster at any time
            
            
            
              because it is a low throughput service.
            
            
            
              Third one is the master slave setup,
            
            
            
              basically primary or replica setup.
            
            
            
              Here the write
            
            
            
              cluster is talking to the primary
            
            
            
              database and the read cluster is talking to the Replica database.
            
            
            
              So you can tune the replica database to increase the number
            
            
            
              of connections for your applications.
            
            
            
              And there won't be much load on the master
            
            
            
              because it will only be serving the right
            
            
            
              throughput. So there are some cons as well,
            
            
            
              which I think we will understand later in the
            
            
            
              talk. The cons are all the clients are sharing the resources,
            
            
            
              right? Means there are a lot of clients
            
            
            
              which are talking to experimentation platform. They all have
            
            
            
              a common resource. Like the read cluster has some amount of
            
            
            
              cpu, right? That cpu is being shared by all the clients requests,
            
            
            
              right? It is a problem. I'll explain this in the
            
            
            
              later section. Not horizontally scalable
            
            
            
              is the second con or the limitation. I would say it
            
            
            
              is because the underlying
            
            
            
              database is a postgres, right? You can
            
            
            
              have certain amount of connections there, right. If you
            
            
            
              are going to do the horizontal scaling, each server will
            
            
            
              be making a connection to database, right? So you need to keep in mind that
            
            
            
              as well, it's not easily horizontal scalable.
            
            
            
              Whenever you are reaching the max connection configured
            
            
            
              on the DB side, your server will be crashing, right? Because it will not be
            
            
            
              able to get the ideal connections with the DB. So this
            
            
            
              is one of the limitations in the real write segregation
            
            
            
              crash. Yeah. So let's
            
            
            
              talk about scale over time, which is important to
            
            
            
              understand the changes in the architecture needs to be done to
            
            
            
              support the new scale. Right? So let's see how
            
            
            
              the number of clients, number of experiments or the number of religious
            
            
            
              has changed over the course of time. Let's see the
            
            
            
              number of clients over time. So from
            
            
            
              2019 to 2022, I would say
            
            
            
              see the number of clients, how it has increased, right.
            
            
            
              Every quarter it is increasing somewhere it
            
            
            
              is increasing slow and somewhere it has increased by
            
            
            
              a big margin. But the trend is increasing, right. It is not anytime at
            
            
            
              constant trend. So this is one
            
            
            
              of the major part in the scale that the
            
            
            
              different clients will be talking to your systems now,
            
            
            
              right? And each client can have different requirements.
            
            
            
              Some clients can run hundred of experiments, some will be running two or
            
            
            
              three experiments, right? So there
            
            
            
              is a diversity in the clients as well. We'll talk about
            
            
            
              that later. Let's see the number of active experiment at a time
            
            
            
              over a course of period from 2019 to 2022,
            
            
            
              right? If you see, we started with 178, we reached to
            
            
            
              739, and now in 2020, and we
            
            
            
              were about 1.8k. These are the active
            
            
            
              experiments. Like running experiments, we have excluded
            
            
            
              the completed or stopped experiment. Stopped experiments.
            
            
            
              So this actually is the distribution
            
            
            
              of clients experiments. The number of experiments
            
            
            
              running for a particular client, you see it's pretty uneven,
            
            
            
              right? So the expectation from
            
            
            
              the clients is also very different. Right. Few clients will be
            
            
            
              running hundreds of experiments. They are okay with the
            
            
            
              large response time. Few clients are running only two and three experiments,
            
            
            
              but they are not okay with the large latencies.
            
            
            
              Right. Because the response time is proportional
            
            
            
              to the number of experiments. Right. Because we have the rule configured
            
            
            
              for the experiments, and we have to parse a rule for each experiment,
            
            
            
              right? So somehow this response
            
            
            
              time is proportional to the number
            
            
            
              of experiment running for a client, right? So here, if you see
            
            
            
              it is not only the overall skills, it is
            
            
            
              also the diversity in the client. Right. What is the expectation
            
            
            
              from each client if you are sharing the resources? Right.
            
            
            
              Suppose some clients is okay with large latency.
            
            
            
              Suppose 50 ms as 99 percentile.
            
            
            
              But some clients are not okay with it because they're only running one or two
            
            
            
              experiments. Why would they want their 99 percentile to be this high?
            
            
            
              Right. So this was one of the other concern.
            
            
            
              Let's look the similar graph for releases.
            
            
            
              This is the trend for the active releases over course of time.
            
            
            
              And this is the distribution of number of releases
            
            
            
              per client. This is similar to what we saw in the experiment. And the
            
            
            
              problem is also the similar. Right? And if you see
            
            
            
              the numbers, numbers are also similar. We had around 1.8k
            
            
            
              experiments at the end of 2022,
            
            
            
              built in releases, we had 1.6k.
            
            
            
              Now, you see, from 2022 to 2023,
            
            
            
              we have already contributed, like increased the
            
            
            
              active releases by two k. So you see the
            
            
            
              scale, right? How is it increasing?
            
            
            
              So let's see what changes we made in the architecture to
            
            
            
              cater this release, right? Cater this scale.
            
            
            
              Sorry. So let's look
            
            
            
              at some of the points here that every year Litmus is receiving a
            
            
            
              new scale, right? At the end of 2019,
            
            
            
              the max throughput was coming around 100k rpm.
            
            
            
              These all numbers are rpm. I forgot to mention in the slide,
            
            
            
              at the end of 2020, it was around 500k
            
            
            
              rpm. At the end of 2021, it was around 1.1 million rpm.
            
            
            
              At the end of 2022, it reached to 2 million rpm.
            
            
            
              So the last two architectures we saw it,
            
            
            
              the very basic and naive architecture with a single server
            
            
            
              and the second one with a segregated read and right cluster,
            
            
            
              were able to handle around 500k throughput within the
            
            
            
              committed SlA. Beyond that, it was like
            
            
            
              the responses were frustrated and we
            
            
            
              were seeing some errors as well because of the
            
            
            
              resource scarcity, right?
            
            
            
              So, looking at the adoption and the growth of experimentation
            
            
            
              per year, which we saw in the scale slide, right.
            
            
            
              How the number of experiments and releases are increasing over a course of
            
            
            
              time, we adopted a sidecar pattern to optimize the
            
            
            
              latency, right? Why sidecar pattern?
            
            
            
              Sidecar pattern, because by doing sidecar, if you are running
            
            
            
              the sidecars on the client server,
            
            
            
              you are distributing the resources, right? So resources are distributed.
            
            
            
              So if some clients want to run thousand experiments,
            
            
            
              right? Say thousand is a random number,
            
            
            
              say 100 experiments want to run, right?
            
            
            
              So they will be configuring that
            
            
            
              much of resources to the sidecar, right? If some clients are running only
            
            
            
              one or two experiments, why would they bother to give good resources?
            
            
            
              They will just be giving one code to that sidecar,
            
            
            
              right? Or less, for running the sidecar.
            
            
            
              So this came to our mind because of the diversity
            
            
            
              in the clients, right? We thought that let's make it distributed,
            
            
            
              let's make the resources distributed and
            
            
            
              build the platform so that it can handle the
            
            
            
              scale as well as the diversity in the scale.
            
            
            
              Let's look at the architecture diagram. Now. How does it look like
            
            
            
              we have a. Right servers, right. We have the primary database.
            
            
            
              So portal flow will be similar to what it was, right.
            
            
            
              All the portal requests will go to the right server, and write server will be
            
            
            
              reflecting those changes in the primary database. Right.
            
            
            
              Now, how the clients will be connecting. So these yellow
            
            
            
              boxes are client servers and these are
            
            
            
              the client processes in gray, right? So we
            
            
            
              have deployed the sidecar processes along
            
            
            
              with the client main process, right? These sidecars
            
            
            
              will be syncing the data from right servers
            
            
            
              periodically, let's say 5 seconds every five,
            
            
            
              which is a configurable value. Every, let's say t seconds
            
            
            
              they will fetch all the experiments for that particular client
            
            
            
              and they will store it in patcher tv. Sidecar uses Patcher
            
            
            
              tv. So periodically they are syncing the
            
            
            
              data from write server and write server has
            
            
            
              an API exposed which is basically fetch all the
            
            
            
              data and will return it to sidecar processes.
            
            
            
              So every t second they will
            
            
            
              be fetching the data from write server and store that in the badger Db.
            
            
            
              So whenever a client make a request to sitecast, Sidecar will read
            
            
            
              the data from the badger DB and do all the rule parsing and everything
            
            
            
              and return the valid experiments to client.
            
            
            
              So if you see this right here,
            
            
            
              all the computations are distributed over
            
            
            
              each servers, right? Rather than doing all the computations
            
            
            
              on a particular set of servers, it has been distributed.
            
            
            
              So by this architecture it can scale to
            
            
            
              any value, right? Because here we are not making
            
            
            
              any direct connection with the database, we are syncing the data via API.
            
            
            
              The connections are maintained by the right server with the database and
            
            
            
              also the client's
            
            
            
              response time of.
            
            
            
              Sorry, the response time of one of the client is not getting affected by
            
            
            
              others, right. Let's see the key
            
            
            
              points for the sidecar pattern. So given Sidecar is distributed,
            
            
            
              it is designed with the availability and consistency. So using this
            
            
            
              pace theorem,
            
            
            
              basically if we have a partition,
            
            
            
              we will have to choose between availability and
            
            
            
              consistency. Else we have to choose between latency and
            
            
            
              consistency. So sidecar has been
            
            
            
              built with choosing the availability
            
            
            
              over consistency in case of network partition and consistency
            
            
            
              over latency if
            
            
            
              there is no network partition, right? So we
            
            
            
              decided to go ahead with this. We had a survey with our
            
            
            
              clients and we decided to go ahead with this. We do not have right now
            
            
            
              the options to configure the sidecar to support
            
            
            
              both consistency
            
            
            
              and latency with a configurable value. Right now
            
            
            
              sidecar just support availability in case of network partition,
            
            
            
              else consistency second one is horizontally
            
            
            
              scalable, which I already talked about, right? Since it is not making
            
            
            
              direct connection to the database, right? It is horizontally scalable.
            
            
            
              Like you can literally spun up like hundreds of thousands of servers which
            
            
            
              will talk to the litmus servers periodically, right?
            
            
            
              Because the throughput on the sidecar is not directly proportional to the
            
            
            
              throughput on the right application.
            
            
            
              Because if you see here, the call
            
            
            
              between sidecar process and write servers is happening periodically every
            
            
            
              t second. So irrespective of this throughput.
            
            
            
              Right. The client process to sidecar process.
            
            
            
              This throughput is not getting affected. So it is horizontally
            
            
            
              scalable. So the third part is resources are distributed,
            
            
            
              which I already explained. Right. The response time of a client will not be affected
            
            
            
              by the experiments of other clients. Because the resources are distributed and
            
            
            
              their clients will be running the sidecar on their own resources.
            
            
            
              And also we have a central monitoring for all the sidescar on the new relic
            
            
            
              and Grafana. Because as an experimentation
            
            
            
              platform, if we are providing a sidecar, we need to check the health of all
            
            
            
              the sidescars as well. Right. If someone is complaining that my sidecar is
            
            
            
              not working, you need to have the observability.
            
            
            
              Right? What is the issue? Is it working fine? Since when it
            
            
            
              is not working? So we have monitoring as well for all
            
            
            
              those sidecars. Yeah. So this brings
            
            
            
              us to the last section of the talk.
            
            
            
              Basically how the throughput and response times
            
            
            
              look like right now. This is one of the screenshot I have took
            
            
            
              while making this slide at a particular time.
            
            
            
              So if you see this is. This is reaching up to 1.61
            
            
            
              million rpm, the throughput and if you see the 99
            
            
            
              percentile, right. It is pretty efficient.
            
            
            
              9.52 ms.
            
            
            
              Right. You see the scale and the response time. This is the
            
            
            
              most optimized thing which we were targeting for.
            
            
            
              And using this sidecar pattern, we were able to deliver
            
            
            
              this, right. And make the system
            
            
            
              more available and
            
            
            
              more diversified according to support
            
            
            
              the requirement of different clients. So yeah,
            
            
            
              this is the overall learning for me while
            
            
            
              building the experimentation platform.
            
            
            
              Yes. So currently we are
            
            
            
              focusing mostly on the building analytics systems.
            
            
            
              But I think that is not the part of this talk that is out
            
            
            
              of scope of this talk. So yeah,
            
            
            
              that's all from my side. Thanks a lot.
            
            
            
              Please reach out to me on Twitter or LinkedIn for any questions you have.
            
            
            
              I have written a few blogs as well on this.
            
            
            
              If you are interested, you can go on media man. Thanks a lot. Have a
            
            
            
              nice day,