Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello. Good day, everyone. My name is Tejendra Bhandari, and today I'm going
to present the business impacts of multidomain chaos engineering
use cases. I hope you are all doing well, and welcome
to Conf42. This is my second presentation for Conf42,
and I'm happy and glad to meet you all virtually.
So let's start and dig into the business impacts.
This should help presales people, architects,
and anyone who wants to present chaos
engineering to organizations which are new to it and
are finding it difficult to gauge
the impact and to realize the benefits which
chaos engineering experiments have delivered.
So let's start with the agenda. Today I'm
going to give an introduction to chaos engineering
and myself. Then we'll focus on the session details,
and last, questions and answers. You can post your questions
on the Slack channel and I'll be happy to take
them up. So let's start. Basically,
when we say business, how are we going to gauge the
impact which has been created by chaos experiments, and
how can you get the outputs in business form for
an organization which is willing to invest in it but wants to actually
see what the actual saving is, or
what the actual revenue is that has come from
the chaos tools which have been implemented?
To start with, there are a lot
of impactful cases which can be applied across domains,
and you can gain insights from multiple experiments
and also from a single experiment in multiple domains.
So first of all, create a strategy where you pool
similar platforms, AWS or any other. Let's say, for example,
you have used AWS as your chaos experiment
platform and you have gained insights out of it.
You can start multiplying them within
AWS, treat similar platforms
as a single bucket, create a strategy
out of it, and then show that these are the
strategically good areas where you can actually implement chaos
engineering experiments. Now, to drive the experiments:
we are always unsure
how to drive the experiments and how the business case is created.
You have to drive the experiments based on the business requirements. I'm
sure a lot of businesses do not know their requirements, but you have to start
somewhere. So we have to understand what the business is
looking for: whether it's resiliency or
reducing outages, what impact they actually want out
of chaos engineering.
Then comes working with teams that have the full
picture. Many times you do not get a team which
has the full organizational picture, but we have to dig
out the team which actually works with a lot of other teams
and can serve as a single platform owner for them,
gain insights by interacting with them, and learn what
technical problems and challenge areas they are
actually facing. This happens in all domains,
in my experience. I have worked in media, in medical,
in delivery,
and in multiple other domains within technology
on chaos implementations, and I found it very
useful to interact with people who are actually
engaged with the infrastructure or with
writing services across the organization or across multiple
service lines. So it is helpful. Then you get
started with the most painful area. I'm sure they will be able to give you the most
painful area, because there are a lot of painful areas
which are actually impacting the organization and how it
runs, or they are probably failing a lot
of cases internally. There will surely be
a lot of SREs with whom you can work out the chaos experiments which
you have designed or want to design. That would be a very
helpful place to start. Then, a lot of the time, we create a lot
of use cases to portray the challenges and their
outcomes. So you start converting those cases into
a common pool. Where you do not have business people to interact
with, or you do not have any information on the business, the
platform, or the infrastructure, you can start experimenting in whichever
environment the customer or the organization wants you
to start with. Start with the common, easiest
experiments to gain insights into the infrastructure or
the network which you want to run the experiments against.
That would be helpful in multiple domains. Then,
artifacts. As I say, this is the most important thing
for creating awareness and a lot
of traction for what you are doing. So create an
impact artifact, publish it across the organization,
and publish it multiple times. Reach out to teams which are
actually using
your domain. Using your domain means they
are using AWS or any other cloud platform
on a daily basis, but they are not aware of the
challenges which they
can face. So publish, publish,
and make them aware of what you
are doing, whether there is a result or no result.
Applying the experiments with them and learning from them
is a very meaningful exercise. Based on your past experience
or the strategy you have defined for a platform or
an organization, you can systematize
these insights into experiments and then curate
a set of experiments for that particular organization
or service line. So this is all about
how you create an impact across multiple domains. Now, how do
you create value for the business using these experiments?
I'm sure a lot of people have run experiments and
got some insights out of them, but they are failing to
convert these insights into a business use case.
In the end you have to win the business, and only once you
win the business will the traction and the
organizational benefits become known to other people who are willing to take
these experiments onto their services as well.
So you have to understand the organizational landscape: where
you want to present, how you want to present, and what business
outcomes you actually want to deliver. Then you
have to create an experiment bucket which is agile enough to run on multiple
services, even ones serving
a different domain within the same organization. The experiments
should be so agile that someone can make
a small change to an experiment and use it on their own services.
As I said, gaining insights is the most powerful and most useful part.
When you gain insights, you put them into
your documentation, and then people understand the
technical terms and insights which you have
given. They can correlate them with their own services and then allow
you to come into their area and experiment.
Now, acceptance of the use cases across the organization. As I
said, you create a pool of use cases which are generic
in nature and generic to the organization where you are implementing
them. Then slowly and gradually start giving
out these experiments to run on services,
first manually, and then turn them into
automation pipeline based use cases where
people do not have to come and edit your use cases. They just have
to run the use case and get the output out of it.
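A generic use case that a team can run with only a small change, rather than editing it, could be sketched as a parameterized experiment definition. This is only an illustration: the `Experiment` class, its fields, and the service and fault names are hypothetical and do not come from the talk or from any specific chaos tool.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Experiment:
    """Hypothetical, tool-agnostic description of one chaos use case."""
    name: str
    target: str          # service the experiment runs against
    fault: str           # kind of failure to inject
    duration_s: int = 60


# One generic use case kept in the shared pool (all values are placeholders).
BASE = Experiment(name="lb-failover", target="example-service",
                  fault="terminate-primary")


def for_service(base: Experiment, service: str, **overrides) -> Experiment:
    """Return a copy of a pooled use case pointed at another service.

    The pooled definition itself is never edited; a team only supplies
    its service name and, optionally, a few overrides.
    """
    return replace(base, target=service, **overrides)


checkout = for_service(BASE, "checkout", duration_s=30)
print(checkout)
```

Because the pooled definition is frozen, each team gets its own copy and the shared use case stays unchanged for everyone else.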
The easier the use cases
are for the user to run,
the more they will appreciate running them, and there will be very few
conflicts about touching their
environment, because these use cases run internally to them
and they get the insights themselves. Slowly and gradually
this becomes a pool of use cases from which you start
getting insights. Then the most important
part is converting these use cases into revenue. Business is
revenue. So how do you convert your business
use case into revenue? You have to gauge how much
you have saved or what you have found.
For example, you have an application
which is running behind a load balancer. The load balancers seem
to be intact and the services are running
fine; you do not see any problem there. But when you run the use
case which you have defined as an experiment on the
load balancer or the basic infrastructure, you find
that there are rejections
of transactions going on internally
which the teams are not able to detect. You create
an outcome report of this run and then tell them,
say, "Hey, 30% of your transactions are being rejected when your
HA is switching over, and 30% means
some X amount of dollars which you have probably
lost. And on something like a big billion day,
your revenues would be majorly impacted by these failures."
How would this X amount be calculated? It would be calculated
based on the failures which may happen and
the man-hours which
would be needed to rectify them. For an outage in production
or in a big billion day environment, you can calculate the man-hours
and man-days, that is, how
much time it would take to fix the issue.
These terms will give you some sort of revenue
number, and these numbers can then be mapped to your use cases
and outcomes. So the best part is to
try, try, and try again with multiple landscapes
in your other organizations, and then create a pool of
use cases which can run automatically in a CI/CD pipeline. When
these CI/CD pipelines come up with outcomes,
as I said, you can map them into man-days or a revenue term and
then win the use case with any business.
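The mapping from an experiment outcome to a revenue number can be sketched as a small calculation. This is a rough sketch under stated assumptions, not a formula given in the talk: only the 30% rejection rate comes from the example above, while the function name, the transaction count, the average transaction value, and the man-hour figures are hypothetical placeholders.

```python
def estimated_loss(transactions: int, rejection_rate: float,
                   avg_value: float, man_hours: float,
                   hourly_cost: float) -> float:
    """Rough revenue exposure for one finding of a chaos experiment.

    Assumed model: revenue lost to rejected transactions plus the cost
    of the man-hours needed to rectify the failure.
    """
    lost_revenue = transactions * rejection_rate * avg_value
    repair_cost = man_hours * hourly_cost
    return lost_revenue + repair_cost


# Hypothetical inputs; only the 30% rejection rate is from the example.
loss = estimated_loss(transactions=10_000, rejection_rate=0.30,
                      avg_value=20.0, man_hours=16, hourly_cost=50.0)
print(f"Estimated exposure: ${loss:,.2f}")
```

Feeding real transaction volumes and repair effort from each pipeline run into a calculation like this gives one number per use case that can be shown to the business.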
Moving on to how you create value with
this technology: the use cases could come from open source
use cases, from your own learning, or from
experiments which have been tried and tested but failed before.
There are multiple ways you can try these experiments
and create value using them.
Working on the experiments may be challenging, because a lot of the time you do
not have an idea of where and how to start an experiment.
So I would recommend that you go to open source tools and
open source communities, find out which relevant experiments
have been run in the past or are already present in the system,
curate your experiment based on that, and start
hitting the infrastructure, the API, or any layer
you want. Start creating your
own experiments with that background
knowledge. Then you can use any automation tool to
put them into a CI/CD pipeline and get an
experiment result every day, every run, or
whenever it is required. That way you get to know how your system is
behaving and that you are hitting the system in the right way.
Once you hit the system and get the learning out of it,
you can use these experiments to create an in-depth experiment,
say, for example, a series of experiments. First you hit
the infrastructure, then you hit the network layer, and in sequence you hit the API
or the application layer. Then you get an end-to-end result of
the experiment and can create in-depth experiments for an organization.
So yes, creating the experiments with technology will help.
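The series described here, infrastructure first, then network, then application, could be sketched as a small runner that executes each layer in order and collects one end-to-end result. The three experiment functions are placeholders that only return a pass or fail flag; in practice each would call whatever chaos tool is in use, and all names here are hypothetical.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_series(steps: List[Step]) -> List[Tuple[str, bool]]:
    """Run each layer's experiment in order and record whether it passed."""
    results = []
    for layer, experiment in steps:
        results.append((layer, experiment()))
    return results


# Placeholder experiments; real ones would inject a fault and check
# that the system stayed healthy.
def infrastructure_experiment() -> bool:
    return True


def network_experiment() -> bool:
    return False


def application_experiment() -> bool:
    return True


results = run_series([
    ("infrastructure", infrastructure_experiment),
    ("network", network_experiment),
    ("application", application_experiment),
])
for layer, passed in results:
    print(f"{layer}: {'passed' if passed else 'failed'}")
```

Running every layer even after a failure is a design choice in this sketch, so that the report covers the whole stack; a team could equally stop at the first failing layer.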
A lot of people have asked me the question:
which open source or which license-based tool
would be helpful for me or for my organization? It solely
depends upon your experiments. First you run them manually and then
gauge: does the technology which I'm using have a landscape
which can use a license-based tool, or can I go
to an open source community and get my experiments created?
And then slowly, gradually, when I have created a pool of experiments,
should I turn it into a service-based or license-based
investment, or is whatever I'm investing right now
giving me the output which I wanted? So it's again based
on the business requirements and the technology landscape which you
are already using and have invested in. So moving on,
I'll not take much time here, as we have talked about this at length.
So, experiments in depth. As I said, you create your experiments,
create a series of experiments, and then try pushing them into
an area where you think these experiments
would help. Predicting the outcomes of these
experiments will help you
reduce the outages and issues which you are facing in your
landscape through chaos engineering.
Then, revenue realization is the most important part. As I said, convert your use
cases into revenue, which means mapping a revenue figure to
an outage, or to the time and effort saved, for
any outcome of a chaos engineering result.
Infrastructure, network, and application changes which
cause any kind of issue or outage can
help you realize what the major areas of concern are in any
application, infrastructure, or landscape of
an organization. You can dig in and create
a lot of chaos internally with small changes which
may or may not have been tested thoroughly, and you would
be the person who is able to get the insights out.
Then last, but most important: ease of use. If
any user across the organization
can take your chaos engineering experiments without any pain and run them
without any disruption to their currently running services, that would
be the major business impact which you
can create. These small impacts you can actually
map to your revenue, and based on
multiple runs, usage, and outcomes, you can actually drive your business
changes and business revenue impact across the organization.
Please post your questions. I would be happy to help.
I have been working with presales, sales, and practice teams
to implement this and have been interacting
with a lot of CTOs, CXOs, and DevSecOps
people who are willing to implement security
as well as resiliency in their application layer, infrastructure layer,
or network layer. So thanks for your patience and thanks
for listening to me. I hope this session has given you a
lot of insight which will
really help your organization and you yourself in facing the challenges
of implementing chaos engineering. Thank you so much. Have a good
day.