Transcript
This transcript was autogenerated. To make changes, submit a PR.
Good morning, good evening, good afternoon to everyone who's tuning in to this chaos engineering
panel at Conf42 Chaos Engineering. I am Prithviraj,
the host for this panel. I'm a CNCF ambassador and
working as a technical community manager at Harness, and I've been leading
the community for the Litmus Chaos CNCF project for
the last four years. Now, for the esteemed panel, I have with
me four amazing folks from the chaos engineering community.
Some of them have been part of the community since its inception and
have been leading the community, the open source side
of things, for a long time now. And I'm pleased to welcome
Sylvain from the Chaos Toolkit community,
Miko from the PowerfulSeal community, Karthik from the Litmus
Chaos community, and Anurag from the Chaos Mesh
community. And without any further ado, I will
let each of them introduce themselves one by one.
Let's start with Anurag.
Sure. Hello, everyone. Good morning, good afternoon.
Good evening, depending upon where you are. First of all,
I want to thank Prithvi for inviting me for this session.
I'm Anurag Pariva from Adobe, and for the past five years
I've been working in a platform engineering team internally known as
Ethos, where we are basically helping our
service teams at Adobe deploy their services on Kubernetes
and also trying our best to make the best use of the CNCF
projects as well.
Right, next up, Sylvain.
Hey, everyone. Thank you indeed for having me as well.
For the past six years now, I've been
leading the Chaos Toolkit project.
It's been quite a fun ride because it's
been good to see the market and the project and the community come
together, having seen so many products, great products
like we have today, coming up. And yeah, I'm really happy to
have this crowd today because there's quite a lot I want to hear.
So quite happy about it. So thank you very much again for having me.
Thank you so much, Sylvain. Next up, Miko.
Hi, everybody. Very nice to meet you. My name is Miko.
I've been somewhere in between platform engineering,
SRE, and chaos engineering for the last decade,
primarily working in finance.
I initially started the PowerfulSeal project
back at Bloomberg, I'm forgetting now, but a good few years ago,
and ended up writing a book,
Chaos Engineering from Manning, published in 2021.
Very nice to meet you all and looking forward to the panel.
Thank you so much, Miko. Next up, Karthik.
Hey, everyone, I'm Karthik. I'm one of
the co-creators and maintainers of Litmus
Chaos, which is a CNCF incubating project.
I've been around for about the same time as Sylvain.
I think we sort of started the chaos community together.
I'm currently a principal engineer at Harness, focused on
the Chaos module within the Harness platform.
Prior to this, I was mostly on the storage side of things,
focused on testing the resiliency of storage systems,
and that's how the Chaos journey began. So, yeah, very happy to
be here and talk chaos engineering with you.
All right, thank you so much, Karthik. I think the panel looks
stacked, and we have a lot of chaos engineering insights for folks
tuning in. But first up, I think Miko shared a
lot about his journey and how he's involved in not just chaos engineering,
but SRE practices and platform engineering, and how
he has written a book on chaos engineering as well. So, Miko, we'll start
with you. Just tell us about your early days in the
chaos community. How did you get involved,
and what's your experience been like as part of the chaos engineering
community?
Sure thing. So for me,
it was kind of very natural. So the year
is 2015 or sometime around there,
and I landed on a new project
back at Bloomberg, where we were basically trying to build
a platform for quants on this brand new thing that had just landed
called Kubernetes. And at the time, it was just
a bunch of binaries that you had to kind of put together yourself,
following best practices that were in a README on
GitHub. So as we were building that,
I quickly realized that first, we bumped
into a few things that kept breaking.
So we wanted kind of like a new way of testing
against these behaviors. It's not unit tests, it's not integration
tests. It's kind of like these dynamic behaviors
that were appearing in the systems.
And second of all, we found
that actually trying to proactively find
issues with this distributed system was
making us sleep better at night and being able
to find things before they break on us
became like the motto of the team that we worked on. And that
slowly, over the next couple of years, evolved into what I
later learned I was supposed to be calling
chaos engineering. The result of that, I actually just had to
look it up, but it looks like PowerfulSeal got open sourced
about seven years ago. So somewhere around that time,
we published this tool that was basically taking a YAML file
and saying, okay, do this, and this, and that to my
system, and verifying that it was still
working. And then I discovered the rest of the community
kind of building things around that, and different people,
the Chaos Toolkit,
who were basically building similar things in this domain.
And we started talking, and
that eventually actually was the reason for the
first Conf42 Chaos Engineering back in 2020.
So kind of like a natural way of,
I keep joking that this is basically a sleep aid,
this entire chaos engineering, to try to go a
level above the standard ways of testing. And
the community around that has been great. I met a lot
of amazing people and had a lot of... so I'm very,
very grateful for all of it.
Absolutely. Absolutely. Miko, I think you talked
about seven years ago, and I joined the community like four or five
years ago, and even till now, I've been seeing the exponential growth
and the amazing people who have been coming up,
contributing, making this community vital.
And I think chaos engineering has grown
exponentially, to be honest, in the last two, three, four years,
if I'm not wrong.
It certainly has.
And to no small extent, thanks
to ChaosNative and Uma and
company. And now part of, you know, you guys
contributed quite a lot to that. So kudos.
Appreciate it, Miko. So next up,
Karthik, what are your thoughts? I mean, you've been
part of this community for five years. You were
one of the founding members of the Litmus Chaos community.
Tell us more about how Litmus Chaos came into play.
And with all these projects being there,
what was the thought behind it?
Yeah,
the motivation to carry out chaos engineering was,
I think, very similar to what Miko just explained. I think we were at
a similar point in our journey as an organization.
We were trying to stand up a SaaS platform on
Kubernetes, and Kubernetes was new to us as
well. And we figured the best way to learn would
be to start experimenting and causing failures and see
how systems react, et cetera. So in that way it's
very similar. And then we
did try finding out what tools were already available to us at that
point. And like you said, that was around the time
when all these tools were getting built: PowerfulSeal,
Chaos Toolkit, et cetera. And we found that we
did not have a consistent way to meet all our use
cases. Of course, we learned more about PowerfulSeal and Chaos Toolkit after
we started with Litmus, I should admit that. But one
of the things we found at that point of time, one of the things that
we lacked, was a homogeneous way to do chaos across different
kinds of entities. So we were part non-Kubernetes,
part Kubernetes, moving on to Kubernetes. And we saw that
there were a lot of different kinds of tools. There was Chaos Monkey,
which was mostly on cloud instances. We had a very nifty
tool, and I still look up to that tool today, called Pumba,
which was doing a lot of container-level
faults. And then there were a lot of scripts that
we had written ourselves. There was an assortment, to be honest,
and that was making management of our resiliency
test suites very difficult. So we wanted a standard
contract in the way we define a
particular failure test and how I can actually implement
it, how I can view the results, how I can sort of
make it work in a pipeline. And all this sort of
led us to go build what we called
Litmus Chaos. It was actually part of the open source
project called OpenEBS. So Litmus was just another repository
inside the OpenEBS organization when it started. But over time,
we learned more about the actual principles of chaos. So, like Miko
said, we didn't really know the term chaos engineering. It all started
as a way to do fault injection, but we
learned more, again, thanks to folks like Sylvain and
the initial version of the chaos working group. We learned
more about what chaos engineering actually entails, how it is
designed, and then we decided to sort of put a cloud native spin
on it with custom resources and things like that. And that's how the journey
began. And I think some of our choices were vindicated in
the long run with all the feedback we got from the community.
And of course, there were things to course correct along the way. But that
was really how we got started. That was the inception of the project.
So thank you so much, Karthik. I think there was a lot
of agreement I saw from the other panelists, and a
lot of credit to Sylvain for starting this journey, and to
Miko for also being one of the initial
members of this community. So next up, I'll just ask Sylvain.
He's been sharing a lot of experiences on culture and practices using
chaos. So, Sylvain, just let us know about your journey,
the inception of Chaos Toolkit, and how it all
started.
If you have, like, two or three hours ahead
of you... Trying to squeeze that into
a few minutes, it's really a shame. And for full
disclosure, this panel should be way longer, because already it's
quite interesting to me to go back through history with
you.
I started with Russ Miles, actually. Back then,
we were working in a startup company called Atomist,
and we left, and Russ said, look,
there is that Principles of Chaos book that
was released a few months or so earlier,
from Casey Rosenthal and people from Netflix.
And it captured at least
the outcome of years and years of leading chaos
engineering at Netflix. So we were
really hooked by that. There was something that transpired,
but to us, we were less interested in the fault injection
itself than the reading of what it means
to do chaos engineering. The whole experiment, learning-curve, learning
thing. And this is where Chaos Toolkit was trying to be:
say, can we glue that under a certain language?
And we went declarative. We were quite Terraform in spirit,
I guess. So can we go and have a
description file that says, like Miko said, can we do this in YAML
or JSON?
Because to us, what was interesting was, can we learn from this?
Right? Not how are we going to do the fault injection. There were
already quite a few tools, like Pumba, indeed, and then the native
Linux tools like stress-ng or very low-level
things. But to us, we
never wanted, and I never wanted, to stop anywhere at
low-level fault injection. To me, when you
look at incidents, a lot of them come from someone
publishing a wrong configuration file somewhere.
So it's nothing to do with your CPU or memory.
So we thought, okay, can we create that format
where any sort of experiment is a
chaos engineering experiment and delegate
to better tools for the fault
itself, right? And let people drive those tools with a common interface.
And this is where we came from.
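
As an illustration of the idea described above (a declarative description that states what you expect, what you do, and how you clean up, while the fault itself is delegated to other tools), here is a minimal Python sketch. The field names, the toy `system` dictionary, and the `run_experiment` helper are all hypothetical and chosen for this example; they are not the actual Chaos Toolkit format or API.

```python
from typing import Callable, Dict

# Hypothetical declarative experiment; field names are illustrative only.
EXPERIMENT = {
    "title": "Service tolerates a bad configuration push",
    "steady_state": ["service_is_healthy"],
    "method": ["push_bad_config"],
    "rollbacks": ["restore_config"],
}

# Toy actions standing in for real tools that would inject the fault.
def service_is_healthy(system: dict) -> bool:
    return system["config"] == "good" or system["fallback_enabled"]

def push_bad_config(system: dict) -> bool:
    system["config"] = "bad"
    return True

def restore_config(system: dict) -> bool:
    system["config"] = "good"
    return True

ACTIONS: Dict[str, Callable[[dict], bool]] = {
    "service_is_healthy": service_is_healthy,
    "push_bad_config": push_bad_config,
    "restore_config": restore_config,
}

def run_experiment(exp: dict, actions: Dict[str, Callable[[dict], bool]], system: dict) -> dict:
    """Check the hypothesis, apply the method, check again, and always roll back."""
    def steady() -> bool:
        return all(actions[name](system) for name in exp["steady_state"])

    journal = {"title": exp["title"], "steady_before": steady()}
    if not journal["steady_before"]:
        # Do not inject faults into a system that is already unhealthy.
        journal["status"] = "aborted"
        return journal
    try:
        for name in exp["method"]:
            actions[name](system)
        journal["steady_after"] = steady()
    finally:
        for name in exp["rollbacks"]:
            actions[name](system)
    journal["status"] = "completed" if journal["steady_after"] else "deviated"
    return journal
```

Calling `run_experiment(EXPERIMENT, ACTIONS, {"config": "good", "fallback_enabled": False})` reports a deviation, which is the learning the experiment is after: the description stays the same whichever tool performs the fault.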
Interestingly, the Chaos Toolkit format,
and how could I say,
its model, changed only once. Like three months
after we started, we added something I'd missed,
and since then it's never changed. I've added things, but never changed.
So it's a testimony that at least
what we took from the Principles of Chaos was
something meaningful. It was a great
outcome after years of working with Netflix, doing this.
So it was quite interesting for us.
What we didn't do, I think, as well as others
after that, especially Litmus, was to
promote that. And I think we'll
touch on that later on, about how to grow
from there. So the culture, just to finish on your
question around culture, I don't have a particular
culture in mind. The only thing that I do is I always treat people
nicely. It's my only culture. I try
to be kind. And when someone comes to
me on Slack and says, I don't understand that, to me it's always
about: I've got bad documentation, not people being stupid
or anything. And that's the only culture I know of: being civil
and kind. Because we're all learning, right?
And that's the only thing I can do, honestly. So it's
the way I see communities, but I'm not
very good at driving the community itself,
just me. So this is where I'm at. But what I find fantastic,
and I promise I'll finish on that, what I find fantastic
is I realize today how much we haven't talked together
enough over the last few years,
and perhaps that's the biggest fault here.
We all do the same sort of thing and with the same spirit. The proof
is we're here. And for the
next phase, we should really spend more time together and talk. As a
community, as a global community, we should be more aligned and talking more
to each other. I think it's not even like we avoid each other. I don't want to
sound like we are at war or anything, not at all. It's just I think
we are each in our own lane and we have things to do and
we just don't get the time. But it's a shame. We should talk more.
Anyway, I'm rambling as usual. You know me.
Absolutely. Sylvain, well said: talking
about it more and just exchanging
ideas, sharing thoughts. I think this is something
that would have helped the community. I mean, I'll
talk about the current state and my thoughts on the community as well.
But Sylvain, as you said, I think we
all are learning how to build the community and
take it forward. But yeah, I think this is something,
these panel discussions, meeting together, exchanging ideas,
are something that becomes vital. Moving on.
Just talking about advocacy, I would ask Sylvain:
you've been part of this community for seven, eight years,
have driven a lot of advocacy behind it, given a lot of talks here and
there. So what is your thought on how advocacy has helped
chaos engineering reach that level?
What's missing in that culture that people are still
not adopting it? What are your thoughts or take on that,
Sylvain?
It's a large
subject, and if I had the answers,
the definitive answer, I would be in a better
position. In general,
I think, like we were saying,
probably as a community,
as leaders of tools that we've created, we've not been good
at joining forces into a bigger thing,
to make it look bigger than it might be,
right? I remember, and I'm going to use
one of the commercial tools that exist out there:
when they started, we had discussions with them and we were telling them we
need to have more players in the field. It's good to
have competition at a commercial level and probably at the open source level as well,
because the more you have, to a certain degree, the more it proves
in a way that there is a valid need
for this, right? A market, however you want to call this. So you
need that. You need to have players that sort of work
together in a way to prove the existence,
the validity of the effort,
basically. But then you need to fight, like
Anurag was saying, other fights, like a
lot of DevOps. I'm going to use that word extremely loosely
because I don't want to have any sort of, I don't
have any definition. But I was actually recently looking
at a Reddit thread where the person asked, is chaos engineering
still a thing? And there were a lot of answers. And what
struck me was how a lot of people
did basically throw that away,
dismiss the whole idea like it's a buzzword. Right? And I
was surprised to see that in 2024
still, right? I would not have been
surprised to see people saying, we'd love to do that, we just don't have the
budget and everything. And some people do say that, but there is still a lot
of advocacy to do with the audience that we have
in mind, whether it's SRE, DevOps, whoever,
and the developers themselves. Right. So there's still
a long way to go to not sound like we're trying to sell
something that we find fancy. Right. And it does take a long
time, either because a lot of people are more confident in
themselves and what they put out. I remember talking one day to some engineers,
or actually higher-ups, saying, we are quite reliable.
Right. What do you do to know that? Well, we don't do anything.
We have all the things that we need to tell us that we are reliable.
Okay, let's see each other when you have an outage. Right.
So you do have that mindset in the audience we're
targeting as a community. If they don't
believe that it's useful, it's going to be difficult because they are your advocates,
they are your champions. And from what we've
seen in Chaos Toolkit, quite a lot of times we've seen a single person
becoming the champion, either for personal reasons or because we advocated.
And you need to support that person. You need to give them everything
they need to succeed within their own organization. But it
has to be repeated with every single organization. So it's not really
scalable. It's not easy to scale that because as open source
communities, we don't have the bandwidth to be
there for every single person. So while I was saying that I'm kind and helpful
whenever I can, it doesn't mean I have the bandwidth to be the
champion of the champion for every single person
out there. It does take time and you end
up not being responsive. So the community may think that
you're not there for them, which is not the case. I would like to be
there, but it does take time. So you do have that. And finally, as we were
saying a little bit before, it's not prioritized.
Until you see that prioritized in delivery teams,
it's going to be so easily dismissed. Right? A lot of
product deliveries just don't put resilience
(it's not chaos for chaos, but reliability, however you name
that) as an indicator of success.
So you end up with... quite a few times I've seen
companies where you have a team trying to champion it, saying, yes, we believe in that,
we brought it. Sometimes we've even built our own platform, whatever we do.
But we struggle. We struggle to make it a
thing everywhere else. Even they
internally struggle to advocate for it. So I think we
still have a long way to go, but I think we are stuck in that.
No, I'm rephrasing: we're not stuck. We've been cornered
a little bit because of ourselves, but also because of, generally speaking,
how the media, the market, however you want to see it, turned
chaos engineering into something that is: do that when
you're mature enough. So you need to look like Netflix to do chaos engineering,
which I always fight against. I've talked to people who say,
we are just starting our project, we can't do
chaos engineering, for sure. Yes, you can. You're just going to adjust to
what you have and grow with it, because you're not talking about chaos engineering,
you're talking about your operations. You want to operate right now
the best you can. So it's
something you have to be relentless about, continuously. I've got the same discussion now
that I had six or seven years ago, and it's okay, I guess.
It's where we're at, right? It takes years to get there.
But then again, a lot of organizations still haven't
even started on bringing DevOps in as a culture.
So it's the same for everything,
right? And finally, for the last two years, or at least the last
year, project funding has
decreased dramatically within organizations. So even
when using open source tools, like the ones we have
(obviously that applies to commercial propositions, for sure), even open
source tools suffer from that, because suddenly you have people turning
their back and saying, well, I can't use this anymore because it's not prioritized anymore.
And therefore you're left with, well, where do I go
from this? Right. So it's a lot of things coming together.
And while as a community we can push, there is
a lot of the world that we can't control. So it's going to take
time. It's going to take time.
Absolutely, Sylvain. I think the last point that you mentioned,
how open source projects, and not just open source projects but the idea
of chaos itself, suffer, is that people start
using it, but then there's some hard call that's
taken that there's no budget for this, or this is not our priority right now.
I think it's a two-way suffering, where one way is the
project suffers in terms of adoption or users, but the
other way it suffers is also in terms of contribution.
It just becomes stale. Or the company that's sponsoring
the project believes that there's no interest in this anymore.
Why do I need to sponsor this anymore? Why take it forward? So I think
that's why you mentioned that chaos engineering is being cornered.
And that is where I can partially agree with you.
Yeah, with how things are moving and
with this kind of a market, I think that's something that's
evident for all open source projects out there. I mean, many projects out
there, but, yeah, per se, chaos engineering as well.
There's some impact that's there for sure.
So with this, I would just move on to Karthik.
Karthik has been one of the members of the open source community,
the chaos community, for some time now. And Karthik, I would just
want to know from you, what exactly are the challenges
that you have seen? I mean, you've seen the open source side of things.
You have seen the enterprise side of things. And what would you like
to list as the challenges that you have seen moving
from open source to enterprise, or just the enterprise adoption side of
things in today's world?
Yeah. Before getting into the enterprise adoption,
et cetera, let me just dwell on the open source
side of things for a few minutes. I think I
resonate with the point that Sylvain made. You want to be there for everybody
who is trying chaos, make them successful, but it's
not a humanly possible task. Right. You will find someone
on the other side who is interested in doing chaos,
adopting your tool. You help them in all possible ways.
They're able to get out a very technically
appealing POC, which basically
helps them sell the solution into, let's say,
a couple of teams. And then there's no follow-up on top of that,
purely because of this reason. Right. No prioritization, no budget,
et cetera. So you've spent a lot of energy trying to make someone successful.
And of course, in that process you learn a lot from them about what's lacking
in the tool, and you go back and build. That's how communities,
those projects grow. But then it stops
at some point. That's probably the biggest barrier
in terms of running with chaos.
You will find some of that spills over onto the enterprise scene as
well. If you can imagine, this is the case with open source, where
you're not really putting a cost on it. In the case of enterprise,
it's sort of going to be exponential, right?
Because there's a cost associated with it. So you're going to scrutinize it so much.
Is this really needed for me?
So that cycle is very long.
But purely from,
let's say, a technical perspective,
what people ask for when they're trying to,
let's say they're all on board with respect to chaos engineering.
They've got the buy-in from the management, they've managed
to prioritize it, they've identified the target team
or the domain or the services. They've gone ahead and allocated
the infrastructure and budget for it. All that's good.
Now, from that point onwards, how does chaos scale?
Right. How successful is a chaos product there?
There are a lot of operational constraints. I would say it's
all good to start with some low-intensity,
low-blast-radius, simple kind of chaos scenarios.
You might be, let's say, doing the hello world of the
Kubernetes chaos engineering space, right? That's the pod delete.
You do that, you're very good. You actually
can convince people to derive a lot of value out of
a pod delete experiment. It sounds very innocuous, but it's actually very useful.
That's what we found in our journey. So you
get a lot of value. But from then on, for the people who are very
enthusiastic about wanting to do more and more chaos,
they hit a roadblock with some of the other faults that they
would want to do, something that's, let's say, more involved. They want to
do a network packet drop or they want to do something
that's more, let's say, a storage-level I/O latency
or something like that. You're entering into territory
which wants you to run privileged containers. You should be basically
adding a lot of container capabilities to be running it. You are now
facing friction with the security teams, who come and tell you that we are not
going to allow this unless this is really important. And then you sort of
have to convince them. Except for a lot of verbiage
and a lot of documentation and explaining,
you don't really have any other sort of armor in terms of how
you push those kinds of faults into their environment.
Right? So those are some of the things that we see.
And then, of course, large organizations, when they try to operationalize
chaos at, let's say, a larger scale, now we are talking about
more mature organizations who really recognize the need
for chaos and they're okay to run with it. They're ready to give all
these kinds of permissions and everything.
They have requirements that will be something like,
you will have a certain group of people who should be allowed to do chaos
on, let's say, only certain services, only at a certain point of time,
using only a certain set of roles associated
with them. You might want to introduce some kind of freeze windows.
You might want to go ahead and simplify the visualization
around how much chaos is happening. Of course, all the dashboarding,
integration with different APM tools, these are par for the course.
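
To illustrate the kind of guardrail described above (who may run chaos, on which services, and when), here is a minimal Python sketch of a policy check with freeze windows. The `ChaosPolicy` class, its fields, and the example role and service names are hypothetical, written for this illustration only, and do not reflect how Litmus or any other product implements access control.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Set, Tuple

@dataclass
class ChaosPolicy:
    """Hypothetical guardrail: allowed roles, allowed services, freeze windows."""
    allowed_roles: Set[str]
    allowed_services: Set[str]
    # Each freeze window is a (start, end) pair; end is exclusive.
    freeze_windows: List[Tuple[datetime, datetime]] = field(default_factory=list)

    def permits(self, role: str, service: str, when: datetime) -> bool:
        # Reject anyone outside the allowed roles or targeting other services.
        if role not in self.allowed_roles or service not in self.allowed_services:
            return False
        # Reject any run that falls inside a freeze window.
        return not any(start <= when < end for start, end in self.freeze_windows)

# Example: SREs may target "checkout", except during a one-week freeze.
policy = ChaosPolicy(
    allowed_roles={"sre"},
    allowed_services={"checkout"},
    freeze_windows=[(datetime(2024, 11, 25), datetime(2024, 12, 2))],
)
```

A scheduler could call `policy.permits(role, service, datetime.now())` before starting any experiment and refuse to run when it returns `False`.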
You sort of hear them in most conversations. But there are a
lot of other ways in which they want you to aid them with respect
to how they carry out chaos operations. Right. They are very concerned about the
security and impact of the chaos. They want to
gain the benefit of chaos engineering and chaos experiments,
but they want to be very careful in how they implement it. So there are
a lot of constraints that come that way, and they look for tools to alleviate
those kinds of problems. The other thing is a
lot about, I would say, when
you have more folks coming in to do chaos engineering,
there are a lot of differences in perspective in terms of what they want to get from
it. Reliability is understood differently
by different people. Right? So how do you
quantify the impact of chaos? So when
you say calculating KPIs for chaos, that's a very broad topic.
You might have seen in the Chaos Carnival, different people came
with their own notion of how to measure the impact of chaos. So that changes
from organization to organization. And how are you catering to
that group, who's looking for that insight from
their application or from their teams?
How do you go and put a number that says, this is how
impactful chaos has been for you, this is how effective chaos
has been for you, and this is how you go from here?
Right. And many organizations who are starting out with
chaos expect some kind of solutioning as well. So you should not just
be ready to say, this is how my tool works, but you should also try
and understand how the tool can work for them. Right. And what are their
use cases. And they also look for guidance on
at what points you can actually help
with chaos engineering. With all the services
that they have right now, how can you sort of ease it into the system
and how best to sort of sell it to their own management?
If they want to scale it, they look for certain solutioning guidelines as well.
So this is what we sort of deal with on a
regular basis when you are trying to take the enterprise
route. But I think a lot of this
is worth it. A lot of these conversations give us a lot of insights,
and of course, a lot of those insights translate into features or capabilities
and move into the open source project again, because that's
the karma, the giving-back thing. So I
think that's where we are now in our journey and
we are learning, I would say.
Absolutely,
Karthik, I think one point you mentioned was about reliability and how chaos is
seen as part of reliability. I think there
has been a vast definition of reliability, and a lot of people are trying to
mix chaos with a lot of things. You see, I mean,
the SLOs, and then there's incident management,
and then there's so much more that's coming under the paradigm of
reliability. And that is how modern chaos
engineering has changed. And that's where I go to my next question, to Miko.
I mean, Miko, you've been seeing chaos engineering for so
long now, and you've been part of other
practices as well, SRE, platform engineering. How do
you see modern chaos engineering today? I mean, there's generative
AI as well, which is being introduced right now to
move chaos forward. So what's your thought process on modern
chaos engineering and how chaos has changed from back then to
today?
I love your question.
And predictions have that
nasty thing about them that they're almost always false. But I'm going
to venture something because it's such an interesting question.
A caveat to that is that I never actually had a commercial
chaos engineering thing, so I can't really speak to what Sylvain was talking
about when you're trying to convince people to use that product.
I was just sharing the things that I was doing, so
it was somehow easier. But the response was
very often exactly what he described, which was, oh yeah,
this is such a cool idea. We're not going to do that.
We're not mature enough. But I think there are basically
two elements that are kind of now on
a collision course together. I think one of the elephants
in the room that I noticed a few years ago was
that managing tooling to
implement these things was the easy part of the equation.
The harder part of the equation was actually making sense
of the data that you found and doing all
the things that you need to do to implement these
lessons in practice, because a lot of that wasn't even technical.
Like you said, sort of an initially messed-up config.
There's no tool that's going to prevent
you from doing that. So you had that one elephant in the room, and then
on the other kind of side of the
collision course is what you mentioned. With AI, we're at
a very interesting time in
the evolution, I think, of humans in general.
Like last year, effectively, our species figured out that if
you take a big enough corpus of text, like 1 trillion
tokens or something, and then you mush it up for long enough with
a stash of gpus whose prices just
skyrocket because mining wasn't enough,
all of a sudden you can have this new type of software that
can summarize text, that can kind of understand intent,
that can do a lot of things that are creepily human like.
And I think for us, like in the SRE space
and incident management and
chaos engineering, that creates a massive opportunity
to solve at least a part of
that problem where it's easy to get the data, it's much
less easy to actually make sense of it and
summarize it, because typically it's a lot
of data and actually leverage this large
language models to do
some of that harder work for you. Mandi, think what
I was picturing the other day is like, why am I still writing
bash all those years later? I'm still terrible
at writing bash, and every now and then I have to look up how to
do a loop, because that's bash for you.
Why am I not talking to a small optimized
LLM that can run locally and doesn't cost the
price of a house in electricity to run? And if we apply a
similar logic to understanding all the signals
that we already have from all the tooling that we have for
chaos and for SRE and in general for observability,
I think we could really take it
to the next level and, I mean, make a lot of these arguments
about, oh, but this is work, and we don't have budget for that,
somewhat moot because you can do it automatically. So I'm
super hopeful about that. I've not
been in the AI space for very long, but last year
I kind of realized that this is not like crypto
that's going to hype, that's going to die out. This is
a legitimate new discovery that actually turns out to not be
as difficult as the sci-fi writers
had us believe to achieve that kind of human-like
interaction. So I'm both creeped out about this
and super excited, and I think that,
well, I would love stuff like
that to come out of our community and I
think it will. So, yeah, here's a prediction
for you. Let's see how this ages.
Absolutely, Miko. I think with things changing,
and as you mentioned over the last couple of years,
how we have seen AI come in, I think
we'll see a lot of change. And as you mentioned, it's not like
crypto. The hype will come up, die down.
If you talk about chaos today as well, we've been still
referring to what was done initially, chaos monkey,
the principles of chaos. I think that still stays
the same. So hopefully that stays alive tomorrow
as well. So with this, I think, yes, Miko?
Yeah. Don't get me wrong. Right now everybody's slapping AI
on everything. Seems like this is
the way to raise funds at the moment and all of that. So that bit
will go away eventually, but I think there is a
legitimate part of it that's definitely shifted the
landscape.
Yeah, absolutely, absolutely. So with this, we'll just cover
1 minute each of sharing what the future
looks like. I think Miko has given a lot of insight. So before we
move on to him giving the final future prediction direction,
let's get an insight from Sylvan.
What do you think about the future direction?
Well, yeah,
like you said, Miko, predictions. All right,
future direction for Chaos Toolkit. I think I
want to spend a lot of time trying to care
for it where I couldn't for the last few years because I was a
little too busy with Reliably and all other things. So I
think I owe to the community a little bit more
care for the product. So it's going to be essentially documentation.
It's going to be a lot of tidying up. If I could
rewrite that in Rust, not because I think Rust is better than
Python, not at all, but purely because I wish I could just
drop a binary and people could just run it, whereas with Python
it's a mess, but bits
and pieces like that. So basically making the project, I think,
more, I would like to
be a bit more developer centric. So go back to
the developers just because, and that goes
back to our discussion earlier. There is one thing we didn't mention,
but I have a sense that there is a fatigue of
new things coming all the time, and there is a fatigue
generally across the board. And I would like to come back
and I sense a little bit of what do we do with
all of this? So that's why I'm interested in going back to
perhaps talking to developers or engineers to
be more global and try to be out of their
way and not promise too much and just deliver that
one tool they need and that does the thing that they want at the time
they need. And that's it, right. Not more so
being more focused, lean and out of their way and
not necessarily a full on platform. I guess that would be
where Chaos Toolkit tries to be back.
Yeah, good insight,
Sylvan. But just one more question. How do you see
the future of chaos engineering overall? There was one thing about
Chaos Toolkit, but what's your comment on...
Yeah, okay, so I completely obviously missed the question. So I'm sorry
for this.
To me, chaos engineering,
I agree with what has been said
that it's still central. It's still
central because we are glorified when
we. I say we as engineers, globally, what we do is
we integrate with a gazillion of third
parties. Right? You want to build a platform, you're going to use
Stripe or something like this for your payment. You want to use other services
like this. So you end up connecting things together
all the time. So if you don't consider this
as a problem at
every milestone, at every step, you're missing something as
an engineer. So I think, hope that
chaos engineering will be realized. It's not a buzzword,
it's not a vendor trying to push things, but it's a legitimate
action that you do on a daily basis. That's why I'm
focused. But generally speaking, I hope chaos engineering,
I'm going to use the abused verb of shifting left,
which I don't like, but the idea of bringing chaos
engineering closer to what engineers need to
do to deliver their value, right. Not a thing that
comes on top of, but that supports
what they need to do. So I think as long as chaos engineering streamlines
itself into the delivery pipeline, I think it's
going to win. But if it sticks to be that thing that
lives in its ivory tower somewhere,
it will struggle. It will struggle. So hopefully, to me,
chaos engineering is going to be something that the masses, in a
way, can just take and do, and that will make it.
Absolutely. Anurag,
some thoughts from you. What's next in your chaos engineering
journey? How do you see chaos engineering?
Thank you so much, Anurag, for those insights. I think
a lot of enterprises or folks who are getting started,
trying to see your direction, will get some insight from your thoughts.
Karthik, would you like to share some insights on the future direction,
according to you?
I like the point about documentation. That's a sure-shot
candidate for us to improve on in Litmus
as well. I don't know if documentation is ever going to be
not a problem. So it's always going
to be something that we need to continuously keep working on.
And that's really the top of mind thing
for us as well. On predictions,
I don't know if this is a prediction. This is probably something
that's already out there. It's a known domain. Chaos
engineering is going to treat faults as, it's just
going to commoditize faults. It's more than fault injection and stuff.
It's going to be about what insights it is giving about
the system. So how do you measure and what all
can you measure and let your developers
or SREs get value out of that particular chaos
experiment run? You don't want to be running an experiment,
investing time and tooling to not learn more about the system.
So how much can you learn about your system by running
that experiment? And what all can you measure
not just over the course of one experiment, but across your
entire chaos journey? How are you going to quantify?
That's probably the questions that we're already being faced with,
and there are a lot of different folks attacking
it in different ways, but I think that's
really critical and probably that is also going to,
in a way, I think, motivate people to do more of Chaos engineering
if they can sort of taste how much they will learn about the system,
how much they can learn about it by, let's say, doing a chaos experiment.
These are all the metrics that I've got. These are the places where
probably I wouldn't have looked otherwise at all.
Right. That's basically the original intent of chaos
anyways. Probably also why CNCF put all
the chaos tools in the observability category when they started out in
the original.
So that's probably the most important area
in which the chaos tooling or the general discourse
about chaos is going to evolve. Probably.
All right, thank you so much, Karthik, for some insight.
Before we close the panel, some conclusive statements
from Miko on his thoughts. He's given a prediction,
but his thoughts on the future. What are the changes he
would like to see coming in? Thoughts?
Yeah, so I ventured one already about the artificial
intelligence. I would also like to venture one about the
very human intelligence. I think it goes back
to what Sylvan was saying, that we kind of all run in different
directions and try to do things. And I
would love to see us come together a bit more often and
maybe that would be enough to generate a little
bit of hype and generate a little bit of an agreement to
kind of accept this as a boring practice that's
no longer crazy to do.
And we're doing some of that with Conf42,
shout out to Mark, my brother.
We're doing some of that with SRE Day,
but I'm not sure what form this should have.
But I definitely would love to see more of that cooperation.
And I think basically what
I'm saying is that the tools are there and
we figured out most of the technical aspect and if we manage
to figure out the human aspect a bit better, we'll be on
the right track.
Absolutely. Absolutely. Miko, very well said. I think there's
so many projects that are coming up, I mean, supported by the CNCF,
open source, on the enterprise side of things, so many
events, talks. I mean, if we are seeing Conf42,
chaos engineering for the fourth year, fifth year running, and then
there's Chaos Conf, Chaos Carnival, so many meetups,
webinars happening. So I think these
things obviously lead chaos engineering until
now. And we hope to see that over the next at least four or five
years, maybe chaos engineering with
all the modern practices, with all the evolution will still
exist. But again, thank you so much, everyone,
for joining in the panel. And I hope all of them watching just
got some insight, getting started with the practice or
have more inputs for the community. And again, I hope
to see more conversations happening around chaos and similar panels going
forward. Thanks everyone.