Conf42 Chaos Engineering 2021 - Online

Chaos Engineering in 2021


Abstract

Chaos Engineering has come a long way in recent years.

Starting with Chaos Monkey, through the distributed effort, it has evolved into a serious, scientific method of making your software better.

In this talk, Mikolaj Pawlikowski, the author of “Chaos Engineering: Site reliability through controlled disruption”, will talk about the journey so far, and what lies ahead.

Oh, and we’ll see whether sharks are really more dangerous than hamburgers.

Summary

  • Chaos engineering is all about testing resiliency and reliability. Things that we were taking for granted couldn't be taken for granted anymore. Chaos engineering is at an interesting crossroads right now: there are different trends and phenomena in software engineering that put it in a nice spot for adoption.
  • Scale, cloud native, and complexity. Another trend that makes chaos engineering so important right now is the cloud. Networking is pretty hard. How do latencies affect your systems? Do you survive them? What about timeouts?
  • Chaos engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions. It doesn't replace the other ways of testing. It's all about thinking "what if".
  • Every new technology goes through something like this, whether it's smartphones or tablets or wearables. As it gets more and more adoption, it turns boring and becomes just part of the landscape. This year might be the moment when chaos engineering becomes a fact of life.
  • The myths that still float around chaos engineering are among the top contributors to why generating buy-in for it is still a little bit tricky in 2021. One myth is that it's only for massively distributed systems. The fact that there is an entire spectrum means that adoption is possible for very different companies.
  • The selling point of chaos engineering is that you can do it during working hours. But the more complicated conversation is typically with your manager. How likely are you to die of a shark attack? How scared should you be? Chaos engineering can be fun and easy.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, everybody. How are you doing? I'm really hoping that you're having a good day. It looks like an exciting day today at Conf42 Chaos Engineering, with plenty of good content. Looking forward to that. I'm going to talk to you about chaos engineering in 2021. I know that we all have very high expectations of what 2021 will bring, given how the last twelve months went and how challenging they were. But I'm going to try to focus on what I hope chaos engineering is going to achieve soon, where I think it is in 2021, and the journey that we've made so far together. And spoiler alert: I'm going to argue that it should be boring, and that boring is a good thing. So one of the things that the last twelve months tested was our resilience. And we started talking about resilience and reliability much more, because we realized that things that we were taking for granted couldn't be taken for granted anymore. Things like going outside for a walk, or shopping and getting the basics, or even the infrastructure and electricity. It turned out that all of that is actually put into question when there are unprecedented events happening. All our systems are being tested. We talk about reliability and resilience a lot: things like the resilience of our healthcare systems, the reliability of our infrastructure and so on. It's probably a good moment to look at what we really mean by those words. We talk about reliability in terms of software engineering all the time, but if you take a look at the definition, this one from the Oxford Dictionary, it's the quality of being trustworthy or of performing consistently well. And one of the examples they give is that the fundamental aspect of building relationships is providing reliability. So I was thinking of a good example of what a really reliable system would be, and I came up with this: the pyramids. If you think about it, they have been performing remarkably reliably for thousands of years. They're still working as expected: they were built as a tomb, they're still a tomb, still a bunch of rocks put together, and chances are that a few millennia from now, they're still going to be standing there virtually unchanged. So it's a really reliable system when you think about it. Maybe not as exciting as Kubernetes, or perhaps more exciting. And then we have resilience, right? Resilience, again from the Oxford Dictionary, is defined as the capacity to recover quickly from difficulties. Another definition they also give there is the ability of a substance or object to spring back into shape. And I think I might like the second one even more than the first one, because springing back into shape gives you this image of pressing or pushing or compressing, and then the thing comes back, springing back into shape, the elasticity of it, right? So 2020 has been a real test of resiliency in all kinds of ways. Our lives were modified from one day to another. Things that we were taking for granted, like going on holidays or going outside, have been tested, and so has the resilience of our institutions, of our infrastructure, of our companies, of our teams. We spent a year basically video conferencing with people and trying to maintain social interactions with them. All of that has been testing whether we can still be pushed and compressed and then go and spring back into the previous shape.
It's interesting to talk about this in the context of chaos engineering, because chaos engineering is all about testing that resiliency and reliability. I'm probably talking to two very distinct groups of people here. Some of you came to this conference to learn about chaos engineering, how to adopt it and how to get started with it, kind of ramping up the adoption curve. And some of you might be practicing it, might be preaching it, and are looking for an update on what's new. So I'm going to try to address both points of view. I think it's fair to say, at least from my point of view, that chaos engineering is at an interesting crossroads right now, at an interesting point in the timeline, because there are different trends and phenomena in software engineering that put chaos engineering in a nice spot for adoption. And we are also slowly moving up that adoption curve; we can see that by the number of companies that start doing it, the number of companies that offer services around it, and so on. But why is it an interesting crossroads? I think there are different trends that made it possible, really necessary even, for chaos engineering to emerge. I wanted to mention a few things: scale, cloud native, and complexity. So let's start with the scale. Scale is probably the easiest one to explain. In our global, connected world, we are starting to have bigger systems. We have bigger companies that manage not hundreds of thousands, but hundreds of millions of users. And the systems that we build are getting bigger, they're getting more complex, they have more features, they have bigger scale. From the geographical point of view, we have users from all over the world. It's all connected, and the scale just becomes bigger, which increases the complexity. Another factor, a trend that makes chaos engineering so important right now, is the cloud. We've been on this bandwagon for a number of years now, where you're using someone else's computer and you're calling it a cloud. And if you're starting a company right now, with any kind of stack, chances are that you're using some kind of cloud. You're probably going to AWS or Azure or Google, and you're using someone else's computer. That layer solves some of the problems, but it also adds an extra layer of complexity, because now you not only have to understand your application and the underlying operating system, you also need to understand how the cloud is built underneath. Most of the time you can probably not think about that and get away with it, but if things go wrong and get complicated and you need to debug something, you need to understand it. You also have tools that you need to use to talk to the cloud provider: you have the APIs, you have caveats, you have quirks that you need to work with, and the differences in operating between the different clouds. But it's also a driver that goes back to the scale, because now you can literally go to a cloud provider and say, tomorrow I need ten times more resources than I have today, or in an hour I need ten times more resources, and chances are that you'll be able to get that. So that's another trend. Then with the cloud came the cloud native ecosystem, which has flourished and grown enormously in the last few years.
And the prime example of that is Kubernetes. In a short time, it basically became the de facto API for scheduling and orchestrating containers across a fleet of VMs. That was also made possible and made popular because of the cloud, right? And with Kubernetes, you get another layer. So now you have the cloud, then you have the Kubernetes layer, which works great and has some amazing features that you can leverage and that you get for free. But you also need to understand that these are things that can go wrong. And when they do go wrong, you need to understand how to debug them; you need to understand where to look for trouble. There's another part of the learning curve that becomes relevant to you, and that all boils down to the extra layers of complexity. If you are deploying software today, it's no longer just about the software. I mean, it probably never has been, but it's not just about the bugs or a memory leak. Now, obviously, I know that you guys don't do bugs or memory leaks; I do. But even if you write code that works perfectly, you can still mess things up with configuration or versioning. If you have a dependency that you pin to the wrong version, or you point at the wrong dependency, you have a problem. So even if all the bricks involved are coded correctly, you can still mess things up. Then there are the embarrassing things, like certificates expiring, or licenses. Probably, once again, you never had this problem, but I suspect that certificates expiring on people is more common than they want to admit. Then, because of the fact that we have this increasing scale and everything is in the cloud and more and more distributed, we have this entire family of problems that come from connectivity. Networking is pretty hard. Latencies: how do latencies affect your systems? Do you survive them? What about timeouts? Are the timeouts reasonable? Are the timeouts aligned between the different components so that the whole thing makes sense? What about retries? If you get a timeout, how do you retry? Do you get that thundering herd problem? Do you do exponential backoff? Do you have circuit breakers? For example, if you retry enough times, does your circuit break? When do you come out of the circuit breaker? How do you do all of that? And then on top of that, you also have new bricks and new things in the equation. If you're using Kubernetes, you're probably using an overlay network. That means that you now have to understand how that overlay network is set up and how it fits with all the other components. Maybe you have a service mesh on top of that, which also brings its own features. These extra layers solve some problems, but they obviously bring their own problems and their own complexity that you have to somehow manage. Then, if you look at the infrastructure level, you build for redundancy, which is great. In theory it should be easy, but it's another bit of complexity. Scheduling: if you're using Kubernetes, like I mentioned, you have to understand how it's scheduled; you have to understand what the weak points are, what the actions are that you might take that will break the scheduling. Or things like self healing. It's a wonderful feature of Kubernetes to be able to get self healing for free. But what happens when that stops working? You now build your application in a way that relies on the behavior of Kubernetes to bring it back to a healthy state. What happens if it doesn't, or if you made it stop doing that because you misconfigured something?
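As an aside, to make those retry questions concrete, here is a minimal Python sketch of retries with exponential backoff and jitter. The names and parameters are hypothetical, not from the talk; they are exactly the kind of knobs a chaos experiment can validate against the real timeouts of the components around them.

```python
import random
import time

def call_with_retries(do_request, max_attempts=5, base_delay=0.1, max_delay=5.0):
    """Retry a flaky call with exponential backoff and full jitter.

    A sketch: do_request is any callable that raises on failure. The
    attempt budget and delay cap are the knobs a chaos experiment
    should check against the timeouts of surrounding components.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return do_request()
        except Exception:
            if attempt == max_attempts:
                raise  # budget exhausted; surface the failure upstream
            # Exponential backoff: 0.1s, 0.2s, 0.4s, ... capped at max_delay.
            delay = min(base_delay * 2 ** (attempt - 1), max_delay)
            # Full jitter spreads retries out and avoids the thundering herd.
            time.sleep(random.uniform(0, delay))
```

The jitter line is the one that addresses the thundering herd mentioned above: without it, every client that timed out together retries together.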
Then obviously there's isolation. All of that makes sense: if you're making good use of your machines, you're going to be sharing them. That means that you have to take care of things like resource starvation. Do you understand well how the CPU limits are actually implemented? Do you understand what happens to the process if it goes and tries to get more RAM than it's allowed to? Do you have alerting on that? Will you notice it? Will you know about it? These are all kinds of problems that we now have to deal with. And these are all the kinds of things that, even if you're just starting and you start your own company, your stack is probably going to involve: all of these wonderful and amazing features and, at the same time, all of this extra complexity to deal with. And then, on top of that, this was all technical. We still have humans involved, we still have operations. Operations that have the overhead of having to understand all of that. Operations that can make mistakes, incident responses, tribal knowledge: all of that is still applicable. None of the technical solutions can eliminate these things. So the complexity is basically why chaos engineering is so relevant, because we can go and test the systems as a whole and experiment on them to see how they cope with all of that complexity. So, for those of you who are new to it, chaos engineering is the discipline of experimenting on a system in order to build confidence in the system's capability to withstand turbulent conditions in production. This is what I consider the canonical definition, circa 2015, from principlesofchaos.org, from the Netflix gang. If you remove the fluff from it, it's experimenting to build confidence to withstand turbulent conditions. That's it. That's really what it's about. And that's why it's so relevant, and why it can be useful on so many levels. Because if you think about it, it doesn't replace the other ways of testing. You start with your unit tests, where you focus on a small subset of a component to verify some kind of behavior. This is great. Then you have the integration tests, where you take components, put them together, and verify that they work together. Another layer is probably the end to end tests, where you take the system as a whole, spin up a small instance, and verify some happy path. And that's where chaos engineering comes in, as an extra layer of verifying things, where you take the system as a whole and experiment on it. You verify that when things happen to the system, the unhappy path, things that might have nothing to do with the correctness of the code that you wrote, the system continues working as designed. And when we design these chaos experiments, it's basically like a game of what if, right? It's all about thinking: what if? What if the network latency increases? What if traffic spikes? What if the database becomes slow, or we trigger a circuit breaker? What if the application needs to heal? The what ifs are about the system as a whole. You've verified your unit tests, your integration tests. But what happens when you don't have enough CPU? Or what happens if the slowness happens in a way that you didn't expect it to? It's basically like the What If book by Randall Munroe, one of my favorite reads, where you go and ask yourself hypothetical questions. Probably not the kinds that you cover here, like how you could build a Lego bridge between London and New York. By the way, I really recommend that book if you haven't read it. But the kinds of what ifs that are realistic for your system, right?
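One of those what ifs, the not-enough-RAM one, is easy to start poking at: before experimenting on resource starvation, it helps to check what limit your process actually runs under. A minimal sketch, assuming a Linux host with cgroups (the v2 file first, then the legacy v1 fallback):

```python
from pathlib import Path

def memory_limit_bytes():
    """Best-effort read of the memory limit imposed on the current cgroup.

    A sketch, not a library: tries the cgroup v2 file first, then the
    legacy v1 one. Returns None when no limit applies (or off Linux).
    """
    paths = [
        "/sys/fs/cgroup/memory.max",                    # cgroup v2
        "/sys/fs/cgroup/memory/memory.limit_in_bytes",  # cgroup v1
    ]
    for path in map(Path, paths):
        if path.exists():
            raw = path.read_text().strip()
            if raw == "max":  # v2 spells "unlimited" as the literal "max"
                return None
            # Note: v1 spells "unlimited" as a very large number instead.
            return int(raw)
    return None

print(memory_limit_bytes())
```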
And speaking of the chaos experiments, we've boiled it down to four steps now. We're calling them experiments because we want to underline the scientific nature of it; if you really apply the scientific method, it boils down to four steps. You ensure observability first, and this is important. Observability is this nice word for basically being able to measure something reliably, so that you can observe a behavior. So, for example, if you have some kind of API server and you serve requests over the Internet, and you pick a variable that you can reliably measure, you can build your observability on that. If you, for example, measure the requests per second, or average latency, or whatever your metric is, you can build your observability on that. Then you measure the steady state. Steady state is just a fancy way of saying the normal range. So let's say, for example, that we measure the requests per second, and our normal range, with the hardware and the setup that we have, is about 100,000 requests per second. Let's say that plus or minus 5 or 10% of that is our normal range. This is the steady state. And the next step is where we start having fun: we go and we think about what if. What happens if 20% of the servers go down? What happens if the servers don't have enough CPU? What happens if one of the servers restarts? All kinds of things that we can take a look at and actually implement as an experiment. And if you go and form this hypothesis, and you have the right observability, and you know your steady state, then you can run the experiment. Ideally, we spend a lot of time trying to automate that, trying to make sure that it's repeatable, and verifying it, because we're working on an entire system, so it's easy to influence multiple variables at the same time, and sometimes it's not obvious how to isolate what you want to do. But you run the experiment, and the nice thing about an experiment is that whatever the outcome is, you're good. Because if it works, that's great: you get more confident in your system; you know that in this particular situation the system behaves the way that you expect it to. If it broke, on the other hand, that's also great, because it means that you found a problem, and you found a problem that you can fix before the problem found you, or before a client came and complained to you. So this is the ideal situation. These are the chaos experiments.
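Put together, the four steps can fit in a very small harness. A sketch of the loop, where the URL, the failure-injection hook, and the thresholds are all hypothetical placeholders; the shape (observability, steady state, hypothesis, run) is what matters:

```python
import statistics
import time
import urllib.request

URL = "http://localhost:8080/health"  # hypothetical service under test

def requests_per_second(duration=10):
    """Step 1, observability: a crude probe counting successful req/s."""
    ok, deadline = 0, time.time() + duration
    while time.time() < deadline:
        try:
            with urllib.request.urlopen(URL, timeout=1):
                ok += 1
        except OSError:
            pass  # timeouts and refused connections count as failures
    return ok / duration

# Step 2: measure the steady state -- the "normal range".
normal = statistics.mean(requests_per_second() for _ in range(3))

# Step 3: form a hypothesis ("with one instance down, we stay within
# 10% of normal") and inject the failure -- a hypothetical hook here.
# kill_one_instance()

# Step 4: run the experiment and compare against the hypothesis.
observed = requests_per_second()
assert observed >= 0.9 * normal, (
    f"hypothesis broken: {observed:.1f} req/s vs normal {normal:.1f}")
```

Either outcome of the assert is useful, exactly as described above: passing buys confidence, failing means the problem was found before it found you.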
These are the trends that all put us in this interesting position on the innovation adoption lifecycle. This is what it typically looks like: a bell curve. You will see it whenever you read about adoption of new technology. Every new technology goes through something like this, whether it's smartphones or tablets or wearables or whatnot. It basically means that we have a very small group of first innovators, people who see enough value in the new technology to put up with the terrible UIs, terrible UX, bugs, or just bad first versions. After that, we have a slightly bigger group of early adopters, who are still able to put up with a certain amount of rough edges that are not ironed out yet, but still see enough value to drive the adoption. And then that group grows and grows, and eventually, when we hit a certain critical mass, it becomes an early majority, then the late majority, where most people already use the technology, and then potentially some laggards towards the end of the curve, the right hand side of the curve. So what happens to a technology is that it goes from exciting and new and unusual, and as it gets more and more adoption, it turns into boring, just part of the landscape. And I'm going to argue here that we really want chaos engineering to be boring and part of the landscape, and that this year might be just the moment when it becomes a fact of life. And boring is good. Because when you think about smartphones, a few years ago every new version of a smartphone that got released was significantly better than the previous one; it got all these new features and all of that. But at some point the speed of innovation plateaus. It becomes more and more boring. Every new smartphone now basically looks the same, has roughly the same battery life, and until there is a new breakthrough (looking at you, foldable phones), we are at the boring stage. And that means that everybody has a smartphone. That means that it's just part of the landscape. That means that there are parts of the world where you don't have a PC, but statistically you do have a smartphone, and that is a good thing. That means that everybody has access to it, and everybody can get value out of it. Another example is rocket science. We're getting to the point now, with SpaceX, where no one's really watching their launches anymore, because they got boring. The first time I saw that rocket go up and then one of the stages land, kind of like in science fiction movies, it was really, really exciting. But now it just kind of happens and doesn't make headlines anymore, because the rockets keep landing. The Starship didn't land particularly well on the last two tries, so that made the news. But when it gets boring, no one will be talking about it, right? Same for flights: before COVID, you could just jump on a plane and be reasonably sure to be delivered safely to your destination at a reasonable price. Something that people not that long ago couldn't even imagine. So boring is good. And the closer we get to boring with chaos engineering, the better, because everything improves, right? If you looked at it in 2015, there was Chaos Monkey, and you had a choice: either use Chaos Monkey or write your own tool. If you take a look at it today, there are dozens of open source projects that you can use. There are tools for pretty much anything you can think of. There are dozens of commercial tools that you can use. You can actually go and pay someone some money so that they bring the knowledge in house for you, and you can get started really quickly. Things are improving. And then we need the critical mass of companies that start seeing the value, and we're seeing the increase in that. We're seeing companies that have historically been very risk averse become more and more interested. And this is how we get it on the roadmap. We get it boring, and boring is good. Unfortunately, it's not all rosy quite yet. There are still things that we need to address, and I'm really hoping that 2021 is when we do that. This is a little poll that I did on LinkedIn towards the end of last year, where I asked people: what's blocking you from doing chaos engineering?
The top two responses were "difficult to generate buy-in" with 50% and "inadequate training" with 27%. Missing or hard to use tools only got 11% of the votes. So, after speaking a lot to different people in different companies, I identified the myths that still float around chaos engineering as one of the top contributors to why generating buy-in for chaos engineering is still a little bit tricky in 2021. And so I wanted to cover some of them with you, so that you're prepared for having these conversations, and when one of these myths arises and someone tries to use it, you can debunk it. One of them is the "it's Chaos Monkey, right?" kind of situation. It's an interesting one to talk about, because, yeah, in a way it was all started by Chaos Monkey back in the day. Netflix moved onto AWS in 2010-2011, I think, and they started doing Chaos Monkey. Eventually they spread the word, and that's how the chaos engineering name was popularized. That is a testament to Chaos Monkey being a good name, because everybody seems to know it. But it's so much more right now. It's an entire spectrum. You have the Chaos Monkey approach on one side, where you don't really need to know that much about your system: you can set it up very easily, randomly break things, and see what happens. And this is valuable. This is how it all started. It allows you to detect things that you missed in other layers of testing. It allows you to test for emerging properties, the things that happen because of the interactions between different components, that you wouldn't have caught with your unit tests or integration tests. And this is great; it's a very good start when you don't know where to start. But there is the entire other side of the spectrum, where you can start by analyzing your system, start from the weak points of your system, and design very custom tailored experiments that deal with the very particular failures that you expect to see. And the fact that we have this entire spectrum, and you can find your happy spot anywhere on this line, means that adoption is possible for different companies, with different constraints, with different possibilities for what kind of risk is acceptable to take. Right. Another thing that I keep hearing is that it's testing in production. And I expect this one is probably never going to go away, because there was so much about breaking things randomly in production at the beginning. And it's also understandable, because it is the holy grail. Ideally, we would all be so amazingly confident, and have done so much testing in all the other phases, that we would inject the failure into the production system. Because if you think about it, that's where the real data is, and everything else is an approximation. We will never be 100% sure. We can never 100% test a system that's a copy, a clone, an approximation of the production one. But it obviously doesn't mean that you have to start there. And it doesn't mean that if it's not in production, it's not chaos engineering. You can get a lot of value, and in some cases you probably can't ever go to production. If people might die if you accidentally trigger a failure in your system, it's probably never going to be acceptable. But that's okay, because everybody can apply this at the level where it makes sense for them. And if you can go all the way to production, that's great, and we should probably talk. But it doesn't mean that this should hamper your adoption.
Another one is: it's only for X. Massively distributed systems, or microservices, or Golang, or Kubernetes, or the cloud. And this is also a myth, because, similar to adapting chaos engineering to your system, it doesn't really matter what your system is. You can still use the same methodology; you can still apply the same scientific framework to designing and running these experiments. And in my book (I'm going to give a link at the end), I have an entire chapter about a single legacy process, one you're not even taking a look at the source code of, that you can play with and tinker with and experiment on, to verify that the retry logic that it supposedly has is working. On a single process, on a single machine, without a distributed system and without Kubernetes. So it's really important to stress that it's flexible enough that you can get value out of applying this to pretty much any system, as big or as small or as distributed as it is. And obviously, if you have a massive, amazing system and you do that at scale, it makes for a better blog post. But you can apply it anywhere, and there is no reason not to harvest that low hanging fruit and the nice return on investment. Another one that I keep hearing is that "our organization is not mature enough for chaos engineering". And I get it. It probably comes from just being a little bit intimidated by the big guys and thinking: okay, yeah, but we're not Netflix or Google, so we might not be mature enough. It might also come from just not being that confident in the existing tests; if you expect things to break, that can be intimidating. But once again, this is a flexible framework that you can apply to whatever makes sense for you. So if you don't feel you're mature enough to run it on a big part of a system, you can start with a small part of a system. It's something that really can be applied at any level of maturity. Obviously the value you get will differ, but it's good to get into the habit of doing it anyway, and you can already get value. And then, connected to that, is this one. I literally get it pretty much every time: the "we already have chaos, wink wink" kind of joke. And if you're saying things like that, you don't really understand what chaos engineering is about, because it's not about increasing the amount of chaos, or increasing the number of things that can go wrong and the unknowns. It's about decreasing that, by injecting failure that's very controlled, that you understand. If you just inject random failure into your system, that's probably not going to teach you much, apart from, potentially: oh, your system doesn't work when something breaks. But if you know exactly what you're injecting, and it's a reasonable thing to expect to happen to your system, the popular kinds of failures, you decrease the amount of chaos: you know how it's going to behave, so that's no longer an unknown; that's something that you control, that you can fix, that you can modify. So if you're saying that, have a think about what chaos engineering is really about, because it's not about increasing chaos, it's about decreasing it. And finally: breaking things randomly. This kind of goes back to Chaos Monkey, and I touched on it when talking about the spectrum.
But it's really important to stress that the randomness is kind of like the practice of fuzzing, where you generate random inputs or random race conditions and random orders, and you verify that the system works well. And this is great, but it's just part of it now; there is more to it. And if the randomness doesn't work in your case, you should probably look at the other side of the spectrum. Okay, so these are the myths that I keep hearing, and if you are able to talk about them, you're already in a good spot to get chaos engineering on your roadmap. But there are also two conversations that you need to be prepared to have with other people. These are the things that I call "getting called less" and "risk versus reward", or the sharks versus hamburgers kind of situation. So let's start with getting called less. If you're talking about chaos engineering to the people who actually get called at night, this is likely to land as the best argument that you can use. If there is something that can be done so that I don't have to wake up at night, log in, context switch, try to figure out what happened, and get coffee: anything I can do to avoid that is great. So the selling point of chaos engineering for these people is just that you can do it during working hours. The more we experiment, the more we find things that break during working hours, and that means that you don't have to be called at night. So typically that lands really well, and it's a very simple argument in favor of chaos engineering. But the more complicated conversation is typically with your manager, or with someone who needs to okay doing something like this. And this is the risk versus reward situation. It turns out that, when you think about it, we tend to be pretty terrible at estimating the risks and rewards of things, and there's an entire evolutionary history behind that. If we don't put numbers on the back of a napkin and just follow our instincts, we're probably going to make some bad decisions. To illustrate that, I would like you to think about how likely you are to be in danger of a shark attack. How likely are you to die of a shark attack? How scared should you be? If you ask a person in the street, chances are that they're going to overestimate that risk very, very strongly. If my googling skills didn't fail me, 2020 had three fatal shark attacks in the US, the country that tends to have the most of them. Whereas if you take a look at the things that actually kill people: heart disease, number one, takes about 650,000 people every year in the United States. Meaning that the likelihood of you dying from eating hamburgers, which look really good but are unhealthy for you and increase your bad cholesterol, is so much higher than getting killed by a shark that it shouldn't really be a difficult comparison to make. But because sharks have these teeth and they look scary, and also there was a movie about them, or maybe two (I'm looking at you, The Meg), the fact that the stats don't support your fear doesn't prevent you from being scared, right? So next time you look at that hamburger, think about it, because that hamburger is potentially much more dangerous to you than sharks. But what does all of that have to do with chaos engineering?
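For the back of the napkin, here are those two figures side by side, a trivial sketch using only the numbers quoted in the talk:

```python
# Back-of-the-napkin risk comparison, using the talk's US figures.
fatal_shark_attacks_2020 = 3
heart_disease_deaths_per_year = 650_000

ratio = heart_disease_deaths_per_year / fatal_shark_attacks_2020
print(f"Heart disease kills ~{ratio:,.0f}x more people than sharks do.")
# -> ~216,667x: the hamburger, not the shark, deserves the fear.
```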
If you put numbers together for the risks that are involved in doing some of this chaos engineering, and you do things like monitoring and minimizing the blast radius, making sure that you only apply the experiments to a small subset of things, not just rushing into production, and doing things properly, the risks can be controlled and managed. They might be the shark that you're scared of, when you should really be scared of that hamburger instead. So with these two conversations, the myths, and an understanding of the trends and this interesting crossroads moment for chaos engineering on the adoption curve, we find ourselves at a very interesting point in time, when we can make the push and get chaos engineering over that bump where it becomes boring, where it becomes part of the landscape. If you would like to learn more about chaos engineering, I published a book about it last year. It's called "Chaos Engineering: Site reliability through controlled disruption", and it's from Manning. It's a complement to the existing books, where I try to bring a very practical approach, cover all the tools that you might need, and demystify things: using chaos engineering can be fun and can be easy to start with. You don't necessarily need very complicated or sophisticated tools. You can reuse things that you already know; all you need to do is change your mindset a little bit and then use all those tools. The book covers the various different levels of your tech stack that you might be exposed to. It starts with, like I mentioned, a single binary that you don't know much about, apart from the fact that you know it should be handling errors, and you manipulate system calls to verify that the retry logic works okay. Then through things like containers: how they break, how they work, what Linux containers actually are under the hood, how to understand what happens when you hit the limitations of Docker, how to implement your own container, and how to test applications that run in containers. All the way to Kubernetes, and how to test Kubernetes itself, if you happen to be one of the people who are responsible for a Kubernetes based platform, where you need to make sure that it runs: how it's built under the hood, how it breaks, what the fragile points are. And also testing the software that runs on Kubernetes, for the ongoing verification of things like SLAs. And various other things, like working with the JVM and injecting bytecode to trigger exceptions, to verify that the behavior is the way that you expected, all the way up to JavaScript and working with the browser, to verify that your UIs, which get more and more complex and more and more fancy, are actually behaving the way that you want. So there's a discount code that you can see here. I think there are also going to be some free copies, if you want to grab some for the conference. And let me know if you'd like to connect. This is my LinkedIn; if you scan this code, it should bring you to my profile. It's probably the best way to reach out to me. The photo credits are on the slide. And if you are interested in this kind of problems, we are hiring at Bloomberg. Go to bloomberg.com/careers and reach out to us. And with that, thank you very much. I'm really hoping that 2021 will be the time when we get chaos engineering to be the boring part.
We'll get it over the hump of adoption, and it's going to be part of everybody's framework, because there's really no reason not to get all of that return on investment; it's easy. So thank you very much. Have a great day at the conference, enjoy the rest of it, and I'll speak to you soon.

Mikolaj Pawlikowski

Engineering Lead @ Bloomberg



