Space Birds love Cyberbugs

0:09 Miko Pawlikowski

Hello and welcome to Conf42Cast! Episode two: Space Birds Love Cyber Bugs. My name is Miko Pawlikowski and today with me is Liran Haimovitch, the CTO and co-founder at Rookout, and also a frequent speaker. Rookout empowers engineers to solve customer issues five times faster by making debugging easy and accessible in any environment. See what Rookout can do for you at www.rookout.com. Hello, Liran. Great to have you here today!

0:41 Liran Haimovitch

Hey, Mikolaj, great to be here as well.

0:43 Miko Pawlikowski

Can you tell me what's the deal with Rookout? Where did the name come from? You were there at the beginning, right?

0:50 Liran Haimovitch

Yeah, I was there from the very beginning. The name actually comes from the name of a bird, the rook, which is a very clever raven. It's good at solving puzzles: you can actually throw some food in a glass of water and it's going to use stones and stuff to get the water level up. And it can actually use tools such as small tree branches. Essentially, Rookout is about having a very smart bird on your hand that allows you to instantly extract any piece of data you want from your application so that you can see what's going on.

1:21 Miko Pawlikowski

Yeah, so kind of eagle eye plus a smart bird. Makes sense! I always have a lot of respect for people who jump into building a company from scratch. That's quite a challenge and very risky. Was there any moment that made you think "okay, this is worth doing"? Was there a single moment, or did you just know you needed to do that?

1:44 Liran Haimovitch

So we were exploring the realm of cyber security, DevOps and dev tooling, and we met with many engineering leaders and individual contributors. Every single time this problem came up: it's my code, it's running somewhere remote, and I just can't get to it. And we found people spending so much time and effort but kind of solving it backwards. Whether it's about getting your CI/CD pipeline to be fast enough that you can just add logs on the fly, but it's never fast enough. Or trying to replicate your production environment locally, whether it's through fancy orchestration, database migration tooling, service virtualization or tunneling. And not only are you spending so much time and effort on that, but again, it's never accurate enough. We met some companies that would spend two or three days doing database migrations every time a customer reported a bug, just to be able to observe it and see what's going on. This pain was so universal, we knew we had to do something about it.

2:49 Miko Pawlikowski

Yeah, I can feel the pain already, even remotely from here. And you just thought "we need to fix that" and you went for it.

2:57 Liran Haimovitch

Yeah, I just felt like we had a new angle on it, a fresh perspective. And we figured that at the end of the day, while we traditionally use log lines to monitor production, the act of updating those log lines or updating those metrics was very cumbersome. It meant releasing a new version and deploying it, with everything that comes as part of the process. And that's a very big process, a very big risk, a very big change we're doing. At the end of the day, all we're trying to do is change a single line of code, sometimes even flip a single bit in memory. And we're spending hours doing that. We figured there had to be another way. And so we kind of tapped into our cybersecurity mindset, skill set and experience, and built something that updates the application for you on the fly to collect any piece of data you need, without you having to spend any time or effort doing it yourself.

3:49 Miko Pawlikowski

Yeah, we'll dig deeper into that in a second. But tell me, what's your favourite and least favourite bit about working in a startup? And especially, how did all of that dynamic get affected by the biggest pandemic in recent memory?

4:04 Liran Haimovitch

I love the experience of creating or making stuff that wasn't there. And in many ways, building a startup is the ultimate experience of that. It's not just about building the product. It's about building the company, the culture. It's about hiring the people and training them. A company is kind of a living, breathing thing. It's more than the sum of its parts. It's all the people, all the product, all the customers. Seeing it all come together, knowing you're a big part of making it happen - I think that's my favourite part. My least favourite part, I would say, is the uncertainty. When you create a startup, you literally create a business that's going bankrupt every day. By definition, you're spending more than you earn. You're constantly relying on the next injection of cash coming from somewhere down the road. So, while there's not necessarily a lot of personal risk, there's definitely a lot of pressure. Think about it: would you want to be the CEO of a company that's literally going bankrupt every day?

5:09 Miko Pawlikowski

I can definitely relate to that. That actually makes me think: I was watching this "Kurzgesagt" video the other day. And it really stuck with me that they were talking about the breathing process and how the energy is being released, and the bottom line is that we're constantly dying. With every breath, we reset the timer for like 2 minutes. And that's how we live our lives. That's basically what you describe here with the startup, except that the oxygen is replaced with money, right?

5:35 Liran Haimovitch

Definitely. I'm not sure how many people would be comfortable with the thought that every breath they take is literally bringing them a step closer to death.

5:44 Miko Pawlikowski

It got deep. So one last random question before we jump into the record, if you could have any animal at all as a pet, would it be a rook?

5:51 Liran Haimovitch

I think that's a good question. I actually have a dog right now, that dog really loves chasing after ravens and crows and stuff. So if I had a rook and that dog side by side, it might get messy.

6:04 Miko Pawlikowski

Although it's probably better than having a cat and a bird. You gave a talk recently at Conf42 about understandability. Could you talk a little bit about how that's different from observability, the other buzzword that we keep using these days, what they have in common, and how it differs in how we apply it?

6:24 Liran Haimovitch

Observability has been around for a while now, over five years. And it stems from the need of SREs and Ops to know what's going on in the system. Essentially, they deploy a system and it contains some code, some configuration. You deploy it somewhere, hopefully in the cloud, and then you have to know how it's doing. You can't see into the servers themselves: they're remote, and there are potentially many of them. You're trying to figure out, is my system working properly? And there are many definitions of that, but those tend to be somewhat strict. How many requests am I serving per second? What's my latency? What's my error rate? And then how can I break that down throughout the customer experience, through different APIs? If I'm in a microservice environment, I probably want to see this breakdown per microservice and see the interactions. It's kind of getting the big picture and knowing what's going on. That's the essence of observability. Understandability, in a way, goes a step deeper and takes a bit of a different route. It's about how well I understand the system: not only how it's doing, but also what it's supposed to be doing, what it was originally meant to be doing, what it's doing now, and so on and so forth. To truly understand the system, we don't get the luxury of just seeing it from a bird's eye view and knowing everything's alright. In general, we often have to answer deeper questions. And that's kind of what software engineers do on a daily basis. How is that feature working? Why is this customer getting a five and the other customer getting a three? Why is the screen blue for that customer and green for the other? And it's not just about knowing that everything is good, that this customer is supposed to be getting blue and that customer is supposed to be getting green. It's also about understanding the algorithms behind it. Understanding the flows, the configurations.
And unfortunately, this means that you constantly get new questions, whether you're resolving a bug, trying to design a new feature, or just adapting an existing feature. The questions keep changing on a daily, sometimes even hourly basis. And so the data you need to answer those questions is always shifting as well. Whether it's about the source code itself, the inputs and the outputs of the system, the configuration and state of the system, or various dependencies you might be interested in. And even more importantly, how they all play together.
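
To make the "fixed questions" side of observability concrete, here is a minimal Python sketch that answers the three questions mentioned above (requests per second, latency, error rate) per service. The `Request` record, its fields and the service names are hypothetical and only serve the illustration; real monitoring systems collect this data very differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    """One served request (hypothetical record shape)."""
    service: str
    latency_ms: float
    status: int


def summarize(requests, window_seconds):
    """Compute request rate, average latency and error rate per service."""
    by_service = defaultdict(list)
    for request in requests:
        by_service[request.service].append(request)

    summary = {}
    for service, items in by_service.items():
        # Treat 5xx responses as errors (an assumption for this sketch).
        errors = sum(1 for r in items if r.status >= 500)
        summary[service] = {
            "requests_per_second": len(items) / window_seconds,
            "avg_latency_ms": sum(r.latency_ms for r in items) / len(items),
            "error_rate": errors / len(items),
        }
    return summary


# Four requests observed over a two-second window.
sample = [
    Request("checkout", 120.0, 200),
    Request("checkout", 80.0, 500),
    Request("search", 30.0, 200),
    Request("search", 50.0, 200),
]
report = summarize(sample, window_seconds=2)
```

These questions are known in advance, so the data can be collected the same way every day. The understandability questions described above change too often for a fixed summary like this one.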

8:52 Miko Pawlikowski

Okay, so is it a school of thought, a methodology, a set of tools? Is it a t-shirt? Which one is it really, understandability?

9:03 Liran Haimovitch

It's a bit of everything, especially a t-shirt. I think it's first and foremost a school of thought. The example I like the most is the fact that when we got our basic computer science training, at university or anywhere else, we had those simple exercises: you know, sort a data structure, redefine it in a POSIX context. Those are the kinds of exercises that each of us got through after a few hours in that first computer science training. And now we expect ourselves to be able to do them in a matter of minutes quite often. And yet, when we get those tasks in the context of large, complex environments, they often end up taking weeks. And often that's the difference: plugging those two lines of code into a huge system. It's all about knowing when to plug them and what to plug, and that's about understandability. So it's not just about being able to do things. It's about understanding the scope, how it all comes together.

10:06 Miko Pawlikowski

So basically observability is cool, you can see things, but then the next step, let's call it "understandability", is kind of a level up. You touched on a lot of things that people typically have problems with, like the time it takes to get feedback. Is that what people hate the most about debugging, from your customers' experiences? Is that the number one challenge? Or is there something that you or other people hate even more than that?

10:35 Liran Haimovitch

It's often that feeling of helplessness that comes when you're kind of stuck. Obviously, sometimes you get a bug, and then you're like "Oh, I know that! I made a mistake, I forgot a dot or a slash, or I need to redact one." And that kind of brings us back to understandability. You know the code so well that once somebody tells you the symptom of the bug, you instantly know where in the code you made a mistake and how you need to fix it. That's where your understandability is very good. But quite often, I would say more often than not, it's not that easy. Whether it's because you're missing logs or metrics, you don't have access to the customer data, or you don't have the full picture of what's actually going on and what environment the code is operating in. And all of a sudden, you're kind of stuck, and you feel helpless. This feeling is often aggravated when it's a customer that reported the bug, especially if it's an important customer, or something that has a big impact on the company. And now everybody in the company is looking at you, whether it's the tech lead, the engineering lead, the solution engineer, and definitely the marketing and sales departments. Everybody's kind of staring at you and saying, "you have to fix the bug". And you're like, "I have no idea what's going on". That's a terrible feeling!

11:47 Miko Pawlikowski

Especially when it's a rare condition that only shows up once or twice, and you have real trouble reproducing it.

11:57 Liran Haimovitch

We recently found a couple of memory leaks within the V8 engine itself when put under extreme pressure. That was so frustrating, we literally spent weeks handling those memory leaks only to find them within the V8 engine itself, which was pretty crazy.

12:16 Miko Pawlikowski

We use so much open source technology and so many dependencies that it's sometimes really hard. We work a lot with Kubernetes and most of the time, when you do find something that really looks like a bug, if you look deep enough, there's probably someone who ran into that already. But there was this bug the other week that we were trying to track down. Eventually we found a ticket that was put in by someone a year ago and only updated a few days earlier, when someone actually managed to reproduce it: a race condition when a connection was being closed.

12:48 Liran Haimovitch

I think your point around open source is another great example of where logging doesn't work well enough. Because when it's your code and you want to add a log line, that's quite easy. You go to the code, you edit the line, you rebuild, redeploy. But what happens when you want to add just one log line to an open source project? All you have to do is edit the code, but then you have to recreate the entire build process, the packaging process and the dependency process on your machine. There is a huge difference between being able to read the source code for an open source project and being able to build it efficiently into your own build process and your dependency management process. That's the difference between 10 seconds to type the line and potentially days of work setting up the development environment. And that's another great use case for Rookout. Because when you can instantly set a breakpoint, just like in a traditional debugger, you can debug anything. It doesn't matter if it's your code or not; debugging is easier than anything in many cases.

13:49 Miko Pawlikowski

Yeah, especially now when everything's cloud native and microservices and everything. You have all these processes running remotely, and you have to get to them. I feel like at this stage, the suspense is unbearable. So tell us - how does Rookout work under the hood?

14:04 Liran Haimovitch

Under the hood, we offer five SDKs: one for the JVM, one for the .NET runtime, one for the Python runtime, one for the Node runtime and one for the Ruby runtime. Each of those SDKs is built differently, and yet they are very closely related. What we do is essentially map out the code in memory: we map out the class objects, the functions and all that, we find the in-memory representation of that code, and we allow you to edit it. So when you go and say you want to see what's happening in main.java at line 40, we find in memory which function main.java line 40 belongs to, and then we recompile that function with additional instrumentation code that extracts data on the fly. So the next time this function is called and line 40 is hit, you are going to get to see what's going on inside of it.
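
The idea of a breakpoint that collects data at a given line without stopping the program can be sketched in a few lines of Python. This is only an illustration and not Rookout's implementation: instead of recompiling the function with instrumentation, it uses the standard `sys.settrace` hook, which is far slower. The helper `capture_at` and the function `price_with_tax` are hypothetical names made up for this sketch.

```python
import sys


def capture_at(func, line_offset, snapshots):
    """Build a trace hook that copies func's locals when a given line is reached.

    line_offset counts lines from the 'def' line of func (assumes no decorators).
    """
    code = func.__code__
    target_line = code.co_firstlineno + line_offset

    def local_trace(frame, event, arg):
        # A 'line' event fires just before the line executes.
        if event == "line" and frame.f_lineno == target_line:
            snapshots.append(dict(frame.f_locals))
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace frames that run the function we care about.
        if event == "call" and frame.f_code is code:
            return local_trace
        return None

    return global_trace


def price_with_tax(amount, rate):
    tax = amount * rate
    total = amount + tax
    return total


snapshots = []
previous_trace = sys.gettrace()
sys.settrace(capture_at(price_with_tax, 2, snapshots))
try:
    # The program keeps running; we only collect a snapshot on the way.
    result = price_with_tax(100, 0.5)
finally:
    sys.settrace(previous_trace)
```

After the call, `snapshots` holds the local variables as they were just before the `total = amount + tax` line ran, and `result` is unaffected.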

14:56 Miko Pawlikowski

Okay, so let's say I have a Java application and your SDK is some kind of Java agent. It starts and then it modifies my classes on the fly with your extra debugging code, right? How safe is that?

15:11 Liran Haimovitch

Very safe! Like everything else we're doing, it's about trusting our infrastructure. Take the Java virtual machine itself: you could ask how safe it is to take Java bytecode stored in .zip files and run it in memory. That has its own set of challenges. And the thing is, at Rookout, we take great pride in our work. And that's what we focus on: doing that in the most secure way possible. I know there have been some alternatives that kind of allow you a DIY approach, where you can use various instrumentation libraries to do it yourself. Obviously, there is a lot that can go wrong. We narrow it down: we do a very specific set of data collection that's heavily tested on our end, and we add a lot of policy-based safeties on top of that. We have rate limiting and hit limiting, and we have sandbox control that ensures you don't modify the application: you just collect data from it on the fly, and the correctness won't be impacted. We ensure that the code will remain valid on the one hand, and on the other hand that the application's overall performance and correctness won't be modified, because we tightly control what's going to be executed.
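
A policy that combines rate limiting and hit limiting could look roughly like the following Python sketch. It is an assumption about the general shape of such a guard, not Rookout's actual policy engine; the class `BreakpointGuard`, its parameters and the thresholds are hypothetical.

```python
import time


class BreakpointGuard:
    """Decide whether a breakpoint hit may collect data (illustrative policy)."""

    def __init__(self, max_hits, min_interval_s, clock=time.monotonic):
        self.max_hits = max_hits              # hit limit: total collections allowed
        self.min_interval_s = min_interval_s  # rate limit: minimum gap between collections
        self.clock = clock                    # injectable so the policy can be tested
        self.hits = 0
        self.last_hit = None
        self.enabled = True

    def allow(self):
        if not self.enabled:
            return False
        now = self.clock()
        if self.last_hit is not None and now - self.last_hit < self.min_interval_s:
            return False  # too soon after the previous collection
        self.last_hit = now
        self.hits += 1
        if self.hits >= self.max_hits:
            self.enabled = False  # hit limit reached, breakpoint disables itself
        return True


# Deterministic demo with a fake clock instead of real time.
fake_times = iter([0.0, 0.1, 1.0, 2.0, 3.0])
guard = BreakpointGuard(max_hits=3, min_interval_s=0.5, clock=lambda: next(fake_times))
decisions = [guard.allow() for _ in range(5)]
```

In this run the second hit is rejected by the rate limit and the fifth by the hit limit, so the instrumented code only pays for data collection three times.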

16:23 Miko Pawlikowski

So it sounds to me a little bit like a "debugger light": not everything's wide open, but you have this specific set of things that you want to look at and you basically expose a debugger-like interface. Is that the right way of thinking about that?

16:40 Liran Haimovitch

I think that's a great way to look at it. We strive to bring the experience of the debugger, and also the experience of a profiler, to every environment. Obviously, some trade-offs have to be made. You can't stop the application, because if you're going to stop a production application, or a service mesh application, things ain't gonna go too well. We simulate, we provide you a very close experience. We collect the data and we show you snapshots, with various data controls and security controls in place to ensure you can use it in a production environment. And we try to bring you the best of both worlds. This is as close as you can get to running a debugger in production.

17:16 Miko Pawlikowski

Gotcha, and the SDKs that you mentioned - does it mean that you need to build your application with them, or does all of that happen at the startup? Kind of enrichment without you actually having to instrument the code?

17:29 Liran Haimovitch

It's a bit of both. It depends on the runtime itself. Some runtimes support instrumentation at runtime; for some of them you have to compile it in. But either way, it's a 10 minute installation process and you're good to go.

17:42 Miko Pawlikowski

I was just thinking - I have this massive bug, I spent two weeks on it and I gave up, I need Rookout. So I go and then do I need to recompile my thing? Or do I just need to attach some kind of sidecar that will do the instrumentation on the fly?

17:59 Liran Haimovitch

It's not a sidecar, but it's a quick package. You just add it, and you can get it up and running in under 10 minutes and find bugs about 10 times faster than you would otherwise.

18:08 Miko Pawlikowski

Hey, no way! You said five times faster on your website. Now it's ten? [laugh] Which one is it?

18:14 Liran Haimovitch

So you know, it depends on which marketing person you speak to and at what time of day. We're seeing customers report anything from five to 10 times faster. So it's more about how conservative we are with specific data.

18:27 Miko Pawlikowski

And in terms of performance, you said that you're trying to give the best of both worlds, and I'm guessing that with the limiting you're trying to put in as little instrumentation as possible. Do you have any benchmarks that we could look at? Or are we gonna have to trust you with that?

18:43 Liran Haimovitch

So we do have benchmarks we can provide. Not everything is public, but we'll be happy to share. As a rule of thumb, I can say that a single breakpoint adds under one millisecond of latency for most runtimes, and often a lot less. On top of that, we have various mechanisms such as hit limiting and rate limiting, so even if you set a breakpoint in a very hot loop, it's just gonna move into sampling mode or automatically disable itself, based on various policies you can configure.
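
The "sampling mode" behaviour for a breakpoint in a hot loop can be illustrated with a small Python sketch: capture every hit at first, then only one hit in every N. The class `SamplingBreakpoint` and its thresholds are hypothetical and chosen for the example; the real policies are configurable and not described in this conversation.

```python
class SamplingBreakpoint:
    """Capture every hit at first, then fall back to 1-in-N sampling."""

    def __init__(self, full_capture_hits, sample_every):
        self.full_capture_hits = full_capture_hits  # hits captured before sampling starts
        self.sample_every = sample_every            # afterwards, capture one hit in this many
        self.seen = 0

    def should_capture(self):
        self.seen += 1
        if self.seen <= self.full_capture_hits:
            return True
        # Sampling mode: only every sample_every-th hit is captured.
        return (self.seen - self.full_capture_hits) % self.sample_every == 0


# A hot loop that reaches the breakpoint 1003 times.
breakpoint_policy = SamplingBreakpoint(full_capture_hits=3, sample_every=100)
captured = sum(1 for _ in range(1003) if breakpoint_policy.should_capture())
```

Here the loop hits the breakpoint 1003 times but data is captured only 13 times: the first 3 hits, plus 10 sampled hits out of the remaining 1000.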

19:10 Miko Pawlikowski

So you went with this active approach, building a custom solution for all the different runtimes, and I know that a lot of the industry right now is trying to leverage BPF to get at things at the Linux kernel level. Do you also do any of that at the syscall level, or is that too coarse-grained to be useful to your users?

19:32 Liran Haimovitch

So the answer is a bit of both, I guess. For now, we are focusing on the application core, on the code itself. We are looking at dynamically instrumenting other components in the future, such as eBPF, such as logging, and other stuff. But even if you look at eBPF, which is a very interesting and growing technology, there are various integrations. I think Node has a built-in eBPF engine, and things are changing very rapidly. Our whole focus is on being able to agilely and dynamically collect data and instrument. Wherever we go, we say "it's not our focus to collect you the same data day-in day-out". There are many companies that do that: APMs, logging integrators, and so on. And they're doing a great job. What we focus on is being able to collect just the data you need with as little overhead as possible, and being able to move very fast, so that you can change your mind in a matter of seconds on what you want to collect. We don't want technology to hold you back, we want technology to empower you. And this actually goes beyond debugging. It goes into collecting analytics, collecting metrics: any piece of data you want from your software should be available within seconds, rather than prioritising new features and getting lots of work and so on.

20:49 Miko Pawlikowski

Makes sense, a different use case. So you started about four years ago, right? Something like that, which kind of correlates with Kubernetes becoming a thing. Is all of that because Kubernetes made it so hard to debug things that you need solutions like that, or is it just a coincidence that this happened roughly at the same time?

21:12 Liran Haimovitch

So it's an interesting coincidence. On the one hand, we have customers from all around, whether they're the most cutting edge cloud native, even some serverless. On the other hand, we have a lot of customers still running data centres with vertical scaling servers, Java application servers, and all of that. I would say we’ve just mapped it a few months ago and we were very surprised to see that most of our customers are very deep into Kubernetes. I'm not sure if that’s because the pain is so much bigger in Kubernetes. And especially in service mesh environments, where you just can't set up decent development environments for your engineers, or is it just a coincidence that those companies are more agile, and more interested in adopting new technology?

21:54 Miko Pawlikowski

You know, I was kind of half joking. Kubernetes has some advantages and also makes things a bit more complex to explore. Okay, one more question. I noticed that you have a solid open source footprint on GitHub. Are you an open source company, or kind of an open source company? Is there anything you recommend checking out on GitHub?

22:15 Liran Haimovitch

We're not an open source company, or at least not for now. It's always been a debate for us: how do we build the product in a way that's both useful from the open source perspective and still sustainable from a commercial perspective? It's a tricky question we've been struggling with. In the meantime, we've tried to open source as much as we can. We've open sourced various tools we've built internally. For instance, we've built a tool called Git enforcer that allows you to enforce and automate various best practices over GitHub pull requests and so on. Over the past couple of years, we've seen GitHub implement some of those features, which is pretty cool. When we originally wrote it three years ago, much of that was lacking. So let's say a developer opens a pull request, and they don't have a ticket for it yet, because it just came about as they were working. They can just add a comment to the pull request and automatically get a Jira ticket for it so they can manage it. So that's available as an open source project. We've also open sourced some of our research around integrating air with browsers and stuff. We are hoping to release some more of our core IP into open source this year.

23:28 Miko Pawlikowski

Awesome. That was really cool. I'm looking forward to trying out all of those things, really. I just want to close with one question I think most of our audience might find useful. If you were to pick the single highest return on investment thing that you did for your tech career, what would it be? And would you recommend other people do that too?

23:50 Liran Haimovitch

My career's been, I would say, pretty unorthodox, at least compared to many people. The important thing I took from it is being independent and being able to believe in yourself and your ability to learn. Just knowing that you can pick up any topic you want. Along the way with Rookout, I grew my knowledge of Python from basic to expert, I learned Java, I learned JavaScript, and I learned Ruby. I've also upgraded my C# knowledge, and all of that pretty much on my own. I think the important thing about software engineering is learning to learn. It's about knowing how to learn: picking up new technologies, picking up new languages, and expanding your capabilities. And of course, whenever you're working with smart colleagues around you, you definitely need to build on that and learn from them. But at the same time, it's important both to learn to learn and build your capacity for learning, and to gain a deeper understanding of every technology you pick up, because you will see those themes returning: different languages and different runtimes have a lot in common. The more deeply you understand one, the easier it's going to be to understand the next. So always learning, always being inquisitive, makes a huge difference in your ability to pick up new skills and technologies and be a great engineer.

25:13 Miko Pawlikowski

Awesome, I like that! Building the tree branch by branch, you never know where the next branch might go. Thank you for that, Liran. If you want to check out Liran's company - once again, www.rookout.com. Thanks so much for coming, I'll see you next time!

25:26 Liran Haimovitch

Thank you, everybody!
