Conf42 Cloud Native 2021 - Online

Monitoring Kubernetes VS Serverless-based applications

Abstract

Monitoring Kubernetes-based applications is a common practice nowadays. What changes when moving to serverless? Comparing an established practice to a new one can help developers build the right journey to the serverless world.

The software we write does not always work as smoothly as we would like. In order to know when something goes wrong, understand the root cause and fix the problem, we need to monitor our system and get alerts whenever issues pop up. There are many useful tools and practices for Kubernetes-based applications. As we adopt a serverless architecture, can we continue to use the same practices? Unfortunately, the answer is "no".

In this session we will discuss:

  • The differences between monitoring Kubernetes- and serverless-based applications
  • Best practices for serverless monitoring
  • Methods to efficiently troubleshoot serverless-based applications

Join the session and start enjoying the great benefits of serverless computing.

Summary

  • Erez Berkner: We're going to talk about monitoring Kubernetes and how serverless changes everything. The evolution of transportation is a useful metaphor for the evolution of the cloud. Serverless is really about uploading your code to the cloud and letting the provider take care of the rest.
  • There are five layers of Kubernetes that we need to monitor, from the infrastructure, the actual hardware with its vital signs, up to the application itself, where we see application logs. Use tools that cover all of them to really understand what's going on across hardware, infrastructure and application.
  • Serverless is a variety of services, from Lambdas to containers to managed containers. In order to monitor serverless the right way, you need to make sure that you have the right tools. Lumigo takes metrics, tracing and logs and connects the dots to understand when things are going wrong.
  • If you want to try out Lumigo, there is a free trial and a free tier: no code changes, and everything is automated, so you can have a full view of your environment.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, my name is Erez Berkner and I'm CEO and co-founder of Lumigo. Welcome to this session about monitoring and debugging Kubernetes versus serverless. So just as context, we're going to talk about the evolution of the cloud. We're going to talk about monitoring Kubernetes and how serverless changes everything. Then we're going to talk about monitoring serverless, and hopefully we'll do a quick demo at the end.

So when we talk about the evolution of transportation as an example, we can see different things that evolve over time. And the reason I'm talking about transportation is because I think it's a very relevant, very similar metaphor for serverless. When we think about transportation, we used to ride in a car we owned: we bought it, we fueled it, we navigated, and we got to our destination. Or we could rent a car; then we don't own it, but we still need to take care of all the maintenance. We can get around with trains or buses, and that means just figuring out how to get there. Or we can get an Uber, and that's focusing purely on getting there.

And this is very much what's happening in the cloud. We used to work with physical servers, where we owned the hardware, deployed the operating system, and were in charge of scaling and our code. Virtual machines really took the hardware away, so we were renting servers. Containers brought this to a higher level of virtualization and abstraction, so you care mostly about the runtime, the scaling and the code. And serverless is really about "upload your code to the cloud and we've got you covered". So these are similar trends.

At the same time, when we talk about containers, managing them is very popular to do with Kubernetes. And I want to mention five layers of Kubernetes that we need to monitor. We have the infrastructure, the actual hardware, where we have vital signs that we need to monitor. We have the Kubernetes cluster itself, we have the pods, we have the overall readiness and availability of the pods, and we have the application itself, where we see application logs. You need to remember that whenever you go and work with Kubernetes, you need to make sure that you cover all these five layers in order to really understand what's going on across hardware, infrastructure and application.

I want to share with you a couple of tools that work really well with Kubernetes. I'm sure some of you have heard of Prometheus, a great open source tool that basically allows you to pull metrics from the different nodes and aggregate them. You can save them to storage, you can analyze them, and you can push alerts through Prometheus Alertmanager to Slack or to PagerDuty. And you can use Grafana, which is also very popular alongside Prometheus, in order to visualize that and have dashboards that show you the health of your system. This is an example of Grafana monitoring a Kubernetes cluster, and note that there are different layers over here of what we talked about before: the pod usage, the cluster's availability, the CPU, the hardware, and so on. In terms of getting the application logs, you can use Logstash, basically the ELK stack, in order to aggregate all the logs and make them available and searchable. That's another great best practice: to have monitoring of the application logs of Kubernetes.
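To make the Prometheus flow described above a bit more concrete, here is a minimal, hedged sketch of an application exposing custom metrics that a Prometheus scrape job could pull. The metric names and the port are illustrative assumptions, not values from the talk.

```python
# Minimal sketch: expose custom application metrics on an HTTP endpoint that a
# Prometheus scrape job can pull from. Metric names and the port (8000) are
# illustrative assumptions, not values from the talk.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

REQUESTS_TOTAL = Counter("app_requests_total", "Total requests handled")
QUEUE_DEPTH = Gauge("app_queue_depth", "Current depth of the work queue")

if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://<pod>:8000/metrics
    while True:
        REQUESTS_TOTAL.inc()                    # count each handled request
        QUEUE_DEPTH.set(random.randint(0, 10))  # simulate a fluctuating queue
        time.sleep(1)
```

Grafana would then chart these series, and Alertmanager rules could route threshold breaches to Slack or PagerDuty, as described in the talk.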
There's also a really cool open source tool that I want to share. It's not that well known in the community, but I found it very helpful, especially when we need to understand how traffic is flowing across different containers. This tool is called Vizceral, it's relatively easy to connect, and again, it's open source. So go check it out; it might be very useful for your scenario.

I won't finish the Kubernetes part without mentioning service mesh. When you're monitoring a service-mesh-based architecture, your life is much easier, because you have a centralized point where you can deliver and get information and metrics, and you don't need to go and add a layer that does that; it's actually integrated into your architecture. Istio has some great tools for that. So that's another point to remember when you're architecting your environment and you care about monitoring: if you have a service mesh baked in, or you're planning to use a service mesh, use it also for your monitoring.

And let's talk a bit about serverless. When we talk about serverless, I just want to frame it: it's not just AWS Lambdas, it's a variety of services, from Lambdas to containers to managed containers. Fargate, DynamoDB, API Gateway, Stripe, Twilio: all of these are ways to consume functionality without actually maintaining a server, and this is what I define as a serverless environment. When we talk about this environment, we can understand that serverless is different. It's ephemeral, meaning there is no server that is always up and running. There are hundreds of components that work together, not just the three-tier application that we used to have. And there are actually no servers, so you cannot deploy agents anywhere; you need new methods. In order to monitor serverless the right way, you need to make sure that you have the right tools.

So here's a quick comparison of server-based versus serverless. In serverless you have many, many small parts, so you need distributed tracing. In a container environment you can use the good old agent-based approach, and there are many good open source or proprietary solutions that solve that; in serverless, agents don't work anymore, so you need to use APIs or libraries in order to integrate and infer what's going on within a service. When we talk about costs, containers are priced per resource, while serverless is driven per request, which really makes a difference. When you think about what to monitor: on containers and Kubernetes you need to monitor the hardware, the operating system and the application; on serverless, it's only the application that's your responsibility. Service discovery, again: in containers you have the different tools, the legacy tools, and of course it can also be done using a service mesh; in serverless you can do that based on APIs from a single point, like AWS for example. I think the most important thing to remember is that you still own the monitoring part. Nobody will do that for you, in containers or in serverless. And that's an important point to remember when you're offloading things to the cloud provider.

So what do we really need when we talk about serverless monitoring, or modern cloud monitoring? First of all, we need to be able to identify and fix issues in minutes. And for this, because we have so many different services, we need somebody to connect the dots and make debugging data available for us on demand. I'm going to show that in a second, in a quick demo.
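Because there is no host on which to install an agent, instrumentation has to live inside the function itself, as mentioned above. As a hedged illustration (not the speaker's exact setup), a Lambda handler can emit one structured JSON log line per invocation, which CloudWatch Logs captures and on which metric filters or alerting tools can be layered. The field names below are illustrative assumptions.

```python
# Hedged sketch: structured, per-invocation logging from inside an AWS Lambda
# handler, since there is no host on which to install a monitoring agent.
# The log field names are illustrative assumptions.
import json
import time


def handler(event, context):
    start = time.time()
    status = "ok"
    try:
        # ... real business logic would go here ...
        return {"status": status}
    except Exception as exc:
        status = "error"
        print(json.dumps({"errorMessage": str(exc)}))
        raise
    finally:
        # One JSON line per invocation; CloudWatch Logs captures stdout, and
        # metric filters or alerting tools can be layered on top of it.
        print(json.dumps({
            "request_id": context.aws_request_id,
            "function": context.function_name,
            "duration_ms": round((time.time() - start) * 1000, 2),
            "status": status,
        }))
```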
We have hundreds of services, so we need to do distributed tracing, but it has to be automatic; I cannot chase after every new service that pops up. And the third point: we need to make sure that we're able to identify bottlenecks, because there are so many potential bottlenecks in those environments. And as I mentioned, all of this needs to be agentless and based on APIs and code libraries.

So this is what we do at Lumigo. We basically take metrics, tracing and logs, and we connect the dots, in order to make sure that you're able to understand when things are going wrong and be able to fix them. And I want to show you this in a very quick demo.

So I have over here our demo environment. This is a Lumigo environment; basically, a demo environment that is connected to an AWS Wild Rydes serverless environment. And I want to take you to one scenario that is very popular with our customers. I'm just refreshing the dashboard to make sure we observe the last seven days instead of the last hour. That's basically taking us from live monitoring, when I want to have this as a dashboard that is kind of always open, to something that is more of an investigation. What I want to show you over here is an example of what we have in terms of the environment: the invocations, the number of failures, the functions that fail the most, where I have latencies, whether I have cold starts and what the main issues with cold starts are. The same goes for cost analysis, slow APIs and timeouts: a dashboard showing you, out of the box, the main things that you should care about in a serverless environment.

And let's suppose we got an alert to PagerDuty from Lumigo about this failure. If we want to understand what's going on over here, we click on that specific service, a Lambda, in order to start investigating what actually happened and what are the cases where this failed. We can see that this Lambda ran 7,000 times in the last seven days, with almost 50% failures. Over here we have the actual invocations and the results, and we want to drill down into a specific failure to understand what happened. This is where we move from just monitoring to debugging. Lumigo builds the end-to-end story of the request that failed: a specific invocation failed within that request, and now Lumigo will show me the story of that request across all the different services. What we see over here is the actual failure, the reason we got here: this Lambda failed. At the same time, I can understand what the customer-facing API is and decide whether this is critical to fix now or not.

When I want to understand what happened, I can click on a specific service, and then Lumigo generates a lot of debugging information, post mortem: things like what the stack trace was, what the parameters were when it failed, what the event was that triggered this Lambda, the environment variables, the logs. I call it debugging heaven, because it's always there, with all the information that you need in order to understand what happened and solve the issue. You don't need to go across thousands of logs and try to find what you want. This is without any code changes, without the need to issue logs. You can basically go into every service and see what the inputs and outputs of that service are.
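Lumigo wires this tracing up automatically, without code changes. Purely to illustrate what library-based distributed tracing looks like in general (a different, swapped-in technique, not Lumigo's mechanism), here is a hedged sketch using the AWS X-Ray SDK for Python, which patches boto3 so downstream calls appear as subsegments of the same trace. The table name, key shape and the assumption that active tracing is enabled on the function are all illustrative.

```python
# Hedged sketch of library-based distributed tracing using the AWS X-Ray SDK.
# This illustrates the general approach only; it is not Lumigo's mechanism.
# The table name "Rides" and the key shape are illustrative assumptions, and
# active tracing must be enabled on the Lambda function for segments to exist.
import boto3
from aws_xray_sdk.core import patch_all, xray_recorder

patch_all()  # patch boto3/requests so downstream calls appear as subsegments

dynamodb = boto3.resource("dynamodb")


def handler(event, context):
    # Wrap a piece of business logic in a custom subsegment.
    with xray_recorder.in_subsegment("lookup-ride"):
        table = dynamodb.Table("Rides")
        item = table.get_item(Key={"RideId": event.get("rideId", "unknown")})
    return {"found": "Item" in item}
```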
So this is EventBridge: I can see the message that went to EventBridge. I can look at DynamoDB and see the actual data: this is a query to DynamoDB, and this was the response, no response. And this is also true for external services like Twilio, so I can click on Twilio and see the request to Twilio and the response coming from Twilio: a successful SMS was sent to this number, and so on and so forth. I can also see the specific logs of that specific request. Maybe I have a million requests, but this one request is the one I want to see. It's like a story, with 62 logs over here, starting from the very first basic authentication that was done over here, all the way through the different services. And I can look at this and read what's actually happening, like a story, across all the different services.

Great. So going back to summarize where we are, a couple of takeaways. We talked about the five layers of Kubernetes: you need to monitor all of them. Microservices require distributed tracing, whether it's containers, whether it's cloud native, whether it's serverless. There's an emerging monitoring challenge around tracing of managed services: we saw DynamoDB, we saw EventBridge; how do I trace across them, how do I know the messages that go through them when I need to investigate? This is growing, and you need to make sure this is covered in your environment. Use existing frameworks and open source: we talked about a couple of tools that are available, commercial or non-commercial, but make sure you bake them in. And serverless requires you to also be able to do distributed tracing across managed services. So serverless is not just Lambdas; it's DynamoDB, it's API Gateway, it's EventBridge, it's Stripe, it's Twilio. Make sure that you are able to trace across those services.

I want to thank you, and if you have any questions, please feel free to reach out. These are all my details, my email and my Twitter. If you want to try out Lumigo, we have a free trial and a free tier. You're welcome to just go to lumigo.io, click "start a free trial", and it's five minutes to connect the system: no code changes, and everything is automated, so you can have a full view of your environment. I thank you very much, and wish you a great week.
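One of the takeaways above is the need to trace requests across managed services such as EventBridge and DynamoDB. As a hedged sketch of the underlying idea (not how Lumigo does it, and with an assumed bus name, source and payload shape), a producer can attach a correlation ID to an EventBridge event so that downstream consumers can tie their own logs back to the originating request.

```python
# Hedged sketch: propagate a correlation ID through an EventBridge event so
# downstream consumers can tie their own logs back to the originating request.
# The bus name, source, detail-type and payload shape are illustrative assumptions.
import json
import uuid

import boto3

events = boto3.client("events")


def publish_ride_requested(ride_id, correlation_id=None):
    correlation_id = correlation_id or str(uuid.uuid4())
    events.put_events(
        Entries=[{
            "EventBusName": "default",
            "Source": "demo.rides",
            "DetailType": "RideRequested",
            "Detail": json.dumps({
                "rideId": ride_id,
                "correlationId": correlation_id,  # consumers log this value
            }),
        }]
    )
    return correlation_id
```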

Erez Berkner

CEO @ Lumigo
