Importance of Observability in Platform Engineering

Abstract

By correlating metrics, logs, and traces, platform teams can optimize resource allocation, accelerate troubleshooting, enhance developer experience, improve reliability, and make data-driven decisions for platform evolution.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hey y'all, hope you're doing great. Welcome to Conf42 Platform Engineering. I'm Neel Shah. Today we will deep dive into the importance and the role of observability in platform engineering. Thanks for tuning in. A little about me: I'm a developer advocate at Middleware, an observability platform, building various DevOps communities, running communities like Google Cloud, CNCF, and Docker, and I have managed more than 15 hackathons. So let's go ahead.

So what's in store for us? Why observability matters in platform engineering, and what the key benefits are. First, it optimizes resource allocation. It tells you whether resources are scarce or abundant, which you can analyze from the logs and the metrics. Observability accelerates troubleshooting. From logs, maybe not easily, but we can learn a lot about troubleshooting and how an error came about. Observability enhances developer experience, improves reliability, and enables data-driven decision making. Decision making always starts from understanding the data, and data plays quite an important role in our future predictions. We are going to discuss all of these points in the next slides.

The core objectives of observability: let's understand them first and then we can go ahead. There are three major ones. The first is log management: we collect different types of logs, such as application logs and service logs, to understand the performance impact and the behavior of all the services that are running. The second is gathering metrics to understand performance. Metrics in graph form help us understand it better. If you monitor and troubleshoot your applications proactively, your application will have less downtime, higher performance, and more efficiency. That's the general idea, but telemetry always helps you improve your application. The third is tracing.
Traces track the journey of a request through services. They help you understand the bottlenecks, the issues behind degrading performance, and what the actual reason for them could be.

Let's understand the role of observability. First is optimized resource allocation. Correlating usage data with system demand lets you scale resources proactively and efficiently. Say we have data on how an application has worked, or on how a single product has sold. From that we can understand the demand, and in the same way we can understand how many more resources will be required in the future. Let's take one example: proactive scaling. By monitoring CPU and other metrics, teams can understand in which scenarios they need to scale resources up or down, and when the peak times and the low times are. Teams can easily work this out by observing their applications.

Next is cost efficiency. Observability literally helps to reduce your cost. Some resources are underutilized, and if you don't know that only 10 to 20% of the resources are used, you are paying 70 to 80% more money than you need to. From the monitoring metrics you can see that you need to downgrade your resources, which can impact your billing a lot.

Next is accelerated troubleshooting. We can quickly and effectively identify and resolve issues through logs, traces, and patterns. One good example is root cause analysis: engineers can trace issues back to their origin, whether it's a misconfiguration or another cause. But if observability monitoring is not set up, it will be very hard to find out what the actual error or problem is. So it eventually helps.
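As a rough illustration of the proactive scaling and cost-efficiency points above, the sketch below turns CPU utilization samples into a scaling recommendation and an estimate of how much of the bill pays for idle capacity. The thresholds (75% and 30%), the function names, and the sample numbers are assumptions made for illustration; they are not values from the talk or from any particular platform.

```python
from statistics import mean

# Hypothetical thresholds; real values depend on the workload.
SCALE_UP_ABOVE = 0.75    # average CPU utilization that suggests adding capacity
SCALE_DOWN_BELOW = 0.30  # average CPU utilization that suggests removing capacity


def scaling_recommendation(cpu_samples):
    """Return 'scale_up', 'scale_down', or 'hold' for utilization samples in [0, 1]."""
    if not cpu_samples:
        raise ValueError("need at least one utilization sample")
    avg = mean(cpu_samples)
    if avg > SCALE_UP_ABOVE:
        return "scale_up"
    if avg < SCALE_DOWN_BELOW:
        return "scale_down"
    return "hold"


def idle_spend(monthly_cost, cpu_samples):
    """Estimate the part of the monthly bill that pays for unused capacity."""
    return monthly_cost * (1.0 - mean(cpu_samples))


if __name__ == "__main__":
    off_peak = [0.12, 0.18, 0.15, 0.20]  # made-up samples for a quiet period
    peak = [0.82, 0.91, 0.88, 0.79]      # made-up samples for a busy period
    print(scaling_recommendation(off_peak))
    print(scaling_recommendation(peak))
    print(round(idle_spend(1000.0, off_peak), 2))
```

Averaging over a window is the simplest possible signal. A real setup would more likely look at percentiles and at memory or request rate as well, but the decision has the same shape.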
Next is incident response with real-time alerts. If teams are using an observability platform, it can alert them right away, so they can have lower downtime and fix their product or services at that particular time.

Next is enhanced developer experience. Developers can understand how the application performs in production, and how changes in a new release could affect sales or services. Monitoring, traces, and logs are definitely helpful for them, because they spend less time on troubleshooting and more time on delivering new features.

Next is improved reliability: continuously monitor and refine platform components for better uptime and stability. For uptime monitoring, observability tools check systems and immediately trigger an alert. If you have a proper observability setup, you can get a message on Slack, by email, on Teams, anywhere. Most observability platforms support this. Once you get the alert, you can troubleshoot and get your application back live.

Then there is anomaly detection. Observability solutions use machine learning for this. We will come back to it in later slides, where we discuss the future trends in observability. ML and AI definitely help with anomaly detection.

Let's take an example. In a shopping sale, if your bank is your dad's wallet, nothing can stop you from buying everything. Say it was Black Friday: most people went to the shops and tried to grab everything because there were huge discounts. People purchase a lot of things.
To understand that sort of scenario, if you are monitoring everything, you can see that this year your sales were X times those of a normal day. That helps you understand which products people purchased, and it helps you do the analysis for next time. That is why we do analysis and monitoring: to improve our application and our future predictions, and to see which way the trend is heading.

There are always some challenges, so let's see what they are. The first is data overload: when we collect data, we don't know which data is important and which is not. We need to understand which data is important for us, rather than capture everything and pile it up. Next is integration complexity: people nowadays have multi-cloud setups and use different tools, so it is hard to get observability across all of them and to integrate them. Then there are skill gaps: people need training to implement AI/ML and to operate these tools.

So what are the solutions? Use automated tools to filter and monitor the data. As I already said, you don't need to build stacks of data and stacks of logs. Data is important, but which data? The proper data, not all data. With AI/ML, we need to set up rules that flag and isolate the logs that are really important for us, such as those related to security breaches and those that significantly impact our performance. We need to identify those logs and put alerts and filters on them. This is selective data collection: we capture only selected data, based on our business objectives. We also always need a unified platform, because we have a multi-cloud setup.
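The selective data collection idea described above (rules that flag and isolate the important logs) can be sketched in a few lines. The record shape, the keyword list, and the one-second latency threshold below are all hypothetical choices for illustration, not something the talk or a specific tool prescribes.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    level: str               # e.g. "DEBUG", "INFO", "ERROR"
    service: str
    message: str
    latency_ms: float = 0.0


# Hypothetical retention rules: errors, security-related events, slow requests.
SECURITY_KEYWORDS = ("unauthorized", "forbidden", "breach")
SLOW_REQUEST_MS = 1000.0


def is_important(record):
    """Return True if the record matches any retention rule."""
    if record.level in ("ERROR", "CRITICAL"):
        return True
    text = record.message.lower()
    if any(word in text for word in SECURITY_KEYWORDS):
        return True
    return record.latency_ms >= SLOW_REQUEST_MS


def select_logs(records):
    """Keep only the records worth storing and alerting on."""
    return [r for r in records if is_important(r)]


if __name__ == "__main__":
    records = [
        LogRecord("DEBUG", "cart", "cache hit"),
        LogRecord("ERROR", "payment", "card declined by gateway"),
        LogRecord("INFO", "auth", "Unauthorized login attempt"),
        LogRecord("INFO", "search", "query served", latency_ms=1800.0),
    ]
    for r in select_logs(records):
        print(r.service, r.level, r.message)
```

Keeping the rules as plain data (a keyword tuple and a threshold) makes them easy to review against the business objectives that decide what is worth collecting.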
Without a unified platform, we end up going to one environment for logs, another for traces, and another for monitoring, and that always creates chaos. We need a single platform where logs, traces, and monitoring are all available, along with integrations with different third-party applications. For example, in Middleware you can connect to many databases and every cloud provider seamlessly, without putting in a lot of effort. So a unified observability platform is always required.

What's the future of observability in platforms? First, AI and ML integration: as I discussed previously, AI plays an important role in troubleshooting. Using AI makes our observability analysis more proactive and more impactful, which helps us detect many things and make future decisions. Second, observability is extending to edge computing. That's a really wide topic, and in the next few years it will boom. Third, open standards: the growth in adoption of OpenTelemetry, which has new releases and new features all the time. Open frameworks like this give a lot of edge to people in our space.

Let's discuss some of the best practices for implementing observability. Start with clear objectives. Implement end-to-end tracing. Leverage automation. Adopt a continuous improvement mindset. Promote a culture of observability in your company. Regularly validate your observability tools and processes. So it's end-to-end observability, measuring what's important, and you need to combine these to improve the performance and efficiency of your app.

For any queries, feel free to reach out to me. Any suggestions and feedback are always appreciated. Thank you. Have a nice day ahead.

Conf42 Platform Engineering 2024 - Online

September 05 2024 - premiere 5PM GMT

Neel Shah

Developer Advocate @ Middleware
