Conf42 DevSecOps 2023 - Online

The Road Ahead: OpenTelemetry's Path to Observability's Future

Abstract

In our tech-driven world, engineers often deal with alert overload due to complex & distributed systems. This talk explores the fundamental ideas of O11Y & OTel, which are important in modern engineering. The industry is actively embracing and contributing to standardised telemetry practices.

Summary

  • Siddhartha Khare works as a technical account manager at New Relic. Since joining New Relic, he has been more focused on mobile app observability. The talk covers how the industry is adopting OpenTelemetry and what the future of OpenTelemetry looks like.
  • OpenTelemetry is an incubating project of the CNCF. It was formed by merging OpenCensus and OpenTracing. It offers multiple sets of APIs, libraries and integrations. According to Gartner, by 2025, 70% of cloud native application monitoring will use open source instrumentation.
  • The OpenTelemetry Collector is perhaps one of the most exciting tools in OpenTelemetry. It's meant to run as a standalone process, providing a central place to receive, process and export the data being collected. It is completely vendor agnostic and supports many common open formats for telemetry data.
  • Let's talk about how the industry is adopting OpenTelemetry. All the major cloud providers are adopting and contributing to the project. Only collecting telemetry data is not useful; the instrumentation should include contextual data to make more sense.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, thanks for joining. I'll be talking about the future of observability, following OpenTelemetry's path. By the end of this talk, you will learn how OpenTelemetry can satisfy the need for observability. I'm Siddhartha Khare, and I work as a technical account manager at New Relic. Prior to joining New Relic, I worked at Citrix as a software developer. I like working with mobile apps, especially enterprise mobile apps, and since joining New Relic I have been more focused on mobile app observability.
This is the agenda for today: what observability is and why it matters, what OpenTelemetry is and what its core concepts are, how the industry is adopting OpenTelemetry, and what the future of OpenTelemetry looks like. But before we start, let me answer one question that comes to everyone's mind when we talk about observability or monitoring: how many tools does a company use to collect telemetry data? The answer is somewhere around four to six tools.
First, let's discuss the difference between monitoring and observability. Traditional monitoring is all about the hiccups you face in your system in day-to-day work. Mostly it's about whether a service is red or green, up or down, and it can trigger alerts on response times, application crashes, et cetera. Observability, however, is a lot more than that, and it is based on three major pillars: metrics, logs and traces.
So what is observability? Observability is all about understanding the internal state of your system based on the output it generates. Here is a situation everyone is familiar with: things work perfectly fine on your system, but as soon as you push to production it fails, and that's where it gets treated as an ops problem. In some scenarios people even say, "It's working fine in my container; maybe you are not deploying the container correctly." Again, that is not the case.
That's where observability comes to the rescue, and it can help different personas in your organization. Let's take a look. Developers can use observability to debug their code, to identify performance bottlenecks, and to ensure that their features are working as expected in production. DevOps engineers can use observability to monitor their systems for health and performance, to identify and fix problems quickly, and to automate their deployments and operations. SREs can use observability to manage the reliability of their systems. Product managers can use observability to understand how users are interacting with their products, to identify areas for improvement, and to make better decisions about future features and functionality. Observability not only helps these individual personas, it also helps you run your business.
Let's take a look at the background of OpenTelemetry and what it offers. OpenTelemetry is an incubating project of the CNCF. It was formed by merging OpenCensus and OpenTracing, so if you have used Jaeger or Zipkin, you have already experienced the flavor of OpenTracing. It has multiple sets of APIs, libraries and integrations available, which make it vendor agnostic, so you don't have to depend on any specific backend. More than all of this, OpenTelemetry is setting a standard for how you should collect telemetry data for observability.
Let's look at some of the features behind the rise of OpenTelemetry. First is ubiquity: OpenTelemetry is designed to be highly accessible and commonly used across a wide range of programming languages, platforms and ecosystems. Second is its vendor-neutral nature: OpenTelemetry is intentionally vendor neutral and does not favor or promote any specific vendor. It is also interoperable, which means it has different libraries and SDKs for each language, all following the same specification.
And last but not least, it is configurable. Instrumentation can be done automatically or manually, and you can leverage sampling strategies, exporters, context propagation and much more. According to a Gartner study, by 2025, 70% of cloud native application monitoring will use open source instrumentation. Here you see a graph of CNCF projects, where OpenTelemetry is the second most active project in the CNCF space. The first one is obviously Kubernetes.
Let's talk about the core concepts of OpenTelemetry. OpenTelemetry covers a lot, and it is built on some specific building blocks. With OpenTelemetry the data is annotated, and it is up to the implementer to annotate the data in a meaningful way. There are semantic conventions for the common operations that software performs, for example HTTP calls, database operations, et cetera. It provides the APIs and SDKs with which you can collect the different data types: traces, metrics, logs, et cetera. It also offers automatic instrumentation. The last one is OTLP, the OpenTelemetry Protocol, which is used for sending the data to the backend observability platform of your choice, where you can visualize the data, set alerts and do many more things.
Here what you see is the two ways of instrumenting with OpenTelemetry. On the left is automatic instrumentation, where the number of lines of code is small. On the right is manual instrumentation, where there are more lines. So it is always recommended to go with automatic instrumentation if you are at the start of your observability journey. Now once the application is instrumented, the data is collected, and when the data is collected, this is what it will look like.
You will be able to get a deeper understanding of the application stack, because when you instrument it, you are building blocks around it. Once you have that, you will be able to pinpoint errors and even understand where the problem lies.
Now we have discussed how instrumentation works and what type of instrumentation to go with, but here comes the most important part, which is the OpenTelemetry Collector. We can consider the OpenTelemetry Collector a superpower. The Collector is perhaps one of the most exciting tools OpenTelemetry has to offer. It's meant to run as a standalone process, providing a central place to receive, process and export the data we are collecting. It's completely vendor agnostic and supports many of the most common open formats for telemetry data. What you see on this slide is that the Collector centers around three primary types of components. The first is receivers, for receiving the telemetry data. The second is processors, for processing the telemetry data. And finally there are exporters, for exporting the telemetry data to a backend like New Relic or any other observability backend.
Just like the OpenTelemetry SDKs for each language, the Collector is also designed to be extensible. If you visit the OpenTelemetry Collector GitHub repository, you will find that there are already many components developed that you can use in your environment. So the Collector is not just a data exporter or a data middleman. It has multiple components that can help you with filtering, batching, and even adding attributes, and these processors are a key part of the whole Collector process. If you are using Prometheus and Grafana, you might be wondering what will happen to them.
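The receive, process and export flow described for the Collector can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Collector's actual implementation or API: the function names, the `InMemoryExporter` class, the health-check filter and the `deployment.environment` value are all assumptions made for the example.

```python
def receive(raw_records):
    """Receiver: accept telemetry records (plain dicts in this sketch)."""
    return [dict(r) for r in raw_records]


def filter_processor(records, drop_names=("GET /health",)):
    """Processor: drop records we do not care about (hypothetical rule)."""
    return [r for r in records if r["name"] not in drop_names]


def attributes_processor(records, extra):
    """Processor: add extra attributes to every record."""
    return [
        {**r, "attributes": {**r.get("attributes", {}), **extra}}
        for r in records
    ]


def batch_processor(records, batch_size=2):
    """Processor: group records into fixed-size batches."""
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


class InMemoryExporter:
    """Exporter stand-in: a real one would send each batch to a backend."""

    def __init__(self):
        self.sent = []

    def export(self, batch):
        self.sent.append(batch)


def run_pipeline(raw_records, exporter):
    """Wire the stages together: receive -> filter -> enrich -> batch -> export."""
    records = receive(raw_records)
    records = filter_processor(records)
    records = attributes_processor(records, {"deployment.environment": "staging"})
    for batch in batch_processor(records):
        exporter.export(batch)


# Made-up sample data for the sketch.
incoming = [
    {"name": "GET /checkout", "duration_ms": 230},
    {"name": "GET /health", "duration_ms": 2},
    {"name": "GET /cart", "duration_ms": 85},
    {"name": "POST /pay", "duration_ms": 410},
]
exporter = InMemoryExporter()
run_pipeline(incoming, exporter)
print([len(batch) for batch in exporter.sent])  # [2, 1]
```

In the real Collector these stages are declared in its configuration file rather than written as code; the sketch only shows why the order of the stages matters.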
Prometheus and Grafana are also supported, but the support is in an experimental phase, so check their official documentation before you start. Here, what you see is that OpenTelemetry is not restricted to your application data. The Collector can help you scrape metrics from your infrastructure. In this sample we are capturing CPU, memory and networking details from my infrastructure; you just define which metrics you want to capture, and this is what it looks like once the data is collected.
Let's dive deeper into sampling, where different sampling processes are available: head-based, tail-based and probabilistic sampling. Head-based and tail-based sampling are the commonly used ones. Head-based sampling samples, up front, the requests and spans generated by the individual services. It looks at the statistics of all the requests generated by the services, keeps all the spans of the requests it samples, and makes the decision at a very early stage. The main issue with head-based sampling is that when the sampling decision is made, the root span has limited visibility and does not know what will happen later.
With tail-based sampling, the sampling decision happens at the end. After receiving the first span, the sampler waits for a period of time to collect the spans from other services that share the same trace ID. Once all the collected spans are grouped together by trace ID, it iterates over them to check the error status and the duration of the spans. Based on that analysis, high-value traces are selectively sent to the next stage, such as your observability backend. Let me show you what tail-based sampling looks like. This is a sample configuration where I am using tail-based sampling with multiple policies. One such policy is to collect only the traces that have a latency of 5000 milliseconds.
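The tail-based decision described above (buffer spans, group them by trace ID, then keep only erroring or slow traces) can be sketched in plain Python. This is a simplified illustration and not the Collector's tail sampling processor: the `Span` class and function names are hypothetical, and "latency" here is approximated by the longest span in the trace rather than the true start-to-end duration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False


def group_by_trace(spans):
    """Group buffered spans by trace ID, as done once the wait window closes."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


def keep_trace(spans, latency_threshold_ms=5000):
    """Policy: keep a trace if any span errored or the trace is slow.

    Assumption: the longest span stands in for the trace latency.
    """
    has_error = any(s.is_error for s in spans)
    is_slow = max(s.duration_ms for s in spans) >= latency_threshold_ms
    return has_error or is_slow


def tail_sample(spans, latency_threshold_ms=5000):
    """Return only the traces that pass the policy, keyed by trace ID."""
    traces = group_by_trace(spans)
    return {
        trace_id: trace
        for trace_id, trace in traces.items()
        if keep_trace(trace, latency_threshold_ms)
    }


# Made-up buffered spans from three traces.
buffered = [
    Span("a", "GET /checkout", 5200),
    Span("a", "db.query", 4800),
    Span("b", "GET /health", 12),
    Span("c", "GET /cart", 90, is_error=True),
]
sampled = tail_sample(buffered)
print(sorted(sampled))  # ['a', 'c']
```

Trace `a` is kept because it is slow, trace `c` because it contains an error, and the fast, healthy trace `b` is dropped, which is where the reduction in ingested data comes from.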
The output it generates is very helpful. Before I used tail-based sampling, the throughput was very high and the data ingestion was very high, but as soon as I implemented test policy two, the one around latency, it dropped. The reason is that the policy only collects the spans that took over 5 seconds to complete, and because of that I am also able to save some of the cost of ingesting the data. So this is how you can leverage tail-based sampling in your Collector YAML. This next one is what we call probabilistic sampling, where you define the percentage of traces you want to keep, and the configuration is as shown here.
Now we have understood how to instrument the app and how to sample the traces. Once the data is sampled, how can you export it? That's where these three examples come in. In the first, I have used Zipkin as an exporter, exporting the telemetry data with the help of Zipkin. In the second, I have leveraged Prometheus to export the data. In the third, I have used New Relic, where I am using New Relic's OTLP URL to export the data, and these are the attributes it requires.
Let's talk about how the industry is adopting OpenTelemetry. Here you see the top industry adopters. These are some of the big names that are using OpenTelemetry at production scale. Let me share one success story, where an industry adopter paired open standards with observability, and that adopter is Skyscanner. The results are really great. They were able to retire twelve internal and external systems, save approximately 15 minutes on each merge request in the mobile build pipeline, and create SLOs from any metric, event or telemetry data, regardless of whether it comes from the back end or the front end.
Let's talk about the future of OpenTelemetry. There are lots of contributions happening throughout the OpenTelemetry space. You can look at this table for details on which types of telemetry data are stable for each language. All the major cloud providers are adopting and contributing to the OpenTelemetry project. Amazon has built the AWS Distro for OpenTelemetry, also called ADOT, which can be used as a Lambda layer. Microsoft has native OpenTelemetry capabilities in the .NET framework and supports OpenTelemetry tracing on Azure. New Relic is a proud enabler of and contributor to OpenTelemetry and is fully compatible with the OpenTelemetry Protocol, OTLP. Kubernetes and containers are natively supported, and many companies are building native integrations to support and export telemetry in OpenTelemetry format. Even Next.js, which is a web framework, has included a custom SDK to export OpenTelemetry data out of the box.
Let's recap what we have discussed. There is no doubt that OpenTelemetry is growing at a rapid pace. We have to be sure about our maturity before adopting OpenTelemetry, that is, what type of telemetry data we need. Only collecting telemetry data is not useful; the instrumentation should include contextual data to make more sense. With the OpenTelemetry standard, it's easy to gather telemetry data. If you invest in OpenTelemetry, it will help you run your business on data and not just on opinion. That is why we say: load data, eject opinion. With this, thanks for attending my session. These are my contact details. If you want to talk about OpenTelemetry, observability or mobile apps, I'm happy to connect and answer all your queries. Once again, thank you.

Siddhartha Khare

Technical Account Manager @ New Relic
