Transcript
Hey everyone, thanks for joining. I'll be talking about the future of observability, following OpenTelemetry's path. By the end of this talk, you will learn how OpenTelemetry can satisfy the need for observability. I'm Siddhartha Khare, and I'm working as a technical account manager with New Relic. Prior to joining New Relic, I was working with Citrix as a software developer. I like working with mobile apps, especially enterprise mobile apps, and after joining New Relic, I'm more focused on mobile app observability.
This is the agenda for today. We will be talking about what observability is and why it matters, what OpenTelemetry is and what its core concepts are, how the industry is adopting OpenTelemetry, and what the future of OpenTelemetry looks like. But before we start, let me answer one question for you, which comes to everyone's mind when we talk about observability or monitoring: how many tools does a company typically use to collect telemetry data? The answer is somewhere around four to six tools.
First, let's discuss the difference between monitoring and observability. Traditional monitoring is all about the hiccups you face in your system in day-to-day work, and mostly it's about whether a service is red or green, up or down. It can trigger some alerts around response times, application crashes, et cetera. However, observability is a lot more than that, and it is based on three major pillars: metrics, logs and traces.
So what is observability? Observability is all about understanding the internal state of your system based upon the output it generates. This is something everyone is familiar with: things work perfectly fine in your system; however, as you push it to production, it fails, and that's where we consider it an ops problem. In some scenarios people even say, "it's working fine in my container, maybe you are not deploying the container correctly." Again, that is not the case. That's where observability comes to the rescue, and it can help different personas in your organization. Let's take a look into it. Developers can use observability to debug their code, to identify performance bottlenecks, and to ensure that their features are working as expected in production.
DevOps engineers can use observability to monitor the health and performance of their systems, to identify and fix problems quickly, and to automate their deployments and operations. SREs can use observability to manage the reliability and performance of their systems. Product managers can use observability to understand how users are interacting with their products; they can use it to identify areas of improvement and to make better decisions about future features and functionality. Observability not only helps these individual personas, but it also helps you to run your business.
Let's take a look at the background of OpenTelemetry and what it offers. OpenTelemetry is an incubating project of the CNCF. It was formed by merging OpenCensus and OpenTracing, so if you have used Jaeger or Zipkin, you have already experienced the flavor of OpenTracing. It has multiple sets of APIs, libraries and integrations available, which makes it vendor agnostic, so you don't have to depend on any specific backend. More than all this, OpenTelemetry is setting a standard for how you should be collecting telemetry data.
Let's look at some of the features behind the rise of OpenTelemetry. First is ubiquity: OpenTelemetry is designed to be highly accessible and commonly used across a wide range of programming languages, platforms and ecosystems. Second is its vendor-neutral nature: OpenTelemetry is intentionally vendor neutral and does not favor or promote any specific vendor. It is interoperable, which means it has different libraries and SDKs for each language, but with the same specifications. And last but not least, it is configurable: instrumentation can be done automatically or manually, and you can leverage sampling strategies, exporters, context propagation and much more. A study by Gartner says that by 2025, 70% of cloud native application monitoring will use open source instrumentation.
Here you see a graph of CNCF projects, where OpenTelemetry is the second most active project in the CNCF space; the first one is, obviously, Kubernetes. Let's talk about the core concepts of OpenTelemetry. OpenTelemetry covers a lot, and it is built on some specific building blocks. With OpenTelemetry the data is annotated, and it is up to the implementer to annotate the data in a meaningful way. It has specifications for the common operations that software performs, for example HTTP calls, database operations, et cetera; for all these operations there is a semantic convention. It provides APIs and SDKs with which you can collect the different telemetry data types: traces, metrics, logs, et cetera. It also offers automatic instrumentation. And the last one is OTLP, the OpenTelemetry Protocol, which is used for sending the data to the backend observability platform of your choice, where you can visualize the data, set alerts and do many more things.
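To make these building blocks concrete, here is a minimal sketch of manual instrumentation with the OpenTelemetry Python SDK (not the code from the slides; the service name and attribute values are placeholders): a tracer is created, a span is started, and the span is annotated with semantic-convention-style attributes.

```python
# Minimal sketch using the OpenTelemetry Python SDK (opentelemetry-api / opentelemetry-sdk).
# The service name and attribute values below are illustrative placeholders.
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Annotate all telemetry from this process with a resource (who is producing the data).
provider = TracerProvider(resource=Resource.create({"service.name": "checkout-service"}))
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo.instrumentation")

# A span describing an outbound HTTP call, annotated with semantic-convention-style attributes.
with tracer.start_as_current_span("GET /inventory") as span:
    span.set_attribute("http.request.method", "GET")
    span.set_attribute("url.path", "/inventory")
    span.set_attribute("http.response.status_code", 200)
```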
Here, what you see is the two OpenTelemetry instrumentation approaches. On the left is automatic instrumentation, where the number of lines of code is smaller; on the right is manual instrumentation, where the number of lines is larger. So it is always recommended that you go with automatic instrumentation if you are at the start of your observability journey.
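For reference, this is roughly what the automatic path looks like for a Python service (a sketch, assuming the opentelemetry-distro packages; "app.py", the service name and the endpoint are placeholders): the application code itself carries no OpenTelemetry calls.

```python
# Automatic instrumentation sketch for Python (assumes these packages are installed):
#
#   pip install opentelemetry-distro opentelemetry-exporter-otlp
#   opentelemetry-bootstrap -a install          # installs instrumentations for detected libraries
#
#   OTEL_SERVICE_NAME=checkout-service \
#   OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 \
#   opentelemetry-instrument python app.py      # runs the app with auto-instrumentation enabled
#
# app.py needs no OpenTelemetry code; supported libraries (HTTP clients, web
# frameworks, database drivers) are patched at startup.
import requests  # traced automatically by the requests instrumentation


def main() -> None:
    # This outbound call produces a span without any manual instrumentation code.
    requests.get("https://example.com")


if __name__ == "__main__":
    main()
```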
Now, once the app is instrumented, the data is collected, and when the data is being collected, this is how it will look. You will be able to get a deeper understanding of the application stack, because when you instrument it, you will be building some blocks around it. Once you have that, you will be able to pinpoint the errors and even understand where the problem lies.
Now we have discussed how the instrumentation works and what type of instrumentation we should go with, but here comes the most important part, which is the OpenTelemetry collector. We can consider the OpenTelemetry collector as a superpower. The OpenTelemetry collector is perhaps one of the most exciting tools OpenTelemetry has to offer. It's meant to run as a standalone process, providing a central place to receive, process and export the data we are collecting. It's completely vendor agnostic and supports many of the most common open formats for telemetry data. What you see on this slide is that the collector centers around three primary types of components: receivers for receiving the telemetry data, processors for processing the telemetry data, and finally exporters for exporting the telemetry data to a backend like New Relic or any other observability backend.
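On the application side, putting a collector in the middle usually just means pointing the SDK's OTLP exporter at the collector instead of at a vendor directly. A minimal sketch (the localhost endpoint is the collector's default OTLP gRPC port, and insecure transport is only for local testing):

```python
# Sketch: send spans to a locally running OpenTelemetry collector over OTLP/gRPC.
# Requires opentelemetry-sdk and opentelemetry-exporter-otlp; the endpoint below
# is the collector's default gRPC port (4317) and is a local-testing assumption.
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="http://localhost:4317", insecure=True))
)
trace.set_tracer_provider(provider)

# From here, any span the application creates flows to the collector, which can
# batch, filter, enrich and then export it to the backend of your choice.
with trace.get_tracer("demo.collector").start_as_current_span("work"):
    pass
```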
Just like the OpenTelemetry SDKs for each language, the collector is also designed to be extensible. If you visit the OpenTelemetry collector GitHub repository, you will find that there are already many components developed that you can use in your environment. So the collector is not just a data exporter or a data middleman; it has multiple components that can help you do filtering and batching, and even add attributes, and these processors are a key part of the whole collector pipeline.
If you are using Prometheus and Grafana, you might be wondering what will happen to them. Prometheus and Grafana are also supported, but it's in an experimental phase, so you can check the official documentation. And before you start, note that OpenTelemetry is not just restricted to your application data: the collector can help you scrape metrics from your infrastructure. In this sample we are capturing CPU, memory and networking details from one of my hosts; you just define what metrics you want to capture, and this is how it will look once the data is collected.
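The slide shows the collector's host-metrics scraping; as a rough SDK-side illustration of the same idea (a sketch, not the collector configuration, and assuming the third-party psutil package), you could report CPU and memory as observable gauges:

```python
# Sketch: report host CPU and memory as OpenTelemetry metrics from Python.
# This mirrors what the collector's host-metrics scraping does, but from the SDK
# side; it assumes the psutil package and prints to the console for simplicity.
import psutil
from opentelemetry import metrics
from opentelemetry.metrics import CallbackOptions, Observation
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import ConsoleMetricExporter, PeriodicExportingMetricReader

reader = PeriodicExportingMetricReader(ConsoleMetricExporter(), export_interval_millis=10_000)
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))
meter = metrics.get_meter("demo.hostmetrics")


def cpu_utilization(options: CallbackOptions):
    # Observed on each collection cycle; value is host-wide CPU utilization in percent.
    yield Observation(psutil.cpu_percent(), {"state": "used"})


def memory_usage(options: CallbackOptions):
    yield Observation(psutil.virtual_memory().used, {"state": "used"})


meter.create_observable_gauge("system.cpu.utilization", callbacks=[cpu_utilization], unit="%")
meter.create_observable_gauge("system.memory.usage", callbacks=[memory_usage], unit="By")
```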
Let's dive deeper into the sampling process, where different sampling approaches are available. The first one is head-based, then we have tail-based, and we have probabilistic sampling. Head-based and tail-based sampling are the most commonly used. Head-based sampling samples up front, across all the requests and spans generated by the individual services; it takes the statistics of all the requests generated by the services, keeps all the spans, and makes the decision at a very early stage. The main issue with head-based sampling is that when the sampling decision is being made, the root span has limited visibility and does not know what will happen in the future. With tail-based sampling, the sampling decision happens at the end: after receiving the first span, it waits for a period of time to collect the spans from the other services that share the same trace ID. After all the collected spans are grouped together based on their trace ID, it iterates over them to check the error status and the duration of the spans, and based on that analysis, high-value traces are selectively sent to the next stage, such as your observability backend.
I'll show you a sample of what tail-based sampling looks like. This is a sample configuration where I am leveraging tail-based sampling and I have multiple policies. One such policy is to only collect traces that have a latency of 5000 milliseconds or more. If you look at the output it generates, it is very helpful: before leveraging tail-based sampling, the throughput was very high and the data ingestion was very high, but as soon as I implemented the latency policy you see here, it dropped. The reason is that the policy is only collecting the spans of traces that took over 5 seconds to complete, because of which I'm also able to save some of the cost of ingesting the data. So this is how you can leverage the tail-based sampling process in your collector YAML configuration.
This is what we call probabilistic sampling, where you can define the probability, that is, what percentage of traces you need, and the configuration is as shown here.
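In the Python SDK, the equivalent head-based, probabilistic behavior can be configured with a ratio-based sampler (a sketch; the 10% ratio is just an example value):

```python
# Sketch: keep roughly 10% of traces using a probabilistic (trace-ID ratio) sampler.
# ParentBased makes child spans follow the decision already made for their parent,
# so a trace is either kept or dropped as a whole.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.sampling import ParentBased, TraceIdRatioBased

sampler = ParentBased(root=TraceIdRatioBased(0.10))
trace.set_tracer_provider(TracerProvider(sampler=sampler))

tracer = trace.get_tracer("demo.sampling")
with tracer.start_as_current_span("sampled-or-not"):
    # Only about 1 in 10 traces rooted here will be recorded and exported.
    pass
```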
Now we have understood a way of instrumenting the app and a way of sampling the traces. Once the data is sampled, how can you export that data? That's where you see these three examples. In the first, I have used Zipkin as an exporter, where I am exporting the telemetry data with the help of Zipkin. In the second, I have leveraged Prometheus to extract that data. And in the third example, I have used New Relic, where I am leveraging New Relic's OTLP URL to export the data, and these are the attributes it requires.
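As a rough sketch of those three options on the SDK side (not the exact slide configuration; the endpoints, the Prometheus port and the API key below are placeholders, so check each backend's documentation):

```python
# Sketch: three ways to get telemetry out of a Python service.
# Endpoints, the Prometheus port and the New Relic API key are placeholders.
from opentelemetry import metrics, trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.exporter.prometheus import PrometheusMetricReader
from opentelemetry.exporter.zipkin.json import ZipkinExporter
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from prometheus_client import start_http_server

provider = TracerProvider()

# 1. Zipkin: ship spans to a Zipkin-compatible endpoint.
provider.add_span_processor(
    BatchSpanProcessor(ZipkinExporter(endpoint="http://localhost:9411/api/v2/spans"))
)

# 2. New Relic (or any OTLP backend): spans over OTLP with an API key header.
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint="https://otlp.nr-data.net:4317",
            headers={"api-key": "YOUR_NEW_RELIC_LICENSE_KEY"},
        )
    )
)
# Note: registering both processors sends every span to both backends; in practice
# you would usually pick one, or do this fan-out in the collector instead.
trace.set_tracer_provider(provider)

# 3. Prometheus: expose metrics on a scrape endpoint instead of pushing them.
start_http_server(port=9464)  # Prometheus scrapes http://localhost:9464/metrics
metrics.set_meter_provider(MeterProvider(metric_readers=[PrometheusMetricReader()]))
```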
Let's talk about how the industry is adopting OpenTelemetry. Here you see the top industry adopters; these are some of the big names which are leveraging OpenTelemetry at production scale. Let me share one success story, where one of the industry adopters paired open standards with observability, and that adopter is Skyscanner. The results are really great: they were able to retire twelve internal and external systems, they were able to save approximately 15 minutes on each merge request in their mobile build pipeline, and they were able to create SLOs from any metric, event or telemetry data, regardless of whether it comes from the back end or the front end.
Let's talk about the future of OpenTelemetry. There are lots of contributions happening throughout the OpenTelemetry space. You can look at this table, where you will find details about what type of telemetry data is stable for each language. All the major cloud providers are adopting and contributing to the OpenTelemetry project. Amazon has built the AWS Distro for OpenTelemetry, also called ADOT, which can be used as a Lambda layer. Microsoft has native OpenTelemetry capabilities in the .NET framework and supports OpenTelemetry tracing on Azure. New Relic is a proud enabler of and contributor to OpenTelemetry and is fully compatible with the OpenTelemetry Protocol, or OTLP. Kubernetes and containers are natively supported, and many companies are building native integrations to support and export telemetry in the OpenTelemetry format. Even Next.js, which is a web framework, has included a custom SDK to export OpenTelemetry data out of the box.
Let's recap what we have discussed. There is no doubt that OpenTelemetry is growing at a rapid pace. We have to be sure about our maturity before adopting OpenTelemetry, that is, what type of telemetry data we need. Only collecting the telemetry data is not useful; the instrumentation should include contextual data to make more sense of it. With the OpenTelemetry standard, it's easy to gather telemetry data, and if you invest in OpenTelemetry, it will help you run your business on data and not just on opinion. That is why we say: load data, eject opinion. With this, thanks for attending my session. These are my credentials. If you want to talk about OpenTelemetry, observability or mobile apps, I'm happy to connect and answer all your queries. Once again, thank you.