Transcript
            
            
This transcript was autogenerated.
Let me give you an overview of OpenTelemetry. Here's everything that I'll be walking you through today. I'll give you a brief background about OpenTelemetry, its core concepts, and the building blocks and architecture of OpenTelemetry. We'll then dive into the instrumentation part, where we'll look at the code and start instrumenting traces, metrics, and logs for a simple Node.js application built with the Koa framework. It's a very basic application, but we'll try to cover all of these concepts and extract telemetry that makes sense to us. Lastly, I'll also cover the OpenTelemetry Collector: how you can get started with it, how it's beneficial, and when you should be using it.

We have all heard it from developers across the world: "it works on my machine," "it's an Ops problem," while Ops complains that it's an app problem. Today I've even heard people say "my container is working fine, you're just not deploying my container correctly." Let's see how OpenTelemetry and observability help resolve this conflict in today's world.
            
            
            
A quick background about OpenTelemetry. OpenTelemetry today is an incubating project in the CNCF landscape, and there has already been a proposal to move the project to graduated status. It was originally formed in 2019 by a merger of two well-known projects, OpenTracing and OpenCensus. OpenTracing was developed by Uber to monitor their microservices, and OpenCensus was developed by Google for monitoring their microservices and collecting metrics. Some of the core goals of OpenTelemetry are to provide a set of APIs, libraries, and integrations to collect telemetry from across your systems and services. It sets the standard for collecting telemetry from all of your applications and infrastructure.

One of the best parts of OpenTelemetry is that it lets you send all of this telemetry to the observability backend of your choice, which means you're not locked into a single vendor or any specific tool. Regardless of how you instrument your applications, services, and infrastructure with OpenTelemetry, you are free to choose where you store your telemetry: in house, with a third party, or a combination of both.

You'll see in this chart how quickly OpenTelemetry has risen. Today, OpenTelemetry is the second fastest-growing project in the CNCF space, right behind Kubernetes in the number of contributions and adoption. This is because there is strong interest in modern observability. A Gartner report from 2022 noted that a lot of companies are looking to embrace open standards, which is what OpenTelemetry, eBPF, and Grafana are working towards. If you want to read more about it, you can scan the QR code on the top right.

Let's look at some of the core concepts and building blocks of OpenTelemetry.
            
            
            
OpenTelemetry is fundamentally a specification; it is not one specific framework, language, or SDK. OpenTelemetry provides the specification, and each individual language and framework community develops its own set of SDKs on top of the API specification provided by OpenTelemetry. These APIs cover tracing, metrics, and logging. All of these APIs follow the same semantic conventions, so anything built with OpenTelemetry in any language or framework remains standard. Today you might instrument a Java application, and tomorrow you may have to instrument an application in another language; because the specification and semantics are the same, you don't have to revisit the documentation or reinvent the wheel each time.

Most SDKs today also provide the option of automatic instrumentation. For example, Node.js offers automatic instrumentation for libraries like Express, Koa, MySQL, and other common frameworks used with Node.js. We'll see that shortly when we get into the hands-on part.
            
            
            
Lastly, one of the important protocols that is part of OpenTelemetry is the OpenTelemetry Protocol (OTLP). This protocol is used to send all the telemetry collected from your applications, infrastructure, and services to the backend of your choice. OTLP runs over two well-known transports: HTTP and gRPC. Depending on your system architecture and requirements, you can choose to use either, or both.
            
            
            
Let's quickly get into the hands-on part and see how we can get started with instrumenting a simple Node.js service. The conventions remain the same across other languages; the APIs are similar, and the only things that change are the packages and some SDK details. Here's a very simple application built on Node.js using the Koa framework. Koa is a very simple, lightweight framework, similar to Express, that helps you write REST APIs quickly. We'll go through each of the OpenTelemetry packages that we'll be using. We'll start with tracing and automatic instrumentation for Node.js, see how we export that telemetry to our collector, then move on to metrics, and finally combine all three, sending everything to the collector and having the collector export it to New Relic. New Relic is one of the available observability backends; it provides contextual information by stitching together all the telemetry exported from the collector.

This is a very simple application, as I mentioned. You'll see there's nothing much to it: it's a very basic application with a handful of endpoints. Here I have at least four API endpoints: a root path, a POST request, and another GET request that accepts certain parameters. Each of these requests will automatically be traced using the OpenTelemetry SDKs.
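For reference, a minimal Koa application along these lines might look like the sketch below. The talk doesn't show the exact source, so the route names (including the weather endpoint) are assumptions for illustration only.

```javascript
// index.js - a minimal Koa service (illustrative sketch, not the speaker's exact code)
const Koa = require('koa');
const Router = require('@koa/router');

const app = new Koa();
const router = new Router();

// Root path: returns a simple greeting
router.get('/', (ctx) => {
  ctx.body = 'Hello World';
});

// A POST endpoint that creates a resource
router.post('/items', (ctx) => {
  ctx.status = 201;
  ctx.body = { created: true };
});

// A GET endpoint that accepts a parameter (e.g. a location for weather data)
router.get('/weather/:location', (ctx) => {
  ctx.body = { location: ctx.params.location, forecast: 'sunny' };
});

app.use(router.routes()).use(router.allowedMethods());
app.listen(3000, () => console.log('listening on port 3000'));
```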
            
            
            
Now, the safest and easiest option for getting started with OpenTelemetry is automatic instrumentation. We will not modify anything in the source code of this Koa app. Instead, the recommendation from OpenTelemetry for Node.js is that you create a separate wrapper file, which becomes the primary module used to start your Node application. We'll start by setting up this file and adding all the packages; I'll walk you through the details of each package that we are using and the topic it relates to.

First, we'll start with tracing. For that we'll focus on a couple of packages: the automatic instrumentations for Node, the Node trace SDK, and the trace SDK base package. Pay attention to what we are importing from each of these packages. The automatic instrumentation package provides an API called getNodeAutoInstrumentations, which helps you capture telemetry automatically from Node.js and its underlying libraries. There are also some conventions for setting up your OpenTelemetry service correctly; for that we'll use helper packages like the semantic conventions and resources packages, which help us configure our application name and other attributes correctly. Let me quickly scroll down to the part where we set up tracing for our application. We'll ignore everything else that's configured for now and point you towards what's important for getting started quickly.
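Putting that together, the top of such a wrapper file would pull in roughly these packages (a sketch based on the packages named in the talk; the file name and exact versions are assumptions):

```javascript
// otel-wrapper.js - imports for the tracing setup described in this talk
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { NodeTracerProvider } = require('@opentelemetry/sdk-trace-node');
const { BatchSpanProcessor } = require('@opentelemetry/sdk-trace-base');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SemanticResourceAttributes } = require('@opentelemetry/semantic-conventions');
```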
            
            
            
First, we need a tracer provider. A tracer provider is the API that registers the application with the OpenTelemetry API. This is where we pass our resource, and the resource is where we set our application name. This is the most basic configuration: we just add our resource name, which becomes our OpenTelemetry service name. Once we have added the name, we can configure how frequently we want to flush our traces; the flush setting basically tells the SDK how often the telemetry should be sent out.
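As a rough sketch of that step, continuing the wrapper file above (the service name here is a placeholder, not the speaker's actual value):

```javascript
// Describe this service with a resource; the service name is what shows up in the backend
const resource = new Resource({
  [SemanticResourceAttributes.SERVICE_NAME]: process.env.OTEL_SERVICE_NAME || 'my-koa-service',
});

// The tracer provider registers this application with the OpenTelemetry API
const provider = new NodeTracerProvider({ resource });
```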
            
            
            
Once we have the provider configured, we have to add a span processor. Traces are built from multiple spans. Each span is an operation within your application and carries information about its execution: the duration, the function involved, and anything that happened in that specific operation, such as an error or an exception. The span contains all of that information, and stitching all of these spans together is what we call a trace. To this tracer provider we add a span processor, which tells the SDK how each span from this application should be processed. For this particular example, we'll be using the batch span processor. It's also the recommended processor, because it avoids overly frequent exports and keeps the operational load on the SDK low. The batch span processor takes a few optional settings; all the values you see here are the defaults, and you can increase or reduce them to suit your requirements. Basically, the batch span processor collects your spans and processes them in batches. It takes another parameter, an exporter, which says where all of these processed spans should be exported to. Here I've configured it to point at a collector. The collector will be running in my local setup in a container; I'll talk about the collector towards the very end. Once we have instrumented our application, covering traces, metrics, and logs, everything will be exported to our collector, and the collector will export it to our observability backend, which will be New Relic. For the exporter, there's only one configuration required: because the collector is running locally, we use a simple localhost URL, which is the default endpoint for the OpenTelemetry Collector's traces API.
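Continuing the wrapper sketch, this step looks roughly like the following; the batch settings shown are the SDK defaults, and the localhost URL assumes a collector listening on the default OTLP/HTTP port:

```javascript
// Export spans to a locally running OpenTelemetry Collector over OTLP/HTTP
const traceExporter = new OTLPTraceExporter({
  url: 'http://localhost:4318/v1/traces',
});

// Batch spans before exporting to reduce load on the SDK; these are the default values
provider.addSpanProcessor(
  new BatchSpanProcessor(traceExporter, {
    maxQueueSize: 2048,
    maxExportBatchSize: 512,
    scheduledDelayMillis: 5000,
    exportTimeoutMillis: 30000,
  })
);
```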
            
            
            
Once we have added our batch span processor, we can optionally register certain propagators. On the same tracer provider we'll register the W3C baggage propagator and the W3C trace context propagator. These propagators help us find the origin of a request and stitch together a request that hops through multiple services. I'll show an example of what it looks like once we have included these propagators. They basically give you an overview and a complete picture of how many different services your request has hopped across, what the operation was, and what the problem was at a specific service, and they help you capture that information by stitching it all together.
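Registering the provider with those two propagators might look like this, continuing the same sketch (the propagator classes come from @opentelemetry/core):

```javascript
const {
  CompositePropagator,
  W3CBaggagePropagator,
  W3CTraceContextPropagator,
} = require('@opentelemetry/core');

// Register the provider globally and propagate trace context + baggage across service hops
provider.register({
  propagator: new CompositePropagator({
    propagators: [new W3CBaggagePropagator(), new W3CTraceContextPropagator()],
  }),
});
```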
            
            
            
Once we have configured everything for our tracer provider, we need to register our instrumentations. registerInstrumentations is part of the OpenTelemetry instrumentation library; you import it and start adding your instrumentations. Basically, registerInstrumentations tells the SDK what you want to focus on in this application's instrumentation. The first thing we provide is the tracer provider, which is the provider we just configured, and then, in the instrumentations list, which is an array, we provide everything we want to focus on. The getNodeAutoInstrumentations library ships with tons of instrumentations that can automatically capture metrics and traces from your application. I do not want to capture any of the file system operations that Node.js or my Koa framework performs, but I do want anything happening with respect to the Koa framework to be captured. In the same getNodeAutoInstrumentations configuration you can add much more: for example, if you use MySQL or anything else, there are tons of pre-packaged instrumentation libraries, and all you have to do is add them to this configuration and they will start capturing that information. We'll come back to one part of this later: once we get to the logs section, this specific configuration will focus on decorating our logs. That's all the configuration required for you to start capturing traces from your Node.js application.
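The registration step described above would look something like this, continuing the wrapper sketch; the set of enabled and disabled instrumentations shown is only what the talk mentions:

```javascript
// Wire the auto-instrumentations into the tracer provider we configured above
registerInstrumentations({
  tracerProvider: provider,
  instrumentations: [
    getNodeAutoInstrumentations({
      // Skip noisy file-system spans from Node.js itself
      '@opentelemetry/instrumentation-fs': { enabled: false },
      // Capture everything the Koa framework does
      '@opentelemetry/instrumentation-koa': { enabled: true },
      // Used later to decorate Bunyan log records with trace context
      '@opentelemetry/instrumentation-bunyan': { enabled: true },
    }),
  ],
});
```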
            
            
            
Let's quickly see in the console window what the traces look like once we start using our wrapper with the Node application. In my terminal, before I start my application, I'm passing a few environment variables. These two environment variables are basically helper configuration for starting the application: OTEL_SERVICE_NAME is what the wrapper refers to in order to give this specific service its name, and OTEL_LOG_LEVEL helps us debug all the configuration we just did in the wrapper file. The command itself basically tells Node that before loading the main file, which is our index.js, it should load the otel wrapper first and then load index.js. This loads the otel wrapper as the primary module and then executes index.js, and this way we are able to capture telemetry from the very start of our application.
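(For reference, that invocation typically looks something like `OTEL_SERVICE_NAME=my-koa-service OTEL_LOG_LEVEL=debug node -r ./otel-wrapper.js index.js`; the file names here are assumptions, since the talk doesn't show the exact command.)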
            
            
            
Now my application is running successfully on port 3000, which is the typical behavior of any basic Node Express or Koa service. Let me send a request to this service. I hit the root API of the service, and you'll see that my API responded and got executed. I got a console log from my application showing my service name, the host name where it's running, the message, and the timestamp. This is typical logging from console.log or whatever logging library you're using; in this instance I'm using the Bunyan library for my application logging, and it adds certain attributes. The rest of the output you're seeing is not from the API; it is actually OpenTelemetry's debug logs. This is a typical trace that gets exported, and these are the attributes that are automatically attached. You'll notice the trace ID and span ID that got attached to this specific resource. Let me call the API again. You'll see I got the response here, plus the debug output from my OpenTelemetry setup: the SDK prints out the debug logs for all of the requests that are coming in. Be mindful that we did not modify anything in our actual source code, which is index.js. All we have done is add a wrapper around it, and all of these attributes are being captured by the SDK using automatic instrumentation.
            
            
            
I'll start my application with the environment variables that the OTel SDK needs, one of which is OTEL_SERVICE_NAME. OTEL_SERVICE_NAME basically tells the SDK what my service's name should be when I run it and export to any observability backend. The other variable I have here is OTEL_LOG_LEVEL, which helps me debug any problems occurring in the OTel configuration in the wrapper file. And the command here just tells Node to load the wrapper file before loading the main module, which is index.js. Let's execute this and see what happens. You'll see that a lot of debug statements get printed; this is because we have set the log level to debug. It says it's trying to load the instrumentation for all of these libraries, but it doesn't find many. The only ones it finds are for the Node.js module, HTTP, and also for Koa, and it applies patches for those. Basically, the SDK patches the libraries that ship as part of the automatic instrumentation, so any requests made through these libraries, or any operations happening inside them, are captured.
            
            
            
You'll also see that a couple of libraries, such as Bunyan, are being patched. We'll get to how the logging works, but for now logging is disabled; once we enable it, you'll see that as well. The middleware framework, our Koa, is being patched, and some more debug statements are generated. Let's quickly scroll down and make a request to our service. What I'll do is make a simple call to the root API that's configured; it's just going to return a simple "hello world" response, and we'll see how the trace is generated by OpenTelemetry. So this is the logging line from my Bunyan logger; let's see what happens after that. All of the output we're seeing now is the actual trace being generated from that particular request. If I scroll to the top, this is the log line that got generated as soon as I hit my API, and these are the spans that got created, which capture all the information about the execution of this particular API. You'll see it's capturing data from the Koa library; it has certain attributes: what kind of span processor is being used, the different body parsers used in our application, and the different span IDs. As mentioned earlier, multiple spans together build a complete trace, and each span carries a parent span ID. Whatever the first request into the system was becomes the parent span, and its ID is attached to the rest of the lifecycle of that particular request.
            
            
            
There's a lot of output again, a lot of debugging output that we won't focus on. Let's see how this output looks in our observability backend once it has been exported. For now it's being exported to the collector and routed to a backend; I'll show you directly how it looks in the backend, and I'll cover the collector and how to configure it in depth later. We are exporting all our telemetry to New Relic using the OTLP protocol. Let me click on my services: my service, conf42-otel, is already available here, and you'll see some metrics are already coming in.
            
            
            
Since I'm not capturing any metrics for my service yet, I'll switch to the spans that are captured from my service by the OTel SDK. We can see some of the requests and response times already available here. I'll quickly switch to distributed tracing to look at the REST APIs and the traces and analyze them. I'll click on the first one and see this particular request. Let me click on the first request in this list. This gives me an overview of the request/response cycle: it took 3.5 milliseconds for this request to complete, and there are certain operations that happened. If I click on this particular trace, you'll see the attributes that got attached, most of which we have already seen in the debug window: the duration, the HTTP flavor, the host name, and the target. We made a request to the root REST API, and that's what's being captured, along with the ID of this particular trace and the type of request. You can also see how this was actually captured: the instrumentation-http library was patched, it captured this particular request, and you can see the library version. There are certain other attributes that are helpful, and if you have attached any custom attributes, they will be listed in the same place. This holds regardless of where you export your telemetry; it should give you an experience similar to this. New Relic helps us get to the point quickly, which is why we're able to visualize all the traces fairly quickly.
            
            
            
Let's look at another trace. Let me call a couple of different APIs. I have an endpoint that gives me weather information for a particular location, so I'll change the location to where I currently am, and the API responds very quickly. This particular endpoint makes a request to an external service which is also instrumented using OpenTelemetry, but that service is not on my localhost; it is actually deployed elsewhere on an AWS instance. Let's see how OpenTelemetry captures this request and stitches the information together to give us a complete picture. I'll make a few more requests so that we have a sizeable amount of data to go through. I'll also make a request that should fail, so we can see what errors look like once we start instrumenting our application with OpenTelemetry. I'll just pass in a nonsense location, and we'll see that this specific request returns a 404. To show you quickly what happened, I'll show you in the terminal how many requests happened, what the debug logs were, and whether there were any errors. You'll see this specific request failed with a 404. My request completed, my trace got generated, and this was the parameter that I called, which became my HTTP target. Keep in mind we have not modified the source code of our application; we have just added a wrapper around it. This is helpful if you want to get started quickly with OpenTelemetry without disturbing the existing application code while you experiment with the SDKs.
            
            
            
Let's look at this specific trace in our backend. Coming back to distributed tracing, I'll click on trace groups; I see there are a few more traces now. Let me click on one and dig in further. There is one request at the top which was the most recent and has the longest duration of them all; I'm assuming this is the request for weather. Let me click on it, and as you see here, you get a map of the journey of your API request and how many other services it has hopped across: my original Node.js service, from which I made the request, made a call to an external service instrumented with OpenTelemetry, which in turn made another external request, making at least eight different calls. That last one shows up as unknown because that service is not instrumented. Let's expand it and see what happened underneath. You'll see information about all the operations that happened: there was a GET request that we made from our system, it went to the weather API endpoint of the external Node Express service, there were some middleware operations, and it also made a GET request, which is an external call. We can check which service that external request went to. This lets you understand which external services are causing slowness in your application, so that you can improve that particular area of your application. In this case, that service is the OpenWeatherMap service, and it's from this particular service that we're requesting all the weather information. Once we have all this information, it becomes much easier for us to triage and understand the behavior of our application, not just in happy scenarios, but also in problematic ones.
            
            
            
Let me quickly go back and click on the errors. You'll see the particular request that failed with a 404 is highlighted here. I want to quickly understand what failed, and New Relic provides a good map and overview of which services were impacted. In this particular map, both services are highlighted in red, which means both of them had some form of error. We have seen the individual operations captured in the spans, but if you want to focus on your errors, there's a convenient checkbox you can click, and you'll see that there is a GET request which actually failed with an error. Since this is just automatic instrumentation and an external service, there is not much detail here. This is where manual instrumentation comes into the picture: once you have identified the areas you want to instrument, you can use manual instrumentation to customize your error messages, or even add additional spans to support the debugging and analysis of your applications. That's all about tracing.
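The talk doesn't show manual instrumentation code, but with the OpenTelemetry API it typically looks something like the sketch below: acquire a tracer, wrap the interesting work in a custom span, and record exceptions and a status on it. The names here (tracer, span, attribute, and the helper function) are illustrative assumptions.

```javascript
const { trace, SpanStatusCode } = require('@opentelemetry/api');

const tracer = trace.getTracer('weather-service');

// Wrap an outbound call in a custom span so failures carry a useful message
async function fetchWeather(location) {
  return tracer.startActiveSpan('fetch-weather', async (span) => {
    try {
      span.setAttribute('weather.location', location);
      const result = await callExternalWeatherApi(location); // hypothetical helper
      return result;
    } catch (err) {
      span.recordException(err);
      span.setStatus({ code: SpanStatusCode.ERROR, message: err.message });
      throw err;
    } finally {
      span.end();
    }
  });
}
```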
            
            
            
We have set up tracing for our application using automatic instrumentation: a tracer provider, plus instrumentations registered from the pre-built, pre-packaged libraries released as part of the automatic instrumentation. It's that simple to get started with automatic instrumentation for your Node.js applications.
            
            
            
Let's get back to our code and add metric instrumentation. OpenTelemetry provides packages to help us capture metrics from our applications, and in this part we'll focus on configuring and extracting our application's metrics. Similar to traces, there are a couple of packages we need to be aware of, one of which is the metrics SDK. This particular SDK provides the meter provider, the exporting readers, and a helper for debugging, the console metric exporter. Similar to setting up a tracer provider, we create a periodic exporting metric reader, where we configure our meter provider to send all of the captured metrics to the console. But what we really want to do is capture all of these metrics from our application and export them to a backend. For that, we set up an OTLP metric exporter without any particular URL. One of the default settings of the exporter SDKs, for traces, metrics, and logs alike, is that they point to localhost on port 4318 or 4317, depending on the protocol, and try to export directly to that. The collector supports both: 4317, which receives gRPC, and 4318, which receives HTTP.
            
            
            
Now for the meter: once we have our metric exporter, we set up our provider. Just as traces require a tracer provider, metrics require a meter provider. In the meter provider we again supply a resource, which carries the service name, and the readers we want to add; this can be a single reader or an array. Here I'm adding both a console metric reader and an OTLP metric reader, which will export to our observability backend. Once we have our meter provider, we can register it globally in one of two ways. One is using the OpenTelemetry API: the metrics API lets you set a global meter provider. You can use this in cases where you don't have a tracer provider or the registerInstrumentations API available; with it you can configure just the metrics provider. But since we are using automatic instrumentation and a tracer provider with registerInstrumentations, we'll enable this in the larger scope of our application as part of the list of instrumentations: registerInstrumentations also accepts a meterProvider, and there we specify the provider we just configured, which contains both the console metric reader and the OTLP metric reader, one exporting to the console and the other exporting to the backend. That is all the configuration required to enable the meter provider.
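Pulling those pieces together, the metrics portion of the wrapper might look roughly like this sketch (it reuses the resource, provider, and getNodeAutoInstrumentations from the tracing sketch above; the export interval is an assumption, and on older SDK versions you would call meterProvider.addMetricReader() instead of passing readers in the constructor):

```javascript
const {
  MeterProvider,
  PeriodicExportingMetricReader,
  ConsoleMetricExporter,
} = require('@opentelemetry/sdk-metrics');
const { OTLPMetricExporter } = require('@opentelemetry/exporter-metrics-otlp-http');

// Two readers: one prints to the console for debugging, one ships to the local collector
// (no URL given, so the exporter defaults to http://localhost:4318/v1/metrics)
const meterProvider = new MeterProvider({
  resource, // same Resource with the service name used for tracing
  readers: [
    new PeriodicExportingMetricReader({
      exporter: new ConsoleMetricExporter(),
      exportIntervalMillis: 10000,
    }),
    new PeriodicExportingMetricReader({ exporter: new OTLPMetricExporter() }),
  ],
});

// Either register it globally...
// const { metrics } = require('@opentelemetry/api');
// metrics.setGlobalMeterProvider(meterProvider);

// ...or, as in this talk, add it to the registerInstrumentations call shown earlier:
// registerInstrumentations({ tracerProvider: provider, meterProvider, instrumentations: [...] });
```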
            
            
            
Let's look at how the output changes for our OpenTelemetry metrics. I'll disable the traces debug information so that we only see the information from our meter providers. Let me switch back to the console, restart my application, and quickly clear the console. I'll keep the logging level, since I've modified it directly in the code; everything else remains the same. We still see a lot of debug output because we have only just disabled that, but we'll start seeing output from the meter provider once we hit any of our endpoints. Let me quickly hit some of the endpoints and see what it looks like. One of the settings we have configured is the flush interval for the console reader; you can go as aggressive as one second, but the default is 60 seconds. Since metric collection that is too aggressive can eat CPU cycles, it's recommended that you tune it to your requirements.
            
            
            
Now, this is the output from our meter provider: it's capturing a histogram of the duration of our inbound HTTP requests, which in this case are the endpoints we were just calling. There are some data points and values attached; it's much easier to visualize this than to read the raw data, but the console output does help you understand what kind of data is being captured. If I make a few more calls to different endpoints, I'll see similar output; there's not much difference except for the value and the start and end times of each request. Because this is registered at the application level, it captures all of the operations for the available libraries, and since we also registered it as part of our automatic instrumentation, it captures the operations for each of these endpoints in our application.
            
            
            
              backend which is neuralink, and see how the metrics are being reflected.
            
            
            
As earlier, all the traces that were being captured were exported via the collector; the metrics are also being sent into New Relic via the collector. I'll switch to metrics in this instance, and you can see that the metrics charts have now started getting populated. Capturing these metrics helps you populate these charts, which give you insight into your response time, your throughput, and, if there are any errors, those details as well. Additionally, if you want to dig further into which metrics you have captured, you can go to the metrics explorer and see for yourself.
            
            
            
I'll come back to my code. Now that we have captured the traces and metrics, it's time to focus on one of the most important telemetry signals: logs. One of my previous mentors had a famous saying: every engineer loves logs; if there are no logs, there's no life. And that's particularly true for DevOps and SRE engineers. If the services go down, they start digging into the logs and try to identify what has actually gone wrong before they can recover the services.
            
            
            
Let's focus on the logs aspect of instrumentation for our Node.js service. We have already covered metrics, which was fairly simple: all you require is the metrics SDK and the exporter. For logs it's similar, but it requires a few more steps than setting up your meter or metrics provider. For logs we focus mainly on two packages: the logs API (@opentelemetry/api-logs) and the logs SDK (@opentelemetry/sdk-logs). The logs SDK provides us with the logger provider, the processors and the log exporter. These APIs help us set up our application so the logs are attached with all the information regarding traces and metrics. We'll see how all this ties up towards the end.
            
            
            
              how all this ties up towards the end. The library that
            
            
            
              I'm using as part of, as part of our application here
            
            
            
              is Bunyan. Bunyan is simple logging library,
            
            
            
              which is a very famous library for adding any any kind
            
            
            
              of logger for simple service. There is
            
            
            
              a library available already. If you're using Bernie.
            
            
            
              There is a library called Instrumentation Bunyip which helps you capture logs
            
            
            
              in open telemetry format. We've seen in the console that logger
            
            
            
              log format of bunion is slightly different. We'll see how
            
            
            
              that changes automatically. Without modifying any of our application
            
            
            
              code. Using this package, we'll set up our logger
            
            
            
              to start using and transforming our logs into standard
            
            
            
              open dimension format. Now firstly, we require
            
            
            
Now, firstly, we require a logger provider, which again accepts a resource. Our resource is the same global object where we set the service name. This is particularly important if you want all the telemetry, traces, metrics and logs, attached to the same service. If you do not provide a name, it's assumed to be unknown_service; that's the default name. It's always good practice to add your own service name. The default value for the exporter endpoint is http://localhost:4318/v1/logs; each of the exporter APIs for the different signals has a dedicated endpoint like this configured, and I've included it here for your reference. Even if you do not set this particular endpoint, it is going to point to the default collector receiver endpoint.
            
            
            
Once we have our exporter and provider, we can configure and attach the processors we require. Similar to span processing, logs also have different processors: a simple processor and a batch processor. I'm using a simple processor for the console exporter and a batch processor for exporting to our backend, where I'm using the OTLP exporter. Similar to the meter provider, we can register the logger provider globally: if you only want to capture logs from your own application code, you can use the logs API (@opentelemetry/api-logs) and set the global logger provider. But since we are using automatic instrumentation, I'll go ahead and register it as part of registering the instrumentations, and that is all that's required for you to successfully include logging with OpenTelemetry in your Node.js application.
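Put together, a minimal sketch of that logs setup might look like the following. The package names are the real OpenTelemetry ones, but the service name is illustrative, and this assumes the pre-2.0 logs SDK, where processors are attached with addLogRecordProcessor; newer SDK versions take a processors array in the constructor instead.

```javascript
// logs.js – a minimal sketch of the logs pipeline described above.
const {
  LoggerProvider,
  SimpleLogRecordProcessor,
  BatchLogRecordProcessor,
  ConsoleLogRecordExporter,
} = require('@opentelemetry/sdk-logs');
const { OTLPLogExporter } = require('@opentelemetry/exporter-logs-otlp-http');
const { Resource } = require('@opentelemetry/resources');

// The same global resource used by the tracer and meter providers,
// so all three signals are attached to the same service name.
const resource = new Resource({ 'service.name': 'my-koa-service' });

const loggerProvider = new LoggerProvider({ resource });

// Simple processor for local debugging on the console...
loggerProvider.addLogRecordProcessor(
  new SimpleLogRecordProcessor(new ConsoleLogRecordExporter())
);

// ...and a batch processor for the OTLP exporter pointing at the collector.
// If you omit `url`, it defaults to http://localhost:4318/v1/logs.
loggerProvider.addLogRecordProcessor(
  new BatchLogRecordProcessor(
    new OTLPLogExporter({ url: 'http://localhost:4318/v1/logs' })
  )
);

module.exports = { loggerProvider, resource };
```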
            
            
            
Once we have enabled the logger provider, we do not have to modify anything in our application code. Since I'm already using Bunyan here, the instrumentation automatically patches the Bunyan instance with the OpenTelemetry logger. Once we enable the logger provider and register it with the list of instrumentations, we can add additional options for the Bunyan instrumentation. For example, it provides a logHook option with which we can modify the log record and attach any attributes we want; here I'm attaching a resource attribute, the service name from the trace provider. Any customization that you want to add to your log record with OpenTelemetry and Bunyan, you can do here.
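A hedged sketch of that wiring is shown below. The instrumentation package and its logHook option are real, but the attribute key, the imported helper module and the surrounding instrumentation list are illustrative; depending on your SDK version you may also be able to pass the logger provider directly to registerInstrumentations.

```javascript
// instrumentation.js – a sketch of registering the Bunyan instrumentation with a logHook.
const { logs } = require('@opentelemetry/api-logs');
const { registerInstrumentations } = require('@opentelemetry/instrumentation');
const { BunyanInstrumentation } = require('@opentelemetry/instrumentation-bunyan');
const { loggerProvider, resource } = require('./logs'); // the provider and resource sketched above

// Make the logger provider available to instrumentations that emit log records.
logs.setGlobalLoggerProvider(loggerProvider);

registerInstrumentations({
  instrumentations: [
    new BunyanInstrumentation({
      // logHook runs for every Bunyan record emitted inside an active span,
      // letting us decorate the record with extra attributes such as the service name.
      logHook: (span, record) => {
        record['resource.service.name'] = resource.attributes['service.name'];
      },
    }),
    // ...plus the HTTP/Koa auto-instrumentations used elsewhere in the demo
  ],
});
```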
            
            
            
Once I have enabled this provider and added my instrumentation, let me restart my application. I keep the command the same, and you'll see my application is now running. Let's hit the endpoint and see what the output looks like. Now, apart from the default application log line that was already coming, which is Bunyan's standard log format, there is another output coming from the logger provider, with the trace ID, the span IDs and the severity included. This is because once we have included our log instrumentation, it attaches all the other information for that particular trace plus any additional custom attributes we have included. In this case, the attribute we added as part of the logHook is the service name. This can be particularly helpful if your application is running on multiple hosts and you're streaming all the logs to a central location: it helps you identify which particular service is breaking and where it is located.
            
            
            
Once we have set up our logging, we can see this in the backend. Let me quickly generate some more load so we can see the logs for different endpoints; I'll just hit it a couple more times, one, two, three, and let's switch to our backend, which is New Relic, and see what the logs look like. You'll see there is conveniently a logs option within the same screen which you can click on. Once I click on it, you see that all the logs have already started to flow in, including the eight requests that I've already made. Once I click on any of these requests, you'll see that the log body, the service name, the span ID and the trace ID have all been attached, even though these are not present in Bunyan's standard logging statement. Any logs generated by our application are now patched by the OpenTelemetry logs provider, and it decorates our log messages with all this additional metadata.
            
            
            
The beauty of setting up logs, traces and metrics together comes into the picture now: to reach the root cause of any problem, having the right context is very important. For example, let me make another request to my weather API, where I'm going to fail it with a wrong parameter. Once my API has failed, I'll fail it a couple more times. Now I'll come back to my backend, which is New Relic, and I start seeing all of this stitched together, providing me all the context of the failed requests. You see that information like metrics and spans is already available, but what I want to focus on now is the errors. We've seen what the errors look like in the context of traces; what I'm particularly interested in now is seeing the relevant logs of that particular trace.
            
            
            
Let me quickly switch back to distributed tracing and click on the errors. You see the three requests I just made have failed, and there are three different errors. I click on this particular trace and I'm seeing the errors for these particular services; this is something we have already seen as part of the trace exploration. But now what I'm interested in is understanding the logs related to this particular request, without having to navigate away from this screen. The screen itself is particular to the New Relic platform, but this is also the beauty of OpenTelemetry: the trace ID and span ID that got attached to the log statements become really helpful here. You'll see a small logs tab at the top, and once I click on it, you'll see there are a couple of log lines coming from that particular request and that specific function. Now, in a scenario where you have tons of requests and a particular trace fails, you would want to find the corresponding log, and going through tons of logs is tedious. Having the right context helps you get to the root cause really quickly; in this case, I'm able to reach that particular log line without having to navigate away much. The other way to reach this particular stage is through the logs screen.
            
            
            
Let's say I'm exploring logs from all these services, I see there are a couple of errors, and I want to understand which trace actually failed. I see that the trace ID is already attached and the log message is also available, but what is also available is getting to that specific request directly: once I click on this request, you'll see that it opens directly in that trace. And this completes the cycle of combining traces, logs and metrics. It provides you the complete picture; the metrics, the traces and the logs together can help you avoid pointing fingers at dev or ops, and also help you reach the root cause of your application problem very quickly.
            
            
            
Now that we have seen how to get started with automatic instrumentation for Node.js and capture traces, logs and metrics, let's look at another important piece, which is the OpenTelemetry collector. The collector is a very important part of OpenTelemetry which helps you capture information from your infrastructure as well as your microservices and different applications. The collector is built with three different components: receivers, processors and exporters.
            
            
            
Receivers are where we define how we want to get data into the collector, which can be push- or pull-based. With the application's automatic instrumentation that we have covered, we are using the push-based mechanism, where we send all the telemetry using the SDKs. Processors are where we define how we want to process our telemetry; we can modify it, attach any custom attributes, or even drop attributes that we do not want. Exporters again work on the same principle of push or pull, and let us export the telemetry to a single backend or to multiple backends. Basically, the collector acts as a proxy for multiple telemetry formats, and it can run as an agent or a gateway; if you want to scale the collector, you can set it up behind a load balancer and it can scale as per your requirements. Here's a simple example of the OpenTelemetry collector where we use the configuration file to add a receiver for collecting host metrics. You'll see that once we add a simple host metrics block in a YAML file, we are able to capture information such as system.memory.utilization, file system information, networking and paging information. There is a lot more information that you can capture; all you have to do is define it in the receivers block for host metrics.
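A minimal sketch of that receivers block might look like this; the scrapers listed and the collection interval are illustrative, and the hostmetrics receiver ships with the contrib distribution of the collector:

```yaml
receivers:
  hostmetrics:
    collection_interval: 30s
    scrapers:
      memory:       # memory metrics (e.g. system.memory.*)
      filesystem:   # file system usage
      network:      # network I/O
      paging:       # paging/swap information
```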
            
            
            
One of the important concepts with the collector is how we sample our data. We have seen the different processors for spans and logs, which are the simple and batch processors; with the collector, the concept of sampling comes into the picture: what we want to capture and how we want to capture it. There are two famous strategies: one is head-based sampling and the other is tail-based sampling. Head-based sampling is the default that is enabled, where you get an overall statistical sample of all the requests that are coming through. Tail-based sampling captures and gives you the most actionable traces: it helps you identify the interesting portion of the trace data instead of an overall statistical slice of it. Tail-based sampling is recommended where you want to get only the right data instead of the tons of data that are coming through and flowing in from across your systems.
            
            
            
You'll see how it changes depending on the sampling strategy. On the left-hand side, in the configuration, you will see that in the processors block we are defining a policy to enable tail-based sampling, and you'll see how the throughput changed before and after. Before we applied tail-based sampling, the collector was consuming a lot of throughput and CPU cycles, because it tries to process all the information that comes in and export it. With tail-based sampling we can reduce that throughput and also reduce the number of spans we are sending, which makes it easier for any engineer to start debugging and see only the actionable samples.
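As an illustration only (the policy names and thresholds are mine, not from the demo), a tail-based sampling policy in the processors block can look roughly like this, using the tail_sampling processor from the contrib distribution:

```yaml
processors:
  tail_sampling:
    decision_wait: 10s            # how long to buffer spans before deciding on a trace
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]   # keep only traces that contain an error
      - name: keep-slow-requests
        type: latency
        latency:
          threshold_ms: 500       # also keep traces slower than 500 ms
```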
            
            
            
There is another form of sampling, which is probabilistic sampling. This is entirely different from head-based and tail-based sampling: in probabilistic sampling, you set the sampling percentage of how many samples you want to capture from that particular system. It can be 15%, 60% or even 100%. In my own opinion, this is particularly recommended for any new projects that you are deploying. It helps you understand the behavior of your system, and once you have understood what percentage of samples you want, you can switch to tail-based sampling and refine your policies to get the most actionable samples from that particular service or infrastructure.
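For reference, probabilistic sampling is also configured as a collector processor; a sketch with an illustrative percentage:

```yaml
processors:
  probabilistic_sampler:
    sampling_percentage: 15   # keep roughly 15% of traces; raise it towards 100% while you learn the system
```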
            
            
            
Let me show you the configuration file of the collector that I was using locally to export all the telemetry from our Node.js application, and also how you can get started with the collector by simply running it in a Docker container. You can start using the OpenTelemetry collector locally by running the Docker image. There are various versions of the OpenTelemetry collector image available on Docker Hub; one particular image that you should be using is opentelemetry-collector-contrib. It contains most of the processors, exporters and receivers which are not available in the primary mainstream distribution, which is opentelemetry-collector. The contrib version is where most of the community plugins are available and being contributed to. One thing you need to be mindful of is that you need to have ports 4317 and 4318 open. You can get all this information directly from the opentelemetry-collector-contrib GitHub repo or the Docker Hub page. Once you have this container up and running, you can start using the collector to receive the telemetry and export it.
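As a hedged sketch (the image tag, config path and compose layout are assumptions on my part, so check them against the contrib image's documentation), running it locally with Docker Compose can look like this:

```yaml
# docker-compose.yaml – run the contrib collector locally with the OTLP ports exposed
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol-contrib/config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otelcol-contrib/config.yaml
    ports:
      - "4317:4317"   # OTLP over gRPC
      - "4318:4318"   # OTLP over HTTP
```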
            
            
            
Since we have already configured our application with instrumentation, let me show you the configuration file that we use. The collector requires a config YAML to be present, which is what actually drives it. There are the three main components that we talked about, the receivers, processors and exporters; these are the three blocks defined within the collector, and with them we configure how we want to receive the telemetry, what we want to do with it, whether we want to process it and attach any custom attributes, and where we want to export it. We can see debug output with the collector as well, similar to the SDKs, by adding the debug exporter. As for where we want to export it, here I'm exporting to New Relic, so I've added the exporter endpoint, which is otlp.nr-data.net, together with my particular license key. You can add multiple exporters to different observability backends, or if you just want to export it to a time series database, you can do so too.
            
            
            
The particularly important block in the configuration is the service pipelines. This is where you enable all these receivers, processors and exporters. In the pipelines we define what to enable: for receivers, for example, I'm enabling OTLP for all of the traces, metrics and logs; for processors, which processors are to be used for each individual telemetry signal; and for exporters, which ones you want to export to. You may not want to export everything but just start debugging; you can remove the exporter from the pipeline and the collector will still process the data without exporting it. For example, in the processors for logs I'm attaching a custom attribute, which is the environment; you can choose to add multiple processors for only one telemetry signal or for all of them.
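Putting those blocks together, a sketch of the kind of config.yaml described here could look like the following; the batch and attributes processor choices, the environment value and the New Relic header are illustrative and should be adapted to your own backend:

```yaml
receivers:
  otlp:
    protocols:
      grpc:             # listens on 4317
      http:             # listens on 4318

processors:
  batch:
  attributes/logs:
    actions:
      - key: environment
        value: development
        action: insert   # attach a custom attribute to every log record

exporters:
  debug:                 # print telemetry to the collector's own console (called "logging" in older versions)
    verbosity: detailed
  otlp/newrelic:
    endpoint: https://otlp.nr-data.net:4317
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/newrelic]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug, otlp/newrelic]
    logs:
      receivers: [otlp]
      processors: [batch, attributes/logs]
      exporters: [debug, otlp/newrelic]
```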
            
            
            
With that, I want to conclude my topic of OpenTelemetry 101 today and just recap a few highlights of everything that I've covered. First of all, it's an exciting time for open source observability. OpenTelemetry is growing and being adopted at a very rapid pace, not just in terms of contributions from the community but also in terms of adoption: companies like GitHub, Microsoft and New Relic are contributing heavily and including it in their own ecosystems. But you need to be mindful of your maturity, and you have to plan ahead with your adoption of OpenTelemetry. Start with automatic instrumentation and then advance towards manual instrumentation as a way to understand and mature what is important within your system. Just having some form of automatic instrumentation and collecting telemetry is not observability; your instrumentation should include proper contextual information for traces, logs and metrics to improve observability. Remember the example that we covered, where we are able to see logs, metrics, errors and traces all in a single place. That is a complete, powerful observability system, where you are able to reach the root cause of your problems. You can deploy the collector easily, and there are multiple options available: you can deploy it standalone, as an agent, or as a gateway behind a load balancer, or if you are using Kubernetes, you can deploy it in various modes such as a DaemonSet, a StatefulSet, or even via the Kubernetes operator. You can start collecting data from all your pipelines as well as multiple distributed systems, which can help you with your MTTI, MTTD and MTTR. One final piece of advice that I would like to close with: there is a lot of active investment going on in OpenTelemetry, and it helps engineers work based on data and not opinion. Thank you, and I'll see you next time.