Transcript
This transcript was autogenerated. To make changes, submit a PR.
Okay guys, thank you for joining this awesome talk. We will take a quick look, well, maybe not such a quick one, at observability. We focus on Kubernetes for this talk, and we take a deeper look at how we can instrument and bring observability into our applications. So please join this awesome journey for today. That is me, Jonathan Chavez; let me change the landscape here so we can start. In this part, in this little presentation, we talk about observability, and this is me, Jonathan Chavez. But let me move on.
So who is Jonathan? Who is Johnny Palm? Who is Jaden 24? Just a human who loves to share knowledge about instrumentation, about Linux, about the ecosystem, about cloud, about DevOps, about SRE, about anything I learn; I want to share it and I love sharing it. Those are my social networks, on GitHub, on YouTube and also on Twitter, or X. And I love this little quote: life is really simple, but we insist on making it complicated. That is really true, because when we start this IT journey we have a lot of challenges every day. So that is the proposition, I think, of this talk: observability in Kubernetes, where we talk in depth about metrics, logs and traces with open source tools. That is the best part, the open source tools, for our companies, and also every part of the implementation. Here is the content for today: we talk about the Kubernetes scope, an introduction to observability, what the CNCF is, what open source tools we can check out, what challenges we have using observability on Kubernetes, and we also take a look at a little demo.
So let's start with the introduction. We need to take a very quick look at the Kubernetes architecture right at this moment. We have this part, the information about how Kubernetes works internally, because we need to look at the control plane and also the worker nodes that are related here. And if we deploy this cluster using some cloud provider, we need to connect our cluster with that cloud provider, and we also have these components here.
So we need to take a more in-depth look at the internal worker node components. Inside the node we have the container runtime, Docker, containerd or something like that; that is part of the node, not of the control plane. The control plane is focused on being the brain behind everything, especially etcd and the scheduler. They do a pretty awesome job here, because through the reconciliation process they define how our workloads get deployed inside the Kubernetes cluster. Then there is the kubelet and kube-proxy part, internal to Kubernetes. The kubelet is how the node responds to all requests from the control plane side, and kube-proxy is how we communicate one process with another. I mean, how we connect one port to another port, or one service to another service; that is the job of the Kubernetes proxy deployed internally here.
The other scope here is the pod. The pod is the smallest unit that we can check inside Kubernetes, but it is the most important one, I think, because it is related to our application. It is related to every DaemonSet we might generate, depending on our needs and the current CRDs behind Kubernetes. So we need to keep this in mind: at the end of the day, the pod is the workload that gets executed and contains our applications that serve all our customers. So we have one container, or maybe more than one container, inside. We have one part for the network, another part for the volumes you attach to your pod, depending on what you need to deploy inside Kubernetes. We also have Kubernetes add-ons at this point, like DNS, like the metrics server, that you add to the node and that let you expose, communicate, and build relations with other tools and other parts. We have DaemonSets, Jobs and other objects inside Kubernetes that will be deployed onto the node, but it depends exactly on what kind of objects you implement in your clusters.
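A quick way to look at these pieces on a running cluster, assuming kubectl is already pointed at it:

  kubectl get nodes -o wide        # worker nodes, their container runtime and versions
  kubectl get pods -n kube-system  # kube-proxy, DNS and other add-on pods running per node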
Right. So that is the big picture of the Kubernetes architecture. We have a lot of challenges here if we try to observe what is happening in our cluster and what is happening with our application. The focus for today, and for this talk, is the pod, because that is what relates to our application.
Right. So let's take a very quick look at what observability is. Observability is a property of the system, and with it we can take actionable insights from our application that allow us to understand the system: its current state, depending on the external inputs and on how those inputs modify the internal behavior of the application. Right. I took this definition from the CNCF glossary, and we will look at the CNCF in more depth later in this talk. In this part we also define o11y. The term o11y comes from the o, then the eleven characters in between, then the y; very fancy, very similar to the Kubernetes shorthand, which is k8s. So we have the same kind of abbreviation here. Please keep that in mind: if you read some article and you find observability, you may also find o11y in the same place; that is very common. It depends on your teams, or on the conventions you have internally for all the applications you currently have.
Right. So let's move to the next part. We go deeper into the observability golden triangle. That is very important for us, especially because in this part we define what kind of information we can get from everyday applications. On one side we have the logs: the unstructured data that the application emits when it executes. When we run an application, it generates a lot of data through the implementation framework and the implementation language; so you have the name of the application, or something like that, and you can store and query that information. At the end of the day, that is one part. The second part is the metrics, which are the quantitative information about the components that support your application. Which components? The CPU, the memory, and also the disk that you attach, in this case, to the pod. And we can also take a look at the traces. Those could be the most challenging part, because we have one request that comes into our application, and how do we measure that request when it jumps to another application? How can we take a look at that and see what happened with this request, what happened with the response from the other systems, the implementation, and also the communication side that we need to watch? How do we adopt this challenge for ourselves, and how do we move this challenge to our teams internally, to get deeper information from our systems? Right. So we need to look at these three kinds of information for our applications.
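A rough illustration of the three signals for a single request, with made-up names and values just to show the shape of each one:

  # log: an event the application emits while it runs
  2024-01-01T12:00:00Z INFO checkout order=1234 status=completed
  # metric: a quantitative sample about a component, collected over time
  http_server_request_duration_seconds{service="checkout"} 0.042
  # trace: one span of a request as it hops from one service to another
  traceId=abc123 spanId=def456 name="checkout -> payment" duration=42ms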
So why do we need observability? That is an awesome question, because if we take a very quick look at our application: we have our customers on one side, we have our cloud provider, or something like that, or maybe an on-premises environment. A request comes to our firewall, the request jumps to the load balancer, the request jumps to the application, the frontend application or backend application, whatever it is. And this application queries some data from the database and also gets information from our archive. That is a normal architecture that you could deploy for your system. And we also have a challenge here, because we have this spider guy, right? We have this application, and we build it and split it into microservices, or services, or nanoservices, depending on the architectural definition you have internally. So you have this challenge, right? Because we are the spider guy, but what happens the day something happens to this spider guy? The database goes wrong, the database fails, the firewall fails, the application fails, the archive fails, our cloud provider fails, or maybe the load balancer fails. How do we check whether these components are working well or not? We need eyes inside our infrastructure, and inside our application too, because if we don't look at these components inside our application, we have a very big problem identifying the exact root cause. That is one reason we need observability: we don't have those eyes inside the application, and we normally don't have those eyes on the infrastructure side either. We need to give those eyes to our little spider guy. So we need to pray for this little guy, because at the end of the day he is the one we need to watch to see what is happening internally in this application, right?
So we move from the spider guy to instrumentation. Why? Because we need a couple of definitions behind that. We talked on one side about the pod, and we talked about these little architecture definitions. But when we move to instrumentation, we take one excerpt from Wikipedia, and Wikipedia tells us: hey, instrumentation is defined in terms of physical devices used to measure, maybe on the customer's site. How do we get measurements internally from, say, your granary or something like that, or your farm? How do we use these physical devices to obtain information about the environment, right? Whether the environment is fine or not, or whether something could be wrong, by deploying all of this: all this data that can be collected on your side, and also what is happening internally on the farm. We need to look at observability in the very same way, because we need to instrument our application. We need to build this measurement automation for our application. We also have to define exactly the physical devices, or in our case the software devices, that we use for this instrumentation. We need to define how we instrument the application, how we collect this information, the logs, traces and metrics, from every part of our application. So we define how we look at the application, how we obtain this information, how we send that information somewhere it can be correlated, and also how we store that information.
At this point we also need to take a look at the CNCF. The CNCF is the Cloud Native Computing Foundation, which is behind some of the biggest projects here. The first one is Kubernetes; the second one is OpenTelemetry. Those are pretty huge projects, and behind them is the community: support from the community, and deployment and support for the community. It is a very nice ecosystem, because when we talk about the community we are talking about some of the smartest people out there. That is the biggest strength here, and also the biggest impact, for the community and from the community. The CNCF itself sits under the Linux Foundation. The Linux Foundation, as you know, is behind Linux, the open source operating system that you may have implemented for your internal applications, or for your Docker or AI applications, or something like that.
So, along those lines, at this point we have the landscape; that is the address here, you can explore it too, and there is one section there related to observability and analysis. And when we start to explore that, we have a lot of tools, both from the paid side and from the open source side. We have, at this moment, 98 tools focused on monitoring, we have 21 tools focused on logging, and we have 18 tools for tracing. And there are two more ecosystems there that I don't have in this presentation, but if you explore the landscape you can check them out: one is for chaos engineering and the other one is for the optimization part. So when we start with observability we can enable a lot of capabilities behind the application, a lot of capabilities behind our infrastructure, and also, for our companies, we can move quickly into other practices internal to our companies. That is pretty nice, because when you start seeing what is happening internally in your company, you can start to move and put more focus on other areas that you might want to look at in depth in some cases, right.
So with this part, we need to take a very quick look at these open source tools. For instrumentation I recommend using OpenTelemetry. You can use another SDK depending on your cloud provider: you add the SDK libraries, you deploy these libraries, you implement them, and you define, with the development team, how you start this journey on the instrumentation side. But I recommend OpenTelemetry because it is very quick; with OpenTelemetry you have two possibilities for instrumenting your application. One is auto-instrumentation, which is just adding one line, or, in Kubernetes, adding the sidecar pattern; it takes the information from your deployed application and starts producing these signals, the logs, the traces and the metrics. That is one part. The second part is manual instrumentation. Manual instrumentation is about saying: hey, I need to decide how we collect this information, the logs, the traces, the metrics, and I split it up myself. This is more difficult, but it depends exactly on the observability maturity inside your teams, right.
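As a small, hedged example of the auto-instrumentation idea, here is the Python zero-code approach outside Kubernetes (the app name is just illustrative):

  pip install opentelemetry-distro opentelemetry-exporter-otlp
  # wraps the app and emits traces/metrics over OTLP without touching its code
  opentelemetry-instrument --traces_exporter otlp --metrics_exporter otlp python app.py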
But, again, I recommend OpenTelemetry; it is a nice framework. Also, for collecting logs I recommend the filelog receiver, which is supported by OpenTelemetry; for traces, Jaeger; and for the quantitative metrics I recommend Prometheus.
So that is the path of observability: we instrument the application, the instrumentation applies to the application, then the application generates logs, the application generates traces, the application generates metrics, and it exports those metrics, traces and logs on one side using the OpenTelemetry framework; you can collect this information and look at it inside Grafana. On the other side, we need to decide how we store the logs, how we store the traces, and what the retention lifecycle is for this quantity of traces: it could be one month, or six months, or one year, or something like that, depending on the requirements on the company side. And then, on top of these collected metrics, you can take a look and draw dashboards using Grafana. Right.
So what happens with observability on Kubernetes? We have two approaches internally. On the Kubernetes side we have, on one side, OpenTelemetry, which I recommend, and there is another approach based on eBPF. But eBPF is more focused on the implementation and efficiency side. If you want to take a deeper look, you can check this article and also read other resources. I read this article and I think it is pretty awesome how it explains the differences between one side and the other. And what I appreciate about OpenTelemetry is that it is very easy to use, and also the compatibility it currently has with all the languages you could have implemented your microservices in. If you want to take a look at OpenTelemetry or eBPF, please do, feel free to do that, and start this journey on the observability side.
So now, the best part for us: the demo, right? We have here the architecture behind it. Let me exit presentation mode so we can take a very quick look at this demo. For OpenTelemetry we can go here; the OpenTelemetry site defines exactly how you can implement OpenTelemetry, and this part, the architecture side, gives you exactly that kind of framework. And this is the architecture deployed for this demo. We have, on one side, these microservices deployed here. These microservices were written in .NET, C++, Erlang, Go, Java, JavaScript, Kotlin, PHP, Python, Ruby, Rust and TypeScript; those are the common languages in the current scope that we have. And with these microservices here, we define exactly how the data comes in and how the data flows internally through our application.
Right. When we export and try to collect these signals, what happens internally? We need to look at Prometheus on one side, which covers the APM side, exactly the CPU, memory and disk, right? And on the other side there is Jaeger, which is for traces. If you want to add the logs part, please go ahead and look at this instrumentation, how you can store the logs and how you can define the flow on the log side. So Prometheus collects this information from the microservices side through the OpenTelemetry Collector. That is the configuration internal to your application, and if you explore this part, it is about how you receive that information, how you process that information, and how you export that information in the OpenTelemetry configuration. That is pretty awesome, because if you want to do, say, a lift and shift, and you say "I don't like Prometheus, I want another tool, I don't like Jaeger, I want to use another tool", you can remove it from here, add it to your flow, and move very quickly to start using another tool. That is pretty awesome, because you don't need to change anything more inside your teams or your development team; you just need to change the configuration side. At the end of the day you remove this responsibility from the development team and you give this responsibility to the operations side. But you do need to know in more depth what happens internally with your traces, and how you collect the traces for every signal you have internally in your system, right?
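To make that swap concrete, here is a minimal sketch of what a collector pipeline looks like, assuming a standard OpenTelemetry Collector configuration; the exporter names and endpoints are illustrative, not the demo's exact values:

  receivers:
    otlp:
      protocols:
        grpc:
        http:
  processors:
    batch:
  exporters:
    prometheus:
      endpoint: "0.0.0.0:9464"      # scraped by Prometheus
    otlp/jaeger:
      endpoint: "jaeger-collector:4317"
      tls:
        insecure: true
  service:
    pipelines:
      traces:
        receivers: [otlp]
        processors: [batch]
        exporters: [otlp/jaeger]
      metrics:
        receivers: [otlp]
        processors: [batch]
        exporters: [prometheus]

Swapping Prometheus or Jaeger for another backend only means changing the exporters section; the applications keep sending OTLP to the same collector.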
So, inside Prometheus: Prometheus receives that information through this URL, and this information is stored in Prometheus's time-series database. At the end of the day, once the data has been stored, Prometheus exposes it so the metrics can be queried, and you can also see those metrics using Grafana on top of Prometheus. And it is the same with Jaeger: the traces come in here over the gRPC protocol, Jaeger stores that information, and Jaeger exposes that information so it can be consumed from the Grafana side.
That is pretty awesome, because at the end of the day you have this setup that you can look at like a pipeline inside the OpenTelemetry configuration: you can drop in one tool here, which is Prometheus, and you drop in another tool here, which is called Jaeger. And you can also use Grafana at this point to explore your metrics and build these awesome dashboards that you can share with your IT people, or maybe even with your CTO or CEO. And when you design these dashboards, you need to think about which customer or end user needs to see each dashboard. That is another journey, but you need to think about it internally, right? So that is the architecture side, and that is the ingest flow, or the telemetry data flow, or the signals flow, for storing that information.
So let's take a very quick look at this demo. This demo is currently on my GitHub, so if you want to explore it, reproduce this setup, and run the demo yourself, that works pretty well. Currently I deploy all my infrastructure on AWS. So let me move very quickly to our demo environment. I did not run the cluster creation part beforehand, so let me start the cluster. On this side we run eksctl create cluster here. At this point we start creating the cluster, so let's wait for the cluster to be created.
create this cluster. Okay, so we
take a look for this cluster recently created,
that was created from using the EKCTL
CLI version from AWS generated.
We send here the cluster information related for
version that is 1.27
and the name of this cluster will be conf 42
qa native. So if we take a look here,
Betty queen for elastic Kubernetes service that is internal for
kubernetes for AWS on Saclier
region USA. Two we have also the
possibility to check what kind of component was deployed on this
command. This command deployed
the wall cluster that we talked internally on previously
slides that is related for the control plane and workers node
site and also another capabilities that AWS
needs to manage this clusters because when
we deploy using the EKS CTl or using the
eks provisioning way we deliver
some responsibilities for AWS. That is pretty awesome
because when you define a cluster you
have to define exactly what kind of part of
the cluster you generate here. So let me move
Let me go back, because I selected the wrong section here. So, moving quickly back to EKS in the console, let's take a look at the cluster we created. At this point you have the possibility to check what is happening with your cluster. We just have an empty cluster, because we have not dropped any components into it yet. So let's check. Meanwhile: kubectl get pods. When we send this command, we connect to the API server and request all pods from that API, and with the -A option, in uppercase, we request pods across all namespaces inside Kubernetes, right? So we have the Kubernetes side here, the Kubernetes deployed on AWS, and you can see the command and all the pods currently deployed on the Kubernetes side, right. That is our recently deployed cluster and its API. There is a timeout here, which is unusual, but it may be because we created this cluster about 20 minutes ago. And if we take a look at the compute tab, we have the worker nodes that are part of the Kubernetes side, exactly the worker node side, which run all the pods we need to deploy. So we continue here with our cluster setup.
We can use this alias; it is very helpful if you want to type commands against Kubernetes more quickly: you just type k get pods -A and that's it, and it is much quicker to build and send that request to the API server.
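A quick sketch of that alias, assuming a bash or zsh shell:

  alias k=kubectl
  k get pods -A   # same as: kubectl get pods --all-namespaces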
Okay, we also need to look at another tool called Helm. Helm is a package manager for Kubernetes, and we use it for the components we are about to deploy. We keep an eye on the pods, and then we run the commands. With the first command we add the repo; with this command the repo is called open-telemetry, and it pulls from the Helm charts that are stored on the GitHub side. It may say the open-telemetry repo already exists, because we added it previously. Then helm repo update, because in some cases you need to refresh the references on your system; that is on your local environment, not on the cluster side yet, it just updates this repository information on your side. The next one is the important command here, because with it we apply this configuration. You could also have your own values file for the Helm chart, because it depends on the Kubernetes version too; we deployed Kubernetes using API version 1.27.
So on this side we install a package through the Kubernetes package manager, and we call the release my-otel-demo. We use the open-telemetry repository and its opentelemetry-demo chart. So we have the open-telemetry side, and from that package repository we use the opentelemetry-demo chart at the exact version whose components will be deployed inside the Kubernetes side. We need to wait a little while here, because it needs to download the chart at that version and start deploying all the pieces inside Kubernetes. What kind of pieces? We are talking about the pods, we are talking about the services; we need to deploy a lot of components that are part of the architecture defined on the OpenTelemetry side, right.
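A sketch of the Helm steps just described, assuming the official OpenTelemetry Helm charts repository; the release name matches the talk, and you can pin a chart version if you need a specific one:

  helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
  helm repo update
  helm install my-otel-demo open-telemetry/opentelemetry-demo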
Also, installing your applications this way is very quick: you run the install, it is a one-shot installation, which is awesome for your application, and you start your observability journey using OpenTelemetry. It is very quick for starting your PoC or spikes, depending on the maturity of your teams. So we need to wait a little while here for this to apply inside the Kubernetes side. When these components are deployed, we can prepare and send this command here, with these references internal to the Kubernetes side. Okay, the installation is done, and we can open it using the port-forward configuration here: we point the port forwarding at the service, internally the service called my-otel-demo, and we forward from port 8080 locally, taking the internal side on port 8082. So we send this with kubectl port-forward, and this grants us access to the endpoints that were returned by the Helm installation output.
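A sketch of that port-forward, assuming the demo's frontend-proxy service name from the chart; check kubectl get svc for the exact name and port on your install:

  kubectl port-forward svc/my-otel-demo-frontendproxy 8080:8080   # the talk maps local 8080 to the service's web port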
So let's open the first one deployed here: that is the web store, the application we deployed with the architecture shown earlier. And we also need to take a look at Grafana, which is where we start to check what is happening. While the port-forwards start, we wait a bit. So that is our application side, and it jumps for every request that is returned to our web browser here. That is our application; it is a huge application. When you scroll, the products start to load, and then the requests start to be generated too. And when we move to Grafana, we have the part that comes from the default configuration. If we move quickly to the configuration, just to the data sources, we have the Jaeger data source that we mentioned, which is for the traces, and also Prometheus, which is for the CPU and memory side of our application. Right. So let's take a very quick look at the default dashboards, the four dashboards that were created by this deployment. Let's open them one by one and take a quick look. As you saw, this installation is very quick: just a couple of minutes to start your journey into observability on the Kubernetes side. You can also start the journey using other tools, yes, but you may need to go deeper into these concepts and build the implementation into your application yourself, and that is something you need to look at internally with your development team.
Right. So we see here information related to each service, right? We have the feature flag service, and we can jump to every service here, and these graphs update for that service: what kind of service it is and how this service was deployed inside Kubernetes, right. You can jump around and explore every service here, and you can see what is happening with the CPU, the CPU recommendation, the memory recommendation, the error rate, the service latency, what kind of timings are generated for every service, the times between request and response, and the error rate, which relates to the errors you have internally on your application side. That is pretty awesome, because using the traces that come from OpenTelemetry you get this dashboard. And if you add your own microservice, you can drop in here, identify your service, and start this journey using your own OpenTelemetry instrumentation, which you can enable for that.
Then, for this other dashboard, we have information about the receivers and exporters of exactly the component we have been calling the collector. Let me move; I closed this page, but when we talk about the configuration side, we talk about this one flow here, so let me load the OpenTelemetry page again. Here it is, this collector configuration, right. We have it deployed internally on Kubernetes, as a component on Kubernetes. So we need to see what is happening with this component: what the behavior is for every request that comes from our application, for the flow that is ingested by Prometheus and also by Jaeger, and what kind of behavior we can check here, right? What was requested, what the response was, and what is supported. It is pretty awesome, because it enables observability for the components you deploy as well. The other part is the traces pipeline: the complete OpenTelemetry Collector data flow, which is this part, the traces flow, how the data jumps from one stage to another: the collector here, the processor here, the batch here, the exporter here, and the logging one also, inside OpenTelemetry. That is the traces pipeline, and this part is the metrics pipeline, right.
We currently obtain that information on one side from the traces, and from the metrics. And what happens with Prometheus? If we open this view here, we see that Prometheus collects information from these targets; those are all the pods deployed on the currently deployed cluster, right? So we can see exactly what is happening with these components, for the configuration side that relates to this diagram, and we can also see what happened internally with these components: whether data was accepted or refused, what the total is, what the batch size is, what the total per batch is, and what is happening with the logs in the OpenTelemetry configuration side as well. The next one is related to the span metrics. Those are all the traces we currently have from the OpenTelemetry side, and we can see exactly what is happening with these components: what the hops are, what the request times are, and what the endpoint latency is for every component you currently have.
And it is pretty awesome, because when you start this journey you need to identify exactly what kind of components sit behind your application, and which of these components you need to look at more deeply in the implementation you currently have. So that is the configuration for that. If you want to go deeper into these metrics, you can move very quickly here: you can run these metrics, and you build the query; in this case we use the request metric, you run the query here, and you look at the information it returns. And if you want to see what is happening with your hops, using Jaeger you can say: hey, please load this information from the Jaeger side. Run the query; no, we need to go to the search tab, select one service here, execute the query, and you start seeing what is happening with the trace internally. And when you select a trace, you have the possibility of seeing what hops this request generated. This request generated one request to, let's open it here, open it here, this request generated other requests to the checkout service, the cart service, the product catalog service, the currency service, the product catalog service again, the currency service, and those are all the hops that you currently have with this instrumentation of your signals. That is pretty awesome, I think, because at the end of the day you can use this demo to start your journey into observability.
So that is it for the demo side, and you can take a deeper look at it on your own. And if you want to deploy a cluster, or maybe you already have one, you can deploy this onto your current cluster; you don't need to create a new cluster. So that is a brief summary of what we talked about today: metrics, logs and traces with open source tools. We covered, on one side, the Kubernetes architecture at a high level; observability: what, why, where and how; the instrumentation side, which is part of the how; the golden triangle that we need to look at with our development team; the CNCF, what the scope of the CNCF is, what the community does for the community; and also the open source tools we can take and use to start this awesome journey into observability on Kubernetes. So thank you, I appreciate your time, and I hope you learned something new. And if you want to contact me, you can send me a message on GitHub, or on YouTube, or on X or Twitter, as I mentioned. You have here the references that you can read to go deeper into this research, the processes, and the terminology behind it. So I appreciate your time. See you soon. Thank you.