Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, let's talk about the road ahead in Istio observability.
Before we talk about it, who am I?
I'm Siddharth Kare.
I'm working as a technical account manager with New Relic.
I've spent almost a decade in the industry, and prior
to joining New Relic,
I was working at Citrix as a software developer.
I'm a mobile enthusiast, and I usually work on creating awareness about
why observability is required for your mobile application and how it
will help you. So a brief agenda today: an introduction to
microservices, an Istio recap (the what and why of Istio), key metrics for
monitoring Istio, followed by a short demo.
So what are microservices?
So it's an architectural style that structures an application as a collection
of services that are independently deployable and loosely coupled.
Services are typically organized around business capabilities,
and each service is often owned by a single small team.
You can see in the background how we categorize different teams
and they own their own services.
Basically, with microservices, you will have an autonomous structure: each
component service in a microservice architecture can be developed, deployed,
operated, and scaled without affecting the functioning of other services.
It is specialized: each service is designed for a set of capabilities and
focuses on solving a specific problem.
And there are multiple benefits,
like agility, flexibility, easy deployment, and many more.
Now, let's talk about what Istio is. Istio is a service mesh: a dedicated
infrastructure layer that you can add transparently to your applications.
This new layer adds extra capabilities to the infrastructure, allowing you to manage
the traffic between your microservices.
So you can create your own rules to balance the traffic
based on your preference,
implement fault injection rules
to apply chaos engineering to your code, and many more options, right?
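To make those fault injection rules concrete, here's a minimal sketch of an Istio VirtualService that delays a share of traffic. The service name `reviews` and the 10% delay are assumptions chosen just for illustration:

```yaml
# Hypothetical example: inject a 5s delay into 10% of requests
# to an in-mesh service named "reviews" (name chosen for illustration).
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews-delay
spec:
  hosts:
    - reviews            # the in-mesh service to target
  http:
    - fault:
        delay:
          percentage:
            value: 10.0  # affect 10% of requests
          fixedDelay: 5s # add a fixed 5-second delay
      route:
        - destination:
            host: reviews
```

Applying a manifest like this with `kubectl apply -f` lets you observe how downstream services behave under added latency, which is the chaos engineering idea mentioned here.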
And the Istio service mesh is made up of many different components, split
mainly into two layers: the control plane and the data plane.
These are some of the features that the Istio control plane provides:
load balancing for HTTP traffic,
and control of traffic behavior by implementing rich routing
rules, fault injection, failovers,
et cetera.
With Istio, Istiod is mainly the name for the
Istio service mesh control plane.
It consists of a few components which are listed here:
Pilot, Citadel, and Galley.
So what Pilot does: it is the component which is responsible for
configuring the proxies at runtime.
It propagates new configurations to the Envoy proxies.
Next is Citadel. What it does is it issues certificates
and ensures they are being rotated.
You can consider it like an internal certificate authority.
And then last we have Galley.
This basically validates and distributes the configuration
within the Istio service mesh.
So after validation, configurations are sent to Pilot, and
then Pilot distributes them.
Now, we spoke about the control plane.
So what is there in the data plane?
You might have heard multiple times about Envoy, right?
Envoy is the data plane component in the Istio service mesh architecture.
We can't say it's a part of the control plane, but
its role is key to making the service mesh work. Envoy is a proxy that
is collocated in pods as a sidecar,
along with the original container which is being deployed. So we just need to
make sure that sidecar injection is enabled. This sidecar proxy is
responsible for handling the traffic between services in your cluster, from
internal to external services. And we can say that without Envoy,
it wouldn't be possible to propagate any changes or establish
communication from one service to other services in your service
mesh. In short, nothing would work, right?
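Enabling that sidecar injection is typically done by labeling the namespace. A sketch, assuming a namespace called `demo`:

```yaml
# Hypothetical namespace manifest: the istio-injection label tells
# Istio to automatically inject the Envoy sidecar into new pods here.
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    istio-injection: enabled
```

Equivalently, you can run `kubectl label namespace demo istio-injection=enabled` on an existing namespace; pods created after that point get the sidecar.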
So now the question comes: how can you monitor this, right?
And once you start monitoring, what are the key metrics?
These will be the key metrics: istio_requests_total,
istio_request_duration_milliseconds, istio_tcp_connections_opened_total,
and the total requests to the destination service. And how is it being monitored?
So Istio uses Envoy, a very high-performance service proxy,
to handle inbound and outbound traffic through the service mesh.
Istio's Envoy proxies automatically collect and report the detailed metrics
that provide high-level application information via a Prometheus endpoint.
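For example, once those proxy endpoints are being scraped, a PromQL query like the following (a sketch; the exact label values depend on your mesh) gives the per-service request rate from the counter just mentioned:

```promql
# Requests per second, broken down by destination service and response code
sum(rate(istio_requests_total{reporter="destination"}[1m]))
  by (destination_service, response_code)
```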
So there are multiple ways you can monitor your Prometheus endpoints, right?
For today's demo, I'm leveraging New Relic.
So if you go to New Relic to add your Kubernetes integration, you just
have to click on Integrations, and within Integrations, you will
have a Kubernetes option available.
Once you go to Kubernetes, you can leverage any of the methods
and you can start monitoring. Once it is being monitored,
at that particular time, you will start seeing the cluster
coming up in New Relic.
So in my case, the cluster name is conf42, and you can see the metrics here.
So if I click on this particular cluster, you will see the overview
of the cluster, where it will show you how the cluster is performing,
whether there is anything pending, and
what all things are in a running state, right?
So in my scenario, it's Istio.
So I'll just search for the Istio proxy.
So here you can see that there is an Istio ingress, right?
So if I click on this, here you will have an option,
and you will start seeing the Istio ingress.
Now, let's say you want to understand the pod details: you click
on the pod details, and you will get the complete information about what the
metrics look like, the CPU utilization, throttling, memory usage.
All these details you will start seeing here. Now, at any point
of time, let's say you just want to see the logs of your Istio
ingress, right?
So you click here and go to See Logs. Once you click on that, you will
start seeing the logs coming up here.
So let me just increase the time frame.
Once I have increased the time frame, you can see all the logs coming up here,
and you can use these logs to debug any problem.
Now, what are the metrics which we discussed, right?
So there are multiple counter metrics, which we spoke about.
There are actually any number of metrics.
If you see here, I just went to Metrics, and there are agent certificate
related metrics, expiry related metrics,
Go routine related metrics for the Istio agent,
Go thread related metrics, and the request counts and
request duration metrics, which is what we were talking about.
So the counter metric here is istio_requests_total, which
measures the total number of requests handled by an Istio proxy, and the
distribution metric is istio_request_duration_milliseconds, which measures
the time it takes for the Istio proxy to process an HTTP request.
And then the other distribution metrics are istio_request_bytes and
istio_response_bytes, which measure the HTTP request
and response body sizes, right?
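Since istio_request_duration_milliseconds is a distribution (histogram) metric, in Prometheus terms you would typically query a quantile over its buckets. A sketch:

```promql
# Approximate 95th-percentile request latency per destination service
histogram_quantile(0.95,
  sum(rate(istio_request_duration_milliseconds_bucket{reporter="destination"}[5m]))
    by (destination_service, le))
```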
So you can see here the list of metrics.
And let's say you want to see metrics specific to Envoy: you will get
the whole set of Envoy metrics as well,
and it will be listed here.
Now, let's say you have these many metrics, but now you want
to visualize your data, right?
So if you just want to see the request volume per minute for your
application, you will be able to see the information coming up here, where
you can see the pod name and the app names.
You can start seeing the request duration in milliseconds.
You can start seeing the traffic on the front end, from which particular
front end application and from what source the traffic is coming.
You can start seeing the traffic on your back end application and
what the source of that traffic is.
So in this particular scenario, on my back end database I'm generating the
traffic through my sleep pod and through my front end. And here you can see
the outbound requests by response code and source application, so you can
see that there are 200 OKs and 404 errors, and then we have a histogram
for the client request duration in milliseconds.
So it's very simple.
You just have to write the NRQL, which is New Relic Query Language, and you will
be able to create these visualizations.
Similarly, we have a visualization for the request response
codes, where you can see what percentage of the time it is returning a
200 OK versus a 404.
So this is how easy it is to monitor your Istio service mesh by
leveraging a specific tool; in this case, it was New Relic.
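As a hedged sketch of such an NRQL query (attribute names like `clusterName`, `podName`, and `app` are assumptions; the actual ones depend on how your Kubernetes/Prometheus integration reports data), the request-volume-per-minute chart could look like:

```sql
// Requests per minute, faceted by pod and app (attribute names assumed)
SELECT rate(sum(istio_requests_total), 1 minute)
FROM Metric
WHERE clusterName = 'conf42'
FACET podName, app
TIMESERIES
```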
So I hope you enjoyed this session.
If there are any queries related to how to set up the environment
or how to start monitoring, feel free to reach out to me on LinkedIn;
I'd be happy to connect with you.
Thank you and have a nice day.