Transcript
This transcript was autogenerated. To make changes, submit a PR.
Everyone, I hope you're having a great time at the Kube Native conference.
I am Twinkll Sisodia, a software engineer at Red Hat, and I
work with Red Hat partners to build robust cloud native architectures.
So today we'll be looking into why monitoring is so important.
We'll look closely at what Prometheus is, its usage,
and its components. Then we will look into
Grafana and how it can be used to visualize metric data.
And lastly, in the demo, I'll walk you through how an
application is deployed on OpenShift and how it can be monitored
efficiently. We'll also be using an Observability
Operator, which will deploy Prometheus and Alertmanager instances for us.
We all know how CCTV cameras are used for
safety and security purposes. Similarly, we have tools
like Prometheus and Grafana, which act as CCTV
for our systems: say your CPU or
memory utilization reaches a critical limit,
or Kubernetes resources like pods or deployments
fail. In these cases monitoring helps and will
minimize the risk of servers going down or resources being
unavailable, and with that it also enables proactive
management of clusters. For monitoring we have a few
open source tools, and one of them is Prometheus.
Prometheus collects and stores its metrics as time series data,
and it was designed for monitoring highly dynamic container environments like
Kubernetes and Docker Swarm.
Say there are many servers running containers,
all interconnected. Maintaining such complex systems,
and making sure everything runs smoothly without downtime,
becomes really challenging.
Now imagine having multiple such infrastructures and you have no
idea what's going on inside them, at the hardware level or the application level:
errors, response latencies,
overloaded or failed hardware, running out of resources,
et cetera. This complexity is minimized
if you have a tool that constantly monitors the resources
and activity inside the cluster and alerts you
whenever something critical happens. All this automated
monitoring and alerting is what Prometheus offers as part
of a modern DevOps workflow. Now, for us to
enable monitoring, we need a few Prometheus components,
and I'll start with ServiceMonitors.
A ServiceMonitor specifies which services should be monitored.
In place of a ServiceMonitor you can also use a PodMonitor;
the difference is that it specifies which pods Prometheus should
monitor. Next we need PrometheusRules.
A PrometheusRule defines recording and alerting
rules: recording rules allow you to pre-compute
frequently used expressions, and alerting rules specify when
we should get alerts, for example by setting thresholds.
Next we'll need the Alertmanager config, which specifies
the configuration for the alerts and custom receivers like Slack,
PagerDuty, etc. Here is
a short glance at the ServiceMonitor: the
namespaceSelector lists the namespaces it will
monitor, the selector matches the label for the blue
app, and lastly the endpoint
is the HTTP port. Next is
the PrometheusRule, which contains the alerting and recording rules.
In this example, when the app's requests per minute
exceed 20, it sends a low-load
alert, and so on for medium and high.
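The ServiceMonitor and PrometheusRule glanced at above might look roughly like this. This is a sketch, not the exact manifests from the demo: the names, labels, port name, and the metric in the recording-rule expression are assumptions, and the API group shown is the upstream `monitoring.coreos.com` one (the Observability Operator's rhobs-based CRDs use `monitoring.rhobs` instead).

```yaml
# Hypothetical ServiceMonitor: scrape services labeled app=blue in the blue namespace
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: blue-app
  namespace: monitor
spec:
  namespaceSelector:
    matchNames:
      - blue              # the namespaces it will monitor
  selector:
    matchLabels:
      app: blue           # matches the blue app's service label
  endpoints:
    - port: http          # the HTTP port name on the service
---
# Hypothetical PrometheusRule: one recording rule plus a low-load alert
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: blue-app-rules
  namespace: monitor
spec:
  groups:
    - name: blue-app
      rules:
        # Pre-compute the per-minute request rate so alerts can reuse it
        - record: app:requests:per_minute
          expr: rate(http_requests_total{namespace="blue"}[1m]) * 60
        # Fire once requests per minute exceed 20; medium/high follow the same pattern
        - alert: LowLoad
          expr: app:requests:per_minute > 20
          labels:
            severity: info
          annotations:
            summary: Blue app is receiving more than 20 requests per minute
```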
Next is the Alertmanager config secret. It has
the API URL for the Slack workspace, and it has the channel
name to which all the notifications will be sent.
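As a sketch, the Alertmanager configuration packed into that secret might look like this; the webhook URL, channel name, and secret name are placeholders, and the secret name your Alertmanager instance expects may differ.

```yaml
# alertmanager.yaml -- stored in a secret, e.g.:
#   oc create secret generic alertmanager-config \
#     --from-file=alertmanager.yaml -n monitor
global:
  slack_api_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder webhook URL
route:
  receiver: slack-notifications
receivers:
  - name: slack-notifications
    slack_configs:
      - channel: "#alerts"          # placeholder channel name
        send_resolved: true
```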
So far we have seen how Prometheus works and how
it collects and stores its metrics as time series data.
Now let's see how we can visualize that data effectively in Grafana.
And what's Grafana? Grafana is
open source software that enables us to query,
visualize, alert on, and explore metrics,
logs, and traces wherever they are stored.
Grafana provides us with tools to turn
time series data into insightful graphs
and visualizations. These are the Grafana
Operator components we would need: on the Grafana side
we need a Grafana data source and a Grafana
dashboard. This is a short glimpse of what
the data source manifest looks like: it takes the
Prometheus service URL, and it sets the database
type to Prometheus. Now, this is the
architecture diagram I'll implement in the demo: how
I monitored an application, got metrics out of it, and visualized
them in Grafana. We will
deploy on an OpenShift Dedicated cluster.
We'll have a blue application in the blue namespace and
an Observability Operator in the monitor namespace, which is responsible
for creating the Prometheus and Alertmanager instances.
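As an illustration, the Observability Operator instance that stands up Prometheus and Alertmanager is typically a MonitoringStack resource. This is a sketch assuming the `monitoring.rhobs/v1alpha1` API and a label-based resource selector; the name, retention, and label are assumptions.

```yaml
apiVersion: monitoring.rhobs/v1alpha1
kind: MonitoringStack
metadata:
  name: blue-stack
  namespace: monitor
spec:
  logLevel: info
  retention: 1d
  resourceSelector:       # pick up ServiceMonitors/PrometheusRules carrying this label
    matchLabels:
      app: blue
```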
Here Prometheus will scrape the metrics from the blue app and
send alerts to Alertmanager, which will then
forward the alerts to Slack as notifications. And lastly,
the Grafana dashboard will visualize the metric information in the
form of graphs. Now let's
move on to our demo. On the right-hand side you can see the Red
Hat OpenShift Dedicated cluster, and in the bottom-left corner
you can see the Slack workspace where all the notifications and alerts
will arrive. On the OpenShift Dedicated cluster we have
two namespaces: one for the blue application, which is already deployed,
and the other for the Observability Operator and its instance,
which is already up and running. Next I'm
going to create the Prometheus components: the ServiceMonitor, the PrometheusRule,
and the Alertmanager config. I'll
create the ServiceMonitor.
The ServiceMonitor is up. I'll create the Prometheus
rules. After that I'll create the Alertmanager
secret. Okay,
so the Prometheus components are in place. Next I'll
create the ClusterRole and ClusterRoleBinding so that
the monitor namespace has permission to scrape
the metrics from the blue namespace.
The ClusterRole blue-view is created. Next I'll
create the ClusterRoleBinding.
The ClusterRoleBinding is now created.
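The ClusterRole and ClusterRoleBinding just created might look roughly like this. The resource list and the Prometheus service account name are assumptions for illustration, not the exact manifests from the demo.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: blue-view
rules:
  - apiGroups: [""]
    resources: ["services", "endpoints", "pods"]   # what Prometheus needs to discover targets
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: blue-view-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: blue-view
subjects:
  - kind: ServiceAccount
    name: blue-stack-prometheus   # assumed Prometheus service account name
    namespace: monitor
```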
I'll port-forward the Prometheus pod,
and let's see what the Prometheus dashboard looks like.
So this is the Prometheus dashboard. If I navigate to Alerts,
I can see all the alerts: high, medium, low. If I navigate
to Rules, I can see the recording rules
and the alerting rules. And lastly, if
I go to Targets, I can see that the blue application we
deployed recently is up. Now let's trigger
the blue application and see how we get the alerts on Slack.
So I'll curl it at
least 25 times.
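A loop along these lines generates the load; the route URL below is a placeholder (on OpenShift you would look up the real host with `oc get route`).

```shell
# Placeholder URL -- substitute your app's actual route host
APP_URL="http://blue-app-blue.apps.example.com"

# Send 25 requests so requests-per-minute crosses the low-load threshold (> 20)
for i in $(seq 1 25); do
  curl -s -o /dev/null --max-time 3 "$APP_URL" || true   # ignore failures in this sketch
done
```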
Once the threshold is met, we can see the
alerts popping up in the Slack channel.
This shouldn't take long, maybe 25 to 30 seconds.
You can clearly see that the alerts are getting triggered: low load,
medium load. Expanding one of these alerts, we
can see the metadata, like where this alert is coming from: the
alert name, the container name, the endpoint IP address,
the namespace, the path, et cetera.
So this is a small use case of how an organization
can use these monitoring tools: with Prometheus,
Alertmanager, and Slack we can enhance
the workflow, and this is how one can minimize the risk
of downtime. So far we have seen how we used Prometheus and
Alertmanager to send alerts to Slack. Now let's see how
we can turn that data into insightful graphs and
visualizations using Grafana. Let's move to Operator
hub and install the Grafana Operator.
I'll install it into the monitor namespace, and
once the Grafana Operator is installed I'll go ahead and
create its instance and data source. Then we'll port-forward
the Grafana pod to see what the dashboard looks like.
So the Grafana Operator is installed;
I'll go ahead and create its instance.
The instance is created. Next I'll create the Grafana
data source, and
the data source is created. Now I'll check whether the pods are
up and running. Not yet.
Okay, now it's up and running.
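As a sketch, the Grafana instance and data source created here could look like the following, assuming the Grafana Operator v5 API (`grafana.integreatly.org/v1beta1`); the credentials, labels, and Prometheus service URL are placeholders, not values from the demo.

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: Grafana
metadata:
  name: grafana
  namespace: monitor
  labels:
    dashboards: grafana          # label that instanceSelectors can match
spec:
  config:
    security:
      admin_user: admin          # placeholder credentials; prefer a Secret in practice
      admin_password: admin
---
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDatasource
metadata:
  name: prometheus-ds
  namespace: monitor
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana        # target the Grafana instance above
  datasource:
    name: Prometheus
    type: prometheus             # the database type mentioned earlier
    access: proxy
    url: http://prometheus-operated.monitor.svc:9090   # assumed Prometheus service URL
```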
So I'll port-forward the Grafana pod
at port 3000. It's
port-forwarded. Let's open
localhost:3000 and sign in with
the same username and password I provided in
the Grafana instance.
Now, before proceeding, I'll just quickly confirm that my data
source is working. On testing,
my data source is working fine. I'll navigate
to Import and quickly import the sample dashboard
I created. You can create your own or
just import one from the Grafana website.
I'll rename it to blue dashboard and import it here.
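Importing through the UI works as shown; alternatively, the operator can manage dashboards declaratively. A sketch, assuming the v1beta1 `GrafanaDashboard` CRD and a hypothetical minimal dashboard JSON:

```yaml
apiVersion: grafana.integreatly.org/v1beta1
kind: GrafanaDashboard
metadata:
  name: blue-dashboard
  namespace: monitor
spec:
  instanceSelector:
    matchLabels:
      dashboards: grafana       # target the Grafana instance carrying this label
  json: >
    {
      "title": "blue dashboard",
      "panels": []
    }
```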
You can see we are getting different metrics, starting with alerts:
we can see which alerts were triggered recently.
So high load, low load, and medium load are the alerts
that were triggered, along with the alert state, which container
it was, what the endpoint was, et cetera.
Next we see the blue requests-per-minute
metric, which shows how many requests
there were per minute for the blue application. Apart from that
we can see the response status, process
CPU, and lastly the up metric. The up metric
shows how many targets are currently up: there is
one for Alertmanager, one for the blue application, and one for
Prometheus, all up and running. So this
is how you can use a data source like
Prometheus and turn the data into insightful graphs and visualizations.
This helps SREs stay mindful
of all the resources and costs involved,
and with that it also helps organizations minimize
their downtime.
And that concludes my presentation.
Just to summarize what we have discussed so far:
we talked about the importance of monitoring,
and we discussed the Prometheus and Grafana components involved.
In the demo, we deployed an app and the Observability Operator,
which installed Prometheus and Alertmanager. And finally,
we sent alerts to Slack. Then we deployed
the Grafana Operator and its components, and lastly we
imported a custom dashboard to see insightful graphs.
So here I would like to thank everyone. I hope
you all enjoyed it. If you want to connect,
I'm there on LinkedIn, and if
anyone wants to try this hands-on, you can visit my
GitHub repository; the README has all the in-depth
details. So yeah, thanks everyone.