Transcript
This transcript was autogenerated. To make changes, submit a PR.
Monitoring and observability. They're obviously incredibly crucial.
Whether you're on-prem, in the cloud, on Kubernetes, or on
standard containers, wherever you're running, you need to
understand what's happening in your environment, whether it's monitoring,
so graphs, alerting, seeing everything on a screen and
understanding what's happening from that perspective, or observability,
which is more around the idea of taking action.
My name is Michael Levan. Welcome to my session at Conf42.
We're going to dive into a bunch of hands-on stuff,
but primarily I'm going to show two different realms of focus:
the homegrown or open source style solutions and
the enterprise solutions. We're going to walk through installing both.
We're going to do Datadog and kube-prometheus, and we're going to talk about
the differences, which should hopefully help you decide which one you're going to go with in
your organization. Let's go ahead and jump right in. Let's start by
diving into kube-prometheus. Alright, so I'm going
to open up my terminal and do a quick kubectl get nodes. Here
we can see I am on an EKS cluster.
It's not going to matter, though. If you're on AKS, GKE, or
on-prem, all these steps should be relatively similar.
Okay, so the first thing that you're going to do, you're going to want to
add the Helm chart for kube-prometheus. I essentially do everything
via Helm chart. Why? It's a great package
manager. It's much better than just going and calling out to a bunch of Kubernetes
manifests. And instead of, again, using five, six, seven different
Kubernetes manifests, everything is under one roof. So I typically
go with Helm charts. Next, I'm going to go ahead and update
the repo. Once that's done, we will install
kube-prometheus. Okay, now, as the name
sounds, kube-prometheus is going to be a combination
of Prometheus and Grafana.
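The three Helm steps described here (add the chart repo, update it, install) can be sketched as follows. This is a minimal sketch, not the exact commands run in the session: the repo alias, the kube-prometheus-stack chart from the prometheus-community repository, the release name, and the --create-namespace flag are all assumptions, while the monitoring namespace is the one that shows up later in the talk. The script only prints the commands so they can be reviewed before running them by hand.

```python
import shlex

# Assumptions (not stated in the talk): repo alias, chart name, release name.
REPO_NAME = "prometheus-community"
REPO_URL = "https://prometheus-community.github.io/helm-charts"
CHART = "prometheus-community/kube-prometheus-stack"
RELEASE = "kube-prometheus"
NAMESPACE = "monitoring"  # namespace seen later in the session


def kube_prometheus_commands(release=RELEASE, namespace=NAMESPACE):
    """Return the add / update / install steps as argv lists."""
    return [
        ["helm", "repo", "add", REPO_NAME, REPO_URL],
        ["helm", "repo", "update"],
        ["helm", "install", release, CHART,
         "--namespace", namespace, "--create-namespace"],
    ]


if __name__ == "__main__":
    # Print only; nothing is executed against the cluster.
    for argv in kube_prometheus_commands():
        print(shlex.join(argv))
```

Returning argv lists rather than one shell string keeps quoting unambiguous if the commands are later passed to subprocess.run.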
Can you install these separately? Absolutely. But the reason why I
actually like to do it together is because kube-prometheus gives
you a bunch of dashboards out of the box that are all Kubernetes
related. So let's say I just install Grafana and Prometheus separately.
I'm not going to have any dashboards, but if I install kube-prometheus,
it comes, again, preinstalled with all these different Kubernetes dashboards,
which we'll go ahead and take a look at in a second. And once
this is installed, which of course takes a little bit because there's
a bunch of different pods that need to come up, you can look
at Prometheus via port forwarding, or you can just go ahead
and hit Grafana, right? So let's go ahead and do
this. That way we can get a nice visual,
right? And then let me go ahead and open up a web browser.
Web browser is up. We can see that here.
I'm going to go ahead and just take a look here. To log
in, the default username is admin and the password is prom-operator.
So admin, prom-operator.
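Reaching Grafana from a local browser, as done here, usually comes down to one kubectl port-forward command. The sketch below builds that command. It assumes the Grafana service is named after the Helm release with a -grafana suffix and listens on port 80, and it uses a hypothetical release name and an arbitrary local port 3000; check the real service name with kubectl get svc in the monitoring namespace. The login values are the defaults quoted in the talk.

```python
import shlex

# Login defaults quoted in the talk.
GRAFANA_USER = "admin"
GRAFANA_PASSWORD = "prom-operator"


def grafana_port_forward(release="kube-prometheus", namespace="monitoring",
                         local_port=3000):
    """Build a kubectl port-forward command for the Grafana service.

    Assumes the service is named "<release>-grafana" and serves on port 80;
    confirm with `kubectl get svc -n <namespace>` before relying on it.
    """
    return [
        "kubectl", "--namespace", namespace, "port-forward",
        f"svc/{release}-grafana", f"{local_port}:80",
    ]


if __name__ == "__main__":
    # Print only; run the printed command yourself in a terminal.
    print(shlex.join(grafana_port_forward()))
    print("Then open http://localhost:3000 and log in as "
          f"{GRAFANA_USER} / {GRAFANA_PASSWORD}")
```

Changing the default password after the first login is a sensible follow-up, since the default is publicly known.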
And now we're logged in. So what I was referring to
before, if I go to dashboards,
notice here how I have all these different Kubernetes dashboards.
You will not have these by default with a plain Grafana install. And of
course, if you want to, you can import a new one. So for example,
if we just take a look here,
we have the Argo CD dashboard, for example. So what I can do is I
can actually copy the ID, go back,
New, Import, paste that
ID in, Load, you can see it is in fact Argo
CD, Import, and then boom, we have the dashboard. So it's pretty
straightforward. You can also write your own dashboards. I believe they're still written in
Python, at least they used to be, but nonetheless you can create your own.
But there are a lot out there already, so don't reinvent the wheel if you
don't have to. But if I go back to dashboards here and let's
say I click on Kubernetes API server. Now, I haven't made
any requests or anything to this, so it's probably not the best,
but we can see here again, another dashboard, compute resources,
some CPU information, some memory information, etcetera. But the point
is we can see the dashboards work, and then if we
want to, we can get alerting on various dashboards and all that
fun stuff. So this is the monitoring piece,
and if you want the full observability stack for logs,
traces, metrics, you're going to have to do Prometheus, which is already here,
and then Tempo and Loki for traces and
logging, and then you'll have the full monitoring and observability stack.
But there are a couple things here, and it's not necessarily a
bad thing, it's just you got to kind of figure out what
option you want. So this is the homegrown solution. This is open source.
I'm not paying for anything, okay? But I actually am,
right? I'm paying for engineers to manage it, I'm paying for infrastructure,
because this has to run somewhere, so there are still costs. And again,
this isn't a bad thing. It's just all going to be dependent on your organization.
If you're a startup, for example, and everybody's already working
13 hours a day, adding another tool
may not be the best method. Or maybe it is, again, depending on
how the organization is structured. So let's say you
want all these tools, monitoring and observability and even APM
and alerting and a bunch of other stuff under one roof. Maybe it's
a SaaS so you don't have to manage the infrastructure or anything like that.
You probably want to look at a paid enterprise solution.
Okay. And that's kind of what we can get with Datadog.
Now, with Datadog, again, we get everything under one
roof, metrics, logs, full monitors,
service management, infrastructure management, APM, all of it.
All we have to do for this is, if I go under my
[inaudible] and I click on API Keys, right? I'm going to have an
API key here. I'll go ahead and I'll just
create a new one. We'll just call it conf42,
Create Key, right? And then now I have this API key.
So if I copy it, I'm going to head back over to VS Code.
Okay. And I'm just going to open up a new terminal here
and I'm going to paste in that API key, my cluster name,
ks. Quick start. Okay,
first thing you're going to want to do, you're going to want to sign up for
Datadog. It's free to sign up. You're not going to be paying for anything.
I've been doing demos on Datadog for a long time now and
haven't gotten a bill because I just delete my stuff right away. Okay.
But I'm going to set these environment variables.
I'm going to use Helm. Okay. So if you don't have the Datadog Helm
chart, you're going to want to add it and update it. And then I'm going
to use this fairly large Helm installation.
And the reason why is because this sets us up for high
availability. So we're going to see, you know, multiple replicas,
kube-state-metrics is enabled, we're enabling logging,
we're enabling all the logs for the containers.
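A Helm install along the lines described (multiple replicas, kube-state-metrics, log collection for all containers) might look like the sketch below. It is not the command from the session, which is not captured in the transcript. The environment variable names DD_API_KEY and DD_CLUSTER_NAME, the release name, and the datadog namespace are hypothetical, and the --set keys are values I believe the datadog/datadog chart accepts; verify them with helm show values datadog/datadog for your chart version. As before, the script only prints the commands.

```python
import os
import shlex

# Hypothetical environment variable names; the talk does not name them.
API_KEY_VAR = "DD_API_KEY"
CLUSTER_NAME_VAR = "DD_CLUSTER_NAME"


def datadog_commands(api_key, cluster_name, namespace="datadog"):
    """Return the add / update / install steps for the Datadog chart."""
    # Assumed chart values; confirm against `helm show values datadog/datadog`.
    values = {
        "datadog.apiKey": api_key,
        "datadog.clusterName": cluster_name,
        "clusterAgent.replicas": "2",                      # multiple replicas
        "clusterAgent.createPodDisruptionBudget": "true",
        "datadog.kubeStateMetricsEnabled": "true",         # kube-state-metrics
        "datadog.logs.enabled": "true",                    # logging
        "datadog.logs.containerCollectAll": "true",        # all container logs
    }
    install = ["helm", "install", "datadog", "datadog/datadog",
               "--namespace", namespace, "--create-namespace"]
    for key, value in values.items():
        install += ["--set", f"{key}={value}"]
    return [
        ["helm", "repo", "add", "datadog", "https://helm.datadoghq.com"],
        ["helm", "repo", "update"],
        install,
    ]


if __name__ == "__main__":
    # Falls back to placeholders; if the variables are set, the real key
    # is printed, so avoid running this where output is logged.
    api_key = os.environ.get(API_KEY_VAR, "<your-api-key>")
    cluster = os.environ.get(CLUSTER_NAME_VAR, "<your-cluster-name>")
    for argv in datadog_commands(api_key, cluster):
        print(shlex.join(argv))
```

For anything beyond a demo, passing the API key through an existing Kubernetes secret instead of --set keeps it out of shell history.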
So let's go ahead and run this and
it may take maybe two to three minutes to actually see
all the information within your environment.
Right? So if I head back over here, I click Finish,
I'm going to go to Dashboards. Oops,
sorry, Infrastructure and Kubernetes Explorer.
Okay. And we can actually see all this stuff in here right
away, but I want to click on one other.
Let's see, Kubernetes Overview. Okay, here it is. So if
I check here, I can see my cluster, I can see
all my namespaces. See the monitoring namespace, right? Because we deployed
kube-prometheus. And then if I click on Explore,
I can see everything running here. So if I
look into one of these pods, maybe, you know, one of the kube-prometheus
pods, we can see the cluster it's on, the service
that it's in, the monitoring namespace,
the host, the deployment, ReplicaSets, IPs, everything. We can see
everything here, even the metadata. Okay. We can see
any related resources, which is actually really cool. It's a little
graph here that we can see. Right. Troubleshooter.
I don't think we have anything on. Status is ready. Alright, so we're
good to go here. So we have the pod phase,
which is actually nice. We get a little bit of different information
here, logs, if we turn them on.
So any logs that are coming in through the pod,
okay, metrics, etcetera. So the point
is this: we have everything under one
roof. Of course, if we install it, we have to install different things for traces
and stuff. But everything is under one roof.
Okay. So we can dive down. We also have a visual of
this, right? So we dive down, we see our clusters
running, we see our namespaces, see all of our workloads.
Okay. We see our networking.
And this is really solid. Now, Datadog is expensive,
don't get me wrong. But again, this is a good
implementation if you want that enterprise,
I don't even want to say enterprise-grade feel, because you can get the same
feel from Grafana and the Prometheus stack.
But if you want that SaaS-based solution that's set up for you,
where you just have to run a couple of installations or even just one
and you've got support behind you, all that, Datadog is
a great implementation. Again, just keep in mind,
you know, never think that you're not paying because I
know a lot of people go open source because they don't want to pay.
Either way you're paying. You're either paying engineers to manage it
and the infrastructure to run it on, or you're paying a SaaS solution.
It's really going to be up to you at the end of the day.
Thank you so much for joining me for the session. Really do appreciate it.
Hope that you enjoyed it.