Transcript
Hey y'all, hope you're doing great.
Welcome to Conf42 Platform Engineering, I'm Jony Paul.
Today, we will deep dive into the importance and role of
observability in platform engineering.
Thanks for tuning in.
I'm Jony Paul.
So here is me, Neerja, a developer advocate at Middleware, an
observability platform, building various DevOps communities, running
communities like Google Cloud, CNCF, and Docker, and managing more than
15 hackathons. So yeah, let's go ahead.
So what's in store for us?
Why observability matters in platform engineering.
So what are the key benefits?
First, it optimizes resource allocation.
It helps you determine whether there is resource scarcity
or resource abundance, which you can easily analyze
from the logs and from the metrics.
Next, it accelerates troubleshooting.
From logs, we can get a lot of insight for
troubleshooting, such as how an error is propagating.
Next, it enhances developer experience, improves reliability, and
enables data-driven decision making.
Decision making always starts from understanding the data, and
data plays quite an important role.
And future predictions.
We're going to discuss all of these points
in the upcoming slides.
So, the core objectives of observability.
Let's understand them first, and then we can go ahead.
There are three major ones.
First is log management: we collect different types of
logs, like application logs and service logs, to understand
the performance and behavior of all the services that are running.
Second is gathering metrics on system health and performance;
metrics in graph format help us understand them in a better way.
If you monitor and troubleshoot your applications proactively,
your application will have less downtime, higher
performance, and more efficiency.
That's the general idea, but yeah, telemetry always helps
you improve your application.
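To make the log-management point concrete, here is a minimal sketch of structured logging in Python; the service name, field names, and the example event are illustrative assumptions, not anything from the talk.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON line so a platform can parse it."""
    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
            # Extra context attached via the `extra=` argument, if present.
            "service": getattr(record, "service", None),
            "request_id": getattr(record, "request_id", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger = logging.getLogger("checkout")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Hypothetical event: structured fields make filtering and correlation easy.
logger.info("payment authorized", extra={"service": "checkout", "request_id": "abc123"})
```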
Third is tracing.
Traces track the journey of a request through services.
They help you understand the bottlenecks, the issues
that are hiding behind your performance and degrading it.
So tracing helps you understand what the actual reason could be.
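As a minimal sketch of tracing, here is how you might create nested spans with the OpenTelemetry Python SDK (OpenTelemetry comes up later in the talk); the span names and the console exporter are illustrative assumptions.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

# Export spans to the console for demonstration; a real setup would
# export to an observability backend instead.
provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")

# Nested spans model a request's journey: the child span's duration
# shows how much of the request was spent in the database call.
with tracer.start_as_current_span("handle_request"):
    with tracer.start_as_current_span("db_query"):
        pass  # the slow work would happen here
```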
Let's first understand the role of observability in
optimized resource allocation.
Correlating usage data with system demand lets you scale
resources proactively and efficiently.
Say we have data from a marketplace application about how
the application has performed, or how a single product has been selling.
From that, we can understand the demand, and in the same manner
we can understand how many more resources will be required in the future.
Let's take one example: proactive scaling.
By monitoring CPU and other metrics, teams can understand
in which scenarios they need to scale resources up or down,
when the peak time is, and when the low time is.
Teams can easily understand this by observing their applications.
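Here is a minimal sketch of that kind of threshold-based scaling decision; the metric source and the exact thresholds are assumptions for illustration, not values from the talk.

```python
# Hypothetical thresholds; real values depend on your workload.
SCALE_UP_CPU = 0.80    # sustained CPU above 80% -> add capacity
SCALE_DOWN_CPU = 0.30  # sustained CPU below 30% -> remove capacity

def scaling_decision(cpu_samples: list[float]) -> str:
    """Decide a scaling action from recent CPU utilization samples (0.0-1.0)."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > SCALE_UP_CPU:
        return "scale_up"
    if avg < SCALE_DOWN_CPU:
        return "scale_down"
    return "hold"

# Peak-hour samples vs. off-peak samples.
print(scaling_decision([0.91, 0.88, 0.95]))  # scale_up
print(scaling_decision([0.12, 0.18, 0.15]))  # scale_down
```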
Next is cost efficiency.
Observability literally helps reduce your cost.
Some of your resources may be over-provisioned, and if you don't know that only
10 to 20% of a resource is actually used, you are paying 70 to 80% more money than you need to.
From monitoring, metrics, and everything else, you can definitely
understand when you need to downsize your resources,
and that can impact your billing a lot.
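As a rough sketch of that reasoning, here is how you might flag over-provisioned resources from utilization data; the resource names, utilization figures, and costs are made up for illustration.

```python
# Hypothetical (resource, avg_utilization, monthly_cost) tuples,
# e.g. derived from your cloud provider's metrics.
resources = [
    ("api-server-pool", 0.15, 400.0),
    ("batch-workers", 0.85, 250.0),
]

UNDERUSED = 0.20  # flag anything averaging under 20% utilization

for name, utilization, cost in resources:
    if utilization < UNDERUSED:
        # Rough estimate of spend on idle capacity.
        wasted = cost * (1 - utilization)
        print(f"{name}: ~${wasted:.0f}/month potentially wasted; consider downsizing")
```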
Next is accelerated troubleshooting.
We can quickly identify and resolve issues through logs,
traces, and patterns.
One of the good examples is root cause analysis:
engineers can trace issues back to their origin,
whether it's a misconfiguration or something else.
But if monitoring and observability are not set up, it will be very
hard to find the actual error or the actual problem they are facing.
So it eventually helps.
Next is incident response with real-time alerts. If teams are using
one of the observability platforms out there, it can immediately alert them,
so they can have lower downtime and optimize their product or
their services at that particular time.
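A minimal sketch of such an alert rule: evaluate an error-rate threshold over a window and decide whether to fire an alert. The threshold, window, and numbers are illustrative assumptions.

```python
ERROR_RATE_THRESHOLD = 0.05  # alert if more than 5% of requests fail

def should_alert(total_requests: int, failed_requests: int) -> bool:
    """Return True when the error rate over the window crosses the threshold."""
    if total_requests == 0:
        return False
    return failed_requests / total_requests > ERROR_RATE_THRESHOLD

# 30 failures out of 400 requests in the window -> 7.5% -> alert.
print(should_alert(400, 30))  # True
```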
Next is enhanced developer experience.
Developers can understand how the application is performing
in production, and how some changes in a new release could
affect sales, services, or anything else.
Monitoring, traces, and logs are definitely helpful in the end,
because they lessen the time spent on troubleshooting and leave
more time for delivering new features.
Next is improved reliability: continuously monitoring and refining platform components
for better uptime and stability.
For uptime monitoring, tools detect issues in the system and
immediately trigger an alert.
So if you have a proper observability setup, you can get a message on Slack,
on email, on Teams, anywhere.
Most observability platforms deal with this.
And once you get the alert, you can troubleshoot and
bring your application back live.
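As a sketch of that kind of alert delivery, here is how a tool might post an alert to Slack using an incoming webhook; the webhook URL is a placeholder you would create in your own Slack workspace, and the message text is made up.

```python
import json
import urllib.request

# Placeholder: create a real incoming-webhook URL in your Slack workspace.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

def send_slack_alert(text: str) -> None:
    """POST a simple JSON payload to a Slack incoming webhook."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        SLACK_WEBHOOK_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

send_slack_alert(":rotating_light: checkout-service error rate above 5%")
```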
Next is anomaly detection.
Some observability solutions use machine learning for it.
We will definitely get to this in the next slides, where we discuss
the future trends in observability.
Yeah, ML and AI definitely help you with anomaly detection.
Yeah, that's a great point.
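Here is a minimal statistical sketch of anomaly detection on a metric series; real ML-based detectors are far more sophisticated, and the latency samples are made up.

```python
import statistics

def find_anomalies(values: list[float], z_threshold: float = 2.0) -> list[float]:
    """Flag points more than `z_threshold` standard deviations from the mean.

    With a small sample, a single outlier's z-score is capped near
    (n - 1) / sqrt(n), so a modest threshold like 2.0 is used here.
    """
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [v for v in values if abs(v - mean) / stdev > z_threshold]

# Request latencies in ms; the 900 ms spike stands out.
latencies = [120, 115, 130, 125, 118, 122, 900, 119, 121]
print(find_anomalies(latencies))  # [900]
```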
Let's take a simple example.
In a shopping sale, if your bank is your dad's wallet,
then nothing can stop you from buying everything.
Say it was Black Friday: most people went to the shops
and tried to grab everything, because there were a lot of sales and huge discounts.
People collect a lot of things, people purchase a lot of things.
To understand that sort of scenario, if you are monitoring everything,
you can see that in this year, this sort of thing happened: your sales
were X times what they are on a normal day.
You can understand which products people purchased and in what volume.
So it eventually helps you do the analysis for the next time.
That is why we do analysis and monitoring: to improve our application
and our future predictions, and to see in what manner demand will trend.
But there are always some challenges,
so let's see what the challenges are.
First is data overload: when we collect data, we often don't know
which data is important and which is not.
So we should always try to understand which data is important for us,
and not capture all the data and make a stack of everything.
Next is integration complexity, because people nowadays have multi-cloud
setups and use many different tools.
So it is definitely hard to set up observability across them;
integrating everything is very difficult.
And the skill gaps are big: people need to learn how
to implement AI/ML in their operations.
So what are the solutions?
First, use automated tools to filter and monitor the data.
As I already said, you don't need to create stacks of data,
stacks of logs.
You need to understand which data
is important.
But which data?
The proper data.
Not every piece of data.
With AI/ML, we need to set up rules to flag and isolate the
logs that are really important for us, like those related to
security breaches or anything significant that impacts our performance.
We need to identify that sort of log, and we need to
put alerts and filters on them, as in the sketch below.
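A minimal sketch of that kind of rule-based filtering with Python's logging module; the "security" tag convention and the sample messages are assumptions for illustration.

```python
import logging

class ImportantOnly(logging.Filter):
    """Keep errors and records explicitly tagged as security-relevant."""
    def filter(self, record: logging.LogRecord) -> bool:
        is_error = record.levelno >= logging.ERROR
        is_security = getattr(record, "security", False)
        return is_error or is_security

handler = logging.StreamHandler()
handler.addFilter(ImportantOnly())
logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("routine heartbeat")                               # filtered out
logger.error("payment service returned 500")                   # kept
logger.info("failed login attempt", extra={"security": True})  # kept
```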
Next is selective data collection: we only capture data that
relates to our business objectives, and that's the
sort of data we collect.
And we always need a unified platform, because with a multi-cloud setup,
going to one environment for logs, another for traces, and another for
monitoring always creates chaos.
We need a single platform where logs, traces, monitoring,
everything is possible.
Even dashboards with different integrations with different third-party
applications are always possible.
For example, in Middleware you can connect with many databases and
every cloud provider seamlessly; you don't need
to put a lot of effort into it.
So a unified observability platform is always required.
What's the future of observability and platforms?
First, AI integration: as I previously discussed, AI plays an
important role in troubleshooting, for example by
learning general error patterns.
Using AI to make our observability analysis more proactive and more impactful
can eventually help us detect many things and
make future decisions.
Next is extending observability to edge computing.
That's a really tough, very wide topic,
and in the next few years it will be a boom.
And open standards: the growth and adoption of OpenTelemetry, with new
releases and new features all the time,
and similar open frameworks, are giving a lot of
edge to people in our space.
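To complement the tracing sketch earlier, here is a minimal OpenTelemetry metrics example in Python; the console exporter and the counter name are illustrative assumptions.

```python
from opentelemetry import metrics
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import (
    ConsoleMetricExporter,
    PeriodicExportingMetricReader,
)

# Periodically dump metrics to the console; a real setup would export
# to a backend over OTLP instead.
reader = PeriodicExportingMetricReader(ConsoleMetricExporter())
metrics.set_meter_provider(MeterProvider(metric_readers=[reader]))

meter = metrics.get_meter("demo")
request_counter = meter.create_counter(
    "http.requests", description="Number of HTTP requests handled"
)

# Record one request; attributes let you slice the metric per route later.
request_counter.add(1, {"route": "/checkout"})
```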
Let's discuss some of the best practices for implementing observability.
Start with clear objectives.
Implement end to end tracing.
Leverage automation.
Adopt a continuous improvement mindset.
Promote a culture of observability in your company, in your space.
Regularly validate your observability tools and processes.
So it's about end-to-end observability plus measuring what's
important, and you need to combine both to improve the
performance and efficiency of your app.
For any queries, feel free to reach out to me.
Any suggestions and feedback are always appreciated.
Thank you.
Have a nice day ahead.