Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello
everyone, I'm Parveen Khan. I'm a senior QA consultant at Thoughtworks,
based in London, UK. So today I'm
going to share my experience and talk about a peek into observability
from a tester's lens. There might be quite a lot of you
who already know what observability is,
or there might be a few people who are new to this and that's why you
are here at this conference. The purpose or aim of this talk
is to introduce you all to this topic and also
to show how it can be helpful for testers.
So before even jumping into the topic, I want to quickly take
you through a simple scenario, and credit for this scenario goes
to Pierre [surname unclear]. I really liked how he
used this example to explain this concept, so I'm
using the same example. So imagine what
you would do if you came across this foggy road while you're driving
in weird weather conditions.
One thing that comes to our mind is that we need to slow down,
right? But why do we need to slow down?
It's because we don't have visibility of what's ahead
of us, and we kind of consciously make the decision
based on the risk, right? So if
we're not able to drive fast, does that mean that we are
bad drivers and that we can't drive
fast enough on a foggy road? Not at all. Right? In fact
we are good drivers because we are making decisions based on
risk. So does that mean that the
car isn't good enough to drive faster? Not at all.
Our cars can drive much faster, but we just hold them back
because we know it's bad to drive when there's no visibility.
So we are kind of ultimately stuck, right?
But how about planes? All
the time they fly among the clouds, right? It is
because pilots have additional instruments to do that and they don't
have to solely rely on their eyes. So now,
what does this have to do with software development?
We all know the current trend is all about going faster, and
we all are kind of adopting different practices like agile and
DevOps, working in different distributed systems, working in microservices
and whatnot. We want
to deliver value quickly, right? By doing this, all
we are doing is building faster cars.
We are moving into microservices architecture or distributed
systems, and the reason why we are doing it is because
of simplicity of development, right? But on the other hand,
there's a lot of complexity and multiple moving parts at the same time, which means
it's even more complex. When distributing
a system, we are also distributing the places where things might
go wrong.
So we know that we need visibility,
but how can we get that visibility into our system? So the answer to
that is by having observability. So before even going
ahead and trying to understand what observability is,
let's first try to understand why we need it in the first place.
Like I always look for real life examples to understand any given
concept and kind of don't get convinced by just reading through theoretical
concepts. So I'd love to share my real world experience with
you of how I came to an understanding
of why we need observability on the
system that we were working on. So I joined a
new team and the entire team was new.
So I had an opportunity to join this team and work on a really exciting
and interesting product. It was a completely new domain for
me and it was kind of an automated invoice system which was built
on a microservices architecture. So the
work which we were doing as a team was to build new features
and also fix the bugs.
So as a tester, when we start working
on a new product, the first thing we try to do is to try
and understand the product. But the
more I was trying to learn about the product, the more I would feel that it's too complex.
And another reason to feel this
way was that I had seen a pattern, a
pattern of a lot of tickets being marked as blocked.
I used to see a lot of production issues each day, and
developers would pick those up and investigate
to find the root cause. They used to
spend days and weeks on them and
then mark them as blocked, because they couldn't find any
information, they couldn't find why
it was causing this issue, and they couldn't even find where the issue was.
That kind of made me think,
what's wrong here? So at this point I
really stopped thinking that the complexity of the product was the issue.
So interestingly, at the same time I was trying to read a
lot about observability without even knowing when and
where it can be used.
Not just reading, but using different kinds of
tools that promise to deliver observability, to see what
they bring.
So at that point
I had a conversation with one of my team members, who was
a developer. That conversation gave me some food
for thought: that this is what we are missing on our product, and
that is observability. So the conversation
was like a light bulb moment for me, because it kind
of unlocked answers to quite a few questions,
but it did open up a lot of questions too.
Of course, I got an answer: we had
very little or kind of no visibility into our system,
which is why a lot of issues were marked
as blocked, as the developers could not debug.
And because they could not debug,
they could not find the root cause.
I keep talking about observability. Now let's
try to look into what it is.
So there are quite a lot of definitions
that can be found if we try to google about it, but this is kind
of a simple one which I thought I can share.
So observability is a measure of how well
the internal states of a system can be inferred from
its external outputs. It means you
can answer any questions about what's happening on the inside of
the system just by observing the outside of the system and without
having to ship new code to answer new questions.
When systems are down, you need to find answers
by asking questions as quickly as possible.
Right? So the system
needs to be observable so that it can explain what's happening,
so that we can find out what's happening on the inside of the
system by just observing from the outside.
But how can we make the system observable?
The answer is by using the data. Now,
how can we get the data? And what type of data do we need to
have an observable system?
We can get the data to make the system observable by adding instrumentation.
And that instrumentation can give us the data
that can be in the form of, like, it could be
logs, it could be traces, or it could be metrics.
So now let's talk about each
of these before moving ahead with the story.
What are logs? Okay, a log is
a simple message which has some kind of information.
It might have a timestamp and a payload,
and that can help give us more
context. Right? Again, if we're talking about distributed systems,
we don't want to get into each different service and try to
look at its logs. So rather than doing that,
we should have them centralized in one place so it's easier.
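
As a rough illustration of a log message with a timestamp and a payload, collected in one place, here is a minimal Python sketch. The service names, field names, JSON line format and the in-memory list standing in for a central log store are all assumptions made for the example, not details from the talk.

```python
import json
from datetime import datetime, timezone

def make_log(service, level, message, **payload):
    """Build one log record: a timestamp, a message, and a payload of context."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "service": service,
        "level": level,
        "message": message,
        **payload,
    }

# Hypothetical central store: in practice a log aggregation tool, here just
# a list of JSON lines collected from every service.
central_logs = [
    json.dumps(make_log("invoice-service", "INFO", "invoice created", invoice_id="INV-1")),
    json.dumps(make_log("payment-service", "ERROR", "payment failed", invoice_id="INV-1")),
    json.dumps(make_log("email-service", "INFO", "email queued", invoice_id="INV-2")),
]

def search(lines, **fields):
    """Return the records whose fields match, instead of scanning text by eye."""
    records = (json.loads(line) for line in lines)
    return [r for r in records if all(r.get(k) == v for k, v in fields.items())]

# Follow one invoice across services with a single query.
for record in search(central_logs, invoice_id="INV-1"):
    print(record["timestamp"], record["service"], record["level"], record["message"])
```

Because every record carries named fields, one query can follow a single invoice across services without opening each service's log file.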
Right? So let me tell you,
while I was working with this team, we used to have logs.
It's not like we didn't have any logs. We did have some logs,
but they were all stored separately for each service.
And what we used was NLog.
And to access those logs, we had to access them
separately for each service. And the only way to view them was to
open them in Notepad++.
So whenever there was an issue, we would end up having
multiple Notepad++ tabs open.
So it was such a pain. To add
to that, the way we could search the logs was by using
Ctrl+F. Can you imagine?
So this is the reason why logs
should be centralized, so that we can access all the logs in one central place,
and the logs should be easily
searchable. And the way we can make them easily searchable is by
having structured logs. Now, coming to
the metrics, a metric could
be a simple trending number, or it could be a simple value that
kind of expresses some data about the system. So these
metrics might represent different things. A metric might
have a name, a time and a value. So these metrics
are usually represented as counts
or measures, and are often calculated over a period of
time. For example, a system metric can tell you how
much memory is being used by a process out of the total,
and an application metric can show you the number of requests
per second being handled by a service, or it can tell you the error
rate of an API. And a business metric could
be something like how long
does it take for a user to log in? Or how long does it take
for a user to do a certain action while
using our product. So metrics are
really good at aggregating things, but not
really good at pinpointing specific detail about something.
Like at this particular time, this is the customer who was having a problem.
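
To make the aggregation point concrete, here is a small Python sketch that turns raw request records into two of the application metrics mentioned above, requests per second and error rate. The customer names, the 10-second window and the record layout are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical request records for one service over a 10-second window:
# (customer, HTTP status). In a real system these come from instrumentation.
WINDOW_SECONDS = 10
observed_requests = [
    ("alice", 200), ("bob", 200), ("alice", 500), ("carol", 200),
    ("bob", 200), ("alice", 500), ("carol", 200), ("bob", 200),
]

def summarize(records, window_seconds):
    """Aggregate raw requests into two application metrics."""
    statuses = Counter(status for _, status in records)
    total = len(records)
    errors = sum(count for status, count in statuses.items() if status >= 500)
    return {
        "requests_per_second": total / window_seconds,
        "error_rate": errors / total if total else 0.0,
    }

metrics = summarize(observed_requests, WINDOW_SECONDS)
print(metrics)
```

The summary shows that a quarter of the requests failed, but once the records are aggregated it no longer says that every failure belonged to the same customer.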
So how could we do that?
By using traces. So a trace is
kind of like telling you a story which gives more low level
details. It kind of shows the entire flow of the request,
and I think it's really valuable while debugging.
So a single trace shows the activity for an individual transaction
or request or event as it flows through
an application. So it kind of shows the
end to end request. And traces
are a very critical part of observability, as they
kind of provide a lot of context.
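
As a sketch of what a trace might look like as data, the Python below models one request as a set of spans that share a trace ID and point at their parent span, then prints the end to end flow. The `Span` fields, service names and durations are assumptions for illustration and do not follow any particular tracing tool's format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Span:
    trace_id: str             # shared by every span of the same request
    span_id: str
    parent_id: Optional[str]  # None marks the entry point of the request
    service: str
    operation: str
    duration_ms: int

# Hypothetical spans recorded for one request.
trace = [
    Span("t1", "a", None, "api-gateway", "POST /invoices", 120),
    Span("t1", "b", "a", "invoice-service", "create invoice", 95),
    Span("t1", "c", "b", "payment-service", "charge card", 70),
    Span("t1", "d", "b", "email-service", "send receipt", 10),
]

def render(spans, parent_id=None, depth=0):
    """Return the request flow as indented lines, parents before children."""
    lines = []
    for span in spans:
        if span.parent_id == parent_id:
            indent = "  " * depth
            lines.append(f"{indent}{span.service}: {span.operation} ({span.duration_ms} ms)")
            lines.extend(render(spans, span.span_id, depth + 1))
    return lines

print("\n".join(render(trace)))
```

Reading the indented output top to bottom gives the story of the request: which service called which, and where the time went.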
Okay, so I've been
saying that with observability we can ask questions, but what
kind of questions can we ask? So I can give you an example of
some of the questions that can be asked.
It's something like, why is X broken?
What services does my service depend on?
And what services are depending on my service?
What went wrong during this release?
Why has the performance degraded over the past quarter?
Or what logs should we look at right now? Or it
could be like, what did my service look like at point X?
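
Two of these questions, which services my service depends on and which services depend on it, can be answered from trace data alone. The Python sketch below assumes the traces have already been reduced to caller and callee pairs; the service names are hypothetical.

```python
from collections import defaultdict

# Hypothetical call records taken from trace data: (calling service, called service).
calls = [
    ("api-gateway", "invoice-service"),
    ("invoice-service", "payment-service"),
    ("invoice-service", "email-service"),
    ("reporting-service", "invoice-service"),
]

def build_dependency_maps(call_records):
    """Index the calls in both directions so either question is one lookup."""
    depends_on = defaultdict(set)
    depended_on_by = defaultdict(set)
    for caller, callee in call_records:
        depends_on[caller].add(callee)
        depended_on_by[callee].add(caller)
    return depends_on, depended_on_by

depends_on, depended_on_by = build_dependency_maps(calls)
print("invoice-service depends on:", sorted(depends_on["invoice-service"]))
print("invoice-service is used by:", sorted(depended_on_by["invoice-service"]))
```

No new code has to be shipped to the services to ask this; the answer comes from data they already emit.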
So just like how we talk about DevOps,
we cannot say we are doing DevOps by just having
some automated tool in place or by
having some sort of process in place. It's more than
tools. It is kind of a
cultural and mindset change. Similarly,
we cannot say we are doing or having observability
by just having some different tools in place and logging
some information. It is not just about getting the tools
and sending some data and trying to observe the system.
It is a cultural change.
Now, you might be thinking, what's in it for testers
with all the observability and all these new tools? Why and how is
it useful for testers, and how can testers
use observability?
So for testers,
it is easier to find more information
around the issues, right? So for
example, while we are testing,
we might see some unexpected behavior or maybe see
some kind of failures. So having access to these kind of tools
and having these kind of tools in place allows
the testers to look under the hood to find out what is happening with the
request. And not just that, but it also allows
testers to learn more about the system of
how it communicates and works. So like for
example, I would be using devtools to see what's
going on when something didn't look right
or while I was testing or while I was looking at from
the UI point of view. But I wouldn't get enough information
by just looking at the devtools. So having these tools
in place helped me in getting more information that could
be added to the tickets while we are raising
the bugs, which can be helpful for the developers.
It's not just about finding the information while looking
into the issues, but it could help us uncover understanding
of our product, which is really very important for a tester.
Testers are really very curious explorers and great at asking
questions. So this could be a tool for exploring and asking questions.
So as a tester, I tend to ask a lot of questions when I don't
understand things.
Testers are great at exploratory testing, not just good
at asking questions, and testers are always curious to find
information about the system. So while exploring the logs,
the metrics or the traces or any kind of data,
testers might point out where there is a need for
more instrumentation. And not just
that, but it also supports and helps testers with testing in production.
So it allows the teams not just to shift left
but also to shift right. So I
really like this tweet by Mahd
and how this has been put together, saying that
a lot of times good debugging and good exploratory testing are
indistinguishable. When developers explore, they more often call
it debugging, whether they know there is a problem or
just suspect there could be. When testers explore,
debugging is close to the last word being used.
So to summarize,
by making systems observable, anyone on the team can easily navigate
from effect to cause in the production system. It makes
it easier to debug. The goal of
observability is not just to collect the logs,
metrics or traces, but to use the data
to get feedback. It doesn't just allow us
to find the knowns of the system, but also allows us to know
the unknown unknowns.
So every learning experience and every journey has
something to take away. So I kind of had some
learnings as well to take away from this experience.
So the key takeaway for me was that we
as testers can go out of the way and think outside the box.
We care and advocate about quality and that
could be related to bringing in the improvements
in the process and bringing in the new tools related to test automation.
But that's not the limit. I learned that we do
not have to limit ourselves and say that this is not related to
testing. So let's not look into this or let's not learn
about this. I saw the problems my team was
going through, like developers getting
frustrated when they couldn't resolve production issues, and the
product owners getting frustrated because they had to
answer the clients and they did not have enough information related to
those production issues. I didn't know the answer or
solution to it, but being active in the community, seeing new
tools and concepts and exploring them, then finding the
solution and trying it out myself
using open source tools, and then presenting that as
a suggestion to my team by building a proof
of concept kind of helped: not limiting myself to
testing tools only and trying to think outside the box
to help my team. And when
I left the team, we were not yet at a complete
observability implementation, but we had kind of started our
first steps into it. So we went from having no
visibility to having
structured and centralized logs that can be easily
queried. And we were then taking the next steps.
So to end with, I would like to say that observability gives
power to the entire team to get visibility when needed.
And observability is much more powerful when
you apply it with the right mindset and clear processes
in place. It allows the
teams to become proactive towards issues rather than being reactive.
It kind of gives superpowers to
everyone on the team, whether it's developers, whether it's ops engineers,
whether it's SREs, or whether it's testers.
So thank you so much for joining
my session and listening to my story.
Happy to answer any questions if you have any, and be sure to
check out my blog post pervincans.com
and do follow me on Twitter at pervine.
Thank you so much.