Abstract
There are many opinions on DevOps, open source, and observability, but what is actually being practiced? What can we learn from the collective experience of the community? We went and surveyed over 1000 engineers across the globe about their DevOps practices, challenges, and more, with special focus on enterprise observability. This session will share data and insights from the survey, with key trends (compared to previous years’ DevOps Pulse surveys), points of interest, and challenges that developers experience on a daily basis.
This session will help you learn from the collective experience and emerging best practices in the community, to help guide decisions on processes, tooling and architecture choices.
The survey analyzes topics such as:
- What are your challenges with running Kubernetes in production?
- How long does it take to troubleshoot production issues?
- Which tools do you use for ticketing, event correlation and notifications?
- Who is responsible for ensuring observability?
- How do enterprises handle shared services?
And much more.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Glad to be back at Conf42 SRE 2022,
and thanks for inviting me again this year to
speak. I hope that means I wasn't too boring last time around.
And this year I'd like to talk to you about the state of DevOps and
observability in 2022.
I'm going to use the data from DevOps Pulse.
This is the yearly survey that we run at Logz.io,
my company. Essentially it's a questionnaire that people like
you answer. Over 1000 people
answered the last survey, from various companies, all the
way from startups with a dozen employees to enterprises
with 5000 employees or more, from different
countries and different industries.
First, I'm really glad to say, and I was
very enthusiastic to see, that on the gender diversity side
we're improving. On the last survey
I shared that 86% of those who answered were male.
This year I'm glad to say that only
79% are male and we have 15% female.
So that's encouraging, and you also got the first stat
of the survey. The survey covers many areas.
In this very short talk, I'd like to use
the coming minutes to look into some common assumptions
around cloud, cloud native and DevOps:
what people use, what the issues are, what
solutions they use, and check how these assumptions hold
true in light of the results.
A word about myself. My name is Dotan Horovits.
I'm the principal developer advocate at
Logz.io. At Logz.io, we provide a cloud native
observability platform that's based on popular open source
stacks such as Elasticsearch, OpenSearch, Prometheus,
Jaeger, OpenTelemetry and so on.
I've been around for quite some time, as a developer,
a solutions architect and a product manager.
I'm also an advocate of open source software, open standards and
communities in general, and the CNCF, the Cloud
Native Computing Foundation, in particular.
I co-organize the local CNCF chapter in Tel Aviv.
So if you're around, do join one of our monthly meetups.
I also run a podcast called OpenObservability Talks,
so if you're interested in open source, DevOps and observability,
do check it out on all your favorite podcast apps. And in
general, you can find me everywhere at @horovits.
So if you are tweeting anything interesting out of this talk, do feel
free to tag me. And let's go straight to the
first assumption.
Everybody is in AWS.
What do you think? True? False?
It was very clear on the survey: AWS still rules.
In fact, AWS increased its share
from 66% of respondents on the
last survey to 71% this
time around.
Also, Azure and Google Cloud have significantly bumped up their adoption,
from around 11 to 12 percent on the last survey
to around 30% this time, as you can see. In fact,
last time Azure was in second place; on this survey,
Google Cloud is in second place, with 32% on
Google Cloud and 29% on Azure, as you can see.
Still, we need to remember that they're
behind AWS quite significantly, and most
of the rest are pretty much nonexistent.
Perhaps VMware, as you can see, is showing some strong
signs. So that's about public
clouds and cloud infrastructure. And let's go
on to the next assumption.
Everything is containerized.
What do you think? True? False?
Ah, it's definitely happening. Over 50% said
that more than half of their apps are containerized, and more
impressive is that 30% said that over three quarters of their apps
are containerized. So that's very impressive.
And do remember, it's not just young startups
who answered this survey, but also enterprises with 5000 employees
or more. So it's definitely happening across the board.
And if we talk about containerization, then obviously the
next topic is Kubernetes.
So let's go on to the next assumption.
Kubernetes: a piece of cake.
Just give me YAML and I'll manage your containers.
What do you think? True? False?
Not really true. People reported challenges
across the board. The top difficulties that people reported in
this survey were with security and with monitoring and troubleshooting.
But people also reported issues with networking, cluster management
and storage. You name it, you can see that here on the screen.
In short, everyone's moving to containers,
as we've seen, but they still don't know
how to manage them, how to do it right: simple and production
ready. And as we
said, monitoring and troubleshooting is a top challenge. Which leads
me to the next assumption.
Monitoring and troubleshooting: just use metrics
and logs, you fool. What's new? We've been doing that
for ages. Right?
Indeed, logging and metrics are still
the most common. 80 to 90% use them. Not surprising,
as you can see here. There's another bar here
for all of the above. So if you add up the bar for the specific signal
and the bar for all of the above, you'll see that it's around 88%
for logs and 80% for metrics. So they're definitely there.
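To make the bar arithmetic above concrete, here is a minimal Python sketch. It assumes the chart's "all of the above" bar is the 21% figure quoted elsewhere in this talk, and it backs out the signal-specific bars from the rounded totals of about 88% and 80%. Those implied values are derived here for illustration only and are not read from the survey chart. The names `total_adoption` and `implied_specific` are hypothetical.

```python
# Percent of respondents who picked "all of the above" (quoted in the talk).
ALL_OF_THE_ABOVE = 21

# Approximate totals quoted in the talk, in percent of respondents.
quoted_totals = {"logs": 88, "metrics": 80}


def total_adoption(specific_bar: int, all_of_the_above: int) -> int:
    """Total adoption of a signal: its own bar plus the 'all of the above' bar."""
    return specific_bar + all_of_the_above


# Back out the signal-specific bars implied by the quoted totals.
# These are derived values (an assumption), not numbers from the chart.
implied_specific = {
    signal: total - ALL_OF_THE_ABOVE for signal, total in quoted_totals.items()
}

for signal, specific in implied_specific.items():
    total = total_adoption(specific, ALL_OF_THE_ABOVE)
    print(f"{signal}: {specific}% specific + {ALL_OF_THE_ABOVE}% all = {total}%")
```

The point is only that respondents who ticked "all of the above" must be counted toward each individual signal as well, otherwise logs and metrics adoption would be understated.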
Interestingly, distributed tracing increased its adoption
to around 48%. Nearly half
of the companies are doing something with distributed tracing.
So that for me was astonishing.
And it actually continues the strong momentum we've
seen in the previous years' surveys: 48%
this year, 26% on the last survey, and 19% the
year before that. So you can definitely see the trend.
It's happening. Tracing has
very strong momentum. Another interesting
thing to mention here is APM, which,
although perceived as a traditional tool,
is widely used: 43% of
the users use APM. And maybe the most
impressive on this year's survey, at least for me,
is that 21% use all of
the above. More than
one fifth of the people use all of the above. And that's
a significant step towards adoption of
full observability. I've been preaching for that for
quite some time, if you read my blogs, articles, podcasts
and everything, and it's really, really encouraging to see that people realize
that logs are not enough, not even logs and metrics, and you
need the combination of signals and the correlation of data to
actually gain observability into your system.
And going back to the trend
around distributed tracing adoption: among those
who don't yet use distributed tracing,
70% are planning to start using it
in the coming one or two years.
70%.
So, to summarize, tracing definitely
stands out as a central tool for monitoring microservices and distributed
systems, of course augmenting logs and metrics
as we said. To be honest, we've seen
strong momentum on past surveys too, and people expected to adopt
it very quickly. We've seen slightly slower
adoption than expected. However, you've seen the numbers; it's definitely
picking up. And let's move on
to the next assumption.
We're getting better on our MTTR,
the mean time to resolution or mean time to recovery.
True? False? What do you think?
When we asked people on the survey,
68% said that they're getting better
with MTTR. You can see the breakdown here: 14%
said that they greatly reduced MTTR,
23% said they were making great strides in reducing MTTR,
and 31% said that they're slowly making progress. So in total,
68% indicated that they're improving
on their MTTR. That's a very,
very positive and encouraging answer, right?
However, the actual MTTR numbers are,
how shall I say it, less optimistic.
When we actually asked for the numbers, around 64%
of the survey's respondents reported that
their MTTR during production incidents was over an
hour. Over an hour: 64%, nearly
two thirds of the people. And if you
compare that to last year, it's increased from 47%
in the last report to 64% this year. So it's not just high,
it's also increasing at a very fast pace.
So I'm not sure how much better we are
getting at MTTR reduction.
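For reference, MTTR is simply the average time from detection to resolution across incidents. The Python sketch below computes it, along with the share of incidents that took over an hour. The incident timestamps are made up for illustration and are not survey data, and the function names `mttr` and `share_over` are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical incident records as (detected_at, resolved_at). Not survey data.
incidents = [
    (datetime(2022, 3, 1, 9, 0), datetime(2022, 3, 1, 9, 40)),     # 40 minutes
    (datetime(2022, 3, 4, 14, 0), datetime(2022, 3, 4, 16, 30)),   # 150 minutes
    (datetime(2022, 3, 9, 22, 15), datetime(2022, 3, 10, 0, 15)),  # 120 minutes
]


def mttr(records):
    """Mean time to resolution: average of (resolved - detected)."""
    seconds = mean((end - start).total_seconds() for start, end in records)
    return timedelta(seconds=seconds)


def share_over(records, threshold=timedelta(hours=1)):
    """Fraction of incidents whose resolution took longer than the threshold."""
    over = sum(1 for start, end in records if end - start > threshold)
    return over / len(records)


print("MTTR:", mttr(incidents))
print(f"Share over an hour: {share_over(incidents):.0%}")
```

Note that the survey figure is a share of respondents whose MTTR exceeds an hour, while this sketch works per incident within a single team; tracking the per-incident numbers is one way to check whether a perceived improvement shows up in the data.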
So let's summarize the takeaways so far. We've
seen that everyone's moving their workloads to containers
and to Kubernetes, but still experiencing many
challenges operating Kubernetes in production. The top challenges
were with security and with monitoring and troubleshooting;
around a third of the respondents reported those.
We're also far from taming the MTTR.
In fact it's increasing, perhaps as a side effect
of the growing adoption of Kubernetes and cloud native
architectures. And we've seen that nearly two thirds
of the people take more than an hour to
reach full recovery. And speaking about
monitoring challenges, distributed tracing is rising
in popularity for monitoring and troubleshooting microservices,
alongside logs and metrics, of course.
And also we've seen that more people use
full observability, leveraging logs, metrics,
traces and APM. Over one fifth of the
people, 21%, are already using all of the
above. So that's it for
this survey. But wait,
what about security? What about data volumes,
cost, open source, team structure?
Don't worry, you can find the full survey
at this link, with all of the above topics
and even more, so do check it out. I prepared a
short link for you so it'd be easy to remember. Or you can just
take a screenshot: bitly DevOps
2022, and you can find the full results
there. You can also look at the surveys
from past years, so it's interesting to also see the trends
year over year. And of course
you're more than welcome to share your feedback. You can find me at
@horovits. So if you
have any feedback on the survey, on the talk,
on my insights, or maybe my misinterpretation of the results,
or anything else, feel free to reach out to me on Twitter,
LinkedIn, Medium, whichever. I'd be more than happy to
catch up. I'm Dotan Horovits, and thank you very much
for listening.