Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello. Well, thanks for joining me here today and allowing me to stand up and use my clicker and everything. It's a privilege, and thank you very much for joining. So to kick things off, my name is Chris Nesbitt-Smith. I'm based in London and currently work with some well-known brands like Learnk8s, Control Plane, esynergy and various
bits of UK government. I'm also a tinkerer of open source stuff. I've been using or abusing Kubernetes in production since it was 0.4, so believe me when I say it's been a journey. I've definitely got the scars and war wounds to show for it.
We'll have time to be able to deal with questions in the chat,
so please do drop a line in and let
me know kind of where you're joining from and any questions that
you may have. If I don't get to them in here, then please do feel free to find me on LinkedIn and have a conversation there. So,
the history of pets versus cattle terminology is muddy,
but most link it to a presentation by Bill Baker from Microsoft, made around 2006, about scaling SQL Server.
Way back then in the before times we called ourselves sysadmins and
treated our servers like pets. For example,
Bob the mail server. So if Bob goes down, it's all hands to the pumps,
the CEO can't get his email, and it's near-on the end of the world. We say some incantations, make some sacrifices at an altar and resuscitate Bob, bringing him back from the dead. Crisis averted. Cue the applause and accolades for our valiant sysadmins who stayed up late into the night.
In the new world, however, servers are numbered or maybe just
given a UUID. So they are like cattle
in a herd. For example, web1 to web100. So when one server goes down, it's taken out the back, shot, and replaced on the line.
So why am I telling you this rather morbid story? Well, Kubernetes deals
with all of this, right? And saves us from the tyranny.
And you're right, it does. All of your computers are called nodes and they're
abstracted and given arbitrary names. Auto scaling
groups and such will automatically detect the sick in your flock,
take them out and bring a replacement in, all while seamlessly
rescheduling the workload that was on that failed machine.
And Kubernetes takes that a step further. Your workloads also have unique names, so, like the physical servers, workload failures can be detected and the failed instances replaced seamlessly.
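To make that concrete with a minimal sketch (the names and image here are purely illustrative, not something from the talk): a Deployment just declares a desired replica count, and the controller recreates any pod that dies.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                 # illustrative name; the pods it owns get generated names like web-7d4b9c5f-xk2p1
spec:
  replicas: 3               # desired state: the controller keeps three pods running, replacing any that fail
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.25 # any container image; nginx is used purely as an example

Delete one of those pods and the ReplicaSet quietly schedules a replacement. No one names it, no one mourns it.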
So where's the pet, you might ask.
Well, what's the first thing
we do with a brand new Kubernetes cluster?
I'll give you a hint. It's not deploying your application or actually
anything that the business itself cares about.
Does something like that look vaguely familiar? Yeah, we had to do
a load of things just to make this cluster able
to start running our workloads.
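The slide itself isn't reproduced in this transcript, but as a hedged sketch of the sort of thing I mean (these particular add-ons and charts are assumed examples, not a recommendation), the day-zero shopping list often looks like a Helmfile full of operational plumbing before a single business app ships:

# helmfile.yaml - illustrative day-zero add-ons only
repositories:
  - name: ingress-nginx
    url: https://kubernetes.github.io/ingress-nginx
  - name: jetstack
    url: https://charts.jetstack.io
releases:
  - name: ingress-nginx          # ingress controller
    namespace: ingress-nginx
    chart: ingress-nginx/ingress-nginx
  - name: cert-manager           # certificate issuance
    namespace: cert-manager
    chart: jetstack/cert-manager
    set:
      - name: installCRDs
        value: "true"
  # ...plus monitoring, log shipping, policy engines, secrets management and so on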
And it's worth noting that the trend towards more and more features being kind of out of tree, which is to say that they are optional add-ons and don't ship as part of core Kubernetes (examples being things like FlexVolumes, policy, and basically all the Kubernetes SIG projects that many find essential), is only exacerbating the issue.
Well, that might work for when you've got, say, a single
cluster, but what about when you've got dev, integration, staging and QA that your app all needs to run on? Or worse, when you need separation between your teams or products?
So maybe you've automated all of that, say some Bash, Ansible, Terraform, whatever you like. Well, cool,
good on you. However, you'll find it won't be
long before there's an updated version, perhaps patching a vulnerability
that you care about. And you may be stuck trying to test every single
app and permutation across your estate.
So this is what we're calling day two operations. We used to call it BAU,
or business as usual, and it's where we find reality catching
up with our idealistic good intentions.
So you'll quickly find that clusters are running various versions.
So given the rate of change in the community and industry,
it's unrealistic to run 'latest' everywhere confidently, at least without breaking production and disrupting
your operational teams. Then there are the permutations of seemingly common tools and choices. Some teams might use Kong, others NGINX, another Apache, another Envoy, all for plenty of good reasons I'm sure, and you'll find yourself with seemingly infinite possibilities emerging across the estate. Sad times, right? Congratulations, you're now the proud owner of a pet shop. Or if you've managed to automate the creation of them, you can call it maybe a pet factory, but it's a headache.
So how does this hurt you, you might ask?
Well, maybe you like pets. Well,
assuming, of course, you're in cloud, your world could roughly be summarized
into a few tiers. So, apps: well, these are the things that your boardroom knows about and can probably name. Think your public website, shopping cart system, customer service apps, online chat interfaces, email systems, and so on. These are all implicitly providing some value in and of themselves to your end customers.
Infrastructure, with cloud, is hopefully all a commodity. Thankfully, the days where anyone in your business should be caring about the challenges of physically racking up hardware, or overloading the weight in the cabinet, or taking pride in how well they've routed all the cables, are hopefully all past, and you're now just consuming infrastructure. Hopefully you've codified this.
But even if you're into click ops, making sure it's running is
not really your problem. No one in your business is concerned with hardware failures, patching routers every time there's a critical vulnerability, testing the UPS and generators regularly, upgrading the HVAC when you add more kit, and so on. Yawnorama, as my 16-year-old would say, and then curse me for repeating.
But your interactions with any of this are basically
a few clicks or lines of code and some infra is
then available to you with an SLA attached to it from your cloud vendor.
If only the story ended there though.
Sandwiched between those is a gray layer of
all the operational enablers, so it's where your DevOps or SRE
team live. So think log aggregation, certificate issuers,
security policies, monitoring, service mesh and others.
These are all things you do for all sorts of reasons, ranging from risk mitigation to emotion and technically unqualified opinion, or just without the foresight of what was round the corner in, say, six months. And even if we make the leap and assume for a minute that you are more technically competent than your Goliath multi-billion-dollar cloud vendor, you've still completely negated many of the benefits of going to cloud in the first place by ripping up that shared responsibility model.
All of this, whilst technically fascinating for people like me to stand and stroke my beard at, is delivering
absolutely zero business value. Unless of course your
business is building or training on those products.
And who'd want to get into that business, eh?
And that's not all. What about recruitment?
So you might think that you want a DevOps, right? Oh no wait, a DevOps with Kubernetes experience. So maybe a CKA or CKS. Oh yeah, well, it's on AWS, and we use Linkerd, and in some places Istio, not on the current version or even the same version everywhere. A mix of Pod Security Policy, Kyverno and OPA policy, some Terraform, Helm, Jenkins, GitHub Actions, all going on, all in a monorepo. Apart from all that stuff that isn't. Well, we're well outside the remit of commodity skills and back to hunting unicorns. Sure, you'll find some victims.
Sorry, I mean candidates that you'll hire.
Well, now you've got one hell of an onboarding issue before
they can do anything useful and help your business move forwards faster than
it did without them. And if you've hired smart
people, they'll come with experience and their own opinions
of what worked for them before. So your landscapes get
bigger and bigger and more complex and diverse.
I did some googling, so this is what the CNCF landscape looked like way back in 2017.
Choices, right? Choices and logos as far as the eye can see.
Have you seen it recently though? I mean,
this has got a bit out of hand. I'd say someone should have a word, but I suspect that would probably just make things worse by adding yet more things to the board.
And don't get me started on operators. I mean, nice idea, but they end up betraying any of the ideas of immutability with crazy levels of abstraction. And have you seen the craziness of mutating admission controllers too? I mean, if you're really mad, you could nest these things with operators that create CRDs for other operators that are all mutated. I mean, heaven forbid someone bumps the version of anything. And no doubt all held together with sticky tape, chewing gum, glue, pipe cleaners, thoughts and prayers, and Helm, a string-based templating engine where any community module has to eventually expose every parameter in every object file, abstracted by a glorified string replace.
So now I've got to have in my head all of the complexities of a Linux or Windows host, how the container runtime works, the software-defined networking, the storage, the hypervisor beneath the container, the scheduler, the controllers, the authorization policy and the mutating policy in the cluster, before I worry about how someone in the nested Helm chart mess of hell has mapped the replica count of one of the Deployments to a string called db.replicaCount, and how that has changed, on a new version of a dependency that wasn't following semver, to database.replicaCount. So instead of having my expected three, I've now only got one, when I could have just written a YAML patch for the replica count in the Deployment object of the database resource, using a stable API version with schema validation, all for free, something like the sketch just below. The kids doing Kubernetes seem to have not learned from the past.
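For what it's worth, that plain-YAML alternative could look something like this Kustomize patch (a sketch only; the Deployment name db and the ../base path are illustrative assumptions):

# kustomization.yaml
resources:
  - ../base                       # wherever the upstream database manifests live
patches:
  - target:
      kind: Deployment
      name: db                    # hypothetical Deployment name
    patch: |-
      - op: replace
        path: /spec/replicas      # the real apps/v1 field, schema-validated by the API server
        value: 3

Same three replicas, but expressed against a stable, validated API field rather than a stringly-typed value buried in someone else's template.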
Don't get me wrong, I love the open source community with all
my heart, and it's so important and it's simply not
possible to do anything without it. Sorry, not sorry. Yes. As a sidebar, pretty much every talk this decade has got to reference Log4j. This is my slide. Deal with it. It's not relevant. It will come out, hopefully soon. Everything,
literally everything that exists around us depends upon
it. And the community is brilliant at building some truly
remarkable, very high quality things.
But we must accept that
the open source community is awful at packaging things up
in this way for consumption, introducing needless abstractions.
But enough of that. I'm definitely going to hell. You can send me all
your hate in the mail. Okay, happy place, Chris. Happy place.
So where was I? Okay, yes.
Through all of this, I can't possibly think of a faster way to go from enthusiastic engineers playing with the new, exciting, shiny tech to deeply unhappy ones trying to fix something at 4:00 a.m. And before
they can do anything meaningful, they've got an orienteering exercise
to switch mental context to whatever the intended permutation of
things it is that they're meant to be looking at. Meanwhile, your business-value-delivering apps are all offline, or perhaps worse, breached. Okay, so rewind a
minute. We didn't want any of these things. How do we get
here and what can we do about it?
Honestly? Bin it. Bin it all,
kill it with fire, and then learn
to love vanilla. Vanilla is great and delicious,
too. Does anyone remember KISS? No, not the band. Keep it stupid simple, or keep it simple, stupid. And embrace
the shared responsibility model on offer. Make your cloud vendors do
more than just provide compute. Turns out, as it happens, they're actually not
that bad at doing it. I'm not daft. I know it's
not sexy and exciting, and you might even find recruitment a bit harder if you're
used to hunting magpies who follow the shiny and don't like boring stuff that just works.
So, to answer the question posed by the title of my talk,
is it time to put your Kubernetes cluster down?
Yes, it is. And in the immortal words of S Club 7, if you can bring it all back immutably from code, all without anyone noticing (I'm referring to the original version of those lyrics, of course), then maybe, just maybe, it can earn the right to stay, to die another day. So I've
been Chris Nesbitt-Smith. Thank you again for joining me today and enduring my self-loathing. Like, subscribe, whatever the kids do, on LinkedIn and GitHub, and you can rest assured there'll be no spam or much content at all, since I'm awful at self-promotion, especially on social media. cns.me just points at my LinkedIn; talks.cns.me contains this and other talks, and they're all open source. Questions are very welcome on this or anything else. I'll be kind of in the chat and on LinkedIn; if I'm not responding there, I need to go and have a sit down. Thank you very much.