Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome to "Get Ready to Spend More on Cloud Native Than You Expect and Be
Happy About It." Feeling happy is good, right? Let's see how we can feel
happy about cloud native and spending money on it. It sort of
seems common sense: today you pay for a server with a full stack,
tomorrow you run just a microservice. Without that overhead you
will reduce your cost, right? Sorry. Jevons says that
you are likely to end up spending more, and that you will probably be glad
that you did. Let me explain by stepping through IT's history
of sprawl and consolidation cycles and weaving Jevons's
paradox into the lessons learned. You'll learn what to prepare for and
what will not work in the microservices ecology. But first,
let me introduce myself. G'day, I'm Marco. I'm an ex-CTO who
has worked for one of the top 50 international banks. I've supported data centers
for hospitals and service providers, and I've worked for some of the largest
industry vendors out there. I've lived in three countries and I've managed
teams across 13 countries. I also spent about five years
as an industry analyst running the data science team at 451
Research. Seeing technology from every side as an operator, a developer,
an analyst, a vendor, a buyer, and a CTO gives me
a unique view on technology. Now, you can read some of my writing
or interviews in the publications on the left here, or on my
website, techwhisperer.com. Currently I
am a vice president for quantum research at Inside Quantum Technology.
If you want to stay current on post digital computing,
then check out our newsfeed. I'll give you a link at the end. So,
enough about me. This session assumes that other speakers
in this conference will cover some of the developer challenges of microservices,
data synchronization, data staleness, circuit breakers
for no-reply hangs, cyclic dependencies, microservice prerequisites,
and so on. I'm going to focus on what happens after you've written microservices
and are using them in production, and what that can
do for your opex. This topic comes in sort of two volumes
of three chapters each. The first trilogy examines why
I'm certain that you will end up spending more on microservices in
production than you expect, and in the latter trilogy, we will cover what
we can do to be happy about that outcome. Sounds good. Let's go.
There are several benefits of moving to cloud native, of
moving from a monolith to microservices. It's a smaller
piece of code, so it's easier for newer hires to
comprehend and support. It's independent: you can roll back
buggy updates without having to redeploy the entire
application, and a microservice can fail without killing the
entire application. You get simultaneity: different teams
can work on different modules at the same time without waiting for
other groups to complete their projects before this group can ship.
You're just responsible for less, and you are running less.
Now, operationally, you run just the microservice
code, not the operating systems and hypervisors that are supporting it.
This is a big deal in a public cloud environment where you pay
as you go for compute and storage. I mean, we moved from physical
to VMs to containers to reduce our stack multipliers.
Instead of paying for everything, we even moved to cloud
to try to reduce cost. In the same way, with cloud
microservices, we're just responsible for the application and
the data. The rest is now someone else's problem, and we pay
that way as well. So we are no longer paying for
print services or services like that,
which would never be used in the cloud anyway. About now you might be
asking: but Marco, if all of this is reduced, why are you
convinced that we will paradoxically spend more? The answer to
this comes from Jevons's paradox. Jevons was
not an IT person; he was an economist. He defined
a paradox about efficiency increasing consumption.
Let's jump back to 1775. A guy called James Watt
invented a new steam engine, delivering more efficient steam for less coal.
It was awesome at that stage. In fact, his engines
were so efficient, they were licensed based on the amount of fuel they would save.
Now, wise economists of the day predicted: if we can get the same steam for
less coal, then we will not use as much coal, so sell your coal
futures. Unfortunately, the actual result was that coal
consumption went up for about 100 years. People just shrugged and said,
well, that's weird. Until, in 1865, Jevons examined
these events. He crunched the numbers and identified that when there
is elastic demand, increased efficiency can lead to,
paradoxically, increased consumption. And that's the Jevons paradox.
So, Marco's variant on the Jevons paradox: with the efficiency and agility
that cloud native provides, people's increased usage will
exceed the cost savings. And in IT, we kind of know this
is true because it's happened before. It happened every time.
In fact, in IT, we have a history of sprawl.
The original IBM mainframes could only run one program at a
time, so in the mainframe lifecycle, you could not test.
So they created these things called logical partitions, sort of
like a pretend separate server. But we got partition sprawl. Partition sprawl
and costs led to servers. And then in the server
cycle, we got physical server sprawl. As they were physical,
we could try to manage the sprawl by putting labels on the boxes and
creating ownership of them. But then hardware server underusage
led to virtual machines, and that cycle led to
VM sprawl. It also led to software license sprawl for
operating systems and databases within each of the VMs. And that,
bizarrely, led to VM consolidation projects. For a little while,
we tried to solve it by writing scripts to scrape the DNS
list and see if we knew them all. We were sort of like someone ticking
off their bank statement to ensure the Venmo transactions are valid.
I will tell you, I started life as a bank teller. People make mistakes.
Always check off your bank statements. Anyway, we scripted ping sweeps
for all of the addresses to search for excess nodes out there, but then got
yelled at by the cybersecurity team for triggering their DDoS (distributed
denial of service) sensors, because we were pinging everything. But VM overheads,
well, they led to the container cycle. And surprise,
surprise, we got container sprawl. Container overhead led
to the microservice cycle. And we will get
microservice sprawl. Do you know how many microservices you have in
production today? No? Then you have sprawl. In fact, a division of
Deloitte did some research, and 67% of IT directors
admitted they did not know the exact number used in their organization.
Sprawl is real. How was sprawl addressed in prior cycles?
Well, I call these edge trimmers, because they're some of the
actions you can take to clean up your microservices footprint.
Now, I live in New York and I don't have any lawns, so hopefully the
analogy works. These are actions
that I took in prior cycles to solve sprawl, so let's look
at a quick list. In prior cycles, we thought creating rules
and processes would control the situation. It did improve
things. It was good for physical servers, for VMs,
and even for containers. Processes and standards are great when onboarding
new people, because you can hand them the processes and standards
and they get an idea of how things are working in your environment. Also,
when you hear of somebody building something new, you can just email the
standards over. But a few hints here: build your naming conventions
so they allow someone seeing something for the first time to guess
who owns it and why it exists. Yes, it's fun to
name your infrastructure after Rick and Morty characters, but useful
is better, and your operators will love you for it. Also, be careful the
standards don't become an overhead. You must sort of accept that not everything
will fit into a standard and not everything will be under your control.
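To make the naming-convention hint concrete, here is a minimal sketch in Python. The convention itself (team-purpose-environment) and the team and environment lists are invented for illustration; substitute whatever encodes ownership in your shop.

```python
import re

# Hypothetical convention: <owning-team>-<purpose>-<environment>,
# e.g. "payments-invoice-renderer-prod". The team and environment
# lists are placeholders; substitute your own.
TEAMS = {"payments", "identity", "catalog"}
ENVS = {"dev", "staging", "prod"}
NAME_RE = re.compile(r"^(?P<team>[a-z]+)-(?P<purpose>[a-z][a-z0-9-]*)-(?P<env>[a-z]+)$")

def check_name(name: str) -> bool:
    """True if a service name encodes a known owner, a purpose, and an environment."""
    m = NAME_RE.match(name)
    return bool(m) and m.group("team") in TEAMS and m.group("env") in ENVS

print(check_name("payments-invoice-renderer-prod"))  # True
print(check_name("rick-sanchez"))                    # False
```

Run a check like this in CI or at deploy time, and someone seeing a service for the first time can guess who owns it and why it exists without asking around.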
In the past, we used occasional audits. We generally wrapped them around
when we were going for an ISO certification or a certification update.
We'd do an occasional audit to check that things were as we expected.
As entities were more permanent in prior cycles, we found
that performance monitors were sufficient to discover items we were not aware
of, and that allowed us to sort of balance resources and clean up orphans.
But microservices are different. See, microservices are
independent. They accept a trigger, the work is done,
the output is produced, and the microservice ends. The taximeter stops.
That independence means that inside the microservice you
cannot confirm that the trigger was valid, because otherwise it would lose
its independence. So what's
the issue with this? Well, it can leave you with what SREs are calling the
thundering herd problem. The thundering herd is a little like the noisy neighbor
problem in the VM world. It's when your microservice is getting called 10,000
times. Now, 1,000 of them might be valid customer-facing calls,
but the other 9,000, well, they could be a typo in some code that
just keeps calling the microservice and incurring cost for
no reason at all. With too many triggered executions,
you can overwhelm your microservice function. Some cloud providers
have limits on microservice executions; they have caps.
The typo-driven thundering herd could be crowding out productive
work, because it's like a denial of service on your microservice.
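To sketch how a cap blunts a thundering herd: a simple token bucket in front of the function serves the valid-sized burst and rejects the typo-driven excess before it incurs execution cost. The class and numbers here are illustrative, not any cloud provider's actual throttling API.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        # Refill proportionally to elapsed time, then spend one token if available.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Illustrative numbers: a 1,000-call burst allowance, refilled at 100 calls/sec.
bucket = TokenBucket(capacity=1000, rate=100)
served = sum(1 for _ in range(10_000) if bucket.allow())
# The valid-sized burst gets served; most of the typo-driven excess is
# rejected instead of incurring execution cost.
print(served)
```

The same shape works per caller rather than globally, so one buggy client exhausts its own bucket without starving the other 1,000 valid calls.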
So how do we solve sprawl for microservices?
Well, with physical servers, we could count the boxes. With VMs,
even when they're not running, you'd have an image file, so you could scan file
systems for image files. But microservices are ephemeral.
They only run when they're needed. They are intended to scale and multiply.
The nature of microservice sprawl is different.
So sure, you can get sprawl in the number of different
microservices created, and yes, standards and performance monitoring will
help with that, so it's good to have those things in place. But to find
microservices, you need to sort of poke around inside the orchestrators,
whether it's VMware or Kubernetes or Azure or Lambda or whatever.
Yet there will always be sprawl in the number of
each microservice actually running at any time. You need to correlate
back to the trigger to identify whether you have a bug versus a lot
of valid action going on. So how do you detect unauthorized,
noncompliant changes to your virtual environments?
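Correlating executions back to their triggers can be as simple as counting invocations per caller over a window and flagging the outliers. The log shape and threshold below are invented for illustration; in practice the caller identity would come from your mesh or gateway access logs.

```python
from collections import Counter

def flag_runaway_callers(invocations, threshold=100):
    """Given (caller, request_id) records for one window, return callers whose
    call count exceeds `threshold`: likely a retry bug, not real demand."""
    counts = Counter(caller for caller, _ in invocations)
    return {caller: n for caller, n in counts.items() if n > threshold}

# Simulated window: two well-behaved callers and one buggy retry loop.
window = (
    [("checkout-ui", i) for i in range(40)]
    + [("batch-report", i) for i in range(25)]
    + [("buggy-cron", i) for i in range(9000)]
)
print(flag_runaway_callers(window))  # {'buggy-cron': 9000}
```

A report like this tells you whether 10,000 executions are a lot of valid action or one bug on a loop, which is exactly the bug-versus-demand question above.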
How will you have complete visibility from a single point of
control, especially when everything is ephemeral? Service mesh offerings
seem the best approach here. Think of a service mesh as a specialized
layer 7 network for microservice APIs. There are
many choices. You've got Istio,
backed by Google and IBM. You've got Buoyant's Linkerd.
In general, the sidecar proxy approach seems very popular, as seen in
projects from NGINX, HashiCorp, Solo.io, and
others. A mesh will give you security, it'll give you the circuit breaker
for hangs, and it'll give you load balancing. Your orchestrator
probably provides layer 4, sort of transport layer, load balancing,
but a service mesh can implement this at layer 7, the application layer,
and give you load balancing there. Now, on top of that, where they offer
traffic modification, you can even use meshes to
set up blue-green or canary deployments. So, okay, to address
sprawl? Yes: define your processes and naming conventions,
and ensure you have performance monitors to discover and give you
observability into microservices (what you would have had
anyway, basically). But on top of that, introduce a solid service
mesh to help you control the variance of microservice sprawl.
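As a sketch of the canary idea a mesh enables: send a fixed percentage of traffic to the new version, keyed on a stable request attribute so the same user consistently lands on the same version. This is an illustrative stand-in for what a mesh does declaratively, not any particular mesh's implementation.

```python
from collections import Counter
from zlib import crc32

def route(key: str, canary_percent: int = 10) -> str:
    """Deterministically send roughly canary_percent of keys to the canary version."""
    bucket = crc32(key.encode()) % 100  # stable 0-99 bucket per key
    return "v2-canary" if bucket < canary_percent else "v1-stable"

# A given user always lands on the same version; roughly 10% see the canary.
hits = Counter(route(f"user-{i}") for i in range(10_000))
print(hits)
```

Hashing on a stable key rather than rolling a die per request is what keeps a user's session on one version while the rollout is in flight.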
And then from controlled sprawl comes
happiness. Let's talk about feeling happy for a second.
Philosophically speaking, there are two paths to feeling happy, the hedonistic
and the eudemonic. Hedonists take the view that in order to live a happy life,
we must maximize pleasure and avoid pain. In contrast,
the eudemonic approach takes the long view that we should live
authentically and for the greater good. If we take the eudemonic
approach, we strive for meaning. We use our strengths to
contribute to something that is greater than ourselves. Leading a happy
life is not about avoiding hard times. It's about being able
to respond to adversity in a way that allows us to grow
from the experience. Now, recently I've noticed a new
term coming up in discussions among the European strategy community:
regeneration. Now, you may already be aware of it, but I hadn't really run into
it in terms of business and IT strategy before. Regeneration is
about leaving things better than when you found them, not just finding
efficiencies, but creating improvements. Create your
microservices for the next generation. Build your cloud native
code for the next person who will have to build on it
and maintain it. Put the effort in to make life better for
the operators and the engineers who will next touch it. This is
the eudemonic approach to happiness. Now, I said that from controlled sprawl
comes happiness about cost. Why would we be happy spending more?
Because it's worth it. Because when you don't have sprawl, you have control.
When you have control, it means that microservices are not just randomly
spawning themselves or exploding costs out there. You know there's a reason they run
and you understand what the value of that is. And if you're careful with your
microservices, if you can correlate them back to their business value,
then you will know that what you have is earning its keep.
So you'll be happy that you're spending more money because you know that it is
better for the business. It is better for your enterprise or your government body
or whoever you're working for. If we can discover, secure and monitor
them for performance and effectiveness, then it's no longer sprawl.
Then if you find yourself spending more on cloud native than you expected, be happy.
You made it more efficient. And now people are finding ways
to create value using that more efficient approach that you created.
That's just the Jevons paradox working for you. Okay,
here are some links. The first is to my personal website where you can read
some of my blog posts and you can track where I'm speaking next.
The others are some of the research for this presentation
that you might find useful. At the end here is the page for
Inside Quantum Technology, where I'm writing now. If you
follow that link, you can sign up for the newsletter. It's a
free newsletter that will keep you current on quantum
computing. So with all of that, let me say thank you for your time.
You can reach me on Twitter or LinkedIn. Find me in slack and
chat rooms for the event. But mainly, thank you for your time.
I hope you found this valuable. And thank you to our hosts for running this
event today.