Transcript
This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud. Hi,
my name is JJ Asghar and I'm a developer advocate
for IBM Cloud. Hopefully you can hear me and see me
and thank you all so much for having me at Conf42. I look
forward to this talk. So let's see if I got this right. We'll go
ahead and transition and you should see some slides now.
Wonderful. So yes,
let us continue. Yes, so we
are here to do migrating a monolith to cloud native, and some stumbling blocks you may not have heard about. Again, hi, I'm JJ, developer advocate for IBM Cloud. You can reach me at awesome@ibm.com or find me on Twitter at jjasghar.
So your company has finally decided to move
to the cloud native ecosystem. You've landed on containerization as your first step, and you heard all you need to do is containerize your first app, push it to Kubernetes, OpenShift, or Nomad, and the cost savings will just come. You've done this and, well, things haven't gone as well as you thought they would. What do you mean our opex has gone up? Simply said, the promise of containerization, or of migrating to the cloud native ecosystem, can be a lie if you don't do your homework. Sadly, most companies don't. In this talk, I'll explain a few gotchas that a few enterprises, in the guise of a company called Asghar Labs, hit moving towards the cloud native world. Hopefully you'll learn from their mistakes, so you don't trip down this path and you get closer to the promise of containerization.
So let's talk about what Asghar Labs is. Asghar Labs is just a multinational tech conglomerate with multiple subsidiaries, also known as a mask for me not naming companies outright. So I'll just say Asghar Labs, and you can imagine it's some Fortune company out there. In all seriousness, it's a collection of different companies that I've run into, and the stories I want to tell, masked to make sure the companies are kept innocent. And yes, it's a fake company. It doesn't really exist, though the website does; I actually do have some of my test emails come from there. But no, we're not hiring, if you're wondering. It was supposed to be a joke.
So what did Asghar Labs, or what did most of these companies, think they needed to do? They thought they could repeat their migration from the physical data center or colocation to the VMware ecosystem. Remember back in the day, the promise of virtualization: hey, instead of four physical machines, you could turn them into 16 VMs, with the four physical machines becoming a hypervisor of some sort. They thought they could take that same monolithic concept from virtualization and just move it to cloud native. Bare metal to VMs is the same as VMs to containers, right? That's what they thought they needed to do. Spoiler alert: it's not.
So where do I come into this conversation? Normally it's after a few "successful" migrations they've had, and honestly, I beg to differ on whether these were successful or not. The result was nowhere near what they thought they could provide, or worth the time, effort, and money they spent on it. I came in as the cloud native person and asked some very simple but very tough questions of these companies.
Let's go talk about the very first question here.
So let's ask some very straightforward questions, but they're deceptively
hard to answer. First, that question is,
who containerized your app? Was it developers or the
operations team? Was there more than a couple of status meetings (cough, Asghar Labs, cough; supposed to be a joke) between the project teams? And then who actually shipped it? Did they actually work closely together to make that happen? Or was it a completely different team? Was there a containerization team? I've seen what they call centers of excellence, where they basically had a team of people who were supposed to funnel all this stuff, but they didn't have the same skills that the developers or the operations teams had. So you need to figure out who actually built and gave you that container. I mean, if you're building it yourself and it's a small team, sure, that's great. But if you're a massive corporation and you have separation of powers to the extreme, you need to know who they are and what made them decide to do the things they did. That leads into the second question.
It's like, why did you containerize your app?
Honestly, ask that question. You should have that answer very readily available. At Asghar Labs, teams containerized because they were told to, not for any other reason than some execs said that our core software stack needs to now be next gen. It's the exact same thing I saw at another company, where they were told they needed to be on the cloud, not realizing they were taking on a massive engineering effort to move to the cloud. That exact same CIO at a different company, or the same personality type, I should say, had clearly read some article saying we needed to be next gen so we could get the next group of engineers to come and play with our technology. So we need to move things. But why? If you don't actually need to, you shouldn't, especially if you're making money. Anyway, we'll get deeper into that in a minute.
And where did you deploy or plan on
deploying this containerized app? This spurs from a conversation: is the choice of your cloud because of some ELA? There are a couple of clouds out there that will give you all-you-can-eat for the first year, and then all of a sudden your costs skyrocket because you didn't realize you didn't cap it right. There are other companies, like IBM Cloud of course, that have some really interesting opportunities for enterprises specifically. We don't do that, by the way; we actually have really good predictive modeling. But that's a different conversation. Is this cloud the best for your company, or is this something just forced upon you? Did you do your homework to actually understand that? Turns out cloud A, cloud B, and cloud C all focus on different things. Maybe you should look at all of them, or maybe you should put all your eggs in one basket. It really depends, and you should spend the time to do that work. And believe it or not, I've actually asked this question at a rather large corporation.
And I said, so what did you containerize?
And they looked at me like I was crazy.
And then they're like, oh yeah,
we took this WAR file, might have actually been an EAR file now that I think about it, and we shoved it into a new container, and we containerized our app. And I was just like, wait, what? You just took this WAR file and wrapped it in a container and shoved it on Kubernetes, and you're wondering why your app isn't doing what you expect it to do? And they're like, yeah, well, isn't that the whole point of containerization? And I took a moment and I was like, no, there's a lot more here. So let's take a quick aside and talk about some architectural changes that are required as you move towards this.
So yes, the promise of containerization is that you should be able to take your app, wrap it up in a container, and ship it anywhere. But there are nuances to this, and I really hope that through this presentation you see that it's not just cut and dried. And of course, because I work at IBM, I need to have WebSphere somewhere in my presentation. That's a joke. But let's take a quick aside and go into these architectural changes.
The first one is what our previous Asghar Labs company did: a replatform example. They took their legacy application, which was basically just a WAR file, shoved it into a container, in this case a WebSphere one, and threw it on OpenShift or Kubernetes.
And that's cool. That's a great first step. Not your final step, your first step, because you need to make sure that your containerization is fine, and you can start talking about the advantages of it. But if you didn't redesign your architecture, if you didn't rearchitect your application, you basically have one big blob of your application, so you don't take advantage of anything inside of OpenShift or Kubernetes, which we'll talk about here in a moment.
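To make that replatform step concrete, here is a minimal sketch of what the lift and shift usually looks like on Kubernetes. The image name and labels are hypothetical, not from any real Asghar Labs system:

```yaml
# Replatform sketch: the entire monolith as one container in one Deployment.
# The image (asgharlabs/legacy-monolith) is hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-monolith
spec:
  replicas: 1                  # one big pod, much like the old VM
  selector:
    matchLabels:
      app: legacy-monolith
  template:
    metadata:
      labels:
        app: legacy-monolith
    spec:
      containers:
        - name: websphere
          image: asgharlabs/legacy-monolith:1.0  # Liberty base image with the WAR dropped in
          ports:
            - containerPort: 9080  # Liberty's default HTTP port
```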
The next step is naturally repackaging into microservices, where you start breaking up your application into a couple of different WAR files. For instance, you split that EAR file into two different WARs, you shove them onto Kubernetes or OpenShift, you have your MQ sitting there, and your application starts talking a little bit more intelligently to all the things internally inside of the Kubernetes cluster. And that's really important; that's the next natural progression. So you've taken your single WAR file, turned it into a slightly more complex application, and put it on OpenShift or Kubernetes. Now you can actually have intercommunication, and you can actually have scaling, which is nice. You can scale out your application layer if you need to, which makes things much more advantageous.
You start seeing some real ROI here, because now, instead of having one or two or three machines that run your WebSphere infrastructure, you can actually leverage the cloudiness (and cloudiness is trademarked, of course; that's another joke) and scale out how you need. But you're not quite done yet.
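As a rough sketch of that scaling advantage, assuming a hypothetical orders-web service split out of the monolith, each piece now gets its own Deployment and Service, and you scale just the tier that needs it:

```yaml
# Scale just the web tier, independently of MQ or anything else.
# All names here (orders-web, orders-web-svc) are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: orders-web
spec:
  replicas: 3                  # three smaller pods instead of one big one
  selector:
    matchLabels:
      app: orders-web
  template:
    metadata:
      labels:
        app: orders-web
    spec:
      containers:
        - name: orders-web
          image: asgharlabs/orders-web:2.0
          ports:
            - containerPort: 9080
---
# A Service gives the other pieces a stable name to talk to internally.
apiVersion: v1
kind: Service
metadata:
  name: orders-web-svc
spec:
  selector:
    app: orders-web
  ports:
    - port: 80
      targetPort: 9080
```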
Now we need to talk about refactoring into the strangler pattern. And yes, I did say the word refactor, and yes, you are going to have to refactor your application. Now, as you take your different WAR files and break them up, you turn them into little microservices, and you actually start giving different functions and different features of your application to small microservices, containerized inside of Kubernetes or OpenShift. This takes time. This doesn't happen overnight. So compared to that promise of containerization, of shoving the WAR file in and calling it a day, leveraging microservices means you can start taking complexity out of that WAR file, or out of that Java application; I'm just going to use WAR as the canonical example.
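Here's a hedged sketch of the strangler pattern at the routing layer, assuming a hypothetical /orders feature carved out of the monolith: one Ingress sends just that path to the new microservice while everything else still hits the WAR:

```yaml
# Strangler-style routing: peel off /orders, leave the rest on the monolith.
# Hosts and service names are hypothetical.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: strangler-routing
spec:
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /orders          # the carved-out microservice
            pathType: Prefix
            backend:
              service:
                name: orders-web-svc
                port:
                  number: 80
          - path: /                # everything else still goes to the WAR
            pathType: Prefix
            backend:
              service:
                name: legacy-monolith-svc
                port:
                  number: 80
```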
You could also leverage the scheduler inside of Kubernetes or OpenShift now; you don't need a scheduler inside your application. If you need to spin out jobs to do other stuff, you can spin out microservices to do that. And giving those features their own repository and microservice will, in turn, allow things to be rolled out quicker. Because now, instead of doing that big bang release of the whole WAR file every X number of days or months, or sometimes years, you can release each microservice in an intelligent way. But we'll get deeper into that here in a moment.
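To sketch what handing the scheduler over looks like, assume a hypothetical nightly report job that used to run on a timer thread inside the WAR; a Kubernetes CronJob can own that schedule instead:

```yaml
# The schedule moves out of the Java stack and into the platform.
# The job name and image are hypothetical.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: nightly-report
spec:
  schedule: "0 3 * * *"        # 3:00 a.m. daily, formerly an in-app timer
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: report-runner
              image: asgharlabs/report-runner:1.0
          restartPolicy: OnFailure   # required for Job pods
```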
So let's talk about some real, tangible architectural advantages and disadvantages of what we just went over. First of all, as I just said, or at least implied, a moment ago: velocity. Velocity is probably the most important thing you get out of this. The ability for teams to focus on their own repositories, and to scope the clusters to what you actually need, is great. You actually start seeing real ROI now; you don't need a bunch of machines sitting in a data center, or VMs in the cloud, not doing anything. Now your Kubernetes cluster can be scoped to what it needs to be. One big pod isn't as good as multiple smaller pods; it's not like VMs anymore. Granted, this does require a higher level of cooperation between your teams, and you'll need to build more advanced integration tests, along with most likely a completely different deployment system and policy system than what you have, but you get some real, real benefits.
So I didn't realize this when I wrote this talk, but the CCB was a thing I thought was commonplace; turns out it's not. Well, CCB means change control board. So, JJ, what do you mean our goal is to move away from the change control board? At least in some of the enterprises I've personally worked at, the CCB was literally a meeting, well, not now, but back in the day, it was literally a meeting of the first line managers who would sit in a room, and whenever they had a release, they would literally put a thumbs up in that room to say, yes, we can release it at that time. Well, at that time I was the operations guy, and I had the privilege of waking up at three in the morning on a Saturday to release that code. Needless to say, that was not great. But from the enterprise standpoint, it was great. It was wonderful. Everybody had buy-in; everyone actually did the thumbs up to say yes, we should release it. But inside the cloud native ecosystem, you can't have that. If you're going to be releasing ten to 15 times, sometimes N number of times a day, you can't have a room of middle managers with their thumbs up. So you need to recognize that the CCB, and that type of policy, which still exists today, no longer works as you move towards this cloud native ecosystem. You need to get rid of it. If you have these meetings, and I know some of you do, you need to make sure they go by the wayside.
So, hey JJ, aren't we doubling up? Like, I have a scheduler already built into my app, or I've got a load balancer already in my data center, and this already exists in Kubernetes. So why would I do that, when I've already spent all this time and effort to get this knowledge in this space? Well, you'll need to audit and verify that you aren't actually doubling up work in technology. You're going to have to sit down and really refactor and rearchitect your application. A great example: Asghar Labs had both a scheduler in their Java stack and they attempted to use Kubernetes to schedule pods. It was kind of weird. It was really, really weird.
But they spent so much time trying to figure out why, when they scheduled a job to do processing or something like that, it would only ever stay inside that one pod. That's because the scheduler for Java would just spin up another process inside of the pod, right? And they're like, JJ, we have this three node cluster (it was a three node cluster at the time), but only one node is ever actually doing any work. This seems really weird. Why is it doing this? And I started digging into it and recognized that, oh, well, it turns out the reason these nodes are idle is because you're not actually leveraging the scheduler for Kubernetes. So you should spend some time and break out your scheduler so it creates other pods, so you stop overloading one node and share the load across the whole cluster.
It was a true moment of, what's a good word for it, it was a light bulb moment for that team and those people. But the beauty of it, well, not really the beauty, more so the challenge, is that they still haven't actually done it, because they didn't realize how hard it was. So arguably they failed at that cloud native migration, and they were like, we have other priorities. Anyway, long story short, you need to recognize that there is tooling inside of Kubernetes and the cloud native ecosystem, like OpenShift, to handle a lot of this stuff. Take load balancers, for instance. Load balancers exist as a software layer inside of OpenShift and Kubernetes, through the way ingress works. Are you really going to need an F5 in front of your Kubernetes cluster or your OpenShift cluster? You have to sit down and really verify and audit what you're doing.
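As a small sketch of that software layer, assuming a cloud-provisioned load balancer rather than a hardware F5, a single Service of type LoadBalancer asks the platform for one:

```yaml
# The platform hands you a load balancer in software; no F5 required.
# The Service name is hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: orders-public
spec:
  type: LoadBalancer       # the cloud provisions the load balancer for you
  selector:
    app: orders-web
  ports:
    - port: 443
      targetPort: 9080
```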
So isn't automation good here? And honestly, why are things so complicated now? Right? Like, come on, there's a lot going on here. Well, first of all, of course automation is good here. With all these moving parts, you're going to need to leverage automation to make the computers do the work for you. Humans are error prone, we all know this, right? I've probably made four mistakes in this talk already, but hopefully you haven't noticed them. Another joke, hopefully. You need to take humans out of the equation. And honestly, your app has probably always been this complicated; you just now get to see into the complication, if that makes sense. You'll have to visualize the complexity when you start breaking these things out into microservices. No longer do you have two or three enterprise architects who understand how your whole application works. Now you have a bunch of people who take care of a bunch of microservices, and they understand how they interact with one another.
It helps remove tribal knowledge.
You'll be able to visualize and start focusing on the different bottlenecks and optimizations
that you can gain from having this knowledge now.
And when you've truly gotten to microservices, you'll be amazed at how much information you can get about how your application is actually running
and where optimizations can happen.
Not just internal business logic, but external business logic too,
where all of a sudden you may discover there's no need for this external API anymore, because you can actually do it internally, or whatever. It opens up
so many things. Having the
shared knowledge is unbelievably powerful.
A great friend of mine said this to me the other day when I was walking through this talk with him, and it really does bring everything into focus when it comes to your monolithic app moving to microservices. You had an ornery (I can't say the word; I'm an unprofessional speaker, it's embarrassing) bull mastiff, and now you have 13 yipping chihuahuas. Take a moment and really, really envision that in your head. You see that big dog; you still have to feed it, you still have to walk it, and when it barks, really bad things happen, right? It can take down the postman if need be. Now you've got 13 yipping chihuahuas. All 13 of them working together might take that postman down, and you're going to have to feed 13 dogs now. But at the same time, it's much easier to deal with one misbehaving chihuahua and have the twelve others happy, compared to one big dog that takes all your attention. It's a really great observation about moving into the cloud native ecosystem.
And on the flip side, if something goes wrong, well, just as Ken said (I'm stealing this quote from him), and it really hits home: we replaced our monolith with microservices, so every outage is like a murder mystery. It's true. You're going to have to really learn how to work together as teams to make these things happen. You have to walk through each process and what it did when. More importantly, you have to create standardized logging and standardized APIs between the different services and teams, so people can understand what actually happened during an outage. It's very challenging, and it's something you've really got to spend time and effort across your whole organization to do.
So let's talk about some questions you should ask to make sure the culture shift can happen. I mentioned the CCB earlier, and at Asghar Labs, the CCB became something almost out of The Phoenix Project. At the beginning, no one showed up, or even when they did, they weren't engaged, and it just became a burden. Moving to cloud native, you need to start allowing for self-orchestration of rollouts and updates. You need to lean towards pipelines, and collaborate to get the different widgets and the different applications out at the right time. And I mention the pipeline because you're going to have to build that pipeline alongside the cultural shift that's going to happen. You'll need CI and CD pipelines. You need to leverage standards and linting so you can always make sure your code is up to your standards.
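As a hedged sketch of what such a pipeline stage can look like, here is a minimal CI workflow in GitHub Actions syntax that fails the build when code isn't formatted. The job layout, image name, and the Go toolchain (used purely as the example, as discussed next) are assumptions:

```yaml
# Minimal CI sketch (GitHub Actions syntax): enforce formatting before
# anything ships. Job layout and commands are assumptions.
name: ci
on: [push, pull_request]
jobs:
  lint-and-build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-go@v5
      - name: Check formatting            # fail fast on unformatted code
        run: test -z "$(gofmt -l .)"
      - name: Run tests
        run: go test ./...
      - name: Build the container image   # image name is hypothetical
        run: docker build -t asgharlabs/orders-web:${{ github.sha }} .
```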
One of the reasons Go is so easy to read is that the go format command exists; Go came along with a standard out of the box. So at 3:00 a.m. now, when something goes horribly wrong, the cognitive overhead of reading code and arguing over where a parenthesis goes is no longer there. As an operations person reading code at three in the morning when my PagerDuty went off, I was never happy trying to figure out why something was there, and I would spend time trying to understand it instead of just reading it like a book. So having that inside your pipeline, having formatting standards that everyone agrees is the way to write it, and having linting to enforce this, really does take away a huge number of issues down the line. You also
need to learn to collaborate with the other teams. One of the hardest things I saw at Asghar Labs was actually dealing with the collaboration between the teams. They had some great propaganda about scrum teams and tribes and whatnot, but still, people wanted to do things the old way. Collaboration isn't just status meetings; it's more than that. It's declaring shared contracts for jobs and responsibilities, with constant communication between teams. Jira tickets can only get you so far. One of the most successful things I ever saw at a subsidiary of Asghar Labs was that every sprint, they switched out one person from one tribe to another across the global app. This allowed for new challenges and new blood for each team. Every two weeks, someone new joins your team, across all the different microservices, and all of a sudden you had to train someone new every two weeks on how to get that feature out. Before you knew it, the amount of on-ramping was negligible, and people actually came together and understood: oh, it turns out Billy Bob and Jane Doe over there were working on something just like this in another microservice. It created this amazing culture. Granted, it was a massive undertaking to get that off the ground, and there had to be some really high up agreements about it, but the velocity for that company just skyrocketed. It is unbelievably powerful when you start really, truly learning how to share, collaborate, and move forward. Shared contracts and tickets and things like that help, but when you actually sit down and work together, it's unbelievably powerful.
Next, observability. Asghar Labs thought they could just buy one more product and call it a day when it came to visibility and monitoring. Nagios wouldn't cut it anymore, and they learned that the hard way. Sometimes you have to stand up multiple visibility applications that each only cover the portions your teams care about. That single pane of glass is a great thing to give your marketing people so they can see the line that goes up and to the right. But in all seriousness, when you're actually doing this day in and day out, you're going to need different tooling for different situations. You're going to have to sit down and realize that even though some companies say they can do everything, you're going to need a lot of different tools out there, and you're going to need people with expertise in the different technologies too. Sadly, Asghar Labs wanted a single pane of glass. It's unrealistic; there are too many moving parts in the cloud native ecosystem, and you just don't get that visibility in one place. So you need to work on all the things. It was a huge cultural shift. But again, as long as you can get that graph that goes up and to the right, that'll be some of the best monitoring. And if you want to ping me later, I can finish that joke off.
So how do the economics of the cloud differ from your data center? Opex? Yes, and everything can be paid for by a credit card, which is great. CFOs go back and forth about this, but you need to recognize that CFOs will start wondering why expenses are going up. Some love it, because assuming your team can keep hold of the costs, you can really predictably understand what your costs are going to be. On the flip side, you can't depreciate anything that is in the cloud, which is a little annoying for some CFOs. You're going to have to work with your CFO and your accounting teams to make sure you sit inside the budgets. As much as operations and SREs and DevOps professionals and people on the operations side don't ever think about budgeting, at least most of them don't (don't lie), you're going to have to sit down and think about it. So I strongly suggest building a bridge to your accounting team and respecting what's going on there, because it'll only make life easier in the long run.
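One concrete, hedged example of keeping hold of those costs, assuming Kubernetes is the platform: a ResourceQuota that caps what a team's namespace can consume, so the credit card bill stays predictable. The namespace and numbers are made up:

```yaml
# Cap what one team's namespace can consume; all values are hypothetical.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-orders-quota
  namespace: team-orders
spec:
  hard:
    requests.cpu: "8"        # total CPU the namespace may request
    requests.memory: 16Gi
    limits.cpu: "16"
    limits.memory: 32Gi
    pods: "40"               # hard ceiling on pod count
```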
Hey JJ, what do you mean that all our support is now on Stack Overflow? Well, yeah, okay, you're right. In a few places it's true, especially if you're using open source Kubernetes. There's no company behind it, right? There's not. I mean, there are some companies that put into the ecosystem, but there's no throat to choke. There are companies that can support you, but when it comes to actual upstream Kubernetes, sorry. Don't get me wrong, if you're running it on a cloud with AKS or EKS, or IKS for that matter, or if you're running OpenShift and you have Red Hat there, you have some throats to choke if something goes wrong. But there are some companies out there that want to leverage fully open source work, and you need to start thinking about that. When you move to the cloud native ecosystem, there are conversations happening at this moment around building containers with Docker and then shipping those containers out; you need to really think about whether it's worth it for your company or not, and work towards that. So keep that in mind.
So let's talk about some tangible things you can really start with to move forward. There is a ton of technology and software out there to help you keep going. The best thing you can do is take a moment and figure out: when you containerized your app, did you really containerize it, or did you just wrap it in a pod and wash your hands of it? Have a large conversation about why you did this. Was it because you didn't want to be left behind? Is there an actual reason for you to move into the cloud native ecosystem? Or is it because you thought you could leverage some other software out there to make your customers happy? There are really a lot of options out there, and you really need to have these conversations.
So let's do a quick conclusion here. Hopefully, by masking all these corporations behind my exposure to Asghar Labs, I've been able to highlight some of the consistent issues I've found. The best thing you can do is first ask: do you really need to? And if you really are committed, you should take a beat and look for optimizations instead of features. This will drive your teams crazy. It'll drive your executives even crazier, because they're going to ask, why are we stopping and not releasing features? And you're like, well, we're rearchitecting things; it's going to take some time, and you've got to be reasonable about that. What you pay up front, you'll get back as dividends later on. And if you use the correct tool for the job, you'll get there. As a great friend of mine also said, you wouldn't use a saw if you needed a hammer, and you wouldn't use a hammer when you needed a saw, right? I mean, you can use a saw to hammer in a nail, and you can use a hammer to break a piece of wood apart, but they're not designed to do that. So when moving to the cloud native ecosystem, be sure you're choosing the right tool for the right job, and you'll miss those stumbling blocks.
Thank you so much for your time. Let me go ahead and go back to the other screen here. Thank you. Again, I'm at jjasghar on Twitter and awesome@ibm.com. I look forward to hopefully seeing you in real life soon. Thanks so much. Bye.