Transcript
This transcript was autogenerated.
Hello, and a very good morning, good afternoon or good evening, depending on where you are connecting from. Thank you so much for joining me in this session, "Efficiency in Motion: Mastering Delivery Without Compromising Stability", where I will talk about my experience working with teams who thought they were doing microservices. Eventually they realized they had actually fallen into the trap of a distributed monolith, and they couldn't take full advantage of a microservices architecture, which promises that you can ship a feature the moment it is developed. So we will look at the changes this team had to go through to truly embrace continuous delivery, with a single microservice as a shippable unit.

About me: I work as a senior architect at Simpplr, a company in the intranet platform domain that was positioned as a Leader in the 2023 Gartner Magic Quadrant for Intranet Packaged Solutions. I specialize in building products using microservices and event-driven architecture. I also publish blogs on the site you see on the screen, and if you want to connect with me, these are my LinkedIn credentials, and this is how I look as of today. So let's dive in and see what we have got in this session for you.
This diagram is pretty self-explanatory. In any product organization, you have multiple stakeholders who work together to deliver a product. The product team works with end users and customers to understand the requirements and produces documents such as BRDs and SRDs. The dev team then works with the product team to understand the requirements better, works with the architects to come up with a design, and tries hard to produce a feature that can be released into the prod environment as soon as possible.

"As soon as possible" is the crux here. The intent is that you release the feature early and wait for feedback from the users who will be using it. If they find everything okay, nothing like it, but we generally do not see that happening. They will have some feedback, they will pass it on to the product team, the product team will come back to the developers, and the cycle goes on and on. So, as I said, the key is to deliver the feature as soon as possible.

You can sense that the team is trying to do continuous delivery, which is a software development process of getting code changes deployed into production quickly, safely, and with high quality. These are not really buzzwords but key properties: you want to deliver as soon as possible, you want to ensure that whatever you deliver does not break any existing feature, and of course you want it to be bug-free, which is what "high quality" means here.

One software architecture that complements this kind of development process is the microservices architecture. A microservice is an independently deployable unit modeled around a business domain, and microservices generally collaborate with each other to deliver a larger business use case.
Most of us have worked with monolithic architectures and have felt the pain they bring, and that is why, I would say, the majority of companies now go with microservices by default: they understand the advantages this architecture gives them. Smaller units that are easy to comprehend, develop, test, and deploy. That is the real advantage of a microservices architecture.

One of the key advantages is team autonomy. Multiple services can be owned by different teams, and because a team owns its service, it understands the ins and outs of it, and the service is typically modeled around a business domain. So tomorrow, if a change has to be made in one of the services, it is easy for the team to make the change and deploy the service without impacting any other service. In this case, you see service A on version 1, and a change is required in that same service. The team simply makes the change, does enough testing to ensure nothing is breaking, and deploys version 2 of the service. They do not have to collaborate with any other service, because they know the requirement is only for them. They have full authority to make a change and to deploy whenever they feel ready and confident.

That is what is expected: I can deploy my service whenever I want. But for many companies, when you look at the way they ship their features, things really go crazy, and that is what I am going to talk about in this session. If you look at the deployment pattern of most companies, you will see multiple services with their artifacts in a single artifact repository. But when they ship a feature, it is not just one service being deployed; multiple services are deployed to the prod environment together, and there could be many reasons for that. When you look at this kind of architecture, it gives you the feeling that this is not microservices but more like a distributed monolith. And yes, you are absolutely right: this is a distributed monolith.
Honestly, there can be many reasons for this, and several are quite technical. Maybe the service boundary of a microservice is not correct. Maybe a single microservice was broken down into so many services that a single requirement in that domain leads to changes across multiple services, and because of that you are forced to deploy these services together. Another reason could be an underlying data store shared across multiple services, which is a well-known anti-pattern in microservices: a change in the shared data store can have a cascading impact on most of the services. This can also be why so many changes happen and so many things get deployed even for a single feature to be shipped. One more could be too many shared libraries. You try to share code by creating libraries, so one change in a library forces you to update all the services that use it, and then you have to deploy those services again when you do a release.

But there is one more reason for such a deployment pattern, and it is pretty common for SaaS companies that follow a specific release cadence, that is, a predefined feature release schedule. Let's say they ship once a month. They accumulate all the features, even those developed in the first week of the month, and just hold on to them, waiting for the four-week slot to be over and release day to come. Then they ship multiple features on the same day. And there could be valid reasons for it.
Some of the nontechnical reasons for doing this: maybe you want to create a market buzz. Your customers are asking for a feature that is in high demand, and you want to announce that it is coming in such-and-such release and get that craze going in the market. Another reason could be that your customers do not have the appetite. As a company, you are okay releasing features daily, if not weekly or monthly, but your customers are saying, "Let me absorb whatever you have already shipped; we are not in a position to absorb more." Maybe, for compliance reasons, you are required to share a report with every release you make; one example is a pen test of your product, where you publish the security posture of the product. And maybe you have to train your customer support teams, publish user guides, and a lot of other stuff, because of which you release on a very specific cadence.

So we now understand that we wanted to do microservices. We wanted to embrace deploying one service at a time, with the assumption that one feature fits into one service. That may not always be the case, but in the majority of cases that is how it should be; if not, you have probably done something wrong when breaking up your domain and assigning it to services. But then what problem are we trying to address here? What exactly are the issues this distributed monolith causes?
Well, a very high risk you take on with these bulky deployments is that if something goes wrong, you do not know why it went wrong. When I say distributed monolith, this negative trait comes from the monolith world: you do a bulky deployment, releasing many things in a single release package, and if things do not work as expected you really do not know what went wrong. It consumes a lot of debugging time just to understand what could have gone wrong and why things are not working as expected. The second issue is high deployment time. With microservices, the expectation is zero downtime, or near-zero downtime as we say. But with so many microservices released together, your deployment time is eventually going to increase, which can affect the experience of the customers using your product while you are doing a release. And the most important one, I would say, is that it literally burns out the engineering team. Take some of the teams I have worked with that were doing this kind of deployment.
Every team would have a designated person on release day. They would get into a conference room or onto a bridge call, where one person represents one service, or more than one if that person or team owns several. Then there is a checklist: first service A, followed by B, then C, then D, and so on. Once every service is deployed, you signal your automation team that you are done with all the deployments, for all the services, in the predefined sequence, and that it is time to run the sanity suite. Then you just pray to the release gods that things work as expected, which is generally not always the case, and you wait for the outcome. If things pass, you are one of the lucky ones. But if things fail, you do not know what went wrong or which service caused the situation.

Honestly, some of the reasons I have heard from teams who are not very proficient with microservices go like this: they assume the network is always available, and that once they somehow get through this kind of exercise, where every service is deployed together and all the use cases pass, things will never fail after that. That is a myth, because the network is not always available, but that is just a comment I wanted to make. Overall, everyone is so occupied on release day that the entire day is gone and the productivity of the team drops to literally zero. Just imagine the pressure on the QA team, followed by the dev team, who have to wait for the results to come out. If something is not working as expected, they have to work on it, and if they cannot fix it in the given amount of time, they have to roll back. And roll back what? Literally everything. That is something we do not want, and that is not what we use microservices for.
All right, so we looked at the situation the teams are in, and we looked at the problems this deployment model brings. But what is the solution? The solution is pretty simple: you deploy the service as soon as a feature is developed. I am not kidding, that is the solution. Logically, you are done: just deploy it, the task is complete, and let people use it if they want to. Again, I am saying deploy the service as soon as a feature is developed; what exactly that means we will understand in a bit. But the moment you make this statement, you immediately get two questions, which I generally get. One: what happens to the release cadence? We just discussed it: I want to create a market buzz, and with this I cannot. Two: whatever branching strategy you follow, there will always be situations where multiple developers on the same service work on different features. As a best practice, they check in code to the branch daily. Feature X is done, but feature Y is not. I want to release feature X, but the developer working on feature Y says you cannot deploy the artifact because their feature is not ready, and if it is not ready, end users can see its buttons in the UI, click them, and things will not work. What happens in that case? These are very fair and important questions, and the answer is this: deploying a service to production is not the same as releasing a feature.
All I said was that we deploy the service into the prod environment, not that we release the feature. What that means is that every feature I develop goes behind something called a feature toggle, or feature flag. Whichever language you program in, you always have an if condition: if the flag for feature X is enabled, my code executes; if the flag is off, my code does not execute. I assume we all trust the if condition: if something is true it executes, and if it is false it is skipped. If I can ensure that every feature I write is behind a feature toggle, my job is done.
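A minimal sketch of that if condition in Python. The flag name `feature_x`, the in-memory `FLAGS` dictionary, and the `export_report` function are hypothetical; a real product would read flag state from a flag service or configuration rather than a hard-coded dict.

```python
# Hypothetical in-memory flag store; a real product would load this from a
# feature-flag service or configuration instead of hard-coding it.
FLAGS = {"feature_x": False}


def is_enabled(flag_name: str) -> bool:
    # Unknown flags are treated as off, so unfinished code never runs by accident.
    return FLAGS.get(flag_name, False)


def export_report(data: list) -> str:
    if is_enabled("feature_x"):
        # New behaviour for feature X: runs only when the flag is on.
        return f"new-format:{sum(data)}"
    # Existing behaviour remains the default path.
    return f"legacy:{len(data)}"


print(export_report([1, 2, 3]))  # legacy:3
FLAGS["feature_x"] = True        # flip the flag; no redeploy involved
print(export_report([1, 2, 3]))  # new-format:6
```

Treating unknown flags as off is a deliberate choice: a typo in a flag name hides a feature instead of exposing unfinished code.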
On the next slide, this diagram shows trunk-based development, the branching strategy where multiple features are worked on at the same time. Every developer ensures that whatever feature they are working on goes behind a feature flag. By default, the flags are turned off. When you are done with the feature, you deploy it to production and still keep the flag off. Only when you think it is time to turn the feature on do you simply go and turn the flag on; you do not have to redeploy the service, and your job is done. With this, you are not waiting for all the dependencies to be available. Even half-built code can make it into production, because the flag is turned off for everyone by default.
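A sketch of how this answers the feature X / feature Y question from earlier, assuming a hypothetical in-process `FlagStore` whose state can be changed at runtime (in practice it would be backed by a flag service). Both features are merged into trunk and deployed; only X is switched on, so Y's button never reaches users.

```python
import threading


class FlagStore:
    """Hypothetical runtime flag store: every flag is off unless explicitly turned on."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._flags = {}

    def set(self, name: str, enabled: bool) -> None:
        # Toggling changes runtime state only; the deployed artifact stays the same.
        with self._lock:
            self._flags[name] = enabled

    def is_enabled(self, name: str) -> bool:
        with self._lock:
            return self._flags.get(name, False)


store = FlagStore()


def checkout_buttons() -> list:
    buttons = ["pay"]
    if store.is_enabled("checkout.gift-card"):   # feature X: finished, awaiting release
        buttons.append("gift-card")
    if store.is_enabled("checkout.split-bill"):  # feature Y: half-built, merged to trunk
        buttons.append("split-bill")
    return buttons


print(checkout_buttons())              # ['pay']
store.set("checkout.gift-card", True)  # release X without touching Y
print(checkout_buttons())              # ['pay', 'gift-card']
```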
And as you can see, this is what we are trying to do: continuous delivery. Multiple service owners can work on their feature sets and keep their feature flags on or off, depending on whether they want to release the features to end users or just deploy to production and let the right time come for the release. When that time comes, they simply go and turn the feature flag on for customers to use. The next slide is important: it shows how this helps. We are looking at the solution, and it comes with some advantages.
First, you are able to take advantage of the microservices architecture, because now you are deploying a single microservice as the unit of deployment. The risk drops dramatically, and you get near-zero-downtime deployments. If things don't work, you know why: it is because of your service. You can either turn the feature flag off or, if for some reason you have to roll back, you roll back only your service. You do not depend on any other service, and no coordination is needed.

It gives you a lot of flexibility on feature releases. We discussed that most SaaS companies have a release cadence: one release, four weeks down the line, with all these features in it. With this approach, if a feature is ready up front and you want to ship it, maybe because you see customer churn and want to make customers happy, or because customers are pushing hard to get it as soon as possible, you can ship it right away even if you had planned it for the second month. You do not even have to wait for the cadence.

You can do dark launches. One of the beauties of this approach is that you can do beta launches and selective customer launches. Maybe you are not clear what the impact will be in terms of performance, infrastructure, and rollout needs. So you do a silent launch for one or two customers, who become your test bed. You see how the feature is used, whether it is well received, and what load it generates, and you fix the performance issues with these customers; if you think about it the other way around, those customers are your live testers. Once you have hardened it, you can roll it out across regions or to bigger enterprise customers. This way, you launch something that has already been hardened by some of your real users.
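One way to express such a selective launch is a targeting rule per flag: an explicit set of beta tenants plus an optional percentage rollout. The `RolloutRule` shape, the tenant IDs, and the SHA-256 bucketing are assumptions for illustration, not the API of any particular flag product; hashing the flag name together with the tenant ID keeps each tenant's assignment stable across requests.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class RolloutRule:
    """Hypothetical targeting rule for a single feature flag."""
    enabled: bool = False                           # master switch for the flag
    beta_tenants: set = field(default_factory=set)  # explicit test-bed customers
    percentage: int = 0                             # 0-100 share of all other tenants


def bucket(flag: str, tenant_id: str) -> int:
    # Deterministic bucket in [0, 100): the same tenant always lands in the same
    # bucket for a given flag, so it does not flip in and out of the rollout.
    digest = hashlib.sha256(f"{flag}:{tenant_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def is_enabled_for(flag: str, rule: RolloutRule, tenant_id: str) -> bool:
    if not rule.enabled:
        return False
    if tenant_id in rule.beta_tenants:
        return True
    return bucket(flag, tenant_id) < rule.percentage


# Silent launch for two friendly customers only.
rule = RolloutRule(enabled=True, beta_tenants={"acme", "globex"})
print(is_enabled_for("search.new-ranking", rule, "acme"))     # True
print(is_enabled_for("search.new-ranking", rule, "initech"))  # False
```

Raising `percentage` step by step, and finally to 100, is one way to move from a beta launch to GA without redeploying the service.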
You can test in production. In some cases you cannot test certain integrations in a lower environment, maybe because you do not have enough licenses, or for many other reasons. So you want to deploy into production and test there. You can open the feature flag only for a specific user, say a dedicated test user, perform enough testing in production, and get the certification that what you have built works, even though you could not test it in the lower environments.
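Opening a flag for a single test user can be as simple as a per-user override that is checked before the flag's default state. The flag name `crm.sync-v2` and the user IDs below are made up for the example.

```python
# Hypothetical flag defaults and per-user overrides. The override for the
# dedicated production test user is consulted before the flag's default state.
DEFAULTS = {"crm.sync-v2": False}
USER_OVERRIDES = {("crm.sync-v2", "prod-test-user-01"): True}


def flag_for_user(flag: str, user_id: str) -> bool:
    override = USER_OVERRIDES.get((flag, user_id))
    if override is not None:
        return override
    return DEFAULTS.get(flag, False)


print(flag_for_user("crm.sync-v2", "prod-test-user-01"))  # True: the test user sees it
print(flag_for_user("crm.sync-v2", "customer-4711"))      # False: everyone else does not
```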
You can embrace trunk-based development. There are different branching strategies, and we have all gone through the pain and hassle of merges, conflicts, and a lot of other stuff. Trunk-based development helps you eliminate those kinds of issues, and with this approach you can adopt it.

And because you deploy to production so many times, you master the art of deployment. Tomorrow, if something goes wrong, you know what needs to be done, because you do it day in and day out: what is the next course of action? Should I roll back, should I fix it, or should I turn the feature flag off? Compare that with deploying to prod once a week or once a month: if something really goes wrong, you generally do not know what to do, and you end up simply doing a rollback.

But as the saying goes, nothing comes for free. We looked at the solution, and we understand and acknowledge it, but it is not going to be that easy.
There are quite a few challenges you have to address before you say this is something you want to adopt. You need to be thorough about what you are getting into; it will be fruitful in the long run, but initially there will be hiccups and it will take a lot of effort. That is what this slide talks about.

The first challenge is low fault tolerance. Your entire product now has feature flags everywhere, so whatever you use for feature flags, whether a homegrown service or a third-party product, its availability has to be very high. If that one service goes down, your entire product goes for a toss.
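One common mitigation, sketched here under assumptions, is to never let a flag lookup fail hard: cache the last known value and fall back to a safe default when the flag service is unreachable. `ResilientFlagClient` and its `fetch` callable are hypothetical stand-ins for whatever SDK or homegrown service you use.

```python
import time
from typing import Callable, Dict, Tuple


class ResilientFlagClient:
    """Hypothetical wrapper around a remote flag service.

    On a failed lookup it serves the last known value, and if there is none,
    a safe default, so a flag-service outage does not take the product down.
    """

    def __init__(self, fetch: Callable[[str], bool], ttl_seconds: float = 30.0) -> None:
        self._fetch = fetch      # queries the remote flag service
        self._ttl = ttl_seconds  # how long a fetched value counts as fresh
        self._cache: Dict[str, Tuple[bool, float]] = {}

    def is_enabled(self, name: str, default: bool = False) -> bool:
        now = time.monotonic()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]     # fresh enough: skip the network call
        try:
            value = self._fetch(name)
        except Exception:
            # Service unreachable: the last known value beats a hard failure.
            return cached[0] if cached is not None else default
        self._cache[name] = (value, now)
        return value
```

Defaulting to off means that during an outage nothing half-built is exposed; whether an already released feature should also default to off is a per-flag decision.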
The second challenge is high testing and validation effort. When we write a microservice, or any piece of code for that matter, we write unit tests for whatever functions and conditions we have written. Then we write integration tests, API tests, end-to-end tests, all sorts of sanity and regression suites, and a lot more from an automation perspective, which is itself a very effort-consuming task. With feature flags, I am adding more complexity, because a lot of branching now happens. Say you have a feature: it can be either on or off, and other services could depend on it. When you execute your test suite, you need to ensure that your overall product does not break whether the flag is off or on; things should work as expected even if half-baked code is lying in production behind a flag that is turned off. So there is a lot of branching: for every test case there are many feature flags and many combinations. And that raises a very important question: do I need to test all the permutations and combinations? Every feature flag has a state of true or false, and I have, say, a hundred feature flags. Do I need a combination with each one of them? In that case I will never be able to release anything, forget about releasing fast, because it will take far too long to execute.
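To put numbers on it: with n independent on/off flags there are 2^n combinations, which is hopeless at n = 100, while restricting the matrix to a handful of core flags keeps it tractable. The flag names below are hypothetical; each generated dict could feed a parameterised test (for example with pytest's `parametrize`), with all non-core flags left at their defaults.

```python
from itertools import product

ALL_FLAGS = [f"flag_{i}" for i in range(100)]
# Hypothetical core flags guarding the key use cases.
CORE_FLAGS = ["search.new-ranking", "billing.bulk-discount", "auth.sso-v2"]


def flag_matrix(core: list) -> list:
    """Every on/off combination of the core flags (non-core flags keep their defaults)."""
    return [dict(zip(core, states)) for states in product([False, True], repeat=len(core))]


print(2 ** len(ALL_FLAGS))           # exhaustive: roughly 1.27 x 10^30 combinations
print(len(flag_matrix(CORE_FLAGS)))  # 8
```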
That is where you have to make a smart decision. Identify your core use cases and the core services that sit behind feature flags, and make sure that at least those core use cases and services are not breaking and that things work as expected. If you have more CPU capacity and more time, you can write and execute more test cases, nothing like it, but at least have a core set of feature-flag combinations to ensure that your key critical components are not breaking and your key use cases pass as expected.

The third challenge is enforced governance. When you work with feature flags, you need to be very careful about how you name them. Looking at a flag, can I identify which service uses it? How many feature flags do you want to keep in your product? You cannot just keep adding feature flags. A flag may be released early to beta customers, but eventually the feature goes GA, so you have to decide up front that the lifecycle of this flag is, say, two weeks or two months. Then you need enough checks to detect that a flag was supposed to live for two weeks, the two weeks are gone, and the flag is still there.
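A check like this fits naturally into CI. The sketch assumes a hypothetical flag registry in which each flag records its creation date and agreed lifetime, plus a naming convention of `<owning-service>.<feature>` so that the owning service is visible from the name; neither detail is prescribed by the talk.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical naming convention: <owning-service>.<feature>, lowercase with dashes.
FLAG_NAME = re.compile(r"^[a-z][a-z0-9-]*\.[a-z][a-z0-9-]*$")


@dataclass(frozen=True)
class FlagRecord:
    name: str
    created: date
    lifetime: timedelta  # agreed up front, e.g. two weeks or two months


def governance_violations(flags: list, today: date) -> list:
    """Flags that break the naming rule or have outlived their agreed lifetime."""
    problems = []
    for f in flags:
        if not FLAG_NAME.match(f.name):
            problems.append(f"{f.name}: name does not follow <service>.<feature>")
        expiry = f.created + f.lifetime
        if today > expiry:
            problems.append(f"{f.name}: expired on {expiry}")
    return problems


registry = [
    FlagRecord("search.new-ranking", date(2024, 1, 1), timedelta(weeks=2)),
    FlagRecord("NewThing", date(2024, 1, 1), timedelta(days=60)),
]
for problem in governance_violations(registry, today=date(2024, 2, 1)):
    print(problem)
# search.new-ranking: expired on 2024-01-15
# NewThing: name does not follow <service>.<feature>
```

In a pipeline, a non-empty result would fail the build and prompt the owning team to remove the flag or consciously extend its lifetime.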
A flag that outlives its agreed lifetime means something is wrong. So you have to write what are called fitness functions around these feature flags that tell you when a flag's life has expired, so that the dev team can take action.

The fourth challenge is clear operational ownership. This is pretty important: with so many feature flags being turned on and off for some customers, for beta customers, who takes responsibility for turning them on and off? There are many ways to do this, and many stakeholders who can be involved, take action, and streamline the process. What we have realized is that, at least in the production environment, this ownership works best with the product team, because they understand which customers a feature should be rolled out to, whether beta customers, test customers, or friendly customers, whatever you want to call them. They know when the feature will go GA, whom to release it to and at what point in time, and whether a newly signed-up customer should get the feature or not. So having the product owners own the feature flags, at least in the prod environment, is good. In lower environments, it depends: the pod leads or function leads, the architects, or maybe your QA architect can own them and play around with the flags.

The fifth challenge is that, by default, you are accumulating tech debt. Every feature flag you add to the code eventually has to be removed once the feature goes GA, and you have to stay on your toes to know which flag has expired and when it is time to go and remove the code. Again, it depends on how you write the code involving feature flags, and that is why there is a high learning curve. You don't want to scatter feature flags around your product.
If you do, it will be really hard to understand which flag is used where, in which service, and how to clean it up. There are well-defined patterns you can use to make feature flags easy to add, for sure, but also easy to remove, and removing them without breaking anything is the key.

Next, observability cannot be an afterthought. We all understand why observability exists and why it should be there. Especially with microservices you cannot take chances; it has to be part of your thinking from day zero, and the moment you add feature flags, that matters even more. With feature flags, different services own their own flags, and something may get turned on or off by mistake. You want to know that something is breaking, and you have to be proactive in identifying that something has gone wrong, rather than having a customer come to you and say that this service was working until now and now it is not. Then you go, "Okay, somehow the feature flag got turned on, or turned off, let me reverse it," and you go back and say, "Now just try it out," and things work. You want issues to be identified by you, not by somebody else.

And resiliency cannot be an afterthought either. Resiliency is very important in the world of microservices; a microservice has to be resilient.
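As one illustration of what resilient can mean in code, here is a minimal circuit breaker: after repeated failures it stops calling a struggling dependency for a while and serves a fallback instead. This is a hand-rolled sketch with hypothetical names, not a specific library; production systems usually rely on an established resilience library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Minimal circuit breaker sketch (hypothetical, not a specific library).

    After `max_failures` consecutive failures the circuit opens and calls go
    straight to the fallback until `reset_after` seconds have passed; then one
    trial call is let through (half-open).
    """

    def __init__(self, max_failures: int = 3, reset_after: float = 10.0) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback()  # open: do not hammer a failing dependency
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            return fallback()
        self.failures = 0          # a success closes the circuit again
        return result
```

The fallback might be a cached response or a degraded but safe answer; the point is that a slow or unavailable dependency degrades one feature instead of cascading through the product.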
But many times we have seen people ignore the resiliency part. They assume the services they depend on will always be available, that the network will always behave, and that bandwidth will always be enough, so they tend to skip that kind of design and handling in the code. With feature flags, you really cannot take those chances. That is why resiliency and observability cannot be afterthoughts; they have to go in as a day-zero implementation, I would say.

Well, the last piece is to summarize what we discussed. Use feature flags, and define an operational and governance model for them. This will help you embrace the microservices architecture.
It will help you deploy a microservice as a unit of deployment. The moment you are done developing your feature and have tested it enough, without depending on or coordinating with other teams, you can confidently deploy the service to the prod environment, and when you think it is time to release, you just turn the feature flag on and the feature is released as well. You can embrace trunk-based development and get rid of all the pain caused by merge conflicts and a lot of other stuff. And finally, have enough test automation in place for the key components and key use cases, toggling the flags on and off, to ensure that things always work as expected even when you have half-baked code in the production environment. I can assure you that once you follow what is mentioned here, you are all set to embrace continuous delivery with a microservice as the unit of deployment. Thank you so much for joining me in this session.