Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello and welcome to this session, Unleashing Deploy Velocity with Feature Flags. I'm excited to join you today as part of the Conf42 Site Reliability Engineering series. Today we're talking about feature flags, and we'll see how we can use them not just to enable faster deployments to production, but to make those deployments more valuable in general. You might call them feature flags, toggles, switches, or flippers; we'll explore them in a couple of different ways today. Before we get started, a little about myself. My name is Travis Gosselin and I work as a principal software engineer for a company called SPS Commerce. My focus is on developer experience, so I'm interested in anything in the software development lifecycle, and in exactly how we can make those micro feedback loops a little bit better for engineers, so you can get faster, more continuous feedback on whatever you're doing. And feature flags definitely take us down this route.
SPS Commerce isn't a household name; it's a B2B organization, so you probably haven't heard of it. But we are the world's largest retail network, with over 4,200 retailers, supplying data interchange, invoices and purchase orders, between those suppliers and retailers. Costco, Walmart, Target, Academy Sports: you'll find us with all the biggest retailers in the market. And like many organizations, ours has really been on this DevOps and agile path over the last decade and beyond. We think about DevOps a lot as a culture and a state of mind, a real shift in how we approach and how we focus on engineering.
And for us, much like for you probably, continuous feedback and automation are the key principles of how we approach it: the idea of continuously getting that feedback a little bit faster through automation, whether it's the local development and debugging feedback loop, or the feedback loop all the way to production on a finished feature. We're really focused on making those loops tighter, faster and quicker as we go, and of course on sharing that. That's why I'm excited to be with you today: to share our progress, our journey, and some of the takeaways we've had, in hopes that it has an impact on you and gives you information you can take away. However, like many organizations heading down this road, we encountered a pretty major roadblock, a problem I wanted to share with you today. I think it's a problem you'll be able to relate to, so let's dive in and look at it. First, let's talk a little bit about our structure for development. It's pretty standard.
We typically have a source control system, standardized on GitHub, so there's a repository there. Our main branch, the default branch (you might call it master), is typically always deployable. We try to follow fairly pure CI practices around continuous integration. We do carve off feature branches, though, and use a lot of the capabilities and functionality available inside GitHub: pull requests, status checks, all of that, so we can automate PR checks coming back in. So we develop features in these short-lived feature branches and use the validation in the pull request sequence to merge back into our main branch when we're ready. Like any good CI system, we then automatically kick off a build. We use git semantic versioning and releasing to look at the git commit messages and derive a semantic version number, in this case 1.2.x or whatever number you want, and we deploy that through continuous deployment automatically to our dev environment, which is pretty straightforward and obvious these days.
And when we're ready to go to production and replace our existing release, 1.1.0 there, we get blocked in a lot of cases. This is a big problem on many of our teams, and we're blocked by this thing that we'll call a gate here. You might be asking, well, what is this gate? This gate is many different things for different organizations and different teams, even within SPS Commerce. This gate could be a product owner who doesn't want that particular feature released until next Tuesday. It could also be the fact that you're producing a coordinated release and, well, it's just not finished yet: there are other applications that need to be released before yours, or perhaps a UI that isn't quite ready. It could also be the fact that you've discovered a bug; you've been examining it in the dev environment, or your test or integration environment, and it's just not working. It's not what you expected, or the acceptance criteria aren't quite right. And so you get held up. You need to deploy this, but you can't finish deploying it to production. At the same time, you have additional features in the backlog; they're coming in and you're starting to develop them, but you're a little bit nervous to merge those back into the main branch now, because you know you have an unresolved dependency that hasn't gone all the way to production. So you're kind of stuck. Your feature two is a little longer lived than you wanted it to be, and you really just want to get it merged in and deployed, but you're waiting. And at the same time, you discover that you need a critical bug fix in production. At that point you make the bug fix, you push it all the way through, and you get a version built. But that's when you run into the problem, because your pipeline is actually blocked. If this was an engineer doing a critical bug fix who didn't realize there was a holdup in the pipeline, they might have accidentally gone ahead and deployed that to production. In this case we can't, because we're blocked: our green feature can't go to production yet, until next week maybe. Of course, there are things we could have done, right? We could have cherry-picked off the main branch. We could have branched the bug fix branch off the release 1.1 tag and then released that directly to production.
You don't necessarily want to release that to dev, because while we practice and really believe a lot in backwards and forwards compatibility, the reverse of that is a whole other scenario. Especially if you're using other dependencies and database migrations, it might not be feasible to say, oh, I'm going to deploy version 1.1 in my dev environment, when it could be a week later, after that migration has happened. Maybe it is, maybe it's not. And of course, releasing that main branch directly to production, bypassing your immutable versioned artifacts, is just odd and could cause some problems. Not a great idea. At the same time,
you have other services that are waiting, right? Your service one and your service two are waiting for those features or those critical bug fixes, and they're saying: I can't continue my parallel development without some of these contracts fulfilled and some of these updates; I want to use it now for my own development. That's likely an internal scenario. So we've created a lot of confusion and complexity in our deployment pipelines, because we've coupled together the ideas of deploy and release. And that's where we look at feature flags as an opportunity to decouple those in a solution. So let's
examine a solution together. In this scenario, we'll have our main branch again and our feature branch. But when we're writing that feature code, we'll go ahead and feature flag it. For the sake of this discussion, the mental model we need is just an if statement in code that ensures our new code execution path doesn't actually execute. Here, we'll automatically disable the feature if there's no app configuration to enable it. That already provides some initial value: as I merge that feature back to the main or default branch, others doing development, as they rebase their branches, aren't going to accidentally get an incomplete feature. So a feature flag lets me get some of these PRs and merges back to main and keep shorter-lived branches, or I could even commit directly to main if I wanted to and follow some of those purer CI/CD practices.
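As a concrete sketch of that mental model, here's a minimal flag check in Python. The flag name and configuration shape are my own illustration, not anything specific from our stack; the point is simply that the new path stays dark unless configuration explicitly enables it.

```python
def is_feature_enabled(flag_name, config):
    # Default to off: if no configuration enables the flag, the
    # new code path never executes, so an incomplete feature can
    # be merged to main and deployed safely.
    return str(config.get(flag_name, "false")).lower() == "true"

def checkout(config):
    if is_feature_enabled("new-checkout-flow", config):
        return "new checkout flow"   # in-progress feature, dark by default
    return "existing checkout flow"  # current production behavior
```

With an empty configuration, the existing path always runs; flipping the value to "true" enables the new path without a code change.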
Of course, we go ahead and do the normal versioning, build it, and then deploy it to our dev environment. And here, where we would typically be blocked before, we're not now, because we're no longer releasing the feature: it's behind a feature flag that is inherently turned off. So we can go ahead and deploy directly to production without any blockage or any dependencies in our pipeline. In reality, what we've done is taken that feature flag and, in a more advanced structure and architecture, moved it to a feature decision provider. Think of a feature decision provider as a microservice, a tiny service that exists abstracted from these environments, that you can ask with a simple API request: is feature one enabled, yes or no? You might extend that to ask: is feature one enabled in dev? Is feature one enabled in prod? Then each individual environment can easily determine whether that feature is enabled and turned on or off. Our gate then no longer exists between environments; it's abstracted out to sit on the outside, where it controls whether the feature decision provider should enable the feature or not.
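To make that idea concrete, here's a toy, in-memory stand-in for a feature decision provider. A real one would sit behind an HTTP API; the flag and environment names here are purely illustrative.

```python
class FeatureDecisionProvider:
    """Answers 'is feature X enabled in environment Y?'.

    In production this would be a tiny service you query over HTTP,
    e.g. GET /decisions/feature-one?env=prod (a hypothetical route);
    the gate lives here instead of inside the deployment pipeline.
    """

    def __init__(self):
        self._enabled = {}  # flag name -> set of enabled environments

    def enable(self, flag, env):
        self._enabled.setdefault(flag, set()).add(env)

    def disable(self, flag, env):
        self._enabled.get(flag, set()).discard(env)

    def is_enabled(self, flag, env):
        return env in self._enabled.get(flag, set())
```

Deploying the artifact everywhere is now safe: the prod answer stays false until someone flips it, independent of the pipeline.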
So this is fantastic, because now we can ensure that we haven't deployed, or I should say we haven't released, the feature that can't go out until next Tuesday, until the product owner clicks that button, perhaps, or updates a value in the feature decision provider to enable it. That critical bug fix we had? No problem, right? We can release it all the way through to production, because we're now keeping our pipeline flowing, without this facade of true CI where we're actually stopping our pipeline every now and again. And service one and service two can also now integrate with early features if they need to. Using context, our feature decision provider can offer contextual decisions. What do I mean by that? Perhaps service one is authorized as an internal service. We can detect that and ask the feature decision provider: is feature one enabled in production for service one specifically? We can say yes and turn it on just for them. And so we have taken away this coupling of deploy and release that we talked about. They're no longer the same; they are separated and they exist in different places now, not as part of the infrastructure or the pipelines.
This allows some pretty powerful capabilities that we're going to talk about today. First, what is a feature flag, by definition? My favorite definition is from Martin Fowler: a powerful technique allowing teams to modify system behavior, and the key part of that, without changing code. We added an app configuration file before, maybe a microservice; the key is that we don't want to change code in order to modify the behavior. Martin Fowler also defines four types of feature flags to be aware of. The first is the release type of flag, which is really the kind we've been talking about: the idea that something's still in development as a feature, or shouldn't be released until Tuesday, or we're just coordinating it across a couple of different projects and deployable units. An operations type of flag is something more technical for us as engineers: we want to modify the system behavior, but it's not an actual feature. It might be performance related, might be for temporary migrations (we'll see an example of that in a bit), might be for traffic shaping or switches or degradation, those types of things. The third type is experimental, which I'm sure you've heard of: the idea that I want to test variations of a feature on different users and see what works for them and what doesn't, or maybe just test it on a portion of the users out there and see how it performs. And the fourth type is the permission type, which designates a certain feature for alpha testing, or maybe for specific customers who are likely to be okay with the risk of preview features, those types of permissions. And so feature flags give
us a ton of flexibility. We talked about the branching strategy: I can now have short-lived branches. I'm no longer constrained on when I can merge into my main deployable branch; I can merge at any time and turn the feature off with a feature flag. I can also ensure I don't have multiple active release versions, where I'm keeping different branches per version; I no longer need that, because I have a deployable, central main branch. And this really enables true CI, at least the way I think it should be done: we're not just integrating in isolation in our feature branches and validating the build, we're actually integrating as multiple features are developed in a single branch, and validating our code earlier. The fact that I can see the refactoring you're doing while I build my changes enables us to be slightly faster and have faster feedback loops.
And of course, I can ship faster, right? We're enabled to be a lot more confident in our deploys, because while we're shipping all the time now, we're deploying as opposed to releasing, and when we do have to roll back a feature, the rollback is not a change to the immutable artifacts; in a lot of cases it's just a change in the feature flag provider to say, turn this off. And one of my favorite capabilities feature flags give us is that once we are in production, we can test there. We don't need other environments. We can easily do A/B testing, we can easily use canary releases, and we can release to a set of users we choose, maybe even just ourselves, for testing. This enables fewer environments.
At SPS Commerce, we had five environments at one point in time. Now we have that down to two, using feature flags to contextually give access to certain features in those environments. There's a huge, ridiculous overhead to maintaining five environments: not just infrastructure costs, but promotional costs and overhead that is just not necessary once some of these capabilities are shifted into the code base. Of course,
when we think about feature flags, we also think about culture as a large aspect: product owners are now more involved, integrated as part of the release process for us, and they can make some of those decisions independently of the deploy process. And when we think about culture, we think about this newer term, progressive delivery, that you may have heard of. Progressive delivery is defined by LaunchDarkly as a modern software development lifecycle that builds upon the core tenets of continuous integration and continuous delivery. It was a term coined by the folks over at RedMonk, working with the Azure DevOps team and exploring a little of how Microsoft deploys Azure DevOps using what they call progressive experimentation, rolling through rings of release one at a time, and then, working with them, putting together these concepts.
LaunchDarkly, of course a feature flagging service, the most predominant one, then provides us with a lot of information about how to use progressive delivery and what exactly it means by definition. And together, these three define progressive delivery not as something above continuous delivery; it's not that I do continuous integration, then continuous delivery, and then progressive delivery. It's different from that, right? Progressive delivery is a named pattern that we can use to achieve continuous delivery, which is a breath of fresh air. If you've been working in that space long, you know there are so many different ways to approach it, and how do you actually achieve continuous delivery and deployment? Well, progressive delivery is one important way to achieve that. It has two main tenets worth mentioning.
First is the idea of release progressions. You're probably familiar with that: the idea that I want to progressively roll out to more and more users at a time.
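A release progression like that is often implemented with deterministic percentage bucketing. Here's a generic sketch (the hashing scheme is my own assumption, not any particular vendor's): each user lands in a stable bucket, so ramping the percentage up only ever adds users, never flip-flops them.

```python
import hashlib

def in_rollout(flag, user_id, percentage):
    # Hash the flag and user together into a stable bucket 0..99.
    # The same user always gets the same bucket, so raising the
    # rollout from 10% to 50% keeps everyone already included.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage
```

Keying the hash on the flag name as well as the user means different flags roll out to different slices of users rather than always the same early cohort.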
The second part is progressive delegation. And progressive delegation, going back to the culture we were just talking about, is the idea that I want to give control to turn on the feature flag to the person who owns it at that given time. The product owner is a key example; it might be someone else in your pipeline. As the feature floats through, the engineer no longer has to think, okay, I'll turn this on when I'm asked by so and so. Why would you be the one to turn it on? Delegate it to the person who owns that particular feature, whatever group they're in, so they can enable and roll that feature out. When we
think about software delivery and performance, in my mind the ultimate goal is really to deliver high quality software: it has to solve a business problem, and we need to do it as quickly as possible, because speed to market is important. I love the State of DevOps report from Puppet. The most recent one, 2021, and the others before it define some key metrics that help us understand what high and elite performers look like when shipping high quality software. They talk about four key metrics that are worth noting, and the reason they're important here is that feature flags can actually help us achieve a bulk, or at least a portion, of some of these metrics pretty easily. So if we look at them: deployment frequency, how often do you deploy? Is that whenever you want? Well, it could be, if we knew our code base was always deployable and we didn't have the facade of it. "Well, it's always deployable, except I merged something and now you have to wait." No: it's always deployable, because I wouldn't have something in main that wasn't behind a controllable feature flag. So that's a pretty big enabler. Lead time for changes, then: can I get something from inception out to production in less than an hour? Well, I could if I had the ability to use some of these advanced release and contextual techniques.
Right. Same with mean time to restore: my service goes down, can I fix it in less than an hour? Well, typically with feature flags, yeah, especially if I have operational flags that help support that and give me some knobs and levers to restore it quite a bit faster. And my favorite is change failure rate. Change failure rate is interesting because, in my mind, it's not a metric whose result makes intuitive sense until you see this survey. The idea is: when I do make a change to production, when I deploy, how often do I fail? Is it less than 5% of the time? And we know from this report and from others that the more you deploy, the less you should fail. Which is weird, because you'd think: I'm deploying more often, I should fail either the same amount percentage-wise or more, maybe, with the number of complexities that are there. But in reality, the more often you deploy, especially using techniques like progressive delivery, the less you fail. And I'm excited that this also proves true with feature flags, because if you're using these techniques, you're going to be able to do those deploys much more confidently, and to degrade gracefully rather than turning on a particular change for the entire user base at a given time. At SPS Commerce, we have a fantastic continuous improvement team that has been tracking our change rate and our failure rate over the last many years, going back as far as 2014. And as part of our journey, you can see that this is proven true by our stats as well: we now do about 1,000 changes to production a month on our platform, and our change failure rate is just 2 to 2.1%, which is fantastic, using DevOps best practices and patterns like feature flagging.
Enough of that. Let's dive in and actually explore what a feature flag is. I want to examine a really simple feature flag with you today. This feature flag is in code that's setting up a new user, maybe in an identity system or platform of your choice, and it comes in with a user context object. That user context object might have a username, first name, last name, and email on it, as an example, and then the code goes ahead and creates that user. Our feature flag then checks whether SendGrid email is enabled: in this case we're going to send a new welcome email to our users if we're using SendGrid. And that's our feature flag right there. SendGrid is a popular software-as-a-service provider for sending emails by API, and we want to use it as a simple integration here. So the new code simply says, hey, use the API to send the email. But what does that feature flag if statement actually look like? In this case, we've hard coded return true, but going back to our definition of a feature flag, we know that we can't hard code this. While it might be centralized and used in a couple of spots in our application code base, we can't hard code it to true, because then we can't modify it from outside the code; we have to be able to change this behavior easily. So in its simplest form, you might change it to check an app configuration value, "use SendGrid". That app configuration might be a local file, or a centralized database key value; but in reality, using a microservice makes a lot of sense, doesn't it? You could just pass that key along to ask: hey, should I be using this SendGrid feature?
And that would work. It would tell us whether the feature is on or off, possibly on a per-environment basis. But we also find that with feature flags, it's often not simply a new piece of behavior being added; it's an augmentation. So you often end up with an if and an else statement like this, where "send local SMTP email" is actually the old code, and we've placed it inside the else statement now to separate it out as part of the feature flag. And while this looks good, we've actually missed out on a ton of value so far, right? We're not yet able to truly test this in production the way we want. If I deployed this out to production with the flag disabled, and I wanted to enable it to test, well, did I configure the API key right for SendGrid? I would actually have to enable it for everyone: turn it on for everyone, test it really quick in prod, and then turn it off if it was failing. That's not the kind of experience we're going for. It's still an advantage in that I can keep my pipeline moving, but I'm not getting the value after that. So the key part here is that we need to go back to our if statement and modify it, so that in this case "is SendGrid email enabled" takes a user, the same user context we had, and we modify our check at the bottom, our method, to pass that user context along. So now, in production, I can ask: is SendGrid email enabled for Travis specifically? Right? And I can say, yes it is. And then we can test out Travis with the welcome email message in production without affecting anyone. Very, very cool.
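Putting that evolution together, here's a hypothetical Python sketch of the contextual check. The function names mirror the example from the talk, but the allowlist mechanism is just one simple way a decision provider might answer a per-user question.

```python
def is_sendgrid_email_enabled(user, allowlist=frozenset()):
    # Contextual decision: off for everyone by default, but it can
    # be switched on for specific users (e.g. just "travis") so the
    # integration can be tested in production without affecting anyone.
    return user["username"] in allowlist

def send_welcome_email(user, allowlist=frozenset()):
    if is_sendgrid_email_enabled(user, allowlist):
        # New path: send the welcome email through the SendGrid API.
        return f"sendgrid -> {user['email']}"
    # Old path, preserved in the else branch of the feature flag.
    return f"local smtp -> {user['email']}"
```

With the allowlist empty, every user keeps the old SMTP path; adding a single username flips only that user onto the new integration.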
However, if you've been following along with this conversation so far, you might have had some important realizations and some questions. And that's where it kind of comes to a stop: our feature flagging honeymoon, maybe, is over, because there are some important realizations here. Let's dive into those and explore them a little bit.
First is the idea that we are shifting left, right? We're taking the complexity we previously had in our infrastructure and deployment pipelines and moving it into the code base, where we no longer have separate branches for different code paths; the different code paths live in the same branch. But this is good. I'm actually a fan of this, because shifting left means we can handle the complexities of some of these releases in our code base, and we can do interesting things at runtime as part of that; that's where we get our user context. So that's okay. But it does mean that maintainability is affected, and you need to be aware of that. It adds complexity, right? It's much more difficult to reason about the state of the system at any given time, because it's no longer "is this feature out, yes or no", a binary decision. It's actually: was this feature enabled at that time, for that user, in this environment? That's a much more complex question, and you need the observability to answer it, in your log statements and elsewhere. When you go to debug a problem, you can't assume the same things about the system; it's actually not a binary question anymore, not at all.
Additionally, you have management of the flags: you need to create these flags and manage their lifecycle, including their removal; there are lots of interesting aspects there. And one of the key scenarios I looked at when I was coming into feature flags was that I didn't understand how I could have zero risk with a feature flag going to production, when in my mind any code change can result in risk. Even behind a feature flag, there's still the if statement, the binary decision that happens, which we have to consider a potential risk to production. And I have broken production with a feature flag, absolutely. But in reality, yes, you're making changes, but we can unit test those changes pretty easily. And like I said, they're binary decisions. From a codebase perspective, they're typically things you'll add to a service, you'll have abstractions for them, and you'll get good at it; you'll practice it and it will become second nature, and it's not going to be as big a problem as you might think. But when you're getting started, for sure, there's risk there to understand.
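Because the flag decision is binary, both sides of the if statement can be unit tested exhaustively. Here's a minimal illustration; the pricing logic is invented purely for the example.

```python
def final_price(total, flags):
    # The flag introduces exactly two code paths, so two tests
    # cover the entire decision.
    if flags.get("new-pricing", False):
        return round(total * 0.90, 2)  # new behavior behind the flag
    return total                       # existing behavior

# One assertion per state of the flag exercises both paths:
assert final_price(100.0, {"new-pricing": False}) == 100.0
assert final_price(100.0, {"new-pricing": True}) == 90.0
assert final_price(100.0, {}) == 100.0  # missing flag defaults to off
```

The third assertion is the important habit: always test the default (flag absent) state, since that's what runs in production the moment you deploy.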
And the other suggestion here is that a lot of engineering goes into building a fully aware, user-contextual feature management provider, and you want to be aware of that. So from your perspective: should you purchase one, or should you build one? I like to stay away from undifferentiated engineering, and this is a great space in which to use one of the existing providers. Let's take a look at some of those providers so you get an idea of what we're talking about. It might be as simple as the simple case, a key value pair inside a database or a config file. It might be AWS Parameter Store, or Microsoft Azure App Configuration, or even Consul, ZooKeeper, or etcd, which have key value stores you can target and use as a simple method of feature management where you're not going to pay extra. If you want to use an open source provider that does provide contextual management, you might use Unleash, Flagr, or Flipt. And of course, if you do have the option to use a full service, any of these are fantastic. LaunchDarkly, of course, is the predominant leader and is what we use at SPS Commerce. When you compare across feature management providers on this G2 Grid, focusing on the feature management space, you can see that LaunchDarkly is the clear leader, up in the top right there. From a cost perspective, there might be other good options you want to consider, including Optimizely, which is near the top right as well, which is great. And you can see where some of the other providers we talked about fall into place.
It's worth noting this extract from a recent Thoughtworks Tech Radar. If you're not familiar with the Tech Radar, it's, I believe, a quarterly release where they talk about different technologies to adopt, technologies that Thoughtworks consultants have seen over the past few months, and whether each is something you should consider bringing in or just spiking and taking a look at. This extract talks about the usage of the simplest possible feature toggles. The idea is that you don't necessarily need a full service provider to get started, and one might just be a barrier to entry, especially from a cost perspective. And while I'm a big fan of some of these full service providers, depending on what you need for your application, who your audience is, and whether it's an internal-only service, they're not always necessary. So you might consider what the features are, what the release capabilities are, and what the longevity of your project is before deciding exactly what level of provider you might need. Okay,
let's move on and take a look at a UI routing example. This takes us to a little different position than the previous simple example we looked at, which was in a backend API. In this example, we will basically trim and change the navigational structure in a UI web application. This uses LaunchDarkly's feature flag router, a React component, which gives you the ability to specify a fallback as well as the new feature. If you don't have access to the feature, you get the fallback and the existing feature shows up. So really we're just doing security-style trimming, but instead of basing it on security, we're basing it on your feature flag status and the contextual usage for that particular user. You can see here how that materializes. The top screen is demonstrating the React application; it took a long time to style it, so I hope you appreciate that. You can see where it has the navigation options and the existing feature: I can click on it, and it shows me the existing feature in the URL. If I turn on the feature flag in LaunchDarkly, you can see it immediately materializes at the top as the new feature, without me refreshing the page. And another consideration that's different on the UI is that
you have to evaluate and change the flags very dynamically. You might have a web application that is a SPA, and that SPA could be living on somebody's desktop for many, many weeks, potentially without even being refreshed. So having a live websocket or long-polling connection that can update flags in real time could be essential for you. Also, using flags on the UI, you now have connections from browsers potentially across the world, as opposed to a backend API that only needs feature flags internally, with potentially much lower volume and far fewer access points from a geography perspective. So when you're deciding on your feature decision provider, whether you're going to build something simple of your own or use a full service provider, you'll need to consider: am I building a UI application, and are live changes to the flag important? You can use feature flags in many different ways inside APIs.
For example, with this users API, you might have an existing v1 users
endpoint. You might use a feature flag to enable early or preview
access to a v2 profiles URL or address that isn't
normally accessible. You might also use a feature flag to
enable, or start shifting, engineers over to v2
users automatically if they're using v1
users: if there's a major shift happening, or perhaps you haven't versioned in your
URL, you're versioning in a header, and you want to transition some of those default
users over. You might also want to test interesting use
cases, so you might want to validate: well, actually, in v2 users,
what happens in this test case scenario? You might bake that test case in as
a feature flag you can turn on to create some failure in the system,
something you don't typically think about doing in production, but can easily do with
a feature flag. And while feature flags don't give
us the ability to skip versioning
our APIs entirely, we can get away with some interesting small changes that
you might want to experiment with a bit. Perhaps on
your v1 users endpoint, for a particular request,
you've accidentally returned a 200 on an operation instead of a 204
with no content. You might swap that out and
fix that contract without reversioning the whole thing, even though it is contract-breaking,
and then monitor the failure rates automatically and understand that,
okay, that did not break any of our downstream internal clients; I can go
ahead and just make that change without busting out a whole new
major version on the API. So you have some of these options available to
you, different ways you can slice the usage of feature flags.
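To make that v1/v2 routing idea concrete, here's a minimal sketch. The in-memory flag client, the flag name `use-v2-users`, and the handlers are all hypothetical stand-ins, not from the talk; in practice a real provider SDK (such as LaunchDarkly's) would take the client's place.

```python
# Sketch: transparently shifting v1 callers to a v2 handler behind a flag.
# The flag name and rule shape are invented for illustration.

class InMemoryFlagClient:
    """Stand-in for a real feature flag provider SDK."""
    def __init__(self, flags):
        self._flags = flags

    def is_enabled(self, flag_name, user_context, default=False):
        rule = self._flags.get(flag_name)
        if rule is None:
            return default
        # Each rule is a callable that inspects the user context.
        return rule(user_context)

def handle_users_v1(user):
    return {"version": "v1", "user": user["id"]}

def handle_users_v2(user):
    return {"version": "v2", "user": user["id"], "profile": {}}

def users_endpoint(flags, user):
    # Existing v1 callers are shifted to the v2 implementation only
    # when the flag evaluates true for their context.
    if flags.is_enabled("use-v2-users", user):
        return handle_users_v2(user)
    return handle_users_v1(user)

flags = InMemoryFlagClient({
    # Only internal users get the new implementation for now.
    "use-v2-users": lambda user: user.get("internal", False),
})

print(users_endpoint(flags, {"id": "123", "internal": True})["version"])   # v2
print(users_endpoint(flags, {"id": "456", "internal": False})["version"])  # v1
```

The same shape works for header-based versioning: the flag decides which default implementation a caller lands on, without the caller changing anything.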
One of my favorite ways that I've seen feature flags used is in
a monolithic pattern where we want to strangle out some microservices.
In this case, this was a monolithic gateway API and a database
behind it, and we wanted to pull out a scheduling API that could be used.
And so we built the scheduling API and the scheduling database, and it was all
new and shiny and had new technologies
that we wanted to use in there. And so what we did then was go
back and connect the two using a feature flag. And the feature flag gives us
the ability to start redirecting read traffic over to the scheduling API and
play around with it a bit, even just for internal users only as a point
of getting started. Of course, they have different databases, so we did have to run
a bidirectional real-time synchronization
between these models, with the different transformations that were there.
You might do it a different way, though. If you don't want to synchronize the
data continuously, you might actually use two flags
here. An interesting way to approach it would be, instead of turning on the read
flag first and testing the load, to just turn on a write flag.
That flag doesn't say write to either the old or the new system; it says, should I write just
to the old one, or should I write to both? You start writing to both,
then do a one-time synchronization of the historical data,
and now you can control your read flag separately: you
turn it on when you're ready to shift the read traffic for certain users to one
destination or the other, with both continuing to remain in sync as you write to both destinations.
So it's an interesting way that you can start to perform some traffic
shaping and some migrations and be very successful.
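The two-flag approach above can be sketched roughly like this. The repository shape, the store names, and the flag callables are all hypothetical, invented to illustrate the pattern rather than taken from the talk:

```python
# Sketch of the two-flag migration: a dual-write flag and a separate
# read-from-new flag, so writes and reads can be shifted independently.

class MigrationRepo:
    def __init__(self, old_store, new_store, dual_write, read_new):
        self.old = old_store
        self.new = new_store
        self.dual_write = dual_write  # flag: also write to the new service?
        self.read_new = read_new      # flag: serve reads from the new service?

    def write(self, key, value):
        self.old[key] = value      # the old store always stays correct
        if self.dual_write():
            self.new[key] = value  # both destinations stay in sync once enabled

    def read(self, key):
        if self.read_new():
            return self.new[key]
        return self.old[key]

old, new = {}, {}
repo = MigrationRepo(old, new, dual_write=lambda: True, read_new=lambda: False)
repo.write("appt-1", "9am")   # lands in both stores

# ...after a one-time backfill of historical data, flip the read flag...
repo.read_new = lambda: True
print(repo.read("appt-1"))    # 9am, now served from the new store
```

Because the old store is always written, turning the read flag back off is a safe rollback at any point during the migration.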
In our examples of doing this at SPS Commerce,
we turned that flag on and off several times and discovered a lot
of production load issues that you could only really discover in production.
Right. And we were able to affect very few users as part of that,
if any, just by doing it internally.
You might also use feature flags for coordination. I might use the
same feature flag in the UI, in a public API, maybe an internal
API, and then turn a feature on everywhere
in coordination. Now, there are implications to that.
Obviously, I want to be able to just turn on a feature, maybe in the
UI, without turning it on everywhere, for personal testing in production.
So maybe I actually want these to be independent feature flags.
But depending on how you slice it, you'll have to consider whether you have a
single flag or different flags. And this starts getting into architecting how you're going
to use your flags across your organization. And unfortunately,
there are other considerations here. You might have to consider, well, what is my user context
if I want to enable this? For a user in the UI and a user in
an internal API, those might have two completely
different contexts, and that makes it difficult for me to manage a single flag across
those. And we haven't talked a lot about user context,
and so this is a good area to define a little bit about that user
context and what it could be. We talked earlier about how the user context
could be first name, last name, email, could be some type
of user identifier in your system, could be additional user
details that you want to include there for easy reference. You've got
to think about the downstream delegation of that flag: will
those delegating it want to use first
name and last name, or email, in order to target users? How you want to
target it is critical. And so when you're thinking about targeting,
targeting by individual users is nice, especially for testing in
production for your own feature that you're building as an engineer,
but in reality you'll be turning it on for a role, a particular
group, an organization, or
internal employees versus external customers.
Even if you just have a set of beta customers, you need to define
that and target it in a consistent way across your organization.
And that might be something for you to think about as you're architecting it.
In this example here, you can see I'm using user id 123456;
my name is Travis Goslin. I'm initializing in one provider there,
but in another application I might initialize it totally separately.
The user id is the same, so if I target the user id,
I'll get the same consistent feature flag turned on. But if I were
to target this flag using first name, am I going to get a
consistent enablement of it across applications in coordination?
No. In fact, these contexts are not equal to each
other at all. Those could even be derived as two different users, and there may
even be a potential monetary problem associated
with that configuration. By that I mean, if these come out as different users,
you might actually be paying for twice as many users across your system as
you need to. So you want to think ahead; you want to provide the capability
and an organized strategy in your organization for how you'll use these.
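Here's a small illustration of that context mismatch. The contexts and targeting rules are made up for the example, but the point follows the talk: targeting on a stable identifier evaluates consistently across applications, while targeting on a mutable display field does not.

```python
# Two contexts for the same person, initialized by two different
# applications. The field values are illustrative.
ui_context  = {"user_id": "123456", "first_name": "Travis"}
api_context = {"user_id": "123456", "first_name": "T."}  # same user, different context

def target_by_user_id(ctx):
    # Stable identifier: both applications resolve to the same user.
    return ctx.get("user_id") == "123456"

def target_by_first_name(ctx):
    # Mutable display field: the same person evaluates differently.
    return ctx.get("first_name") == "Travis"

print(target_by_user_id(ui_context), target_by_user_id(api_context))        # True True
print(target_by_first_name(ui_context), target_by_first_name(api_context))  # True False
```

The second pair is the billing problem too: a provider that counts distinct contexts may see two users where there is really one.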
As an example, in our organization we only target at the org
level, and so we don't actually allow targeting by individual users
unless they're internal. External users are only ever enabled at the org
level, which is basically a company, a connection within
our network. And that made a lot of sense for how we strategically
position feature flags in our applications. As you
may be realizing, there are so many other scenarios that you can use feature flags
for; the sky is really the limit. Think about log level verbosity and
the idea that I could have something in production and change it to a
lower, more verbose level of logging.
You could even do that automatically and change a feature flag based on an incoming
error rate. That'd be pretty cool. You might want
to use dynamic configuration, so it's not just a Boolean value
for a feature flag. It can be JSON blobs or multivariate configurations.
You might want to use kill switches. So the idea that I want to disable
a third party that's acting up, perhaps we are having an issue with a particular
service that's really degrading some of our performance. And so we can
turn off that service, whether it be on the front end or turn off a
feature. Having the ability to kill something is
important, especially as you're putting something out there for the first time. Or you can think
about our migration scenario where we killed the new service, the scheduling
API, and shifted back and forth as we needed until we got it right.
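A kill switch can be as simple as a flag check wrapped around the dependency call. This is only a sketch; the function names and the fallback behavior are invented for illustration, and a real version would likely combine the flag with timeouts and circuit breaking.

```python
# Sketch of a kill switch around a degraded third-party dependency.
# When the switch is on, skip the dependency entirely and serve a
# safe fallback instead of failing (or slowing down) the request.

def fetch_recommendations(kill_switch_enabled, call_third_party):
    if kill_switch_enabled():
        return {"items": [], "source": "fallback"}
    return {"items": call_third_party(), "source": "vendor"}

def slow_vendor():
    # Stand-in for the misbehaving third-party service.
    return ["a", "b", "c"]

print(fetch_recommendations(lambda: False, slow_vendor)["source"])  # vendor
print(fetch_recommendations(lambda: True, slow_vendor)["source"])   # fallback
```

The migration story earlier is the same mechanism at a larger scale: the read flag acted as a kill switch for the whole scheduling API.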
We talk a lot about feature flags for the creation of new features,
but you can also sunset features with them in an interesting way. So as
an example you might say, well, I'll place a stake in the sand now,
so no net new customers are going to get this feature, before I take
it away from anyone. And then you can pass it off to your marketing
team, to your customer success team, in order to work with your existing customers to
downgrade that particular feature and remove it from them as they're able to
do so. That goes back to our progressive delegation as a part of the progressive delivery
capability. And of course, timed features are always
cool and interesting. You have the ability to think about a timed feature in
the sense of a holiday release, something that you
want to appear on a certain schedule.
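A timed feature can be modeled as a flag rule that only evaluates true inside a scheduled window; many full-service providers offer scheduled flag changes built in. This sketch uses illustrative dates:

```python
# Sketch of a scheduled flag rule: the "holiday banner" only shows
# inside its window. The dates are illustrative.
from datetime import datetime

def holiday_banner_enabled(now, start, end):
    # Half-open window: enabled at start, disabled again at end.
    return start <= now < end

start = datetime(2024, 12, 20)
end = datetime(2024, 12, 27)

print(holiday_banner_enabled(datetime(2024, 12, 24), start, end))  # True
print(holiday_banner_enabled(datetime(2024, 11, 1), start, end))   # False
```

The appeal over a code-level date check is that the window lives in flag configuration, so it can be adjusted without a deploy.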
And while that's all the time we have for today talking about feature flags,
there are obviously many other topics to dive into and explore as you architect
feature flags across your organization. But I think you'll find that if
you separate out deploy and release and decouple them from each other,
the value that provides in terms of velocity, true CI,
and decoupled testing in production is very high,
and it has been very valuable to us at SPS Commerce. I encourage you, if you're not using
feature flags, this is a place that you want to explore
to become a high or elite performer in DevOps.
However, as we did talk about a bit, it's not free,
right? There's a price to pay for some of these things. I am changing code.
There is still risk. There are additional complexities here to
worry about, especially observability. But I do believe that it is absolutely
worth it for the value you're going to get. And I do think that a
lot of those risks mitigate as you practice it and as it becomes just another
one of your patterns. So thank you.
Feel free to reach out and chat some more about feature flags.
Take care.