Transcript
This transcript was autogenerated.
Hi everybody, my name is Ran Isenberg, and today I want to talk to you
about how you can level up your CI/CD pipeline
with AWS smart feature flags.
So let's start. Let's say that you've
just deployed your new service or feature to your AWS
production account, and everything seems fine
at the beginning. However, as time goes by, you realize
that you have a problem: something is not working. You need to
revert the feature, and you need to do it as soon as possible.
What you're essentially trying to do is change the behavior of your
service, and that is a very important capability. I can think of
two more cases where it is very useful. The first
is canary deployments: gradually deploying
a new feature and changing the behavior of your service step by step,
say for 10% of the customers at first, then 20%,
all the way to 100%. If an error occurs during that time,
you want to revert the behavior change automatically and quickly.
The last use case is A/B testing.
In A/B testing, you enable a feature, that is, change the behavior of your service,
for a subset of customers. For example, you might have a set of premium
customers for whom you want to enable a set
of premium features. That's what A/B testing gives you.
So now comes the question:
how do you implement all three of these capabilities?
Well, the answer is obviously feature
flags. This is the main topic of my talk
today, and I'm going to show you how to implement them in your AWS
account using AWS AppConfig and
an SDK that I wrote and contributed to AWS Lambda
Powertools. A little bit about myself: my name
is Ran Isenberg. I'm a principal software architect
at CyberArk and an AWS Community Builder,
and I write and maintain my serverless
blog, ranthebuilder.cloud, where I share my
serverless knowledge and experience. So what
are we going to talk about today?
We're going to discuss the functional and non-functional requirements for a
solution. Since feature flags are configuration,
we'll discuss the configuration types we can use to implement
feature flags: dynamic and static configuration.
I'll take a deep dive into the AWS AppConfig
and Lambda Powertools solution. We'll talk
about smart feature flags and what makes them smart. And lastly,
we'll discuss best practices for using feature flags,
from development to testing to production.
So let's start with the requirements.
As I said, we want
the ability to quickly roll back any feature
and change the behavior as soon as possible. We want
gradual deployment of features with automatic rollback in
case of an issue, and we want A/B testing.
In addition, since this is an AWS-only solution, we wanted to
support both Lambda functions and containers. Another
requirement that was important to my company, and I think it should
be important to you as well, is FedRAMP High certification.
And lastly, there's a non-functional requirement.
The solution should be really easy to use and integrate
into my service and my CI/CD pipeline, and I want it to
be self-managed and resilient. I don't want to worry about
backups or high availability of the feature flags solution.
So feature flags are a type of configuration,
and a configuration is essentially a collection of
settings that influence and change the behavior of your service.
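Here is a minimal sketch of the naive pattern the talk describes next: a configuration dict drives a plain if/else. The helper `evaluate_feature_flag` and the flag name `new_feature` are hypothetical stand-ins for the "magic" evaluation function, not the Powertools API.

```python
from typing import Any, Dict


def evaluate_feature_flag(name: str, config: Dict[str, Any], default: bool = False) -> bool:
    """Hypothetical 'magic' evaluator: read a flag's default value from a config dict."""
    feature = config.get(name)
    if feature is None:
        # Flag missing from the configuration: fall back to the caller's default.
        return default
    return bool(feature.get("default", default))


def handle_request(config: Dict[str, Any]) -> str:
    # Naive if/else: the flag decides which code path runs.
    if evaluate_feature_flag("new_feature", config):
        return "new feature logic"
    return "old service logic"


print(handle_request({"new_feature": {"default": True}}))   # new feature logic
print(handle_request({"new_feature": {"default": False}}))  # old service logic
```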
In this example, you can see a naive feature flags
implementation that I wrote. I have a simple
function that calls a magic function
that evaluates the feature flag for me. We'll
discuss what it does later on; it returns
a boolean, and then I have a simple if/else. If the feature
flag is enabled, I handle the new feature logic;
otherwise, I run the same old service logic,
and the behavior doesn't change. This is a very naive implementation,
but it works. So let's
discuss configuration types. There are dynamic and
static configurations that we can use for feature flags.
What is a static configuration? I'm going to use Lambda functions
as the example because that's what I use, but the same applies to containers.
When my service's CI/CD pipeline uploads a
Lambda function to my account,
it bundles my handler code with environment
variables, and it can also bundle static configuration files, which could be
plain JSON files, inside the zip. They're part of the zip file that
goes to AWS and gets deployed. If I
want to change
the static configuration, I need to run the CI/CD pipeline again,
go through all the gates and tests,
build the zip file, and deploy it to my production account.
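To make the static case concrete, here is a minimal sketch that reads flags from a JSON file bundled with the code and lets environment variables override them. The file name `config.json` and the `FEATURE_<NAME>` variable convention are assumptions for illustration. Either way, the values are fixed at build or deploy time, so changing them means rerunning the service pipeline.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_static_config(config_path: Path) -> Dict[str, Any]:
    """Read flags from a JSON file bundled in the deployment package.

    Hypothetical convention: an environment variable FEATURE_<NAME>, set at
    deploy time to "true" or "false", overrides the bundled value.
    """
    config = json.loads(config_path.read_text())
    for name in list(config):
        override = os.environ.get(f"FEATURE_{name.upper()}")
        if override is not None:
            config[name] = {"default": override.lower() == "true"}
    return config


with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp) / "config.json"  # plays the role of the file inside the zip
    bundled.write_text(json.dumps({"premium_features": {"default": False}}))
    print(load_static_config(bundled))  # values baked in at build/deploy time
```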
Dynamic configuration, on the other hand, is a bit different. I still have my
service CI/CD pipeline, I still create my
Lambda zip file, and I deploy it to AWS.
However, the Lambda function does not have the
configuration statically in its zip file.
Instead, it uses an API call to fetch the configuration from
an external configuration resource that is deployed
by another, dedicated CI/CD pipeline
just for the configuration.
In this case, if I want to change the Lambda function's behavior,
all I need to do is run the configuration CI/CD
pipeline, which is much quicker: it has fewer tests and
fewer resources to deploy. Then, when
the Lambda function checks for new configuration, it gets the new values and
changes its behavior accordingly.
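A minimal sketch of the dynamic approach, assuming the configuration source can be modelled as a callable. In AWS that callable would wrap an API call (for example to AppConfig); here a plain dict plays the external store, and the `DynamicConfig` class is hypothetical. Updating the store changes the service's behavior without redeploying the service code.

```python
from typing import Any, Callable, Dict

# In AWS the fetcher would call the configuration API; here it is any
# callable returning the current configuration (a hypothetical stand-in).
Fetcher = Callable[[], Dict[str, Any]]


class DynamicConfig:
    """Fetch configuration at runtime instead of bundling it in the zip."""

    def __init__(self, fetch: Fetcher) -> None:
        self._fetch = fetch

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Every call asks the external configuration store for the latest values.
        feature = self._fetch().get(name)
        return default if feature is None else bool(feature.get("default", default))


# Simulated external store, updated by a separate configuration pipeline.
store: Dict[str, Any] = {"premium_features": {"default": False}}
config = DynamicConfig(fetch=lambda: store)

print(config.is_enabled("premium_features"))  # False
store["premium_features"] = {"default": True}  # configuration pipeline redeploys
print(config.is_enabled("premium_features"))  # True, with no service redeploy
```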
So let's sum it up: static versus dynamic.
With static configuration, we read the configuration from bundled
resources: JSON files in the zip or environment variables.
With dynamic configuration, we use an API call. With
static, to make a change you need to rerun the service CI/CD
pipeline; with dynamic, you run
the configuration CI/CD pipeline, which is quicker.
Dynamic does add the complexity of
another pipeline to manage, but since it allows for
really quick changes in service behavior, it's the winner.
We're going to use dynamic configuration for our feature
flags implementation.
Now that we understand how to implement feature flags,
let's go over the solution.
We write a JSON configuration file during
the development stage and deploy it to AWS AppConfig with its
own CI/CD pipeline; as we said, it's a dynamic configuration.
Then we use the feature flags SDK
in Lambda Powertools to fetch the feature flags from AWS AppConfig
and evaluate them at runtime.
This is a sample JSON file with just a premium
features flag whose default value is false, so the feature is disabled
by default in this case.
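Assuming the Powertools schema, where each top-level key is a feature name holding a `default` value, the sample file would look roughly like the JSON string below. The name `premium_features` is illustrative; the snippet just parses it to show how the values come out in Python.

```python
import json

# The JSON document as it might be stored in AppConfig (illustrative name).
FEATURES_JSON = """
{
  "premium_features": {
    "default": false
  }
}
"""

features = json.loads(FEATURES_JSON)
# JSON false becomes Python False: the feature is disabled by default.
print(features["premium_features"]["default"])  # False
```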
Now let's bring up the dynamic configuration diagram from
before. Here we can see that we're now deploying a JSON file
that becomes an AWS AppConfig configuration
resource, and my Lambda function checks AppConfig for new
configuration and fetches the
values at runtime with an API call.
So why did I choose AWS AppConfig?
What's so great about it? First of all,
it's an integrated AWS service. I don't need to add a
third-party service outside my AWS account, and no traffic
has to leave my account, so it's more
secure. I also don't need to go through
the security evaluations and
other corporate processes
that come with adding third-party integrations.
It's part of AWS and I can just use it. It's one of the few
feature flag solutions, if not the only one, I believe, with FedRAMP
High certification. It's fully managed,
so I don't need to worry about backups and high availability.
It's always there and always working. It also has
a great feature for validating JSON schemas: I can define
a schema for my configuration, so if somebody
tries to upload a malformed or problematic
configuration, the deployment simply fails and
my environment stays fine.
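AppConfig does this validation itself using the JSON Schema you attach to the configuration. As a hedged illustration of the kind of rules such a schema enforces, here is a hand-written check in plain Python, assuming the feature flag layout from the sample file (a `default` key per feature, optional `rules` object). A configuration pipeline could run something like this before deploying; the function name and messages are hypothetical.

```python
from typing import Any, List


def validate_feature_config(config: Any) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid.

    Hand-written stand-in for a JSON Schema; the expected shape is an assumption.
    """
    if not isinstance(config, dict):
        return ["configuration must be a JSON object"]
    errors: List[str] = []
    for name, feature in config.items():
        if not isinstance(feature, dict):
            errors.append(f"{name}: feature must be an object")
            continue
        if "default" not in feature:
            errors.append(f"{name}: missing 'default'")
        if not isinstance(feature.get("rules", {}), dict):
            errors.append(f"{name}: 'rules' must be an object")
    return errors


print(validate_feature_config({"premium_features": {"default": False}}))  # []
print(validate_feature_config({"premium_features": {"rules": []}}))
```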
It also has deployment strategies. When you deploy configuration,
you can choose a canary deployment, which, if you recall,
is one of our functional requirements, so we get it out of
the box. I can run canary deployments
and define Amazon CloudWatch
alarms so that if one of them triggers during the canary deployment,
AppConfig automatically rolls back to the previous
version of my configuration. All in all, it has great
features that answer many of my requirements.
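AppConfig performs the gradual rollout and the alarm-driven rollback for you. The following is only a simulation of that logic to show the idea: the step percentages are made up, and the `alarm_triggered` callback is a hypothetical stand-in for a CloudWatch alarm check.

```python
from typing import Callable, List, Tuple


def canary_rollout(
    steps: List[int], alarm_triggered: Callable[[int], bool]
) -> Tuple[str, int]:
    """Simulate gradually exposing a new configuration version.

    steps: percentages of traffic that see the new version, in order.
    alarm_triggered: hypothetical error-alarm check, evaluated at each step.
    Returns ("completed", last step) or ("rolled_back", percent when the alarm fired).
    """
    for percent in steps:
        if alarm_triggered(percent):
            # Alarm fired: everyone goes back to the previous configuration version.
            return "rolled_back", percent
    return "completed", steps[-1]


steps = [10, 20, 50, 100]
print(canary_rollout(steps, lambda p: False))    # ('completed', 100)
print(canary_rollout(steps, lambda p: p >= 50))  # ('rolled_back', 50)
```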
This is what the AppConfig console looks like.
You need to define an application, which can
be your microservice or service; in this case, it's called
test service. Each application has
environments, and an environment
can be dev, test, production, and so on.
Each environment has its deployed configuration: on
the bottom right you can see its version, on
the left its name, and its deployment status if you chose a canary
deployment. Now that we
know how to deploy the configuration with the AppConfig
dynamic pipeline, let's talk about
evaluating the feature flags at runtime.
We're going to use AWS Lambda Powertools
with Python, which is the language I
developed it in. But it's a very simple solution,
so you can write it in your own language of
choice; it uses AWS APIs and some Python
code. The examples are in Python, since the solution is Python
based. For those who don't know AWS
Lambda Powertools, it's an amazing repository that implements
best practices for AWS Lambda, such as logging,
tracing, input validation, and feature flags,
and you can use its utilities to apply them.
It has over 1 million downloads per month, so it's
very popular. We're going to use the feature
flags utility, which I designed and contributed to AWS
Lambda Powertools. It fetches
configuration from AppConfig, stores it
in an in-memory cache, and evaluates the feature
flag values for you. Interestingly, it
supports both regular and smart feature flags; I'm
going to discuss smart feature flags later on. And just
to clarify, it's not just for Lambda functions: even though
the name says Lambda, you can also use it in containers.
So let's go back to the simple use case.
We have a regular feature flag,
a 10% off campaign, and let's say its default value is
true, so the feature is enabled by default.
This is how you use it in code. In line three,
we define the AppConfig configuration: the environment,
the application, and the configuration name. In line nine,
we initialize an instance of
our SDK with the in-memory cache.
Then, in line twelve, we evaluate the feature flag; this is the magic
function. We evaluate the 10% off feature flag
and get back a boolean value,
apply discount. In line 15 you can see the naive implementation again:
if apply discount is true, change the behavior and do something new; otherwise, keep the
old behavior. Something
important to note is that in line 13 I'm passing default
equals false. Why is that?
Well, what if somebody deployed
a new configuration and removed the 10% off campaign feature
from it? I don't want my
code, my Lambda function, to crash,
so I provide a fallback, a default value: if the SDK
doesn't find the feature flag in the
configuration, it returns the default value.
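The slide code itself isn't in this transcript. As far as I know, the Powertools utility is used roughly as `FeatureFlags(store=AppConfigStore(environment=..., application=..., name=...))` followed by `feature_flags.evaluate(name=..., default=...)`; check the Powertools documentation for the exact signatures. To keep this sketch runnable without AWS, it only imitates that call shape with hypothetical `InMemoryStore` and `SimpleFeatureFlags` classes and an illustrative flag name, and it shows why the `default` fallback matters.

```python
from typing import Any, Dict


class InMemoryStore:
    """Hypothetical stand-in for an AppConfig-backed store (no AWS calls)."""

    def __init__(self, config: Dict[str, Any]) -> None:
        self.config = config


class SimpleFeatureFlags:
    """Imitates the evaluate(name=..., default=...) call shape for regular flags."""

    def __init__(self, store: InMemoryStore) -> None:
        self.store = store

    def evaluate(self, *, name: str, default: Any) -> Any:
        feature = self.store.config.get(name)
        if feature is None:
            # Flag removed from the configuration: don't crash, return the fallback.
            return default
        return feature.get("default", default)


feature_flags = SimpleFeatureFlags(
    store=InMemoryStore({"ten_percent_off_campaign": {"default": True}})
)
apply_discount = feature_flags.evaluate(name="ten_percent_off_campaign", default=False)
if apply_discount:
    print("apply 10% discount")  # new behavior
else:
    print("full price")  # old behavior

# Someone deploys a configuration without the flag: the default kicks in.
feature_flags.store.config = {}
print(feature_flags.evaluate(name="ten_percent_off_campaign", default=False))  # False
```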
Now I'm going to show you smart feature flags,
which are very cool. First of
all, they enable A/B testing, which is the last requirement
we haven't answered yet. How do
they do that? The feature flag's value
changes according to your input. You provide a context input,
and a rule engine checks
whether a rule matches; if one does, it returns the value that the
rule defines. So for one input, the
feature flag's value can be false,
but for a different input it can
be true. One configuration, different behavior, and
that is what enables A/B testing. I'm going to show you how
in a second.
Let's take a look at this sample configuration. Let's assume
that on the left we have the input event to our Lambda function.
It contains a username, and each user has a tier. In this
case the tier
is premium, but it can also be standard.
On the right we can see the configuration.
In line 17 we have the regular feature flag, and in line two
we have the smart feature flag. Again, it has a default value of
false in line three, but it also has
rules, defined in line four. There is one
rule, customer tier equals premium:
if the customer tier is premium,
line six says the feature flag's value is
true. For a rule to match,
all of its conditions must evaluate to true.
Here the set of conditions contains
just one: the
tier key in the input must have
the value premium. Because the action is EQUALS,
the value of the tier key must equal premium.
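Here is the rule just described written as a Python dict. The layout (a `rules` object keyed by rule name, each with `when_match` and a list of `conditions` holding `action`, `key`, and `value`) follows the Powertools feature flags schema as I understand it; the `rule_matches` helper is a hypothetical, simplified matcher that supports only EQUALS.

```python
from typing import Any, Dict

# Configuration mirroring the Powertools rule schema (as I understand it).
FEATURES: Dict[str, Any] = {
    "premium_features": {
        "default": False,
        "rules": {
            "customer tier equals premium": {
                "when_match": True,
                "conditions": [
                    {"action": "EQUALS", "key": "tier", "value": "premium"}
                ],
            }
        },
    }
}


def rule_matches(rule: Dict[str, Any], context: Dict[str, Any]) -> bool:
    """A rule matches only if *all* of its conditions hold.

    Only EQUALS is supported here; any other action counts as not matching.
    """
    return all(
        cond["action"] == "EQUALS" and context.get(cond["key"]) == cond["value"]
        for cond in rule["conditions"]
    )


rule = FEATURES["premium_features"]["rules"]["customer tier equals premium"]
print(rule_matches(rule, {"username": "user-1", "tier": "premium"}))   # True
print(rule_matches(rule, {"username": "user-2", "tier": "standard"}))  # False
```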
Let's see it in this code example.
The code is the same
as before, but in line 13
we build the input context: the key is
tier, and the value can be standard or premium.
If the tier is
standard, the rule does
not match, so the feature flag gets its default value, false. If the tier
is premium,
the rule matches, and has premium features
in line 17 is true. In line
17 we call the same evaluate function, but we provide
the optional context. Then,
for a premium tier user, line 19 triggers and
we enable the premium features; other users
get different behavior.
That way you can do A/B testing between different users
with the same configuration.
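Putting the pieces together, here is a self-contained sketch of evaluating a smart flag with a context built from the input event. The `evaluate` function is a hypothetical, simplified rule engine (EQUALS only) that mimics the keyword call shape `evaluate(name=..., context=..., default=...)`; it is not the Powertools implementation. One configuration yields different behavior per user tier.

```python
from typing import Any, Dict, Optional

FEATURES: Dict[str, Any] = {
    "premium_features": {
        "default": False,
        "rules": {
            "customer tier equals premium": {
                "when_match": True,
                "conditions": [{"action": "EQUALS", "key": "tier", "value": "premium"}],
            }
        },
    }
}


def condition_holds(cond: Dict[str, Any], context: Dict[str, Any]) -> bool:
    # This sketch's rule engine only understands EQUALS.
    if cond["action"] != "EQUALS":
        raise ValueError(f"unsupported action in this sketch: {cond['action']}")
    return context.get(cond["key"]) == cond["value"]


def evaluate(*, name: str, default: Any, context: Optional[Dict[str, Any]] = None) -> Any:
    feature = FEATURES.get(name)
    if feature is None:
        return default
    ctx = context or {}
    for rule in feature.get("rules", {}).values():
        if all(condition_holds(c, ctx) for c in rule["conditions"]):
            return rule["when_match"]  # first matching rule wins
    return feature.get("default", default)


def handler(event: Dict[str, Any]) -> str:
    context = {"tier": event["tier"]}
    has_premium_features = evaluate(name="premium_features", context=context, default=False)
    if has_premium_features:
        return "premium features enabled"
    return "standard experience"


print(handler({"username": "user-1", "tier": "premium"}))   # premium features enabled
print(handler({"username": "user-2", "tier": "standard"}))  # standard experience
```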
There are over ten actions that you can use,
such as starts with and key in value;
you can see the full list on the Powertools documentation website.
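To illustrate how condition actions plug into a rule engine, here is a small dispatch-table sketch. The action names are modelled on the Powertools documentation as I recall it (EQUALS, NOT_EQUALS, STARTSWITH, KEY_IN_VALUE, KEY_GREATER_THAN_VALUE), but the semantics here are simplified, and the context keys (`cart_total`, `username`, `product_type`) are hypothetical. The examples echo the free-shipping, admin, and product-type rules mentioned in the talk.

```python
import operator
from typing import Any, Callable, Dict

# Subset of condition actions; names modelled on the Powertools docs,
# semantics simplified for illustration.
ACTIONS: Dict[str, Callable[[Any, Any], bool]] = {
    "EQUALS": operator.eq,
    "NOT_EQUALS": operator.ne,
    "STARTSWITH": lambda key_value, value: str(key_value).startswith(value),
    "KEY_IN_VALUE": lambda key_value, value: key_value in value,
    "KEY_GREATER_THAN_VALUE": lambda key_value, value: key_value > value,
}


def condition_holds(condition: Dict[str, Any], context: Dict[str, Any]) -> bool:
    """Apply the condition's action to the context value stored under condition['key']."""
    if condition["key"] not in context:
        return False  # missing input never matches
    action = ACTIONS[condition["action"]]
    return action(context[condition["key"]], condition["value"])


free_shipping = {"action": "KEY_GREATER_THAN_VALUE", "key": "cart_total", "value": 100}
admin_user = {"action": "STARTSWITH", "key": "username", "value": "admin-"}
discounted = {"action": "KEY_IN_VALUE", "key": "product_type", "value": ["books", "music"]}

print(condition_holds(free_shipping, {"cart_total": 150}))     # True
print(condition_holds(admin_user, {"username": "admin-1"}))    # True
print(condition_holds(discounted, {"product_type": "shoes"}))  # False
```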
You can also have non-boolean feature flags.
The value can be any
valid JSON value: a list of strings, integers, and so on. In this
case I'm using a list of strings for special actions that I
want to apply to premium tier accounts, such as remove limits and remove ads.
For non-premium users, the default is that
no special actions are applied.
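A sketch of a non-boolean flag, assuming the same rule layout as before but with a list as the value. The feature name `premium_special_actions` and the action strings are hypothetical, and this simplified `evaluate` treats every condition as EQUALS. Premium users get a list of actions; everyone else gets the empty default.

```python
from typing import Any, Dict

FEATURES: Dict[str, Any] = {
    "premium_special_actions": {  # hypothetical feature name
        "default": [],  # non-premium users: no special actions
        "rules": {
            "customer tier equals premium": {
                "when_match": ["remove_limits", "remove_ads"],
                "conditions": [{"action": "EQUALS", "key": "tier", "value": "premium"}],
            }
        },
    }
}


def evaluate(*, name: str, context: Dict[str, Any], default: Any) -> Any:
    feature = FEATURES.get(name)
    if feature is None:
        return default
    for rule in feature.get("rules", {}).values():
        # This sketch treats every condition as EQUALS.
        if all(context.get(c["key"]) == c["value"] for c in rule["conditions"]):
            return rule["when_match"]
    return feature["default"]


for action in evaluate(name="premium_special_actions", context={"tier": "premium"}, default=[]):
    print(f"applying {action}")  # applying remove_limits, then applying remove_ads
print(evaluate(name="premium_special_actions", context={"tier": "standard"}, default=[]))  # []
```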
You can use this for all sorts of rules. You can enable
a feature for a specific customer, or maybe an admin of a customer;
apply a discount for specific types of products; or offer free shipping
if the cost is higher than some amount. There are
so many possibilities here, and it's very flexible.
Like I said, we're going to use it for A/B testing:
different users can have different experiences with
a single configuration that does not change.
If you recall, I mentioned that there is an
in-memory cache. Why is that important?
Because each call to AWS AppConfig to fetch configuration
costs money, and we want to save money.
As long as the cache has
not expired, we don't fetch a new configuration, and
we save money. You can define the expiration in seconds.
It's important to remember that this is
a trade-off between cost savings
and how quickly the service changes its behavior,
because until the cache expires,
the service will not fetch a new configuration.
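The cost-versus-freshness trade-off can be seen in a small time-to-live cache sketch. The `fetch` callable stands in for the paid AppConfig call and the `clock` parameter is injectable so expiry can be shown without sleeping; both, and the class name, are assumptions for illustration rather than the Powertools internals.

```python
import time
from typing import Any, Callable, Dict, Optional


class CachedConfigFetcher:
    """In-memory cache with a max age in front of a (paid) configuration fetch.

    `fetch` stands in for the configuration API call; `clock` is injectable so
    expiry can be demonstrated without sleeping. Both are assumptions.
    """

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        max_age_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._max_age = max_age_seconds
        self._clock = clock
        self._value: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0
        self.fetch_count = 0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._value is None or now - self._fetched_at >= self._max_age:
            # Cache empty or expired: pay for one fetch and remember when.
            self._value = self._fetch()
            self._fetched_at = now
            self.fetch_count += 1
        return self._value


fake_now = [0.0]
remote = {"premium_features": {"default": False}}
cache = CachedConfigFetcher(fetch=lambda: dict(remote), max_age_seconds=300,
                            clock=lambda: fake_now[0])

cache.get()
remote["premium_features"] = {"default": True}  # new configuration deployed
print(cache.get()["premium_features"])  # {'default': False}, still cached
fake_now[0] = 301.0  # max age exceeded
print(cache.get()["premium_features"])  # {'default': True}
print(cache.fetch_count)  # 2
```

A longer max age means fewer paid calls but a slower reaction to a flag change, and vice versa.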
And by the way, very soon,
hopefully this month, I'm adding time-based rules, which let you enable
rules and feature flags at specific times,
for a specific duration,
or on specific days.
And lastly, let's discuss the feature flags
best practices that we use across all the stages of our
development pipeline, from build to
testing to deployment and production.
In my eyes, the development team needs to own the process from
start to end. They write the
configuration JSON files and the code that
evaluates them and behaves accordingly.
They should start with the features enabled
in the test and dev accounts but disabled in production.
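One way to express "enabled in dev and test, disabled in production" is one configuration file per AppConfig environment. The layout below, including the environment names and the `flag_default` helper, is a hypothetical sketch; in practice each file would be deployed by the configuration pipeline to its own environment.

```python
import json
from typing import Dict

# One feature flag file per AppConfig environment (hypothetical layout).
CONFIG_FILES: Dict[str, str] = {
    "dev": '{"premium_features": {"default": true}}',
    "test": '{"premium_features": {"default": true}}',
    "production": '{"premium_features": {"default": false}}',
}


def flag_default(environment: str, name: str) -> bool:
    """Read a flag's default value from the file for the given environment."""
    config = json.loads(CONFIG_FILES[environment])
    return bool(config[name]["default"])


for env in ("dev", "test", "production"):
    print(env, flag_default(env, "premium_features"))
# dev True / test True / production False
```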
When it comes to tests, we use mocks:
we mock the configuration in our tests so we have better
control over the outcome. Obviously, we
mock the feature as enabled and test that
all the side effects work as expected. But it's just as important to
mock the feature as disabled, because sometimes
the logic isn't a simple "if the feature is
enabled, do something"; it can be more complicated, and it's really important
that this logic is tested.
We want to assert that the function
that handles the feature when it is enabled
does not run when the feature is disabled.
We actually had a case where our feature was set to
false, but due to a bug in a complicated if statement,
the feature code ran anyway and caused
a problem in production. So it's very important to test that.
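A sketch of testing both flag states with `unittest.mock`, assuming the flag evaluator is injected into the service so tests can replace it. `OrderService`, `apply_premium_features`, and the flag name are hypothetical. The second test is the one that would catch the production bug described above: with the flag mocked as disabled, the feature code must never be called.

```python
from typing import Callable
from unittest import mock


class OrderService:
    """Hypothetical service whose behavior depends on a feature flag."""

    def __init__(self, is_enabled: Callable[[str], bool]) -> None:
        # Injected flag evaluator, so tests can replace it with a mock.
        self._is_enabled = is_enabled

    def apply_premium_features(self, user_id: str) -> None:
        """Feature code with real side effects in production."""

    def handle_order(self, user_id: str) -> str:
        if self._is_enabled("premium_features"):
            self.apply_premium_features(user_id)
            return "premium"
        return "standard"


def test_enabled_runs_feature_code() -> None:
    flag = mock.Mock(return_value=True)
    with mock.patch.object(OrderService, "apply_premium_features") as feature_code:
        assert OrderService(flag).handle_order("user-1") == "premium"
    feature_code.assert_called_once_with("user-1")
    flag.assert_called_once_with("premium_features")


def test_disabled_never_runs_feature_code() -> None:
    # Flag mocked as disabled: the feature code must not run at all.
    flag = mock.Mock(return_value=False)
    with mock.patch.object(OrderService, "apply_premium_features") as feature_code:
        assert OrderService(flag).handle_order("user-1") == "standard"
    feature_code.assert_not_called()


if __name__ == "__main__":
    test_enabled_runs_feature_code()
    test_disabled_never_runs_feature_code()
    print("both tests passed")
```

With pytest, both `test_` functions would be collected automatically.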
Once you decide that the feature is stable in the non-production
environments, you can run a deployment strategy
to production and use canary deployments;
AppConfig has you covered there. You should define CloudWatch alarms on
errors for your service so you can automatically revert
your configuration if there's
an error.
Now, what happens if at some
later point you do see errors in your feature,
things you didn't catch in the tests? You should disable the
feature as soon as possible by running the configuration CI/CD
pipeline again. Then update the tests,
add the missing use cases, and go through the
whole process again: deploy and release again. I also suggest that
you hold a retro meeting to identify
how you missed those use cases in the tests and how the
bug reached production. Eventually,
you need to retire feature flags. Why should you do that?
Because feature flags add code complexity: more tests
around them, more mocks, more if statements and branching
in your code. So at
some point we want to retire the features and remove the code. How do we
do that? We meet once a month
to discuss and select candidates
for removal. Then all we need to do is
run the configuration CI/CD pipeline again and monitor
that everything is okay. How do we
select candidates for removal? If the
feature has been enabled for all customers for several weeks, it's been
stable with no bugs around it, customer feedback
has been very positive, there are no open
issues, and you don't expect any code changes in
that area, then you should definitely retire the
feature flag and make your code simpler.
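The retirement criteria above can be turned into a simple checklist for the monthly review. This is a hypothetical sketch: the `FlagStatus` fields are made up, and `min_weeks=3` is an assumed reading of "several weeks". A flag qualifies only if every criterion holds.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FlagStatus:
    """Facts gathered at the monthly review (hypothetical fields)."""
    name: str
    enabled_for_all_customers: bool
    weeks_fully_enabled: int
    recent_bugs: int
    open_issues: int
    feedback_positive: bool
    code_changes_expected: bool


def is_retirement_candidate(flag: FlagStatus, min_weeks: int = 3) -> bool:
    # Every criterion must hold before removing the flag and its branches.
    return (
        flag.enabled_for_all_customers
        and flag.weeks_fully_enabled >= min_weeks
        and flag.recent_bugs == 0
        and flag.open_issues == 0
        and flag.feedback_positive
        and not flag.code_changes_expected
    )


flags: List[FlagStatus] = [
    FlagStatus("ten_percent_off_campaign", True, 6, 0, 0, True, False),
    FlagStatus("premium_features", True, 1, 0, 0, True, False),
]
print([f.name for f in flags if is_retirement_candidate(f)])  # ['ten_percent_off_campaign']
```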
So let's sum it all up. We created feature flags,
both smart and regular. We deployed them with AppConfig and used Lambda
Powertools to fetch and evaluate the feature flags.
We had canary deployments, we learned how to do A/B testing,
and we learned feature flag best practices
from the development stages
all the way to production. So thank you
very much; that's been my talk. You can follow me
on Twitter and LinkedIn and check out my website,
ranthebuilder.cloud, where I write about all things serverless.
Thank you very much and have a good day.