Transcript
Alright, welcome to Conf42 Cloud Native, and welcome to my talk about serverless development and some patterns for having an efficient development workflow for serverless. So let me start the slides right here. So yeah, this is Patterns for Efficient Serverless
Development. A quick introduction: my name is Yan Cui.
I've been an AWS user since 2010.
Nowadays I work as a developer advocate for Lumigo and
Lumigo is probably the best observability platform for
serverless. The other half of my time I work as an independent consultant
where I help other companies go faster for less by helping
them adopt serverless. And when it comes to having an efficient
development workflow for serverless, I think you really need three
things. Number one, you need to have a good testing approach and
you need to be able to deploy your application easily, and you
also need a way to manage your environment efficiently.
And all three of these things should really help
each other. And as a consultant,
the number one question I get is how do I
test my serverless architecture? And the number
one way that I prefer to do this, especially when
it comes to testing my lambda functions, is what
I would call remocal testing, which is a big part of my day-to-day workflow
when it comes to serverless architectures and how I test them.
When you think about local testing, where you run your code
locally against mocks, it lets you use debuggers to
step through the code, which is very useful, and you can
test your code without having to wait for a full deployment
cycle, and that gives you a really fast feedback loop.
But a problem with testing against mocks locally is that
what you're asking is, is my code doing what I expect it
to do? Not whether my code is actually working.
There is a subtle difference there in that one is
based on reality and the other is based on your expectations
and assumptions, and unfortunately, sometimes your assumptions about how something works are just wrong. And I've seen lots of examples of applications that pass all the tests only to fail the first time they run in AWS, because you're using the same flawed assumptions to write your code and your tests. And so your tests are not checking your expectations against reality, which just has a habit of slapping you back down to earth.
And so local testing is prone to false positives
and gives you a fairly low confidence. And it doesn't cover
a big part of your application like IAM permissions,
API gateway authentication, and basically any direct
integration you have between different services, amongst other
things. All of these are important because your application
is more than just your code. And your job
as a software engineer is to make sure that all of it works. Because the
whole idea is that at the end of it, the user gets something that works.
Not just that, oh, my code works, but some other integration that
I didn't test is not working. Customers are not going to care
if the problem is with your configuration and not your
code. And that's where remote testing comes
in. You're testing against the cloud, so it's far more realistic,
and it covers basically everything along the way,
including IAM permissions and your configurations and security
and so on. So you get far more confidence from these tests that your application actually works the way you expect it to.
But having to wait for a full deployment cycle can
be very slow and painful, and it can really slow down your
feedback loop. And especially if you are making small code changes
constantly. And every time you make a small code change,
you have to wait for a full deployment, which can be really, really painful.
And there are some frameworks that give you some help with this,
but they're fairly limited. And I really don't like the idea of
having to switch frameworks just so that they
make testing a bit easier. And as a consultant, I work with
clients who use all kinds of different tools and languages,
so I prefer approaches that anybody can use with
a minimum amount of custom adaptation. And so the approach
that I take is to combine local testing with remote testing, hence the name remocal testing,
where you run your code locally, but you're talking to real AWS
services as much as possible. And you can
still use debuggers to step through your code because you're running them locally.
And the tests are far more realistic because you are testing your
code against the real AWS services, but you
don't need a full deployment every single time you make a code change,
because again, you're running tests against your code that are executing
locally but talking to remote services.
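To make that concrete, here is a minimal sketch of what such a test can look like, assuming a Node.js project with Jest as the test runner; the module path, table name and environment variable are hypothetical placeholders, and in a real project they would come from whatever stage you've just deployed:

    // The handler code runs locally, but the AWS SDK calls inside it go to the
    // real DynamoDB table that has already been deployed to a dev stage.
    // (Hypothetical paths and names, just to illustrate the idea.)
    process.env.RESTAURANTS_TABLE = 'restaurants-dev-my-feature';

    const { handler } = require('../../functions/add-restaurant');

    test('add-restaurant writes to the real DynamoDB table', async () => {
      // a stubbed API Gateway event with only the fields the handler reads
      const event = { body: JSON.stringify({ name: 'Fangtasia' }) };

      // runs in-process (so breakpoints work), but talks to real AWS services
      const response = await handler(event, {});

      expect(response.statusCode).toBe(200);
    });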
But this kind of testing is still only looking at your lambda functions and how they interact with your downstream systems, like your DynamoDB tables or SNS topics or queues or whatnot. But what about all the stuff that is in front
of your lambda function? A lot of things can still go wrong there,
like IAM permissions, or AppSync resolver templates, or EventBridge event patterns.
Your code might be fine, but if any of these configurations
are wrong, then your application is not going to work properly.
And remember, all of these configurations are still your responsibility,
and you need to make sure that all of them are working together.
And so if I was to look at an API gateway API,
then there are multiple things I want to be able to test for
my lambda functions. I can encapsulate all of my
domain logic into separate modules, and I can unit test them
individually. And then I can use remote code testing to check
my code integration with other things like other AWS services.
And I can also use end to end tests to exercise API
endpoints directly once they've been deployed, which would
invoke my lambda function and check the IAM permissions as part of
those test cases. And when I'm using API gateway
to say validate or to transform the request or
response, or to call other AWS services directly,
then I can still use those end to end tests to check that
any configuration in API gateway is correct.
Similarly, if I'm asking API gateway to do all the authentication and
authorization, then they'll be checked as part of my end to end test.
If I'm using a lambda authorizer, then I can also test that authorizer logic independently with unit and remote tests where
appropriate. So that's the theory. Let's look at
how this can work in practice by looking at a really short demo
and let me switch over to a different view.
And so I've got a repo set up here. As you can
see, I've just deployed my application, and in this case this is using the Serverless Framework, and I've deployed my application to a new stage called conf42.
And here I can execute some tests. I've got some test scripts prepared in my package.json, which allow me to run my tests against whatever environment
I've just deployed to. And I've got some test cases set
up for my application, which consists of an API with API gateway
and some direct service integrations, where API gateway talks to DynamoDB directly without a lambda function.
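The test scripts in package.json are wired up along these lines; the script names, the cross-env helper and the TEST_MODE values are my own illustrative choices, and the deployed stage's URLs and table names would typically be loaded into environment variables separately:

    {
      "scripts": {
        "test:integration": "cross-env TEST_MODE=handler jest",
        "test:e2e": "cross-env TEST_MODE=http jest"
      }
    }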
But I've also got a bunch of lambda functions down below here. Let's look at one that just adds some restaurant data to a DynamoDB table, called the restaurants table in this case. The endpoint is hosted by API gateway and protected with IAM authentication and authorization. And there's a schema for the request, so that once it's deployed, API gateway is responsible for checking the request against this JSON structure I've got here, which says the body of the POST request must have a name property which is a string.
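The schema itself is just a standard JSON Schema document, roughly along these lines (a sketch of the shape described above, not the exact file from the demo):

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "properties": {
        "name": { "type": "string" }
      },
      "required": ["name"]
    }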
So some of this we can test in our code by checking
that our code in the lambda function is working correctly. In this
case, it's just writing to a DynamoDB table to insert new restaurant data and return a 200.
But to make sure that the whole thing actually works, I also need to have
some coverage for what this API
gateway is going to do to validate the schema of the request.
So to look at my test, I've got some test cases set up.
I've got a test case here that just checks my code,
my lambda function, which I can invoke as part
of remote tests, or in this case I'm calling them integration tests,
but they can also be executed as part of an end to end test.
And I've also got a schema test which just checks that the schema validation is working. And because this can only be done by API gateway in the AWS environment, this will only be triggered as part of an end to end test run. So if you look at the test
here, that allows me to test my code locally. So the
magic here happens in the when module, when we say "when we invoke the add restaurant endpoint". It's going to look at an environment variable called TEST_MODE, and based on what the value is, when we say "when we invoke add restaurant" with some payload, it's going to do something slightly different. So let's look at the case where we want to run the remote tests, which allows us to make changes to my code locally and then execute it, and potentially even put breakpoints in the code so that we can step through any bugs or errors that we come across. What this is doing, via this viaHandler helper function, is requiring the handler module locally, executing it with a stubbed context and event, and then returning the result after some transformation, in this case so that we are dealing with a JSON object as the body as opposed to a string.
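A sketch of what that when module can look like; the TEST_MODE values, file paths and helper names are placeholders rather than the exact code from the demo:

    // tests/steps/when.js (sketch)
    // Decides how to invoke the "add restaurant" endpoint based on TEST_MODE:
    //   TEST_MODE=handler -> require the lambda handler and run it locally (remocal)
    //   TEST_MODE=http    -> call the deployed API Gateway endpoint (end to end)
    const { viaHttp } = require('./given'); // HTTP path, sketched later with the end to end tests

    const viaHandler = async (event, functionName) => {
      const { handler } = require(`../../functions/${functionName}`);

      // execute the handler in-process with a stubbed context, so breakpoints work
      const response = await handler(event, {});

      // parse the body so tests can assert on a JSON object rather than a string
      if (response.body) {
        response.body = JSON.parse(response.body);
      }
      return response;
    };

    const we_invoke_add_restaurant = async (restaurant) => {
      const event = { body: JSON.stringify(restaurant) };

      switch (process.env.TEST_MODE) {
        case 'handler':
          return viaHandler(event, 'add-restaurant');
        case 'http':
          return viaHttp('restaurants', 'POST', restaurant);
        default:
          throw new Error(`unsupported TEST_MODE: ${process.env.TEST_MODE}`);
      }
    };

    module.exports = { we_invoke_add_restaurant };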
So this just allows us to write our test so that we can check that the status code that comes back is 200 and that the ID that comes back is not null. And we're able to do some validation to say that, okay, given the ID that comes back from the add restaurant function that we invoked locally, this new restaurant ID is going to exist in the DynamoDB table. So we can say "then the restaurant exists in DynamoDB" by looking up that ID in the restaurants table and making sure that the restaurant item actually exists. So in this case, if I was to use
the JavaScript debug terminal, I can run the
script I prepared called test integration. Then we can see that test case run. And if the demo gods are with us today, then yeah, all the tests are passing.
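For reference, the test case that just ran is shaped roughly like this, assuming Jest and the AWS SDK v3 DocumentClient; the table name and environment variable are again placeholders:

    // tests/test_cases/add-restaurant.test.js (sketch)
    const when = require('../steps/when');
    const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
    const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');

    const docClient = DynamoDBDocumentClient.from(new DynamoDBClient());

    it('POST /restaurants should add a new restaurant', async () => {
      const response = await when.we_invoke_add_restaurant({ name: 'Fangtasia' });

      // the handler should return a 200 and the new restaurant's id
      expect(response.statusCode).toBe(200);
      expect(response.body.id).not.toBeNull();

      // then the restaurant exists in DynamoDB: look it up by id in the real table
      const { Item } = await docClient.send(new GetCommand({
        TableName: process.env.RESTAURANTS_TABLE, // e.g. restaurants-conf42
        Key: { id: response.body.id }
      }));
      expect(Item).toBeTruthy();
    });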
And what's also nice about this is that I can make some code changes
and I can quickly iterate. Okay, let's say I just comment out this bit of code and we run the test
again. Now the test should fail because the restaurant is no
longer being added to the database.
Okay, so that was unexpected. So let's try and debug this.
So we've got code that's commented out, but it's still somehow passing
the test. So let's check. Okay,
go down here, put a breakpoint in our test, and let's
run the test again and see what's
actually going on. Okay, so the restaurant came back... okay, right. So this is because I've got a bug in my test, not the code itself. The restaurant did come back as undefined, but our test case wasn't handling that. So let's just change the assertion to expect this to be truthy.
And now let's rerun our test case. This time the test should fail, because this should now come back as undefined. And so we can quickly make changes and test them. In this case, let me just revert the change and make sure the test is passing again. So we're able to put breakpoints in our code and in our tests. Okay, let me run this again so you can see the breakpoint hitting my lambda function. We're able to step through the code step by step and quickly debug problems and fix them without having to wait for a full deployment to AWS. I think this time it's just going to time out, because I sat on that breakpoint for a little while.
Yeah, timed out. So if you do need to raise this timeout for when you're debugging through the code line by line, you can do that as well. So hopefully you can see that this is going to make it a lot easier for you to troubleshoot a failing test and iterate on your code gradually. And because we're using temporary environments as well, any changes we're making are only going to this conf42 environment I've just created. And as I mentioned earlier, some things, such as the schema validation, can only be tested if we run against the real deployed AWS services, in this case the API gateway endpoint.
So we can also run the end to end test that I've prepared
and in this case I've got a separate script here called test end to end. This is actually going to call the API endpoints that have been deployed in the AWS environment by making a POST request. In some cases that also involves creating a new user in a Cognito user pool, because some of these endpoints are protected by Cognito.
as well. So we can look at some of these tests there where
we have to create an authenticated user
if the test mode is end to end, and then we invoke the
endpoint when the environment
variable says it's going to be end to end. So we have to then authenticate
the user and then take the user's ID token and use that to invoke
HTTP endpoint in API gateway.
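A sketch of what that end to end path can look like for the Cognito-protected endpoints, assuming a user pool app client that allows the admin auth flow and axios for the HTTP call; the environment variable and helper names are placeholders:

    // tests/steps/given.js (sketch): Cognito auth and the HTTP path used in end to end mode
    const {
      CognitoIdentityProviderClient,
      AdminCreateUserCommand,
      AdminSetUserPasswordCommand,
      AdminInitiateAuthCommand
    } = require('@aws-sdk/client-cognito-identity-provider');
    const axios = require('axios');

    const cognito = new CognitoIdentityProviderClient();

    // given an authenticated user: create a throwaway user in the user pool,
    // give it a permanent password, then sign in to get an ID token
    const an_authenticated_user = async () => {
      const username = `test-${Date.now()}`;
      const password = `Passw0rd!${Date.now()}`;

      await cognito.send(new AdminCreateUserCommand({
        UserPoolId: process.env.USER_POOL_ID,
        Username: username,
        MessageAction: 'SUPPRESS'
      }));
      await cognito.send(new AdminSetUserPasswordCommand({
        UserPoolId: process.env.USER_POOL_ID,
        Username: username,
        Password: password,
        Permanent: true
      }));
      const { AuthenticationResult } = await cognito.send(new AdminInitiateAuthCommand({
        UserPoolId: process.env.USER_POOL_ID,
        ClientId: process.env.USER_POOL_CLIENT_ID,
        AuthFlow: 'ADMIN_USER_PASSWORD_AUTH',
        AuthParameters: { USERNAME: username, PASSWORD: password }
      }));
      return { username, idToken: AuthenticationResult.IdToken };
    };

    // call the deployed API Gateway endpoint, passing the user's ID token when given one
    const viaHttp = async (path, method, body, idToken) => {
      const response = await axios({
        method,
        url: `${process.env.API_URL}/${path}`,
        data: body,
        headers: idToken ? { Authorization: idToken } : {}
      });
      return { statusCode: response.status, body: response.data };
    };

    module.exports = { an_authenticated_user, viaHttp };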
So having test cases like this allows me to have reusable test cases like this one here, where we can run the test as a remote test while we are working on the code. Then once we are happy with the code changes and we're confident they should work, but we want to test the whole thing and make sure that all the configurations are working together, we deploy our changes and run the test case again as an end to end test. Because the test is written in such a way that it's not tied to the implementation, it can be toggled between either a remote or an end to end test just by switching the TEST_MODE environment variable that we configure in the test scripts. So we're able to reuse some of the test cases, which means it takes less effort for us to maintain a large suite of different test cases, and we can use the remote tests to iterate quickly but still have the end to end tests to make sure that we have full coverage of everything that our application is doing. So as you can see, the main downside
of remote testing is that you do need to provision the AWS resources that your code depends on, like the DynamoDB tables, et cetera, before you can run your tests.
So this opens up problems when multiple people need to work on the same project at the same time, and people are going to be stepping on each other's toes. And that's where the use of ephemeral environments comes in, which is easily the most impactful practice that has evolved with serverless technologies. But I'm jumping ahead of schedule here. Before we can talk about ephemeral environments, let's talk about deployments.
Specifically why you should keep your
deployments as simple as possible, but no simpler.
I wish we didn't have to talk about this, but unfortunately I see a lot of clients and students suffer from self-inflicted wounds when it comes to how they deploy their serverless applications.
You see, the lambda service carries some of the blame
here because it's no longer this simple thing,
and nowadays it has a lot of additional features like lambda
layers, or the option to package your function
as container images, or the ability to create your own
custom runtimes, or to use provisioned concurrency to keep a number of lambda workers around all the time so that you can mitigate cold starts. All of those options
are great, and they're useful in some use cases, but I
think just because they're there doesn't mean that you have
to use them. In fact, I'll go as far as to say that
for 90% of use cases you shouldn't use any of these
options. And for lambda layers I will go
even further and say that you shouldn't use them to share code between
lambda functions at all, because they complicate your deployments
and make things more difficult than they need to be without
really giving you any meaningful return on investment.
They don't support semantic versioning, and because they exist
outside of your language's ecosystem, security scanners
don't know about them and can't automatically scan them
and check them against their database of known vulnerabilities.
And you're limited to just five lambda layers per
function, and they still count towards your lambda function's 250 meg size limit once it's been unzipped.
So they don't help you mitigate some of those lambda limits. And again, because they exist outside of your language's ecosystem, it's going to make it harder for you to test your code locally as well, if some of your dependencies exist outside of your local execution environment and only exist in AWS as lambda layers, because your language runtime doesn't know about them. It's then up to you to find a way to bring them into your local development environment so that you are able to execute your code locally. And they were designed to
help with the likes of Python and JavaScript, and they
don't really work for statically compiled languages like Java or .NET that require your dependencies to be available at compile time. And honestly,
compared to package managers like NPM,
it's just more work to publish updates to your shared code
and then to bring them into where they're needed in your local development
environment, as well as into your project, and for anyone who's using
JavaScript, because again, they exist outside of your
NPM ecosystem, they don't really work with bundling and tree shaking either. So that's a whole laundry
list of reasons why lambda layers just don't add any value.
So instead, if you're sharing code between functions in the same
project, I would just put them in a folder and reference them directly during deployment. Just make sure that both your lambda function's handler module as well as your shared library modules are included in the same zip file, or are bundled into a single file if you're using a bundler. And this is supported by most frameworks like SAM or the Serverless Framework or CDK.
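In practice that can be as simple as something like this, with a plain lib folder next to your functions (file names and contents are purely illustrative):

    // lib/greeting.js: a shared module, just a plain file in the same repo
    module.exports.greet = (name) => `Hello, ${name}!`;

    // functions/hello.js: a lambda handler that requires the shared module directly;
    // both files end up in the same zip, or in the same bundle if you use a bundler
    const { greet } = require('../lib/greeting');

    module.exports.handler = async (event) => {
      const { name } = JSON.parse(event.body);
      return { statusCode: 200, body: JSON.stringify({ message: greet(name) }) };
    };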
And if you need to share code between different projects, then publish it to NPM and use a private NPM registry if you
need to. It's simple and it's what we do already outside
of the context of lambda, and it works just fine
when you're writing lambda functions as well.
And the next thing I'll say about deployment is that as much as
possible, you should just stick with using zip files and managed runtimes. There are times when you need to package your function as a container image, for example when your application is bigger than
the 250 meg size limit, but it also means
that you become responsible for the runtime as well as
your code and your dependencies. And wherever possible,
I want to delegate responsibilities to the cloud so
I can focus on just my application. Which is why
for an efficient development flow, I recommend that you
stick with using zip files and managed runtimes, and stay
away from using lambda layers.
And that brings us to how do you manage your AWS environments?
I will start by just saying that, okay, you should have at a minimum
one account per stage so that you have one account for
Dev, one account for test, one account for staging,
and a separate account for production, so that if there are any problems in terms of throughput or in terms of a security breach, they are contained to a single account, and so that if your dev account gets compromised, at least the attacker won't be able to access your user data in production. And for large organizations with many different teams, I would go a bit further and say that you should have one account per team per stage,
so that different teams are also insulated from
each other, and if one team makes a mistake or they have a really busy
service, they're not going to use up all the available throughput in the
account for everybody. That said, if you've got, say, one team with different workloads, some of them more business critical than others, or some with higher throughput than others, then I also recommend putting those business critical workloads into their own separate accounts. So for a particular service you may have dev, test, staging and production accounts just for that service, while for the team and all of the other services that it maintains, you have dev, test, staging and production accounts that all the other services share. That way, the less critical workloads have one set of dev, test, staging and production accounts, and the things that are really business critical or have much higher throughput have a separate set of accounts.
And as I mentioned earlier, the use of ephemeral environments, or temporary environments, is perhaps one of the most impactful things you can do to improve your development workflow when it comes to working with serverless technologies. And it can be as simple as this, using the Serverless Framework as an example: when you start to work on a new feature, create a new environment.
And within the Serverless Framework, you just run the deploy command with a flag to override the stage, and name your stage after the feature, so dev-my-feature for example. This way it's going to create a completely new environment with all of your functions and DynamoDB tables and whatnot, so that you are able to iterate on your code changes and run your remote tests, or end to end tests as well, against this temporary environment that you've created just for this feature. And once you're ready, you can commit your code changes, submit your PR, and as part of the CI/CD pipeline you're going to run all of the tests against your changes. And when you're done, you can just run the serverless remove command against your temporary environment, in this case using the stage override in the CLI to say use the stage name dev-my-feature, and this would delete your temporary environment as if it never existed in the first place.
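Concretely, with the Serverless Framework that's just the stage flag on deploy and remove; the stage name here is a made-up feature name:

    # create a temporary environment for the feature
    npx sls deploy --stage dev-my-feature

    # iterate, run remocal and end to end tests against it, then tear it down
    npx sls remove --stage dev-my-feature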
Because all of your tests are run against your temporary environment, you've kept your main stages like dev, test, staging and production clean, and so nobody needs to clean up any data. And because every time you start to work on a new feature, every developer can have their own environment to work in, there's no need to worry about people stepping on each other's toes either, because you're able to work against your own insulated environment and basically stay out of each other's way. And you're able to keep these shared environments like dev, test and staging clean by not polluting them with test data, because all of your work and all of your tests are done against those temporary environments. And one of the really nice things about using
temporary environments, or ephemeral environments, with serverless components is that because you've got usage-based pricing, you can have as many of these environments as you need and there's no extra cost overhead. Those environments can just sit there; if there's no traffic, then you're not going to pay anything, because you only pay for what you use. But sometimes you have to
use serverful components, things that charge you based on uptime, like RDS or OpenSearch, where you've got a cluster sitting there and you're paying for it by the second even if nobody uses it. So when you're using ephemeral environments, you have to tweak your workflow a little bit to make sure that these serverful components are not created with every single temporary environment. Instead, you have to do some work to make sure that these serverful resources are shared across the ephemeral environments. I've written about this before, so give this blog post a
read afterwards and you'll see why
it's not as bad as you may think. And you can also use these ephemeral environments for your CI/CD pipeline as well, so that when the pipeline runs, you can create a fresh environment every single time and run the tests against it. Then once the pipeline is finished, you can destroy the environment as well, so that again, you avoid polluting your main stages like dev, test, staging and so on. At this point you've probably got a sense that when I talk
about an environment, I don't necessarily mean an AWS account,
because again, when you have your separate accounts for each of your main
environments or stages, you can actually have an
AWS account that hosts multiple environments running
in the same region. And normally I'll do this
against the dev account for all the temporary environments,
because that's where my developers are going to be working most of the time. And depending on your choice of deployment framework,
an environment might be a Serverless Framework project with a serverless.yml, or a CloudFormation stack. Or in the case of CDK, it could be a CDK app that consists of multiple stacks. Or maybe it's a combination of CloudFormation stacks and other things that are outside of CloudFormation, such as infrastructure that is created as part of your landing zone for every single account, or SSM parameters, basically anything that you need to run your
application. So an important part of making this
work is that you need to make sure your resource names don't
clash. And there are essentially two main
things that you need to do to make sure that's the case.
Number one is don't explicitly name any resources
unless you have to, and basically let CloudFormation name them for you, which will make sure that there's some random bit at the end. And number two is that when you have to name a resource, which is the case for EventBridge buses, for example, then make sure you include the name of the environment as a suffix or prefix in the name of that resource, again so that when you create another environment in the same account, you're not going to have any name clashes between resources.
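For example, in a serverless.yml you might name an EventBridge bus with the stage baked in, roughly like this (resource names are illustrative, and the ${sls:stage} variable assumes Serverless Framework v3 or later):

    resources:
      Resources:
        OrdersEventBus:
          Type: AWS::Events::EventBus
          Properties:
            # the stage suffix keeps the name unique across environments in the same account
            Name: orders-bus-${sls:stage}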
And ephemeral environments work really well with remote testing, because again, remote testing requires having those resources in AWS to be able to run your tests against them. And when you have a temporary environment for every single feature you're going to be working on, or for every developer on the team, then it becomes really easy to create a temporary environment for just you or for that feature, make your changes and run your remote tests, so you're able to iterate on your code quickly without having to wait for a deployment between every single small change, and you're able to put breakpoints in your code and debug it and so on. And then once you're done, you're able to promote your code changes and delete the environment, so that again, you avoid polluting the main stages you have as a team. So those are the three things that I think are really important for having an efficient development workflow
when it comes to serverless technologies. And I hope I've given you some ideas in
terms of what you could do to improve your workflow to make
it more efficient and easier for you to work with serverless technologies.
This is a starting point only. There are also other things you need to worry about in terms of actually building and running a production-ready serverless
application. So if you have any questions, please feel free to
reach out to me afterwards, and I hope you enjoy the rest of the conference.