Transcript
Alright, welcome to Conf42 Cloud Native, and welcome to my talk about serverless development and some patterns for having an efficient development workflow for serverless. So let me start the slides right here. So yeah, this is Patterns for Efficient Serverless
Development. A quick introduction: my name is Yan Cui.
I've been an AWS user since 2010.
Nowadays I work as a developer advocate for Lumigo and
Lumigo is probably the best observability platform for
serverless. The other half of my time I work as an independent consultant
where I help other companies go faster for less by helping
them adopt serverless. And when it comes to having an efficient
development workflow for serverless, I think you really need three
things. Number one, you need to have a good testing approach and
you need to be able to deploy your application easily, and you
also need a way to manage your environment efficiently.
And all three of these things should really help
each other. And as a consultant,
the number one question I get is how do I
test my serverless architecture? And the number
one way that I prefer to do this, especially when
it comes to testing my lambda functions, is what
I would call remocal testing, which is a big part of my day-to-day workflow
when it comes to serverless architectures and how I test them.
When you think about local testing, where you run your code
locally against mocks, it lets you use debuggers to
step through the code, which is very useful, and you can
test your code without having to wait for a full deployment
cycle, and that gives you a really fast feedback loop.
But a problem with testing against mocks locally is that
what you're asking is, is my code doing what I expect it
to do? Not whether my code is actually working.
There is a subtle difference there in that one is
based on reality and the other is based on your expectations
and assumptions, and unfortunately, sometimes your assumptions about how something works are just wrong. And I've seen lots of examples of applications that pass all the tests only to fail the first time they run in AWS, because you're using the same flawed assumptions to write your code and your tests. And so your tests are not checking your expectations against reality, which just has a habit of slapping you back down to earth.
And so local testing is prone to false positives
and gives you a fairly low confidence. And it doesn't cover
a big part of your application like IAM permissions,
API gateway authentication, and basically any direct
integration you have between different services, amongst other
things. All of these are important because your application
is more than just your code. And your job
as a software engineer is to make sure that all of it works. Because the
whole idea is that at the end of it, the user gets something that works.
Not just that, oh, my code works, but some other integration that
I didn't test is not working. Customers are not going to care
if the problem is with your configuration and not your
code. And that's where remote testing comes
in. You're testing against the cloud, so it's far more realistic,
and it covers basically everything along the way,
including IAM permissions and your configurations and security
and so on. So you get far more confidence from these tests that your application actually works the way you expect it to.
But having to wait for a full deployment cycle can
be very slow and painful, and it can really slow down your
feedback loop. And especially if you are making small code changes
constantly. And every time you make a small code change,
you have to wait for a full deployment, which can be really, really painful.
And there are some frameworks that give you some help with this,
but they're fairly limited. And I really don't like the idea of
having to switch frameworks just so that they
make testing a bit easier. And as a consultant, I work with
clients who use all kinds of different tools and languages,
so I prefer approaches that anybody can use with
a minimum amount of custom adaptation. And so the approach
that I take is to combine local testing with remote testing, hence the name remocal testing,
where you run your code locally, but you're talking to real AWS
services as much as possible. And you can
still use debuggers to step through your code because you're running them locally.
And the tests are far more realistic because you are testing your
code against the real AWS services, but you
don't need a full deployment every single time you make a code change,
because again, you're running tests against your code that are executing
locally but talking to remote services.
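To make that concrete, here is a minimal sketch of what such a test can look like, assuming a Node.js project with Jest as the test runner; the module path, table name and environment variable are hypothetical placeholders, and in a real project they would come from whatever stage you've just deployed:

    // The handler code runs locally, but the AWS SDK calls inside it go to the
    // real DynamoDB table that has already been deployed to a dev stage.
    // (Hypothetical paths and names, just to illustrate the idea.)
    process.env.RESTAURANTS_TABLE = 'restaurants-dev-my-feature';

    const { handler } = require('../../functions/add-restaurant');

    test('add-restaurant writes to the real DynamoDB table', async () => {
      // a stubbed API Gateway event with only the fields the handler reads
      const event = { body: JSON.stringify({ name: 'Fangtasia' }) };

      // runs in-process (so breakpoints work), but talks to real AWS services
      const response = await handler(event, {});

      expect(response.statusCode).toBe(200);
    });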
But this kind of testing is still only looking at your lambda functions and how they interact with your downstream systems, like your DynamoDB tables or SNS topics or queues or whatnot. But what about all the stuff that is in front
of your lambda function? A lot of things can still go wrong there,
like IAM permissions, or AppSync resolver templates, or EventBridge event patterns.
Your code might be fine, but if any of these configurations
are wrong, then your application is not going to work properly.
And remember, all of these configurations are still your responsibility,
and you need to make sure that all of them are working together.
And so if I was to look at an API gateway API,
then there are multiple things I want to be able to test for
my lambda functions. I can encapsulate all of my
domain logic into separate modules, and I can unit test them
individually. And then I can use remote code testing to check
my code integration with other things like other AWS services.
And I can also use end to end tests to exercise API
endpoints directly once they've been deployed, which would
invoke my lambda function and check the IAM permissions as part of
those test cases. And when I'm using API gateway
to say validate or to transform the request or
response, or to call other AWS services directly,
then I can still use those end to end tests to check that
any configuration in API gateway is correct.
Similarly, if I'm asking API gateway to do all the authentication and
authorization, then they'll be checked as part of my end to end test.
If I'm using a lambda authorizer, then I can also test that authorizer logic independently with unit and remote tests where
appropriate. So that's the theory. Let's look at
how this can work in practice by looking at a really short demo
and let me switch over to a different view.
And so I've got a repo set up here. As you can
see, I've just deployed my application, and in this case this is using the Serverless Framework, and I've deployed my application to a new stage called conf42.
And here I can execute some tests. I've got some test scripts prepared in my package.json, which allow me to run my tests against whatever environment
I've just deployed to. And I've got some test cases set
up for my application, which consists of an API with API gateway
and some direct service integrations, where API gateway talks to DynamoDB directly without a lambda function.
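The test scripts in package.json are wired up along these lines; the script names, the cross-env helper and the TEST_MODE values are my own illustrative choices, and the deployed stage's URLs and table names would typically be loaded into environment variables separately:

    {
      "scripts": {
        "test:integration": "cross-env TEST_MODE=handler jest",
        "test:e2e": "cross-env TEST_MODE=http jest"
      }
    }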
But I've also got a bunch of lambda functions down below here. Let's look at one that just adds some restaurant data to a DynamoDB table, called the restaurants table in this case. The endpoint is hosted by API gateway and protected with IAM authentication and authorization. And there's a schema for the request, so that once it's deployed, API gateway is responsible for checking the request against this JSON structure I've got here, which says the body of the POST request must have a name property which is a string.
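The schema itself is just a standard JSON Schema document, roughly along these lines (a sketch of the shape described above, not the exact file from the demo):

    {
      "$schema": "http://json-schema.org/draft-04/schema#",
      "type": "object",
      "properties": {
        "name": { "type": "string" }
      },
      "required": ["name"]
    }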
So some of this we can test in our code by checking
that our code in the lambda function is working correctly. In this
case, it's just writing to a DynamoDB table to insert new restaurant data and return a 200.
But to make sure that the whole thing actually works, I also need to have
some coverage for what this API
gateway is going to do to validate the schema of the request.
So to look at my test, I've got some test cases set up.
I've got a test case here that just checks my code,
my lambda function, which I can invoke as part
of remote tests, or in this case I'm calling them integration tests,
but they can also be executed as part of an end to end test.
And I've also got a schema test which just checks that the schema validation is working. And because this can only be done by API gateway in the AWS environment, this will only be triggered as part of an end to end test run. So if you look at the test
here, that allows me to test my code locally. So the
magic here happens in the when module, when we say "when we invoke the add restaurant endpoint". It's going to look at an environment variable called TEST_MODE, and based on what the value is, when we say "when we invoke add restaurant" with some payload, it's going to do something slightly different. So let's look at the case where we want to run the remote tests, which allows us to make changes to my code locally and then execute it, and potentially even put breakpoints in the code so that we can step through any bugs or errors that we come across. What this is doing, via this viaHandler helper function, is requiring the handler module locally, executing it with a stubbed context and event, and then returning the result after some transformation, in this case so that we are dealing with a JSON object as the body as opposed to a string.
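A sketch of what that when module can look like; the TEST_MODE values, file paths and helper names are placeholders rather than the exact code from the demo:

    // tests/steps/when.js (sketch)
    // Decides how to invoke the "add restaurant" endpoint based on TEST_MODE:
    //   TEST_MODE=handler -> require the lambda handler and run it locally (remocal)
    //   TEST_MODE=http    -> call the deployed API Gateway endpoint (end to end)
    const { viaHttp } = require('./given'); // HTTP path, sketched later with the end to end tests

    const viaHandler = async (event, functionName) => {
      const { handler } = require(`../../functions/${functionName}`);

      // execute the handler in-process with a stubbed context, so breakpoints work
      const response = await handler(event, {});

      // parse the body so tests can assert on a JSON object rather than a string
      if (response.body) {
        response.body = JSON.parse(response.body);
      }
      return response;
    };

    const we_invoke_add_restaurant = async (restaurant) => {
      const event = { body: JSON.stringify(restaurant) };

      switch (process.env.TEST_MODE) {
        case 'handler':
          return viaHandler(event, 'add-restaurant');
        case 'http':
          return viaHttp('restaurants', 'POST', restaurant);
        default:
          throw new Error(`unsupported TEST_MODE: ${process.env.TEST_MODE}`);
      }
    };

    module.exports = { we_invoke_add_restaurant };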
So this just allows us to write our test so that we can check that the status code that comes back is 200 and that the ID that comes back is not null. And we're able to do some validation to say that, okay, given the ID that comes back from the add restaurant function that we invoked locally, this new restaurant ID is going to exist in the DynamoDB table. So we can say "then the restaurant exists in DynamoDB" by looking up that ID in the restaurants table and making sure that the restaurant item actually exists. So in this case, if I was to use
the JavaScript debug terminal, I can run the
script I prepared called test integration. Then we can see that test case run. And if the demo gods are with us today, then yeah, all the tests are passing.
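For reference, the test case that just ran is shaped roughly like this, assuming Jest and the AWS SDK v3 DocumentClient; the table name and environment variable are again placeholders:

    // tests/test_cases/add-restaurant.test.js (sketch)
    const when = require('../steps/when');
    const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
    const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');

    const docClient = DynamoDBDocumentClient.from(new DynamoDBClient());

    it('POST /restaurants should add a new restaurant', async () => {
      const response = await when.we_invoke_add_restaurant({ name: 'Fangtasia' });

      // the handler should return a 200 and the new restaurant's id
      expect(response.statusCode).toBe(200);
      expect(response.body.id).not.toBeNull();

      // then the restaurant exists in DynamoDB: look it up by id in the real table
      const { Item } = await docClient.send(new GetCommand({
        TableName: process.env.RESTAURANTS_TABLE, // e.g. restaurants-conf42
        Key: { id: response.body.id }
      }));
      expect(Item).toBeTruthy();
    });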
And what's also nice about this is that I can make some code changes
and I can quickly iterate. Okay, let's say I just comment out this bit of code and we run the test
again. Now the test should fail because the restaurant is no
longer being added to the database.
Okay, so that was unexpected. So let's try and debug this.
So we've got code that's commented out, but it's still somehow passing
the test. So let's check. Okay,
go down here, put a breakpoint in our test, and let's
run the test again and see what's
actually going on. Okay, so the restaurant came back... okay, right. So this is because I've got a bug in my test, not the code itself. The restaurant did come back as undefined, but our test case wasn't handling that. So let's just change the assertion to expect this to be truthy.
And now let's rerun our test case. This time the test should fail, because this should now come back as undefined. And so we can quickly make changes and test them. In this case, let me just revert the change and make sure the test is passing again. So we're able to put breakpoints in our code and in our tests. Okay, let me run this again so you can see the breakpoint hitting my lambda function. We're able to step through the code step by step and quickly debug problems and fix them without having to wait for a full deployment to AWS. I think this time it's just going to time out, because I sat on that breakpoint for a little while.
Yeah, timed out. So if you do need to raise this timeout for when you're debugging through the code line by line, you can do that as well. So hopefully you can see that this is going to make it a lot easier for you to troubleshoot a failing test and iterate on your code gradually. And because we're using temporary environments as well, any changes we're making are only going to this conf42 environment I've just created. And as I mentioned earlier, some things, such as the schema validation, can only be tested if we run against the real deployed AWS services, in this case the API gateway endpoint.
So we can also run the end to end test that I've prepared
and in this case I've got a separate script here called test end to end. This is actually going to call the API endpoints that have been deployed in the AWS environment by making a POST request. In some cases that also involves creating a new user in a Cognito user pool, because some of these endpoints are protected by Cognito.
as well. So we can look at some of these tests there where
we have to create an authenticated user
if the test mode is end to end, and then we invoke the
endpoint when the environment
variable says it's going to be end to end. So we have to then authenticate
the user and then take the user's ID token and use that to invoke
HTTP endpoint in API gateway.
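A sketch of what that end to end path can look like for the Cognito-protected endpoints, assuming a user pool app client that allows the admin auth flow and axios for the HTTP call; the environment variable and helper names are placeholders:

    // tests/steps/given.js (sketch): Cognito auth and the HTTP path used in end to end mode
    const {
      CognitoIdentityProviderClient,
      AdminCreateUserCommand,
      AdminSetUserPasswordCommand,
      AdminInitiateAuthCommand
    } = require('@aws-sdk/client-cognito-identity-provider');
    const axios = require('axios');

    const cognito = new CognitoIdentityProviderClient();

    // given an authenticated user: create a throwaway user in the user pool,
    // give it a permanent password, then sign in to get an ID token
    const an_authenticated_user = async () => {
      const username = `test-${Date.now()}`;
      const password = `Passw0rd!${Date.now()}`;

      await cognito.send(new AdminCreateUserCommand({
        UserPoolId: process.env.USER_POOL_ID,
        Username: username,
        MessageAction: 'SUPPRESS'
      }));
      await cognito.send(new AdminSetUserPasswordCommand({
        UserPoolId: process.env.USER_POOL_ID,
        Username: username,
        Password: password,
        Permanent: true
      }));
      const { AuthenticationResult } = await cognito.send(new AdminInitiateAuthCommand({
        UserPoolId: process.env.USER_POOL_ID,
        ClientId: process.env.USER_POOL_CLIENT_ID,
        AuthFlow: 'ADMIN_USER_PASSWORD_AUTH',
        AuthParameters: { USERNAME: username, PASSWORD: password }
      }));
      return { username, idToken: AuthenticationResult.IdToken };
    };

    // call the deployed API Gateway endpoint, passing the user's ID token when given one
    const viaHttp = async (path, method, body, idToken) => {
      const response = await axios({
        method,
        url: `${process.env.API_URL}/${path}`,
        data: body,
        headers: idToken ? { Authorization: idToken } : {}
      });
      return { statusCode: response.status, body: response.data };
    };

    module.exports = { an_authenticated_user, viaHttp };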
So having test cases like this allows me to have reusable test cases like this one here, where we can run the test as a remote test while we are working on the code. Then once we are happy with the code changes and we're confident they should work, but we want to test the whole thing and make sure that all the configurations are working together, we deploy our changes and run the test case again as an end to end test. Because the test is written in such a way that it's not tied to the implementation, it can be toggled between either a remote or an end to end test just by switching the TEST_MODE environment variable that we configure in the test scripts. So we're able to reuse some of the test cases, which means it takes less effort for us to maintain a large suite of different test cases, and we can use the remote tests to iterate quickly but still have the end to end tests to make sure that we have full coverage of everything that our application is doing. So as you can see, the main downside
of remote testing is that you do need to provision the AWS resources that your code depends on, like the DynamoDB tables, et cetera, before you can run your tests.
So this opens up problems when multiple people need to work on the same project at the same time, and people are going to be stepping on each other's toes. And that's where the use of ephemeral environments comes in, which is easily the most impactful practice that has evolved with serverless technologies. But I'm jumping ahead of schedule here. Before we can talk about ephemeral environments, let's talk about deployments.
Specifically why you should keep your
deployments as simple as possible, but no simpler.
I wish we didn't have to talk about this, but unfortunately I see a lot of clients and students suffer from self-inflicted wounds when it comes to how they deploy their serverless applications.
You see, the lambda service carries some of the blame
here because it's no longer this simple thing,
and nowadays it has a lot of additional features like lambda
layers, or the option to package your function
as container images, or the ability to create your own
custom runtimes, or to use provisioned concurrency to keep a number of lambda workers around all the time so that you can mitigate cold starts. All of those options
are great, and they're useful in some use cases, but I
think just because they're there doesn't mean that you have
to use them. In fact, I'll go as far as to say that
for 90% of use cases you shouldn't use any of these
options. And for lambda layers I will go
even further and say that you shouldn't use them to share code between
lambda functions at all, because they complicate your deployments
and make things more difficult than they need to be without
really giving you any meaningful return on investment.
They don't support semantic versioning, and because they exist
outside of your language's ecosystem, security scanners
don't know about them and can't automatically scan them
and check them against their database of known vulnerabilities.
And you're limited to just five lambda layers per
function, and they still count towards your lambda function's 250 meg size limit once it's been unzipped.
So they don't help you mitigate some of those lambda limits. And again, because they exist outside of your language's ecosystem, it's going to make it harder for you to test your code locally as well, if some of your dependencies exist outside of your local execution environment and only exist in AWS as lambda layers, because your language runtime doesn't know about them. It's then up to you to find a way to bring them into your local development environment so that you are able to execute your code locally. And they were designed to
help with the likes of Python and JavaScript, and they
don't really work for statically compiled languages like Java or .NET that require your dependencies to be available at compile time. And honestly,
compared to package managers like NPM,
it's just more work to publish updates to your shared code
and then to bring them into where they're needed in your local development
environment, as well as into your project, and for anyone who's using
JavaScript, because again, they exist outside of your
NPM ecosystem, they don't really work with bundling and tree shaking either. So that's a whole laundry
list of reasons why lambda layers just don't add any value.
So instead, if you're sharing code between functions in the same
project, I would just put them in a folder and reference them directly during deployment. Just make sure that both your lambda function's handler module as well as your shared library modules are included in the same zip file, or are bundled into a single file if you're using a bundler. And this is supported by most frameworks like SAM or the Serverless Framework or CDK.
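In practice that can be as simple as something like this, with a plain lib folder next to your functions (file names and contents are purely illustrative):

    // lib/greeting.js: a shared module, just a plain file in the same repo
    module.exports.greet = (name) => `Hello, ${name}!`;

    // functions/hello.js: a lambda handler that requires the shared module directly;
    // both files end up in the same zip, or in the same bundle if you use a bundler
    const { greet } = require('../lib/greeting');

    module.exports.handler = async (event) => {
      const { name } = JSON.parse(event.body);
      return { statusCode: 200, body: JSON.stringify({ message: greet(name) }) };
    };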
And if you need to share code between different projects, then publish it to NPM and use a private NPM registry if you
need to. It's simple and it's what we do already outside
of the context of lambda, and it works just fine
when you're writing lambda functions as well.
And the next thing I'll say about deployment is that as much as
possible, you should just stick with using zip files and managed runtimes. There are times when you need to package your function as a container image, for example when your application is bigger than
the 250 meg size limit, but it also means
that you become responsible for the runtime as well as
your code and your dependencies. And wherever possible,
I want to delegate responsibilities to the cloud so
I can focus on just my application. Which is why
for an efficient development flow, I recommend that you
stick with using zip files and managed runtimes, and stay
away from using lambda layers.
And that brings us to how do you manage your AWS environments?
I will start by just saying that, okay, you should have at a minimum
one account per stage so that you have one account for
Dev, one account for test, one account for staging,
and a separate account for production, so that if there are any problems in terms of throughput or in terms of a security breach, they are contained to a single account, and so that if your dev account gets compromised, at least the attacker won't be able to access your user data in production. And for large organizations with many different teams, I would go a bit further and say that you should have one account per team per stage,
so that different teams are also insulated from
each other, and if one team makes a mistake or they have a really busy
service, they're not going to use up all the available throughput in the
account for everybody. That said, if you've got, say, one team with different workloads, some of them more business critical than others, or some with higher throughput than others, then I also recommend putting those business critical workloads into their own separate accounts. So for a particular service you may have dev, test, staging and production accounts just for that service, while for the team and all of the other services that it maintains, you have dev, test, staging and production accounts that all the other services share. That way, the less critical workloads have one set of dev, test, staging and production accounts, and the things that are really business critical or have much higher throughput have a separate set of accounts.
And as I mentioned earlier, the use of ephemeral environments, or temporary environments, is perhaps one of the most impactful things you can do to improve your development workflow when it comes to working with serverless technologies. And it can be as simple as this, using the Serverless Framework as an example: when you start to work on a new feature, create a new environment.
And within the Serverless Framework, you just run the deploy command with a flag to override the stage, and name your stage after the feature, so dev-my-feature for example. This way it's going to create a completely new environment with all of your functions and DynamoDB tables and whatnot, so that you are able to iterate on your code changes and run your remote tests, or end to end tests as well, against this temporary environment that you've created just for this feature. And once you're ready, you can commit your code changes, submit your PR, and as part of the CI/CD pipeline you're going to run all of the tests against your changes. And when you're done, you can just run the serverless remove command against your temporary environment, in this case using the stage override in the CLI to say use the stage name dev-my-feature, and this would delete your temporary environment as if it never existed in the first place.
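Concretely, with the Serverless Framework that's just the stage flag on deploy and remove; the stage name here is a made-up feature name:

    # create a temporary environment for the feature
    npx sls deploy --stage dev-my-feature

    # iterate, run remocal and end to end tests against it, then tear it down
    npx sls remove --stage dev-my-feature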
Because all of your tests are run against your temporary environment, you've kept your main stages like dev, test, staging and production clean, and so nobody needs to clean up any data. And because every time you start to work on a new feature, every developer can have their own environment to work in, there's no need to worry about people stepping on each other's toes either, because you're able to work against your own insulated environment and basically stay out of each other's way. And you're able to keep these shared environments like dev, test and staging clean by not polluting them with test data, because all of your work and all of your tests are done against those temporary environments. And one of the really nice things about using
temporary environments, or ephemeral environments, with serverless components is that because you've got usage-based pricing, you can have as many of these environments as you need and there's no extra cost overhead. Those environments can just sit there; if there's no traffic, then you're not going to pay anything, because you only pay for what you use. But sometimes you have to
use serverful components, things that charge you based on uptime, like RDS or OpenSearch, where you've got a cluster sitting there and you're paying for it by the second even if nobody uses it. So when you're using ephemeral environments, you have to tweak your workflow a little bit to make sure that these serverful components are not created with every single temporary environment. Instead, you have to do some work to make sure that these serverful resources are shared across the ephemeral environments. I've written about this before, so give this blog post a
read afterwards and you'll see why
it's not as bad as you may think. And you can also use these ephemeral environments for your CI/CD pipeline as well, so that when the pipeline runs, you can create a fresh environment every single time and run the tests against it. Then once the pipeline is finished, you can destroy the environment as well, so that again, you avoid polluting your main stages like dev, test, staging and so on. At this point you've probably got a sense that when I talk
about an environment, I don't necessarily mean an AWS account,
because again, when you have your separate accounts for each of your main
environments or stages, you can actually have an
AWS account that hosts multiple environments running
in the same region. And normally I'll do this
against the dev account for all the temporary environments,
because that's where my developers are going to be working most of the time. And depending on your choice of deployment framework,
an environment might be a Serverless Framework project with a serverless.yml, or a CloudFormation stack. Or in the case of CDK, it could be a CDK app that consists of multiple stacks. Or maybe it's a combination of CloudFormation stacks and other things that are outside of CloudFormation, such as infrastructure that is created as part of your landing zone for every single account, or SSM parameters, basically anything that you need to run your
application. So an important part of making this
work is that you need to make sure your resource names don't
clash. And there are essentially two main
things that you need to do to make sure that's the case.
Number one is don't explicitly name any resources
unless you have to, and basically let CloudFormation name them for you, which will make sure that there's some random bit at the end. And number two is that when you have to name a resource, which is the case for EventBridge buses, for example, then make sure you include the name of the environment as a suffix or prefix in the name of that resource, again so that when you create another environment in the same account, you're not going to have any name clashes between resources.
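For example, in a serverless.yml you might name an EventBridge bus with the stage baked in, roughly like this (resource names are illustrative, and the ${sls:stage} variable assumes Serverless Framework v3 or later):

    resources:
      Resources:
        OrdersEventBus:
          Type: AWS::Events::EventBus
          Properties:
            # the stage suffix keeps the name unique across environments in the same account
            Name: orders-bus-${sls:stage}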
And ephemeral environments work really well with remote testing, because again, remote testing requires having those resources in AWS to be able to run your tests against them. And when you have a temporary environment for every single feature you're going to be working on, or for every developer on the team, then it becomes really easy to create a temporary environment for just you or for that feature, make your changes and run your remote tests, so you're able to iterate on your code quickly without having to wait for a deployment between every single small change, and you're able to put breakpoints in your code and debug it and so on. And then once you're done, you're able to promote your code changes and delete the environment, so that again, you avoid polluting the main stages you have as a team. So those are the three things that I think are really important for having an efficient development workflow
when it comes to serverless technologies. And I hope I've given you some ideas in
terms of what you could do to improve your workflow to make
it more efficient and easier for you to work with serverless technologies.
This is a starting point only. There are also other things you need to worry about in terms of actually building and running a production-ready serverless
application. So if you have any questions, please feel free to
reach out to me afterwards, and I hope you enjoy the rest of the conference.