AWS CDK - Best Practices From The Trenches

Abstract

In this talk, I’ll present my take on AWS CDK best practices, gathered from almost three years of taking services from development to production with CDK. I’ll cover topics such as: 1. Project & stack structure 2. Construct guidelines 3. CI/CD guidelines 4. Resiliency & security 5. General development tips

Summary

AWS CDK allows you to write infrastructure as code to describe your resources in the cloud. It's a very powerful tool, but with great power comes great responsibility. We're going to talk about best practices so you don't make mistakes.
Ran Isenberg: Usually when you start your first service, you start with a CDK application. It should be built around one business domain and it should have one stack, not several. Balance is the key here, because complexity is added when you split into multiple repositories.
The AWS Lambda Handler Cookbook is an open source template project that grew out of this work at CyberArk. It lets you create an API Gateway and a Lambda function that writes to a DynamoDB table. It has the CI/CD pipeline, observability and all the best practices for writing Lambda functions. A template like this helps reduce the cognitive load on developers and accelerates development.
Constructs are really easy to share between teams. Usually I see them as a microservice or a nanoservice. But when upgrading shared constructs, be careful not to change the logical IDs of stateful resources.
How do we take an application and split it into constructs? I think it should be driven by business domain. You should still create the Aurora database as its own construct, because it's a very complicated construct. Splitting this way makes the code easier to maintain and to read.
Different environments have different configurations, and that's okay. CDK needs to know how to apply these configuration changes to your environment. Never write secrets hard-coded in plain text in CDK or config files. Don't use Lambda environment variables for storing secrets.
AWS is really thinking about security, but make sure that all your configurations follow security best practices. Use CDK security tests that you run prior to deployment. Write your own IAM policies.
CDK is very powerful, but you need to be responsible. You should always back up your resources. Tags are super important because you can set them at the stack level and they're added to all the resources. We talked about resilience and security and how to share constructs.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hi everybody, my name is Ran Isenberg, and today we're going to talk about AWS CDK best practices. AWS CDK allows you to write infrastructure as code, to describe your resources in the cloud with code: not JSON or YAML files, but actual code. And to me, as a developer, it feels right at home to write code.
I had the pleasure of talking at the first AWS CDK Day about three years ago, and I came there from the perspective of a newbie with CDK. I wanted to share the experience of working with CDK and how it helped to accelerate development at CyberArk. In that use case, I described an audit service POC. You can think of an audit service as a type of ETL log service, where you take a log, alter it, extend the information and save it into a bucket. And here you can see we have lots of serverless services: we have IoT, we have Kinesis, we have Lambdas, we have API Gateways, we have Elasticsearch, which is now OpenSearch, and we have buckets. We didn't have a lot of knowledge and experience with these services back then. And since there's not a lot of business domain logic here, it's just a matter of connecting all the Lego pieces and configuring all the event-driven architecture. So with CDK it took us, me and two other guys, just three days to get all this working, which to me was mind blowing. It really shows how CDK can accelerate your development.
It's a very powerful tool. And as Uncle Ben says, with great power comes great responsibility. Since CDK is such a powerful and flexible utility, it's really easy to make mistakes when you're writing code. I'm using the Python variation of CDK, and you can do basically whatever you want. So it's really easy to make mistakes, and these mistakes can be quite costly. This brings up the agenda of this talk: we're going to talk about best practices so you don't make mistakes.
So we're going to cover CDK app guidelines, construct guidelines, CI/CD guidelines, security and resilience, and general development tips. And we're going to summarize it all.
So let's start with a little bit about myself. My name is Ran Isenberg. I'm a principal software architect at CyberArk, in the platform engineering group. I'm an AWS Community Builder, and I'm the owner of the website ranthebuilder.cloud, where I share my serverless knowledge. You can see the QR code, and you're going to see these QR codes throughout the presentation. This one is the link to my website.
Okay, so let's talk about CDK app guidelines. Usually when you go ahead and start your first service, you start with a CDK application. Usually it should be around one business domain and it should have one stack, not several. I'm going to explain why. This stack is going to have at least one construct. Each construct is going to have multiple resources with configurations and all the relationships between them, and sometimes you can even have relationships between constructs. Maybe there's an event-driven mechanism here. And I usually view constructs as a micro or nanoservice within the application itself. So I think the best practice is usually to have one stack and one business domain, maintained by one team so you don't have conflicts, with one CI/CD pipeline.
The reason I said one stack is this: assume you have several stacks in this application, so you deploy the first stack and then the second stack. What happens if you have a serious issue that you want to fix in the second stack, but for some reason the first stack fails deployment? Now you're stuck. You have increased your blast radius. You have a bug in your first stack and you must solve it before you can solve the really critical issue that you have in your second stack.
So I think it's really for the better to have just one stack in your application and to have a smaller blast radius.
However, in some cases you need to split the application into another application, another repository, another stack, and I can think of two use cases: when a different team will maintain the new application, or when it's a different business domain. It started as a small part of your service, and then over time you realize, hey, it's going to be its own domain, so you're going to move it to its own repository and maintain it there. And you should keep in mind that you shouldn't oversplit. Balance is the key here, because there is complexity added when you have multiple repositories: sometimes services depend on each other. Sometimes you need to develop features that are cross-repository, cross-application. Then you need to develop them with feature flags and coordinate the enablement of these feature flags. Sometimes there is a deployment-time dependency: maybe there's an HTTP API Gateway that one application builds and the other one needs to know about. Then you can use things like SSM and Cloud Map, where one stack publishes a value and the other uses it at deployment time. So it gets more complicated than just having it all in the same repository.
Let's talk about project structure. I believe that you should have three folders: cdk, service and tests. You have the CDK application at the root folder, and the cdk folder will contain the stack and all the constructs. Then you have service, which is the business domain logic, all your Lambda function code, things like that. And then you have tests: you're going to have unit, integration, end-to-end, and the security and CDK infrastructure tests, with examples that we're going to cover later on. And as you can see, I'm a true believer in the DevOps mentality.
Infrastructure as code and the business domain code should reside together, because the developer should have the ownership and the understanding of everything together, from the development stages to production and monitoring.
Okay, so let's say that you've created your amazing template, your amazing application. You have all the best practices, you have an amazing CI/CD pipeline, your project structure is amazing and it works. Now you want to create the second service. So what do you do? Do you just copy all the code from there, duplicate the first repository and manually change it to create another repository? No, it's a lot of work. Like I said, I'm part of the platform engineering group, and what we saw is that by creating a CDK template project, a self-service project, teams can create their service really fast and get started with something that works, that has all the best practices, all the project structure, all the CI/CD pipeline, everything just as it should be. And they can just focus on writing the business domain. We usually provide them with internal training, where we tell them about the internal SDKs that we use, all the best practices for writing Lambda functions, how the CI/CD pipeline works, things like that. And we saw that it really helps to reduce the cognitive load on the developers and accelerate development.
So once I saw that it really works for us at CyberArk, I decided, why not make an open source project out of it? So I created the AWS Lambda Handler Cookbook project, which you can find via the QR code on the top right. It's basically a serverless service template that allows you to create an API Gateway and a Lambda function that writes to a DynamoDB table and uses feature flags based on AppConfig configurations. And it has the CI/CD pipeline and observability and all the best practices for writing Lambda functions and the testing and everything. So you should check it out.
So now let's talk about construct guidelines. In the same way that you don't write all your code in one function, right, you don't just have one file with 10,000 lines, you shouldn't do the same in your stack. You shouldn't define all of the resources in your stack. You should use constructs. Constructs are really easy to share between teams. You can have a best-practice construct that you share between teams to save time. So use constructs. Usually I see them as a microservice or a nanoservice. One exception to keeping resources off the stack is the Lambda layer. If you're using a Lambda layer that is used in multiple constructs, I think it's okay to define it at the stack level. But usually you should just have different constructs that define the micro or nanoservices of your application.
So why did I mention that constructs are really great to share? Usually platform engineers will create and maintain the shareable constructs. You can think about them as organization-approved, security-approved constructs or patterns that you can use across the organization without reinventing the wheel. You can create a library. In my case it's a Python library of CDK constructs, because we use Python, and you can import it into your CDK code and just use it as a black box, so to speak, so it saves time for developers. However, since it's a library, it has a version and you might need to upgrade. And in upgrades, whoever maintains it needs to be careful not to change the logical ID of stateful resources, so you don't get your database deleted. So you should be careful when upgrading and when writing changes that touch logical IDs. We're going to talk about it later on, and I'm going to explain it in more detail.
Okay, some examples of shareable constructs. Maybe you have WAF rules that you want to use for your API Gateway or CloudFront distributions. Maybe you have an SNS to SQS subscription pattern with encryption at rest, which is not enabled by default. You might want to have an AWS AppConfig dynamic configuration construct. Maybe you want to have a Datadog log shipper or PII sanitizers. And you can find more examples at the following links: constructs.dev, Serverless Land, CDK Patterns and AWS Solutions Constructs.
So now that we understand that we need to write constructs, how do we take an application and split it into constructs? I think that it should be driven by business domain. Let's take a look at the following service. We have the CRUD API: an API Gateway that invokes two Lambda functions that write to and read from an Aurora Serverless database. It has its own VPC, networks and all the fun stuff. There's an Aurora stream that triggers a Lambda function that sends a message via SNS. There's an incoming message via SNS to an SQS queue that triggers a Lambda function that, again, reads from the Aurora database.
So how do you go about splitting it into constructs? Like I said, I think it should be business domain driven. We have the CRUD part and we have the database part. I think the database part, even though it's defined in the CRUD, is an internal contract, because the Lambdas there are the only ones who write into the database. So I think they own the database, so to speak. You should still create the Aurora database as its own construct, because it's a very complicated construct and it's really easy to share across the organization. You create it once, and then you can share an Aurora database construct across the whole organization and just have a best-practices, secured Aurora Serverless database.
And on the other hand, you have the messaging, the asynchronous part: you have the SNS and the queue and the two Lambda functions. And again, there are connections between the two constructs. As you can see, the Lambda functions need to access the Aurora database. It's important to understand that there is no right and wrong in this case, because it's all defined under the same stack. The resources are going to get deployed the same way. However, I think it makes more sense to split like this because it makes it easier to find the code, to find the resources in the project itself. It makes the code easier to maintain and more readable. You can choose whatever type of construct split you want, but I think this is a good example of how to do it in a way that makes sense.
Okay, let's talk about CI/CD guidelines. Usually you'd like to model your CI/CD stages in code. Different environments have different configurations, and that's okay. CDK needs to know how to apply these configuration changes to your environment. In my case we use Jenkins, which sets environment variables and injects them into the CDK application. We call it a profile; it can be dev, test, production, whatever. Then the CDK code knows how to read this parameter and apply the different configuration. You can also see in this example that I'm using different accounts: a dev account, a test account and a production account. The reason for that is that you want to have a small blast radius in case of a breach. If somebody hacks into your dev account, you don't want to have your production account jeopardized. Another reason is the AWS resource quota limits. You don't want to reach them, and by using different accounts you're probably not going to get there.
So let's see an example of how it works in CDK. In this case I want to define a table.
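A minimal sketch of what such a profile-driven table definition might look like in Python CDK. This is not the code from the slide: the `PROFILE` variable name, the `TableSettings` helper, the `id` partition key and the billing mode are assumptions made for illustration, and any profile other than `production` is treated like a development environment.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSettings:
    """Per-environment choices for the table (hypothetical helper)."""

    point_in_time_recovery: bool  # continuous backups, wanted in production
    retain_on_delete: bool  # keep the table and its data if the stack is removed


def table_settings(profile: str) -> TableSettings:
    """Map a deployment profile (dev, test, production, ...) to table settings."""
    is_production = profile == "production"
    return TableSettings(
        point_in_time_recovery=is_production,
        retain_on_delete=is_production,
    )


def build_table(scope, construct_id: str, profile: str):
    """Define the DynamoDB table inside a stack or construct."""
    # Imported here so table_settings() can be unit-tested without the CDK installed.
    from aws_cdk import RemovalPolicy
    from aws_cdk import aws_dynamodb as dynamodb

    settings = table_settings(profile)
    return dynamodb.Table(
        scope,
        construct_id,
        partition_key=dynamodb.Attribute(name="id", type=dynamodb.AttributeType.STRING),
        billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST,
        # Newer CDK releases may prefer a different argument for this setting.
        point_in_time_recovery=settings.point_in_time_recovery,
        removal_policy=(
            RemovalPolicy.RETAIN if settings.retain_on_delete else RemovalPolicy.DESTROY
        ),
    )


# The CI system (Jenkins in the talk) is assumed to export PROFILE; default to dev.
PROFILE = os.environ.get("PROFILE", "dev")
```

Inside a stack this could be called as `build_table(self, "OrdersTable", PROFILE)`, where `OrdersTable` is a made-up construct ID. Keeping the per-profile decision in a plain function means it can be checked without synthesizing anything.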
I have the profile environment variable that Jenkins sets. In this case I'm defining a DynamoDB table, and you can see the point-in-time recovery argument. If I'm in the dev environment, I don't want to enable it. I don't care about this database; it's going to be ephemeral. Users use it just for branch development and feature development, and I don't want to back up the database. However, if it's production, I do want to have backups. I want to be able to return to a point in time in case of a crisis or a disaster. The same thing goes for the removal policy. If I'm in a development environment, I want to remove the database when I finish with the stack. But in production, if for some reason there is a mistake and the stack is removed, I want to keep my data. I want to keep my database. So this is an example of how you can use different configurations in your CDK code.
Let's talk about security guidelines. First, secrets: never ever write secrets hard-coded in plain text in CDK or config files. You should store them in GitHub, Jenkins, or whatever you're using as an internal secrets store. Then you can inject the secret into CDK as an environment variable or as a parameter to the constructor of the stack. CDK will use this parameter to deploy it into Secrets Manager or SSM Parameter Store as an encrypted string, and then the Lambda will consume it from SSM or Secrets Manager. The Lambda is going to have an environment variable that tells it the secret name, and of course the correct permissions to get the secret. This is how you should do it. And in the Lambda functions, don't use environment variables for storing the secrets themselves. Don't do that.
Okay, let's talk about resource security configurations. As you can see, AWS is really thinking about security.
Since back in January, new S3 objects are encrypted by default, and DynamoDB has supported encryption at rest for quite a while now. But it's not the same for all resources. What about SNS encryption at rest? You can see that it's disabled by default, and you need to know that and enable it yourself in the CDK code. So security defaults differ by service. AWS gets better at it, but it's your responsibility in the end. You have the shared responsibility model, where AWS takes care of the security of the cloud, but you need to make sure that the security in the cloud is properly defined, because these are your resources, you own them and you need to make them secure. Nobody else is going to do that for you.
So you should make sure that all your configurations follow security best practices. You should have a security review, you should schedule a penetration test from time to time, and you should also use CDK security tests. That's what I'm going to show you now. We're going to use a tool called cdk-nag. These are tests that you run prior to deployment, so you're not going to deploy a stack that has a security misconfiguration and expose yourself to a security hazard. What these tests do is synthesize the CloudFormation template of your stack and then run a bunch of assertions and security checks against it. In this case we have two tests. The first test is going to check for AWS solutions architects' best practices for security measures, and the second one is the HIPAA standard for security checks. If you did something wrong, like an overly privileged role or a bucket that's open to the world, a public bucket, it's going to fail, and you're not going to push the code and deploy something that is risky. So that's very important.
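A sketch of what such pre-deployment checks might look like with the `cdk-nag` Python package and the CDK assertions module. The helper names (`format_findings`, `check_stack_security`, `RULE_PACKS`) are made up for illustration, and the stack is assumed to be created inside a CDK `App` by the caller; the regular expressions follow the rule-ID prefixes that cdk-nag uses for its AwsSolutions and HIPAA Security packs.

```python
# Rule-ID patterns per rule pack, used to filter the synthesized error annotations.
RULE_PACKS = {
    "AwsSolutions": "AwsSolutions-.*",
    "HIPAA Security": "HIPAA.Security-.*",
}


def format_findings(pack_name: str, findings: list[str]) -> str:
    """Build a readable failure message from the findings of one rule pack."""
    lines = [f"{pack_name}: {len(findings)} finding(s)"]
    lines += [f"  - {finding}" for finding in findings]
    return "\n".join(lines)


def check_stack_security(stack) -> None:
    """Fail, before any deployment, if a rule pack reports an error on the stack."""
    # Imported here so this module loads even where the CDK toolchain is missing.
    from aws_cdk import Aspects
    from aws_cdk.assertions import Annotations, Match
    from cdk_nag import AwsSolutionsChecks, HIPAASecurityChecks

    # The rule packs are aspects: they visit every construct during synthesis.
    Aspects.of(stack).add(AwsSolutionsChecks())
    Aspects.of(stack).add(HIPAASecurityChecks())

    # from_stack() synthesizes the stack, which is when the aspects run.
    annotations = Annotations.from_stack(stack)
    for pack_name, pattern in RULE_PACKS.items():
        errors = annotations.find_error("*", Match.string_like_regexp(pattern))
        findings = [str(error.entry.data) for error in errors]
        assert not findings, format_findings(pack_name, findings)
```

In a test suite, a test function would build the app and stack and then call `check_stack_security(stack)`, so a misconfiguration fails the pipeline before anything reaches the account.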
Another thing that I think is very important is to write your own IAM policies. In this example, I want to define a DynamoDB table and I want to provide a Lambda role with the permissions to get an item and put an item into the table. In many cases you can see that people tell you, hey, you should use the table's grant_read_write_data on your role. It's really easy, it's very readable and it works. But what happened is that I wanted to add two permissions to my Lambda function, and by using this function I actually granted something like eight or ten permissions that I don't need. So my role is not least privileged. If somebody gains access to this role, they can do a lot more damage to my DynamoDB table than the access we wanted to give. What you should do is use CDK to write your own inline policies. You can see that you write a policy document and a policy statement. You say, I want to allow only put item and get item on a specific resource, the table ARN of my specific table. You're not going to use an asterisk here. This way we have just the permissions that we wanted, and I think it's going to make you a better developer since you understand IAM policies better.
Let's talk about resilience. Sometimes in CDK people go ahead and refactor the code and move resources from one construct to another, or maybe rename the construct. And sometimes they don't realize that by doing that they change the logical ID of the resource. That means that CDK and CloudFormation are going to delete the resource and create it anew with the new logical ID. That could be a serious issue if we're talking about stateful resources such as tables with actual production data, or maybe a cross-account trust role whose ARN changes, so that now you don't have access from production to the other account.
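To make the risk concrete: in the synthesized CloudFormation template every resource is keyed by its logical ID, so a check can pin the ID of a stateful resource and fail when a refactor changes it. A minimal sketch that works on the template as a plain dictionary (with the CDK assertions module, `Template.from_stack(stack).to_json()` would provide it). The sample template and the IDs `CrudOrdersTable1A2B3C4D` and `CrudRestApi5E6F7A8B` are made up for illustration.

```python
def logical_ids(template: dict, resource_type: str) -> list[str]:
    """Return the logical IDs of all resources of one CloudFormation type."""
    resources = template.get("Resources", {})
    return sorted(
        logical_id
        for logical_id, resource in resources.items()
        if resource.get("Type") == resource_type
    )


def assert_logical_id_present(template: dict, resource_type: str, expected_id: str) -> None:
    """Fail if a critical resource no longer exists under its known logical ID."""
    found = logical_ids(template, resource_type)
    assert expected_id in found, (
        f"{resource_type} {expected_id!r} is missing (found {found}); "
        "deploying would delete it or replace it with a new, empty resource"
    )


# Hypothetical synthesized template, trimmed to the fields the check needs.
template = {
    "Resources": {
        "CrudOrdersTable1A2B3C4D": {"Type": "AWS::DynamoDB::Table"},
        "CrudRestApi5E6F7A8B": {"Type": "AWS::ApiGateway::RestApi"},
    }
}

# Both critical resources are still present under their pinned IDs.
assert_logical_id_present(template, "AWS::DynamoDB::Table", "CrudOrdersTable1A2B3C4D")
assert_logical_id_present(template, "AWS::ApiGateway::RestApi", "CrudRestApi5E6F7A8B")
```

The pinned IDs have to be copied once from a known-good synthesized template; after that, moving or renaming the construct that owns the table makes the check fail before deployment rather than after the data is gone.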
So you can have serious issues by doing something that seems very simple and naive. Another issue that I've encountered, only once to be honest, is that if your CDK code raises an exception that somehow doesn't fail the entire deployment process, you can have entire resources deleted from your stack. You can basically deploy and remove an API Gateway or a bucket and things like that, which is not great.
One way to avoid that is to write CDK infrastructure unit tests. So let's see how you can do that. Again, this runs prior to deployment, so you know you're going to keep your code safe. Here we're going to create the stack and synthesize the CloudFormation template again, and we're going to make some checks. We're going to make sure that our critical resource, the API Gateway REST API, is going to be there. The same thing goes for the DynamoDB table, and we can also add checks to make sure that the logical ID is there and hasn't changed. If it changes, we know that we're basically going to create a new table with zero data in it, which is not great. So this can be a nice safeguard to prevent that.
Another cool utility that you can use is cdk diff. It's open source, you can add it to your pipeline, and what it basically does is visualize new resources and changes to your stack. You can see that a new resource that is added is shown in green and a resource that is deleted is shown in red. It makes it easier to understand whether there is a critical change, or whether somebody is doing something that they shouldn't be doing and changing critical resources. It just gives you better visibility.
Backups. You should use retain policies, like what we saw earlier. It's better to be safe than sorry. You should have the ability to retain the database. Then you can restore the data into the new table in case you deleted it or created a new table instead. And you should always back up your resources. DynamoDB has point-in-time recovery.
The same thing goes for Aurora databases. You can use AWS Backup for your resources so you can recover your lost data in case of a disaster.
Let's talk about some general tips and guidelines. Usually when I'm using a new service in CDK and I'm not really sure how to define it, I go into the AWS console, play around with it and just try to understand how the resources and entities play together and what the relationship between them is. Then it's easier for me to write the CDK code, because I understand the service much better.
The second tip is that sometimes the higher-level constructs, the abstractions that CDK provides, do not expose all the configuration that you might need. Sometimes you need to use the lower abstraction, the Cfn low-level resources. They're, let's say, less easy or less fun to use, but they usually expose all the CloudFormation aspects and configuration, and you can use them to define pretty much whatever you want.
The third tip is tags. Tags are super important, because you can set tags at the stack level and they're added to all the resources. So it's really easy to understand, for all the resources that you see in AWS, who created them, when they were created and what service they belong to. It's really easy to manage your services and your resources like that, or maybe to understand why you have some orphan resources, because they have tags on them.
And lastly, I think the most important tip: we're developers and we like to have cool abstractions and cool factory methods. My tip for you is, don't do it. This is CDK code, infrastructure code. It should be as simple as possible. It should be really readable and easy to use, and you shouldn't make it too complicated. Really, I'm okay with more code duplication if it makes the code easier to read.
So let's summarize. Like we said, CDK is very powerful, but you need to be responsible.
And we covered all the best practices for CDK apps, stacks and constructs, and how to share constructs. We talked about the CDK template and self-service mechanism, and about security and resilience. And that's it. I hope you found it interesting and helpful. You can follow me on Twitter, on LinkedIn and on my website, ranthebuilder.cloud. Thank you very much.

Conf42 Cloud Native 2023 - Online

March 30 2023

Ran Isenberg

Principal Software Architect @ CyberArk
