Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone.
Thank you for taking the time out of your busy day to join us on this session about how to save money on your AWS serverless environment. So without further ado, let me just switch over to my slides and we can get started right away.
So a quick word about myself.
I'm an AWS Serverless Hero. My name is Yan Cui, and I've been doing stuff on AWS since 2010. Nowadays, I spend half my time working with Lumigo as a developer advocate, and the other half of my time I work as an independent consultant, where I help other companies adopt serverless technologies. And one thing I like to do is collect tips on how to save money on your AWS environment.
And so I've got lots of ideas to share with you today.
So I hope you are ready to drink from the fire hose as we go through a number of different ways you can save money on your AWS serverless environment.
So we're going to start with something very simple that I think everybody should be doing. Even if you are new to AWS, this is probably the first thing you should do when you create an AWS account: set up billing alarms. They're not perfect. They're usually a few hours behind. But it's much better to find out that you've got a problem a few hours late than, say, a few weeks late, when your bill finally arrives and you've got a much bigger amount to pay than if you had found out a few hours into the issue.
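To make this concrete, here's a minimal CDK sketch of what such a billing alarm might look like. The threshold here is just an example, and note that the AWS/Billing metrics are only published in us-east-1:

```ts
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

class BillingAlarmStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    // Billing metrics only exist in us-east-1.
    super(scope, id, { env: { region: 'us-east-1' } });

    new cloudwatch.Alarm(this, 'BillingAlarm', {
      metric: new cloudwatch.Metric({
        namespace: 'AWS/Billing',
        metricName: 'EstimatedCharges',
        dimensionsMap: { Currency: 'USD' },
        statistic: 'Maximum',
        period: cdk.Duration.hours(6),
      }),
      threshold: 2000, // example: alarm when estimated charges pass $2,000
      evaluationPeriods: 1,
    });
  }
}
```

You'd typically wire the alarm to an SNS topic so somebody actually gets notified.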
So Luc and his team at PostNL are pretty experienced with AWS and serverless, but they were still caught out by a mistake which caused their AWS costs to spiral. In this particular case, it was because of a one-line change in their code, which caused their Lambda functions to make a lot of API calls to Secrets Manager. The reason was that this one-line change broke their caching. So instead of making a call to Secrets Manager at cold start and then caching the secret, they were making a call to Secrets Manager on every single invocation. At their scale (they are the national delivery service for the Netherlands), you can imagine that's hundreds of millions of requests per day, which can add up pretty quickly. That's why, within a few days, they got an alert that triggered their billing alarm in AWS, which I think was set to something like $2,000.
Yeah.
That's the monthly budget for their team's whole AWS environment.
So luckily they were able to find out that this problem was happening within a few days, as opposed to after a few weeks, when the damage could have been much bigger.
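For illustration, here's a rough sketch (not PostNL's actual code) of the pattern that the broken change would have violated: fetch the secret once, outside the handler's hot path, and reuse it across warm invocations. The secret name is made up:

```ts
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({});
let cachedSecret: string | undefined; // survives across warm invocations

async function getSecret(): Promise<string> {
  if (!cachedSecret) {
    // Only call Secrets Manager on cold start (or after the cache is
    // reset), not on every invocation.
    const res = await client.send(
      new GetSecretValueCommand({ SecretId: 'my-app/db-password' }) // hypothetical
    );
    cachedSecret = res.SecretString!;
  }
  return cachedSecret;
}

export const handler = async () => {
  const secret = await getSecret(); // warm invocations hit the cache
  // ... use the secret ...
};
```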
So the learning here is: yes, billing alarms are not perfect, and they are a few hours behind, but it's much better to find out that you've got a problem early on than much later. And billing alarms do work. In this case, yes, they did suffer some damage, a few thousand dollars, but it could have been a lot worse if they had only found out when the finance team came knocking on their door to ask, "What's going on, guys? Your bill is much bigger now compared to what it was last month." So billing alarms work. It's just that they're not perfect.
So when it comes to billing and the cost of AWS, one of the biggest offenders, something that almost always comes up as number one or two on my customers' bills, is CloudWatch, specifically CloudWatch Logs. Very often, as I work with my consulting clients, I see CloudWatch costing a few times, maybe even 10 times, more than the actual application itself. When you consider the cost of, say, API Gateway, Lambda functions, and DynamoDB tables, CloudWatch is often much, much higher. And keep in mind that as your cost goes up, because you're collecting more and more logs and more and more data in CloudWatch, the value you get from those logs actually goes down, because now you're getting more and more noise you have to deal with, and it's harder for you to find the piece of information you actually need to debug problems in production. CloudWatch is also just not very good at surfacing the really valuable and actionable information from all this data that you're collecting.
So the number one thing I do to keep my CloudWatch Logs cost under control is structured logging. And I pay really close attention, every time I write a log message, to which log level that message should be recorded at. Because in production, you don't need all the debug logs when there are so many requests happening at the same time. Instead, in production, you probably just need to record everything at info level or above, so that you don't have all of these debug logs that don't give you a lot of value, while you're paying for every single log message that CloudWatch Logs collects.
However, sometimes, especially when things go wrong, those debug logs can be really, really useful in helping you figure out what the problem was. So even though in production you want to disable debug logging, you also want to sample some percentage of your debug logs in production. If you collect, say, 10 percent or 5 percent of the debug logs across all of the invocations of a Lambda function, you hopefully have enough debug logs to cover every single code path. So when there's a problem in production, you've got some debug logs that can tell you what the problem was, and you don't have to go back to your code, re-enable debug logging, redeploy to production, wait some time for the debug logs to be collected, figure out the problem, and then disable debug logging again. Instead, you want to always be sampling some percentage of debug logs, such that you've got enough information to figure out problems in production, but not so much that you end up paying a disproportionate amount of money for debug logs that don't add a lot of value.
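As a minimal sketch of what this can look like, the Lambda Powertools logger supports this kind of sampling (the service name and sample rate here are just examples):

```ts
import { Logger } from '@aws-lambda-powertools/logger';

// Log at INFO in production, but sample ~10% of invocations at DEBUG so
// there are always some debug logs covering every code path.
const logger = new Logger({
  serviceName: 'orders-api', // hypothetical
  logLevel: 'INFO',
  sampleRateValue: 0.1,
});

export const handler = async (event: unknown) => {
  logger.debug('raw event', { event }); // only emitted for sampled invocations
  logger.info('processing request');
  // ...
};
```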
Another thing to keep in mind when working with CloudWatch Logs is that, by default, log retention is set to never expire, because CloudWatch doesn't want to delete your data without you telling it that you're okay with that. This is fine from their perspective, but from my perspective, the value of the logs goes down as time goes by, and there's no reason for me to keep logs that are older than, say, 30 days. Especially as the application continues to evolve and change, those logs become more and more outdated as time goes by. So you want to change your log retention to something more reasonable, such as seven days, 14 days, or 30 days.
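In CDK, that might look something like this, assuming fn is your function construct and this is a stack:

```ts
import * as logs from 'aws-cdk-lib/aws-logs';

// Create the function's log group explicitly, with a finite retention,
// instead of relying on the default "never expire" behavior.
new logs.LogGroup(this, 'FnLogGroup', {
  logGroupName: `/aws/lambda/${fn.functionName}`,
  retention: logs.RetentionDays.TWO_WEEKS,
});
```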
You also don't want to pay the storage cost of 3 cents per gigabyte on all the logs that you've ever produced, forever, especially when those logs essentially become useless after a few weeks. Another thing to keep in mind is that many of you are not using CloudWatch Logs to query your data. You are shipping your logs from CloudWatch to some other third-party service, and querying them there. So in this case, as the Lambda function produces those logs, you forward them to, say, Logz.io or some other platform, and then you analyze them in those platforms. But you still end up having to ingest the logs into CloudWatch first, and therefore you still have to pay that 50 cents per gigabyte of ingestion cost for CloudWatch. So you probably end up paying twice for the ingestion and processing of those logs, which of course is a waste.
Nowadays, when it comes to Lambda, you can use Lambda extensions to ship all of your logs to a third-party provider via the Telemetry API. Lambda extensions are like sidecars to your main Lambda runtime: they can access the logs from your function and then send those logs to a third-party vendor. And you can do this without going through CloudWatch Logs first. Once you've done that, you can also add a bit of IAM permission to your Lambda function's IAM role to stop the function from sending those logs to CloudWatch, so that you don't end up paying for the ingestion of the same logs at CloudWatch as well. This is a much better way to send your log information to another vendor, instead of having to go through CloudWatch Logs first and then processing it from there.
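The IAM change itself is small. Here's a sketch of what it might look like in CDK, assuming fn is your function construct:

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Once logs are shipped via a Telemetry API extension, deny the function's
// role the ability to write to CloudWatch Logs, so you don't pay for
// ingesting the same logs twice.
fn.addToRolePolicy(
  new iam.PolicyStatement({
    effect: iam.Effect.DENY,
    actions: [
      'logs:CreateLogGroup',
      'logs:CreateLogStream',
      'logs:PutLogEvents',
    ],
    resources: ['*'],
  })
);
```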
And if you're looking to move away from CloudWatch Logs and you're looking for another vendor that's more cost-efficient and allows you to do more with your log information, then check out Lumigo. Its log management system is a lot cheaper compared to CloudWatch, and it treats every single log message as an event, so that you can then create arbitrary metrics on demand, and alerts on top of them. And it's all included in the price for the ingestion of those logs, so you don't pay separately for logs, alerts, dashboards, and so on. Staying with CloudWatch Logs, another thing I want to mention: remember those system messages that you get? After every single invocation, you get a number of log lines in your Lambda function's CloudWatch logs. For most of you, this is probably not going to matter very much. But if you're running at scale, for example doing billions of invocations per month like the folks at Fathom Analytics, then those system messages can actually end up costing a non-trivial amount of dollars per month. And of course, they don't really give you a lot of value in return for that investment.
So nowadays there's a way for you to control what information is included and how many of those system log messages get produced by Lambda. Lambda now has a logging config setting on your function, which you can configure through the CLI, through CloudFormation, and through CDK and other tools that use CloudFormation, whereby you can set the log format for your function. By default, this is still going to output plain text, so whatever your function writes to standard out is kept as plain text. But you can also switch to JSON, so that the Lambda runtime captures whatever information you're sending to standard out and formats it into a JSON blob. You kind of get structured logging without actually doing any structured logging in your application code. Now, where it gets interesting is that once you set the log format to JSON, you can also configure a system log level, which controls which of the system log messages are actually produced by the Lambda runtime. This is not in the official documentation, but I did some experimentation to find out which of the messages are produced at which of the system log levels. So if you just want to know whether there was an unhandled exception, where your application code didn't capture an error and it bubbled up to the Lambda runtime and blew up, you can set the system log level to WARN. That way you will still get those unhandled exceptions in the logs, but none of the other system messages for when your function starts, when it finishes, how much memory was used, and so on and so forth.
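Here's a sketch of that logging config in CDK; the exact property names (they were added fairly recently) may vary with your CDK version, so double-check against the docs:

```ts
import * as lambda from 'aws-cdk-lib/aws-lambda';

const fn = new lambda.Function(this, 'Fn', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('dist'),
  // JSON log format is a prerequisite for setting log levels.
  loggingFormat: lambda.LoggingFormat.JSON,
  // Keep unhandled-error messages, drop the per-invocation
  // START/END/REPORT system messages.
  systemLogLevelV2: lambda.SystemLogLevel.WARN,
});
```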
So CloudWatch is one of those services that everyone should really know if they're going to be using AWS, especially if you're going to be using Lambda. So I recommend this book by Sandro and Tobias, who do a really good job of explaining how CloudWatch works. Even if you don't end up using CloudWatch for your day-to-day querying of logs and things like that, it's still worth understanding how it works and all the different things that CloudWatch gives you nowadays.
Okay, so moving on. The next thing we're going to talk about is still around Lambda, but specifically the cost of Lambda functions. Those of you who have used Lambda before probably know this already: with Lambda, you have basically one lever to control the performance and the cost of your function, which is how much memory you allocate to it. More memory equals more CPU, and more network bandwidth as well. But the more memory you allocate, the more you're going to spend per millisecond of execution time, and that cost is proportional to the amount of memory you allocate to the function. On the one hand, that means it's really easy to just give your function more power so that you can process requests faster. But at the same time, it's also very easy to be wrong by an order of magnitude. It probably won't happen very often, but when it does, those over-provisioned functions can hit you pretty hard on the finance side of things.
This actually happened to a client of mine a while back. They produce those hand-drawn tutorial videos that you sometimes see on YouTube, and they use a Lambda function to do some of that rendering. Now, the team understood that more memory equals more CPU power, and they wanted to reduce the amount of time it takes to do the rendering. So they decided to allocate the maximum amount of memory to the Lambda function and gave it the full 10 GB. At the end of the month, they found out that this one rendering Lambda function was now costing them something like $10,000 a month. Something was clearly wrong. The reason is that they had allocated the full 10 GB of memory to the function, but it didn't reduce the rendering time proportionally. And the reason for that is that while with Lambda, yes, more memory equals more CPU, once you get to about 1.8 GB of memory, you unlock a second CPU core. By the time you hit the full 10 GB of memory allocation, you actually have six CPU cores.
So to take full advantage of all the CPU you have, you have to write your application in a way that allows you to process things in parallel, using multiple CPU cores. Unfortunately for them, they were using Node.js for the rendering. By default, Node.js runs on a single-threaded event loop, so you have to write your application specifically using worker threads and child processes in order to parallelize the work and take full advantage of the fact that you've got six CPU cores instead of one very, very big CPU core. And they weren't doing that. So even though they were paying for 10 GB of memory and all the CPU power that comes with it, they were only able to use essentially 1.8 GB of what they were paying for. That's why, when it came to helping this client, we just made a very simple decision to reduce the memory allocation to 1.8 GB, so they get the full benefit of what they're paying for. That gives a much more efficient rendering process, without paying for CPU power they weren't really using.
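For reference, here's a minimal sketch (nothing like their actual rendering code) of how you might spread CPU-heavy work across cores with Node.js worker threads:

```ts
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';
import os from 'node:os';

// Stand-in for the CPU-heavy work (e.g. rendering a batch of frames).
function renderChunk(chunk: number[]): number[] {
  return chunk.map((x) => x * x);
}

if (isMainThread) {
  const frames = Array.from({ length: 1200 }, (_, i) => i);
  const cores = os.cpus().length; // ~6 vCPUs at 10GB in Lambda
  const chunkSize = Math.ceil(frames.length / cores);

  // One worker per core, each taking a slice of the work.
  const tasks = Array.from({ length: cores }, (_, i) =>
    new Promise<number[]>((resolve, reject) => {
      const worker = new Worker(__filename, {
        workerData: frames.slice(i * chunkSize, (i + 1) * chunkSize),
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    })
  );

  Promise.all(tasks).then((parts) => console.log(parts.flat().length));
} else {
  parentPort!.postMessage(renderChunk(workerData));
}
```

Without something like this, the extra cores you're paying for at high memory settings simply sit idle.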
As the great Donald Knuth once said, we should forget about small efficiencies, say about 97 percent of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3 percent. And that's where, for that one client, 99.9 percent of their Lambda functions were not doing anything significant in terms of cost, but that one single function was accounting for over 99 percent of the actual cost of their serverless environment. So identifying that critical 3 percent, and really focusing on how to improve the performance and efficiency of that critical 3 percent, is super important.
Again, in Lumigo, you can see all the functions you have across all the different regions, and you can sort them by cost, so that you can really quickly identify outliers in your environment, whether you've got Lambda functions that are disproportionately represented in your cost allocation. You can look at how much memory is allocated to a function, how much memory is being used on average when running it, and how often the function runs, to give you an idea of which functions require some special attention in terms of optimizing and right-sizing the memory allocation.
For actually right-sizing the memory setting of your Lambda function, I think the best tool you can use is the Lambda Power Tuning tool from Alex Casalboni, who used to work at AWS. The Lambda Power Tuning tool is a Step Functions state machine that takes your function, produces different copies of it with different memory settings, runs a number of executions against those variants, and, based on the performance and cost of those runs, finds the sweet spot that gives you the most bang for your buck in terms of the cost of the function and how much performance you get.
Another good way to reduce the cost of Lambda functions is to use the ARM architecture instead of x86, because per millisecond of execution time, ARM is actually 25 percent cheaper. However, performance is going to differ depending on what you're doing. Some people report that, for their workload, ARM is actually faster and also cheaper per millisecond. But for some of the things that I've tested, ARM can sometimes be 60 percent slower than x86 for the same workload. Now, if you are saving 25 percent per millisecond, but you're spending 60 percent more milliseconds to process the same thing, then you actually spend more money on ARM than on x86, so in that case it's not a good idea. Where I find ARM is really good is where you've got functions that have to talk to third-party services, and maybe those third-party services are quite slow, so you spend a lot of time waiting on IO. If you're just going to wait anyway, you might as well switch to ARM, where the cost per millisecond is cheaper, so the cost of that wait time is going to be less as well.
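Switching is usually a one-line change; in CDK it's just the architecture property. Benchmark your own workload before and after, since, as I said, the results vary:

```ts
import * as lambda from 'aws-cdk-lib/aws-lambda';

const fn = new lambda.Function(this, 'IoBoundFn', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('dist'),
  // ~25% cheaper per millisecond; a good fit for IO-bound functions.
  architecture: lambda.Architecture.ARM_64,
});
```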
Still staying with Lambda functions, another big thing, probably the number one anti-pattern when it comes to serverless environments, is direct synchronous Lambda to Lambda invocations. One thing to keep in mind about Lambda is that every single invocation of a Lambda function goes through its Invoke API, and there are different ways you can invoke it. That's why there's an invocation type attribute, where you can say, "I want a request-response invocation," in which case the caller calls the Lambda function and has to wait for the whole invocation to finish and get the output from the invocation as the response of the Invoke API call. That's a synchronous invocation. But you can also have an asynchronous invocation by setting the invocation type to event. In that case, as a caller, I call the function with the Invoke API and I get a response back right away, but the function is not going to run right there and then. It's going to go through an internal queue and run at some point. So I don't get the output of the function from calling the Invoke API, but the function is going to actually execute asynchronously.
When it comes to synchronous Lambda to Lambda invocations, it's pretty much always a sign of bad design, and it also has cost implications. When the first function runs and calls the second function synchronously, it has to wait for the second function to finish executing to get its output in the response. That means for the entire duration of the second function, the first function is still running and just waiting. So you're actually paying for execution time twice: both for the first function, the caller, and for the second function, the callee. Again, I don't mind spending money, but I want to get value for what I'm spending on. I hate waste. I hate paying for things that don't provide any value. In this case, I'm not really getting any value from having two functions running at the same time. And especially when you've got two functions, one calling another, inside the same service boundary, where I own both functions, I actually don't need them to be two separate Lambda functions. I can just get rid of the second function and do whatever it needs to do inside the first function. Because I own everything, I can reorganize things and still have that modularity at the code level, without having to also have that modularity at the Lambda function, or infrastructure, level. I mean, these things are called Lambda functions, but you shouldn't confuse them with functions in programming. You can still have modularity at the code level without having to enforce the same modularity at the infrastructure level.
But what if your Lambda to Lambda calls are across service boundaries? Now this would be one service providing some capabilities to other teams and other services by exposing a Lambda function that others can call and invoke directly. This is an even worse idea, because now you are binding your consumers, your API or service consumers, to implementation details that can easily change. The fact that you are using a Lambda function, what region you're in, what your function is called: all of these things are implementation details, and you should be able to change any and all of them without forcing downstream systems to change as well. If you want to give them some capabilities, just give them an API that they can call. The fact that you are using a Lambda function behind the API is just an implementation detail. Maybe today Lambda makes sense for you, but maybe tomorrow you're handling such high traffic that it makes more sense to take your workload and move it into, say, a Fargate container. You can do that without impacting your callers, because they are still talking to the same HTTP API. They're using the same HTTP API contract to talk to your service, and that's a stable interface. What you do behind the HTTP API is entirely your business. It's implementation details that you can change and control.
Okay, so we've established that synchronous Lambda to Lambda calls are an anti-pattern. But what about asynchronous invocations? Because, as we talked about earlier, when you invoke a function, you can do so synchronously, but you can also do it asynchronously. Well, there are some legit use cases where I think asynchronous invocations are a good idea. For example, say I've got a user-facing API that handles a user request, and the request is to save the user's profile updates in the database. Okay, my function is going to do that. But maybe I have some secondary responsibilities as well, such as tracking some events for business analytics and what have you. Those are not things the user really cares about, so I don't want to do them while the user is waiting for a response. What I can do is take those secondary responsibilities, move them into a second function, and invoke that second function asynchronously, so I don't have to wait for it to finish. This allows me to build a better user experience, because my user-facing API function can respond to the caller faster, without having to wait for all those secondary responsibilities to complete. It also allows me to build more robust error handling for those secondary responsibilities. Because for asynchronous invocations, Lambda gives you two retries out of the box, as well as dead letter queue support. So if something goes wrong, and I care about making sure those things happen, I can use the dead letter queue to capture any failed invocations and retry them later. Say I'm talking to a third-party service that had a temporary outage: I can use the dead letter queue to capture the events that failed and reprocess them when the system comes back online. And that's with the assumption that what I've got here is within the same service boundary, so whatever I'm doing in the second function are things that would have been part of my API service already.
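As a sketch, this is roughly how you might wire up those retries and a failure destination in CDK, assuming this is a stack and analyticsFn is the secondary function's construct:

```ts
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as destinations from 'aws-cdk-lib/aws-lambda-destinations';

const dlq = new sqs.Queue(this, 'AnalyticsDlq');

// Async invocations get two retries; anything that still fails lands in
// the queue so the events can be replayed later.
analyticsFn.configureAsyncInvoke({
  retryAttempts: 2,
  onFailure: new destinations.SqsDestination(dlq),
});
```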
And what about asynchronous invocations across service boundaries? That is still going to be a bad idea. You never want to expose capabilities to other services in the form of a Lambda function that someone can call directly, either synchronously or asynchronously, because again, you're tying them to implementation details on your side. So, in general, are async Lambda to Lambda invocations okay? Well, it really depends on what you're doing. As I said, there are some legit use cases where I think they are a good idea. But it's really important to follow the principle that, when you look at every single component in your architecture, they should all serve a purpose and they should all provide some return on investment. You shouldn't do it just for the sake of it.
And talking about looking at your architectural components and making sure that everything has a return on investment, a good real-world anecdote from the Fathom Analytics guys is this: they had a system with an ingestion API, backed by a Lambda function, that would put something into a queue, and then they'd process it with another Lambda function subscribed to that SQS queue. What they found was that by simplifying this setup, removing the queue and the second function from the architecture, and just doing everything in that first function at the ingestion API, they actually saved a lot of cost. They also improved the performance of the system, because there were fewer moving parts in the architecture, and they saved quite a bit of cost associated with SQS and that second Lambda function. So by simplifying your architecture, you can sometimes save on cost as well, especially in a serverless environment where you're paying for every single request that you're processing.
When it comes to cost efficiency, scalability, and performance, caching is probably one of the most powerful and most underutilized tools. For me, it's almost like a cheat code for building performant and scalable applications. And for your typical web API, there are so many different places where you can apply caching. Consider a typical user-facing API: you've got your client talking to an API behind some CDN like CloudFront, and the API is backed by a Lambda function and some DynamoDB table. You can do client-side caching for static assets or configurations that don't really change. You can also do API response caching at the edge, with CloudFront in front of your API Gateway. And if there's anything computationally expensive that your application is doing, you can do application-level caching in the Lambda function, perhaps using something like ElastiCache, so that you can share the same cached results across multiple instances of the same function, or maybe even across multiple functions. But since I don't like paying for uptime for things that I don't use, and ElastiCache forces me to pay for uptime instead of paying for just what I use, nowadays I actually prefer to use Momento, which gives you a really scalable and cost-efficient serverless cache where you only pay for the number of requests you make, as opposed to the uptime of a cluster.
Another form of caching that people don't talk about enough, and that at scale can also bite you, is Route 53, or DNS, caching. With Route 53, you are charged for the number of DNS requests that Route 53 handles for you. Depending on your TTL, if you have a short TTL, then you may get more requests coming in, and when you're running at scale, those requests and those costs can still add up pretty quickly. So if you know you've got a domain that's quite stable, and you're not going to be changing it regularly, then just use a very long TTL, maybe a couple of hours, or a day, or even a week for things that are really stable and really unlikely to change. Do that, and you can suddenly cut down your Route 53 costs by a significant amount when you're running at scale.
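In CDK, setting the TTL on a record is a one-liner; the record details here are made up:

```ts
import * as cdk from 'aws-cdk-lib';
import * as route53 from 'aws-cdk-lib/aws-route53';

// For a stable record, a long TTL lets resolvers cache the answer,
// so far fewer queries reach (and are billed to) your hosted zone.
new route53.ARecord(this, 'StableRecord', {
  zone: hostedZone, // assumed to be defined or looked up elsewhere
  recordName: 'api', // hypothetical
  target: route53.RecordTarget.fromIpAddresses('203.0.113.10'),
  ttl: cdk.Duration.hours(24),
});
```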
When it comes to DNS and web API requests, there's another thing that can often add hidden costs you don't think about, which is the cost associated with making cross-origin requests. With API Gateway, it's really easy to enable CORS support: you basically just have to turn a setting on, and API Gateway is going to generate those OPTIONS endpoints for your API endpoints. But because API Gateway charges you for every single request, it's also going to charge you for those CORS preflight requests. You may think, okay, that's an easy problem to fix: just slap a CDN in front of it, use caching, and everything's fine. However, there has been a long-standing bug with how API Gateway's OPTIONS endpoints handle the cache headers when you're using authorization headers. This will be the case when you're using, say, API Gateway with a Cognito authorizer, where the Cognito tokens have to come in through the authorization header. If you're using the authorization header, it's going to break the cache headers that API Gateway's OPTIONS endpoints return, which breaks your caching for those OPTIONS endpoints. So there's a good chance you may be double-paying for those user requests, because caching is not working properly for the OPTIONS endpoints. The solution is either to roll your own OPTIONS endpoints, which of course is not great because it's a lot of extra work, or, if you have control over the DNS and domain settings for your application, instead of putting your application's APIs on a subdomain like api.example.com, move them to a sub-path on the same domain where the front end is hosted, like example.com/api. Requests to a sub-path on the same domain don't count as cross-origin, whereas requests to subdomains do. So by making a very simple domain configuration change, you can actually remove the need for CORS requests altogether.
So moving on to more architectural-level decisions. When it comes to AWS and the cloud, every architectural decision is essentially a buying decision. You're deciding what services you're going to buy and use, and making the wrong choice here and using the wrong service can be very, very costly. Take, for example, building some kind of event-driven architecture, where you need some kind of messaging service between different Lambda functions. If you're running at one request per second, then you may look at the numbers and think Kinesis is quite expensive compared to SNS, SQS, and EventBridge. But it's always important to take your specific context and throughput into consideration, because if we dial this up to a thousand requests per second, then suddenly Kinesis is much, much cheaper compared to EventBridge. Even though Kinesis charges you uptime for the shards, its cost per million requests is much lower. So as the number of requests goes up, your cost for Kinesis goes up much more slowly compared to the cost of SNS, SQS, and EventBridge, which only charge you for the requests and don't have an uptime cost associated with them.
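Here's a back-of-the-envelope sketch of that crossover; the prices are illustrative placeholders, so plug in the current numbers from the pricing pages before drawing conclusions:

```ts
// Hypothetical list prices, for illustration only.
const reqPerSec = 1000;
const reqPerMonth = reqPerSec * 60 * 60 * 24 * 30; // ~2.6 billion

// EventBridge: pay per million events published (assume ~$1.00/million).
const eventBridgeCost = (reqPerMonth / 1e6) * 1.0; // ~$2,592

// Kinesis: pay for shard uptime plus PUT payload units (assume
// ~$0.015/shard-hour and ~$0.014/million units; 1KB records, and one
// shard handles up to 1,000 records/s).
const shards = Math.ceil(reqPerSec / 1000);
const kinesisCost =
  shards * 0.015 * 24 * 30 + (reqPerMonth / 1e6) * 0.014; // ~$47

console.log({ eventBridgeCost, kinesisCost });
```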
The same thing plays out with API Gateway versus ALB. ALB has a rather complex calculation for determining how many load balancer capacity units you use, and charges you an hourly uptime cost for them. So at a very low throughput, API Gateway is much cheaper, whether you're using REST APIs or HTTP APIs, compared to ALB. But as you dial up the number of requests per second, and so the total number of requests you're processing in a month, suddenly API Gateway becomes a lot more expensive compared to ALB. So a good rule of thumb on AWS is that, at scale, services that charge you based on uptime are going to be much, much more cost-efficient compared to services that only charge you for requests. That's a good rule of thumb to have, but you still have to understand the specific cost dimensions of the individual services that you work with.
The example I showed you earlier assumed that every single request is about one kilobyte in size. But what happens if you dial that up to one megabyte of payload per request? Bytes processed is also one of the dimensions ALB uses to calculate how many load balancer capacity units you are charged for. So at a much bigger payload size, ALB is going to be a lot more expensive compared to API Gateway, in the kind of niche situation where maybe you're sending large documents or video files or binary data. At one megabyte per request, ALB is going to charge you a lot more than API Gateway. So it's always important to understand the specific cost dimensions of the services that you use.
At re:Invent 2023, Werner gave us The Frugal Architect, seven rules, or laws, of the frugal architect on AWS. And I really like these two in particular, where he says that architecting is basically a series of trade-offs, and that we want to find systems whose cost aligns with the business revenue model of our application. This is so that, as our business grows and our revenue grows, the cost grows along the same dimension: as we make more money, our cost goes up. We don't want a situation where we're not making money but our cost is still going up, because the cost dimensions are misaligned with the revenue of our business.
I've built a lot of WebSocket-based applications, and on AWS you've got multiple services that give you WebSockets without having to manage servers. API Gateway has managed WebSockets, so does AppSync with AppSync subscriptions, and IoT Core also gives you WebSockets. Now imagine you're building some kind of social network. As more users sign up, and you're using WebSockets to give them real-time updates whenever someone sends them a message, or replies to their tweets, or retweets, or likes, or what have you, your connection time for the WebSocket connections is going to go up with the number of users. But your revenue doesn't go up at the same rate as the connection time, because revenue goes up when users are more engaged. When they're doing more stuff, they're more likely to turn into paying users. So revenue goes up with the amount of engagement. And so I want to find services where, as far as WebSockets are concerned, I'm paying for activity and not for connection time. Unfortunately, for all the services we just looked at, one dimension of the cost is connection time. And that's why, when it comes to building WebSocket-based applications, I also very much like Momento, which gives you WebSockets through Momento Topics. The nice thing about Momento Topics is that they don't charge you for connection time; they only charge you for the number of messages that you send. And therefore, as my business grows, activity is going to go up. Only when people are doing things, sending messages, retweeting and whatnot, do I get the engagement that's going to drive my revenue. So my cost goes up with engagement, and not just with connection time.
And speaking of picking services that are cost-efficient to work with, you may have come across this last year, when a social network blew up and went from a few hundred users to half a million daily active users. They ran on Vercel, and suddenly they got slapped with something like a $100,000 bill. This is because, even though Vercel is built on top of AWS, every layer has to add its own margin, so as you use Vercel, they're actually putting a big markup on top of their AWS cost. In this specific example, they had something like a 7x markup on the underlying cost of the Lambda functions. So Vercel adds a lot of cost overhead in terms of what you end up paying for. And they keep adding more and more dimensions that they charge you for, including, now, draining the logs from your Vercel functions so that you can query them elsewhere in a third-party platform. So when you think of Vercel, do keep in mind that there's a huge markup on the cost of running Vercel functions versus the underlying Lambda functions. When you're starting to run something at a pretty big scale, it can be really, really costly, and I've seen so many stories of Vercel customers being surprised by a big bill when their application suddenly gets some traction.
We talked earlier about how simplifying your architecture can also pay off in terms of cost efficiency, and a good rule of thumb is to avoid any unnecessary moving parts in your architecture. We talked about the case where you have a synchronous Lambda to Lambda invocation: just simplify by combining everything into a single Lambda function. But what about when you have asynchronous Lambda to Lambda invocations? We talked about some particular use cases where that's a good idea. But oftentimes what I see instead is folks putting, essentially, an SNS topic between two Lambda functions, just so that they avoid a direct Lambda to Lambda invocation, while not actually using SNS to provide any fan-out. It's purely there for cosmetics, to avoid the Lambda to Lambda invocation. In that case, you may as well just do the direct Lambda to Lambda invocation asynchronously. As we discussed, there are use cases for that, and if you're not fanning out to multiple targets, you don't really need SNS. You still get the benefits: the built-in retries, but also the DLQ and Lambda destinations support for when those async invocations fail after a few attempts, so you're not going to lose data. You get the same thing with SNS to Lambda, but again, you don't need the SNS topic if you're not doing any fan-out. So don't put an SNS topic between Lambda functions just to avoid direct Lambda to Lambda async invocations. Async invocations have some valid use cases; it's synchronous Lambda to Lambda invocations that you want to watch out for.
As I mentioned earlier, every component in your architecture should serve a purpose. So when you look at your architecture diagram, always pay attention to what each component is providing. Are they adding any value? Is there a reason for them to be there? If not, consider whether you can just remove that component. As Grace Hopper once said, the most dangerous phrase in the language is "we've always done it this way." Of course, people know this quote, so they don't say that anymore, but the mindset is still reflected in a lot of behaviors: "we feel more comfortable with this, because we're familiar with this pattern." So again, be very critical when you look at your architecture, and see what each part is doing and what value you're getting from each component in your architecture diagram.
So in terms of simplifying architecture, there are different things you can do. For example, Lambda function URLs are nowadays quite a useful alternative to using API Gateway to build serverless APIs. Instead of having API Gateway in front of a Lambda function just to serve as an HTTP endpoint that someone can call, you can have the Lambda function expose itself with a function URL. This is very useful if you're not using the features API Gateway gives you, like a Cognito authorizer, or its support for request models and validating POST bodies, or its ability to integrate directly with other services. Or maybe you're hitting some API Gateway-specific limits, such as the 29-second timeout for integration endpoints, or the fact that it doesn't support response streaming, which is becoming more and more important for AI workloads and for building chatbots that can stream responses to the caller as they come back from the LLM. The downside of using Lambda function URLs is that you're kind of forced into writing a Lambdalith, which is a single Lambda function that handles all of the routes for an API and does the routing internally to different parts of the code. The main downsides here are that you lose the per-endpoint metrics and alerts, and you're not able to implement fine-grained access control with different permissions for each endpoint. Also, a lot of the frameworks you may want to use are not designed for Lambda's constrained execution environment; they're often quite big and bulky, and they can really affect the cost and performance of your function. In terms of security, you can either have a public URL, or you can use AWS IAM as the authentication mechanism. So I find function URLs are best for either public APIs, or internal microservice APIs where you're going to use AWS IAM as the authentication and authorization mechanism.
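Creating one is a couple of lines in CDK, assuming fn is your function construct:

```ts
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// IAM auth suits internal service-to-service APIs; NONE would make
// this a public endpoint.
const fnUrl = fn.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.AWS_IAM,
});

new cdk.CfnOutput(this, 'FnUrl', { value: fnUrl.url });
```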
However, having said that, if you want to build a user-facing API that's authenticated, you can still do that. You can just validate, say, a Cognito token inside your Lambda function, and it turns out that's actually pretty quick to do, so in terms of performance, there's no real downside. The main consideration is that for unauthorized requests, you only find out when your function runs. With API Gateway, API Gateway can validate the token, and if it's invalid, it rejects the request, so you don't pay for those unauthorized requests. If we do the validation in a Lambda function instead, you always pay for the Lambda invocation, and for however much time it takes your function to realize that the request is unauthorized, the token is invalid, the token is missing, and so on.
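If you do go down this route, a library like aws-jwt-verify makes the in-function validation straightforward. A minimal sketch, with made-up pool and client IDs:

```ts
import { CognitoJwtVerifier } from 'aws-jwt-verify';

// Created outside the handler so the downloaded JWKS is cached
// across warm invocations.
const verifier = CognitoJwtVerifier.create({
  userPoolId: 'eu-west-1_XXXXXXXXX', // hypothetical
  tokenUse: 'access',
  clientId: 'your-app-client-id', // hypothetical
});

export const handler = async (event: { headers: Record<string, string> }) => {
  try {
    const claims = await verifier.verify(event.headers.authorization ?? '');
    // ... do the authorized work, scoped to claims.sub ...
    return { statusCode: 200, body: 'ok' };
  } catch {
    // Unlike an API Gateway authorizer, you still pay for this invocation.
    return { statusCode: 401, body: 'Unauthorized' };
  }
};
```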
Interestingly, you can also go the other way in terms of simplifying your architecture: you can go functionless. Instead of having an API Gateway in front of a Lambda function just so that you can make a call to, say, a DynamoDB table, you can remove the Lambda function and have API Gateway integrate directly with DynamoDB. API Gateway can integrate with pretty much every other AWS service directly, without needing a Lambda function, if all your function is doing is using the SDK to call another service. By removing the Lambda function, you remove any cold start performance hit associated with it, as well as any costs related to the Lambda invocation, its duration, and so on.
You can actually go pretty far with this approach. For example, once your API Gateway writes some data to DynamoDB, you can use EventBridge Pipes to capture the data changes and send them as events to an event bus. Those are going to be DynamoDB events for data inserted, updated, removed, and so on, not really domain events specific to your business. So what you can do in that case is use a Lambda function to transform the event payload, turning, say, an insert event into a "user created" event that's more akin to your business domain. A good adage we had at the start of the whole serverless movement was to use Lambda functions to transform data, not to transport data from one place to another. Similarly, if you've got data in DynamoDB that you want to make searchable with OpenSearch, you can also do that nowadays without writing a custom Lambda function: you can use the no-code ETL integration to synchronize data from DynamoDB to OpenSearch to make it searchable.
And it's not just API Gateway that can do this. You can also use AppSync, or Step Functions, which can all integrate with other AWS services directly, without you writing custom code in a Lambda function just to make an SDK call to some other service. Again, every single component in your architecture should serve a purpose. If your Lambda function is not doing any business logic, and all it's doing is transporting data from one place to another, and the service sitting in front of it can do that already, then you've got to think critically about whether you need that Lambda function to be there in the first place.
Speaking of which, what if you just let the front end talk directly to the AWS services? Instead of having API Gateway and Lambda in front of your DynamoDB table, you can have the front end talk to the DynamoDB table directly. And before you call me crazy, you can actually do this securely and safely, and still maintain tenant isolation. The way you do that is to have the user log in to, say, Cognito, and then exchange the Cognito-issued token for some temporary AWS credentials using a Cognito identity pool. With the temporary credentials, you can then allow the front end to talk to DynamoDB directly.
And the trick, or rather the important part, is that in the IAM role you configure for the Cognito identity pool, you add a condition which means the issued credentials only allow the caller, the front end, to interact with DynamoDB items where the partition key matches the subject of the token issued by the Cognito user pool. That way, the front end can only interact with the user's own data in DynamoDB, and not any other user's data. Of course, there are a lot of ways this can go wrong, so this is not an approach that I would advise for everybody. In fact, for most people, I would say don't do this: there are way more reasons not to do this than there are reasons to do it. It definitely falls into the high-risk, high-reward category. The main reason to do this would be that you're building a side project or side hustle, you're really cost-conscious, and you want to minimize your cost as much as possible. In that case, by removing the API layer in front of your data store and letting the front end talk to the database directly, in a way that's safe and secure, and if you know what you're doing, then this is a potential solution that you can explore.
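For the curious, here's a sketch of the kind of scoped-down policy statement I mean, attached to the identity pool's authenticated role (the actions and table are examples):

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Callers can only touch DynamoDB items whose partition key equals
// their own Cognito identity id.
const perUserAccess = new iam.PolicyStatement({
  actions: ['dynamodb:GetItem', 'dynamodb:Query', 'dynamodb:PutItem'],
  resources: [table.tableArn], // assumed table construct
  conditions: {
    'ForAllValues:StringEquals': {
      'dynamodb:LeadingKeys': ['${cognito-identity.amazonaws.com:sub}'],
    },
  },
});
```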
So that's 12 different things you can do, 12 different ways to save money, or rather to cut out waste, in your serverless architecture. And if you want to learn how to actually build a serverless architecture that's production-worthy, then check out my upcoming workshop, Production-Ready Serverless, which has just started; you can still sign up for the next couple of days. And if you've got any questions, please feel free to reach out to me in the comments, and I'll try to answer your questions as much as possible. Okay, thank you so much for your time, and enjoy the rest of the conference. Okay, bye-bye.