Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone.
Thank you for taking the time out of your busy day to join us on this session about how to save money on your AWS serverless environment. So without further ado, let me just switch over to my slides and we can get started right away.
So a quick word about myself.
I'm an AWS Serverless Hero. My name is Yan Cui, and I've been doing stuff on AWS since 2010. Nowadays, I spend half my time working with Lumigo as a developer advocate, and the other half of my time I work as an independent consultant, where I help other companies adopt serverless technologies. And one thing I like to do is collect tips on how to save money on your AWS environment.
And so I've got lots of ideas to share with you today.
So I hope you are ready to drink from the fire hose as we go through a number of different ways you can save money on your AWS serverless environment.
So we're going to start with something very simple that I think everybody should be doing. Even if you are new to AWS, this is probably the first thing you should do when you create an AWS account: set up billing alarms. They're not perfect. They're usually a few hours behind. But it's much better to find out that you've got a problem a few hours late than, say, a few weeks late, when your bill finally arrives and you've got a much bigger amount to pay than if you had found out a few hours into the issue.
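To make this concrete, here's a minimal CDK sketch of what such a billing alarm might look like. The threshold here is just an example, and note that the AWS/Billing metrics are only published in us-east-1:

```ts
import * as cdk from 'aws-cdk-lib';
import * as cloudwatch from 'aws-cdk-lib/aws-cloudwatch';

class BillingAlarmStack extends cdk.Stack {
  constructor(scope: cdk.App, id: string) {
    // Billing metrics only exist in us-east-1.
    super(scope, id, { env: { region: 'us-east-1' } });

    new cloudwatch.Alarm(this, 'BillingAlarm', {
      metric: new cloudwatch.Metric({
        namespace: 'AWS/Billing',
        metricName: 'EstimatedCharges',
        dimensionsMap: { Currency: 'USD' },
        statistic: 'Maximum',
        period: cdk.Duration.hours(6),
      }),
      threshold: 2000, // example: alarm when estimated charges pass $2,000
      evaluationPeriods: 1,
    });
  }
}
```

You'd typically wire the alarm to an SNS topic so somebody actually gets notified.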
So Luc and his team at PostNL are pretty experienced with AWS and serverless, but they were still caught out by a mistake which caused their AWS costs to spiral. In this particular case, it was because of a one-line change in their code, which caused their Lambda functions to make a lot of API calls to Secrets Manager. The reason was that this one-line change broke their caching. So instead of making a call to Secrets Manager at cold start and then caching the secret, they were making a call to Secrets Manager on every single invocation. At their scale (they are the national delivery service for the Netherlands), you can imagine that's hundreds of millions of requests per day, which can add up pretty quickly. That's why, within a few days, they got an alert that triggered their billing alarm in AWS, which I think was set to something like $2,000.
Yeah.
That's the monthly budget for their team's whole AWS environment.
So luckily they were able to find out that this problem was happening within a few days, as opposed to after a few weeks, when the damage could have been much bigger.
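For illustration, here's a rough sketch (not PostNL's actual code) of the pattern that the broken change would have violated: fetch the secret once, outside the handler's hot path, and reuse it across warm invocations. The secret name is made up:

```ts
import {
  SecretsManagerClient,
  GetSecretValueCommand,
} from '@aws-sdk/client-secrets-manager';

const client = new SecretsManagerClient({});
let cachedSecret: string | undefined; // survives across warm invocations

async function getSecret(): Promise<string> {
  if (!cachedSecret) {
    // Only call Secrets Manager on cold start (or after the cache is
    // reset), not on every invocation.
    const res = await client.send(
      new GetSecretValueCommand({ SecretId: 'my-app/db-password' }) // hypothetical
    );
    cachedSecret = res.SecretString!;
  }
  return cachedSecret;
}

export const handler = async () => {
  const secret = await getSecret(); // warm invocations hit the cache
  // ... use the secret ...
};
```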
So the learning here is: yes, billing alarms are not perfect, and they are a few hours behind, but it's much better to find out that you've got a problem early on than much later. And billing alarms do work. In this case, yes, they did suffer some damage, a few thousand dollars, but it could have been a lot worse if they had only found out when the finance team came knocking on their door to ask, "What's going on, guys? Your bill is much bigger now compared to what it was last month." So billing alarms work. It's just that they're not perfect.
So when it comes to billing and the cost of AWS, one of the biggest offenders, something that almost always comes up as number one or two on my customers' bills, is CloudWatch, specifically CloudWatch Logs. Very often, as I work with my consulting clients, I see CloudWatch costing a few times, maybe even 10 times, more than the actual application itself. When you consider the cost of, say, API Gateway, Lambda functions, and DynamoDB tables, CloudWatch is often much, much higher. And keep in mind that as your cost goes up, because you're collecting more and more logs and more and more data in CloudWatch, the value you get from those logs actually goes down, because now you're getting more and more noise you have to deal with, and it's harder for you to find the piece of information you actually need to debug problems in production. CloudWatch is also just not very good at surfacing the really valuable and actionable information from all this data that you're collecting.
So the number one thing I do to keep my CloudWatch Logs cost under control is structured logging. And I pay really close attention, every time I write a log message, to which log level that message should be recorded at. Because in production, you don't need all the debug logs when there are so many requests happening at the same time. Instead, in production, you probably just need to record everything at info level or above, so that you don't have all of these debug logs that don't give you a lot of value, while you're paying for every single log message that CloudWatch Logs collects.
However, sometimes, especially when things go wrong, those debug logs can be really, really useful in helping you figure out what the problem was. So even though in production you want to disable debug logging, you also want to sample some percentage of your debug logs in production. If you collect, say, 10 percent or 5 percent of the debug logs across all of the invocations of a Lambda function, you hopefully have enough debug logs to cover every single code path. So when there's a problem in production, you've got some debug logs that can tell you what the problem was, and you don't have to go back to your code, re-enable debug logging, redeploy to production, wait some time for the debug logs to be collected, figure out the problem, and then disable debug logging again. Instead, you want to always be sampling some percentage of debug logs, such that you've got enough information to figure out problems in production, but not so much that you end up paying a disproportionate amount of money for debug logs that don't add a lot of value.
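As a minimal sketch of what this can look like, the Lambda Powertools logger supports this kind of sampling (the service name and sample rate here are just examples):

```ts
import { Logger } from '@aws-lambda-powertools/logger';

// Log at INFO in production, but sample ~10% of invocations at DEBUG so
// there are always some debug logs covering every code path.
const logger = new Logger({
  serviceName: 'orders-api', // hypothetical
  logLevel: 'INFO',
  sampleRateValue: 0.1,
});

export const handler = async (event: unknown) => {
  logger.debug('raw event', { event }); // only emitted for sampled invocations
  logger.info('processing request');
  // ...
};
```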
Another thing to keep in mind when working with CloudWatch Logs is that, by default, log retention is set to never expire, because CloudWatch doesn't want to delete your data without you telling it that you're okay with that. This is fine from their perspective, but from my perspective, the value of the logs goes down as time goes by, and there's no reason for me to keep logs that are older than, say, 30 days. Especially as the application continues to evolve and change, those logs become more and more outdated as time goes by. So you want to change your log retention to something more reasonable, such as seven days, 14 days, or 30 days.
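In CDK, that might look something like this, assuming fn is your function construct and this is a stack:

```ts
import * as logs from 'aws-cdk-lib/aws-logs';

// Create the function's log group explicitly, with a finite retention,
// instead of relying on the default "never expire" behavior.
new logs.LogGroup(this, 'FnLogGroup', {
  logGroupName: `/aws/lambda/${fn.functionName}`,
  retention: logs.RetentionDays.TWO_WEEKS,
});
```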
You also don't want to pay the storage cost of 3 cents per gigabyte on all the logs that you've ever produced, forever, especially when those logs essentially become useless after a few weeks. Another thing to keep in mind is that many of you are not using CloudWatch Logs to query your data. You are shipping your logs from CloudWatch to some other third-party service, and querying them there. So in this case, as the Lambda function produces those logs, you forward them to, say, Logz.io or some other platform, and then you analyze them in those platforms. But you still end up having to ingest the logs into CloudWatch first, and therefore you still have to pay that 50 cents per gigabyte of ingestion cost for CloudWatch. So you probably end up paying twice for the ingestion and processing of those logs, which of course is a waste.
Nowadays, when it comes to Lambda, you can use Lambda extensions to ship all of your logs to a third-party provider via the Telemetry API. Lambda extensions are like sidecars to your main Lambda runtime: they can access the logs from your function and then send those logs to a third-party vendor. And you can do this without going through CloudWatch Logs first. Once you've done that, you can also add a bit of IAM permission to your Lambda function's IAM role to stop the function from sending those logs to CloudWatch, so that you don't end up paying for the ingestion of the same logs at CloudWatch as well. This is a much better way to send your log information to another vendor, instead of having to go through CloudWatch Logs first and then processing it from there.
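The IAM change itself is small. Here's a sketch of what it might look like in CDK, assuming fn is your function construct:

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Once logs are shipped via a Telemetry API extension, deny the function's
// role the ability to write to CloudWatch Logs, so you don't pay for
// ingesting the same logs twice.
fn.addToRolePolicy(
  new iam.PolicyStatement({
    effect: iam.Effect.DENY,
    actions: [
      'logs:CreateLogGroup',
      'logs:CreateLogStream',
      'logs:PutLogEvents',
    ],
    resources: ['*'],
  })
);
```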
And if you're looking to move away from CloudWatch Logs and you're looking for another vendor that's more cost-efficient and allows you to do more with your log information, then check out Lumigo. Its log management system is a lot cheaper compared to CloudWatch, and it treats every single log message as an event, so that you can then create arbitrary metrics on demand, and alerts on top of them. And it's all included in the price for the ingestion of those logs, so you don't pay separately for logs, alerts, dashboards, and so on. Staying with CloudWatch Logs, another thing I want to mention: remember those system messages that you get? After every single invocation, you get a number of log lines in your Lambda function's CloudWatch logs. For most of you, this is probably not going to matter very much. But if you're running at scale, for example doing billions of invocations per month like the folks at Fathom Analytics, then those system messages can actually end up costing a non-trivial amount of dollars per month. And of course, they don't really give you a lot of value in return for that investment.
So nowadays there's a way for you to control what information is included and how many of those system log messages get produced by Lambda. Lambda now has a logging config setting on your function, which you can configure through the CLI, through CloudFormation, and through CDK and other tools that use CloudFormation, whereby you can set the log format for your function. By default, this is still going to output plain text, so whatever your function writes to standard out is kept as plain text. But you can also switch to JSON, so that the Lambda runtime captures whatever information you're sending to standard out and formats it into a JSON blob. You kind of get structured logging without actually doing any structured logging in your application code. Now, where it gets interesting is that once you set the log format to JSON, you can also configure a system log level, which controls which of the system log messages are actually produced by the Lambda runtime. This is not in the official documentation, but I did some experimentation to find out which of the messages are produced at which of the system log levels. So if you just want to know whether there was an unhandled exception, where your application code didn't capture an error and it bubbled up to the Lambda runtime and blew up, you can set the system log level to WARN. That way you will still get those unhandled exceptions in the logs, but none of the other system messages for when your function starts, when it finishes, how much memory was used, and so on and so forth.
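Here's a sketch of that logging config in CDK; the exact property names (they were added fairly recently) may vary with your CDK version, so double-check against the docs:

```ts
import * as lambda from 'aws-cdk-lib/aws-lambda';

const fn = new lambda.Function(this, 'Fn', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('dist'),
  // JSON log format is a prerequisite for setting log levels.
  loggingFormat: lambda.LoggingFormat.JSON,
  // Keep unhandled-error messages, drop the per-invocation
  // START/END/REPORT system messages.
  systemLogLevelV2: lambda.SystemLogLevel.WARN,
});
```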
So CloudWatch is one of those services that everyone should really know if they're going to be using AWS, especially if you're going to be using Lambda. So I recommend this book by Sandro and Tobias, who do a really good job of explaining how CloudWatch works. Even if you don't end up using CloudWatch for your day-to-day querying of logs and things like that, it's still worth understanding how it works and all the different things that CloudWatch gives you nowadays.
Okay, so moving on. The next thing we're going to talk about is still around Lambda, but specifically the cost of Lambda functions. Those of you who have used Lambda before probably know this already: with Lambda, you have basically one lever to control the performance and the cost of your function, which is how much memory you allocate to it. More memory equals more CPU, and more network bandwidth as well. But the more memory you allocate, the more you're going to spend per millisecond of execution time, and that cost is proportional to the amount of memory you allocate to the function. On the one hand, that means it's really easy to just give your function more power so that you can process requests faster. But at the same time, it's also very easy to be wrong by an order of magnitude. It probably won't happen very often, but when it does, those over-provisioned functions can hit you pretty hard on the finance side of things.
This actually happened to a client of mine a while back. They produce those hand-drawn tutorial videos that you sometimes see on YouTube, and they use a Lambda function to do some of that rendering. Now, the team understood that more memory equals more CPU power, and they wanted to reduce the amount of time it takes to do the rendering. So they decided to allocate the maximum amount of memory to the Lambda function and gave it the full 10 GB. At the end of the month, they found out that this one rendering Lambda function was now costing them something like $10,000 a month. Something was clearly wrong. The reason is that they had allocated the full 10 GB of memory to the function, but it didn't reduce the rendering time proportionally. And the reason for that is that while with Lambda, yes, more memory equals more CPU, once you get to about 1.8 GB of memory, you unlock a second CPU core. By the time you hit the full 10 GB of memory allocation, you actually have six CPU cores.
So to take full advantage of all the CPU you have, you have to write your application in a way that allows you to process things in parallel, using multiple CPU cores. Unfortunately for them, they were using Node.js for the rendering. By default, Node.js runs on a single-threaded event loop, so you have to write your application specifically using worker threads and child processes in order to parallelize the work and take full advantage of the fact that you've got six CPU cores instead of one very, very big CPU core. And they weren't doing that. So even though they were paying for 10 GB of memory and all the CPU power that comes with it, they were only able to use essentially 1.8 GB of what they were paying for. That's why, when it came to helping this client, we just made a very simple decision to reduce the memory allocation to 1.8 GB, so they get the full benefit of what they're paying for. That gives a much more efficient rendering process, without paying for CPU power they weren't really using.
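For reference, here's a minimal sketch (nothing like their actual rendering code) of how you might spread CPU-heavy work across cores with Node.js worker threads:

```ts
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';
import os from 'node:os';

// Stand-in for the CPU-heavy work (e.g. rendering a batch of frames).
function renderChunk(chunk: number[]): number[] {
  return chunk.map((x) => x * x);
}

if (isMainThread) {
  const frames = Array.from({ length: 1200 }, (_, i) => i);
  const cores = os.cpus().length; // ~6 vCPUs at 10GB in Lambda
  const chunkSize = Math.ceil(frames.length / cores);

  // One worker per core, each taking a slice of the work.
  const tasks = Array.from({ length: cores }, (_, i) =>
    new Promise<number[]>((resolve, reject) => {
      const worker = new Worker(__filename, {
        workerData: frames.slice(i * chunkSize, (i + 1) * chunkSize),
      });
      worker.once('message', resolve);
      worker.once('error', reject);
    })
  );

  Promise.all(tasks).then((parts) => console.log(parts.flat().length));
} else {
  parentPort!.postMessage(renderChunk(workerData));
}
```

Without something like this, the extra cores you're paying for at high memory settings simply sit idle.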
As the great Donald Knuth once said, we should forget about small efficiencies, say about 97 percent of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3 percent. And that's where, for that one client, 99.9 percent of their Lambda functions were not doing anything significant in terms of cost, but that one single function was accounting for over 99 percent of the actual cost of their serverless environment. So identifying that critical 3 percent, and really focusing on how to improve the performance and efficiency of that critical 3 percent, is super important.
Again, in Lumigo, you can see all the functions you have across all the different regions, and you can sort them by cost, so that you can really quickly identify outliers in your environment, whether you've got Lambda functions that are disproportionately represented in your cost allocation. You can look at how much memory is allocated to a function, how much memory is being used on average when running it, and how often the function runs, to give you an idea of which functions require some special attention in terms of optimizing and right-sizing the memory allocation.
For actually right-sizing the memory setting of your Lambda function, I think the best tool you can use is the Lambda Power Tuning tool from Alex Casalboni, who used to work at AWS. The Lambda Power Tuning tool is a Step Functions state machine that takes your function, produces different copies of it with different memory settings, runs a number of executions against those variants, and, based on the performance and cost of those runs, finds the sweet spot that gives you the most bang for your buck in terms of the cost of the function and how much performance you get.
Another good way to reduce the cost of Lambda functions is to use the ARM architecture instead of x86, because per millisecond of execution time, ARM is actually 25 percent cheaper. However, performance is going to differ depending on what you're doing. Some people report that, for their workload, ARM is actually faster and also cheaper per millisecond. But for some of the things that I've tested, ARM can sometimes be 60 percent slower than x86 for the same workload. Now, if you are saving 25 percent per millisecond, but you're spending 60 percent more milliseconds to process the same thing, then you actually spend more money on ARM than on x86, so in that case it's not a good idea. Where I find ARM is really good is where you've got functions that have to talk to third-party services, and maybe those third-party services are quite slow, so you spend a lot of time waiting on IO. If you're just going to wait anyway, you might as well switch to ARM, where the cost per millisecond is cheaper, so the cost of that wait time is going to be less as well.
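Switching is usually a one-line change; in CDK it's just the architecture property. Benchmark your own workload before and after, since, as I said, the results vary:

```ts
import * as lambda from 'aws-cdk-lib/aws-lambda';

const fn = new lambda.Function(this, 'IoBoundFn', {
  runtime: lambda.Runtime.NODEJS_20_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('dist'),
  // ~25% cheaper per millisecond; a good fit for IO-bound functions.
  architecture: lambda.Architecture.ARM_64,
});
```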
Still staying with Lambda functions, another big thing, probably the number one anti-pattern when it comes to serverless environments, is direct synchronous Lambda to Lambda invocations. One thing to keep in mind about Lambda is that every single invocation of a Lambda function goes through its Invoke API, and there are different ways you can invoke it. That's why there's an invocation type attribute, where you can say, "I want a request-response invocation," in which case the caller calls the Lambda function and has to wait for the whole invocation to finish and get the output from the invocation as the response of the Invoke API call. That's a synchronous invocation. But you can also have an asynchronous invocation by setting the invocation type to event. In that case, as a caller, I call the function with the Invoke API and I get a response back right away, but the function is not going to run right there and then. It's going to go through an internal queue and run at some point. So I don't get the output of the function from calling the Invoke API, but the function is going to actually execute asynchronously.
When it comes to synchronous Lambda to Lambda invocations, it's pretty much always a sign of bad design, and it also has cost implications. When the first function runs and calls the second function synchronously, it has to wait for the second function to finish executing to get its output in the response. That means for the entire duration of the second function, the first function is still running and just waiting. So you're actually paying for execution time twice: both for the first function, the caller, and for the second function, the callee. Again, I don't mind spending money, but I want to get value for what I'm spending on. I hate waste. I hate paying for things that don't provide any value. In this case, I'm not really getting any value from having two functions running at the same time. And especially when you've got two functions, one calling another, inside the same service boundary, where I own both functions, I actually don't need them to be two separate Lambda functions. I can just get rid of the second function and do whatever it needs to do inside the first function. Because I own everything, I can reorganize things and still have that modularity at the code level, without having to also have that modularity at the Lambda function, or infrastructure, level. I mean, these things are called Lambda functions, but you shouldn't confuse them with functions in programming. You can still have modularity at the code level without having to enforce the same modularity at the infrastructure level.
But what if your Lambda to Lambda calls are across service boundaries? Now this would be one service providing some capabilities to other teams and other services by exposing a Lambda function that others can call and invoke directly. This is an even worse idea, because now you are binding your consumers, your API or service consumers, to implementation details that can easily change. The fact that you are using a Lambda function, what region you're in, what your function is called: all of these things are implementation details, and you should be able to change any and all of them without forcing downstream systems to change as well. If you want to give them some capabilities, just give them an API that they can call. The fact that you are using a Lambda function behind the API is just an implementation detail. Maybe today Lambda makes sense for you, but maybe tomorrow you're handling such high traffic that it makes more sense to take your workload and move it into, say, a Fargate container. You can do that without impacting your callers, because they are still talking to the same HTTP API. They're using the same HTTP API contract to talk to your service, and that's a stable interface. What you do behind the HTTP API is entirely your business. It's implementation details that you can change and control.
Okay, so we've established that synchronous Lambda to Lambda calls are an anti-pattern. But what about asynchronous invocations? Because, as we talked about earlier, when you invoke a function, you can do so synchronously, but you can also do it asynchronously. Well, there are some legit use cases where I think asynchronous invocations are a good idea. For example, say I've got a user-facing API that handles a user request, and the request is to save the user's profile updates in the database. Okay, my function is going to do that. But maybe I have some secondary responsibilities as well, such as tracking some events for business analytics and what have you. Those are not things the user really cares about, so I don't want to do them while the user is waiting for a response. What I can do is take those secondary responsibilities, move them into a second function, and invoke that second function asynchronously, so I don't have to wait for it to finish. This allows me to build a better user experience, because my user-facing API function can respond to the caller faster, without having to wait for all those secondary responsibilities to complete. It also allows me to build more robust error handling for those secondary responsibilities. Because for asynchronous invocations, Lambda gives you two retries out of the box, as well as dead letter queue support. So if something goes wrong, and I care about making sure those things happen, I can use the dead letter queue to capture any failed invocations and retry them later. Say I'm talking to a third-party service that had a temporary outage: I can use the dead letter queue to capture the events that failed and reprocess them when the system comes back online. And that's with the assumption that what I've got here is within the same service boundary, so whatever I'm doing in the second function are things that would have been part of my API service already.
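As a sketch, this is roughly how you might wire up those retries and a failure destination in CDK, assuming this is a stack and analyticsFn is the secondary function's construct:

```ts
import * as sqs from 'aws-cdk-lib/aws-sqs';
import * as destinations from 'aws-cdk-lib/aws-lambda-destinations';

const dlq = new sqs.Queue(this, 'AnalyticsDlq');

// Async invocations get two retries; anything that still fails lands in
// the queue so the events can be replayed later.
analyticsFn.configureAsyncInvoke({
  retryAttempts: 2,
  onFailure: new destinations.SqsDestination(dlq),
});
```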
And what about asynchronous invocations across service boundaries? That is still going to be a bad idea. You never want to expose capabilities to other services in the form of a Lambda function that someone can call directly, either synchronously or asynchronously, because again, you're tying them to implementation details on your side. So, in general, are async Lambda to Lambda invocations okay? Well, it really depends on what you're doing. As I said, there are some legit use cases where I think they are a good idea. But it's really important to follow the principle that, when you look at every single component in your architecture, they should all serve a purpose and they should all provide some return on investment. You shouldn't do it just for the sake of it.
And talking about looking at your architectural components and making sure that everything has a return on investment, a good real-world anecdote from the Fathom Analytics guys is this: they had a system with an ingestion API, backed by a Lambda function, that would put something into a queue, and then they'd process it with another Lambda function subscribed to that SQS queue. What they found was that by simplifying this setup, removing the queue and the second function from the architecture, and just doing everything in that first function at the ingestion API, they actually saved a lot of cost. They also improved the performance of the system, because there were fewer moving parts in the architecture, and they saved quite a bit of cost associated with SQS and that second Lambda function. So by simplifying your architecture, you can sometimes save on cost as well, especially in a serverless environment where you're paying for every single request that you're processing.
When it comes to cost efficiency, scalability, and performance, caching is probably one of the most powerful and most underutilized tools. For me, it's almost like a cheat code for building performant and scalable applications. And for your typical web API, there are so many different places where you can apply caching. Consider a typical user-facing API: you've got your client talking to an API behind some CDN like CloudFront, and the API is backed by a Lambda function and some DynamoDB table. You can do client-side caching for static assets or configurations that don't really change. You can also do API response caching at the edge, with CloudFront in front of your API Gateway. And if there's anything computationally expensive that your application is doing, you can do application-level caching in the Lambda function, perhaps using something like ElastiCache, so that you can share the same cached results across multiple instances of the same function, or maybe even across multiple functions. But since I don't like paying for uptime for things that I don't use, and ElastiCache forces me to pay for uptime instead of paying for just what I use, nowadays I actually prefer to use Momento, which gives you a really scalable and cost-efficient serverless cache where you only pay for the number of requests you make, as opposed to the uptime of a cluster.
Another form of caching that people don't talk about enough, and that at scale can also bite you, is Route 53, or DNS, caching. With Route 53, you are charged for the number of DNS requests that Route 53 handles for you. Depending on your TTL, if you have a short TTL, then you may get more requests coming in, and when you're running at scale, those requests and those costs can still add up pretty quickly. So if you know you've got a domain that's quite stable, and you're not going to be changing it regularly, then just use a very long TTL, maybe a couple of hours, or a day, or even a week for things that are really stable and really unlikely to change. Do that, and you can suddenly cut down your Route 53 costs by a significant amount when you're running at scale.
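In CDK, setting the TTL on a record is a one-liner; the record details here are made up:

```ts
import * as cdk from 'aws-cdk-lib';
import * as route53 from 'aws-cdk-lib/aws-route53';

// For a stable record, a long TTL lets resolvers cache the answer,
// so far fewer queries reach (and are billed to) your hosted zone.
new route53.ARecord(this, 'StableRecord', {
  zone: hostedZone, // assumed to be defined or looked up elsewhere
  recordName: 'api', // hypothetical
  target: route53.RecordTarget.fromIpAddresses('203.0.113.10'),
  ttl: cdk.Duration.hours(24),
});
```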
When it comes to DNS and web API requests, there's another thing that can often add hidden costs you don't think about, which is the cost associated with making cross-origin requests. With API Gateway, it's really easy to enable CORS support: you basically just have to turn a setting on, and API Gateway is going to generate those OPTIONS endpoints for your API endpoints. But because API Gateway charges you for every single request, it's also going to charge you for those CORS preflight requests. You may think, okay, that's an easy problem to fix: just slap a CDN in front of it, use caching, and everything's fine. However, there has been a long-standing bug with how API Gateway's OPTIONS endpoints handle the cache headers when you're using authorization headers. This will be the case when you're using, say, API Gateway with a Cognito authorizer, where the Cognito tokens have to come in through the authorization header. If you're using the authorization header, it's going to break the cache headers that API Gateway's OPTIONS endpoints return, which breaks your caching for those OPTIONS endpoints. So there's a good chance you may be double-paying for those user requests, because caching is not working properly for the OPTIONS endpoints. The solution is either to roll your own OPTIONS endpoints, which of course is not great because it's a lot of extra work, or, if you have control over the DNS and domain settings for your application, instead of putting your application's APIs on a subdomain like api.example.com, move them to a sub-path on the same domain where the front end is hosted, like example.com/api. Requests to a sub-path on the same domain don't count as cross-origin, whereas requests to subdomains do. So by making a very simple domain configuration change, you can actually remove the need for CORS requests altogether.
So moving on to more architectural-level decisions. When it comes to AWS and the cloud, every architectural decision is essentially a buying decision. You're deciding what services you're going to buy and use, and making the wrong choice here and using the wrong service can be very, very costly. Take, for example, building some kind of event-driven architecture, where you need some kind of messaging service between different Lambda functions. If you're running at one request per second, then you may look at the numbers and think Kinesis is quite expensive compared to SNS, SQS, and EventBridge. But it's always important to take your specific context and throughput into consideration, because if we dial this up to a thousand requests per second, then suddenly Kinesis is much, much cheaper compared to EventBridge. Even though Kinesis charges you uptime for the shards, its cost per million requests is much lower. So as the number of requests goes up, your cost for Kinesis goes up much more slowly compared to the cost of SNS, SQS, and EventBridge, which only charge you for the requests and don't have an uptime cost associated with them.
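Here's a back-of-the-envelope sketch of that crossover; the prices are illustrative placeholders, so plug in the current numbers from the pricing pages before drawing conclusions:

```ts
// Hypothetical list prices, for illustration only.
const reqPerSec = 1000;
const reqPerMonth = reqPerSec * 60 * 60 * 24 * 30; // ~2.6 billion

// EventBridge: pay per million events published (assume ~$1.00/million).
const eventBridgeCost = (reqPerMonth / 1e6) * 1.0; // ~$2,592

// Kinesis: pay for shard uptime plus PUT payload units (assume
// ~$0.015/shard-hour and ~$0.014/million units; 1KB records, and one
// shard handles up to 1,000 records/s).
const shards = Math.ceil(reqPerSec / 1000);
const kinesisCost =
  shards * 0.015 * 24 * 30 + (reqPerMonth / 1e6) * 0.014; // ~$47

console.log({ eventBridgeCost, kinesisCost });
```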
The same thing plays out with API Gateway versus ALB. ALB has a rather complex calculation for determining how many load balancer capacity units you use, and charges you an hourly uptime cost for them. So at a very low throughput, API Gateway is much cheaper, whether you're using REST APIs or HTTP APIs, compared to ALB. But as you dial up the number of requests per second, and so the total number of requests you're processing in a month, suddenly API Gateway becomes a lot more expensive compared to ALB. So a good rule of thumb on AWS is that, at scale, services that charge you based on uptime are going to be much, much more cost-efficient compared to services that only charge you for requests. That's a good rule of thumb to have, but you still have to understand the specific cost dimensions of the individual services that you work with.
The example I showed you earlier assumed that every single request is about one kilobyte in size. But what happens if you dial that up to one megabyte of payload per request? Bytes processed is also one of the dimensions ALB uses to calculate how many load balancer capacity units you are charged for. So at a much bigger payload size, ALB is going to be a lot more expensive compared to API Gateway, in the kind of niche situation where maybe you're sending large documents or video files or binary data. At one megabyte per request, ALB is going to charge you a lot more than API Gateway. So it's always important to understand the specific cost dimensions of the services that you use.
At re:Invent 2023, Werner gave us The Frugal Architect, seven rules, or laws, of the frugal architect on AWS. And I really like these two in particular, where he says that architecting is basically a series of trade-offs, and that we want to find systems whose cost aligns with the business revenue model of our application. This is so that, as our business grows and our revenue grows, the cost grows along the same dimension: as we make more money, our cost goes up. We don't want a situation where we're not making money but our cost is still going up, because the cost dimensions are misaligned with the revenue of our business.
I've built a lot of WebSocket-based applications, and on AWS you've got multiple services that give you WebSockets without having to manage servers. API Gateway has managed WebSockets, so does AppSync with AppSync subscriptions, and IoT Core also gives you WebSockets. Now imagine you're building some kind of social network. As more users sign up, and you're using WebSockets to give them real-time updates whenever someone sends them a message, or replies to their tweets, or retweets, or likes, or what have you, your connection time for the WebSocket connections is going to go up with the number of users. But your revenue doesn't go up at the same rate as the connection time, because revenue goes up when users are more engaged. When they're doing more stuff, they're more likely to turn into paying users. So revenue goes up with the amount of engagement. And so I want to find services where, as far as WebSockets are concerned, I'm paying for activity and not for connection time. Unfortunately, for all the services we just looked at, one dimension of the cost is connection time. And that's why, when it comes to building WebSocket-based applications, I also very much like Momento, which gives you WebSockets through Momento Topics. The nice thing about Momento Topics is that they don't charge you for connection time; they only charge you for the number of messages that you send. And therefore, as my business grows, activity is going to go up. Only when people are doing things, sending messages, retweeting and whatnot, do I get the engagement that's going to drive my revenue. So my cost goes up with engagement, and not just with connection time.
And speaking of picking services that are cost-efficient to work with, you may have come across this last year, when a social network blew up and went from a few hundred users to half a million daily active users. They ran on Vercel, and suddenly they got slapped with something like a $100,000 bill. This is because, even though Vercel is built on top of AWS, every layer has to add its own margin, so as you use Vercel, they're actually putting a big markup on top of their AWS cost. In this specific example, they had something like a 7x markup on the underlying cost of the Lambda functions. So Vercel adds a lot of cost overhead in terms of what you end up paying for. And they keep adding more and more dimensions that they charge you for, including, now, draining the logs from your Vercel functions so that you can query them elsewhere in a third-party platform. So when you think of Vercel, do keep in mind that there's a huge markup on the cost of running Vercel functions versus the underlying Lambda functions. When you're starting to run something at a pretty big scale, it can be really, really costly, and I've seen so many stories of Vercel customers being surprised by a big bill when their application suddenly gets some traction.
We talked earlier about how simplifying your architecture can also pay off in terms of cost efficiency, and a good rule of thumb is to avoid any unnecessary moving parts in your architecture. We talked about the case where you have a synchronous Lambda to Lambda invocation: just simplify by combining everything into a single Lambda function. But what about when you have asynchronous Lambda to Lambda invocations? We talked about some particular use cases where that's a good idea. But oftentimes what I see instead is folks putting, essentially, an SNS topic between two Lambda functions, just so that they avoid a direct Lambda to Lambda invocation, while not actually using SNS to provide any fan-out. It's purely there for cosmetics, to avoid the Lambda to Lambda invocation. In that case, you may as well just do the direct Lambda to Lambda invocation asynchronously. As we discussed, there are use cases for that, and if you're not fanning out to multiple targets, you don't really need SNS. You still get the benefits: the built-in retries, but also the DLQ and Lambda destinations support for when those async invocations fail after a few attempts, so you're not going to lose data. You get the same thing with SNS to Lambda, but again, you don't need the SNS topic if you're not doing any fan-out. So don't put an SNS topic between Lambda functions just to avoid direct Lambda to Lambda async invocations. Async invocations have some valid use cases; it's synchronous Lambda to Lambda invocations that you want to watch out for.
As I mentioned earlier, every component in your architecture should serve a purpose. So when you look at your architecture diagram, always pay attention to what each component is providing. Are they adding any value? Is there a reason for them to be there? If not, consider whether you can just remove that component. As Grace Hopper once said, the most dangerous phrase in the language is "we've always done it this way." Of course, people know this quote, so they don't say that anymore, but the mindset is still reflected in a lot of behaviors: "we feel more comfortable with this, because we're familiar with this pattern." So again, be very critical when you look at your architecture, and see what each part is doing and what value you're getting from each component in your architecture diagram.
So in terms of simplifying architecture, there are different things you can do. For example, Lambda function URLs are nowadays quite a useful alternative to using API Gateway to build serverless APIs. Instead of having API Gateway in front of a Lambda function just to serve as an HTTP endpoint that someone can call, you can have the Lambda function expose itself with a function URL. This is very useful if you're not using the features API Gateway gives you, like a Cognito authorizer, or its support for request models and validating POST bodies, or its ability to integrate directly with other services. Or maybe you're hitting some API Gateway-specific limits, such as the 29-second timeout for integration endpoints, or the fact that it doesn't support response streaming, which is becoming more and more important for AI workloads and for building chatbots that can stream responses to the caller as they come back from the LLM. The downside of using Lambda function URLs is that you're kind of forced into writing a Lambdalith, which is a single Lambda function that handles all of the routes for an API and does the routing internally to different parts of the code. The main downsides here are that you lose the per-endpoint metrics and alerts, and you're not able to implement fine-grained access control with different permissions for each endpoint. Also, a lot of the frameworks you may want to use are not designed for Lambda's constrained execution environment; they're often quite big and bulky, and they can really affect the cost and performance of your function. In terms of security, you can either have a public URL, or you can use AWS IAM as the authentication mechanism. So I find function URLs are best for either public APIs, or internal microservice APIs where you're going to use AWS IAM as the authentication and authorization mechanism.
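Creating one is a couple of lines in CDK, assuming fn is your function construct:

```ts
import * as cdk from 'aws-cdk-lib';
import * as lambda from 'aws-cdk-lib/aws-lambda';

// IAM auth suits internal service-to-service APIs; NONE would make
// this a public endpoint.
const fnUrl = fn.addFunctionUrl({
  authType: lambda.FunctionUrlAuthType.AWS_IAM,
});

new cdk.CfnOutput(this, 'FnUrl', { value: fnUrl.url });
```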
However, having said that, if you want to build a user-facing API that's authenticated, you can still do that. You can just validate, say, a Cognito token inside your Lambda function, and it turns out that's actually pretty quick to do, so in terms of performance, there's no real downside. The main consideration is that for unauthorized requests, you only find out when your function runs. With API Gateway, API Gateway can validate the token, and if it's invalid, it rejects the request, so you don't pay for those unauthorized requests. If we do the validation in a Lambda function instead, you always pay for the Lambda invocation, and for however much time it takes your function to realize that the request is unauthorized, the token is invalid, the token is missing, and so on.
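If you do go down this route, a library like aws-jwt-verify makes the in-function validation straightforward. A minimal sketch, with made-up pool and client IDs:

```ts
import { CognitoJwtVerifier } from 'aws-jwt-verify';

// Created outside the handler so the downloaded JWKS is cached
// across warm invocations.
const verifier = CognitoJwtVerifier.create({
  userPoolId: 'eu-west-1_XXXXXXXXX', // hypothetical
  tokenUse: 'access',
  clientId: 'your-app-client-id', // hypothetical
});

export const handler = async (event: { headers: Record<string, string> }) => {
  try {
    const claims = await verifier.verify(event.headers.authorization ?? '');
    // ... do the authorized work, scoped to claims.sub ...
    return { statusCode: 200, body: 'ok' };
  } catch {
    // Unlike an API Gateway authorizer, you still pay for this invocation.
    return { statusCode: 401, body: 'Unauthorized' };
  }
};
```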
Interestingly, you can also go the other way in terms of simplifying your architecture: you can go functionless. Instead of having an API Gateway in front of a Lambda function just so that you can make a call to, say, a DynamoDB table, you can remove the Lambda function and have API Gateway integrate directly with DynamoDB. API Gateway can integrate with pretty much every other AWS service directly, without needing a Lambda function, if all your function is doing is using the SDK to call another service. By removing the Lambda function, you remove any cold start performance hit associated with it, as well as any costs related to the Lambda invocation, its duration, and so on.
You can actually go pretty far with this approach. For example, once your API Gateway writes some data to DynamoDB, you can use EventBridge Pipes to capture the data changes and send them as events to an event bus. Those are going to be DynamoDB events for data inserted, updated, removed, and so on, not really domain events specific to your business. So what you can do in that case is use a Lambda function to transform the event payload, turning, say, an insert event into a "user created" event that's more akin to your business domain. A good adage we had at the start of the whole serverless movement was to use Lambda functions to transform data, not to transport data from one place to another. Similarly, if you've got data in DynamoDB that you want to make searchable with OpenSearch, you can also do that nowadays without writing a custom Lambda function: you can use the no-code ETL integration to synchronize data from DynamoDB to OpenSearch to make it searchable.
And it's not just API Gateway that can do this. You can also use AppSync, or Step Functions, which can all integrate with other AWS services directly, without you writing custom code in a Lambda function just to make an SDK call to some other service. Again, every single component in your architecture should serve a purpose. If your Lambda function is not doing any business logic, and all it's doing is transporting data from one place to another, and the service sitting in front of it can do that already, then you've got to think critically about whether you need that Lambda function to be there in the first place.
Speaking of which, what if you just let the front end talk directly to the AWS services? Instead of having API Gateway and Lambda in front of your DynamoDB table, you can have the front end talk to the DynamoDB table directly. And before you call me crazy, you can actually do this securely and safely, and still maintain tenant isolation. The way you do that is to have the user log in to, say, Cognito, and then exchange the Cognito-issued token for some temporary AWS credentials using a Cognito identity pool. With the temporary credentials, you can then allow the front end to talk to DynamoDB directly.
And the trick, or rather the important part, is that in the IAM role you configure for the Cognito identity pool, you add a condition which means the issued credentials only allow the caller, the front end, to interact with DynamoDB items where the partition key matches the subject of the token issued by the Cognito user pool. That way, the front end can only interact with the user's own data in DynamoDB, and not any other user's data. Of course, there are a lot of ways this can go wrong, so this is not an approach that I would advise for everybody. In fact, for most people, I would say don't do this: there are way more reasons not to do this than there are reasons to do it. It definitely falls into the high-risk, high-reward category. The main reason to do this would be that you're building a side project or side hustle, you're really cost-conscious, and you want to minimize your cost as much as possible. In that case, by removing the API layer in front of your data store and letting the front end talk to the database directly, in a way that's safe and secure, and if you know what you're doing, then this is a potential solution that you can explore.
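For the curious, here's a sketch of the kind of scoped-down policy statement I mean, attached to the identity pool's authenticated role (the actions and table are examples):

```ts
import * as iam from 'aws-cdk-lib/aws-iam';

// Callers can only touch DynamoDB items whose partition key equals
// their own Cognito identity id.
const perUserAccess = new iam.PolicyStatement({
  actions: ['dynamodb:GetItem', 'dynamodb:Query', 'dynamodb:PutItem'],
  resources: [table.tableArn], // assumed table construct
  conditions: {
    'ForAllValues:StringEquals': {
      'dynamodb:LeadingKeys': ['${cognito-identity.amazonaws.com:sub}'],
    },
  },
});
```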
So that's 12 different things you can do, 12 different ways to save money, or rather to cut out waste, in your serverless architecture. And if you want to learn how to actually build a serverless architecture that's production-worthy, then check out my upcoming workshop, Production-Ready Serverless, which has just started; you can still sign up for the next couple of days. And if you've got any questions, please feel free to reach out to me in the comments, and I'll try to answer your questions as much as possible. Okay, thank you so much for your time, and enjoy the rest of the conference. Okay, bye-bye.