Abstract
Be it for data sovereignty, latency, or resiliency, organizations expanding their businesses worldwide are adopting global multi-region serverless architectures.
In this session, you will learn how to improve customer experience on your global services by deploying into multiple regions to reduce latency, and by applying event-driven architectural patterns to increase performance.
See how to use path-based traffic routing to allow for gradual migrations of legacy API operations, how to route requests based on network latency, how to separate reads from writes with CQRS, how to use a Lake House to support multiple data-access patterns while keeping them in sync between regions, and how to simplify the complexity of deploying and maintaining a global active-active architecture by using serverless orchestration across regions.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, welcome to this session on global active
active serverless architectures. My name is George Fonseca.
I'm a senior solution architect at Amazon Web Services.
Whatever the industry, organizations today are looking to become more
agile so that they can innovate and respond
to changes faster. To do that, organizations need to build applications
that can scale to millions of users while still having global
availability, respond in milliseconds and manage
petabytes, if not exabytes, of data. We call
them modern applications and they cover use cases
from web, mobile, and IoT to machine learning applications,
but also shared service platforms and microservice backends
and more. One of the main factors impacting global applications
is the end user network latency. In the past,
Amazon has reported a drop of 1% in sales
for each additional 100 milliseconds in load time.
Content delivery networks, or CDNs, have successfully been used
to speed up the global delivery of static content,
and these include images, videos and JavaScript
libraries. But dynamic calls still need to be sent back to the backends. For example, if you have users in Europe accessing a backend in Australia, those users will notice an additional 300 milliseconds of latency, and this isn't acceptable for most popular games, banking applications, or other interactive applications.
Therefore, having locally available applications and
content is becoming more and more important these days.
But bear in mind, building and successfully running a multiregion
active active architecture is hard. In this session we
will address one common use case, an HTTP
API processing relational data with heavy reads
and fewer writes. Also, these writes need
to be transactional across multiple microservices as
well as third-party services and on-premises components.
Other use cases for multiregion deployment include
disaster recovery, where multiregion is a standard practice to
keep your disaster recovery environment on a different region.
Also, data residency, where multiregion is a solution
for compliance and regulation when you need to keep the
data of your users within the regions of those
users. There are also software-as-a-service applications,
where multiregion is a standard practice for tenant isolation.
And then there are the antipatterns. When should
you avoid using multiregion deployment? Well,
there is high availability inside a single region. For these scenarios you should leverage Availability Zones,
not multiregion deployment. Then there is the comparison between multiregion
and AWS edge services, and often an edge service is enough to address the latency without complicating
your solution. But I am assuming that you have already analyzed
the pros and cons of these solutions and you have decided
for the strategy of multiregion. So let's move on. The final
goal of the session is to have our applications deployed to
several regions across the globe. These regions will communicate
through interregional secure connections, and the
on-premises data centers will have high-throughput, low-latency connectivity to the closest AWS region. Finally,
our users will interact via the Internet with the
application stack at the region with the lowest latency
relative to those users. The first topic to
approach is the connectivity between the AWS regions.
For that, customers can use a managed network service
called AWS Transit Gateway. Transit Gateway connects VPCs and on-premises networks through a central hub. This simplifies network peering and
acts as a cloud router both within the regions
and across the regions. This interregion
peering uses the AWS global network
to connect the transit gateways together, and this is a fully
redundant fiber network backbone providing many terabytes
of capacity between the regions at speeds of up to
100 gigs per second. Furthermore, data is automatically
encrypted and never travels over the public Internet.
For our use case, customers can deploy a transit gateway in each target region and then connect pairs of regions by peering their transit gateways. Customers will then connect each transit gateway with all other transit gateways, and the result will be a full mesh of transit gateways across the globe.
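As a rough illustration of that full mesh, here is a minimal AWS CDK (TypeScript) sketch. It is not code from the session; the stack layout and the way peer transit gateway IDs are passed in are assumptions, and each peering attachment still has to be accepted on the remote side.

```typescript
// Minimal sketch, not the session's code: one transit gateway per region, plus a
// peering attachment towards each other region. Names and wiring are illustrative.
import { Stack, StackProps } from 'aws-cdk-lib';
import { Construct } from 'constructs';
import { CfnTransitGateway, CfnTransitGatewayPeeringAttachment } from 'aws-cdk-lib/aws-ec2';

interface RegionalTgwProps extends StackProps {
  // Transit gateway IDs of the other regions, passed in once those gateways exist.
  peers: { region: string; transitGatewayId: string }[];
}

export class RegionalTgwStack extends Stack {
  constructor(scope: Construct, id: string, props: RegionalTgwProps) {
    super(scope, id, props);

    // The regional hub that VPCs and Direct Connect gateways attach to.
    const tgw = new CfnTransitGateway(this, 'TransitGateway', {
      description: `Regional hub for ${this.region}`,
    });

    // One peering attachment per remote region; the attachment still has to be
    // accepted on the remote side (console, CLI, or automation).
    for (const peer of props.peers) {
      new CfnTransitGatewayPeeringAttachment(this, `PeeringTo-${peer.region}`, {
        transitGatewayId: tgw.ref,
        peerTransitGatewayId: peer.transitGatewayId,
        peerRegion: peer.region,
        peerAccountId: this.account,
      });
    }
  }
}
```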
But how about connectivity from on-premises data centers and third-party networks to AWS? For that, we recommend customers use AWS Direct Connect for all production workloads. Direct Connect is
a cloud service solution that establishes private connectivity between AWS and your data center or
your office or your colocation environment with
speeds up to 100 gigs per second.
Alternatively, you may use AWS Site-to-Site VPN and AWS Client VPN
to establish that secure connection over the public Internet.
To improve the end user experience, customers can also
expose the APIs using Amazon CloudFront. This is a fast, secure, and programmable CDN. It is used by
customers like Tinder and Slack to secure and accelerate
the dynamic API calls as well as the WebSocket connections.
It accelerates traffic by routing it from the edge locations
where the users are to the AWS origins using
AWS's dedicated network backbone.
But when you deploy to multiple regions, CloudFront might not be enough. Users will have to be routed to the nearest AWS region, and the question is: how do you do that while keeping the same URL? The answer is Amazon Route 53. This is a highly available and scalable cloud DNS web service. In particular, we want to look at its geolocation, geoproximity, and latency routing policies, so that we can route end users to the best application endpoints. For our multiregion active-active use case, customers can combine CloudFront and Route 53 to achieve this domain name translation, but also dynamic content security and acceleration, path-based functional routing, and finally latency-based routing across the regions.
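For illustration only, a minimal CDK (TypeScript) sketch of the latency-based routing piece could look like the following; the hosted zone ID, hostname, and regional endpoint names are placeholders, not values from the session.

```typescript
// Minimal sketch, illustrative values only: one latency-based record per region,
// all answering for the same API hostname.
import { Stack } from 'aws-cdk-lib';
import { CfnRecordSet } from 'aws-cdk-lib/aws-route53';

declare const stack: Stack;

// Hypothetical regional endpoints (e.g. regional API custom domains).
const regionalEndpoints = [
  { region: 'eu-west-1', dnsName: 'api-eu.example.com' },
  { region: 'ap-southeast-2', dnsName: 'api-apac.example.com' },
];

for (const endpoint of regionalEndpoints) {
  new CfnRecordSet(stack, `ApiRecord-${endpoint.region}`, {
    hostedZoneId: 'Z0000000EXAMPLE',   // placeholder hosted zone for example.com
    name: 'api.example.com',
    type: 'CNAME',
    ttl: '60',
    resourceRecords: [endpoint.dnsName],
    setIdentifier: endpoint.region,    // distinguishes records that share a name
    region: endpoint.region,           // enables latency-based routing
    // A healthCheckId can be added so unhealthy regions are taken out of rotation.
  });
}
```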
So let's recap the network architecture for our use case. Customers can use Transit Gateway to establish the secure, low-latency connectivity between the AWS regions; use Direct Connect to establish secure, private, low-latency connectivity between the data centers and the nearest AWS region; and then use CloudFront with Route 53 to route end users from the edge locations to the best AWS region based on network latency. With the network connectivity sorted, the next step is to keep the data in sync between the AWS regions, and for that there are at least two approaches for cross-regional data replication: the synchronous solution and the asynchronous solution. With synchronous replication,
write requests need to successfully replicate across regions so
that they can be acknowledged back to the application. This ensures
consistency across regions, but creates a dependency on
other regions, so if one region fails, they all
fail. With asynchronous replication, write requests are successful if they are persisted locally only, and the cross-region replication is deferred by milliseconds, or until the target regions are available. So with this we overcome regional outages, but we also introduce write conflicts between regions.
Nonetheless, we will follow the async approach for
our use case and we will see how to solve these issues.
The first cross-regional database engine to consider is Amazon DynamoDB Global Tables. DynamoDB is a serverless key-value and document database that can handle up to 10 trillion requests per day and support peaks of up to 20 million requests per second. The Global Tables feature provides fully managed, automatic table replication across AWS regions with single-digit millisecond latency. Unfortunately, it does not fit our use case because we have the requirement for
relational and transactional data. The second
cross-regional database engine to consider is Amazon Aurora Global Database. Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud. The Global Database feature, which is available with Amazon Aurora Serverless v2, allows a single database to span multiple AWS regions with subsecond latency to read-only replicas in those regions. In the case of a regional outage, a secondary region can be promoted to primary in less than 1 minute. This results in an RTO (recovery time objective) of 1 minute and an RPO (recovery point objective) of 1 second.
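A minimal CDK (TypeScript) sketch of wiring up such a global database might look like the following; the identifiers are hypothetical, and the networking, credentials, and parameter groups a real cluster needs are omitted.

```typescript
// Minimal sketch, illustrative identifiers: wrap an existing regional Aurora cluster
// in a global cluster, then attach a secondary, read-only cluster in another region.
import { Stack } from 'aws-cdk-lib';
import { CfnGlobalCluster, CfnDBCluster } from 'aws-cdk-lib/aws-rds';

declare const primaryStack: Stack;    // deployed in the primary region
declare const secondaryStack: Stack;  // deployed in a secondary region

// Promote an existing regional cluster to be the primary of the global database.
new CfnGlobalCluster(primaryStack, 'GlobalDb', {
  globalClusterIdentifier: 'orders-global',              // hypothetical name
  sourceDbClusterIdentifier: 'orders-primary-cluster',   // existing Aurora cluster
});

// The secondary region joins the global database and receives read-only
// replication; it can be promoted to primary during a regional outage.
new CfnDBCluster(secondaryStack, 'SecondaryCluster', {
  engine: 'aurora-postgresql',
  globalClusterIdentifier: 'orders-global',
});
```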
Finally, customers can consider Amazon S3 Replication. Amazon S3 is an object storage service used to store and protect any amount of data for a range of use cases, and these use cases include data lakes, websites, enterprise applications, IoT, and big data analytics, just to name a few. The S3 Replication feature allows data to be replicated from one source bucket to multiple destination buckets, with most objects being replicated in seconds and 99.99% of objects being replicated within the first 15 minutes.
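As a sketch of the replication feature (illustrative bucket names and role ARN, not from the session): the source bucket needs versioning enabled and an IAM role that S3 can assume, and the destination bucket in the other region must be versioned as well.

```typescript
// Minimal sketch, illustrative values: a versioned source bucket replicating every
// object to a pre-existing, versioned bucket in another region.
import { Stack } from 'aws-cdk-lib';
import { CfnBucket } from 'aws-cdk-lib/aws-s3';

declare const stack: Stack;

new CfnBucket(stack, 'SourceBucket', {
  bucketName: 'orders-assets-eu-west-1',                      // placeholder
  versioningConfiguration: { status: 'Enabled' },             // required for replication
  replicationConfiguration: {
    role: 'arn:aws:iam::123456789012:role/s3-replication',    // placeholder role S3 assumes
    rules: [
      {
        id: 'ReplicateAll',
        status: 'Enabled',
        prefix: '',                                            // replicate every object
        destination: {
          bucket: 'arn:aws:s3:::orders-assets-ap-southeast-2', // bucket in the other region
        },
      },
    ],
  },
});
```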
So let's recap the considerations for cross-regional data replication. We looked at Amazon DynamoDB, Amazon Aurora, and Amazon S3. Each of these services offers cross-regional automatic replication features using serverless technology, so now it's time to put them to work. For that,
I will introduce the command and query responsibility segregation
pattern, or CQRS for short. This architectural pattern involves separating the data mutation part of a system from the query part of that system. In traditional architectures, the same data model is used to query and update the database. That is simple and works well for basic CRUD operations. But our use case is asymmetrical, with heavy reads and fewer writes. Therefore, it has very different performance and scalability requirements between reads and writes. Customers can use CQRS to perform writes onto normalized models in relational databases, and then perform queries against a denormalized database that stores the data in the same format required by the APIs. This will reduce the processing overhead of the reads while increasing the maintainability of complex business logic on the writes. In AWS, the CQRS pattern is typically
implemented with Amazon API Gateway. In this scenario, mutations are POST requests processed by an AWS Lambda function that calls domain-specific services. A denormalized version of the data is then mirrored onto DynamoDB, so that subsequent queries are performed by GET requests reading the denormalized objects directly from DynamoDB tables.
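To make the write path concrete, here is a hedged TypeScript sketch of such a Lambda handler; the table name, item shape, and the createOrder domain call are placeholders rather than the session's implementation.

```typescript
// Minimal sketch, illustrative names: the write side persists the normalized data
// through a domain service, then mirrors a denormalized view into DynamoDB for reads.
import { APIGatewayProxyEvent, APIGatewayProxyResult } from 'aws-lambda';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { DynamoDBDocumentClient, PutCommand } from '@aws-sdk/lib-dynamodb';
import { randomUUID } from 'node:crypto';

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const READ_MODEL_TABLE = process.env.READ_MODEL_TABLE ?? 'orders-read-model'; // placeholder

export const handler = async (event: APIGatewayProxyEvent): Promise<APIGatewayProxyResult> => {
  const command = JSON.parse(event.body ?? '{}');

  // 1. Write the normalized data through the domain-specific service
  //    (e.g. an Aurora-backed orders service); stubbed out below.
  const order = await createOrder(command);

  // 2. Mirror the denormalized view that the GET endpoints will read.
  await ddb.send(new PutCommand({
    TableName: READ_MODEL_TABLE,
    Item: {
      pk: `ORDER#${order.id}`,
      sk: 'SUMMARY',
      customerName: order.customerName,
      total: order.total,
      status: order.status,
    },
  }));

  return { statusCode: 201, body: JSON.stringify({ id: order.id }) };
};

// Placeholder for the relational write; a real service would hold the transactional logic.
async function createOrder(command: { customerName: string; total: number }) {
  return { id: randomUUID(), status: 'PENDING', ...command };
}
```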
The asynchronous version of the CQRS pattern implementation adds a queue to the write operation. This will allow for long-running writes with immediate API responses back to the clients, but now reads have become more complex, because the write duration of the requests is unknown. The solution is to notify the clients using the WebSockets feature of API Gateway, which is invoked by a Lambda function when the DynamoDB tables are updated.
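A sketch of that notification function, assuming the denormalized item carries the caller's WebSocket connection ID (an assumption, not stated in the session), might look like this:

```typescript
// Minimal sketch, illustrative wiring: a Lambda on the DynamoDB stream that notifies
// connected WebSocket clients when their write has been applied.
import { DynamoDBStreamEvent } from 'aws-lambda';
import {
  ApiGatewayManagementApiClient,
  PostToConnectionCommand,
} from '@aws-sdk/client-apigatewaymanagementapi';

const client = new ApiGatewayManagementApiClient({
  endpoint: process.env.WEBSOCKET_ENDPOINT, // e.g. the WebSocket API's management endpoint
});

export const handler = async (event: DynamoDBStreamEvent): Promise<void> => {
  for (const record of event.Records) {
    const image = record.dynamodb?.NewImage;
    if (!image) continue;

    // Assumes the denormalized item stores the WebSocket connection ID of the
    // client that issued the original write.
    const connectionId = image.connectionId?.S;
    if (!connectionId) continue;

    await client.send(
      new PostToConnectionCommand({
        ConnectionId: connectionId,
        Data: Buffer.from(JSON.stringify({ status: 'WRITE_COMPLETED', id: image.pk?.S })),
      }),
    );
  }
};
```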
Another way to implement CQRS is to use AWS AppSync, exposing the API as a GraphQL API. This definitely simplifies real-time updates, because it allows for no-code data subscriptions directly from DynamoDB, and it also reduces the implementation complexity by allowing front-end developers to query multiple data entities with a single GraphQL endpoint. When applied to our use case,
this results in a multiregional architecture where Route 53 routes requests to a regional API endpoint based on latency, and the data is then queried or subscribed to from DynamoDB Global Tables, kept in sync across the regions.
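For illustration, the denormalized read model could be declared as a global table with a short CDK (TypeScript) sketch like the one below; table name, keys, and replica regions are assumptions.

```typescript
// Minimal sketch, illustrative values: the denormalized read model as a DynamoDB
// global table, deployed from the primary region with replicas kept in sync by AWS.
import { Stack } from 'aws-cdk-lib';
import { AttributeType, Billing, TableV2 } from 'aws-cdk-lib/aws-dynamodb';

declare const primaryStack: Stack; // stack deployed in the primary region

new TableV2(primaryStack, 'ReadModel', {
  tableName: 'orders-read-model',                              // placeholder
  partitionKey: { name: 'pk', type: AttributeType.STRING },
  sortKey: { name: 'sk', type: AttributeType.STRING },
  billing: Billing.onDemand(),
  // Each entry creates a replica table that DynamoDB replicates to automatically.
  replicas: [{ region: 'ap-southeast-2' }, { region: 'us-east-1' }],
});
```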
As for the mutations, they are limited to the primary region only, and mutation requests in the secondary regions are forwarded to the primary region. This pattern is called read local, write global, and the advantage is that it removes the data conflicts across the regions, at the cost of added write latency in the secondary regions, but that has minor relevance to our use case. Our last step is to ensure data
consistency across the data persistence layers of the application's microservices, considering our use case's dependency on third-party services and on-premises components. For that, I will introduce the saga pattern. This is a failure management pattern to coordinate transactions between multiple microservices. In AWS, the saga pattern is typically implemented with AWS Step Functions. This is a serverless orchestration service that lets you combine and sequence Lambda functions and other AWS services to build business-critical applications.
For example, for a shopping cart checkout operation, the application will first need to pre-authorize the credit card. If that is successful, then the application will actually charge the credit card, and only if that is successful will the customer information be updated. So by using Step Functions, each step can follow the single responsibility principle, while all of the plumbing, the failure management, and the orchestration logic are kept separate. Going back to our use case architecture, where we have the Lambda functions performing the mutations, Step Functions will now take their place.
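As an illustration of that checkout saga, here is a hedged CDK (TypeScript) sketch; the Lambda functions and the single compensation step (releasing the pre-authorization) are assumptions standing in for the real domain services.

```typescript
// Minimal sketch, illustrative function names: a checkout saga where failed steps
// trigger a compensating action, orchestrated by Step Functions.
import { Stack, Duration } from 'aws-cdk-lib';
import * as sfn from 'aws-cdk-lib/aws-stepfunctions';
import * as tasks from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { IFunction } from 'aws-cdk-lib/aws-lambda';

declare const stack: Stack;
// Hypothetical domain Lambda functions.
declare const preAuthorizeFn: IFunction;
declare const chargeCardFn: IFunction;
declare const updateCustomerFn: IFunction;
declare const releaseAuthFn: IFunction;

// Compensation: release the pre-authorization, then fail the execution.
const releaseAuth = new tasks.LambdaInvoke(stack, 'ReleasePreAuthorization', {
  lambdaFunction: releaseAuthFn,
}).next(new sfn.Fail(stack, 'CheckoutFailed'));

const preAuthorize = new tasks.LambdaInvoke(stack, 'PreAuthorizeCard', {
  lambdaFunction: preAuthorizeFn,
  outputPath: '$.Payload',
});

const chargeCard = new tasks.LambdaInvoke(stack, 'ChargeCard', {
  lambdaFunction: chargeCardFn,
  outputPath: '$.Payload',
}).addCatch(releaseAuth, { resultPath: '$.error' });

const updateCustomer = new tasks.LambdaInvoke(stack, 'UpdateCustomer', {
  lambdaFunction: updateCustomerFn,
  outputPath: '$.Payload',
}).addCatch(releaseAuth, { resultPath: '$.error' });

new sfn.StateMachine(stack, 'CheckoutSaga', {
  definitionBody: sfn.DefinitionBody.fromChainable(
    preAuthorize
      .next(chargeCard)
      .next(updateCustomer)
      .next(new sfn.Succeed(stack, 'CheckoutSucceeded')),
  ),
  timeout: Duration.minutes(5),
});
```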
As for asynchronous mutations, customers can leverage Amazon EventBridge. This is a serverless event bus for building event-driven applications at scale, directly integrating with over 130 event sources and over 35 targets.
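A minimal CDK (TypeScript) sketch of such an event bus and rule could look like the following; the bus name, event source, and detail type are illustrative.

```typescript
// Minimal sketch, illustrative names: a custom event bus for asynchronous mutations,
// with a rule that routes order events to a processing Lambda.
import { Stack } from 'aws-cdk-lib';
import { EventBus, Rule } from 'aws-cdk-lib/aws-events';
import { LambdaFunction } from 'aws-cdk-lib/aws-events-targets';
import { IFunction } from 'aws-cdk-lib/aws-lambda';

declare const stack: Stack;
declare const processOrderFn: IFunction; // hypothetical mutation handler

const bus = new EventBus(stack, 'MutationBus', { eventBusName: 'mutations' });

new Rule(stack, 'OrderPlacedRule', {
  eventBus: bus,
  eventPattern: {
    source: ['checkout.service'],   // hypothetical event source
    detailType: ['OrderPlaced'],
  },
  targets: [new LambdaFunction(processOrderFn)],
});
```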
It is now time to bring it all together into our final architecture. In this application, Route 53 plus CloudFront will route the requests from the edge locations to the regions based on latency. AppSync will then query and subscribe to the denormalized data from DynamoDB. Step Functions will also orchestrate the mutations, but only in the primary region, and EventBridge will serve the event-driven asynchronous mutations. Finally, the data layer will be automatically synchronized between the regions. Upon a secondary region failure, Route 53 health checks will divert traffic automatically to other regions, so the affected users within the lost secondary region will only notice an additional network latency. When that affected secondary region recovers, data will be automatically resynced, and the secondary region will then be able to resume serving the users.
On the other hand, upon a primary region failure, the application will enter a read-only mode. If the expected duration of the outage is acceptable to the business, then the application may continue to run with limited functionality until the primary region recovers. Otherwise, a secondary region should be promoted to primary at this point by activating the failover or disaster recovery procedures. This reference architecture is publicly available on the AWS Architecture Center. The PDF version includes further details, and you can download it by scanning the QR code in the bottom-left corner of this slide. Finally, there are some additional resources around multi-regional architectures at AWS for you to explore. Take a look at these links, and I hope you have enjoyed this session. Thank you for your time.