Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Thanks for joining the session today, where we'll be talking about the journey we had at Mercloud implementing a multitenant front-end architecture for our ecommerce platform. Let me start by introducing myself. My name is Guilherme. I'm the CTO and co-founder at Mercloud, currently based in London, and I've been working in the industry for 17 years, most of those as a software engineer. If you'd like to reach out to me or simply follow the content I'm always sharing online, you can find the links to my social media here on this slide.

Let me also tell you about Mercloud and what we do. We develop an ecommerce platform that specializes in the B2B market and handles all the complexities of sales between companies. We work with companies across different industries to help them provide a digital sales channel to their customers, where those customers can access the product catalog and make and manage their orders. If you're interested in knowing a bit more about our company and the product we built, you can check that out on our website, mercloud.io, or you can reach out to me on my social media.
All right, let's start by understanding what exactly B2B ecommerce is. When we talk about ecommerce, the first thing that comes to mind is the traditional B2C model. I'll give you an example: I want to buy a product, so I go to an online shop, browse some products, compare different options and prices, add to my basket, make the payment, and after a few days I get it delivered at my doorstep.

In a B2B scenario, the actor behind the purchase is a person acting on behalf of a company. Let's say you have a store where you sell electronics. You need to constantly reach out to your suppliers to buy products to replenish your stock as you sell. That's a B2B ecommerce scenario, where the goal is often to establish a long-term relationship with your customers rather than a one-time transaction. If I go online and buy a television, I won't be coming back to that same online store to buy another television for a couple of years. But if I'm refilling stock for my store, I'll be using it very often.
In B2B ecommerce, we need to handle complex topics like customized pricing models based on the customer profile, where the profile can include geolocation or customer tier; you might have a VIP customer to whom you apply special pricing. There are also complex tax regimes, where the policies and rates depend on the product and on the customer who is buying: a product might have a reduced VAT rate, for example, or extra taxes applied on top. And because we're talking about refilling stock, we're talking about big transactions. The volume of these orders is quite large, and they usually involve a multilayered approval process, where people need to approve each transaction in the chain.

Who exactly are the customers of such a solution? To understand who they are, we first need to understand what the product supply chain looks like: raw material goes to the manufacturer, and the products are then distributed to retailers and wholesalers, who sell them to the final customers. All the transactions we see here, before the product reaches the final customer, are B2B transactions, and this is where B2B ecommerce is used. The final one is the traditional B2C transaction we already know, which is a different type of ecommerce from the one we're talking about.
All right, let's see where we started. The first version of our solution was a very traditional React application, where we wanted to take advantage of things like server-side rendering and caching at the CDN level. The architecture of our MVP looked like this: it was deployed on AWS, the React application was running on a Fargate cluster, and in front of the load balancer we had a CDN. We also had our static assets, not only JavaScript but also product images, hosted on S3 so we could serve them to users.

That was quite a simple architecture, but it had constraints. We had to replicate this stack once per customer, and that caused high costs: each tenant had to have their own infrastructure with their own resources, and each of those resources had to be sized to accommodate that customer's demand. This also made our onboarding process rather complicated, since it involved not only deploying a new stack per customer but also configuring it. And it was getting harder over time: as this architecture grew, it was harder to fully automate, and it made delivering new features challenging, with a very slow deployment process, because we had all these stacks to maintain and update. On top of that, it was also difficult to monitor, and we had some poor performance numbers. Even though this was not perfect, it was good enough for our MVP and helped us onboard our first customers.
The next step was to think about the next generation of this application and reimagine the architecture. We wanted to rebuild the application not only to modernize it, but also to support the growing customer base we had. The main points we wanted to focus on in this re-architecture were increasing scalability and reducing management overhead, so there would be fewer things to maintain and configure. We also wanted an architecture that would allow quicker deployments and a faster onboarding experience. We wanted to reduce latency for our customers too: we had customers spread across different regions of the globe, with many in South America and the US and a few in Europe, so we wanted the data and the service as close as possible to them. Finally, we wanted to increase observability, to better understand what was going on in our application.
The solution to simplify this architecture was to transform it into a multitenant one. What exactly is a multitenant architecture? It's an architecture where you have a single instance of your application serving multiple customers, and these customers are known as tenants. This model helps you maximize resource utilization in an efficient way, because you're sharing the same infrastructure across all your customers. Here we have an example comparing the two models. In the single-tenant one on the left, each of our customers, our tenants, has their own installation, their own instance of the application, talking to their own data. On the right, in the multitenant one, you can see that we have one single instance of the application serving everyone, but we can still keep each tenant's data isolated.

Speaking of data isolation, this is quite a complex topic, but it's something very important to consider when building such a solution. We need to pick an isolation strategy. You can start from a fully isolated model, where everything is isolated and nothing is shared between tenants, or go to a fully shared one, where instances of, for example, your database or your data lake are shared by all your customers. In the middle, you can also have a hybrid model where you share only some of these resources. You have to pick the best strategy, and this is based not only on your own needs but also on your tenants' needs, for example compliance requirements like GDPR. So this is something you really need to consider carefully.
Let's talk about the benefits of a multitenant architecture. First, it's cost efficient, because resources are shared among tenants and we no longer need to deploy a separate stack for each of them. It's also scalable: you can scale horizontally to accommodate increased demand, not only from new tenants but also from the growing demands of your current ones. You also usually have one single pipeline handling the deployment of the whole architecture, which allows you to be more efficient and deploy your changes faster. It also helps with security and compliance, because you end up with a centralized management solution with uniform policies applied across all your customers. This kind of solution also increases developer productivity: as I said, you have one single pipeline that handles the full deployment, and in most cases you also end up with a single code base, so it's easier to iterate. That in turn increases business agility, allowing you to adapt to new demands rapidly and launch new features in a short period of time.
Okay, so let's talk now about the technology choices we made for this project. To develop it, we chose Next.js. The first reason we chose Next.js is that it has a great developer experience with zero configuration: you can clone a template repository, start coding, and deploy it in a matter of minutes, with very little management involved. It also comes with a simple built-in routing solution and other tools that help in your day-to-day as a developer, for example hot code reloading. It also ships with rich built-in features that help with server-side rendering and static generation. And because we already had React expertise in house, it was easier for us to stick to the ecosystem and keep using React.
Next.js also comes with some performance optimizations out of the box, for example automatic code splitting, something you would usually need to do manually with webpack. It also comes with image optimization and URL prefetching, as in the sketch below. And not to mention that it has a big community and great documentation; it's very easy to find resources and examples online.
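To make two of those built-in optimizations concrete, here's a minimal, illustrative component; the image path and route are hypothetical, not from our codebase:

```tsx
// Illustrative sketch of two built-in Next.js optimizations.
import Image from "next/image";
import Link from "next/link";

export default function ProductCard() {
  return (
    <div>
      {/* next/image serves a resized, optimized image automatically */}
      <Image src="/products/tv.jpg" alt="Television" width={320} height={240} />
      {/* next/link prefetches the target route when the link enters the viewport */}
      <Link href="/products/tv-123">View product</Link>
    </div>
  );
}
```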
Okay, so we chose Next.js to build the application, and to run and deploy it, we decided to use Vercel. The first reason is that Vercel is the company behind Next.js, so we could expect this marriage to take the most advantage of both solutions. But not only that: we also wanted to take advantage of compute at the edge. The global edge network allows us to deploy to multiple locations, and it comes with multiple availability zones and automatic failover out of the box, so we can ensure our application runs as close as possible to the geographical location of our users. There's also no infrastructure as code to maintain; it's just code. All you need to do is connect your GitHub repo to Vercel, and you get automatic deployments with cache invalidation. And my favorite feature is preview deployments. How do they work? Every time you create a new branch in your repo and push changes to it, Vercel creates a new isolated environment, deploys that code, and provides a temporary URL you can use to validate your deployment and for testing.
Another way Vercel helps us is by letting us stick to the serverless-first approach we had at Mercloud. For those who are not familiar with serverless, it's a way to run your application in the cloud without the need to manage servers. There are servers, of course, but you just don't need to manage them. You have small units of code, your functions, which are triggered by events, and you only pay for what you use. So if your application is idle, because you have few customers online overnight or on the weekend, you're not paying for it, because you're not consuming anything. Serverless comes with some good benefits: it helps you focus on business logic and lets the cloud manage the infrastructure for you, it increases your team's agility, and you get automatic scalability, which the cloud manages for you, so you don't need to worry about it. Of course, this is not a universal solution, but it works pretty well in many situations.
All right, after the technology choices, we came up with a draft of how our new architecture would look. You can see here that we have a front end sitting behind the CDN, running as close as possible to the users, and this Next.js application communicates with our API. This API was already built, deployed, and hosted on AWS, so all we had to do was consume it. But before starting to code anything, we had to solve some challenges, and the first one was: how could we identify each of our tenants?
It's pretty common to see SaaS products handle multitenancy by giving each tenant a subdomain. Every time you need to identify which tenant a particular request belongs to, you parse the host header of the request and simply extract the tenant identification from that domain, as in the sketch below.
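Here's a minimal sketch of that subdomain approach, assuming tenants live at a hypothetical <tenant>.example.com:

```ts
// Hypothetical sketch: extract a tenant ID from a subdomain.
function tenantFromHost(host: string): string | null {
  const hostname = host.split(":")[0]; // drop an optional port
  const parts = hostname.split(".");
  // "acme.example.com" -> ["acme", "example", "com"]
  return parts.length > 2 ? parts[0] : null;
}

console.log(tenantFromHost("acme.example.com")); // "acme"
console.log(tenantFromHost("example.com"));      // null, no subdomain
```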
In our case, though, our customers expose this application to their own customers, and we wanted to allow them to configure and run it with their own domains. So this approach would no longer work, because we cannot just parse the URL and extract the tenant identification from it.
We also had to handle multiple domains pointing to the same tenant. To handle this, we need a mapping table where we can correlate which domain belongs to which tenant. Then, every time I need to identify which tenant a request belongs to, I can simply do a lookup on this table, along the lines of the sketch below.
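A minimal sketch of such a mapping table; the domains and tenant names are illustrative, and in production this would live in a database or key-value store rather than in code:

```ts
// Hypothetical domain-to-tenant mapping table.
const domainToTenant: Record<string, string> = {
  "shop.acme.com": "acme",
  "store.acme.com": "acme", // several domains can map to one tenant
  "orders.globex.io": "globex",
};

function lookupTenant(domain: string): string | undefined {
  return domainToTenant[domain.toLowerCase()];
}
```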
The way our tenants can configure their custom domain for this setup is through our admin application. Once they configure a domain, we save this information in the mapping table I just showed you, and this triggers a routine that configures the custom domain in Vercel using the Domains API. What exactly is this Domains API? If you have used Vercel before and you go to the settings of your project, you'll see a tab where you can configure custom domains for your project; this is the same API we're using in the solution. Once you link a new domain, you need to somehow validate that you own it, and the way you do this with Vercel is by creating a CNAME entry in the DNS configuration of the domain; your traffic is then redirected to that project. A hedged sketch of what such a domain registration call might look like is shown below.
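This sketch targets Vercel's REST API; the exact path version and fields may differ over time, so check the current Vercel API docs, and the environment variable names are placeholders:

```ts
// Hedged sketch: add a tenant's custom domain to a Vercel project.
async function addDomainToProject(domain: string): Promise<void> {
  const res = await fetch(
    `https://api.vercel.com/v10/projects/${process.env.VERCEL_PROJECT_ID}/domains`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${process.env.VERCEL_API_TOKEN}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ name: domain }),
    }
  );
  if (!res.ok) {
    throw new Error(`Failed to add domain ${domain}: ${res.status}`);
  }
}
```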
You might be wondering if this Domains API of Vercel has any limits. If you haven't used it recently, you've probably heard of, or faced, the issue where there was a limit on the number of domains you could point to a single Vercel project, but this is no longer the case. That limit was removed almost two years ago, so now you can use unlimited domains on a single project.
All right, here we can see visually what happens when a user makes a request to an application. The user types the URL into their browser, the browser reaches out to a DNS server over the Internet to find out which IP address is linked to that web address, and once it knows the IP, it makes the request to the correct server. This is a very simplified overview, and in reality it's a bit more complex, but the illustration helps us understand the basic flow of DNS resolution. Cool. But in a multitenant scenario, we end up with multiple domains resolving to the same IP address, and once our application receives this traffic, the question becomes: which tenant does the domain of this request belong to? To solve this, we need to add some intelligence to our application so it can resolve this information and tell which tenant a particular request belongs to.
For this, we use Next.js middleware. Middleware lets you run code before a request is completed. It sits in front of your application, runs at the edge, and you can use it to modify the request and the response by rewriting, redirecting, or simply modifying the request headers.
Here's an illustration of how we do this at Mercloud. Our middleware is responsible for extracting the host header of the request and making a request to our API, which does the lookup on the mapping table I showed you before. Once it knows which tenant that domain belongs to, we inject a header into the request. Now, every time my application needs to know which tenant a request belongs to, all it needs to do is check that header. This is what our middleware implementation looks like: we extract the host of the request, we make the fetch request, and if it succeeds we simply inject the tenant into a header, which you can see on line 13 of the code on the slide. A hedged reconstruction of it is sketched below.
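This is a reconstruction of what the slide described, not the exact Mercloud code; the API URL is a placeholder, and the x-tenant header name matches the one used in the demo later:

```ts
// middleware.ts — hedged reconstruction; assumes a recent Next.js
// version that supports overriding request headers.
import { NextRequest, NextResponse } from "next/server";

export async function middleware(request: NextRequest) {
  // Extract the host header of the incoming request.
  const host = request.headers.get("host") ?? "";

  // Ask the API which tenant this domain maps to.
  const res = await fetch(
    `https://api.example.com/tenants/lookup?domain=${encodeURIComponent(host)}`
  );

  if (res.ok) {
    const { tenantId } = await res.json();
    // Inject the tenant into a request header so the rest of the
    // application can read it.
    const headers = new Headers(request.headers);
    headers.set("x-tenant", tenantId);
    return NextResponse.next({ request: { headers } });
  }

  return NextResponse.next();
}
```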
One of the questions people usually ask about this is: is it really performant? Is it a best practice to do fetch requests in middleware? Of course, everything you do here adds to the latency of the response, so one thing we recommend: if you're doing any API fetch requests here, you should cache the response, for example in a key-value store at the edge, so you only reach out to the real API if you don't have that information cached, roughly as in the sketch below. That's a good performance tip I can give you here.
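A minimal sketch of that caching idea; the `kv` client here is a stand-in for whichever edge key-value store you use, and its API and the TTL are illustrative:

```ts
// Hedged sketch: cache the domain-to-tenant lookup at the edge.
interface EdgeKV {
  get(key: string): Promise<string | null>;
  set(key: string, value: string, opts?: { ttl?: number }): Promise<void>;
}
declare const kv: EdgeKV; // provided by your KV client of choice

async function resolveTenant(host: string): Promise<string | null> {
  // Serve from the cache when we can.
  const cached = await kv.get(`tenant:${host}`);
  if (cached) return cached;

  // Only hit the real API on a cache miss.
  const res = await fetch(
    `https://api.example.com/tenants/lookup?domain=${encodeURIComponent(host)}`
  );
  if (!res.ok) return null;

  const { tenantId } = await res.json();
  await kv.set(`tenant:${host}`, tenantId, { ttl: 300 }); // 5-minute TTL
  return tenantId;
}
```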
All right, the next thing we had to think about was how to do the routing of our application. But first, let's understand how the built-in router of Next.js works. Next.js uses a file-system based router, where folders are used to define the routes and files are used to create the UI shown for those route segments. We can also use a special notation to define dynamic route paths based on a path parameter, as we can see here with the products route: once compiled, you get nested routes with a path parameter, along the lines sketched below.
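For example, a pages-router layout like this; the file names are illustrative:

```
pages/
  products/
    index.tsx          →  /products
    [productId].tsx    →  /products/:productId  (dynamic segment)
```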
This router is pretty simple to use, and it allows us to do caching as well. The way caching works is: a user makes a request to a page, the page gets rendered on the server side, and before we return the response to the user, we cache that output. The next time a request is made to that same URL, we can serve the cached content, so we don't need to regenerate the page. But at Mercloud we do something more sophisticated, called incremental static regeneration. The principle is much the same, but you can also set a TTL on the cached response. When a user makes a request to that same URL after the TTL has expired, we still serve the old, stale version of the page, but in the background a process is triggered that refreshes the cached content of the page. The next time a user requests that same URL, they'll be served the new version. A minimal sketch of how this looks in code is below.
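A minimal ISR sketch in the pages router; the 60-second TTL matches the demo later in the talk, and the page content is illustrative:

```tsx
// pages/products.tsx — incremental static regeneration sketch.
export async function getStaticProps() {
  return {
    props: { generatedAt: new Date().toISOString() },
    // After 60s, the next request still gets the stale page, but a
    // background rebuild is triggered for subsequent requests.
    revalidate: 60,
  };
}

export default function Products({ generatedAt }: { generatedAt: string }) {
  return <p>Page generated at {generatedAt}</p>;
}
```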
This works great, but let's bring it into the context of multitenancy. I might have user one here, who belongs to tenant A, make that request; they get served the old version of the page, and the background regeneration process is triggered. Now user two comes along and accesses that same URL. The question is: which version of the page will be served to user two? The answer is that user two will be served the version of the page that was generated for user one, who belongs to tenant A, a different tenant. This is not good, because now we're mixing content from two different tenants. They have private data; it shouldn't be shared, it should be isolated. But here we run the risk of serving the wrong content of that page to the wrong user.
How can we fix this issue? Is there a way to fix it? The first step is to look again at how we structure the routing and ask: how could I make each route tenant aware, so it knows which tenant context it belongs to? The solution is to add a dynamic path segment at the very root of our router, so that every route underneath it is under the context of that tenant. Now I can say it's safe to cache any content, because even if a different tenant ends up requesting the same URL, I know my content is cached under a different path segment, in a different context, so we avoid mixing cached resources from multiple tenants. The structure looks something like this.
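Continuing the illustrative layout from before, the tenant segment moves to the root, so every cached page lives under its tenant's path:

```
pages/
  [tenant]/
    products/
      index.tsx        →  /:tenant/products
      [productId].tsx  →  /:tenant/products/:productId
```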
This is how this routing configuration looks in the URL: we can clearly see that, now that we've added a new path parameter to the route, the tenant identification shows up in the URL. And this is not something we really want, because remember, we're giving our tenants the possibility to use their own domains on the platform, so why should we still put a tenant identification in the URL? Surely we can improve this. There's quite a long discussion about this topic on the Next.js GitHub that took quite a while to get an official answer, and the recommendation is to use some sort of identification in the route, like we did. In the example there, they suggest using the hostname of the request. It works exactly the same way as what we do with the tenant, because the hostname is also a unique identifier for each tenant.
Yes, as mentioned in that thread, this identifier will be reflected in the URL, but luckily there is a solution for that: we can use URL rewrites to do the dirty work. The rewrite is responsible for adding the identifier to the route while masking the URL presented to the user, so the request is still routed to the correct segment, but the segment simply isn't shown in the browser. The way you can do URL rewrites in Next.js is by setting rules in your next.config file, or you can also use middleware for that if you want. A sketch of such a rewrite rule follows.
to do that if you want. And after we
apply these changes, the rewrite changes. Here are the results.
So we no longer have the tenant identification
on the URL path, but the
request still being routed to the correct path segment.
I've prepared a quick demo to show you this working, so let's hope everything works fine. Let me change my screen.
We have a repo here with a very simple Next.js application. You can see in our router that we have the dynamic route that represents the tenant, and we have a page here. All this page does is make a request to a time API that returns the current time and print it on the screen: it prints hello, says which tenant the session belongs to, and shows what time the page was generated. We also have a middleware where, based on the host header of the request, we identify the tenant. It's a very dummy example: I'm just checking whether we have tenant A or tenant B and setting it accordingly, and if it can't resolve the tenant, we just set a default one. It's roughly the sketch below.
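The demo middleware looks roughly like this; the exact code lives in the public repo mentioned below, and the local domain names here are illustrative stand-ins for the ones I set up on my machine:

```ts
// middleware.ts — hedged sketch of the demo's hardcoded resolution.
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const host = request.headers.get("host") ?? "";

  // Dummy mapping: two local domains, with a default fallback.
  let tenant = "default";
  if (host.startsWith("tenant-a")) tenant = "tenant-a";
  else if (host.startsWith("tenant-b")) tenant = "tenant-b";

  const headers = new Headers(request.headers);
  headers.set("x-tenant", tenant);
  return NextResponse.next({ request: { headers } });
}
```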
In our next.config file we also have the rewrite rules: you can see that we match pretty much anything on the request and proxy it to a tenant path, where the tenant is extracted from a header, x-tenant, which is exactly the header we're setting in the middleware. If I run this application and go to localhost in my browser, you can see that I get a hello world, the default tenant, and the date and time when this page was generated. And if I refresh, the time doesn't change, which proves I'm being served the cached version.
But how can I identify multiple tenants based on a domain if I'm running this on localhost? On my machine, I created two local domains that point to localhost, so I can use them to simulate other domains. If I access my application through one of them... okay, I forgot to set the port. All right, you can see that now it's able to identify, from the domain, which tenant the request belongs to, and the time the page was generated. If I do the same with the other tenant, you see that now I have tenant B and the time its page was generated. So for each of the tenants, including the default, the page was generated at a different time, and if I keep refreshing, I'm served the cached version of the page. In this solution we also set a TTL of 60 seconds, so if I come back to this page after 60 seconds and make a request, the background process that regenerates the page is triggered, and if I refresh the browser again, I get a new version of the page. That's what happened here with the first request: we can see it just got updated again.
All right, that was the demo. This is a public repo; you can find it at this URL or QR code and use it as a template to create this kind of multitenant application. You'll find two branches in the repo. The one I showed you is called using-middleware, where we solve this problem using middleware, like we do at Mercloud. On the main branch you'll find a solution where we simply use the host header of the request to identify the different tenants, so we don't need a middleware there. Feel free to use this repo and raise pull requests with improvements; any contribution will be very appreciated.
If today you research how to build a multitenant application on Next.js, you'll quickly hear about the Vercel Platforms Starter Kit. It's a template for a full-stack Next.js application with multitenancy and custom domain support. This is great, but it came a bit too late for us: when it came out, we had already implemented our solution. Even so, we said, okay, let's check it out and see how they implemented it and how they solved the problems we had. We found out that they do domain-based routing, which is pretty much what we do with the tenant. One slight difference is that they do the URL rewrite using middleware, while we use the next.config file. One of the reasons we don't use middleware for that is that we built the first iteration of this application with Next.js version 10, when middleware wasn't a thing yet, and once we migrated to the latest version of Next.js, we didn't bother refactoring the rewrite part. That's the main reason we don't use middleware there. We also found out that their solution uses the Vercel Domains API as well. So we were pretty happy, and we thought, okay, we did a great job here; we didn't do anything completely different from their solution and where we landed.
of our architecture today. So you can see that the front
end is hosted on the Versaille infrastructure where
we didn't cover on this talk, but we handle authentication with
off zero and the middleware talks to
our API on the back end. We have everything hosted on
AWS, but we
have multiple versions of this API hosted in different
regions. That's for compliance and data isolation
for our customers. But we have one API,
that's the tenant API that runs on a global region, and that's the API
that the midower uses in order to
do the correlation between the host header
of a request, the domain of that request, and do
the mapping with attendance. All right, the outcomes of
All right, the outcomes of our implementation: we got great improvements in performance, taking advantage of the edge network and CDN caching. And because we're running at the edge now, we reduced latency a lot, since the servers are much closer to our users. We also increased the agility of our dev team: we no longer need to maintain a very complex infrastructure and multiple deployment pipelines, so it's much easier today to build and release new features without overhead. It's also much easier for us to onboard new tenants, and this process is fully automated: all we need to do is add a record in the admin system we have, and all the resources required for that tenant are created automatically.
The lessons we learned from this journey: the first one is to always look into adopting tools and technologies that help you focus on business value, rather than spending days or weeks at the beginning of a project just setting up a very complex infrastructure and code structure. Look into adopting tools that, with very minimal effort, let you jump straight into coding and deploy easily.

Also, think about your users: they want the best experience, and performance is the main thing you need to consider to achieve this. You want to serve pages as fast as possible to your users, and for this you need to take advantage of things like the Jamstack and incremental static regeneration, like we do at Mercloud. And be careful with server-side rendering: anything you do there will slow your page load, because it needs to be re-executed on every request, and it makes it much more difficult for you to cache the response.
Observability is also a must, and it needs to be tenant aware. Create consumption metrics that help you identify who's using what and how much of it they're using, so when you're monitoring the health of your application, you can easily identify who's using more resources, roughly as sketched below.
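A minimal sketch of what tenant-aware metrics can look like; the metrics client interface is a stand-in for whatever tool you use, and the names are illustrative:

```ts
// Hypothetical tenant-aware metric emission.
interface Metrics {
  increment(name: string, tags: Record<string, string>): void;
}

function recordRequest(metrics: Metrics, tenantId: string, route: string): void {
  // Tag every measurement with the tenant so dashboards can break
  // usage down per tenant and surface noisy neighbors.
  metrics.increment("frontend.requests", { tenant: tenantId, route });
}
```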
And remember, each tenant you onboard brings with them a different persona and different usage patterns. You'll have very small tenants that don't require much, but you'll also have big ones that bring a huge demand to your application. So you want to be able to identify the noisy tenants easily. Let's say one of your tenants is under a DDoS attack and suddenly the performance of your whole application is impacted, affecting other tenants: you want to identify the noisy tenants quickly so you can mitigate any bottlenecks they're causing.
And to wrap it up, a very important recommendation we can give you: don't do early optimization. You'll probably get it wrong and have to redo it later. Use metrics to drive it: first have the problem, let the metrics tell you where your problems and bottlenecks are, and then use that information to attack the problem once you have it, instead of trying to guess what your future problems will be.
And that's the end of the session today. I hope you have enjoyed it. Please feel free to reach out to me on my social media, and also check our website at mercloud.io. Thank you very much; it was a pleasure to share this with you.