Transcript
Hi and welcome to our presentation and demo about building a self-service database as a service for your internal developer platform. My name is Dan McKean and I'm a product manager at MongoDB, and I'm joined by George Hanzaris, who's an engineering director, also at MongoDB. We're responsible for enabling our customers and users to run MongoDB in two ways: self-hosted in Kubernetes, using our Enterprise or Community Kubernetes operators, or using our Atlas Kubernetes operator, which is designed to manage and configure our Atlas developer data platform. In this session we're going to cover a range of non-MongoDB-specific considerations: Kubernetes as the platform of platforms; building and managing a DBaaS in your internal developer platform, also known as an IDP; why to build a DBaaS; and the risks and the criticality of enabling self-service in a DBaaS. Then we're going to use our own Atlas developer data platform and our Kubernetes operator to demonstrate how this can work, covering what Atlas is, how our operator works, and how to put it all together, in theory and in a demo.
We're going to start with Kubernetes as a platform of platforms. According to internaldeveloperplatform.org, 95% of internal developer platforms (IDPs) are built on top of Kubernetes. Many of you already know Kubernetes as an open-source container orchestration system for automating application deployment, scaling, and management. In recent years, Kubernetes has become nearly synonymous with container orchestration, and with so many services being built as microservices and designed to be automatically scaled, containers and Kubernetes reign supreme. So what does Kubernetes offer? It offers highly flexible networking, including options like directly exposing pods, load-balanced connections, and ingress services.
It also offers storage orchestration, to provide either ephemeral or persistent storage; high availability and high levels of resiliency, by making it easy to deploy many copies of a service across many physical or virtual machines; self-healing, by monitoring the state of objects in Kubernetes and keeping them aligned with the declarative configuration; and a low degree of vendor lock-in, thanks to the many standardized flavors of Kubernetes available either for self-hosting or as a cloud-based platform as a service. And finally, and arguably most critically when it comes to an internal developer platform, it provides a high degree of customization and extensibility, particularly in the form of Kubernetes operators and custom resources.
An operator extends the native Kubernetes control plane with custom logic that helps manage a lot of the essential tasks that are bespoke to a specific product, like MongoDB. It's usually paired with custom resources, defined in Kubernetes using custom resource definitions (CRDs). These custom resources allow for the creation of new types of Kubernetes objects, which can be monitored by the operator and allow the operator to take action. The actions taken can vary massively, but some of the common ones include deploying an application, taking a backup, upgrading an application, or exposing a service to applications that do not support the Kubernetes API. Operators can also be used to manage resources external to Kubernetes.
Most commonly, this is done by using the APIs of the external service in which the objects are actually being run. Custom resources within the Kubernetes cluster can be used to represent the desired configuration of those external resources, allowing the operator to monitor the custom resources and then use the external service's APIs to make the required changes. The value here is that those custom resources can then be managed in the same way as the services in the local Kubernetes cluster, benefiting from the same tooling, processes, permissions, and automation.
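To make that concrete, here is a rough sketch of the shape of a custom resource definition that registers a new AtlasDeployment kind. The CRDs actually shipped with the Atlas operator are far more detailed; this trimmed-down version is only illustrative.

```yaml
# Illustrative sketch: the real Atlas operator CRDs carry full validation
# schemas. This shows the general shape of a CRD that teaches the
# Kubernetes API about a new AtlasDeployment object type.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: atlasdeployments.atlas.mongodb.com
spec:
  group: atlas.mongodb.com
  scope: Namespaced
  names:
    kind: AtlasDeployment
    plural: atlasdeployments
    singular: atlasdeployment
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          # Validation omitted for brevity; a real CRD constrains the
          # spec fields here instead of accepting arbitrary content.
          x-kubernetes-preserve-unknown-fields: true
```

Once a CRD like this is installed, the operator can watch for objects of that kind and reconcile them against the external service.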
Now we're going to dig into databases as a service within an internal developer platform. But first, a brief recap on what an IDP is and what it offers.
Internal developer platforms are built to enable developer self-service of platform infrastructure.
They're typically built by an Ops team and used by developers.
They provide a common process and method of engaging with the platform,
often via templates. This automates recurring tasks such
as spinning up environments and resources, and helps enforce
standards such as security requirements. IDPs often abstract
away the complexity of the underlying platform technologies,
saving everyone from needing to be an expert. Development teams
can gain autonomy by being empowered to spin up fully provisioned
environments and manage them with a minimum of effort or complexity.
IDPs can be built or bought, or some combination of both. A DBaaS, or database as a service, is often
one of the most critical components of an internal developer platform.
Most applications need a database at some point, and databases
can be some of the most complex services to deploy and manage.
A company's choice of database can make a dramatic difference to not only the success of the application, but also the speed, success, and happiness (or unhappiness) of a development team.
All this makes simplifying the consumption, use and management of databases
incredibly valuable. This is especially true for day-two operations such as upgrades, where developers can be spared a huge amount of ongoing work through the centralization and automation that a DBaaS can offer, especially when a Kubernetes operator has been used to handle those sorts of day-two operations. But building a
database as a service is not without risk or complexity.
Databases can vary a lot, even from a single vendor, and one of the key questions to answer is how much customization and configuration to expose to development teams. Security, sizing, performance, backup, sharding, and resilience are all major considerations, and that's without taking into account any of the specifics of the underlying platform technologies that underpin the IDP itself. We see many companies turning to fairly strictly defined templates that predetermine many of those things and give minimal customization to the end users. An example of this could be t-shirt sizes for database deployments, with guidance about which sizes suit which use cases.
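As a purely hypothetical illustration, such a template might expose little more than a size choice. The size names here, and the Atlas instance tiers they map to, are invented for the example rather than taken from any real standard.

```yaml
# Hypothetical t-shirt-size template. The names and tier mappings are
# illustrative assumptions, not a real product's defaults.
sizes:
  small:   # dev and test workloads
    instanceSize: M10
    diskSizeGB: 10
  medium:  # typical production services
    instanceSize: M30
    diskSizeGB: 40
  large:   # high-throughput production workloads
    instanceSize: M60
    diskSizeGB: 160
```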
Troubleshooting is often a challenge. Security best practices encourage a minimum number of people having a minimum level of rights and permissions, but how do you avoid a central team becoming a blocker to development teams? Many companies opt to have far fewer restrictions on pre-production environments. This enables developers to try new things and have some hope of fixing them when things go wrong. But production environments are often heavily restricted, as far more damage can be done by a wrong move. This divided approach works well to allow a balance of self-service whilst protecting production services. Both of the above items touch on the topic of
balancing developer empowerment with central oversight.
Self-service is nearly always faster, as we don't have to wait for someone else to become free to do what we need them to. It frees up central teams to deal with support and with improving the services of the IDP or DBaaS. Self-service empowers users, particularly by allowing us to try new things without worrying about wasting someone else's time. There are a few common methods for achieving this: publishing assets, such as Helm charts, that users can then customize and deploy themselves; a GitOps workflow, where the configuration of all resources, whether local or remote, is stored in a Git repository and tools such as Argo CD or Flux are used to deploy those resources in Kubernetes; or a portal or marketplace, abstracting the complexity even further and allowing users to see what's possible and select what they need. All of these have tradeoffs, in particular
in terms of the level of investment and maintenance for a
central team versus the level of knowledge needed by the end
user. So now we've seen the importance and the value of building a database-as-a-service offering for your internal developer platform. But we've also seen the difficulties, we've seen the importance of making these platform features available through a self-service approach, and we've seen possibilities for how you can do that. So let's now explore the tools and the architecture we can use to actually implement this.
At a really high level, the first step is that we want each developer to be able to define what database requirements they have. The second step is that we want this definition to be translated into resources that our platform can understand. And then finally, we want to give our platform the ability to deploy and manage databases.
Now let's start looking into the tools. Initially, we're going to look at the Kubernetes operator we're going to be using. At a very high level, the user defines what type of database deployment they need, and applies it to a cluster through a kubectl command. Then the operator in that cluster makes the necessary calls to the Atlas API to deploy those managed databases.
What happens under the hood is that you define a new custom resource (we're going to see in a bit more detail exactly what that is), and that custom resource is managed by the operator. The operator interacts with Kubernetes, sees what the current state is and what kind of adjustments are needed to reach the desired state, and then goes on and makes those changes.
The custom resource we can see here is the AtlasDeployment resource, and defining it is pretty straightforward: you just add the name you want for the database, the instance size, and the provider and region you care about, and that's pretty much it.
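A minimal sketch of what that can look like is below. Exact field names vary between operator versions, so treat this layout as indicative, and note that the referenced AtlasProject name is an assumption for the example.

```yaml
# Sketch of a minimal AtlasDeployment. Field names differ between
# operator versions; "my-project" is an assumed pre-existing
# AtlasProject resource.
apiVersion: atlas.mongodb.com/v1
kind: AtlasDeployment
metadata:
  name: my-database
spec:
  projectRef:
    name: my-project
  deploymentSpec:
    name: my-database
    clusterType: REPLICASET
    replicationSpecs:
      - regionConfigs:
          - providerName: AWS
            regionName: US_EAST_1
            priority: 7
            electableSpecs:
              instanceSize: M10   # the t-shirt-style size knob
              nodeCount: 3
```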
This is an easy way to deploy and manage databases. But still, in this scenario, using just the operator, anyone who needs to spin up a database would need to have this definition file, this YAML file, locally, and would have to run kubectl manually to deploy the database.
So what we want to do is automate this process. Instead of having these YAML files locally and running commands locally, we would like to do this in a different way. We have a developer; the YAML is developed, and then the YAML is pushed to a repo which is specifically designed to hold our infrastructure-as-code files. From that point, Argo CD pulls the files from this repository, and Argo CD is responsible for applying the changes in our Kubernetes cluster.
And the way it does that is by creating a simple Application like this: we define exactly the repo URL that we want Argo CD to be watching and exactly which revision, and we set the sync policy, whether we want automated sync, and some other conditions.
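A sketch of such an Application is below; the repo URL, path, and namespaces are placeholders for the example.

```yaml
# Argo CD Application watching an infrastructure-as-code repo.
# The repo URL, path, and namespaces are illustrative placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: dbaas-databases
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/iac-repo.git
    targetRevision: main   # the branch (revision) Argo CD watches
    path: databases        # directory holding the database YAML files
  destination:
    server: https://kubernetes.default.svc
    namespace: databases
  syncPolicy:
    automated:
      prune: true      # remove cluster resources deleted from the repo
      selfHeal: true   # revert manual drift back to the Git state
```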
So let's put all of these together and see how this is going to work. To get started, you need to set up some prerequisites. Initially, you need an Atlas account and an API key. You need a running Kubernetes cluster, and you need to install the Atlas Kubernetes operator in that cluster. You would then go on to install Argo CD, create a dedicated infrastructure-as-code repository, and then create an Argo CD Application pointing at that repository.
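For the API key, the operator reads its Atlas credentials from a Kubernetes Secret. A sketch is below, assuming the operator's default namespace and labeling conventions; check the documentation for the operator version you install, as these details can differ.

```yaml
# Sketch of the Secret holding Atlas API credentials for the operator.
# Name, namespace, and labels follow common operator defaults but may
# differ between versions; the values are placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: mongodb-atlas-operator-api-key
  namespace: mongodb-atlas-system
  labels:
    atlas.mongodb.com/type: credentials
stringData:
  orgId: <your-atlas-org-id>
  publicApiKey: <your-public-api-key>
  privateApiKey: <your-private-api-key>
```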
And finally, this is what our self-service database as a service looks like. Initially, the developer develops the file and pushes it to the Git repo, where usually we would have a PR opened. When that's merged to the specific branch that we have Argo CD watching, Argo CD is triggered: it pulls the changes and applies them in Kubernetes. Then, as a new custom resource is deployed, the Kubernetes operator takes over and calls the Atlas API to create the resources, the users, the databases that we need, and so on. And this is pretty much what our database-as-a-service offering for our internal developer platform is going to look like. Thank you for watching.