Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone.
My name is Piotr Zaniewski.
I'm head of engineering enablement at Loft Labs.
And today I will talk to you about architecting
developer platforms.
We'll focus specifically on infrastructure developer platforms.
We will talk about design principles and tools used in building those.
This will be heavily Kubernetes centric.
And I also have a really cool demo.
So a little bit about me.
I specialize in the whole cloud native ecosystem: Kubernetes, Docker,
Linux and so on.
I typically spend way more free time on tweaking my dotfiles and
working with Neovim than I should, but it's a lot of fun.
As I mentioned, I work at Loft Labs.
Go ahead and check them out if you would like to work with
vCluster or other cool projects.
The easiest way to contact me is to go to my web page, cloudrumble.net, or
simply send me an invite on LinkedIn.
So before we jump into architecting the platform, let's
actually talk about what a platform is.
And the way I like to do this is by contrasting it with, or putting it in
the context of, existing knowledge.
So for me, there are three kinds of platforms.
One is a business platform.
If you have used an application like Uber or DoorDash, those are
really platforms for consumers.
They enable both consumers and vendors to connect and sell their services.
So that's a platform that is essentially a product consumed by the end user.
Another type of platform is the domain specific platform.
So those are typically services that encapsulate cross cutting
functionality for user facing services.
A good example: if you have, let's say, an application that has
map functionality, you would put this map functionality, like
geofencing, translating addresses and so on, into a separate service.
And then this would be your platform
that developers can use.
And finally, we have a third kind of platform, which is
domain agnostic platforms.
So this is the one that we are going to talk about today.
And the way to define those is that they are really building blocks
that provide essential tools.
They can be infrastructure focused, like the one we'll talk about today, or they can
be other domain agnostic platforms, something that doesn't fall into your business domain,
such as security related platforms and so on.
All right.
So with these definitions out of the way, let's talk about why this is useful.
What's the point of building platforms?
Imagine somebody already has access to a cloud like AWS or Azure.
Why do they need something on top of that?
The simple answer is that cloud infrastructure, or infrastructure
in general, is built for everyone.
It is not built for your developers specifically; it is
built with everyone in mind.
Every customer of Azure needs to find what they need among the cloud services,
so the cloud might provide far more than your developers need.
Thus, we need to simplify resource management and isolate
only those features that are useful
and important, and actually empower our developers rather than expose everything.
This in turn increases development efficiency, because developers
don't have to spend time learning Kubernetes or learning all
kinds of complex cloud services.
They can just rely on the internal platform to provide them with what they need.
Platforms are also typically built with scalability in mind.
So they are scalable and reliable from the ground up.
And this is something that clouds, of course, already have.
But when you design your own platform, that's probably a concern that you
will think about at the very beginning.
So what are the building blocks of a typical cloud native developer platform?
This is not an exhaustive list of building blocks, but these are the ones that you
would see in most modern platforms.
So first of all, you have some form of self service portal.
This portal can be a web interface, like we will see in
the demo, but it can also be a script.
It can be a program.
It can be an
API directly exposed at an endpoint.
At the end of the day, this is really encapsulating APIs and making it
simpler to access services and execute repeatable tasks like provisioning a
service, executing a test,
creating an ephemeral testing environment, and so on.
The second point is really the key,
and I would like to emphasize this in this talk: platforms are
really an additional abstraction layer over existing APIs, and
they provide programmatic APIs for developers to interact with.
Again, a self service portal could be one way of doing it, but you can also
directly access APIs if you need to.
Automated workflows.
So think of GitOps and similar principles.
Platforms by design would use those to automate the data flow within
the pipeline of whatever tasks developers need to execute.
Developers can provision a service,
and the way it happens is that it uses, behind the scenes,
tools like Argo, maybe, or other GitOps tools
that automate the whole provisioning process.
Monitoring and observability.
This is not platform specific, but it's very important.
Especially in the age of distributed systems, monitoring and
observability, the ability to roll back, and maybe perform canary deployments
or similar things, is really critical.
So baking it into the platform is something that we see every time,
and this is a very important building block.
Security and governance controls.
I would also bundle here isolation.
So if you have a platform that needs to isolate workloads from tenant to
tenant, then you might use something like vCluster internally to
isolate and achieve multi tenancy.
You may want to use products like Falco or others to harden your security and
provide an end product that is secure
in a way that corresponds to your compliance rules.
This also means providing audit trails and similar things.
So those are things that you don't necessarily see as a developer, but
those are important building blocks.
And finally, a platform is ever evolving.
A platform is a product.
It is going to continuously evolve with your users.
You need to have an organizational structure around the platform that
controls its improvement, its evolution and its features.
So those are six building blocks that are definitely important to
see in any successful platform.
Best practices
for successful platforms.
As I mentioned earlier, API driven is the key point.
If you look at AWS, Azure, Google, or any other successful cloud
platform, the way they provide access to their services is through an API.
You might, of course, consume it through a UI.
But at large scale, there's an API
responsible for driving your decisions within the cloud.
You can create a cloud service, manipulate it, and do
all kinds of things through the API.
And I think having an API
driven mindset in building your platforms is the key, and
it sets you up on the path to success.
Cloud native principles.
This, of course, means designing the platforms to be friendly
to cloud native applications.
So things like containerization, orchestration, and leveraging
various cloud native projects fall into this bucket.
And as I mentioned earlier, a GitOps workflow, or some form of automated
rollbacks, new releases, A/B testing, and following GitOps principles one way or
the other, is definitely something that will ensure the success of your platform.
What is an API?
Let's just make a quick refresher.
So API stands for application programming interface and is really
a set of rules and protocols for building and interacting with software.
So I emphasize those two words here, interface and interacting.
This is typically how the logic is driven.
Design your platform thinking about how your developers would interface with the
set of services and APIs you provide, and how the
APIs would interact with existing software.
So you have kind of a
pipeline idea.
At one end, you have developer teams that use whatever interface
is comfortable for them.
And at the other end, you have some outbound processes that
interact with various software,
let's say Argo, or deploying something to Kubernetes.
So if you have this mindset of having an API pipeline, it really helps
you design a very healthy platform.
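To make the pipeline idea a bit more concrete, here is a minimal Python sketch. It is not the code from the demo: the class name, the two inbound helpers, and the outbound step labels are all hypothetical, and the outbound side only records what it would do instead of calling any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class PlatformAPI:
    """One API in the middle: several interfaces in, several outbound steps out."""

    outbound: list = field(default_factory=list)

    def provision(self, app, image):
        request = {"app": app, "image": image}
        # Stand-ins for outbound calls to real tools; nothing is contacted here.
        self.outbound.append(("gitops", f"sync {app}"))
        self.outbound.append(("kubernetes", f"deploy {image}"))
        return request


def from_portal_form(api, form):
    """Inbound interface 1 (hypothetical): a form submitted from a portal."""
    return api.provision(form["app"], form["image"])


def from_cli(api, argv):
    """Inbound interface 2 (hypothetical): positional arguments from a script."""
    app, image = argv
    return api.provision(app, image)


api = PlatformAPI()
via_portal = from_portal_form(api, {"app": "blob-reader", "image": "example/app:1.0.0"})
via_cli = from_cli(api, ["blob-reader", "example/app:1.0.0"])

print(via_portal == via_cli)  # both interfaces produce the same API request
print(len(api.outbound))      # number of outbound steps recorded so far
```

The point of the sketch is only that the interface a developer prefers and the tools the platform drives are both replaceable, as long as the API in the middle stays stable.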
So why does it all matter?
Why do we want to have a platform?
And why do we want to follow an API driven design?
First of all, this simplifies a lot of things.
You can hide complexity or create an abstraction that exactly
corresponds to your developers' needs.
You can standardize on existing standards: HTTP, gRPC.
You can use something that's already there.
You don't need to invent new communication protocols.
It helps automate all kinds of tasks on various levels and layers of automation.
And finally, APIs are very good at scaling, and you can also
secure them relatively easily.
A brief reminder, I want to repeat that platform is a product.
So this simplistic diagram really shows that the way you design a
platform is not different than designing any other product.
So you want to understand your customer's needs.
You want to design something that fulfills a need or two, test it, implement it,
and deliver it to your customers, gather feedback, improve on it, rinse and repeat.
The platform should be approached in the same way.
So let's transition to a demo.
During this demo, I want you to think of the principles we just discussed and see
how they translate to an actual platform.
So we are going to see how a platform can empower developers, how it uses
cloud native principles, is API driven, and is backed by a GitOps workflow.
I am going to wear different hats.
So now I am starting as a developer, and what's on the screen
is my developer world.
This is my application.
It's called Azure Storage Blob Reader.
And the task of this application is to read the content of Azure Blob
Storage and display it on the screen.
I might not know too much about Kubernetes or cloud services.
My knowledge of containerization ends with this very simple Dockerfile
right here.
Maybe somebody from the platform team helped me create it,
but that's about it.
So my application is a Node.js
app.
We don't need to go into the code
in detail.
I just wanted to show you that for me as a developer, that's what I spend my time on.
I am creating new features every day.
I run tests.
I make sure that everything works.
And then I'm using cloud native services and cloud services in
conjunction with my application.
But the most important part here, that's where I would feel most comfortable.
I don't need to deal with all the cloud complexity.
All right, so that's the developer.
Now I would like to deploy this application and test it.
I want to see if everything works correctly.
So how do I do it?
Remember from our principles, we are actually using a self service portal.
So here for this demo, I am using Port.
Unfortunately, Port still doesn't have dark mode.
Apologies for that.
I tried to keep everything in dark mode for the presentation.
Regardless, we can see, as developers, that we have two actions on our homepage.
One says Crossplane storage account reader, and we can create
it, and there is another one for removing it.
This is the extent of the knowledge I need to have in order
to create my infrastructure.
Let's kick it off.
When I hit create, it asks me for some variables, not many, just three
that I might be interested in.
One is the connection string and how I want to name it.
Another one is my image; maybe I want to iterate on new versions
of the image and I can bump it.
And the last one is the application port.
I am happy to accept the defaults, and behind the scenes Port
in this case kicks off a CI/CD pipeline, which in turn creates
a PR that creates the necessary file.
Okay, so this step is really just for convenience.
We could have manually created the file, but we did it through Port.
As developers, we interact with Port or with Backstage, and we
create various things that way.
Let's see what's happening.
If we refresh this, we should see in a moment that there is a pull
request being created by Port.
I don't know why it doesn't auto refresh yet, maybe a good feature request.
So now we have a pull request into a repository called apps deployment.
So let's see.
Let's click on our pull request.
All right, so we can see some details again.
This is from a developer point of view.
I can interact with this and read various things.
I can see logs and runs and whatnot.
That's not important at this point.
But here I have a link to my repository.
As you can see, we have apps deployment.
This repository is specifically designed to deploy my application.
So now I am changing hats to become the platform team, or maybe an
admin, and I can see, oh, there's a new PR on the deployment repository.
By the way, this action is obviously not mandatory.
You can skip the PR review, but I want to show you that it's possible.
So what does this PR do?
This PR creates a single YAML file.
We talked about API and this is the API in action.
The API we all agreed on is the Kubernetes style API.
So both platform engineers and developers agree that the way we're
going to talk to each other and the way we're going to make things happen
is by standardizing on the Kubernetes API.
Why the Kubernetes API?
There are various reasons.
Kubernetes already exists and has a strong ecosystem.
Kubernetes lends itself very well to designing custom APIs using CRDs.
And the list goes on.
So what does this file look like?
You can see this file has an API version, which you will know if you're familiar
with Kubernetes, and it has a kind, which is a kind defined by a custom resource definition.
So this is an app claim.
It has some names and labels, and it also has a spec.
It follows the Kubernetes API design, which is spec and status.
And within the spec are the parameters you might remember
we specified in Port when we triggered the action.
So we have a namespace, which we couldn't specify.
This namespace is hard coded for our team.
And then we have those parameters.
So this simple YAML file is all that is necessary to create our
application, create the associated cloud infrastructure, and other things.
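As a rough illustration of the shape of such a claim, here is a small Python sketch that builds one as a plain dictionary. The API group, the spelling of the kind, the namespace value, and the parameter names are all assumptions made up for this example; the talk does not show the real ones. The only points taken from the demo are the overall layout (API version, kind, metadata, spec) and the fact that the namespace is fixed for the team while three parameters are open to the developer.

```python
import json

# Hypothetical values: the real API group, kind spelling and parameter
# names are not shown in the talk, so everything here is illustrative.
API_VERSION = "platform.example.org/v1alpha1"
KIND = "AppClaim"
TEAM_NAMESPACE = "devops-team"  # fixed by the platform team, not by the developer

DEFAULTS = {
    "connectionString": "default-connection",
    "image": "example/app:1.0.0",
    "port": 3000,
}


def build_claim(name, **overrides):
    """Return a Kubernetes-style claim manifest as a plain dict."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        # Anything outside the agreed parameters (e.g. namespace) is rejected.
        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
    parameters = {**DEFAULTS, **overrides}
    return {
        "apiVersion": API_VERSION,
        "kind": KIND,
        "metadata": {"name": name, "labels": {"app": name}},
        "spec": {"namespace": TEAM_NAMESPACE, "parameters": parameters},
    }


claim = build_claim("blob-reader", image="example/app:1.0.1")
print(json.dumps(claim, indent=2))
```

A portal form, a script, or a hand-written file could all produce the same document, which is what makes it usable as the contract between developers and the platform team.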
All right.
So let me show you one more thing before we move on.
I'm going to launch Argo CD.
Here I just have a handy script to do this.
So I just type just launch Argo, and you will see that inside of Argo CD
there is not much happening just yet.
Let me log in real quick.
We have a simple bootstrap application, and this Argo CD app observes
the apps deployment repository,
the one we saw a moment ago and the one we opened the PR to.
So for now we have this empty app.
Nothing happens.
Okay.
So far so good.
Let's approve the PR.
Let's pretend that I reviewed it and we're going to merge it right now.
When we merge the PR, we will see in a moment that Argo picks it up.
I will help Argo by refreshing the screen real quick, and things start happening.
We can see there are a lot of new resources being created
just from the small configuration YAML that I submitted through the API.
So we have the app claim, which is exactly the YAML that we saw
earlier, with some Kubernetes annotations and additional
Kubernetes-added stuff, but you can recognize the spec here,
our parameters and so on, and the kind, app claim.
So we have applied it to the cluster.
Okay, so how come we have all those things right here?
We'll talk about this later, but let's go back to being a developer.
Remember, one thing I did is I went to Port and
clicked create, so how do I know when my app is ready?
Maybe I wanted to go and grab a coffee, but my goal is
really to test my application.
Okay, what better way than a Slack notification?
And as you can see here, I have an app notify channel
that I might be subscribed to, and I have a notification that tells me my
application is deployed to localhost.
Okay, let's give it a go.
Indeed, we have Kubernetes demo.
Let me make it a little bit bigger.
So this is my application.
That's my Azure Storage Account Reader.
So now it's working, but there are no documents found inside.
And just to prove to you that we have all the infrastructure
here, if we go to Azure for a moment, you can see that we have platform demo,
which is the resource group that has been deployed. Sorry for all the jumping.
But going back to Argo CD, you can see that one of the resources that we've
deployed is actually a resource group.
So it corresponds to my resource group in Azure.
We also have vCluster, Slack and some other stuff that's called object.
That's quite a few resources that we as developers didn't need to specify.
All right, so you can see here we have platform demo storage.
This is our storage account.
If you're not familiar with Azure, a storage account is roughly like an S3 bucket in AWS.
And inside of it, we have a simple container called sample
blob, which is currently empty.
There's nothing here.
But our app works, and we as developers
just needed to click one button.
Okay.
So how do we test this API all the way down?
Let's go back now.
By the way, you can see here
we have Crossplane resources.
Crossplane is the magic behind creating all the cloud infrastructure.
And then we also have here an NS, which is a Kubernetes cluster viewer.
We can see all the events and we will look at this a little later.
But for now, let's again switch hats and become a developer.
So now I don't want to use GitOps.
I don't want to use something complicated.
I just want to quickly create something that I can test my app with.
My app needs an actual file inside the blob container in the storage account.
So what I can do is use kubectl apply -f.
I go to the examples folder,
and here I have a blob content YAML file.
We'll see its content in a second, but remember, it's just an API.
So I want to create a blob content inside of my newly created infrastructure.
And let's see what happens.
So it says created.
If we go back to our web page and refresh now, you can see that
our storage blob reader actually reads the content of a file:
Hello, 24 platform engineering. And just to prove to you that the file is here,
you can see, indeed, there's an example file present in my storage account.
From the developer's point of view, I was able to very easily spin
up the infrastructure, create my application, and do everything that
is necessary to interact with it.
Let's go back real quick and see what's
happening inside of my application.
If we go to namespaces, you can see that there's a DevOps team namespace.
And inside of it, I have three replicas of my app pod.
And I also have a vCluster and CoreDNS.
So why do I have the vCluster?
Let's imagine that as developers we tested this, but
maybe we are a little bit more Kubernetes savvy and we want
to experiment with Kubernetes.
But of course, our Kubernetes is populated by other tenants, and we don't
really want to get in anybody's way.
Maybe we want to create and try out a new ingress controller or something else.
And here, if we go back quickly to our Slack notification, it tells us that
we have a dedicated vCluster.
So if I have the vCluster CLI, Slack tells me, hey, this is your vCluster.
You can connect to it and you can do whatever you want in this vCluster.
So let me open a new window.
I can connect.
And as you can see, we are connected.
I can, for example, run a debug pod, or I can run
or create any other pod I want.
And this would be interacting with my virtual cluster.
So we have not only given the developer the ephemeral testing environment.
We have also given them their own fully fledged virtual Kubernetes cluster
where they can test whatever they want.
So they don't have to open tickets and keep following up.
So we are expanding the ability of developers to use various testing tools.
All right, let's close it.
Let's go back to our regular cluster.
And we have seen the whole flow so far. As a developer,
I am very happy.
I didn't need to interact with any of it.
I can see my application clearly
and I can test it.
I can also look at observability, so I can
open Grafana, which is a dedicated Grafana dashboard just for me.
And as you can see, there is just a little bit of data here for now.
It's a dashboard that maybe my platform team prepared, and it only
observes my namespace and my vCluster.
And then I can see what's happening.
Maybe I want to tweak resource quotas and so on in my application.
So you can see we are giving developers various tools, and they
are all automatically deployed.
We don't need to create a ticket or anything.
Everything that you've seen
we have deployed from the small YAML file we saw earlier, which is this one.
Okay, so how did it all happen?
Where is the secret sauce?
Obviously, the complexity cannot completely disappear.
So there is definitely complexity somewhere.
So let's open a composition.
What I'm doing here is using Crossplane and a Crossplane composition.
As you can see, it's 285 lines of YAML.
And as a platform team, this is my task.
I am encapsulating and hiding this complexity
inside of a composition.
I'm hiding it away
from developers.
So let's see what this composition has inside.
If we search for name, you can see it has an account, service, ingress,
and also deployment and namespace.
Those are all Kubernetes resources.
It has the vCluster,
resource group, container and so on and so forth.
So all those things are deployed for us by Crossplane, and we can hide
all the complexity using those tools.
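The idea of one small claim fanning out into many resources can be sketched in a few lines of Python. This is only a toy model of what a composition does, not Crossplane's actual mechanism or schema: the function name, the resource naming scheme, and the sample claim are hypothetical, and the list of kinds simply mirrors the ones named in the demo.

```python
def compose(claim):
    """Expand one claim into the (kind, name) pairs the platform would create.

    Toy model only: a real composition is declarative YAML, and the naming
    scheme below is invented for this example.
    """
    name = claim["metadata"]["name"]
    namespace = claim["spec"]["namespace"]
    kubernetes = [
        ("Namespace", namespace),
        ("Deployment", name),
        ("Service", name),
        ("Ingress", name),
    ]
    isolation = [("VCluster", f"{name}-vcluster")]
    cloud = [
        ("ResourceGroup", f"{name}-rg"),
        ("Account", f"{name}-storage"),
        ("Container", f"{name}-blobs"),
    ]
    return kubernetes + isolation + cloud


# Hypothetical claim, shaped like a Kubernetes object with metadata and spec.
sample_claim = {
    "metadata": {"name": "blob-reader"},
    "spec": {"namespace": "devops-team"},
}

resources = compose(sample_claim)
for kind, resource_name in resources:
    print(f"{kind}: {resource_name}")
```

The developer only ever sees the claim; the expansion, and every decision inside it, stays with the platform team.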
You can go and read the whole file.
I will leave a link somewhere in the presentation, so you can look at it later
and check the repository for it.
But that's how it works.
Again, I'm putting on the hat of a platform engineer here.
That's how we can make it happen.
That's how we can make it possible.
All right.
So now I'm done as a developer.
I don't want to deal with this anymore.
I have successfully tested everything.
And now I want to delete
all the infrastructure, including all the Kubernetes resources.
So as you might imagine, there is another action that I can perform in
Port here, namely delete the resource.
So I am just selecting the right repository.
And I have to give a file name here,
so I'm not accidentally deleting somebody else's file.
Click delete.
And as you can imagine, that opens another PR.
We need to wait a second for another PR to arrive.
And once we approve this PR, everything will be cleanly removed,
including our cloud infrastructure, as well as Kubernetes resources.
So that is a really lean flow
implementing the principles we were
talking about and showing you how you could potentially implement
an infrastructure platform and expose it to your developers.
Again, let's pretend I've reviewed it.
What it does is simply remove the file; there's
nothing crazy happening here.
It's a simple GitHub action. Confirm, and we are done.
So with this done, if I go back to Argo and refresh, you can see
that everything is either gone or in the process of being deleted.
And here, on the left hand side, you can also see that all
the Crossplane resources are currently being removed.
My application is being removed.
Everything is cleanly done.
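Both the create and the delete flow in the demo come down to the same reconciliation idea: compare what the repository says should exist with what actually exists, and act on the difference. The following Python sketch shows that idea in its simplest form. It is not how Argo CD or Crossplane are implemented; the function and the in-memory dictionaries standing in for the repository and the cluster are assumptions for illustration.

```python
def reconcile(desired, actual):
    """Return the actions needed to make `actual` match `desired`.

    Both arguments map a resource name to its spec.
    """
    actions = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


# Merging the "create" PR adds a file to the repository (desired state).
repo = {"blob-reader": {"image": "example/app:1.0.0"}}
cluster = {}
create_actions = reconcile(repo, cluster)

# Pretend the creation succeeded, then merge the "delete" PR,
# which simply removes the file from the repository.
cluster = dict(repo)
repo = {}
delete_actions = reconcile(repo, cluster)

print(create_actions)
print(delete_actions)
```

Nothing in the delete path needs special handling: removing the file changes the desired state, and the same loop works out that the resource has to go.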
All right.
Hopefully that showed you well how you can do this, but let's
quickly go back to the presentation and summarize what we've just seen.
So what tools have we used?
We have used Kubernetes, and we used it not as a container orchestrator,
but as a control plane.
We've leveraged the Kubernetes API
first and foremost, and we used it to reconcile everything and anything.
We also used vCluster to give our developers a virtual Kubernetes
cluster if they need a little bit more
to play around with or test.
We have used Crossplane to deploy everything and reconcile our deployments,
to keep the actual state synchronized with the desired state.
And we have used various Crossplane providers.
Providers are like Terraform providers: with them you can target
various infrastructure.
We have used the Azure provider.
We have used Crossplane functions, and the Kubernetes, Helm, and HTTP providers.
So those are like Lego building blocks, which you can target.
We have used Port, which is a developer portal similar to Backstage,
and we let our developers interact with Port rather than directly
interacting with the YAML file.
I want to emphasize that it is possible to directly interact with YAML
or even directly with the Kubernetes API, depending on the need
and on how much our developers know about Kubernetes.
We've used GitHub, and specifically used it as an interface driving the exchange
between the platform team and developers.
And we used PRs as
a messaging system.
Our PRs are actionable messages that developers send to the platform team.
They review it and then magic happens.
For GitOps, we used Argo CD.
We could have used Flux or another mechanism, but Argo CD, because of its UI,
was nice to show. And believe it or not,
all this that I showed you is running on my local kind Kubernetes cluster,
which is Kubernetes in Docker.
So I was able to encapsulate all those things inside of my Kubernetes.
You can, of course, run it
in a cloud somewhere, like in AKS, EKS or GCP, and you can do this equally well.
All right.
So that was tooling.
Let's look at a helpful diagram that will again guide us through
the journey that we just saw.
And I would like to pinpoint certain aspects here.
So we started by being a developer.
We interacted with the portal: we accessed the UI and
triggered the creation of our ephemeral
testing environment, including our
app. This in turn triggered an action on our Git repository,
which received a PR. As a third step, this PR was
reviewed and approved by platform engineers.
We have committed those changes to a repository, where Argo reconciled
them and applied them to a cluster, and from there Crossplane
talked to the Kubernetes API and used CRDs and various other mechanisms to reconcile all
the infrastructure and the application.
This in turn resulted in provisioning vCluster, provisioning our application,
which is various Kubernetes resources, and also provisioning cloud services.
So that's just a quick overview of what we have done in the demo.
In conclusion, what were we able to do?
We reduced friction for developers to almost zero.
The only touch point that developers had to have with the infrastructure was a
PR approval from somebody on the platform team, but we could very easily eliminate
this PR approval, and then we would have actual zero friction in the whole process.
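The PR as an actionable message, with the review step as an optional gate, can be modelled in a few lines of Python. This is a toy model, not the GitHub API: the `PullRequest` class, the `merge` function, the `require_review` switch, and the file path are all hypothetical names used only to show how dropping the approval changes the flow.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    """An actionable message from a developer to the platform team."""

    path: str
    action: str  # "add" or "remove"
    content: str = ""
    approved: bool = False


def merge(repo, pr, require_review=True):
    """Apply `pr` to `repo` (a dict of path -> content); return True if merged."""
    if require_review and not pr.approved:
        return False  # still waiting for somebody on the platform team
    if pr.action == "add":
        repo[pr.path] = pr.content
    elif pr.action == "remove":
        repo.pop(pr.path, None)
    else:
        raise ValueError(f"unknown action: {pr.action}")
    return True


repo = {}
create_pr = PullRequest(path="claims/blob-reader.yaml", action="add", content="kind: AppClaim")

blocked = merge(repo, create_pr)                      # review required, not approved yet
merged = merge(repo, create_pr, require_review=False)  # approval step eliminated

print(blocked, merged, sorted(repo))
```

Whether to keep the gate is a policy decision; the message format and everything downstream of the merge stay the same either way.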
We eliminated waiting times related to tickets, because we anticipated that our
developers might need a little bit more to experiment with, and we have added
vCluster, which gives them a virtual cluster with essentially admin level privileges, so
they don't need to ask us constantly:
Hey, can you create this for me?
Can you create that for me?
So we have both catered to the immediate needs of testing an application,
and we also gave them a little playground that is encapsulated just for the team
or just for the person that opens the PR.
We have used the Git PR as a unified interface between platform and
developer teams, and we encapsulated the API calls to Kubernetes.
Those API calls are in the form of YAML.
This is just configuration, but in the end, those are instructions for the
Kubernetes API and other projects like Crossplane to do something with.
And the magic is in the collaboration part, where the platform team prepared
something up front, which is a Crossplane composition and all the setup.
And then developers were able to collaborate by executing various APIs.
We have also followed zero trust security principles, because our developers
didn't need any access to Azure.
They could have it, but we've encapsulated it to a
point where the developers
interacted only with an API through Kubernetes, without even knowing
that there's Kubernetes behind it.
They just apply files and work with files and configuration and
then achieve certain results.
And we didn't even mention AI once.
So the key takeaway from this presentation is really to think
of designing developer platforms, whether they are infrastructure
or other types of platforms,
like designing an API.
An API is really a data pipeline.
It has inputs.
It has outputs.
It has various
mutations that happen along the way.
And if you think about what your API should look like and how you design and craft the
contract between developers and the platform team, that's a recipe for success.
Thank you for your attention.
Thank you for your time.
Please enjoy the rest of the conference.
If you would like to reach out to me,
you can visit my web page, cloudrumble.net,
or simply connect with me on LinkedIn.
And I would be really interested to hear your thoughts about Kubernetes, platforms,
and all the tools that we've used.