Transcript
Conf42 Cloud Native 2022. Thank you for joining my presentation. I hope you've been enjoying all the other presentations going on today. My talk is entitled Local Microservice Development with Remote Kubernetes Assist, and it's really a story about how we at StackHawk invested in our dev tooling and how we were able to use Kubernetes and other cloud native services to facilitate our software development process. My name is Zach, and I've been in startups for most of my career. I really love the opportunities and the excitement of startup companies, and I feel like you get a breadth of experience there that you don't often get at larger companies. Automation has been kind of a career theme for me, starting out with networks, systems, and security, and these days I focus a lot on software delivery, but I still do that other stuff too.
At StackHawk we build tools for test-driven security, and our primary tool is a DAST scanner built on the open source project OWASP ZAP. DAST stands for dynamic application security testing, and it's a type of scanner that probes a web app for known vulnerabilities by attempting to exploit them. Among DAST scanners, StackHawk is the best for running in CI/CD and for working in a team environment. It's also free for developers. And if you happen to be looking for a new challenge, I should mention that we are hiring.
Okay, so I want to talk about our app platform, because this is the context for our story. When we started about three years ago, we had a greenfield opportunity in front of us and we wanted to build out a really state-of-the-art platform. So we looked to the CNCF and set up some rules for consistency between apps: twelve-factor apps, stateless design patterns, common conventions, so that once we got started, we had a great platform to build on. We knew that there was going to be a run-anywhere DAST scanner component to our architecture; this is what people would run out in the field to scan their applications. We would also have a web UI, which is where you would look at your scan data, and both of these components would tie into microservices and REST APIs running in Kubernetes. So we had the run-anywhere DAST scanner, a React single-page application UI, and REST APIs running on microservices, and we would use Kotlin as the language to develop those microservices. At the same time, we wanted to go ahead and invest some time in our build platform. We built it with security in mind, but we also built it with fast iteration in mind. And we wanted to set it up as a GitOps kind of platform, where coders would just build and test locally, check in code, and then automated build systems would take over and deploy the software, as long as all the checks completed correctly. To do this, we set up a bunch of AWS accounts: a build account that would serve as our build environment, and several runtime accounts, so that we could isolate the build operations from the runtime operations. And across those accounts we would also stripe in different environments: a production environment, of course, but also several staging environments where we could test things out.
To help us with this build platform, we created a big bash script. It started out small, but it got bigger over time. We called it Biodome. Biodome is a library of bash functions that would help us in our local development as well as in the pipeline when we were building our software. It would do things like get information about our AWS environments: figure out what environment we're running in, and so what environment we would be targeting, and what the account numbers were for these different account types and app environments. It also contained some common functions for pushing and pulling artifacts, such as jars and container images. And it would take care of deploying manifests and Helm charts, or at least help with that stuff, to make it easy.
At the same time, we adopted a tool called Gradle, which is a pluggable JVM build tool similar to Maven. It's opinionated, and it makes it easy for a JVM-based language like Kotlin to build and test and package artifacts. At the same time, we went ahead and started to build our own library of functions and plugins that we could plug into Gradle, and we called that library Ari. Finally, all this stuff runs on AWS CodeBuild. It's a really simple system: it just runs build steps in response to GitHub PR and merge webhooks that come in. So then, at a high level, what this looks like is: we've got a repo for every microservice that we develop, and those repos live in GitHub. As developers issue PRs and merges to GitHub, GitHub sends a webhook over to CodeBuild in AWS, and CodeBuild kicks off the build job. These build jobs use Biodome, the big shell script library, as well as Gradle, to perform all the building and testing, figure out which environment we're in, and deploy the software, as long as all the checks completed correctly. So we had all this stuff in place, and it felt like a pretty good platform. We got started, and for a couple of weeks it was really pretty cool. Developers could bring their own favorite IDE. They would test locally on their laptops, code, build, test, repeat, and then submit PRs, and automated build and deployment would take over from there. But it turned out to be kind of a chore for developers to set up all the microservice dependencies they needed to work on their target service. Say they're working on service A; as we build out more microservices, maybe that microservice really doesn't do much unless it's got services B, C, and D. So we needed to figure out a way to make it easy to bring up the latest versions of services B, C, and D so that they could just get to coding. And what we came up with was a system built on Docker Compose, and there's this cool feature in Docker Compose, which is that it can overlay configuration files.
Docker Compose, if you're not familiar, is a way to lay out a bunch of services or containers in a YAML file, and they can all talk to each other. So it's a nice way to put together a little assembly of containers, much like microservices, and it's a good way to develop locally. So what we did was we used this overlay feature, and we concocted a scheme where each one of our microservice project repositories would contain its own Docker Compose file, and each repo would also define all of its microservice dependencies. Each microservice container exposes unique reserved ports, so for service A, you always know that it's listening on port 3200, and maybe a couple of other ports in that range.
In each project, we would have a simple script called local-start.sh, and that would pull those compose files from all those other projects and run them together. They would merge and run all these dependency microservices, and your target app, service A, would be left out of the mix. The expectation was that you work on service A locally, but you've got all these containers running, listening on the localhost address on their reserved ports, and service A can talk to them. That made it a lot easier to build and test. And when you came back the next day and other people had made changes to services B, C, and D, you could just restart your Devcube-version-zero Docker Compose setup and you'd get all the latest images. It worked really well.
Let me talk through what this looks like. Say you've got a repository for service A, svc-a, in GitHub. That project repository is going to have a file called docker-compose-service-a. It's a YAML file, and it defines its own services; you can see that in the top box. The Docker Compose file just describes how to bring up service A itself. And when we run this composition, of course, if you're working on service A, we're not going to bring up service A in a container. So this file is actually used by other projects that depend on service A. So service A in this compose file defines: what image do I need? Well, I need service A with the latest tag, and we're getting that from an ECR repo in AWS. It listens on port 3200, so that should be exposed on the localhost address, and it depends on services B, C, and D.
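To make that concrete, here's a minimal sketch of what a compose file like that could look like. It's illustrative rather than our actual file; the image URI, account ID, and region are placeholders, and the dependency services are defined in their own repos' compose files.

```yaml
# docker-compose-service-a.yml (illustrative sketch, not the actual file)
version: "3.8"
services:
  service-a:
    # Pulled from an ECR repo; the account ID and region below are placeholders.
    image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/service-a:latest
    ports:
      - "3200:3200"   # service A's reserved port, exposed on localhost
    depends_on:
      - service-b     # defined in the service B repo's compose file
      - service-c
      - service-d
```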
Then if you go to the other projects, for service B and service C and service D, you'll find similar Docker Compose files, and those might define local dependencies too. Some of our projects require a database or a Redis store or something; those can be defined in their own compose files as well. But we can also say that they depend on other microservices. The same is true for the repos for services C and D. Okay, so when you run that local-start script and you're in the service A project, it's going to pull the Docker Compose files for services B, C, and D. And this merged Docker Compose file, in effect, is what you bring up when you're working on service A.
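Under the hood, that merge is just Docker Compose's multi-file overlay behavior, so the local-start script boils down to something like this sketch (the file layout and the fetch step are stand-ins for what I described, not the real script):

```bash
#!/usr/bin/env bash
# local-start.sh (illustrative sketch): run the dependency services' compose files together.
set -euo pipefail

# Assume each dependency's compose file has already been fetched into deps/ (mechanism elided).
deps=(service-b service-c service-d)

compose_args=()
for dep in "${deps[@]}"; do
  compose_args+=(-f "deps/docker-compose-${dep}.yml")
done

# Later -f files overlay earlier ones. Service A itself is intentionally left out,
# so you run it from your IDE against these containers on localhost.
docker compose "${compose_args[@]}" up -d
```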
Now, what that looks like is all those Docker containers for those other services running locally on your laptop, listening on the localhost address. And then to the right there, that box on the bottom right shows what you'd see in your IDE: when you run gradle bootRun to bring up your local application, it comes up and it can connect to the other services that are running on your laptop. This worked really well. It was a snap now to bring up all your dependency microservices and just start coding. And this worked for another four or five weeks. But after a while, the number and the size of these microservices grew and grew, and it became a little bit hard to manage memory between the IDE, Docker Desktop, and your build-test-run sort of functions. And we started to question whether we had gotten laptops that just weren't powerful enough. Well, it turns out we did not need faster laptops. What we really needed to do was figure out a way to offload some of those microservices.
And we had heard about different ways that you can use Kubernetes to assist in your development process. But one thing that we really wanted to keep about our current process was this use of local tools. We really like our IDEs. We like building and running and debugging things locally and using profilers locally. So we looked around in the devtool space and we found something called Kompose, compose with a K. And Kompose, as you might guess from the name, can basically reuse your existing Docker Compose files and generate Kubernetes manifests based on them. So what we did was we created a script called devcube.sh, and this is Devcube version one. Devcube expected that you had Kompose installed on your laptop, and it would go through that process: it would pull down your dependency Docker Compose files, use Kompose to generate the Kubernetes manifests, then go and apply those manifests to Kubernetes in a namespace based on your username. And then it would set up Kubernetes port forwarding to reach those microservices locally. If you haven't used this before, it's a way to use the kubectl command. I call it cube-cuttle, so if you hear me say cube-cuttle, that's just what I call it. There's a kubectl port-forward command that allows you to set up a port forward so that you can reach your pods or services as if they are running locally.
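Strung together, the flow the Devcube v1 script automated looks roughly like this (a sketch under the assumptions above; the merged file name, namespace, and ports are illustrative):

```bash
# Sketch of the Devcube v1 steps (illustrative, not the actual devcube.sh)

# 1. Generate Kubernetes manifests from the merged compose file with Kompose.
kompose convert -f docker-compose-merged.yml -o k8s-manifests/

# 2. Apply them to a per-developer namespace.
kubectl create namespace zconger --dry-run=client -o yaml | kubectl apply -f -
kubectl apply -n zconger -f k8s-manifests/

# 3. Port-forward each dependency Service so it appears on localhost,
#    just like the old Docker Compose setup.
kubectl port-forward -n zconger svc/service-b 3300:3300 &
kubectl port-forward -n zconger svc/service-c 3400:3400 &
```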
So now it should look just like before with the Docker Compose setup, except now all of those dependent microservices, all the microservices that your service depends on, are running out in Kubernetes, and you can continue to develop your target app locally. What that looks like is: remember, with the Docker Compose setup, what you end up with is a merged Docker Compose file. So we go through that exact same process, we pull down that merged Docker Compose file, and then from it we run Kompose to create a bunch of manifests. The manifests end up being a bunch of Deployments and a bunch of Services: the Deployments set up the pods that host your containers, and the Services make it easy to connect to those pods, so we don't have to guess the names of the pods that get generated by the Deployments.
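For one dependency, those generated manifests boil down to a pair like this (a trimmed, illustrative sketch of Kompose-style output, with placeholder image and port):

```yaml
# Illustrative Deployment and Service for one dependency (trimmed for clarity)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-b
spec:
  replicas: 1
  selector:
    matchLabels:
      app: service-b
  template:
    metadata:
      labels:
        app: service-b
    spec:
      containers:
        - name: service-b
          image: 123456789012.dkr.ecr.us-west-2.amazonaws.com/service-b:latest
          ports:
            - containerPort: 3300
---
apiVersion: v1
kind: Service
metadata:
  name: service-b
spec:
  selector:
    app: service-b
  ports:
    - port: 3300
      targetPort: 3300
```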
Then if you do a kubectl get pods in your namespace, zconger in my case, you would see all of your dependent services running in Kubernetes, and you can reach them on the localhost address. So it looks much the same as the previous process. You're on your laptop, you run devcube up and it creates your Services and Deployments, and then you can see that your pods are running, and then you run your service A locally and you can develop it, and it's able to talk to all those services. It was really pretty cool, and we kind of had this sense that it just felt pretty powerful. It was really nice. It was a big performance boost for everybody. Our laptops cooled down, and we could dedicate much more memory to the IDE and to the build-run-test process. And this was kind of a hit. It worked well, especially for UI developers, who needed to bring up basically the entire microservice stack.
So this worked for a good two years, and it was an amazing feat of shell scripting and local dev tools. But, got to be honest, there were some issues. It's built on an edifice of shell scripts, right? And shell scripts can be hard to manage over time when they get big; they're just not built to scale quite that much. So we had Biodome, the bash function library, and it had gotten pretty big at this point. It also depended on a bunch of CLI tools, and so did Devcube. And they were finicky about the versions of the dev tools you were using: not only which semver version you were on, but whether you were on a Mac or a Linux box or a Windows machine, you had to pull down different packages. Every software project had a bunch of shell scripts itself, and these were calling Biodome. And even Devcube was requiring quite a lot of locally installed tools. It was especially rough for new developers coming in. I mean, not too bad, but they really had to do a lot to get their laptops ready to start developing code. They had to install the AWS CLI, Terraform, Docker Compose, Kompose, and a bunch of other things. And the sprawl was really starting to be a bit much.
Do you remember I mentioned that we also use Gradle as a build tool for our JVM projects? If you're not familiar with Gradle, it's similar to Maven, or npm, or make, or Cargo. It's really popular for JVM applications. It's a neat build tool because it's highly opinionated; it's really easy to get started developing with Gradle. But it's also super extensible, and you can use Kotlin or Groovy to build plugins and tasks that you can run in Gradle. And since you can use Kotlin, that was especially useful for us, since we're a Kotlin shop. It gives us access to rich Java and Kotlin libraries, and of course you can do anything with those libraries; there's a ton of them out there now. And one of the key things that was useful for us is that it gives us access to cloud APIs. So what Gradle ends up doing for us over time, as we were starting to develop our own plugins and tasks, is that it not only builds and tests and packages code, it can also pull in these plugins that we're developing in our project that we call Ari. And we can start to do things like authenticate to CodeArtifact, which isn't a tough thing to do; most people just use a shell script to authenticate to CodeArtifact if they use it as their artifact repository, but we built it as a task. We can also push and pull containers, push and pull objects to S3, get a lot of that information about our different AWS environments, and deploy workloads to Kubernetes. And we can do all of this kind of stuff, even opening PRs to GitHub, using the native APIs of those services.
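To give a flavor of what one of those tasks can look like, here's a minimal Kotlin sketch of a Gradle task that fetches a CodeArtifact authorization token through the AWS SDK. It's an illustration of the approach, not Ari's actual code, and the class and property names are made up.

```kotlin
// Illustrative Gradle task: fetch a CodeArtifact auth token via the AWS SDK for Java v2.
import org.gradle.api.DefaultTask
import org.gradle.api.provider.Property
import org.gradle.api.tasks.Input
import org.gradle.api.tasks.TaskAction
import software.amazon.awssdk.services.codeartifact.CodeartifactClient

abstract class CodeArtifactTokenTask : DefaultTask() {
    @get:Input abstract val domain: Property<String>       // e.g. the CodeArtifact domain name
    @get:Input abstract val domainOwner: Property<String>  // the owning AWS account ID

    @TaskAction
    fun fetchToken() {
        CodeartifactClient.create().use { client ->
            val token = client.getAuthorizationToken {
                it.domain(domain.get()).domainOwner(domainOwner.get())
            }.authorizationToken()
            // Hand the token to whatever configures the repository credentials downstream.
            logger.lifecycle("Fetched CodeArtifact token (${token.length} characters)")
        }
    }
}
```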
So over time, what we found was that our old Biodome shell script started to become less necessary for our build process. We were building a lot of the functionality that Biodome provided into Ari and into our Gradle tasks. So we made a decision to formally get off of that shell script and build the rest of those functions into Ari. And furthermore, we wanted to abstract all of those custom functions that we were using for Gradle tasks into libraries that could be used by other software applications. And the first software application we thought of was a command line utility, sort of like Biodome. So we just called it Biodome. But this time Biodome is built using Kotlin. So it's a nice CLI, and it generally speaks directly to the APIs of the services it manipulates instead of relying on a bunch of local dev tools. It became easier for new developers to get on board, because they would just download Biodome. In fact, we've got helpers in Biodome to help developers install any dev tools that they do happen to need. So the advantages, at least to us, were super clear. Now we can write all of these build functions in a strongly typed language that's composable and testable, and it's much easier to scale. We can grow this thing to a very large size; we know JVM languages can handle large applications, and you can build on top of them. And we can directly access these cloud API libraries, so we can manipulate AWS and Docker, Kubernetes, GitHub, anything else that comes along. And it's all written in the developers' own language, the language of our platform, Kotlin. So it's accessible to everybody to use and to manipulate and to build on.
Okay, let's come back to our story now. We were talking about Devcube. Devcube was a big shell script, but now it's just a part of Biodome. We created a subcommand in the new Kotlin-based Biodome called devcube, and now it's got access to all these common build functions that we've abstracted out into libraries in Ari. We have less reliance on local tools, and it works directly against the Kubernetes and AWS APIs. It's opinionated and super simple to use for newcomers, and it's flexible and extensible, so anybody can go in and add functions to it if they want. Let me describe the new Devcube from a couple of angles. First, the configuration language. The configuration language is again YAML, just like Docker Compose, and it looks similar to a Docker Compose configuration file, but it's more tuned to our exact types of services. So now we define microservices and other dependencies for our platform apps, but we can abstract away a lot of common details. In our previous Docker Compose files, we were defining resource requests and limits so that as these Devcube environments came up in Kubernetes, we were telling Kubernetes how much memory and CPU we expected those environments to take; that way Kubernetes could autoscale to handle more of these Devcube environments coming up and down. Now we can bake that into the libraries that we're calling. We've got some typical sizes that we expect, and of course we can still customize it within our YAML file, but a lot of that stuff we can just assume will be handled in a rational way by default. There are also a lot of common environment variables that we bake in, and pod permissions in Kubernetes and AWS; we can build that into our libraries as well, so we can abstract that away. For the most part you don't have to specify any of those details, but if you want to, you can, because we built some customizability into it.
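To give a flavor of the configuration language, a Devcube config in this style might look something like the sketch below. The keys here are hypothetical, invented to match the description; this is not the actual schema.

```yaml
# Hypothetical sketch of a Devcube-style config file (invented keys, not the real schema)
service: service-a
dependencies:
  - service-b
  - service-c
  - service-d
overrides:
  service-b:
    # Optional: override the baked-in defaults for sizing or environment
    resources:
      memory: 1Gi
      cpu: 500m
    env:
      FEATURE_FLAG_X: "true"
```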
Then there's the developer experience. Once you've got all those Devcube configuration files set up in all of the repos for our microservices, a developer working on, say, service A can just say biodome devcube up. By default, that reads in the config files from all the other dependency repos, builds out manifests behind the scenes, and applies them to Kubernetes. So you end up with a Devcube environment that looks just the same as what we had previously. But there are other options we can bake in as well. We've got an option to do a devcube up but bring it up in a local, Docker Compose-like environment, so it just brings up those same containers running in Docker Desktop. Or, if you want, you can bring it up in a native JVM way, so they'll all be running on your local machine, but natively, directly on your metal and not in containers. And that ends up being a nice way to go if you just have a couple of microservice dependencies, because it's simple and lightweight and pretty fast.
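To sketch the day-to-day usage: the devcube up subcommand is what I just described, while the mode flags shown here are hypothetical placeholders for the local Docker and native JVM options.

```bash
biodome devcube up                 # dependencies run in your namespace in Kubernetes
biodome devcube up --mode docker   # hypothetical flag: same containers in Docker Desktop
biodome devcube up --mode native   # hypothetical flag: dependencies run as local JVM processes
```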
We also have a function in there to take snapshots. Remember, some of these services have their own databases, and they'll bring those databases up, and as you add sample data to them, maybe users and some sample scan data, you hate to lose that environment when you bring your Devcube down and bring it up the next time. So we added snapshot functionality: we can take a copy of that data and store the backup in S3, and you can select which backup you want to use, or by default there's just a default name for your default snapshot. It's great, it's really handy.
But in addition to that biodome devcube command, we can now do Devcube-like things and other functions in Gradle. So now Gradle can deploy Devcubes, and those can be super handy as ephemeral test environments, for instance, or to deploy static environments for user acceptance testing. And we can also use this to create a bunch of manifests for deployment with Argo CD or Flux CD. So this can be a way for our applications to generate their own installer manifests that can be used by cloud native tools that expect to work with those sorts of manifests.
What we had developed here was really a larger internal developer platform, based on our own Kotlin code, reaching out to well-established APIs for AWS and Kubernetes and all these other cloud native services. And it was great because it was really tailored to our engineers and our environments. It was built with knowledge about the way we do things and the way our developers like to work. It was easy for newcomers to come in and just start using this tool, but it was also easy to add on to, and all of us can add on to it, including the developers. And it just made sense.
We've also thought about directions we can go in the future with Biodome and Devcube, too. So some potential enhancements that we've been talking about, or at least I've been talking about: why not add a web UI and get quick access to some of the common operations that are available in our library in Ari? We could create UAT Devcubes for product, so maybe product could come along and spin up their own UAT environment for tests that they want to do. Or maybe product support could use Devcubes for troubleshooting; they could set up an entire platform and an entire scanner environment so that they could run some tests and run some experiments. We could also create CRDs and controllers to manage our Devcube environments. One idea that came up pretty quickly was that, with some of the new functionality we have for Devcube, it's possible that developers will spin up more than one apiece. They might spin up several, and over time that could be wasteful. So we might want a process that just runs and watches for Devcube environments that have been around for too long, maybe takes a snapshot of them for safety, and brings them down.
We've also thought about other functions we could use to provision other kinds of resources for developers. For instance, when we come up with a new microservice, there's a whole setup process for creating the new repo and the associated build pipelines. It's automated enough, we use Terraform to do that, but it's still some work. Why not just have a Gradle command or a Biodome command that creates that repo and creates those build pipelines? When we go to push new containers to ECR repos, we have to create those ECR repos. Again, we use Terraform to do that, and it doesn't take that long, but it takes a little bit of work and coding to do, and it's not all that DRY. So we actually have done this one: we've created a function so that whenever you use Gradle to push a container to an ECR repo, the first thing it does is check to see if that ECR repo exists, and if it doesn't, it just creates it for you.
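To illustrate, that check-and-create logic is only a few lines against the AWS SDK. Here's a minimal Kotlin sketch of the idea, not the actual Ari code:

```kotlin
// Minimal sketch: create the ECR repo if it doesn't exist yet (illustrative only).
import software.amazon.awssdk.services.ecr.EcrClient
import software.amazon.awssdk.services.ecr.model.RepositoryNotFoundException

fun ensureEcrRepository(repoName: String) {
    EcrClient.create().use { ecr ->
        try {
            // Throws RepositoryNotFoundException if the repo isn't there yet.
            ecr.describeRepositories { it.repositoryNames(repoName) }
        } catch (e: RepositoryNotFoundException) {
            ecr.createRepository { it.repositoryName(repoName) }
        }
    }
}
```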
It's just a huge time saver. But why not also make functions available for developers to create their own EKS clusters, their own Kubernetes clusters that they can use to test without fear of damaging any of the other environments? Well, that's our developer tool story, and of course it's not over yet. We ended up building an IDP that really helps our developers get on board fast, focus on their work, and take part in building more functionality into it themselves, since it's written in Kotlin, which is what they know. When we look back at all the effort we put in, I think everybody at StackHawk would agree that it was really worth it. It's been a huge enabler, not only for developers, but for the product team and for our ability to quickly deliver and iterate. I hope that our story is helpful for you on your developer tools journey as well.
And I want to take a moment to thank all of the developers at StackHawk. Everybody really took part in this project. Just to call out a couple: Casey is our chief architect, and he really has been a champion for investing in our local build tools and doing it right. Sam Boland is our full stack engineer, and he's been a massive contributor to the whole effort. Topher Lamey started us down the path of using Gradle plugins, and it's just paid off handsomely. And Brandon Ward is a new software engineer who came in recently, but he had a bunch of great ideas for how to build good tools for developers. Then finally Scott Gerlock, who inspired us to build a really solid, scalable, secure platform on cloud native technologies, so that our laptops would never be a roadblock to success. Thank you so much to all of you here at Conf42 Cloud Native, and thanks for watching. Take care.