Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, my name is Kenneth
Dumez and I'm a developer relations engineer here at Teleport.
Thank you so much for taking the time to come listen to my talk.
We've got so many good ones at the conference this year, and I really urge
you to check them all out if you get the chance. The folks from Conf42
really know how to get a group together.
So today I hope I can teach you a couple things about securing your
automated workflows, how the landscape looks right now, and why it's probably a bad
idea to use long-lived static credentials in your various CI/CD
flows. But today we're going to focus especially
on GitHub Actions, hence the name of the talk,
"Why you should never use static shared secrets in GitHub Actions."
So many of you are probably familiar with these two logos,
if not the one on the right, certainly the one on the left.
Gotta love that strange little Octocat
guy that the GitHub folks have conjured up. The logo on the right, if you're
not familiar, is for their CI/CD solution, GitHub Actions.
GitHub Actions is great because it allows you to centralize all of your integration
and development testing workflows in the same place as you keep the code you're testing.
That way there's no need for a bunch of other repositories floating around
with different workflow configuration files,
et cetera. There's no need for separate DevOps repos.
You get this nifty little UI where you can see all of your test runs.
You can also click into individual runs and see all of your details.
It's really a great tool for managing your development lifecycle in a pretty
intuitive manner. The GitHub Actions config files themselves are
also pretty simple. It's easy to get started. It's really just a
great solution that doesn't overcomplicate things and has
minimal overhead. This is just a little example from
one of our repos at Teleport, from the Instruqt labs
that we have. And I'm
certainly not alone in my opinion on the tool. This is data
from HG Insights that shows the adoption of GitHub Actions by companies
in the last year. As the product has matured, its user base
has also grown wildly and still continues to grow.
When GitHub Actions first came out, it was a little bit rough around the edges,
but now as it matures, adoption has skyrocketed.
And this is also only tracking enterprise organizations and
doesn't account for the thousands of open source projects that are also relying on
GitHub Actions for their CI/CD needs. As you can see,
over the last twelve months there's been a 71.88%
increase in companies using GitHub Actions. This brings the current running
total of companies that HG Insights tracks to 9,406.
It's a lot of companies. And if
you've seen some of my other talks, you know I love the GitGuardian
State of Secrets Sprawl report. These are the most recent numbers
from their 2022 report, looking back on the past year.
I really do love this report because it really illustrates how big
the problem with secrets, especially in GitHub, is. You would
think by now we as an industry would start adapting our practices a little
bit and being more careful with how we manage credentials. But no,
the problem is actually getting worse. Six million secrets
were leaked in 2021, double the number from 2020.
Part of this has to do with the increasing number of companies moving their infrastructure
from more traditional on-prem setups
over to the cloud. As there are more cloud resources, of course there are
going to be more credentials required to access them: different access tokens,
API keys, long-lived passwords, you name it.
And frankly, most organizations are just not equipped to deal with
these leaks. Another quote from the report is that on
average, in 2021, a typical company with 400
developers and four appsec engineers would discover 1050
unique secrets leaked upon scanning its repositories and commits.
And each of these secrets is typically not leaked in an isolated way in just
one place. On average, each of these individual secrets appeared
13 different times in different places
across the code base. Accounting for
all this duplication, this means that
a single appsec engineer needs
to handle 3,413 secrets a year,
on average. This is simply not sustainable.
Those poor appsec engineers need a break.
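As a quick check on that arithmetic, here is a small Python sketch using the figures quoted from the report. The variable names are my own, and rounding up to a whole secret is an assumption about how the report arrived at its number.

```python
import math

# Figures quoted above from the report.
unique_secrets = 1050        # unique secrets found in a typical company
occurrences_per_secret = 13  # average number of places each secret appears
appsec_engineers = 4         # appsec engineers in that typical company

# Every occurrence has to be found and cleaned up, so count them all.
total_occurrences = unique_secrets * occurrences_per_secret

# Split the work evenly and round up to a whole secret (an assumption).
per_engineer = math.ceil(total_occurrences / appsec_engineers)

print(total_occurrences)  # 13650
print(per_engineer)       # 3413
```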
So there's a couple different solutions to this problem.
How do we deal with credentials? How do we deal with secrets in our repos?
One of the purported solutions is of course, just to use GitHub's encrypted secrets.
These are pretty good. Everything is encrypted on the client side and then decrypted
at runtime, so the secret can actually just be injected into the workflow.
And GitHub actually does use a mechanism that attempts to redact
any secrets that appear in run logs or get
exposed in other ways.
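To make the redaction idea concrete, here is a minimal Python sketch of exact-match log masking. It is not GitHub's implementation; `mask_secrets` and the sample values are hypothetical. The point is only that a masker that knows the registered string will not recognize a transformed copy of it.

```python
import base64

def mask_secrets(log_line: str, registered: list[str]) -> str:
    """Replace every registered secret value in a log line with ***."""
    for secret in registered:
        log_line = log_line.replace(secret, "***")
    return log_line

registered = ["s3cr3t-value"]  # hypothetical secret registered with the CI system

# The raw value is caught by the exact-match masker.
plain = mask_secrets("token=s3cr3t-value", registered)

# A base64-encoded copy is a different string, so it passes through unmasked.
encoded = base64.b64encode(b"s3cr3t-value").decode()
leaked = mask_secrets(f"token={encoded}", registered)

print(plain)
print(leaked)
```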
However, because there are multiple ways secret values can be mutated
and transformed during a run, accidental exposure
does happen. Another problem is
dynamic credential exposure. For example,
say you're using a private key to generate a signed JWT
to access a web API. Unless you register
that JWT as a secret in GitHub, it won't be redacted and
can be exposed in logs, standard out, and standard error, anywhere
that it is being printed. Another issue is chain of
custody. So this is important because any user
with write access to your repository has read access to all the
secrets configured in your repo. This makes it very difficult to audit
and keep track of who is accessing your resources, at what time,
and who is doing what with your various secrets. This is an increasingly
big problem at scale. It could be easier if you have
three or four engineers, but once you're at that earlier example
where there are 400, it's a lot to keep track
of. There's also this issue of duplication.
So in an ideal world, of course, the secrets you are using in your GitHub
Actions repo would live there and there alone.
However, a common setup that I've seen in the past
is that these secrets will actually be duplicated across various places in
your infrastructure. They might be stored in a password vault, for example,
as well as in the GitHub repository. This is really useful for an engineer,
because if you wanted to manually access a resource, you'll have the credentials at hand.
They're right in the vault, you can look them up
and go from there. They're not hidden behind this GitHub
encryption. The problem, though, is that now you have these credentials
floating around in a few different places. This makes it very difficult,
for example, to rotate these creds. Say an engineer leaves,
a new one joins, or a credential
gets compromised. You need to rotate it, so now you need to track down all
the different places that you're using this credential and rotate
it in every single one of them. It becomes a lot very quickly.
This also expands the attack surface that would allow malicious
actors to take advantage of these credentials. The more places that you
have these secrets stored, the less secure they are, leading to more chances for mistakes
and compromising developer efficiency whenever secrets are added,
removed, or need to be rotated.
Another avenue is saying, okay, so we know that secrets
are probably going to be leaked at some point, so we should constantly be
monitoring our repositories for those creds so we can respond as
quickly as possible to leaks. This is where the monitoring
and scanning solutions come in, like GitGuardian.
So these tools are great and not mutually exclusive with using,
say, encrypted secrets when you can, but really they're just not
quite enough. They're more of a reactive solution that you
can use to do damage control rather than preventing the problem at the source,
which is kind of always the end goal: to make sure that the problem doesn't
happen in the first place. They also often require manual
intervention. So, say, when a scan picks up a security leak
and a secret gets out there, a security engineer
may be pinged and they'll have to put down dinner with
their family, then go rotate that cred in the password vault and delete
it the 13 times it appears in the leaked code base.
And anyone that's had to delete old commit history
from GitHub and sift through that huge
tree and try to repair it knows how difficult that
can be. It's a real mess to delete and overwrite GitHub history to make sure that
commit is fully, fully gone.
So again, these tools are great, these scanning and monitoring
tools, but they just don't go far
enough, and they certainly aren't enough by themselves.
So this leads to the question of, well,
so what can we do about this? What can we do about all of our
secrets?
What if we simply removed the long-lived credentials?
Keeping long-lived credentials safe is hard. It's really,
really difficult. The reality is that as long as they
exist, no matter everyone's best intentions to follow security best practices,
always encrypt those secrets, make sure they're rotated,
and not leak anything, humans
are human, right? They will eventually make a mistake.
And when they do, if it's not properly handled immediately, there could be huge
repercussions. We're talking customer data leaks. We're talking Bitcoin
miners in all of your infrastructure, costs shooting up to millions of
dollars, et cetera.
And you might stop 99 out of 100
of those leaks, maybe 999 out of 1000.
But eventually one of those secrets is going to make it into a Pastebin
file somewhere on the dark web that some kid in Brussels is going to
sell to buy some NFTs or whatever hacker
teens in Brussels do. It's not going to be good.
So one of the ways that we can actually eliminate these
long-lived credentials is by using a solution like Teleport Machine
ID for GitHub Actions. In Teleport 11,
one of our most recent releases, we actually added support for GitHub
Actions workflows. So with
Teleport Machine ID, instead of managing your access using long-lived credentials,
you can just join each infrastructure resource to your Teleport cluster
and use automated short-lived certificates. There are no
credentials to manage, there's a rich audit log of everything happening in your
CI/CD environments, and you have that chain of custody even for
your automated worker nodes. So this is kind of a higher level architecture
diagram showing how Teleport Machine ID can
interact. For the Kubernetes cluster, the worker node will actually refresh its credentials
on a cadence, getting a new kubeconfig from the Teleport host,
renewing its access in an automated, secure fashion.
So in this instance, this Machine ID worker node
does not actually use any persistent credentials. It has
these short-lived certs that it renews from the Teleport host,
making sure that there are no secrets to manage, there's nothing to jumble.
And this will actually interact with your GitHub Actions workflows
to make sure that you don't have to use any static credentials
in your CI/CD workflows.
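The renew-on-a-cadence idea can be sketched in a few lines of Python. This is not Teleport's implementation; the `ShortLivedCert` type, the 20 minute lifetime, and the `issue` and `needs_renewal` helpers are all hypothetical. The sketch only shows why nothing long-lived has to be stored: each credential expires on its own and is replaced before it does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class ShortLivedCert:
    subject: str
    expires_at: datetime

def issue(subject: str, now: datetime, ttl: timedelta) -> ShortLivedCert:
    """Stand-in for asking the cluster for a fresh certificate."""
    return ShortLivedCert(subject=subject, expires_at=now + ttl)

def needs_renewal(cert: ShortLivedCert, now: datetime, margin: timedelta) -> bool:
    """Renew once the certificate is within `margin` of expiring."""
    return now >= cert.expires_at - margin

start = datetime(2022, 1, 1, 12, 0, tzinfo=timezone.utc)
ttl = timedelta(minutes=20)    # hypothetical certificate lifetime
margin = timedelta(minutes=5)  # renew this long before expiry

cert = issue("example-bot", start, ttl)
fresh = needs_renewal(cert, start + timedelta(minutes=10), margin)
stale = needs_renewal(cert, start + timedelta(minutes=16), margin)

# On renewal the old certificate is simply replaced and left to expire.
if stale:
    cert = issue("example-bot", start + timedelta(minutes=16), ttl)
```

If a certificate like this does leak, it stops working when its lifetime runs out, without anyone having to rotate it.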
So let's check it out, do a little demo, see it in action.
Cool. So first what we're going to need to do is to create a
join token. These tokens set out criteria by
which the Auth Server decides whether or not to allow a bot or node
to join. To create a token, we can write the resource's YAML
to a file on disk, and then use the Teleport CLI control
tool, tctl, to apply it. Let's take a look at
our token here.
So we have our token, it's pretty simple. We have the name, Conf42
GitHub token. We have
when it expires, which I just set to the year 2100 for this
example. It's going to be around for a while, but you can set this
arbitrarily as you'd like. And then we have the spec,
which contains things like the roles. The roles field defines
which roles this token will grant access to. The value Bot
states that this token grants access to a Machine ID bot.
Then we have Kube, which specifies that it will allow the bot to
interact with Kubernetes resources. We have the join method,
the bot name, which is the name of the bot, and then the GitHub
section. This will be our repo that we'll be running our actions
from. I just spun up a quick little demo repo for the purposes
of this example just in my personal repository,
Dumez Conf42 demo.
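Based on that description, the token resource might look roughly like the structure below, written as a Python dict so it can be inspected. The field names follow my understanding of Teleport's token resource and should be checked against the Teleport documentation for your version. The token name, bot name, and repository are placeholders, not the values from the demo.

```python
import json

# Sketch of a join token for GitHub Actions. All names are placeholders.
token = {
    "kind": "token",
    "version": "v2",
    "metadata": {
        "name": "example-github-token",
        # Far-future expiry, as in the talk.
        "expires": "2100-01-01T00:00:00Z",
    },
    "spec": {
        # Bot: the token can be used to join a Machine ID bot.
        # Kube: per the talk, lets the bot interact with Kubernetes resources.
        "roles": ["Bot", "Kube"],
        "join_method": "github",
        "bot_name": "example-bot",
        "github": {
            # Only workflows running from this repository may use the token.
            "allow": [{"repository": "example-org/example-repo"}],
        },
    },
}

print(json.dumps(token, indent=2))
```

In the talk, this resource lives in a YAML file on disk rather than in a dict.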
Next, we'll actually create the token resource using that
CLI utility I was talking about earlier, called tctl.
In order to create the token, we'll run tctl create.
This command will take in the config YAML and produce a
resource on our cluster.
Great. Now let's just check to make sure that the token was created successfully.
We can do that by running tctl tokens
ls. As you can see, we have the name
of our token here, Conf42 GitHub token,
with the expected types, Bot and Kube.
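The two commands from this step can be scripted. The Python sketch below only builds the argument lists, so it runs without a cluster. `token.yaml` is a hypothetical file name, and the `-f` flag on `tctl create` is how I understand a file is passed, so check `tctl help` for your version.

```python
import shlex

def create_token_cmd(path: str) -> list[str]:
    """Arguments for creating a resource from a YAML file on disk."""
    return ["tctl", "create", "-f", path]

def list_tokens_cmd() -> list[str]:
    """Arguments for listing the join tokens known to the cluster."""
    return ["tctl", "tokens", "ls"]

for cmd in (create_token_cmd("token.yaml"), list_tokens_cmd()):
    # Print the command; on a machine with tctl installed you could instead
    # import subprocess and call subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```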
Perfect. The next thing we're going to do is actually create
our bot. This will be the bot that will be running all of the commands
triggered by our GitHub Actions workflow. The Machine ID bot created
in this example will be used to access a specific node on the cluster
via tsh ssh, Teleport's SSH utility,
and will therefore require a role that can access the cluster as needed.
This example configuration will apply the access role. However,
care should be taken to either create or apply a role according to the
principle of least privilege in production environments. For this demo,
it doesn't matter as much, but if you're using this in production, you always want
the role to have the least privileges possible. Additionally,
it should have explicit access to the cluster using a username created specifically
for the bot user alone, one that is not shared with any other use
case. So here we have our command, tctl
bots add, with Conf42 demo, the name of our bot.
We give it the role access and we input the token here. We also give
it the login ubuntu. Again, in a production environment,
you're going to want a specific user for this.
Great, so it looks like it worked. Let's just check to make
sure that the bot was successfully created. In order to do this, we can
use tctl bots ls, and here we have our
demo bot, Conf42 demo, and you can see
that it has the correct user and the correct roles.
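Here is a sketch of the bot commands as Python argument lists, which can be built and inspected without a cluster. The `--roles`, `--token`, and `--logins` flags are my reading of the options described in the talk, and the bot and token names are placeholders.

```python
def add_bot_cmd(name: str, roles: list[str], token: str, logins: list[str]) -> list[str]:
    """Arguments for creating a Machine ID bot tied to a join token."""
    return [
        "tctl", "bots", "add", name,
        f"--roles={','.join(roles)}",
        f"--token={token}",
        f"--logins={','.join(logins)}",
    ]

def list_bots_cmd() -> list[str]:
    """Arguments for listing the bots registered on the cluster."""
    return ["tctl", "bots", "ls"]

# The demo uses the broad access role and the ubuntu login; in production,
# prefer a narrow role and a login created only for the bot.
cmd = add_bot_cmd("example-bot", ["access"], "example-github-token", ["ubuntu"])
print(" ".join(cmd))
print(" ".join(list_bots_cmd()))
```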
Awesome. Now let's take
a look at this example GitHub Actions workflow I created earlier.
Great. This workflow leverages two existing Teleport actions.
First we install Teleport on the Actions runner. Then we'll authorize
the runner by fetching the Machine ID credentials
from our bot. Then we'll list the
remote SSH nodes we'll have access to on
the cluster, and finally we'll SSH onto one of our nodes
and actually write the GitHub commit SHA that triggered the workflow to
a file on the SSH node.
As you can see, we have this tsh command that will then echo the
GitHub commit SHA to this file called GitHub run logs.
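The last step of the workflow boils down to one remote command. The Python sketch below builds it without running anything. `GITHUB_SHA` is the environment variable GitHub Actions sets to the triggering commit. The node name and log file path are placeholders, and a real workflow would also need to point `tsh` at the credentials fetched in the authorization step.

```python
import os
import shlex

def record_commit_cmd(login: str, node: str, log_path: str, sha: str) -> list[str]:
    """Build a tsh ssh invocation that appends the commit SHA to a file on the node."""
    remote = f"echo {shlex.quote(sha)} >> {shlex.quote(log_path)}"
    return ["tsh", "ssh", f"{login}@{node}", remote]

# Outside of GitHub Actions the variable is unset, so fall back to a dummy value.
sha = os.environ.get("GITHUB_SHA", "0000000")
cmd = record_commit_cmd("ubuntu", "example-node", "github_run_log", sha)
print(shlex.join(cmd))
```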
Perfect. So now let's go ahead and actually
commit this action and
see it in action.
We're going to say "demo add Conf42
demo action."
So now let's go ahead and actually push our commit, push this
workflow up to our repo to see it in action.
So we've run the git push, we have our bot. Now let's go check on
our actions page.
We can see the action being run. This was triggered because of the
push to main. Now let's
just give it a second and we should be able to see this
action actually run on the GitHub Actions runner, using
the Machine ID bot with its credentials. And as you can see here,
we don't have any secrets or any long-lived credentials that we're using.
All we're using is this short-lived certificate that is produced by
Machine ID.
Awesome. And our job ran successfully.
So now if we log into our cluster here,
we should be able to see it. This is the Teleport UI
here. And this actually allows us to interact with our Teleport
cluster and check on all of our activity. So if we go to
the audit log, we can actually see the certificate issued
and we can see what the bot was doing.
So we can see that it started the session and it actually executed a
command on the node, k8s host. And then
we can see the exact commit SHA and exactly what was run
on this worker node. Just like
that, we're able to manipulate our Teleport resources, the resources
managed by the Teleport cluster, all without using any
long-lived static credentials stored in our GitHub.
There's nothing to be leaked. And all
of this is fully extensible and fully configurable for your various needs.
So you can do a lot of different things with this. You can manipulate SSH
nodes, you can manipulate databases, you can even manipulate Kubernetes
clusters. Whatever kind of resources you have, you can use Teleport to manage
your GitHub Actions.
Great. So that's a little bit, in a nutshell, about how
Teleport Machine ID can integrate with your GitHub Actions workflows to
secure your CI/CD pipelines and reduce the risk of a hack
through a leaked credential, completely eliminating static credentials
from your workflows and making it so that your engineers can sleep a little
bit easier at night. Thank you
so much. I hope you learned a little bit from this talk, and you
should go out there right now and secure your GitHub Actions.
Check us out on Slack at teleport slack.com. We have
a great community there. I'm hanging out there all the time so
we can chat. And if you want to learn more, just go to teleport.com.
Remember, we have an open source and an enterprise version, so if you want to
download it and just hack around with it and try it out for yourself,
feel free to do so. It's a lot of fun. Thank you again so much
and I hope I see you at the next talk.