Conf42 DevSecOps 2024 - Online

- premiere 5PM GMT

The Power of SecretOps: Automating Secrets Workflows

Abstract

Modern trends in app development focus on code and compute, often overlooking the importance of secrets and configuration management. Let’s explore “SecretOps,” a layered framework for automating secrets workflows.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi there. My name is Nic Manoogian and I'm the head of engineering at Doppler. Today, we're going to be talking about SecretOps, but before we talk about that, I want to talk about software. Every app has three components. There's the code, which is the logic of the app. There's the compute, which is what runs the code: virtual machines, Kubernetes clusters, serverless functions, things like that. And then there are secrets, which are the configuration for the app. These could be sensitive values like API tokens and encryption keys, or they could be non-sensitive values like database hosts and port numbers. I want you to think about the automation that your org has for each of these three. For code, we have things like Git, GitHub, and CI for managing the lifecycle of our applications and deploying them to where they need to go. For compute, we've got things like Kubernetes and IaC tools such as Terraform and Pulumi that are responsible for managing how we deploy our code. And then we've got secrets. What do we have there? There's usually a native store in our compute systems for injecting these values, but where is the Git-like maintenance of these values? How are we tracking changes over time? Every organization has some processes for managing its secrets, and we're calling this SecretOps. Every organization is already doing these things to some extent, and I'm proposing a framework here for thinking about how you manage your secrets on a day-to-day basis. I want to make a small disclaimer: as a SecretOps tool, we do have a product in this space, and I'm going to be showing it throughout this talk, but I am not trying to sell you on the product. I want to sell you on the framework. So this is a layered approach, and these layers don't really correspond to particular features. They're more broad categories.
And I think it's useful to think about these as abstractions for the processes that you already use today and where you can improve as an organization.
So let's start at the top with storage. I like to tell a story about my first job out of school. I was working at a shop that was developing an app in Python. So I cloned the repo, I ran app.py, and I got an error about a missing API key, the Google Maps API key. So I went to my manager, and he told me that I should go to Steve for the .env file for this project. So I went to Steve, and he got back to me after a few minutes over Slack, or I think maybe it was HipChat at the time, and he gave me his .env file. I was wondering: do I need to change any of these? Are they unique to Steve? Do I need to update any of the values in here? And how do I add values to this going forward? I was planning on integrating with an email service. Do I just add the SendGrid API key in here? How does it make it to the other developers? And I absolutely didn't know how to add it for staging and production. The point of this is that storage is more than just keeping secrets in a secure place. It's about centralizing them to make sure that everyone knows where to find them and where to put them. This ties to the idea of sprawl, which is a problem that happens when there are many secret stores and it becomes challenging to know where everything is. So if you're running your monolith, you've got secrets for that stored in AWS Secrets Manager. But then there are also secrets for CI that are in GitHub Actions. And there might be some overlap, right? There's a service that's used in both: they're both posting messages to Slack. Pretty quickly you lose confidence that you understand where all of your secrets are being used. And if you find one in the wild, or if you're trying to rotate something, you're not quite sure what's going to happen next.
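To make the overlap problem concrete, here is a minimal sketch of how one might detect the same value living in more than one store. It assumes you can export each store as a plain name-to-value mapping; the store names, secret names, and values below are made up for illustration and do not reflect any particular tool's API.

```python
import hashlib
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Return a short SHA-256 fingerprint so raw values are never printed.

    This only avoids echoing secrets in reports; it is not a defense
    against someone brute-forcing a low-entropy value.
    """
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]


def find_sprawl(stores: dict[str, dict[str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Map each fingerprint to the (store, name) pairs holding that value,
    keeping only values that appear in more than one store."""
    locations = defaultdict(list)
    for store_name, secrets in stores.items():
        for secret_name, value in secrets.items():
            locations[fingerprint(value)].append((store_name, secret_name))
    return {
        fp: places
        for fp, places in locations.items()
        if len({store for store, _ in places}) > 1
    }


# Hypothetical snapshot of two stores (all names and values are invented).
stores = {
    "aws-secrets-manager": {"SLACK_TOKEN": "slack-example-token", "DB_PASSWORD": "db-example"},
    "github-actions": {"SLACK_BOT_TOKEN": "slack-example-token", "NPM_TOKEN": "npm-example"},
}

duplicates = find_sprawl(stores)
for fp, places in duplicates.items():
    print(fp, places)
```

A report like this shows that the Slack token is held under two different names in two places, which is exactly the situation where a rotation in one store silently breaks the other.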
And that's really the core problem at this layer: how do we get a single source of truth for where secrets are stored, and make sure that everyone knows where that place is? I promised not to sell you on Doppler, and I'm going to try not to, but I do want to point out how we're thinking about the problems at each of these layers. So for the storage layer, this is how we're thinking about it. When we're talking about secrets, we're talking about secrets for a particular environment. I've got a list of secrets here that are all used to run the backend project, the app.py example. Some of these values are sensitive and some of them are not, and being forthright about that is useful. If I'm a developer and I want to add a new API key, I know where to put it. I want to make an edit to the development environment, so I go to dev and I add my SendGrid API key. I like to call out that this is just a random value. It's not actually a SendGrid API key, but you're welcome to try it. I add that API key, and when I go back to the project level, I'm getting warnings in staging and production that the secret is missing. Presumably I'm going to commit some code that requires that secret, and I don't want to take down these other environments that might need it. So these are the kinds of benefits that you get from a centralized secret store. You get a cross-sectional view across environments and projects as a whole, which gives you visibility into the parity that should exist between these different environments.
Okay, centralization is good. We've got one store where everyone is working. The next layer is governance. It's really up to teams to determine who should have access to what. At Doppler, no developers have access to the secrets in any production environment, but not every org works this way. There are disadvantages to doing that.
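The missing-secret warning described for the storage layer can be approximated with a simple set comparison. This is only a sketch under the assumption that each environment can be listed as a set of secret names; the environment and secret names are illustrative, and this is not how Doppler itself is implemented.

```python
def missing_secrets(environments: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each environment, return the secret names that exist somewhere
    else in the project but are absent from that environment."""
    all_names = set().union(*environments.values())
    return {
        env: all_names - names
        for env, names in environments.items()
        if all_names - names
    }


# Hypothetical project: the new key was added to dev only.
project = {
    "dev": {"DB_HOST", "DB_PASSWORD", "SENDGRID_API_KEY"},
    "staging": {"DB_HOST", "DB_PASSWORD"},
    "production": {"DB_HOST", "DB_PASSWORD"},
}

warnings = missing_secrets(project)
for env, names in sorted(warnings.items()):
    print(f"{env}: missing {sorted(names)}")
```

Only names are compared, never values, so a check like this could run for someone who has no read access to production secrets.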
No matter what your policy is, though, you should be able to set up rules in your secrets manager to make sure that people can get access to the values they need. This is particularly important in onboarding and offboarding. When a dev joins the team, they shouldn't need to go find Steve, right? It shouldn't be a whole thing. They should get access immediately to what they need to do their job. When they leave a team, they should lose access immediately, without anyone having to claw a .env file back from them or track down who has access to what. It's important to be able to grant and revoke access whenever it's required. Another part of this is tracking secrets. Just like with Git, you wouldn't ever want to develop an app without being able to see how your application is changing over time. The same is true for secrets. If you've got a static secret store and you aren't able to see how changes are happening over time, there are huge risks: overwriting data could result in a loss that you can't recover from. You're also missing a lot of auditing information around what changed and who changed it, which can make it very hard to determine what happened. I won't show how we handle roles and groups in Doppler, because it looks pretty similar to other tools, but I will show our audit logs, particularly for writes. When you're evaluating your posture in storage and governance, it's useful to see exactly what's been modified and what it looked like before and after the change. It's also useful to be able to easily roll back to earlier versions. So I'm looking at this red/green line diff here for what's been modified. I can see I've modified the password for the database, for example. And there might be a typo in that, or I might need to roll back to that earlier version or see what it was.
I can click a rollback button immediately and jump back to that earlier version. Really useful, and kind of table stakes if I'm honest. This is something that everyone should have. So once you've got access control and you have the ability to roll back, the last bit of this is being able to gate that access control: the ephemeral access that people need. As I mentioned, no developer at Doppler has access to the production environment, but that doesn't mean that it isn't important to them, or that it isn't part of their job to introduce a change to that environment. The way this is handled at a lot of orgs is: file a ticket with the ops team and they'll take care of it. One of the things we're seeing in this space is tools allowing teams to self-serve these sorts of operations. So instead of just handing this off and hoping that it gets done, you give developers and folks on these teams the ability to submit changes for review, while still maintaining control over the approval and application of these secrets. What this looks like in Doppler is a change request. Instead of making a change directly to an environment, for this SendGrid API key, for example, as a developer I can propose a change. So I've got three values, one for each of these environments, that I want to merge into the backend project. I can review how this is going to impact those environments. I have access to the development and staging environments, so I can see exactly what's being changed, but in production, that's not visible to me. All I know is that there's a new value going in. Then the team members with approval access get notified, and they're able to approve individual updates. For dev and staging, they've gone ahead and approved. The one for production is still waiting to be approved. Once it's approved, either the author or the approver is able to apply that change to the environment.
You can call this ephemeral access, or a workflow for making modifications in a more controlled way.
So we've got storage and governance down. Orchestration is really where things start to get interesting, because all the earlier layers are useful, but we're not really doing much more than storing secrets in a password manager, right? Which is exactly what Steve and I were doing at the company that we worked for: 1Password was our source of truth. The orchestration layer is like the CI/CD bit. How do you actually deliver these secrets to where they need to go? Steve was responsible for this. He needed to be super diligent about making sure that a secret for staging or production ended up in 1Password, and he needed to SSH into the servers and update the secrets.json file. We absolutely had downtime because of a copy-pasta error, or because he forgot to copy a value over entirely and had only added it to 1Password. It's been many years since Steve and I worked together, and their processes have improved, but this is a real problem for a lot of businesses. So how do we solve it? There are a couple of ways to go about it, but at Doppler, our approach is to meet you where you are. We have a hub-and-spoke model, where we see Doppler as the source of truth, and it's responsible for syncing secrets to wherever they are consumed. So let's look at what that looks like in a couple of different spots, just like the hub-and-spoke visualization here. For a developer running locally, the CLI might be the easiest spot to consume secrets. They're going to be working in app.py, so the CLI is the easy place to use those things. In a Kubernetes environment, using an operator to sync those secrets into native cluster secrets might be the easiest way to use them. In an AWS Lambda environment, Parameter Store or Secrets Manager might be the right spot.
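The hub-and-spoke idea can be sketched in a few lines. This is an illustrative model only: `SecretHub`, `sync_to`, and the in-memory dictionaries standing in for a Kubernetes secret and a parameter store are all hypothetical names, and a real integration would call each platform's own API instead.

```python
from typing import Callable

# A sync target receives a full snapshot of the current secrets.
SyncTarget = Callable[[dict[str, str]], None]


class SecretHub:
    """Single source of truth that pushes every change to registered targets."""

    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}
        self._targets: dict[str, SyncTarget] = {}

    def register(self, name: str, target: SyncTarget) -> None:
        """Add a spoke and bring it up to date immediately."""
        self._targets[name] = target
        target(dict(self._secrets))

    def set(self, key: str, value: str) -> None:
        """Write to the hub, then fan the new snapshot out to every spoke."""
        self._secrets[key] = value
        for target in self._targets.values():
            target(dict(self._secrets))


def sync_to(destination: dict[str, str]) -> SyncTarget:
    """Build a target that mirrors the hub into an in-memory stand-in."""
    def target(snapshot: dict[str, str]) -> None:
        destination.clear()
        destination.update(snapshot)
    return target


# In-memory stand-ins for real destinations (assumptions for illustration).
kubernetes_secret: dict[str, str] = {}
parameter_store: dict[str, str] = {}

hub = SecretHub()
hub.register("kubernetes", sync_to(kubernetes_secret))
hub.register("aws-parameter-store", sync_to(parameter_store))
hub.set("SENDGRID_API_KEY", "example-value")
print(kubernetes_secret == parameter_store)
```

The design point is that nobody writes to a spoke directly. Every destination is derived from the hub, so there is no manual copy step to forget.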
So treat Doppler as the source of truth, and then sync secrets to wherever you need them. The last bit of this that's interesting to talk about is automatic redeployment. This is optional, but there are some really cool benefits to layering in automatic updates. The idea here is that you wire up Doppler with either the Kubernetes operator or with webhooks to trigger your builds or redeploy your workloads whenever a secret is modified. This is particularly nice in conjunction with change requests, because you're empowering teams to propose a change, you approve it, and then they deploy it when they're ready. With this orchestration set up, you have a lot more knowledge around how secrets are being used. You have confidence that the services relying on these secrets are actually going to consume the new values that you are adding to the system. Let me move through here. Great.
So this is definitely where Steve and I could not go further. We did not have the orchestration layer. We wanted to rotate secrets, and we talked about doing it, but it just seemed too risky, because there was certainly some service somewhere that had the key we wanted to rotate in its secrets.json, and we didn't have the confidence that we knew where all these things were. This is the ugly end of secret sprawl. You really don't want to be in a position where you're choosing between your security posture and risking your uptime. Orchestration is the solution to this. Once orchestration is set up, you have the confidence that you're able to rotate a secret whenever you want. You can just go to SendGrid, issue a new API key, and save it right back in. The applications are going to restart either on the next deploy or automatically, if you have that set up. And you can take this a step further. If you have confidence in your orchestration, you can go ahead and automate this.
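One way to picture why rotation depends on orchestration is the ordering below: issue the new key, deliver it everywhere, and revoke the old key only after every consumer has confirmed. This is a sketch under stated assumptions. `FakeProvider`, `rotate`, and the delivery callbacks are hypothetical stand-ins, not SendGrid's or Doppler's actual API.

```python
import secrets as token_source  # stdlib; used only to generate fake keys
from typing import Callable

# A delivery pushes (name, value) to one consumer and reports success.
Delivery = Callable[[str, str], bool]


class FakeProvider:
    """Stand-in for a third-party service that issues and revokes keys."""

    def __init__(self) -> None:
        self.active: set[str] = set()

    def issue(self) -> str:
        key = "key-" + token_source.token_hex(8)
        self.active.add(key)
        return key

    def revoke(self, key: str) -> None:
        self.active.discard(key)


def rotate(provider: FakeProvider, store: dict[str, str], name: str,
           deliveries: list[Delivery]) -> bool:
    """Rotate `name`; revoke the old key only if every delivery succeeded."""
    old_key = store.get(name)
    new_key = provider.issue()
    store[name] = new_key
    delivered = [deliver(name, new_key) for deliver in deliveries]
    if all(delivered):
        if old_key is not None:
            provider.revoke(old_key)
        return True
    return False  # old key stays valid so lagging consumers keep working


provider = FakeProvider()
store: dict[str, str] = {}
web_env: dict[str, str] = {}


def deliver_to_web(name: str, value: str) -> bool:
    web_env[name] = value  # stands in for a sync plus redeploy
    return True


def unreachable_worker(name: str, value: str) -> bool:
    return False  # simulates a consumer that could not be updated


first = rotate(provider, store, "SENDGRID_API_KEY", [deliver_to_web])
first_key = store["SENDGRID_API_KEY"]
second = rotate(provider, store, "SENDGRID_API_KEY", [deliver_to_web, unreachable_worker])
print(first, second, len(provider.active))
```

In the second rotation one consumer cannot be reached, so both keys remain valid. That trades a longer exposure window for uptime, which is the choice orchestration is meant to make unnecessary.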
So we do have an integration with SendGrid, but I'm showing Twilio here. In this example, we have Doppler hooked up with Twilio directly, and it's responsible for issuing and revoking the Twilio API keys that are going to be consumed in the app. I've got a policy set up here. It's a little small and maybe hard to see, but the policy is to rotate every 30 days, or you can manually rotate in an emergency. What's really neat about this is that when you have automatic rotation set up, no human has ever handled or seen that value. I'll talk a little bit more later about why this is so useful. But at this stage, you've got automation set up, and that comes from the confidence that your applications are able to consume the secrets they need to run. That's really the key here.
So every time a secret comes across the wire, it increases the chance of a leak, right? A secret is basically just a vulnerability. That's another way to think of it. It's a portable identity. And when viewed this way, a secret read is just as important as a secret write. We should be tracking these events, and we should be able to audit them. When you're looking at this data historically, you can answer questions like: a person left the team, so which secrets have they seen? Are there any services that are using this key? Can I revoke it? Is it safe to do so? It can also help you track down leaks if they do happen. So you can imagine that secret that's being rotated every 30 days. If you find an API key hanging out somewhere, it should be a pretty short list of human and non-human identities who had access to that particular version of the secret. So let's play through that. We find this key somewhere, we know it's the password for the database, and it's leaked in the wild. You can do a global search in Doppler and find where it's being stored. Okay.
It's being used in the backend project, in production. So then we can look at the access logs for that secret. For the current version, we can conclude that only me, Nic, and these sync bots that are responsible for syncing this value into Azure Key Vault and AWS Secrets Manager have seen this value. So I can conclude it was either Nic or one of these identities. I should go look at how they're being used. I should look at the audit logs in these services and track down what happened here. Super powerful stuff.
So what is SecretOps? It's an automation framework for secrets. That's really what it is. You've got Git and GitHub, which are 15 years old at this point. You've got Kubernetes and Docker, which are 10 years old at this point. Where are the tools for automating secrets? I'm hopeful that I've convinced you to take a closer look at this framework, to look at these layers, and to really see where your org stacks up. Do developers know how and where to store secrets? Is access configured the way that you want it? Are secrets delivered to where they need to go? And are you able to effectively rotate secrets and quickly adapt to leaks? It's worth thinking about. And you've made it to the end. We're really excited about secrets, and if you are too, please feel free to drop us a line. My email is nick at doppler.com. We would love to hear your feedback. Thanks again for watching, and hope to see you all soon.

Nic Manoogian

Head of Engineering @ Doppler
