Conf42 Site Reliability Engineering 2023 - Online

Replacing Privileged Users With Automated Just-in-Time Access Requests

Abstract

Managing privileged access to resources has, for many, been a cumbersome process. Developers often need temporary access to resources beyond their normal day-to-day duties, and it often falls on SREs to respond to these numerous ticketed requests. In addition, employees often need to break glass, sharing super-privileged admin accounts to circumvent or expedite the process.

With the rise of just-in-time privileged access solutions, engineers can apply the principle of least privilege and escalate when required, in a secure and effective manner. In this talk, we'll look at how we can use an open-source access platform like Teleport to securely automate just-in-time access (JIT access) to its managed infrastructure, in tandem with everyday tools used by engineering teams like Slack, Microsoft Teams, or PagerDuty.

With the right role-based access control (RBAC) in place, teams can drastically reduce, or even eliminate, the need for admin accounts.

Summary

  • Travis Rodgers: The title of my talk today is Replacing Privileged Users With Automated Just-in-Time Access Requests. There are going to be four parts to this ideal state. We'll talk in detail about those four parts, and then I'll move to a practical example of what this would look like using the open-source solution Teleport.
  • The ideal scenario involves everyone having a least-privilege policy. It also includes the ability to elevate privilege upon request and approval. And then finally, there needs to be automation in place to ease this workflow.
  • Just because you're a privileged user doesn't mean you should have blanket access to everything. As far as non-privileged users, we can define this as users without administrative privileges. They have very specific roles assigned according to the principle of least privilege.
  • The ideal scenario is the ability to elevate privilege upon request and approval. This elevated privilege needs to be just in time, or temporary, not perpetual. The fourth piece of this ideal scenario would be to have automation in place to ease this workflow.
  • Teleport allows you to access all of your infrastructure in one central location. It does so by deploying an identity-aware proxy. To access this proxy, you have to prove your identity. Just-in-time access is fully integrated with Teleport.
  • Open-source Teleport lets users request access to a Kubernetes dev cluster for a short time. In the Enterprise version of Teleport, this can also be managed in the UI and integrated with Slack. Here are two simple examples of how this works.
  • In the third scenario we have resource-based access requests. So imagine a contractor that just needs periodic access to one of the servers. Everything that comes through this proxy is also audited, with all sessions having the ability to be played back. With just-in-time access, you can move away from super-privileged accounts.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and thank you for joining me for today's talk. My name is Travis Rodgers. I'm a developer relations engineer over at Teleport, where we provide identity-native infrastructure access for engineers and machines. Now, the title of my talk today is Replacing Privileged Users With Automated Just-in-Time Access Requests. Now, we could start this off with some history or with a bunch of definitions, but instead I'm going to start off with an end state in mind, an ideal end state for replacing privileged users with automated just-in-time access requests. There are going to be four parts to this ideal state. We'll talk in detail about those four parts, and then I'll move to a practical example of what this would look like using the open-source solution Teleport. If all that sounds good, let's go ahead and get started.

So what is this ideal scenario? There are four parts to it. Number one, everyone has a least-privilege policy. Number two, the ability to elevate privilege upon request and approval. Number three, this elevated privilege needs to be just in time, or temporary, not perpetual. And then finally, there needs to be automation in place to ease this workflow. These are the four parts of this ideal scenario. So for now, let's take a deeper look at these four.

All right. Number one, everyone with a least-privilege policy. The principle of least privilege means giving a user account or process only the privileges which are essential to perform its intended function. So there are two groups here: there are privileged users and there are non-privileged users. This principle can be applied to both groups. Just because you're a privileged user doesn't mean you should have blanket access to everything.

So, privileged users. Now, we do need some privileged users. Someone has to create your account as a new employee, right? Someone has to grant you a role or set you up in Active Directory.
And that person often has privileged access allowing them to do so. These are IT managers, system admins, database admins, security teams, et cetera. But again, this doesn't necessarily mean that they need blanket access to everything available. Access that they don't use regularly could be requested on an as-needed basis and approved by other admins. For example, there could be a super admin role, not a super admin account or super admin user, but a role that could be assumed temporarily for duties above the access of these privileged users. Maybe there could be a role for production Kubernetes clusters that no one has but would need to request in order to access. This would dramatically shrink the attack surface for production clusters. I mean, no one has that access unless it's requested temporarily, but we'll talk more about that later.

As far as non-privileged users, we can define this as users without administrative privileges. They have very specific roles assigned according to the principle of least privilege in relation to their titles or duties. So overall, in both of these examples, we're eliminating blanket super users, with everyone abiding by the principle of least privilege.

All right, number two in this ideal scenario is the ability to elevate privilege upon request and approval. So this could be assuming a role temporarily, like a super user role or a particular role set up for production clusters or whatever. This can also be accessing some very particular piece of infrastructure temporarily, like a server, a Kubernetes cluster, or a database. And there are two parts to this. Number one, a request is made: an engineer requests access to a role or resource. And number two, the request is approved: another engineer with the rights to approve these requests will approve. Maybe only a certain group can do the approving. There's flexibility in that. So there are two parts to it.
There's the requester, and then there's the approver, with RBAC deciding who has the ability to do either one of those.

And number three, in our ideal scenario, is that this elevated privilege needs to be just in time, or temporary, not perpetual. So there should be no standing privilege or broad user access privileges that are always on. Access for a resource or role is often granted permanently and painted with a broad brush. I mean, it takes time to figure out the exact pieces that a user needs to elevate. So why not just grant the whole group out of convenience, right? You've done it. I've done it. I mean, as engineers, often we don't know exactly what that person needs, so we just give them more access than they really need. In addition, this access can come in the form of keys or passwords to an account or resource, or some other form of shared secret that's passed around between engineers. Think about a PEM key or a kubeconfig. "Oh, you need admin access? Just use mine for now." But instead, an elevated access should be a just-in-time access. Just-in-time access is the way to enforce the principle of least privilege, ensuring that users and service accounts are, number one, given the minimum amount of access needed or required, and number two, only granted that access temporarily.

And then finally, the fourth piece of this ideal scenario would be to have automation in place to ease this workflow. Now, without automation of some sort, we have a big annoyance. Someone puts in a request for some privilege escalation and has to wait for it to get processed, probably having to reach out or tag someone to look at it sooner rather than later, and then multiply that by 20 engineers a day. We need instead to make it easier to approve these requests. If we have to continually monitor tickets coming in for requests, we can't get our own work done. We need to automate this, at least partially, to ease this workflow.
And I say partially because the whole security of it all requires that it be approved by others on the team.

Years ago I was an SRE and I got put temporarily on a developer project as a developer, and when I initially tried to build my dev environment, I ran into issues with not being able to access files in Azure storage, and we had to create a ticket with the SRE team to get access to that Azure storage. And don't ask me why SREs are supposed to do this, but they're the ones to contact in that setting. On this particular contract that I was on, we were supposed to reach out to the SREs and they would grant us access, but it would often take them days to get to the ticket. And if you tried to reach out directly, there would be pushback. Rightly so. SREs, I understand. And often they would settle with just giving you blanket access to the resource that you needed instead of trying to figure out the particulars of your place on that project. And again, that's out of convenience. Like, "I don't really know what you're supposed to have or what your teammates have, but I'll just give you admin access for now," and "for now" turns into perpetually. And this happened over and over as I got further into the project, and it was a real pain, but automation could have eased that pain. So that's number four, have automation in place to ease this workflow.

So we have these four requirements. Let me go over them again. Everyone with a least-privilege policy. The ability to elevate privilege upon request and approval. This elevated privilege needs to be just in time, or temporary, not perpetual. And there needs to be automation in place to ease this workflow.

Now, before we move forward, I can hear your objection. Okay, it goes something like this: bad actors can still get access to privileged user accounts and wreak havoc. Just because there isn't blanket super user access, and regardless of just-in-time access, it doesn't mean that a privileged account can't get hacked.
And you're absolutely right. However, this is why the concept of identity is so important when it comes to access in general. So let me take a minute to talk about identity. Now, with true identity, like biometrics, you can get rid of passwords and secrets and API keys and anything that can be shared or impersonated. Think about a password and how it's supposed to link to identity. It's not true identity, though, as it can be shared or stolen or used by someone else. Just because it's my username and password doesn't necessitate that it's actually me. And if someone gets my password or secret and they authenticate with it, then they're authorized to access what I have access to, and the audit trails falsely show me performing those tasks. Any shared secret granting access to standing privilege is dangerous, and this often happens where an engineer shares his admin access with another engineer, whether a kubeconfig or a PEM key or some other form of secret, so that they can perform administrative duties. And this is normally for convenience reasons. If there is no secret at all, and we are forced to prove identity, then we can successfully use RBAC and just-in-time access for privileged access. And this isn't just for users, but also for machines. Machines need an identity as well.

And with all this being said, let's bring it all together with an open-source identity-native infrastructure access platform called Teleport. And before we get to the just-in-time access part of Teleport and that demonstration, let me just explain to you a little bit about what Teleport is. So, Teleport allows you to access all of your infrastructure in one central location, and it does so by deploying this identity-aware proxy. So this is the Teleport proxy. From here, users and service accounts will provide identity to gain access to this proxy. And that's done via WebAuthn, so biometrics, security keys. It also integrates with your SSO solution.
And the Teleport proxy comes equipped with an RBAC system, so you're able to provide roles to decide who has access to what infrastructure. You can access databases, servers, Windows desktops, Kubernetes clusters, and web applications. And Teleport has a built-in certificate authority that issues short-lived certificates to access this infrastructure. So in this entire setup, there are no passwords, keys, or secrets that can be shared. There are short-lived certificates, there's an RBAC system deciding who has access to what, and to access this identity-aware proxy, you have to prove your identity.

Let me actually log in and show you what this looks like. So here I'm at the login screen. I have different ways to log in, but I'm just going to log in via a GitHub SSO setup. So I'm going to click GitHub, and this is going to issue me a certificate to access this proxy for 12 hours. Once inside, I have a role that allows me everything, pretty much. So here are all of my servers. I can connect to a server from within Teleport. I have applications, and I can do the same thing here: I can launch the application. I have Kubernetes clusters, I have databases, and I have desktops. So as an engineer, you log in and you can access all of your infrastructure from one central location.

If I go to management, you'll see we do that via roles. So if I go to this kube access role, you'll see that I have access to all Kubernetes clusters, because you can assign labels to clusters and then roles based on the labels. So these two stars mean I have access to everything, all labels. So there's a complete RBAC system, and there's an auditing system. Here's the audit log of who's logging in, what certificates are being issued, et cetera.

But we're here to talk about just-in-time access, right? So I'm going to log into my own cluster that has that set up already. So here's the login to another cluster that I have set up for this demonstration.
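
As an editorial aside, the label-based role matching described in the role walkthrough above can be sketched in Python. This is only an illustration of the idea, not Teleport's implementation: the `Role` class, the `allows` function, the role names, and the `env` label are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

WILDCARD = "*"


@dataclass
class Role:
    """Hypothetical role: label selectors that a cluster must satisfy."""
    name: str
    kube_labels: Dict[str, str] = field(default_factory=dict)


def selector_matches(key: str, allowed: str, labels: Dict[str, str]) -> bool:
    """Check a single selector against a cluster's labels."""
    if key == WILDCARD and allowed == WILDCARD:
        return True  # the "two stars" case: matches every cluster
    return key in labels and allowed in (WILDCARD, labels[key])


def allows(role: Role, cluster_labels: Dict[str, str]) -> bool:
    """A role with no selectors grants nothing (least privilege by default)."""
    if not role.kube_labels:
        return False
    return all(
        selector_matches(key, allowed, cluster_labels)
        for key, allowed in role.kube_labels.items()
    )


# Hypothetical roles: one wildcard role and one scoped to dev clusters.
kube_access = Role("kube-access", {"*": "*"})
dev_only = Role("dev-only", {"env": "dev"})

print(allows(kube_access, {"env": "prod"}))  # True
print(allows(dev_only, {"env": "prod"}))     # False
```

The point of the sketch is that the wildcard role is the blanket access the talk argues against, while a scoped selector limits a role to the clusters carrying a matching label.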
And I would normally just go passwordless and touch my fingerprint reader, but since my computer's closed up, I'm going to go back and put in a password. So, Travis admin, and then I'm going to touch my security key as a second factor, and I'm logged in.

Now, just-in-time access, of course, is not exclusive by any means to Teleport. However, Teleport provides a complete, secure, and frictionless solution for your engineers to access their infrastructure, and has this just-in-time access fully integrated with it. So, taking a look again at our ideal state, let's compare this with what Teleport offers. Number one, Teleport has a built-in RBAC system to configure a least-privilege policy for users. Number two, Teleport provides the ability to elevate privilege upon request and approval. Number three, these elevated privilege requests are just in time and they're not perpetual. And then four, Teleport integrates with tools like Slack, PagerDuty, et cetera, to provide automation to ease the workflow of requests and approvals. And we're going to look at all of that now.

So we're going to look at three examples here. Number one, we're going to look at a simple role-based access example with open-source Teleport. So let's say we have a user named Bob who's on the security team. He often needs access to the Kubernetes dev cluster a couple of times a week to run scans. In fact, the project manager also needs periodic access as well. But due to internal hypothetical reasons, we don't want to grant either team permanent access to our cluster. So what we can do is assign them a role that allows them to request access to the Kubernetes dev cluster for a short time. So let's imagine Bob on the security team needs an hour or two to work on the Kubernetes dev cluster. If you look at Kubernetes, you'll see that there's a cluster called Minikube dev. This is the cluster that Bob is looking to access for a couple of hours.
So if we go to management and find Bob here, we see that Bob has a role called JIT Kubernetes dev admin. If we look at what that role does, it allows whoever has it to request access to the Kubernetes dev admin role. So Bob has that; he should be good to go and able to request access to that role.

And we're going to use the terminal for this. Teleport has a CLI called tsh to do this. It also has the tctl CLI for administrative duties, which we'll see in a minute. So first we have to log in as Bob. Bob opens his computer, opens up his terminal, and he's going to log in and then request access to this Kubernetes cluster. So tsh login, we'll put in the proxy address, auth is local, and the user is bob. Hit return. Bob will punch in his password and touch his security key as a second factor, and you'll see here you're logged into the proxy as Bob. Bob has these roles, Kubernetes is enabled, but he doesn't have access to any clusters. And you can see that by doing tsh kube ls. What Kubernetes clusters does Bob have access to? Right now Bob has access to no clusters.

So what Bob wants to do is create a just-in-time access request for this cluster. In order for Bob to request access to this role or this cluster, he just needs to type tsh request create, and then for the --roles flag, the role that he wants to request, Kubernetes dev admin, and give a reason. The reason he gave is fixing a pod error. So this is all he needs to do to request just-in-time access to that role. He'll hit enter: creating request, and the request is created. It's currently pending and it's waiting for request approval. By default this command will block until the request is approved. To submit this request without waiting for approval, though, just add the --nowait flag.
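
The lifecycle of a request like Bob's (pending, then approved for a limited time or denied with a reason) can be modeled as a small state machine. The Python sketch below is a conceptual illustration only, not how Teleport implements access requests: the `AccessRequest` class, its method names, the one-hour default, the `kubernetes-dev-admin` spelling of the role, and the rule that requesters cannot approve their own requests are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"


@dataclass
class AccessRequest:
    """Hypothetical model of a just-in-time role request."""
    user: str
    role: str
    reason: str
    ttl: timedelta = timedelta(hours=1)  # assumed default for this sketch
    state: State = State.PENDING
    expires_at: Optional[datetime] = None
    resolution_note: str = ""

    def _require_pending(self) -> None:
        if self.state is not State.PENDING:
            raise ValueError(f"request already {self.state.value}")

    def approve(self, approver: str, now: datetime) -> None:
        # Assumption: someone other than the requester must approve.
        if approver == self.user:
            raise PermissionError("requesters cannot approve their own request")
        self._require_pending()
        self.state = State.APPROVED
        self.expires_at = now + self.ttl  # access is temporary, never standing

    def deny(self, approver: str, note: str) -> None:
        self._require_pending()
        self.state = State.DENIED
        self.resolution_note = note

    def is_active(self, now: datetime) -> bool:
        """Elevated access exists only between approval and expiry."""
        return (
            self.state is State.APPROVED
            and self.expires_at is not None
            and now < self.expires_at
        )


now = datetime(2023, 1, 1, 12, 0)
request = AccessRequest("bob", "kubernetes-dev-admin", "fixing a pod error")
request.approve("admin", now)
print(request.is_active(now + timedelta(minutes=9)))   # True
print(request.is_active(now + timedelta(minutes=61)))  # False
```

Because `is_active` is checked against the current time, nothing has to revoke the role: once the TTL passes, the elevated access simply stops being valid.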
But what we'll do now is the administrator can go to the terminal where Teleport's hosted, and we're going to use the tctl CLI now. So sudo, and I'm going to use the full path here, /usr/local/bin/tctl, and just do a request ls to see what requests are in the queue. And there's the request that Bob put in, and it's currently pending, and as an administrator you can approve it. I'm going to make this smaller so that we can see this get approved. So what you'll do to approve it is just tctl request approve and then the ID of the request. So I'm just going to copy this ID of the request and paste it here and hit return to approve it. And over here you'll see approval received, getting updated certificates. Now you'll see that Bob has a role of Kubernetes dev admin and it's only good for 51 more minutes. He put the request in, it was good for an hour, and now it's only good for 51 more minutes. So now he can do tsh kube ls to view his available clusters, and that cluster should now be available, Minikube dev, and he can do tsh kube login with that cluster's name to log into the cluster and run kubectl commands. kubectl get pods for all namespaces should give us the pods, and after 50 more minutes this elevated request, or privileged access, will be gone. The administrator can also deny this request with a reason. They can put a reason like, hey, we're currently in the process of doing a cluster upgrade, deny, and Bob will see that he's been denied that access. So that's the first example.

The second example is an automated, more advanced role-based access example with Enterprise Teleport and Slack. So this gets much better in the Enterprise version of Teleport, where this can also be managed in the UI and integrated with Slack. So in this scenario let's imagine that one of our project managers, Alice, often needs access to the Kubernetes dev cluster. So let's log in as Alice.
I'm going to go over here: Alice PM, project manager. I'm going to put in a password and touch my security key, and you'll see that Alice has access to, like, nothing. There are no servers, no applications, no databases, no Kubernetes clusters. But she often needs access to this Kubernetes cluster. So instead of having to use the CLI, she can use the UI here, so she can go to access requests. And it's this simple: just click on new request, and she has a role that allows her to request access to this Kubernetes dev admin role, just like we saw with Bob. So all she needs to do is click add to request, proceed to request, and then she can put in a reason, need to update a pod or something like that. And when she submits the request, instead of someone having to check the review requests here, which is also available in the UI, we're going to get an automated message on Slack. So your team is using Slack. There's a group of administrators, or whoever can approve these requests, that's going to see that come through in Slack just like this.

So she's going to submit the request, and here in Slack we're going to see something pop up. You have a new request: here's the ID, here's the cluster, the user is Alice PM, the role she's requesting is this Kubernetes dev admin role, and the reason is that she needs to update a pod. Now me as an administrator, all I need to do is click on this link and I see the review here, and I can say, yeah, that's fine, approved, have at it, and submit review. And I've approved the request. You'll see that updated also here in Slack: status is now approved. And all Alice has to do now is go back to listings and check her review requests. And now that it's approved, she can just click assume roles to assume the role, that privileged role. So click assume roles and you'll see here the Kubernetes dev admin role is assumed and expires in 56 minutes. So this just-in-time access is much better in the Enterprise version of Teleport.
It's still available in open source, it's fully functional, but here we get to use the UI for this and the integration with Slack. And again, this role runs out in 56 minutes. It's just in time.

Now the third and final scenario here is an example of resource-based access. So the first two examples were role-based: you request access to a role and you get whatever that role allows you. Well, in this scenario we have resource-based access requests. So let's imagine that we have a contractor that just needs periodic access to one of our servers. So I'm going to log out as Alice. We're done with her. And let's log in as Tim contractor. He's a contractor that just needs access to one of the servers. So you'll see this contractor signs in. He has access to nothing, nothing here, but he needs access to one of the servers.

Now in this example we're still requesting a role, but the user doesn't need to know anything about the roles or RBAC controls used under the hood. They just see the resources they're allowed to access. But just to let you know how this works, if I go to management and users, you'll see that Tim contractor has a resource requester role. If we take a look at that, this role actually allows requesting all of the resources that the access role can access. So you're not requesting a role, you're requesting access to resources that a role can access. We have a role in here called access and it actually allows you access to everything. But we'll see in the case of Tim here, when he goes to access requests, that he can request particular resources. So applications, databases, whatever. We want access to a server, so let's go to servers. There are two that this access role allows you to access. So Tim is able to request access to individual resources this way. Let's say this is the server he needs: add to request, proceed to request, just need to update the server.
And when I submit this, again, it's going to come through Slack and notify the team that's allowed to approve it. So submit request and open Slack. And we see we have a new request. Here's the ID, the cluster, the user is Tim contractor, the role is access, and the reason is just need to update the server. So again you click on the link, and me as an approver, I can say sure, let's approve it, go for it, and submit review. And if we go back to Tim and go back to listings, Tim can then assume the role. And for this one he has 12 hours. That one wasn't set, so by default you get 12 hours, and he can then go to his servers. There's his server and he can log right into it. And let's run a few commands: ls, pwd, exit.

And what's neat about Teleport, just to add to this, is that it has a full audit logging capability here. So you can go to the audit log and you can see who logged in. Tim contractor was issued a certificate, started a session, and for that server that we just logged into, you can actually play back that session. So if I click on this, I can see exactly what Tim did, which is great for auditing and compliance reasons. So everything that comes through this proxy is also audited, with all sessions having the ability to be played back.

For more information about access requests, you'll see here we have documentation on role requests, resource requests, and then the open-source role requesting that I did in the first example. There are also more access request plugins. We have Slack, Mattermost, Teams, Jira, PagerDuty, email, and Discord. In addition, the Teleport API allows developers to define custom access workflows using a programming language of their choice. So with just-in-time access, you can move away from super-privileged accounts. You can see that everyone abides by the principle of least privilege and can request any privileged access easily with automation in solutions like Teleport.
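
The resource-based flow from the third scenario, where a requester can only ask for resources that an underlying role can reach and approvers are notified automatically, can be sketched as follows. This is a simplified Python illustration under stated assumptions, not Teleport's API or plugin interface: `ResourceRequest`, `submit`, the `notify` callback (standing in for a chat integration such as Slack), and the server names are hypothetical. The 12-hour default mirrors what the demo showed when no duration was set.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Callable, Iterable, List, Set

# The demo showed a 12-hour default when no duration was configured.
DEFAULT_TTL = timedelta(hours=12)


@dataclass
class ResourceRequest:
    """Hypothetical request for individual resources rather than a role."""
    user: str
    resources: List[str]
    reason: str
    ttl: timedelta = DEFAULT_TTL


def submit(
    user: str,
    wanted: Iterable[str],
    reason: str,
    requestable: Set[str],
    notify: Callable[[str], None],
) -> ResourceRequest:
    """Validate the request, then notify approvers (e.g. in a chat channel)."""
    wanted_list = list(wanted)
    not_allowed = sorted(set(wanted_list) - requestable)
    if not_allowed:
        raise PermissionError(f"{user} may not request: {', '.join(not_allowed)}")
    request = ResourceRequest(user, wanted_list, reason)
    notify(
        f"New access request from {user} "
        f"for {', '.join(request.resources)}: {reason}"
    )
    return request


# Hypothetical usage: the contractor can only see two servers to request.
requestable_servers = {"server-1", "server-2"}
request = submit(
    "tim", ["server-1"], "just need to update the server",
    requestable_servers, notify=print,
)
print(request.ttl)  # 12:00:00
```

Passing the notifier in as a callback keeps the approval channel swappable, which is the same idea as the talk's list of plugins: the request logic stays the same whether approvers are reached through chat, tickets, or email.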
If you had any questions during this presentation, feel free to email me your questions at travis rogers@teleport.com. Also, check out our community Slack. We're very active there and can answer your questions: teleport.com/slack. And teleport.com/labs will take you to our interactive labs, where you can try our product hands-on. Hope you enjoyed the presentation, and have a great day.

Travis Rodgers

Developer Relations Engineer @ Teleport



