Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Travis Carey, IT director here at Teleport, and today
for my talk, I'm going to tell you how you can get rid of Jira
tickets in your organization by embracing infrastructure as
code. I'm going to give you some practical examples today,
as well as kind of explaining why you should do this and the general philosophies
behind it. So let's get started.
So: Terraform, not Jira tickets. Pass your compliance audits the developer's
way and actually improve your security too.
So, a quick disclaimer: this advice primarily applies
to a SOC 2 Type II certification, and I can only 100%
guarantee this advice works there because that's the one certification that
Teleport has. But I can tell you, I believe it most
likely works 100% for PCI and ISO 27001,
because those are about 90% similar. I should note that NIST
for government certification and SOX for public companies
are much stricter. But I do believe, having gone through some of
those processes, that these same lessons can be applied.
I should also note there's a little bit of trash talking of ticketing systems
here, but they do serve an important purpose, and the purpose
is primarily for support teams, and we'll go into a little more detail about that
later. And finally, if you're really embracing infrastructure
as code, you have to remember that IaC pipelines have really
powerful API keys, and you have to do a lot of
work to secure those. There are lots of ways to exfiltrate API
keys, and if you're replacing dangerous console access with
infrastructure as code, you have to make sure that you have a way to protect
those API keys. We're not going to cover that in this talk,
but there are lots of resources you can look up for that.
All right, let's jump into this. So first,
why are we doing this? Well, it's mainly because hackers don't care
about your change management process. No amount of
Jira tickets is going to actually improve your security.
So we have to think about why did we even start
doing tickets in the first place? If they're not helping security,
why are we doing it? I mean, isn't the point of doing compliance audits
to actually have your organization's security be improved?
So how did we end up with this process that we
do this via tickets? And there are some good reasons, but they're a
bit dated. So let's look at some of the common reasons that people
decide to start using ticketing systems. So often
people will say, well, we use Jira for planning, so we should also use it
for change management. Or you'll hear it's because IT or
systems work is considered a service organization.
Or finally, you might hear from some older IT directors that
"we follow ITIL philosophies," and we'll dive a little
bit into what that means if you haven't heard of it before. So first,
jumping in: teams will say we already use Jira,
so that's just the way it is. But GitHub
is a fully featured tool for both planning
and change management, and GitHub has improved a lot
in the planning front, especially recently.
You can assign your issues to project boards and
have kanban for planning sprints and
doing agile just like you would in Jira. But it's even better because
they're innately linked with pull requests. You can use tags for doing
automation, they're also great for doing auditing, to query
different types of tickets for different views, great for release
management that you can set up milestones. All this is really built in.
And if you've worked for a large enterprise, they might have done a lot of
integrations that are really linking a lot of the Jira
functionality to GitHub. But why do we need two systems?
That's just more work to integrate. We should stay native
using the tools where developers already are, and that's
GitHub. So the next part is change management.
Change management often happens in Jira because
that's maybe where your business people are but not your developers.
But change management works way better in GitHub.
A pull request is a change request,
but it's better than a Jira change request because
you can't just change what it is once you ship it.
When you merge, you approve the exact code or the exact
infrastructure change described there, which is very different than
if you do an older form of change management where
you are describing a change in a Jira ticket and then
someone has to manually perform that change. Now if they're manually performing
that change, they could make a mistake, or
if their credentials got stolen by a hacker,
any change could be made with those credentials. Versus doing it the GitOps
way, only what's approved via Git is what ultimately happens.
So in that scenario, if you compromise a developer account,
you'd have to submit a PR and then convince someone else to
code review it, and it would also have to pass your automatic tests. And this
is a huge improvement that we all should be very familiar with
in the DevOps world. Automatic testing has already started
to really replace QA departments, and
these other DevOps lessons can replace kind of all of the other needs for
these manual change management things. For example, if you
work in an ITIL organization, they might say, oh, you need a rollback
plan in order to ship your change.
And of course with GitHub, rollback is as simple as a revert.
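The merge gate described above (a change only lands if someone other than the author approves it and the automatic tests pass) can be sketched roughly as follows. This is a minimal illustration in Python, not GitHub's actual implementation; the `PullRequest` class and `can_merge` function are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical, simplified model of a pull request."""
    author: str
    approvals: set[str] = field(default_factory=set)
    tests_passed: bool = False


def can_merge(pr: PullRequest) -> bool:
    """Allow a merge only with an independent approval and passing tests."""
    # An author approving their own change does not count as a review.
    independent_approvals = pr.approvals - {pr.author}
    return bool(independent_approvals) and pr.tests_passed
```

Under these assumptions, a stolen developer account on its own cannot ship a change: a self-approved pull request is rejected, and so is an approved one whose tests fail.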
And finally, to make your auditors happy, GitHub is the best
audit trail. We can see exactly what happened, who approved
it, who reverted it, everything is right there. It doesn't have
the kind of reporting that a lot of people really like Jira for,
but it has a very full featured API and you can write some easy scripts
to pull out the kind of CSV files you need to make your auditors
happy. Let's take a look at the next reason. People often
say that we can't use GitHub
and we need Jira. And so often
it's because they say that IT is a service organization.
And remember, I mentioned that ticketing queues are
for service organizations. But that's the IT world
of old. The new IT and systems or platform
DevOps way of thinking is that we're a platform team:
we want to make tools that enable developers
to do their jobs better. We don't want to do the work for developers
and make those changes for them. We want to give them the tools
so that it happens automatically. It's all about automation here.
Every team that you include in the process slows the process
down dramatically. I think we've all probably worked in an organization
where you have to go through a request process to make changes:
you put in a request to IT on Monday, they finally get to it on
Tuesday, then they've got to send it out for approval. Then it needs
to get reviewed by the change management board or change
advisory board. And before you know it, you've wasted
an entire week just kind of waiting for the changes to get
approved. And those things are really wholly
unnecessary. The most we should do is have two
people, preferably on the same team, two devs. This is just like
your code review process that you're used to. We can apply the same concept
to all sorts of places where we used to do change
request tickets. Now, I should note that some very strict compliance
requirements that you might find in NIST or SOX
do require approval from an independent second
party. And this is often an application owner: if you need access to
that application, you need to ask the application owner, not someone
on your team, whether that's okay. So it
still fulfills the two-person rule. Using clever code reviews on
GitHub and CODEOWNERS files, you can actually make that process
happen pretty automatically. There are also some
other access management tools on the market that help do
access approvals and things like that outside of Jira,
just quickly and easily, rather than having to do it in a
ticket based workflow. And finally,
the last reason why IT needs to not be
a service organization is that requests just don't
scale. You can't have an effective dev team
and follow the right developer philosophies, like wanting to ship
code fast and wanting to automate things, if there's a manual process
in the loop. So if you have to rely on an
IT team to, say, complete a DNS request for you
and get that approved, it's just not going
to scale. Because if you're shipping fast or you
have developers all over the world, suddenly you need a
really responsive 24/7/365
support desk. And that's just way too much to ask from a lot of
small companies. Building a global team is a lot of work.
I know quite personally it's hard work and it's also stressful
for those teams. And it's a bit of a fool's errand of trying to develop
this, especially at a small company, versus instead
investing your time in creating tools and behaving
more like a platform team. And that allows developers to
self serve, solve issues by themselves within
their own time zone, hopefully with another coworker in that
same time zone. And that's what's going to scale,
and that's what's going to allow your organization to have a competitive advantage.
And it only happens when you make your IT team start developing
and behaving like a developer team rather than being a
service organization. That's what really allows you
to ditch the ticket queues that we're all so
familiar with and that service organizations rely on.
So the final one: ITIL. Now, when you
hear this word, I want you to think of a dead dinosaur, because that's
what ITIL is. It's a philosophy from the
past, from when IT people were racking servers and running
bare metal compute. That's not the case anymore,
and we need to let it go.
So, to give a history lesson to
folks who have hopefully had the luck of not working
in an ITIL-based organization: ITIL was created to manage processes that
are really manual and error prone.
So if you're having to rack servers, you have to talk
to a lot of teams, you have to talk to finance and procurement, you have
to plan where it's going to go in the rack, and doing rollbacks
is not as easy as just saying, oh, we're going to revert in Git
and it's fine. No, a rollback is a
lot of work, where it could take hours to move servers around,
to reimage servers, to change configuration back.
That's what this was developed for. It was developed for another era.
So trying to hold on to this is not going
to help your developer teams; it's just going to slow them down.
So it's time to stop following
IT philosophies from the pre-cloud era. It's a different world now.
We actually can deploy servers with the click of a button. We can roll back
entire data centers' worth of infrastructure just by doing a revert
and watching Terraform taint and rebuild all the infrastructure
you need to run a modern app. So we need to make sure that the
entire rest of the tech stack that you have is as
sophisticated as deploying your infrastructure
would be with Terraform or other infrastructure as code tools.
So let's talk about actually
applying some of these lessons. So a lot of people
are familiar with using IaC systems like Terraform to deploy your
AWS infrastructure or setting
up GCP or those kind of changes, but it's
also really helpful for other parts of your tech stack.
A lot of people don't think about the SaaS apps that are really controlling
this. So this includes things like GitHub and Okta. So a lot
of times those systems are still controlled by
sysadmins who are manually pointing and clicking within the
console to make changes. But when you think about how powerful
those systems are: GitHub controls everything
if you follow the GitOps philosophy, and if you're doing proper
access controls with RBAC or the newer ABAC,
attribute-based access controls, then your directory system like Okta
controls the access to everything. So if we don't let
people manually deploy servers via the AWS console,
why do we let people manually make changes to the GitHub console or the Okta
console, which are arguably more powerful and dangerous
because they control all the systems?
So let's think about this. If you use GitHub to
manage your infrastructure, then a compromised GitHub admin owns your infrastructure.
So it's of critical importance that we get rid
of GitHub admins. But if we're getting rid of GitHub admins,
then how do we do the admin work? You've probably figured this out:
it's Terraform we're going to use. You know,
you can Terraform your GitHub instance on GitHub itself.
So you want to apply these principles to
kind of all the things in your tech stack. And this includes Terraform
Cloud itself: you can actually apply these lessons
to the same systems they're managing, and you should.
So we're going to look at a really short practical example that
we did here at Teleport about Terraforming Okta. We're going to apply
attribute-based access directory rules via Terraform to eliminate Jira
tickets for what's a really common thing in an IT department:
handling access requests. So this is just three easy steps
and you can apply the same concept to a lot of
different systems. So let's take a look.
We're actually going to have some code in this discussion. So first you want
to understand what the schema here is of the relationship between
kind of the users and groups. So first you need to create a directory
group for every single app. I prefix these
with "app" and then the system name. So we might have one
that is app GitHub or app Salesforce.
And that group is
assigned to an Okta application, which lets users in
through the front door: it authorizes them to authenticate,
hopefully with SAML and not a password, via the Okta directory
to go log into that app. And ideally that login
should have no entitlements. That should be like a basic read-only
role, the least privileged
user that people will want. And then for all the other users
we should create roles for each of those.
So in our code example here, we have our basic group
for Salesforce and we're writing in here some
attribute based rules to decide who should go in
the group. So we're looking at the user profile and looking at what the
department field is to decide who should get access to Salesforce. And we say okay,
it's the sales team or the marketing team in this simple example.
And then for the bigger role entitlements, like who's a
Salesforce admin, again,
we can use other
attributes, so that you could say, hey, you're in the IT department and you're a manager.
And I wanted to call out this example
because there often are weird exceptions. We can't always use attributes, so sometimes
we can just name names here and keep it easy.
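The two kinds of rule described above (a basic access group driven by department, and an admin role driven by attributes plus a short list of named people) can be sketched like this. It is a plain Python model of the logic, not the Terraform or Okta code shown in the talk; the `User` fields, function names, and email addresses are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical named exceptions. Keeping them in code means that adding a
# person is a pull request that someone else has to approve.
SALESFORCE_ADMIN_EXCEPTIONS = {"alice@example.com", "bob@example.com"}


@dataclass(frozen=True)
class User:
    """Hypothetical subset of a directory user profile."""
    email: str
    department: str
    is_manager: bool = False


def in_app_salesforce(user: User) -> bool:
    """Basic access: sales and marketing get the least-privileged login."""
    return user.department in {"Sales", "Marketing"}


def is_salesforce_admin(user: User) -> bool:
    """Admin role: IT managers by attribute, plus explicitly named people."""
    by_attribute = user.department == "IT" and user.is_manager
    return by_attribute or user.email in SALESFORCE_ADMIN_EXCEPTIONS
```

The attribute rule covers the common case automatically as people join or change departments, while the named list makes each exception visible and reviewable in version control.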
So if we wanted to add a new Salesforce admin, we could create
a pull request and add a new person right here and
have someone approve it. And I should note that you
should make these groups and
roles even for systems that don't support the automatic provisioning
of roles. Salesforce does support it: I can actually assign
the admin role to the two of us because
we're in that group. But not all systems do. You need to
still create these because it's that important placeholder
for change management. Otherwise you would need to create a Jira ticket to keep track
of this, so we have to keep track of it here. And it's an important
form of future proofing: eventually this system
might support automatic role provisioning, or you
might decide that it's important enough for a critical system
to write your own integration to
make that happen. A lot of systems,
good systems like AWS, Salesforce, and Teleport, also support
this kind of setup, where you can map groups
within Okta to certain roles and then assign the people in
that group to those roles. And you can see the Terraform code here is quite
simple. It's just a quick loop
to loop through all the different apps in here and then go
create the groups and then the associated group rule that uses
the attribute-based access controls we described to put the people
in the group.
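The loop described above, one group plus one membership rule per app, can be sketched as follows. The talk's slides use Terraform, which is not reproduced in this transcript, so this Python version only mirrors the shape of the idea. The `APPS` catalogue, the `app-` name separator, and the dictionary keys are assumptions made for illustration and are not Okta or Terraform syntax.

```python
# Hypothetical app catalogue: app name -> departments that get basic access.
APPS = {
    "salesforce": ["Sales", "Marketing"],
    "github": ["Engineering"],
}


def build_directory_plan(apps: dict[str, list[str]]) -> list[dict]:
    """Return one group and one attribute-based membership rule per app."""
    plan = []
    for app, departments in sorted(apps.items()):
        group_name = f"app-{app}"  # assumed naming convention
        plan.append({"kind": "group", "name": group_name})
        plan.append({
            "kind": "group_rule",
            "name": f"{group_name}-rule",
            "assign_to": group_name,
            "departments": sorted(departments),
        })
    return plan
```

Adding a new app then means adding one entry to the catalogue in a pull request, and the group and its rule follow from the loop.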
So we mentioned you want to do this
anyways even when you don't have
an automation for it. And that way we've created a
request, approval and audit system that lives entirely in Git, and
we've eliminated the need for all access requests in Jira.
So the next step is, once you do
that, you want to remove the ability for admins
to manage those groups within the
console. And this is a DevOps lesson that a lot of people apply
in AWS. When you reach this happy DevOps nirvana, you actually
take away console access from developers because they need
to make the changes via Terraform. So in
this case, we'll actually remove just the group admin permission
from all the groups that are not managed by Terraform. And you should
manage, if you can, 100% of your groups in Terraform. But if
that's not realistic for you, you can at least do the ones that
control some sort of access-based permissions, because you don't want to give that permission
to, say, an IT help desk associate. They
should not be able to decide who gets AWS admin in
your SaaS app. So step
three: you want to alert on any changes made
outside Terraform. This is to make sure that nobody
was able to circumvent your IaC process. And this
is important in proving to your auditors that this was the
only way that changes were made. It's also a great way
to do security investigations if a hacker was able to find
their way around your process. So you want to connect Okta to
your SIEM, your security information and event management platform.
If you don't have one, and they're quite expensive, you can actually hack
it together using Okta webhooks and Zapier,
a really cheap low-code solution. What you
want to do is write an alert to fire anytime
a group change is made by anyone other than the Terraform service
user, in case someone was able to log in by
any other means because, for some reason, something was misconfigured. You can also
check the metadata on that: did the request come from
the IP we expected from Terraform Cloud? Or maybe someone stole
our Terraform service user credentials and
they were able to use them elsewhere. So the SIEM really helps make sure that
no one got around the process. Now, you should still do an occasional audit
process, going through your logs on a quarterly or
annual basis to make sure that nothing slipped through the cracks,
that you didn't miss an alert on something that was maybe an
unauthorized change that was not made through Terraform.
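The alert rule from step three can be sketched as a small predicate over log events. This is only an illustration of the logic. The event shape (`event_type`, `actor`, `source_ip`), the service user name, and the IP address are hypothetical, and a real SIEM or webhook payload will look different.

```python
TERRAFORM_ACTOR = "svc-terraform"   # hypothetical service user name
EXPECTED_IPS = {"203.0.113.10"}     # hypothetical Terraform runner addresses


def should_alert(event: dict) -> bool:
    """Flag group changes that did not come from the Terraform pipeline."""
    if not event.get("event_type", "").startswith("group."):
        return False  # only group changes are in scope for this rule
    wrong_actor = event.get("actor") != TERRAFORM_ACTOR
    wrong_source = event.get("source_ip") not in EXPECTED_IPS
    # Either condition alone is suspicious: a human made the change, or the
    # service user's credentials were used from somewhere unexpected.
    return wrong_actor or wrong_source
```

Checking the source address as well as the actor is what catches the stolen-credentials case, where the actor looks legitimate but the request comes from the wrong place.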
So finally, any good loop has step N,
and you want to repeat this process until you reach 100%
Terraform coverage. So you want to keep doing this for other resources
you'd have in Okta: your authentication policies,
your application setup, everything you can, until you've
reached 100% code coverage. And at that point,
you get to the really cool thing of removing console access
entirely. And at that point, you can create what's called a
break-glass account. Of course, if Terraform or
the IaC process breaks down, say there's an incident and
Terraform is down, you need a way to get in.
And what you can do is create a service user that is
your super admin. We use 1Password as our password
store, and I highly recommend it, especially now that in their recent
release, you can also connect it to your SIEM. And so
we set up an alert so that if the break-glass service
user's credentials are accessed, that creates an incident,
because the only reason we should ever be using those is
during an incident. And if someone's using them outside of an incident,
they're either breaking the rules or they're a hacker that's trying to compromise
your system. And you want to know that fast.
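In the setup described above, every access to the break-glass credentials raises an alert. One possible triage step, sketched here under assumptions, is to check whether the access falls inside an open incident window. The `Incident` tuple and the function name are hypothetical and are not part of 1Password or any SIEM.

```python
from datetime import datetime
from typing import Optional

# (opened, closed), where closed is None while the incident is still ongoing.
Incident = tuple[datetime, Optional[datetime]]


def is_unexpected_break_glass(accessed_at: datetime,
                              incidents: list[Incident]) -> bool:
    """True when the credentials were read while no incident was open."""
    for opened, closed in incidents:
        if opened <= accessed_at and (closed is None or accessed_at <= closed):
            return False  # access happened during a known incident
    return True
```

An access that returns `True` here is the case the talk warns about: someone is either breaking the rules or attacking the system.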
So that's kind of the process. And if you reach
that 100% coverage, you don't need change
management tickets at all for any admin functions
within that platform. And you can apply these same lessons to
other important systems in your tech stack like GitHub.
Get rid of all your GitHub admins. They are so powerful and dangerous.
You can do it to Terraform itself, you can do it to all sorts of
SaaS apps, and keep applying these lessons. And as you do,
your ticket count will go down. You can't just throw out Jira
right now, immediately; you have to kind of slowly carve away
at it and reduce that ticket number as you increase your code coverage.
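One simple way to track the progress described above is to compute what share of a platform's resources are managed in code. The sketch below assumes you can build an inventory that maps each resource to whether Terraform manages it. The function name and inventory format are hypothetical.

```python
def terraform_coverage(resources: dict[str, bool]) -> float:
    """Percentage of resources whose changes go through Terraform."""
    if not resources:
        return 0.0  # nothing inventoried yet, so no coverage to report
    managed = sum(1 for is_managed in resources.values() if is_managed)
    return 100.0 * managed / len(resources)
```

Any resource still marked as unmanaged is one that can only be changed by hand, which is where tickets are still needed.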
So if you're going to remember one
lesson from this whole thing, it's that tickets are only
for changes made outside of code.
No changes outside code, no tickets.
So remember that we developed
tickets for service organizations and for these older philosophies
where we have lots of manual processes, because manual
processes are very error prone. So you have to come up
with these systems to track manual processes, to come up
with plans to make sure you don't make mistakes. But when you do things in
code, we no longer need to do that.
GitOps has paved the way to remove all those manual
processes. So you want to do this completely up and down
your stack, including managing the SaaS apps in
the realm of IT, which is traditionally still done with Jira tickets.
So I hope this talk helps empower you at your organization
to realize that you can apply these lessons.
Not only is it going to make life easier
for your developers, in that you're going to be more agile, you'll be able to work
across many time zones remotely, and you'll be
able to get tickets done quicker because you don't have to interact with as
many teams; you're also going to be more secure, because only the
changes that actually happen in GitHub are what's happening
in your system, and it becomes very hard to circumvent that process.
And finally, your IT teams are going to be a lot happier actually
working as engineers, writing code and building systems
and platforms rather than responding to ticket queues and behaving
like a service organization. So this is really a win-win for
everybody. It does require some upfront investment,
but I promise you it's worth it. It's drastically
improved our process, and we can't wait to expand
our code coverage to more and more systems, because we're already seeing the
benefits in the time that we no longer have to spend
handling access requests. We're now able to spend that time automating
more systems, writing more IaC, writing more tests,
improving our SIEM alerts and all
the other things that we enjoy doing as engineers rather than
responding to ticket queues. So thanks for tuning in today and
I hope you can apply these lessons at your organization.