Conf42 Site Reliability Engineering 2021 - Online

Self-service PR-based automated Terraform


Abstract

Maintaining your whole infrastructure using Terraform and reusable modules makes most of our lives easier, but when those less familiar with “DevOps” want to create or update resources, you usually either have to train and enable them to use Terraform, or handle the request yourself.

However, what if you could offload the execution of those changes to a centralised tool and just review both the code and output being submitted for review? Atlantis, Terraform Cloud or env0 can act as a PR-based feedback loop for a hosted Terraform executor to make self-service a little bit easier.

Summary

  • You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.
  • Andrew Kirkpatrick talks about self-service, PR-based automated Terraform. You can offload the execution of changes to a centralized tool and just review both the code and the output being submitted for review. Atlantis or env0 can act as a PR-based feedback loop for a hosted Terraform executor to make self-service easier.
  • Remote state is key to working with multiple engineers. How does that hook into self-service infrastructure? You can collaborate using pull-based workflows for some projects, but continue to use a local workflow for others. How do these requests typically come in?
  • Atlantis has a concept of project locking, which is separate from Terraform state locking. Atlantis has a couple of comment commands: you comment on the PR, and it will trigger atlantis plan and atlantis apply. How does this relate to pull request workflows?
  • Injecting credentials is easier in cloud solutions like Terraform Cloud and env0; with Atlantis, you're going to have to figure out how to inject them. You can either fetch them from Vault, given the correct integration, or you could use Kubernetes External Secrets. But otherwise it runs pretty much as you would run it on the desktop.
  • A PR-based workflow allows people to dip in and out of making infrastructure contributions. It also adds a proper peer review process before execution, which is nice. One of the disadvantages, though, is that the feedback cycle can be slow.
  • If anybody's interested in engineering at PartnerStack, please take a look at our job vacancies. If you have any questions, I'm at magic, but on the Internet, just come and ping me and ask me a question.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.

Good afternoon, morning or evening, everybody. Hope you're having a good conference so far. My name is Andrew Kirkpatrick, and I'm here to talk to you today about self-service, PR-based automated Terraform. Maintaining your whole infrastructure using Terraform and reusable modules makes most of our lives that little bit easier. But when those less familiar with DevOps want to create or update resources, you usually either have to train and enable them to use Terraform, or handle the request yourself. However, what if you could offload the execution of those changes to a centralized tool and just review both the code and the output being submitted for review? Atlantis, Terraform Cloud or env0 can act as a PR-based feedback loop for a hosted Terraform executor to make self-service a little bit easier.

Infrastructure as code solves some problems, but not all problems. Having a codified representation of everything in your infrastructure, whether that be cloud or on-premise, is great: it means you can point to exactly which line of code represents which thing. On the other hand, it doesn't stop people continually bugging you with "I need this change made", "can you look at this?", or "something's not quite right here". And there are lots of legitimate reasons why people submit change requests: they need a new virtual machine to increase capacity for an existing application, they need to test out a new application, they might need to change a database configuration, all kinds of things.

So what is important to keep track of? Do you actually need infrastructure engineers to make these changes, or are there very specific things in their day-to-day that are actually the more important parts to take note of? Do we need to make sure the changes are performed safely, so that production infrastructure isn't accidentally misconfigured or deleted, and that any changes, say to network access, are performed in compliance with whatever network policies you have? Making sure that changes are tracked against specific individuals, particular teams or specific projects. Making sure that the changes you made are codified in a way that makes them reproducible, so you can duplicate them or roll back in case of accidental misconfiguration. But most importantly, from a PR-based perspective: are we instituting a proper peer review process, similar to a pull request workflow for regular code changes? And do we have approval by the correct chain of command, making sure that any change that hits X, Y or Z is approved by the people it should be run past?

So why automate Terraform? HashiCorp Configuration Language is a great way to represent all kinds of different parts of infrastructure across many different vendors like AWS, GCP, Azure and plenty more. One of its advantages is being able to bundle up more complex concepts in modules: abstracting away some of the complexity of "I need this specific set of resources to go out in this exact configuration each time" and making that tweakable.
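To give a feel for what such a building block looks like to the person consuming it, here is a minimal, hypothetical sketch of a module call; the module path, resource name and arguments are illustrative rather than taken from the talk:

```hcl
# Hypothetical module call: the consumer fills in a handful of arguments and
# the module decides everything else (network, firewall rules, base image, ...).
module "payments_api_vm" {
  source      = "./modules/app-vm" # assumed internal module path
  name        = "payments-api-1"
  environment = "staging"
  cpus        = 2
  memory_gb   = 4
}
```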
Using pre-built building blocks like that, could you hand them over to developers or other stakeholders to roll out mostly cookie-cutter bits of infrastructure, provided you give them the guardrails to do so? And if you did that, how would you validate that those changes are going to be correct, and make sure the approval process is there so that only the changes you want to go out, and only the changes that should go out, actually do?

Validation is a key point. terraform plan on the command line says: these are the changes that I'm proposing to make, based on the difference between the code that I've got and what's in remote state. Being able to validate and revalidate what's being put up, say in a pull request, lets you check that it's accurate to what the developer originally intended, versus what you as an SRE would want to double-check: just making sure those match up. And from an approval perspective, if someone's making changes to core DNS records, making sure, say via a CODEOWNERS file or some other validation, that the correct people get notified and approve it. Just in case we're making a change that could take down everything in production, you want the right checks and balances in place, not just from an audit standpoint but from a safety standpoint. And if you've got integration between something like Jira and GitHub, you want the right workflows to happen in other tools, such as project management.

So what are we going to talk about today? This will just be a brief touch point on each of these topics: running through Terraform, how it sits in a self-service infrastructure concept, evaluating some of the tools that are out there, and going through a few examples of how that might work.

Terraform at a glance, for anyone that's not familiar: it's a domain-specific language, written in what's called HashiCorp Configuration Language, for declaring the things you wish to exist, typically in the cloud, but also for on-premise infrastructure. We also use it for identity and access management. Essentially, anything you want to create, update or delete. Fundamentally, it just abstracts underlying APIs behind Terraform providers, which translate how this code looks into API calls under the hood, and it works with many different providers.

This is the way I typically think about it, which is possibly not strictly correct. Code is the things that I want to be true: if I've added something, I want it to be created; if I've deleted it, I want it to be removed. State is what either I or somebody else last caused to be true: at the last known time we modified state, this is what was true. And the APIs represent what is actually true as of this moment: if I try to make something, I ask the API and it tells me whether it succeeded or failed. Remote state is key to working with multiple engineers, which is typical for a team-based setup, but also in the sense of a pull-based workflow.
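As a minimal sketch of that shared remote state, assuming a Google Cloud Storage bucket (the bucket name and prefix below are made up), the backend configuration might look like this:

```hcl
# Hypothetical remote state configuration: engineers running Terraform locally
# and the central executor (Atlantis, a Terraform Cloud agent, etc.) read and
# write the same state in this bucket.
terraform {
  backend "gcs" {
    bucket = "example-terraform-state" # assumed bucket name
    prefix = "dns"                     # assumed per-project prefix
  }
}
```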
Sharing a backend like that means you can continue to use your local development workflow on all your Terraform projects as you typically would, syncing state changes to an AWS S3 bucket or Google Cloud Storage bucket, for example, and have your central executor sync its changes there too. So you can collaborate using pull-based workflows for some projects, but continue to use a local workflow for others.

As a basic example, in case anyone hasn't seen it, this is how I would create a bucket. I type out the resource, giving some basic details, with some defaults provided by the resource. I initialize it, which, because Terraform is modular, just downloads the plugins I need to communicate with the backend APIs. I can plan, which shows the API calls that are going to be sent: I would like to create this bucket. I then apply, which actually creates the underlying bucket, as you can see in the UI, and writes to state saying the bucket was created. Subsequently, if I delete that code, it will say: this exists in state but not in code, so I must need to delete it, and you see that reflected in the cloud. A rough sketch of the kind of bucket definition being described follows this paragraph.
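The slide itself isn't reproduced in this transcript, so as a rough stand-in, a bucket definition of the kind being walked through might look like the following; the bucket name and location are assumptions:

```hcl
# Hypothetical bucket from the walkthrough: most settings are left at the
# resource's defaults. `terraform init` downloads the provider plugin,
# `terraform plan` previews the API calls, and `terraform apply` creates the
# bucket and records it in state; deleting this block and applying again
# removes the bucket.
resource "google_storage_bucket" "example" {
  name     = "example-self-service-bucket" # assumed, must be globally unique
  location = "US"                          # assumed location
}
```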
That's all great, but how does it hook into self-service infrastructure? Infrastructure engineers usually write infrastructure code, and application engineers usually write application code, but that often results in the thrown-over-the-wall antipattern: "I've built this application, I don't want to care about how it's run, you go and figure it out", or "I manage these servers, why don't you write applications that don't use too much memory?" That's the typical clash of the historical dev and ops days that the whole DevOps ethos is trying to break down. It creates a couple of problems: infrastructure engineers want applications to be changed so they don't use too much memory or CPU, and application engineers want to make changes to infrastructure because they need a different database, more servers, more capacity. We won't focus on the former; today we'll focus on the latter.

How do these requests typically come in? A lot of them arrive as a Jira ticket, a Slack message, a tap on the shoulder, or some other "hey, I would like this thing, can you do it for me?" That is typical toil that infrastructure engineers, or SREs, have to deal with on a day-to-day basis. Self-service is designed to allow these people to do these things for themselves. That being said, writing Terraform from scratch is pretty daunting. There's a reason why infrastructure engineers are generally the people who write all these complex configurations for, say, how load balancers are supposed to hook up with specific firewall rules, security groups, all of that jazz. So how do we encapsulate that complexity so that an application developer can just spin up a server that is automatically in the right VPC, hooked up to the right security groups, and it all just works transparently? I tend to think of Terraform modules as the classes of HCL, in that you should be able to configure attributes of specific things, but not everything, and all the other stuff should happen automatically out of the box.

As an example, take a DigitalOcean VM. There are lots of different attributes you can configure on the resource, but if you abstract it behind a module, you can make some things directly configurable, say the name, or use interpolation, or even ternary statements and hashmap lookups. Say I'm using an environment string to decide whether backups or monitoring should be in place; I'm interpolating CPUs and memory into a specific size string; and I'm forcing things like "I want this VM type to be in this region, and I always want it to use this image", to keep it consistent across all invocations of the module, or across all VMs, say if you wanted to use a pre-baked Packer image. A sketch of what such a module might look like follows this paragraph.
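Here is a rough sketch of what a module along those lines could look like, assuming the DigitalOcean provider; the variable names, size-slug format, region and image are illustrative, not the speaker's actual module:

```hcl
# Hypothetical "class-like" module wrapping a DigitalOcean droplet.
variable "name" {
  type = string
}

variable "environment" {
  type = string # e.g. "production" or "staging"
}

variable "cpus" {
  type    = number
  default = 2
}

variable "memory_gb" {
  type    = number
  default = 4
}

resource "digitalocean_droplet" "this" {
  name = var.name

  # CPUs and memory are interpolated into the size string.
  size = "s-${var.cpus}vcpu-${var.memory_gb}gb"

  # A ternary on the environment string decides backups and monitoring.
  backups    = var.environment == "production" ? true : false
  monitoring = var.environment == "production" ? true : false

  # Forced values so every invocation of the module stays consistent,
  # e.g. always the same region and a pre-baked Packer image.
  region = "tor1"
  image  = "ubuntu-20-04-x64"
}
```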
So what options are there to facilitate this from a pull request standpoint? Atlantis was one of the first options out there. Terraform Cloud came along as quite a fully featured solution from HashiCorp themselves. And env0 is a relatively newer player that takes a slightly different twist on the concept; we'll dive into that in a moment.

Atlantis versus some of the alternatives: Atlantis works purely on the basis of pull requests, whether that's GitHub, GitLab or Bitbucket. You typically run it in a container, but it's just a Go application, so it runs anywhere, and it just responds to webhooks. The scope of what it does is very limited and specific; the other tools are somewhat more flexible. You can configure it to work with multiple repositories and multiple projects per repository, based on directory structure, and you can implement custom workflows, which we'll dive into in a bit.

Terraform Cloud, on the other hand, runs either entirely in the cloud, entirely on-premise, or in a hybrid model where the control plane runs in the cloud and the executors run within your own environment. It uses an enhanced backend: if you're familiar with remote state for Terraform, say storing it in an S3 bucket, this is a special backend where both local development and runs in Terraform Cloud talk to a backend that lives in Terraform Cloud, with some additional features. It does have some slight limitations, in that how it works with workspaces is a little different. Whereas Atlantis is only PR-based, Terraform Cloud offers many different ways to work with it: there's a REST API, there's a CLI tool, it's much more fully featured. The confirmation screen for manual approvals happens within the interface itself, so it's not triggered via pull request comments as we'll see later with Atlantis, but the UI is fairly simple and self-explanatory, and when you confirm those steps everything happens exactly as you would normally see it on the CLI. It has some handy features, like being able to block accidental destroys, and it has a lot of integrations for notifying you when certain things have happened within Terraform Cloud as an overall platform. Some of the gotchas I came across: it doesn't support symlinks, and I use some trickery to link tfvars files into auto.tfvars files, which isn't supported; there may be others. And one thing that got me initially is that it only supports enhanced backends. It's not just that other backends aren't recommended, it actually will not use them, so it didn't read remote state from my GCS bucket.

env0, on the other hand, uses what they call organization templates, which are essentially a one-to-n carbon copy of any project you have in a specific directory. The idea is more along the lines of ephemeral environments: if you want to spin up a dev environment based on a specific template, this is a tool built around that kind of workflow, and it uses Terraform workspaces to do it. It's a relatively new tool, so they're probably adding more features, and probably have since I originally wrote this presentation, but it's definitely looking promising so far. With the project templates, you can create a workspace name on the fly, and this differs from a typical workflow in that you don't pre-create workspaces like development, staging and production; you create these workspaces ad hoc, and that's the intention, or at least what I took away from trying to use it. Those environments pop up and you get what is intended to be cookie-cutter environments: I want this load balancer with these three application servers and one database, rinse and repeat. That makes it quite easy to do. Some of the neat features: it has cost limits built in, so if you've got a team of 20 developers you can make sure they don't spin up infinite amounts of environments and you don't run out of money. Things can be truly ephemeral: if someone spins up a workspace to test something out, you can set how long it's supposed to last and have it automatically destroyed afterwards. And you can limit the number of environments per user. I think it's great for ephemeral environments, and there are a lot of features that really support that. As for the gotchas, it doesn't actually support remote state: it literally copies state files out of a working directory within env0 itself, because it runs entirely in the cloud and there's no other way to run it, and it uses workspaces to manage that in a way you never really see. Everything that happens in env0 stays in env0, which is fine if you're just using it to create environments on the fly, but for longer-running infrastructure it might not be the right fit.

So how does this relate to pull request workflows? Some people have asked: why don't you just use CI, something like CircleCI, and hook it into that? You can, but there are a few gotchas that HashiCorp themselves highlight quite well in their own documentation. One is making sure that when you plan something, state hasn't changed in the meantime: commits haven't been added, PRs haven't been opened and planned against the same project somewhere else. Then there's how that plan gets approved, and actually figuring out which directory, or which workspace on the same directory, to work on. These are things you take for granted when working locally, but from an automated standpoint there have to be ways to identify them. The plan and apply synchronization issue is this: if you're running on CI and the plan can happen on any given machine, how do you get that plan output back?
You need the plan file to relate to a specific commit, and then have the plan and the subsequent apply happen on that exact same commit. It's a slightly odd workflow from a CI perspective, which is supposed to validate each commit as being golden and good; you'd have to write the plan somewhere and pull it back down. And then there's the issue of the approval step: if I've planned something out, how do I decide, after potentially minutes, hours or days, whether that's something I still want to do, and know that things haven't changed in the meantime? For something that's supposed to be continuously integrating, those are potential hiccups. It's not that you can't do it; it's just nice to have that moment of "this is what I said I wanted to do, it has been approved, I'm now going to let it go ahead". You could apply automatically, but that has dangers.

So in the context of a PR, how do you actually get feedback on what Terraform is doing in the background? Atlantis has a couple of comment commands: you comment on the PR, and atlantis plan and atlantis apply trigger terraform plan and terraform apply correspondingly. It will show the output of those commands as comments, posted by whatever machine user you assign in GitHub, GitLab or Bitbucket. Terraform Cloud will only provide the feedback in its own user interface. env0 will also comment back, but doesn't have a corresponding status check, whereas Atlantis has both, so you can see it churning away in the background and it will eventually give you feedback on what's going on.

In terms of locking: one of the issues I mentioned before is that if you're making changes to one project and someone else makes changes to the same project at the same time, how do you decide who goes first, especially if you've both branched off master or main? Atlantis has a concept of project locking, and this is separate from Terraform state locking. It keeps separate track and says: I have planned out a PR over here, and if someone else tries to make changes to, say, development DNS, you're essentially trying to change the same thing; this person was first, so they get to go first. You'll see those locks pop up in the UI, and you'll get a notification on the pull request that basically says this plan has failed because someone else is first in the queue. That will become unlocked later, and it will show you that if you want to get your change pushed through, you have to get the owner of the other PR to go first.

Apply requirements essentially come back to your version control system workflow. If you're used to using GitHub, whatever approval workflow you use there applies in the same way, which is nice, because if you use things like code owners and people are familiar with a GitHub, GitLab or Bitbucket workflow, this is essentially the same. If you need two approvals before it's good to go, or code owners for specific people on specific files, all of that works exactly the same. And then mergeable requirements basically make sure it's not going to cause a merge conflict, exactly as most people are used to. That, I think, is one of the comforting things about it: it's a very similar workflow to what people already know in a lot of cases. That's all well and good, but where does this actually happen?
Where is Terraform actually running? In the case of Atlantis, it's deployed into your infrastructure, so it runs from within; webhooks are sent to an endpoint you have to expose, so it takes a little more configuration to set up. Terraform Cloud, as I mentioned before, can run entirely in the cloud, as a hybrid model with the control plane in the cloud and agent pools that run in your infrastructure, or entirely on-premise if you pay for the enterprise plan. env0, on the other hand, runs entirely in the cloud.

One thing that some people don't necessarily consider to start with is that when you're normally working with Terraform, you typically identify as yourself. I am an SRE; I have elevated credentials that work in, say, Google Cloud, AWS, GitHub, PagerDuty, whatever provider you're working with; you identify as you and get elevated permissions on the things you have access to control. Whereas if you're using a central executor, it typically has to be a service or robot account that you grant permission to in this one place, and then everyone tells this one thing, the central executor, what to do. That can be good or bad depending on your viewpoint. Creating these service accounts is one consideration, and you need to figure out how to get those credentials in, which is easier in the cloud solutions like Terraform Cloud and env0; with Atlantis, you're going to have to figure out how to inject them. Bear in mind that these work exactly the same as the providers normally do: the configuration you'd use on your desktop is exactly the same if you run it in Kubernetes, so how are you going to inject those credentials into the Kubernetes pod? If you're running it in Fargate, which I have done, it's the same kind of thing; you need to figure out how to get those credentials in there securely, which entirely depends on your security posture and how strict you need to be.

This is a very basic example of how you can inject various secret keys into an Atlantis pod in Kubernetes. You can fetch them from Vault, given the correct integration, or you could use Kubernetes External Secrets; there are lots of different mechanisms. Anything you normally use to keep your secrets secure applies here, though this is a bit more of a manual approach. A small sketch of the provider side of this follows below.
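To illustrate that the Terraform code itself doesn't change between your laptop and the executor, here is a minimal, hypothetical provider block; the project and region are made up, and the credentials never appear in the code at all:

```hcl
# Hypothetical provider configuration, identical locally and inside the
# Atlantis / agent pod. The Google provider reads its service-account key from
# the environment (e.g. GOOGLE_APPLICATION_CREDENTIALS or GOOGLE_CREDENTIALS),
# which is exactly what the Vault or Kubernetes External Secrets injection
# described above supplies to the pod.
provider "google" {
  project = "example-gcp-project" # assumed project ID
  region  = "us-central1"         # assumed region
}
```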
In terms of how it runs in the background, this is how Atlantis figures out what to do. It has a YAML file, similar-ish to the other two tools, where you basically say: track these projects in these places, and when you see changes, do these things; you can then apply customized workflows on top of that. But otherwise it runs pretty much as you would run it on the desktop: go to this directory, init, plan and apply, and then print out the results. If you do want custom execution, there are certain ways to do that in Atlantis, it's much more limited in Terraform Cloud, but env0 also supports custom flows, which is nice. As a potentially weird example, say you needed to obtain special credentials for AWS: you can run custom scripts. I can inject scripts into the pod and run essentially arbitrary commands before and after every corresponding Terraform command for the plan and apply. So if I need to generate special tokens, modify tokens, or do any kind of homegrown weirdness as part of the workflow, I can get that in there. If a provider doesn't do what you want out of the box and you're doing anything funky, you can pretty much get that set up.

To actually show you how this looks and works: we're going to edit a zone file, and I'm going to delete one of the records we've got, just as an example. This is Google Cloud DNS, and you can see the Atlantis UI on the left-hand side; it doesn't really do an awful lot. I commit that change to version control, push it up, and then create a PR off the back of it to say: this is what I want to delete, I want my automation to make this change for me. You see that pop up; I follow the link, put in a comment to explain to my team members what's going on, and then hope the relevant people will come along and review it. Obviously I don't technically have reviews in this example, but you get the point. You'll see the Atlantis plan happen in the status check, and eventually it will comment back and say: this is what I plan to do, I'm going to delete this DNS record, because that's what you said you wanted. So I'm like, okay, great: atlantis apply, let's get this DNS record blasted. You go to apply it, and the comment you get back is: you need someone to approve this, I'm not going to do it, so go and get someone. The relevant person comes over, reviews it, everything looks good, so let's comment atlantis apply again. The way I've got my workflow set up, once everything's good and golden it will apply, and I've set it to automatically merge the branch for me and then delete it afterwards.

In terms of how the locking looks: say we've already got a PR up that says I want to delete this record, and I'm going to add a change that puts the record back in. Let's say: yeah, I want this record back, why is it gone? I needed it. So we push up the branch and create a PR just like before, open it up, and let Atlantis work away in the background, and we get an error message saying this project is currently locked by an unapplied plan. You take a quick look in the UI and you can see that lock pop up: it's for this repository, this project. You can click through onto it and go: ah, okay, they need to go first; that's locked. Once that goes through, I get my turn.

As for env0: say I want to create a project environment. I go to a project template, which is essentially a project somewhere in my repository in a version control system, and say: I want you to make a new workspace, give me a new project environment based off this project. It clones the repository, goes through very similar steps, initializes everything, and then gives me a plan and waits for approval: I'm going to make these resources that are in this project, this is what I plan to do, do you want to do it? There's a manual approval step in the UI; you go through, think about it some more, and eventually apply it, and it creates the environment for you. You see that everything was created and then we're good to go. In terms of a pull request workflow, we jump into our zone file again, delete a record, then commit that and push it up. Same workflow as usual: put up a pull request.
Once the change is up, go through the same process as before and add a comment for clarity. Once we've created the PR, you'll see that env0 will eventually comment back saying: this is the plan of things I intend to do. It takes a little while, and then, success; we flip back and you see that it essentially comments back similar to how Atlantis did, saying I'm going to create these resources for you. Once that's approved by somebody and merged in, that triggers an actual apply. This is the difference between env0 and, say, Atlantis, and in fact Terraform Cloud: you get the plan, a preview of the changes that are going to be made, ahead of time, and then once those have been merged back into your master or main branch, that's when the apply happens. So it's just a difference of what happens before or after the merge. You get the manual approval step here in the UI itself, saying: this has all been merged into main, this is what I'm going to create for this project environment, for this workspace, for this project, are you sure? It rolls ahead, terraform apply creates it, and then it gives you the output.

In terms of Terraform Cloud, you can run a plan manually, say via the user interface, and it looks fairly similar to the other two. You'll see a plan of what it intends to do, you get a manual confirmation step similar to env0, which you have to agree to in the Terraform Cloud UI, the apply finishes, and all is good. In terms of how that works in a pull request workflow, it's very similar. I'm going to delete a record from a DNS zone again: delete it, get that committed, get it pushed up to a pull request. You see the change come up, we create the PR and add another comment for clarity, and then Terraform Cloud shows the output of what it plans to do off the back of that. You'll see that show up as a status check, but it won't actually comment back on the PR itself. Once I've got an approval from someone, I get that merged in. With that merged into main, I then see, similar to env0 but more like how Atlantis works, the apply come up with its manual confirmation step. Okay, these changes look right, let's run that apply, approve it, and then it's applied and out into the wild.

So what are some of the advantages and drawbacks of a PR-based workflow? Some of the advantages: if you're working with people that need to make infrastructure changes from time to time, but don't necessarily have everything checked out and set up ready to go, because that's not what they do day to day, it allows them to dip in and out of making infrastructure contributions, which is nice. It also adds a proper peer review process before execution. I imagine myself, and probably many other people, actually terraform apply certain things just to make sure they really happen, because as anyone who has used early-stage Terraform providers knows, just because something looks good on a plan doesn't necessarily mean it will work. There can also be conflicting things in, say, AWS or GCP that aren't apparent until you try to make the changes via the API, and sometimes it's nice to catch things like that.
In this case, say with Atlantis, you'll catch that before the PR merges, which is nice. It also ties in nicely with other workflow automation tools: anything else that hooks up to your version control system, say if you want to hook things into Jira for full auditability, making sure that this, that and the other fires off to any other systems, just to make sure the checks and balances are in place. It potentially decreases credential theft, though the flip side is that your credentials are now all in one place. And it can alleviate some of your toil bottlenecks. A lot of that is going to depend on people's familiarity and comfort with a peer review process: how streamlined are your PRs flowing through normally, and how used are people to that kind of workflow as it is? Also, how well documented is not only your own codebase, but the providers you're using? If you're using providers that are less well used and less well known, it can be more difficult for people who aren't doing this day to day to drop code in. And make sure the code that already exists is actually easy to work with: if you're not making efficient use of modules, or you haven't segregated your projects into small enough chunks, it can be unwieldy, and that can be scary, especially for someone seeing feedback on a pull request: "I didn't ask to delete 100 servers, what's going on?"

Some of the disadvantages: especially if you're developing a module, the feedback cycle can be really slow if you're fully reliant on a central, or rather remote, executor to preview what you want to do. Committing code, pushing it up, and waiting for something in the cloud to churn away and tell me whether it's good or not is a lot slower than developing it locally. So I typically develop reusable modules locally, and once they're tested and good to go, start hooking the projects that use those modules up to self-service. One somewhat controversial point, depending on how identity and access management is handled at your company, is that you move some security controls from, say, my AWS account or my GCP account to version control: you delegate power to my GitHub or GitLab account. That may or may not be desirable, or it may be completely equivalent, depending on your security model. As mentioned, having a skeleton key, one thing that has credentials to everything, can potentially be a risk, but you can apply the same secrets management technologies that you would to the rest of your infrastructure, so in theory it's no more or less vulnerable. You also want to avoid hosting it somewhere that's a big attack vector, so don't put it in production; we run it in a separate cluster. You've also got yet another thing to maintain: Atlantis is pretty easy to set up, but at the end of the day it's something we have to manage and keep an eye on. And it can be problematic if it runs on some of the same infrastructure it controls; trying to make cluster updates to the same cluster it's running on isn't really much of an issue, but it can catch you out if you're not paying attention.

So why did we choose Atlantis? Being open source and free to use was a big one. It's quite well maintained and there have been a lot of contributions to it.
It's not a dead project; it's pretty active. To be honest, it's pretty easy to use. There's not a huge amount of functionality in it, but of the functionality we needed, it covers most of our bases. And the flexibility of custom workflows and being able to inject configuration, say for managing multiple Kubernetes clusters, is useful, and was less obvious to do with the other tools. But as I mentioned before, it's been a little while since I evaluated Terraform Cloud and env0, and both of them have been moving at a very rapid pace, so it's definitely worth checking where their feature sets lie at the moment.

In summary: Atlantis is great. It runs entirely on-premise, which is quite nice for some people, but you are going to have to set it up yourself, figure out how to run it, and manage how secrets get into it. So there's a little more to be concerned about; it's not out of the box. env0 looks like an excellent solution for "I want to allow people to create environments on the fly, I want developers to quickly test out features with production-like environments". Terraform Cloud, on the other hand, is the big-guns enterprise solution: it covers everything that Atlantis does, all the way up to managing big-company stuff. Its feature set is endlessly growing, and I think for a lot of companies it's probably going to be the logical one to choose. But it really depends on what you're after, especially when it comes to the level of support, since there are plenty of people at HashiCorp who can help you out with that.

Thank you very much for listening. I'm Andrew Kirkpatrick. If anybody's interested in engineering at PartnerStack, please take a look at our job vacancies. The link to this slide deck is on SlideShare, and if you have any questions, I'm at magic, but on the Internet, just come and ping me and ask me a question. Thank you very much and hope you have a lovely day.
...

Andrew Kirkpatrick

Staff Engineer @ PartnerStack



