How CircleCI's Field Engineer team use custom Terraform modules and K8s to deliver globally distributed & highly tailored enterprise demos

Abstract

Fully automated “customer enterprise reference architecture” deployed to 3 AWS regions from scratch using Terraform and EKS. Teams can run existing automated demo apps immediately or request a new custom app across all clusters in <15 minutes.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Welcome, everyone, and thanks for joining us today at Conf42's Kube Native 2024 conference. Today, we'll dive into how CircleCI's field engineering team uses Kubernetes and Terraform modules to revolutionize the way that we present enterprise demos. We're going to explore some of the challenges we faced, the solutions we built out, and the impact it's had on both our team and our customer interactions since.

Firstly, let's introduce ourselves. My name is Eddie Webbinaro. I am the Senior Director of Field Engineering here at CircleCI. We are a global team supporting North America, EMEA, and the Japan and ANZ region. I'm joined today by Moo Olaniyan, who is a Senior Enterprise Field Engineer with the team. He's got a really interesting background: embedded firmware devices, deploying software into submarines. Myself, I spent a career in software development as well, but tamer, traditional cloud infrastructure type stuff. But we're both really excited to be here today and present this conversation.

What is that conversation? We're going to do a brief overview of what we're talking about today, and I want to give you a quick demo of the product. Then we're going to take a step back and dive deep into what the problem was, what challenges we were seeing, and how we came up with the solution, and really how we built it out to make it as easy to deploy as possible, whether locally in our own environments or for demos out in a customer's environment, giving them the chance to get hands-on. We'll then wrap up with what CERA, our customer enterprise reference architecture, looks like today and how it changed the way our field engineering team interacts with potential customers. If that all sounds good, stick around. Let's get started.

Now, I've got to give a little credit here to one of my former solutions engineers, Aaron G. He once told me you've always got to start the conversation with a finished cake, right?
If you think about those traditional cooking shows: here's the beautiful dinner that we're going to make tonight, right? And they put it away and then they jump into the conversation. And so that's what I wanted to do with y'all today. In our case, the finished cake is doing a demo with a globally distributed customer, and it just works. All the magic's there. It happens right out of the box. We're talking sleek, ready to wow enterprise clients without skipping a beat. Our solution mirrors our customers' enterprise environments, with enterprise-type infrastructure, controls, and concerns, and it's set up with just a few clicks. And so by the end of the presentation today, you'll understand how we built this powerful system and how it impacts the way that we deliver demos here at CircleCI.

So before we hop into the demo, I want to take a look at what it is we're going to show you here. I'm going to show you triggering a deployment through a CircleCI pipeline to roll out new releases to an EKS environment across multiple regions. We've actually got clusters deployed across North America, EMEA, and Japan. We're going to show you how the CircleCI releases dashboard will go through a series of steps to incrementally roll out these new features as they're released or, when possible, automatically roll back when an error is encountered. This is just a quick demo to show you what the application developer experience is like, how they would interact with CircleCI in a typical enterprise environment. The key here is that we're deploying an app automagically without ever needing credentials that a developer might see, leak, or be exposed to, and without the need for manual intervention if something goes wrong during the deployment.

So it's demo time. This is the finished product. This is our CERA enterprise environment, and what's going on here is a deployment demo that showcases a few of those capabilities.
So as you can see, we have three windows up on the screen. On the left-hand side is CircleCI's pipeline page. I'm going to trigger a build and deployment pipeline to run against these environments in North America and EMEA, which are the two windows on the right-hand side. The app we're using is colloquially named Dr. Demo. It's a deploy and release demo, hence the "DR". And it's going to show you how we can incrementally and progressively roll out these features.

The left window, you'll see, has now updated. We're actively building these applications across two different jobs, one for the North American region and one for the EMEA region. As these jobs progress and start to introduce those versions into this environment, you'll see the colors on the right-hand side change. The colors represent the current app version, and you'll see us introduce a new color to each one of these screens. You can see that on the top we've already introduced a yellow version, and on the bottom we're going to introduce a green version. And there it goes.

We can jump over to CircleCI's releases dashboard that I mentioned, on the left screen here, and see those releases as well. This is being monitored from the Kubernetes cluster, which sends information back to CircleCI about how those rollouts are progressing, whether they're succeeding, failing, or needing to be rolled back. You can see on the right-hand side that I've dragged the error rate up to this 200%. So I'm intentionally causing this EMEA instance to send bad results back to the server, basically invoking failures. This should be enough to trigger our analytics to say, wait a minute, something is wrong with this release. I'm going to leave the North American release up top healthy, so that one should continue on normally.
The window we're looking at here is that yellow release. It's coming online, we're about halfway through, and it's gradually increasing the amount of traffic that is going to each one of these applications. Now I'm clicking into that failed release. It was a failed release because I intentionally set the error rate high. CircleCI's release agent detected that in the rollout and said, wait a minute, we've exceeded our error limit for this application. On the bottom you can see that we had been incrementally increasing the amount of traffic we were sending to this new version. It completely stopped that and rolled back to the prior version.

Meanwhile, in North America, we didn't encounter those same errors. Whatever was different in the environment, i.e., Eddie mucking around with it, it's succeeding. And so we're going to let that rollout continue. You can see that both in the CircleCI releases dashboard and in the environment there. So this is just one of those apps that we can use with customers to drive that kind of demo conversation.

Now, to elaborate a little bit more on what we saw and go deeper into what our solution actually looks like, this is the point where I'm going to hand it off to Moo to walk through how we went through this process as a team.

Thanks, Eddie. To elaborate a little bit on what Eddie showed us there, this environment is fully built on infrastructure as code. Everything is automated and repeatable, except for the initial domain purchase and IAM role setup, which are one-time manual steps. It runs on Kubernetes, specifically on EKS, giving us the flexibility and scalability needed for enterprise-level demos. We've implemented Istio as our service mesh and ingress controller, with tools like Kiali, Grafana, and Prometheus to manage, monitor, and visualize what's happening in the environment. Now we've shown you the solution, but we didn't start there.
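For readers following along in text, the rollback behaviour from the demo (traffic shifts to the new version in steps and reverts once the error limit is exceeded) can be sketched roughly as below. This is a minimal illustration in Python, not how CircleCI's release agent is implemented: the function name, the traffic steps, and the 5% error limit are all assumptions made up for the example.

```python
from typing import Callable, List, Tuple


def progressive_rollout(
    observe_error_rate: Callable[[int], float],
    traffic_steps: List[int],
    max_error_rate: float,
) -> Tuple[str, List[int]]:
    """Shift traffic to a new version step by step (hypothetical sketch).

    observe_error_rate(weight) returns the error rate seen while `weight`
    percent of traffic goes to the new version.
    """
    history: List[int] = []
    for weight in traffic_steps:
        history.append(weight)
        if observe_error_rate(weight) > max_error_rate:
            # Error limit exceeded: send all traffic back to the prior version.
            history.append(0)
            return "rolled_back", history
    return "promoted", history


# Assumed steps and limit, chosen only for illustration.
steps = [10, 25, 50, 100]

# A healthy region: errors stay low at every step.
healthy = progressive_rollout(lambda w: 0.01, steps, max_error_rate=0.05)

# A region where errors spike once half the traffic reaches the new version.
failing = progressive_rollout(
    lambda w: 0.01 if w < 50 else 0.40, steps, max_error_rate=0.05
)

print(healthy)
print(failing)
```

The healthy region walks through every step and is promoted, while the failing region stops at the step where the limit is exceeded and drops the new version's share of traffic back to zero.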
Let's go back to the problem we wanted to solve. The problem was clear. Each demo needed to be unique, which meant we were constantly reinventing the wheel. We lacked standardization, and that caused friction during customer trials. And guess what? It was a mess. Repeated work with too much manual effort led to demos that were missing the mark with enterprise clients, big time. Without infrastructure as code, security was a concern, and the manual steps slowed us down. Enterprise teams expected better, and we knew we had to step it up.

We had two main goals when building this demo environment. First, we wanted to reduce the amount of effort that it took to deliver clean, consistent demos. This meant creating a highly available demo environment that could be spun up quickly, without needing manual intervention or endless preparation. The more reusable we could make the environment, the less time we would spend reinventing the wheel for each demo. Second, we wanted to elevate the conversation we were having with customers. Instead of just showing them a static demo, we needed a dynamic, scalable platform that mirrors the kind of infrastructure that they're running. By doing this, we're able to provide a more consistent experience across different enterprise roles, whether it's the dev team, the security team, or the operations team. Everyone can see how our platform fits their specific needs.

How did we get all of this started? Let me tell you, it wasn't an overnight process. It took a lot of collaboration and late-night discussions across teams. We started with technology chats, where our team, spread across North America, Europe, the Middle East, and Africa, and Japan and Asia Pacific, worked together to select the right tools and technologies. We had to ensure everything aligned with customer needs while fitting seamlessly into our own ecosystem. Next, we had vision shots.
These were more strategic, where management and field engineers focused on defining the impact that we wanted to achieve with this environment. It was about telling the right story, one that demonstrated real value to customers. Initially, our goal was to create a reference architecture that customers could trust as a close reflection of their own infrastructure. While we worked on this, the team continued their regular duties, generating leads and closing deals using our existing methods. But as the architecture took shape, it became clear that this new environment would become central to how we run demos going forward here at CircleCI. And personally, in my own deals, it has really changed the way that I interact and engage with prospects. Having this consistent and scalable environment to demo in real time has removed a lot of the friction and made my conversations more impactful and relevant to potential customers.

So what did we come up with? After laying out the solution, one of the key discussions we had was whether to create a multi-tenant environment or to provide each field engineer with their own demo environment. On one hand, a multi-tenant environment would be more cost-effective and aligned with how shared environments are used in real-world enterprise settings. However, we also considered giving each field engineer their own isolated environment to eliminate any potential conflicts between demos. Ultimately, we decided that while individual environments were attractive, the need didn't justify the cost, especially since enterprise teams typically work in shared spaces already. This decision helped us to maintain consistency while keeping the infrastructure efficient.

We also explored the possibility of building versus buying a dedicated demo automation tool. While we found some tools that partially covered our needs, they didn't really offer the flexibility and control we needed to mirror customer environments.
We stuck with our modular infrastructure-as-code approach, which allowed us to customize on a per-region, per-demo basis without deviating from the core architecture.

Now, building on that decision to go with a multi-tenant environment, we designed the architecture to be modular and layered to handle the complexity of demo environments without creating unnecessary overhead. At the top, we have the global layer. This covers the foundational elements like DNS and IAM policies, which are shared across all regions. It is a single source of truth for global resources that every region can rely on. Next, we have the region core. This layer handles region-specific infrastructure, like the EKS clusters, networking, routing, and certificates. It's designed to be consistent across regions, so no matter where a demo runs, the core setup is the same. And finally, we built out the region platform layer. This is where the customization happens. It includes components like Vault for secrets management, Nexus for artifact storage, and the app namespaces where teams can deploy their demo apps. Each demo is isolated, but still follows the same core structure, which means we can easily replicate it for different regions or use cases without starting from scratch. By using this layered approach, we ensure that the infrastructure remains consistent across regions, while still allowing flexibility for specific customer needs.

You may have noticed the "load Vault credentials" step in our earlier preview. Let's talk about how we tackle secrets management, because that's also super important. One of the biggest challenges in any demo or production environment is managing secrets securely. We wanted to ensure that our infrastructure not only mirrored enterprise setups, but also adhered to the best practices for secrets management. Our approach begins with single sign-on, or SSO, for initial AWS operator access.
Once that's in place, we use OpenID Connect, or OIDC, role assumption, which means our pipelines run securely with web identity to provision the entire stack without exposing credentials. For added security, the root key for Vault, which is the backbone of our secrets management, is stored in AWS Key Management Service, or KMS. This allows Vault to auto-unseal, adding another layer of security during setup. The operators themselves are given limited, unique roles that restrict their access outside of the pipeline. This ensures that no one has unnecessary access to the sensitive parts of the system. Meanwhile, our development pipelines rely on predefined policies within Vault to load any additional credentials or tokens, so that secrets are only retrieved as needed. In short, our setup ensures that sensitive information is protected at every layer, from initial provisioning to ongoing operations. By keeping the secrets out of developers' hands and automating access through Vault, we're able to secure the environment without slowing things down.

Now that we've figured out all the details and ironed out all the kinks, let's talk a little bit about how things are going now. When we first rolled this out, adoption was a bit slow. While the environment was powerful, it lacked the polish to really grab attention. It was functional, but it didn't have that wow factor just yet. As we improved the demo apps and incorporated more eye-catching features like modular microservices, firmware testing, and mobile releases, things started to change. The addition of visual deploy tracking, which is what we demoed earlier, also added some much-needed flair. Customers really like that. Teams could now see the deployments in real time, which was also a huge selling point. Initially, contributions to the environment were slow, and we realized the architecture was a bit too complex and entangled.
Each new field engineer who used it, though, slowly cleaned up the process, little by little, making it more user-friendly and accessible. Over time, the environment became widely adopted, not just by our team, but by teams outside of field engineering as well. We even have some customers that have deployed it locally for their own testing and trial purposes. It's now the go-to demo platform, and its use has become consistent across the organization.

Now, speaking of customers that are using the environment, we had a real light-bulb, a-ha moment when one of our customers came to us and asked, "Hey, how do we get our own access to your sandbox environment?" That was a real game changer, because up until that point we were focused on using the environment for internal demos or our own demos to customers. But this request really opened up a whole new perspective, and it showed us that customers aren't just interested in seeing the demo. They wanted to get hands-on with the same infrastructure that we were using. And on top of that, it's a really powerful selling point to be able to tell your customer, "Hey, you can go ahead and use our tool within your infrastructure." So this was a pivotal moment because, one, it validated all the work that we had put into making the environment scalable, modular, and secure. And two, now we're not just showing off the platform. We're actually getting to offer customers the chance to test-drive it themselves in real time, which is super powerful, especially in the sales process.

But naturally, this also meant we had to make some changes. Quite a few, actually. First off, we had to remove any hard-coded IDs or paths from the environment to make it flexible enough for external use. This wasn't just about running demos anymore. It was about giving customers the ability to use the environment themselves, which meant it needed to be adaptable for their unique needs. We also needed to fully document the provisioning process.
We knew that if customers were going to dive into the environment, everything had to be clear and straightforward. Thankfully, we already had a solid document in place, which was about 80 percent of what we needed, so going through and patching that up before we gave it to the customer wasn't too difficult a task. From there, it was all about refining the architecture, making it truly implementable for external use, while still maintaining the security, scalability, and modularity that made it so powerful to begin with.

Adapting the environment for customer access wasn't just about making it usable for them. It also opened up new opportunities to make our deployment more portable. Like I said before, we shifted from hard-coded elements to a mostly parameterized setup, which allowed us to deploy the same architecture across multiple environments, regions, and customer-specific configurations with ease. This meant that we can now spin up environments not only for our internal demos, but also for customer POCs, sandbox environments, and even full-scale trials. By modularizing the core components, we made it easy to tweak or scale the deployment based on what the customer actually needs. And the benefit is that the architecture now adapts without requiring a total rebuild, which is awesome. This portability has given us and our customers the flexibility to deploy the same powerful infrastructure in a variety of contexts, saving time and ensuring consistency across environments.

Now that you've seen it all, let's recap everything we've covered and the impact that it's had. First and foremost, the consistent demo experience. By using a modular and layered infrastructure built on Terraform and Kubernetes, we've removed a lot of the friction that came with setting up one-off environments for each demo. Now, no matter which region or customer we're dealing with, we can spin up the same reliable environment with minimal effort.
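To make the "layered and parameterized" idea concrete, here is a small Python sketch of how one set of inputs could expand into the global, region core, and region platform layers described earlier. It is only an illustration of the shape of the approach: the real environment is built from Terraform modules, and the domain, region names, app name, and function names below are placeholders invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RegionConfig:
    """Per-region parameters (hypothetical; stands in for module inputs)."""
    name: str                                   # placeholder region name
    demo_apps: List[str] = field(default_factory=list)


def build_plan(domain: str, regions: List[RegionConfig]) -> List[str]:
    """Return the layer instances in the order they would be applied."""
    # Global layer: DNS and IAM, shared by every region.
    plan = [f"global:{domain}"]
    for region in regions:
        # Region core: cluster, networking, routing, certificates.
        plan.append(f"region-core:{region.name}")
        # Region platform: secrets management, artifact storage, namespaces.
        plan.append(f"region-platform:{region.name}")
        # One namespace per demo app inside the region platform.
        plan.extend(f"app-namespace:{region.name}/{app}" for app in region.demo_apps)
    return plan


# Placeholder values: nothing here is hard-coded into build_plan itself.
plan = build_plan(
    "demo.example.com",
    [
        RegionConfig("us-east-1", ["dr-demo"]),
        RegionConfig("eu-west-1", ["dr-demo"]),
        RegionConfig("ap-northeast-1"),
    ],
)

for step in plan:
    print(step)
```

Because every region goes through the same two layers, adding a region or pointing the whole thing at a customer's own domain is a change to the inputs rather than to the structure.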
Next, the platform is reflective of our customers' own infrastructure. From multi-region deployments to the integration of tools like Vault for secrets management and Istio for service mesh, the environment mirrors what enterprise customers are using right now, which allows them to really see themselves in the demo. That adds a lot of value and helps them better understand how our solution fits into their own ecosystem.

We've also implemented better controls, particularly around security. Passwordless deployments, IAM policies, and secrets management with Vault mean that sensitive information is handled automatically, securely, and with minimal human interaction. This setup aligns perfectly with what customers expect in an enterprise setting, helping build trust during the demo process.

Beyond that, the adaptability and portability of this environment have been real game changers. We can customize the deployment for different customer use cases, or even hand it over to them for sandbox or trial purposes, something that would have been impossible with our previous setup. This flexibility has opened up new doors for how we engage with customers and accelerate their path to production.

Finally, this process has allowed us to streamline internal operations. The environment is not only more efficient for demos, but has been widely adopted across teams, reducing the need for manual setup, improving speed, and ensuring that everyone has access to the same high-quality environment. In short, this architecture has transformed how we deliver demos, making them more consistent, secure, and tailored to the enterprise needs of our customers.

And with that, we end. Thank you for joining us today. We hope this walkthrough gave you a clear understanding of how our demo environment has evolved and the impact it's having on how we deliver value to our customers.
We're excited about what's ahead and looking forward to seeing how this continues to improve our ability to meet customer needs with greater consistency, security, and flexibility. Thanks again for your time, and we hope you enjoyed the presentation.
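
As a companion note to the secrets management part of the talk: the idea that pipelines only load the credentials a predefined policy allows can be modelled in a few lines of Python. This is a toy in-memory model, not Vault's API or policy language. The claim name, policy name, secret paths, and values are all hypothetical, and a real setup would verify the pipeline's OIDC token before trusting any claim.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Dict, Tuple


@dataclass(frozen=True)
class Policy:
    """A named set of secret paths a pipeline may read (hypothetical)."""
    name: str
    readable_paths: Tuple[str, ...]


# Hypothetical mapping from a pipeline's project claim to its policy.
POLICIES: Dict[str, Policy] = {
    "dr-demo": Policy("dr-demo-deploy", ("secret/demo/dr-demo/*",)),
}

# Stand-in secret store; values are dummies.
SECRETS: Dict[str, str] = {
    "secret/demo/dr-demo/registry-token": "example-token",
    "secret/platform/admin": "never-exposed",
}


def load_secret(claims: Dict[str, str], path: str) -> str:
    """Return a secret only if the pipeline's policy covers the path."""
    policy = POLICIES.get(claims.get("project", ""))
    allowed = policy is not None and any(
        fnmatchcase(path, pattern) for pattern in policy.readable_paths
    )
    if not allowed:
        raise PermissionError(f"{path} is not allowed for this pipeline")
    return SECRETS[path]


# The demo pipeline can load its own token on demand...
token = load_secret({"project": "dr-demo"}, "secret/demo/dr-demo/registry-token")

# ...but a path outside its policy is refused.
try:
    load_secret({"project": "dr-demo"}, "secret/platform/admin")
    denied = False
except PermissionError:
    denied = True
```

The point of the sketch is the direction of trust: the developer never holds the secret, and the pipeline can only fetch what its policy names, at the moment it needs it.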

Conf42 Kube Native 2024 - Online

September 26 2024 - premiere 5PM GMT

Eddie Webbinaro

Senior Director, Head of Field Engineering @ CircleCI

Moo Olaniyan

Senior Field Engineer @ CircleCI
