Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome, everyone.
And thanks for joining us today at Conf42's Kube Native 2024 conference.
Today, we'll dive into how CircleCI's field engineering team uses Kubernetes
and Terraform modules to revolutionize the way that we present enterprise demos.
We're going to explore some of the challenges we face, the solutions we built
out, and the impact it's had both on our team and our customer interactions since.
Firstly, let's introduce ourselves.
My name is Eddie Webinaro.
I am the Senior Director of Field Engineering here at CircleCI.
We are a global team supporting North America, EMEA, and the Japan ANZ region.
I'm joined today by Mualanian, who is a Senior Enterprise
Field Engineer with the team.
He's got a really interesting background, embedded firmware devices,
deploying software into submarines.
Myself, I spent a career in software development as well,
but tamer, more traditional cloud infrastructure type stuff.
But we're both really excited to be here today and present this conversation.
What is that conversation?
We're going to do a brief overview of what we're talking about today, and I want
to give you a quick demo of the product.
Then we're going to dive deep into, and take a step back, and really
talk about what the problem was, what the challenges were we were seeing,
how we came up with the solution.
And really how we built it out to make it as easy to deploy as
possible, whether locally in our own environments or actually for demos
out in a customer's environment and giving them the chance to get hands on.
We'll then wrap up what Sarah looks like today, how it changed
the way our field engineering team interacts with potential customers.
So that all sounds good.
Stick around.
Let's get started.
Now, I got to give a little credit here to one of my former
solutions engineers, Aaron G.
He once told me you always got to start the conversation
with a finished cake, right?
If you think about those traditional cooking shows, here's the beautiful dinner
that we're going to make tonight, right?
And they put it away and then they jump into the conversation.
And so that's what I wanted to do with y'all today, right?
And so in our case, the finished cake is doing a demo with a globally
distributed customer, right?
And it just works.
All the magic's there.
It happens right out of the box.
We're talking sleek, ready to wow enterprise clients
without skipping a beat.
Our solution mirrors our customers' enterprise environments, with enterprise
type infrastructure controls and concerns, and it's set up with just a few clicks.
And so by the end of the presentation today, you'll understand how we built this
powerful system and how it impacts the way that we deliver demos here at CircleCI.
So before we hop into the demo, I want to take a look at what it
is we're going to show you here.
I'm going to show you triggering a deployment through a CircleCI pipeline
to roll out new releases to an EKS environment across multiple regions.
So we've actually got clusters deployed across North America, EMEA, and Japan.
We're going to show you how the CircleCI releases dashboard
will go through a series of steps
to incrementally roll out these new features as they're released
or, when possible, automatically roll back when an error is encountered.
This is just a quick demo to show you what the application developer
experience is like, how they would interact with CircleCI in
a typical enterprise environment.
The key here is that we're deploying an app automagically without ever
needing credentials that a developer might see, leak, or be exposed
to, and without the need to have manual intervention if something
goes wrong during the deployment.
So it's demo time.
And so this is the finished product.
This is our Sarah and enterprise environment.
And what's going on here is a deployment demo that just showcases
a few of those capabilities.
So as you can see, we have three windows up on the screen.
On the left hand side is CircleCI's pipeline page.
I'm actually going to trigger a build and
deployment pipeline to run against these environments in North America and EMEA,
which are the two windows on the right hand side.
The app we're using is colloquially named Doctor Demo.
It's a deploy and release demo, hence the Doctor: DR.
And it's actually going to show you how we can incrementally and
progressively roll out these features.
So the left window you'll see now has updated.
We're actively building these applications across two different
jobs, one for the North American region and one for the EMEA region.
As these jobs progress and actually start to introduce those
versions into this environment,
you'll actually see the colors on the right hand side change.
So the colors represent the current app version.
And you'll see us introduce a new color to each one of these screens.
And you can see that on the top we've already introduced a yellow version.
And on the bottom we're going to introduce a green version.
And there it goes.
We can jump over to CircleCI's release dashboard that I mentioned,
on the left screen here, and actually see those releases as well.
So now this is being monitored from the Kubernetes cluster and sending
information back to CircleCI about how those rollouts are progressing,
whether they're succeeding or failing or need to be rolled back.
I'm actually going to intentionally, you can see on the right hand side, I've
dragged the error rate up to this 200%.
So I'm intentionally causing this EMEA instance to send bad results back to
the server, basically invoking failures.
This should be enough to trigger our analytics to say, wait a minute,
something is wrong with this release.
I'm going to leave the North American release up top healthy, so that
one should continue on normally.
And the window here we're looking at is that yellow release, it's coming
online, we're about halfway through, and it's basically increasing the
amount of traffic that is going to each one of these applications.
Now I'm clicking into that failed release.
It was a failed release because I intentionally set the error rate high.
CircleCI's release agent detected that in the rollout and said,
Wait a minute, we've exceeded our error limit for this application.
And so now on the bottom you can see the traffic was incrementally
increasing the amount we were sending to this new version.
It actually completely stopped that and rolled back to the prior version.
Meanwhile, in North America, though, we didn't encounter those same errors.
Whatever was different in the environment (i.e., Eddie mucking around with it), it's succeeding.
And so we're going to let that rollout continue.
And you can see that both in the CircleCI releases dashboard, as
well as the environment there.
So this is just one of those apps that we can use with customers to
drive that kind of demo conversation.
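To make the rollout behavior from the demo concrete, here is a minimal Python sketch of a progressive rollout with automatic rollback. It is an illustration only, not how CircleCI's release agent is implemented: the traffic steps, the error threshold, and the names `progressive_rollout` and `RolloutResult` are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RolloutResult:
    succeeded: bool
    final_weight: int   # percent of traffic on the new version at the end
    history: List[int]  # traffic weights that were applied, in order


def progressive_rollout(
    steps: List[int],
    error_rate_at: Callable[[int], float],
    max_error_rate: float,
) -> RolloutResult:
    """Shift traffic to the new version step by step; roll back on errors.

    `steps`, `error_rate_at`, and `max_error_rate` are hypothetical inputs
    chosen for illustration.
    """
    history: List[int] = []
    for weight in steps:
        history.append(weight)
        if error_rate_at(weight) > max_error_rate:
            # Error limit exceeded: send all traffic back to the prior version.
            history.append(0)
            return RolloutResult(False, 0, history)
    return RolloutResult(True, history[-1] if history else 0, history)


if __name__ == "__main__":
    steps = [10, 25, 50, 100]
    # Healthy region: error rate stays below the limit, rollout completes.
    print(progressive_rollout(steps, lambda w: 0.01, max_error_rate=0.05))
    # Region with the error slider dragged up: rollout stops and rolls back.
    print(progressive_rollout(steps, lambda w: 0.80, max_error_rate=0.05))
```

The two calls mirror the two regions in the demo: one rollout continues to full traffic, while the other stops at the first step and returns traffic to the prior version.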
Now, to elaborate a little bit more on what we saw and go deeper into what
our solution actually looks like, this is a point I'm going to go ahead and
hand it off to Mu to walk through how we went through this process as a team.
Thanks, Eddie.
To elaborate a little bit on what Eddie showed us there, this environment is
fully built on infrastructure as code.
Everything is automated and repeatable, except for the initial
domain purchase and IAM role setup, which are one time manual steps.
It runs on Kubernetes, specifically on EKS, giving us
the flexibility and scalability needed for enterprise level demos.
We've implemented Istio as our service mesh and Ingress controller, with tools
like Kiali, Grafana, and Prometheus to manage, monitor, and visualize
what's happening in the environment.
Now we've shown you the solution, but we didn't start there.
Let's go back to the problem we wanted to solve.
Now, the problem was clear.
Each demo needed to be unique, which meant we were constantly reinventing the wheel.
We lacked standardization, and that caused friction
amongst customer trials.
And guess what?
It was a mess.
Repeated work with too much manual effort led to demos that were missing the
mark with enterprise clients, big time.
Without infrastructure as code, security was a concern, and the
manual steps slowed us down.
Enterprise teams expected better, and we knew we had to step it up.
We had two main goals when building this demo environment.
First, we wanted to reduce the amount of effort that it took to
deliver clean, consistent demos.
This meant creating a highly available demo environment that could be spun
up quickly, without needing manual intervention or endless preparation.
The more reusable we could make the environment, the less time we would spend
reinventing the wheel for each demo.
Second, we wanted to elevate the conversation we were having
with customers instead of just showing them a static demo.
We needed a dynamic, scalable platform that mirrors the kind of
infrastructure that they're running.
By doing this, we're able to provide a more consistent experience
across different enterprise roles, whether it's the dev team,
security team, or operation teams.
Everyone can see how our platform fits their specific needs.
How did we get all of this started?
Let me tell you, it wasn't an overnight process.
It took a lot of collaboration and late night discussions across teams.
We started with technology chats, where our team, spread across North America,
Europe, the Middle East and Africa, and Japan Asia Pacific, worked together to
select the right tools and technologies.
We had to ensure everything aligned with customer needs while fitting
seamlessly into our own ecosystem.
Next, we had vision chats.
These were more strategic, where management and field engineers focused
on defining the impact that we wanted to achieve with this environment.
It was about telling the right story, one that demonstrated real value to customers.
Initially, our goal was to create a reference architecture that customers
could trust as a close reflection of their own infrastructure.
Now, while we worked on this, the team continued their regular
duties, generating leads and closing deals using our existing methods.
But as the architecture took shape, it became clear that this new environment
would become central to how we run demos going forward here at CircleCI.
And personally, in my own deals, it has really changed the way that I
interact and engage with prospects.
Having this consistent
and scalable environment to demo in real time has removed a lot of the friction
and made my conversations more impactful and relevant to potential customers.
So what did we come up with?
After laying out the solution, one of the key discussions we had was whether
to create a multi tenant environment or to provide each field engineer
with their own demo environment.
On one hand, a multi tenant environment would be more cost effective and aligned
with how shared environments are used in real world enterprise settings.
However, we also considered giving each field engineer their own
isolated environment to eliminate any potential conflicts between demos.
Ultimately, we decided that while individual environments were attractive,
the cost didn't justify the need, especially since enterprise teams
typically worked in shared spaces already.
This decision helped us to maintain consistency while keeping
the infrastructure efficient.
We also explored the possibility of building versus buying a
dedicated demo automation tool.
While we found some tools that partially covered our needs, they didn't really
offer the flexibility and control we needed to mirror customer environments.
We stuck with our modular infrastructure as code approach,
which allowed us to customize on a per region, per demo basis without
deviating from the core architecture.
Now, building on that decision to go with a multi tenant environment,
we designed the architecture to be modular and layered to handle the
complexity of demo environments without creating unnecessary overhead.
At the top, we have the global layer.
This covers the foundational elements like DNS and IAM policies, which
are shared across all regions.
It is a single source of truth for global resources that every region can rely on.
Next, we have the region core.
This layer handles region specific infrastructure, like the EKS clusters,
networking, routing, and certificates.
It's designed to be consistent across regions, so no matter where a demo
runs, the core setup is the same.
And finally, we built out the region platform layer.
This is where the customization happens.
It includes components like Vault for secrets management, Nexus for artifact
storage, and the app namespaces where teams can deploy their demo apps.
Each demo is isolated, but still follows the same core structure,
which means we can easily replicate it for different regions or use
cases without starting from scratch.
Now, by using this layered approach, we ensure that the infrastructure
remains consistent across regions, while still allowing flexibility
for specific customer needs.
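As a rough illustration of the three layers just described, the following Python sketch models them as data and derives the order in which they would be applied. The layer contents come from the talk; the class name, the `apply_order` helper, and the region identifiers are hypothetical, and the real environment is built from Terraform modules rather than Python.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Layer:
    name: str
    scope: str                   # "global" (applied once) or "region" (per region)
    components: Tuple[str, ...]  # what the layer provisions


# Layer contents follow the talk; the identifiers are illustrative.
LAYERS: Tuple[Layer, ...] = (
    Layer("global", "global", ("dns", "iam-policies")),
    Layer("region-core", "region",
          ("eks-cluster", "networking", "routing", "certificates")),
    Layer("region-platform", "region", ("vault", "nexus", "app-namespaces")),
)


def apply_order(regions: List[str]) -> List[str]:
    """Return layers in apply order: global once, then each region's layers."""
    plan = [layer.name for layer in LAYERS if layer.scope == "global"]
    for region in regions:
        for layer in LAYERS:
            if layer.scope == "region":
                plan.append(f"{region}/{layer.name}")
    return plan


if __name__ == "__main__":
    # Hypothetical region identifiers.
    for step in apply_order(["na", "emea", "japac"]):
        print(step)
```

Because every region reuses the same two regional layers, adding a region only adds entries to the plan; the global layer and the layer definitions stay untouched.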
You may have noticed the load Vault credentials
step in our earlier preview.
Let's talk about how we tackle secrets management because
that's also super important.
One of the biggest challenges in any demo or production environment
is managing secrets securely.
We wanted to ensure that our infrastructure not only mirrored
enterprise setups, but also adhered to best practices for secrets management.
Our approach begins with single sign on or SSO for initial AWS operator access.
Once that's in place, we use OpenID Connect, or OIDC, role assumption, which
means our pipelines run securely with web identity to provision the entire
stack without exposing credentials.
For added security, the root key for Vault, which is the
backbone of web, is stored in AWS Key Management Service, or KMS.
This allows Vault to auto unseal, adding another layer of security during setup.
Now, the operators themselves are given limited,
unique roles that restrict their access outside of the pipeline.
This ensures that no one has unnecessary access to the
sensitive parts of the system.
Meanwhile, our development pipelines rely on predefined policies within
Vault to load any additional credentials or tokens so that the
secrets are only retrieved as needed.
In short, our setup ensures that sensitive information is protected
at every layer, from initial provisioning to ongoing operations.
By keeping the secrets out of developers hands and automating access through
Vault, we're able to secure the environment without slowing things down.
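The idea that pipelines only retrieve the secrets their policy allows can be sketched in a few lines of Python. This is a simplified stand-in, not Vault's policy engine or client API: the pipeline identities, secret paths, values, and the `read_secret` helper are all hypothetical, and the matching rule here is reduced to an exact path or a trailing `*` prefix.

```python
from typing import Dict, List

# Hypothetical policies: pipeline identity -> secret path patterns it may read.
POLICIES: Dict[str, List[str]] = {
    "demo-app-pipeline": ["secret/demo-app/*"],
    "platform-pipeline": ["secret/platform/*", "secret/demo-app/*"],
}


def path_allowed(pattern: str, path: str) -> bool:
    """Simplified matching: exact path, or prefix match for a trailing '*'."""
    if pattern.endswith("*"):
        return path.startswith(pattern[:-1])
    return path == pattern


def read_secret(identity: str, path: str, store: Dict[str, str]) -> str:
    """Return a secret only if one of the identity's policies covers the path."""
    patterns = POLICIES.get(identity, [])
    if not any(path_allowed(p, path) for p in patterns):
        raise PermissionError(f"{identity} may not read {path}")
    return store[path]


if __name__ == "__main__":
    # Placeholder values standing in for real secrets.
    store = {
        "secret/demo-app/registry-token": "example-token",
        "secret/platform/admin-key": "example-key",
    }
    print(read_secret("demo-app-pipeline", "secret/demo-app/registry-token", store))
    try:
        read_secret("demo-app-pipeline", "secret/platform/admin-key", store)
    except PermissionError as err:
        print("denied:", err)
```

The point of the sketch is the shape of the check: the pipeline never holds a broad credential, and a request outside its predefined policy is refused rather than silently served.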
Now that we've figured out all the details and ironed out all the kinks, let's talk a
little bit about how things are going now.
When we first rolled this out, adoption was a bit slow.
While the environment was powerful, it lacked the polish
to really grab attention.
It was functional, but it didn't have that wow factor, just yet.
Now, as we improved the demo apps and incorporated more eye catching
features like modular microservices, firmware testing, mobile releases,
things started to change.
The addition of visual deploy tracking, which is what we demoed earlier,
also added some much needed flair.
Customers really like that.
Teams could now see the deployments in real time, which
was also a huge selling point.
Now, initially, contributions to the environment were slow, and
we realized the architecture was a bit too complex and entangled.
Each new field engineer who used it, though, slowly cleaned up the
process, little by little, making it more user friendly and accessible.
Over time, the environment became widely adopted, not just by
our team, but by teams outside of field engineering as well.
We even have some customers that have deployed it locally for their
own testing and trial purposes.
It's now the go to demo platform, and its use has become consistent
across the organization.
Now, speaking of customers that are using the environment, we had a real
light bulb, aha moment when one of our customers came to us and asked, "Hey,
how do we get our own access to your sandbox environment?"
That was a real game changer because up until that point we were focused
on using the environment for internal demos or demos to customers.
But this request really opened up a whole new perspective and it
showed us that customers aren't just interested in seeing the demo.
They wanted to get hands-on with the same infrastructure that we were using.
And on top of that, it's a really powerful selling point to be
able to tell your customer, "Hey,
you can go ahead and use our tool within your infrastructure," right?
So this was a pivotal moment because one, it validated all the work that
we had put into making the environment scalable, modular, and secure.
And two, now we're not just showing off the platform.
We're actually getting to offer customers the chance to test drive it themselves
in real time, which is super powerful, especially in the sales process.
But naturally, this also meant we had to make some changes.
Quite a few, actually.
First off, we had to remove any hard coded IDs or paths from the environment to
make it flexible enough for external use.
This wasn't just about running demos anymore, it was about giving customers
the ability to use the environment themselves, which meant it needed to
be adaptable for their unique needs.
We also needed to fully document the provisioning process.
We knew if customers were going to dive into the environment, everything
had to be clear and straightforward.
Thankfully, we already had a solid document in place, which was about
80 percent of what we needed.
So going through and patching that up before we gave it to the
customer wasn't too difficult a task.
From there, it was all about refining the architecture, making it truly
implementable for external use, while still maintaining the security,
scalability, and modularity that made it so powerful to begin with.
Now, adapting the environment for customer access wasn't just about
making it usable for them, it also opened up new opportunities to
make our deployment more portable.
Like I said before, we shifted from hard coded elements to a mostly parametrized
setup, which allowed us to deploy the same architecture across multiple
environments, regions, and customer specific configurations with ease.
This meant that we can now spin up environments not only for our internal
demos, but also for customer POCs, sandbox environments, and even full scale trials.
By modularizing the core components, we made it easy to tweak or scale
the deployment based on what the customer actually needs.
And the benefit is the architecture now adapts without requiring a
total rebuild, which is awesome.
And this portability has given us and our customers the flexibility to deploy
the same powerful infrastructure in a variety of contexts, saving time and
ensuring consistency across environments.
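The shift from hard coded values to parameters can be pictured with a short Python sketch. Everything named here (`EnvironmentParams`, `region_settings`, the owners, domains, and regions) is a hypothetical example, not the actual variables used in the environment; the point is only that one definition serves both an internal demo and a customer deployment.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class EnvironmentParams:
    """Inputs that would otherwise be hard coded (all values are examples)."""
    owner: str                 # team or customer the environment belongs to
    base_domain: str           # DNS zone the environment lives under
    regions: Tuple[str, ...]   # regions to deploy into


def region_settings(params: EnvironmentParams) -> Dict[str, Dict[str, str]]:
    """Derive per-region names from parameters instead of fixed IDs or paths."""
    return {
        region: {
            "cluster_name": f"{params.owner}-{region}",
            "hostname": f"{region}.{params.base_domain}",
        }
        for region in params.regions
    }


if __name__ == "__main__":
    internal = EnvironmentParams("fieldeng", "demo.example.com", ("na", "emea"))
    customer = EnvironmentParams("acme", "sandbox.example.org", ("us",))
    print(region_settings(internal))
    print(region_settings(customer))
```

The same function produces both configurations, which is the property that lets one architecture be handed to a customer without a rebuild.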
Now that you've seen it all, let's recap everything we've covered
and the impact that it's had.
First and foremost, the consistent demo experience.
By using a modular and layered infrastructure built on Terraform and
Kubernetes, we've removed a lot of the friction that came with setting
up one off environments for each demo.
Now, no matter which region or customer we're dealing with, we
can spin up the same reliable environment with minimal effort.
Next, the platform is reflective of our customers own infrastructure.
From multi region deployments to the
integration of tools like Vault for secrets management and Istio for service mesh,
the environment mirrors what enterprise
customers are using right now, which allows them
to really see themselves in the demo.
That adds a lot of value and helps them better understand how our
solution fits into their own ecosystem.
We've also implemented better controls, particularly around security.
Passwordless deployments, IAM policies, and secret management with
Vault mean that sensitive information is handled automatically, securely,
and with minimal human interaction.
This setup aligns perfectly with what customers expect in an
enterprise setting, helping build trust during the demo process.
Beyond that, the adaptability and portability of this environment
have been real game changers.
We can customize the deployment for different customer use cases,
or even hand it over to them for sandbox or trial purposes,
something that would have been impossible with our previous setup.
This flexibility has opened up new doors for how we engage with customers and
accelerate their path to production.
Finally, this process has allowed us to streamline internal operations.
The environment is not only more efficient for demos, but has
been widely adopted across teams, reducing the need for manual setup,
improving speed, and ensuring that everyone has access to the
same high quality environment.
In short, this architecture has transformed how we deliver demos, making
them more consistent, secure, and tailored to the enterprise needs of our customers.
And with that, we end.
Thank you for joining us today.
We hope this walkthrough gave you a clear understanding of how our demo environment
has evolved and the impact it's having on how we deliver value to our customers.
We're excited about what's ahead and looking forward to seeing how this
continues to improve our ability to meet customer needs with greater
consistency, security, and flexibility.
Thanks again for your time, and we hope you enjoyed the presentation.