Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, and welcome to this session. First of all, thank you very much to the Conf42 organizers for allowing me this time to speak. Let's jump straight into it. My session is all about platform engineering at Dynatrace: the journey that we've been on, and continue to be on, towards enabling 1,000 developers with platform engineering at Dynatrace, some of what we've achieved, and some of the lessons we've learned so far.

A quick agenda. We'll do an introduction: who am I? I'll explain the Dynatrace Perform conference and the hands-on training session, so that you understand how I got involved in all of this. We'll zoom out to the wider Dynatrace use cases across the organization. We'll talk about some of the discoveries and some of the lessons learned, and then, of course, I'll wrap up with what's next: where are we going next as an organization?

So first up, who am I? My name
is Adam Gardner. I'm a CNCF ambassador, and I'm also a developer relations expert at Dynatrace. I've spent about nine months working on this area of platform engineering at Dynatrace.

This story begins when Andy (Andreas) Grabner, a fellow CNCF ambassador who also works at Dynatrace, came to me and said: Adam, we want to do something for the Perform conference in Las Vegas. Perform is Dynatrace's yearly flagship conference in Las Vegas, and Andy said: we want to create an observable platform engineering HOT session. What's a HOT session? It's a hands-on training.
The day before the conference, prospects and customers come into the room and we enable them on some topics. There are about ten rooms: some are on dashboarding, some on alerting, some on Kubernetes, OpenTelemetry, and so on. So of course we wanted to put together a platform
engineering demo. It was a little bit of a unique setup, because the instructors were playing the role of the platform team, delivering the platform as a product to the application teams, and each of the users who came into the training session modeled an individual app team. This is one of the sessions; I think we ran six of them. Note that URL, by the way: everything I'm talking about is available for you to spin up in a GitHub Codespace, so you can spin this demo up yourself if you want to have a play around with it.

So what was the platform? Well, the platform was
based on GitLab: GitLab ran on the cluster as our source of truth. We did it that way because we didn't want or need the people walking into the room to have any external dependencies. Then we had Argo CD, Argo Workflows, and Argo Rollouts to deliver the GitOps side of the solution. Backstage, of course, was the platform UI. Keptn watched what Argo was deploying and generated DORA metrics, that is, the OpenTelemetry data for those deployments. We had OpenTelemetry and the OpenTelemetry Collector on the cluster, cert-manager to generate TLS certificates, and OpenFeature for feature flagging (OpenFeature is a vendor-neutral specification for feature flagging). We had CloudEvents, we had NGINX ingress, obviously, to get into the cluster, and then a whole host of security scanning: products like kube-hunter, kubeaudit, kube-bench, and so on.
All of this observability data gets fed back into an observability platform of your choice; in this case we were, of course, using Dynatrace.

Talking of observability: it starts as soon as you spin up the environment, because we need to know whether the environment actually spun up successfully. And that's what you're looking at here: an OpenTelemetry trace of the platform itself, signaling that it's healthy.
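To make that concrete, here's a minimal sketch of how an environment bootstrap could emit such a trace using the OpenTelemetry Python SDK. The span names, component list, and console exporter are illustrative assumptions, not what the demo actually ships:

```python
# Minimal sketch: emit a "platform bootstrap" trace so the observability
# backend can confirm the environment spun up successfully.
# Span names, components, and the console exporter are illustrative.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("platform.bootstrap")

with tracer.start_as_current_span("spin-up-environment") as root:
    for component in ("gitlab", "argocd", "backstage", "otel-collector"):
        with tracer.start_as_current_span(f"install-{component}") as span:
            # ... install and health-check the component here ...
            span.set_attribute("platform.component", component)
    root.set_attribute("platform.healthy", True)

provider.shutdown()  # flush spans before exit
```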
platform has spun up, this is what it looks like. So Argo
is managing everything else that's on the cluster, and then
as users, we start to use the platform.
So in terms of the platform capabilities, you've got two. You can create
or onboard a new application, and when you hit choose,
it walks you through a workflow that says, well, what's the application
name? What version should it be? Do you want Dora metrics
tracking this deployment? Do you want security scans, observability tooling,
like dashboards and things? Next. Next. Next.
And the platform goes and deploys your application.
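As an aside on those DORA metrics: in the demo, Keptn derives them from the deployment activity it observes. Purely to illustrate what two of the metrics mean, here's a sketch that computes deployment frequency and lead time from hypothetical deployment records; this is not Keptn's actual implementation:

```python
# Illustrative only: compute two DORA metrics from hypothetical deployment
# records. This is not how Keptn computes them; it just shows what
# "deployment frequency" and "lead time for changes" measure.
from datetime import datetime, timedelta

# (commit_time, deploy_time) pairs -- hypothetical data
deployments = [
    (datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 11, 30)),
    (datetime(2024, 1, 2, 14, 0), datetime(2024, 1, 3, 10, 0)),
    (datetime(2024, 1, 4, 8, 0), datetime(2024, 1, 4, 9, 15)),
]

window = max(d for _, d in deployments) - min(d for _, d in deployments)
frequency = len(deployments) / max(window.days, 1)  # deployments per day

lead_times = [deploy - commit for commit, deploy in deployments]
avg_lead_time = sum(lead_times, timedelta()) / len(lead_times)

print(f"Deployment frequency: {frequency:.2f}/day")
print(f"Average lead time:    {avg_lead_time}")
```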
The second capability is more about 360 feedback for the platform team, that is, us, the instructors. We need feedback to see whether the application teams are actually enjoying this or finding it useful. So when someone picks choose on the feedback capability, they get a form to fill out; the platform packages that feedback up as a CloudEvent and sends it into Dynatrace, and we as the platform team can see that feedback in real time.
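Here's a minimal sketch of that feedback flow using the Python CloudEvents SDK. The event type, source, payload fields, and ingest URL are placeholder assumptions, not the demo's real values:

```python
# Sketch: package user feedback as a CloudEvent and POST it to an
# observability backend. Event type, source, and URL are placeholders.
import requests
from cloudevents.http import CloudEvent, to_structured

attributes = {
    "type": "com.example.platform.feedback",  # hypothetical event type
    "source": "/platform/feedback-form",      # hypothetical source
}
data = {"rating": 5, "comment": "Onboarding flow was really smooth"}

event = CloudEvent(attributes, data)
headers, body = to_structured(event)  # JSON-encoded structured mode

# Hypothetical ingest endpoint; in the demo this would be Dynatrace.
requests.post("https://ingest.example.com/events", data=body, headers=headers)
```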
What about the use cases if we zoom out across Dynatrace? A curious thing happened on day two, after we'd delivered all of those sessions. I was standing upstairs and someone came across and said: oh, I hear you're doing platform engineering, can you tell me more about it? I went through the spiel, and they said: okay, that sounds good. Then someone else came up, and someone else. By the end of the day I'd spoken to almost every other department: sales engineering, services, Dynatrace ONE (the post-sales people), research and development (the devs), and also the support organization. And boiling it down, they could all benefit from platform engineering.
It really came down to these use cases. First, easily accessible, disposable environments: sales engineering and post-sales want to rock up at a customer site, demo something like OpenFeature or OpenTelemetry, show the customer, then tear down the environment and walk away; they don't want to worry about provisioning or anything like that. Second, templates: imagine you're a developer who's just been hired by Dynatrace. If you're a Java developer, you might need a template for a Spring Boot application; if you're a Dynatrace app developer, you need a template for a Dynatrace app, something to get you up and running as fast as possible. Third, the basic, normal things like provisioning infrastructure: CPU, memory, network, clusters, storage buckets, that kind of thing. And possibly the most interesting or novel use case I saw throughout the day was the support ticket. When a support ticket comes in, the support team need a reproducible environment; but once they've solved that ticket, they just want to throw the environment away. That's where the platform engineering aspect comes in: what can we layer on top of all that to make these use cases accessible?
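To illustrate that last use case, here's a sketch of the disposable-environment lifecycle a platform could expose to support engineers. Every function here is hypothetical, named only to show the shape of the workflow, not anything that exists in the demo:

```python
# Hypothetical sketch of a disposable "support reproduction" environment
# lifecycle. None of these functions exist in the demo; they only show
# what a platform could layer on top of raw provisioning.
import contextlib
import uuid


def provision_environment(ticket_id: str) -> str:
    """Spin up an isolated environment tagged with the support ticket."""
    env_id = f"support-{ticket_id}-{uuid.uuid4().hex[:8]}"
    print(f"provisioning {env_id} ...")  # real impl: call the platform API
    return env_id


def destroy_environment(env_id: str) -> None:
    """Tear the environment down so nothing lingers afterwards."""
    print(f"destroying {env_id} ...")


@contextlib.contextmanager
def reproduction_environment(ticket_id: str):
    """Guarantee cleanup: the environment dies with the ticket work."""
    env_id = provision_environment(ticket_id)
    try:
        yield env_id
    finally:
        destroy_environment(env_id)


with reproduction_environment("TICKET-1234") as env:
    print(f"reproducing the customer issue in {env} ...")
```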
So, summing all of this up, this nine-month journey: what have I discovered? What have we discovered on our way to 1,000 developers using this sort of technology? At Dynatrace,
platforms are products. You're maybe not going to have money changing
hands, but time is money. So you're going to have to convince
someone that it's worth putting a team together to work on
this stuff. So treat it like a product. Have a roadmap,
have a product manager, have a sales tactic,
an outreach program where you go out to other teams and
talk about what you're doing, because people only have
a finite amount of time and attention. Make sure you
grab them and keep them. Platforms are abstractions. They're a
way to take the complex and make it a commodity.
They're also opportunities to offer guidance,
best practices or enforce policies. So if you've got
something complicated, platform engineering allows you
as an organization to scale that in the most efficient way. But we couldn't talk about platform engineering without talking about complexity, because it can be quite easy and tempting, and you'll see in a moment that I fell for this as well, to just do everything and shove everything in there. Complexity in platforms means bloat, and bloat leads to security, performance, compliance, and maintenance issues. The smaller you can keep the product, the better your outcomes and the easier it will be to maintain.

Talking about enforcing policies,
guidance, and best practices: this is what you actually get when you spin up the platform demo. You get a checkbox that asks: do you want DORA metrics enabled or disabled for this deployment? Do you want to include security scans, or Dynatrace observability configuration like dashboards? Now, you may have no idea how that works; you might not be a security expert, or know how security tooling works at all. But because there is a nice, simple yes-or-no box, you're more likely to say: yeah, I'll have some of that. And as a platform owner, as the platform team, I can pre-select that option so it's enabled by default, which lets me guide you down the right paths.
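A tiny sketch of that idea, golden-path defaults pre-selected by the platform team; the option names here are invented for illustration:

```python
# Sketch: golden-path defaults chosen by the platform team. Option names
# are invented; the point is that secure, observable choices are on by
# default, so users opt out rather than opt in.
ONBOARDING_DEFAULTS = {
    "dora_metrics": True,          # track deployment frequency, lead time, ...
    "security_scans": True,        # kube-hunter / kubeaudit / kube-bench
    "observability_config": True,  # dashboards, alerts, etc.
}


def onboard_application(name: str, version: str, **overrides) -> dict:
    """Merge the user's choices over the platform team's defaults."""
    return {**ONBOARDING_DEFAULTS, **overrides, "name": name, "version": version}


# A user who knows nothing about security tooling still gets the scans:
print(onboard_application("my-app", "1.0.0"))
# An expert can still opt out explicitly:
print(onboard_application("spike", "0.0.1", security_scans=False))
```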
Wrapping all of this up, the lessons learned. DRY: don't repeat yourself, don't do things more than once. This is a problem we're trying to solve with platform engineering at Dynatrace: we've got a thousand developers, we've got
many teams, and they're all doing things in a slightly different way, in their own way, with their own stacks. Keep it
simple: I think I've already covered this to death, but it's really, really important. A feature you implement today is a support ticket tomorrow; if the feature's not there, you can't get a support ticket for it. So be really clinical and critical about what you're including. Avoid gold plating; that goes to the same idea: keep it simple. And the MVSP: we're all familiar with MVP, the minimum viable product. MVSP is the minimum viable secure product; I'll talk about that more in a moment. I said
I fell victim to this as well, because as soon as we developed
the MVSP for this demo, this happened.
I thought, oh, we could add feature flagging, and we can add
Kyverno and chaos tools and policy and compliance
tools and on and on and on. And actually, there were no comments.
Nobody wanted this stuff. So to me, that's a pretty good signal.
Don't build it. And I went and closed all the issues. So it's
very tempting. It's very hard to say no. Now,
talking about the minimum viable secure product: this is actually an initiative, essentially a checklist, that goes through everything it means to develop an enterprise-ready, secure product. It already has backing from some big names: Salesforce, Google, the US CISA organization, Okta, and so on. So I recommend checking out MVSP. That said,
there are some exceptions. If I were a product manager and could meet these three indicators, I would say you don't need MVSP, you maybe just need MVP: where the product or thing you're building is temporary, or throwaway, and it doesn't touch important systems. Now, important is up to you to define. I define it as not touching customer data, not touching production data, not really touching any real stuff at all. Because if you can meet those three qualifications, you can assume breach: you can basically say, okay, I know there's a bad guy looking at this data, and it doesn't matter, because there's nothing real in there.
Talking about things you may want to read and research: the Cloud Native Computing Foundation's App Delivery group has put out two brilliant pieces of content, the Platforms whitepaper and its follow-up, the Platform Engineering Maturity Model.
Between the two, the whitepaper will give you a general introduction, and the maturity model will give you a checklist of where you are today with platform engineering and what you need to think about to get to the next level.
Where do we go next at Dynatrace? What's next on this journey? We're using this activity as a testing ground to scale this across our organization, as I've already said, across those teams, to more than 1,000 developers. During this, we realized we needed a supported OpenTelemetry Collector distribution from Dynatrace, and I'm happy to say that has now been delivered. Of course we deployed Backstage, and I wanted to integrate that with Dynatrace. We do have a Backstage plugin, but as part of this work we're developing a new, improved, updated Backstage plugin. We're working
on cloud-native standards for pipeline observability: a pipeline run started, a pipeline run finished, an error occurred, and so on, and how we standardize that across the company, and hopefully across the industry as well.
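Purely as a sketch of what a standardized event vocabulary like that might look like (these type names are invented, not a published standard):

```python
# Invented sketch of a standardized pipeline-event vocabulary. These type
# names are not a published standard; they illustrate the idea of one
# shared shape for "run started / finished / errored" across teams.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum


class PipelineEventType(str, Enum):
    RUN_STARTED = "pipeline.run.started"
    RUN_FINISHED = "pipeline.run.finished"
    RUN_ERRORED = "pipeline.run.errored"


@dataclass
class PipelineEvent:
    type: PipelineEventType
    pipeline: str  # e.g. "checkout-service/deploy"
    run_id: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


events = [
    PipelineEvent(PipelineEventType.RUN_STARTED, "checkout/deploy", "run-42"),
    PipelineEvent(PipelineEventType.RUN_FINISHED, "checkout/deploy", "run-42"),
]
for e in events:
    print(e.type.value, e.pipeline, e.run_id, e.timestamp.isoformat())
```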
In this demo we're using Dynatrace's Monaco tooling, but we're now looking at more cloud-native approaches, like Crossplane and/or operators, for configuration as code; that work is currently in progress. We're also looking at dev containers: this whole setup is based on the dev container standard, and we're looking at how we standardize our disposable environments using the dev container specification. So, with that,
thank you very much for your time. I hope it was useful.
Here are a couple of links. Of the QR codes, the left one goes to the hands-on demo: you can spin that up in a Codespace, play around, and destroy it when you're done. GitHub gives you, I think, 2,000 free Codespaces minutes per month, so feel free to play around with this. Connect with me on LinkedIn if you get stuck or need any help. The links at the bottom are all of the whitepapers, the MVSP, and the Backstage plugin that I've discussed during this session. Thank you so
much for your time and yeah, enjoy the rest of the conference and
good luck on your platform engineering journey.
Thanks. I'll see you soon.