Transcript
This transcript was autogenerated.
Hey everyone, my name is Gal. And today I want
to convince you all that platform engineering is all about product.
And that product is an important part of building a successful
internal developer platform. Now before I do that,
I want to take you all on a stroll down memory lane.
We all know the stories about throwing code over the fence, but I just want
us to remember what organizations used to look like back in the
90s or in the early 2000s. And these organizations
were made of two groups. Well, a lot of other groups too, but two groups
that I want to focus on today, which are the dev group and the Ops
group. The dev group was basically in charge of building features
and they wanted to build as many features as they could as fast as they
could. And the Ops group was in charge of making the system
run in production. They were tasked with the
system being up, being stable, being reliable. And these
two groups were siloed. And the silos basically
caused a lot of miscommunication between them which slowed
down the business and hurt it. So somewhere
around 2008 we all came together and said,
okay, let's think of a new cultural shift and we'll call
it DevOps. Now this cultural shift sounds brilliant. We're going
to break down the silos between the two organizations. We're either going to
have one organization or two organizations that are functioning very well
in harmony together. The developers are going to own production
and you are going to be able to deliver software much faster,
more reliably, and in a better way. But the problem
is that it's not as easy as it seemed. It turns
out that only around 3% of the organizations were able to achieve
the DevOps ideal, where every developer who builds something
can also run it and operate it. Basically, the
other organizations fell into two main patterns,
or maybe a few more, but two main ones.
One is where they just added another DevOps team which
tried to make connections between the
dev and the Ops team and basically just created a third silo.
Another type of organization let go of
Ops entirely. They said our developers are
able to run Ops themselves, when in fact they couldn't;
they just disregarded the art and the
effort that is needed to run production. And the
result was a kind of shadow Ops operation going
on. The senior developers were doing the Ops, no one was exactly
sure what was going on, and only a few knew how
to operate production, which actually caused even more silos.
Basically what we ended up with in either case
is, again, two groups of people. One is the devs
and one is Ops. But instead of Ops, we just call them
DevOps now. And the silos continued because
not everyone can adopt this complex cultural shift.
Then, ten years later, somewhere around 2018,
though it was probably a bit before that, we all said, okay,
Google developed something pretty cool back in the early
2000s, and they called it SRE. So maybe instead
of DevOps, we can do SRE. Let's take our DevOps group
and embrace the practices that Google created and
now let's do SRE within our company and that will break down the silos and
make us work more effectively. And you
guessed it, basically being Google isn't that easy.
Most companies that try to adopt the SRE mentality don't follow
all the rules and all the methodologies that come with it, and they end
up with two groups, one dev and one that used to be
called Ops, then DevOps, and now SRE, siloed
from each other. The devs are focused on building features, the SREs
are focused on keeping production up, and we're not
getting to deliver value faster, we're not building features
faster, we're not helping our business in the
way that we want to. Fast forward five years,
and now we're all saying, let's do platform engineering.
Now before we dive into that, let's talk a bit about what platform
engineering is and what it offers us.
Platform engineering is basically saying, okay, let's build
an internal developer platform that will help our developers
move faster. It will simplify the workflows
and give them paved roads, some golden paths, so
they can focus on bringing value into our business.
We will have a group of platform
engineers who will build this platform, and the rest of the engineers will use
this platform in order to create value for our business.
And now my question, which is going to
run through this entire presentation, is how do we
avoid this? How do we make sure that two years
from now we don't end up with a bunch of organizations that look exactly
like this? This is what I'm going to try to
answer in this session.
Let me just give a brief introduction of myself. Hi everyone, I'm Gal
Bashan. I'm the head of engineering at Epsagon, which was recently acquired
by Cisco. Before that, I did a lot of cool cybersecurity
stuff at the IDF. I love building things, and that's my Twitter
handle if anyone wants to give me a shout-out.
So as I said before, our goal here is
enablement. We want our platform, our internal
developer platform, to enable our developers. And what does that
mean? Basically, every developer
is different. We have the senior developers who can tweak the
Helm charts and can do all the configuration that is needed,
and that is good. But we also have the junior developers who don't
care if the application is running on ECS or EKS,
and the platform that we build needs
to enable both of them. So the goal here is to
create some common use cases, some golden paths, or some useful
tools that will help the majority of
our developers develop faster, operate faster,
and bring value in a better way.
So usually we measure our IDP on
three main aspects. Does it help our developers build
in a more secure fashion? Does it help our developers
build in a more cost-effective fashion?
And do our developers have a better developer experience?
Now the first two are pretty self-explanatory.
If by using the platform I'm getting automatic security
features, the code that I write is more secure by default,
and I don't need a lot of boilerplate to bake security in,
then the platform is probably helping me with security.
If the platform optimizes the instance size to
the requirements of my application, it's probably saving
me some money and it's helping me with costs.
Developer experience is something that's a bit harder to measure.
A lot of people try to measure developer experience by cycle time:
how long does it take from when I start working on
an issue until it is live in production?
Another way is to measure
MTTR, mean time to resolve: how long does
it take from the second that I find a problem within my system
until it is resolved? We can also look at other metrics,
like how often PRs get merged or how often a
release is triggered. But really, if you want to know whether you
have a good developer experience in your company, you should look at your attrition.
How many of your developers want to stay within your company?
For how long does a developer stay within your company? Of course, there are a
lot of other factors that affect attrition, but if you have a
good developer experience for your developers, you won't have
high attrition rates. So now that we know
what we're aiming for, why we want to build this IDP, I want
to talk about a few of the important
aspects of how to build this IDP,
and I want to start out with a story.
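The cycle-time and MTTR measures mentioned above both boil down to averaging the elapsed time between two timestamps. A minimal Python sketch, assuming you can export issue start and deploy times and incident detection and resolution times from your own tracker; the `Issue` and `Incident` records and the function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Iterable, Tuple


@dataclass
class Issue:
    started: datetime   # hypothetical field: work on the issue began
    deployed: datetime  # hypothetical field: the change reached production


@dataclass
class Incident:
    detected: datetime  # hypothetical field: the problem was found
    resolved: datetime  # hypothetical field: the problem was fixed


def mean_duration(spans: Iterable[Tuple[datetime, datetime]]) -> timedelta:
    """Average elapsed time over (start, end) pairs; raises StatisticsError if empty."""
    return timedelta(seconds=mean((end - start).total_seconds() for start, end in spans))


def cycle_time(issues: Iterable[Issue]) -> timedelta:
    """Mean time from starting an issue until it is live in production."""
    return mean_duration((i.started, i.deployed) for i in issues)


def mttr(incidents: Iterable[Incident]) -> timedelta:
    """Mean time from finding a problem until it is resolved."""
    return mean_duration((i.detected, i.resolved) for i in incidents)
```

Averages are sensitive to a few very long issues or incidents; swapping `mean` for `statistics.median` is a common variation when that matters.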
This is a story that we had at Epsagon. Basically,
it was way before we had platform engineering in place.
We had a senior developer with some spare time and
he wanted to build a tool that will help developers in our
company. Now, Epsagon was a company that was building
a lot with serverless technologies. So our entire backend
was based on Lambda, and we had a lot of Kinesis streams. And if you're
not familiar with Kinesis, then, and I know the data engineers here
will be angry with me, Kinesis is basically managed Kafka.
It's a streaming service by Amazon that helps you connect
different asynchronous services using streams, which are similar
to the Kafka streams that you're probably familiar with.
And we were working a lot with Kinesis because we were streaming.
Our solution was an observability solution, and we were streaming a
lot of traces that our customers were sending.
And what this developer thought is
that our team could benefit from
the ability to stream those traces not
in the cloud, not into Lambda, but directly to our
local machines, in smaller quantities using sampling,
just so we can debug what's going on in the cloud. So he
actually went out and wrote this tool.
But after a month or two, we noticed that no one
really used it. And the problem is
that it wasn't adopted because this wasn't a real problem.
It turns out that our developers were perfectly fine
with just debugging on Lambda, or with writing
the code to get traces from Kinesis onto their
local computer, which was relatively simple. It was like ten
lines of code. So just because a
solution is there and it is cool, it doesn't
mean that this is a pain point that our developers really had.
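The talk doesn't show the "ten lines of code" the developers wrote themselves. A hypothetical Python sketch of that kind of script, assuming a boto3 Kinesis client (`boto3.client("kinesis")`) is passed in and that reading only the first shard is good enough for local debugging; `sample_stream` and its parameters are illustrative names, not Epsagon's code:

```python
import random
import time
from typing import Any, Iterator, Optional


def sample_stream(
    client: Any,
    stream_name: str,
    sample_rate: float = 0.01,
    max_batches: int = 10,
    poll_interval: float = 1.0,
    rng: Optional[random.Random] = None,
) -> Iterator[bytes]:
    """Yield a random sample of record payloads from one shard of a Kinesis stream.

    `client` is assumed to be a boto3 Kinesis client. Only the first shard
    is read, to keep the sketch short.
    """
    rng = rng if rng is not None else random.Random()
    shard_id = client.list_shards(StreamName=stream_name)["Shards"][0]["ShardId"]
    # Start from the newest records, like tailing a log.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="LATEST",
    )["ShardIterator"]
    for _ in range(max_batches):
        if not iterator:  # no next iterator: the shard was closed
            break
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            # Keep roughly `sample_rate` of the records.
            if rng.random() < sample_rate:
                yield record["Data"]
        iterator = response.get("NextShardIterator")
        time.sleep(poll_interval)  # stay under the per-shard GetRecords rate limit
```

Usage would look like `for payload in sample_stream(boto3.client("kinesis"), "traces", sample_rate=0.05): print(payload)`, where `"traces"` is a placeholder stream name. Taking the client as a parameter keeps the function easy to try against a fake client without AWS credentials.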
So this leads us to our first takeaway.
The platform has to solve a problem that your developers
have. Just because we can solve something doesn't mean that
we should solve something. So what
problems can our developers have? They can actually have a
lot of problems in a lot of different areas. They can
be wasting a lot of time on infra:
a lot of the time can go into just building Helm charts,
or configuring every single one of their pods or
their instances. And if we can provide them with some templates,
we can save time. There can be a lot of boilerplate
code. Maybe in order to set up a service, they have to copy and
paste from a bunch of different services, and if we create a tool that helps
them create a service, that will save them a lot of time. They can
be struggling with security. Maybe we should bake in some automatic KMS
solutions, like key management solutions, just so they
can use it more effectively and not have to look it up every time they
need it. Maybe they're missing observability alerts
for their service. They want to be automatically alerted
on RED metrics (rate, errors, duration) instead of having to set up alerts for each
of their individual services. Maybe it's code ownership:
they don't know who is in charge of this library. It can be cognitive load:
they can be in charge of too many things. It can be quality: maybe it's
hard to write tests. There are a lot of different aspects.
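To make the "tool that helps them create a service" idea concrete, here is a minimal sketch of a golden-path scaffolder in Python. The template contents, the file layout, and the `scaffold_service` name are all hypothetical; the point is that the boilerplate (including an ownership line and a Helm-style values file) lives in one place instead of being copied between services:

```python
from pathlib import Path
from string import Template
from typing import List

# Hypothetical golden-path template: relative path -> file contents, with $placeholders.
SERVICE_TEMPLATE = {
    "README.md": "# $service\n\nOwned by $team.\n",
    "src/$service/__init__.py": '"""$service service."""\n',
    "deploy/values.yaml": "name: $service\nteam: $team\nreplicas: $replicas\n",
}


def scaffold_service(root: Path, service: str, team: str, replicas: int = 2) -> List[Path]:
    """Render every template file under `root`, refusing to overwrite existing files."""
    values = {"service": service, "team": team, "replicas": replicas}
    created = []
    for rel_path, body in SERVICE_TEMPLATE.items():
        target = root / Template(rel_path).substitute(values)
        if target.exists():
            raise FileExistsError(target)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(Template(body).substitute(values))
        created.append(target)
    return created
```

For example, `scaffold_service(Path("."), "billing", "payments")` would create the README, package, and values file for a service called `billing`. Whether templates like these actually save time is exactly the kind of thing to validate with developers before building more.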
So how should we know where to focus?
And the second takeaway is that our developers
know what problems they have. While the problem range
is very big, our developers know what troubles them in
their day to day. And if we just go and ask them, we'll know
where to start and where we should start looking.
So we should interview our developers. We should use
those interviews to collect data. We should sit in retros and see what
comes up. We should look at the recent
bugs that we have and understand what their root causes were.
And we should use all this data in order to
choose which problems we want to solve first. And again,
don't start with the solutions. Start with the problems that you want to solve
next. Even if we find the right problems
to solve,
it is sometimes easy to just
focus on building a cool technology that somehow
solves this problem instead of solving this problem in an
effective way. The problem is,
if we're not focused on the needs of the user, we end up
doing one of three things. First, we just build every
specific thing that our developers ask us for. And this
is not a very efficient way to build a platform, right? Because if we
satisfy the needs of one user, we're probably not satisfying the needs
of another 99 users. The second thing is that
we find a cool technology that we want to mess with, and then we
spend six months trying to make this technology solve our
pain. This is something that we often do as developers: we try
to insert a cool technology that we want to use into our
problem space. And the third thing that we may end
up doing is coming up with a solution that is good for
us. Because I'm a developer and I can imagine
myself having the problem that the developer that I interview is having.
I can just imagine, okay, this is the solution that would work
for me and build it, but it actually may
not be the solution that works for the developer or that can be convenient
for him. Before I go ahead and build a solution, I have to validate
that the solution is also good for the user. In my case,
the other developer in my company that I'm building it for.
So the takeaway here is that when you're building
the solution, you should focus on value. Don't go for impressive technology.
Don't go for what would be easy for you.
Ask the user, the developer in your company, what would be valuable
for them, and then build that.
The next thing I want to tell you is a story from
back in my army days, and because some of it is restricted,
I'm going to change the product domain a bit, but you'll get
the big picture. So in my
army unit, we were focused on, let's say, the bagel
product domain. We were working with many vendors that provided
different bagels. Some of them were coated, some of them had salmon,
some of them were round, some of them were half, some of them were full.
And we wanted to take data from all of those different vendors.
And we built our own systems that used this data, manipulated this
data, and stored this data. So we worked with a lot of proprietary bagel formats,
but we had to store them in a central location
for several products that shared those different vendors.
So we came up with the solution: let's
write a very generic bagel library that
every project can later use. So we sat
down and worked through the specs of the bagel, like the
dictionary definition of a bagel, how we should
treat a bagel, and how we should abstract the cucumbers
in the bagel. We spent around
six months creating this library,
and the developers, of course, were from one project, and it was a perfect
fit for that project.
After those six months, we released this library and we asked
the other projects, hey, do you want to use this library?
And they gave it a go. And after two weeks they understood
that it is just not usable for them. It is too complex,
there are too many options, it is too generic and
they are just not able to use it. And this library was
basically neglected. So the takeaway here
is that agile is still valid,
even when we're building a platform and not a user-facing product.
We should always iterate quickly, we should
always give the user a taste of what
they're getting. We should always build an MVP and let
the user try it before building the next phase. So prioritization
is important. We need to understand where we're starting: what is
the most important pain point that we want to solve from the pain points that
we identified, and what is the easiest solution from the solutions
that we identified. Then execute
the MVP of it. Give it to the customer, which is an internal developer
in this case, but it's still a customer, then get their feedback,
and only proceed to the next step if you're on the right track.
So just because it is an internal platform and not an
external product is not an excuse to go into the lab, sit for
six months, build out this gigantic
Rube Goldberg machine, and then just launch it
to the internal developers of our company and have it fail.
So let's recap what we've talked about so far. So we've talked about
the fact that in order to build a good IDP, we need to first validate
the problems that our developers have. We need to understand what problems
they have and how we can solve them. Then we need to validate those
solutions that we have in mind. We need to understand that the vision that
we have for a solution is valuable for our internal developers.
After that, we need to iteratively bring those solutions
to those developers as fast as we can and validate
that we're on the right track. We still have to use agile in order
to make sure that we're not building something that is
not usable. Another thing we haven't touched on is that we have
to go to market. We have to convince developers to use this.
We have to make sure that they understand that it will give them value
and help them be better developers.
Now this job description sounds kind of familiar.
And it is familiar because it exists. This is the job description of
a product manager. We have to look at our developer
platform as a product.
And because it is a product, we should have a product manager that
leads it and makes sure that it is valuable
for our internal users. Now when
you're going to pitch this idea to your head of product,
you are probably going to hear one of those four things.
First of all, I can't bring on another product manager
because it's expensive. It's another headcount. I don't have the budget
for another headcount. Then you should go
to your manager and ask what is more expensive:
hiring one product manager or having
an entire platform engineering team build something that no
one will want to use because we didn't validate that
what they're building is actually useful internally in the company.
Another thing that you may hear is that it's
an internal tool so engineering managers can manage it. There's no need
for product managers. Product managers are only dealing with
outside-facing customers. Now in
my book, that's just disrespectful to PM skills.
A PM should be able to talk to users, but a user
doesn't mean someone outside of the company; it just means someone who uses a product.
The PM has to understand their needs, and it is a very
hard skill to interview someone effectively,
in a way that you understand what you can do in order to solve their
problem. And if you're saying that any engineering manager can do it without
training, I think that's kind of disrespectful to product managers.
Also, engineering managers have a lot to take care of. They have to
take care of the development of their people, both personal and professional.
They have to think about project management, the execution part of
the job. Just throwing the additional product management
responsibility on them is kind of irresponsible.
Another thing I hear pretty often is that developers
know what they want. Look, you're a developer,
you use platforms. So why can't you
just build the platform that developers want today?
The answer is: why does Elasticsearch have product managers?
We all know Elasticsearch. It's a product that
is used by developers. So if developers know what developers want,
why does Elasticsearch need product managers? Again, it touches back
on the fact that just because I'm a developer and I know what I
want, it doesn't mean that I know what every developer wants.
And a good PM can talk to a lot of developers,
synthesize the real need and understand what we should
build. And the last, most annoying
excuse is that the platform usage is
mandatory, so we don't need a PM for it because everyone is
going to use it anyway. If that is your company's
approach, then you are going to have a bad time because no one
wants to use an internal tool that is very, very hard to use.
The platform team will have a bad time because no one will want to use
the product. The developer team will have a bad time because the platform will
actually slow them down because it's not very useful or handy.
So mandating usage of the platform is usually
a very bad idea. You should have the PM
and the platform team make the platform so useful that
developers actually want to use it.
The takeaway here is that your internal developer platform
is a product. It should have a PM, and
the PM should make sure that the platform team is building
something that the rest of the developers in the company want to use.
Otherwise you're just going to end up with two
organizations. One of them is Dev, one of them is
the platform engineering. They're going to be siloed and the
platform engineering organization is not going to be a valuable addition
to your company. This product manager has to
be measured against the success criteria that we talked about at
the beginning: does the platform help the developers build more secure
applications? Is it more cost effective? Is the developer experience
better? Those are the things that this PM
should be measured against.
To sum it up, we need to build an IDP that
enables developers. We have to come with a problem-first
mindset. We need to solve a problem, not just
build a cool solution. If we want to know what problem to
solve, we should just ask our developers. They know what problems they're facing
and they know what area we're lagging in the
most. When we're building the solution, we need to make sure
that we're building a solution that is valuable, not just that is
cool or complex or using the most advanced technology.
When we're building this solution, we should iterate fast, we should
use agile methodologies and we should double check all the
time that we are on the right track. And finally,
all of this should be led by a product manager
who knows what they're doing and has a clear vision of where they want to
take this platform. Otherwise we'll just end up with a fancy
Rube Goldberg machine.
That's it, everyone. If you have any questions, you can find me on Twitter or
just email me. I hope that this was informative and I hope
that you'll all go back to building a useful platform for
your developers in your company. Thank you.