Transcript
This transcript was autogenerated. To make changes, submit a PR.
Thanks for joining this session at Conf42 DevOps
2024. Hope you're having a great time.
Thanks a ton to the organizers for putting this event together
and allowing us to share what we are learning with everybody else
in the ecosystem. What a great time to be building
software. Not only is every organization across the world
now a technology company, and more precisely a
software technology company, but the pace of innovation
is also just amazing, thanks in large part
to the open source community, which, in my view,
makes it all possible in the first place, along with
radical advances in AI, infrastructure,
and software development methodology. As such, we are
reaching the tipping point of creating a level playing field
to turn your ideas into successful ventures, no matter where
you are, who you are, or what your socioeconomic
background or identity looks like. I believe
that one important factor that will make
it or break it is our ability to
learn, experiment and adopt quickly.
In this context, I would like to share our journey at
Softrams and a few strategies to
help your teams and organizations to create that culture
and the supporting systems to take ideas to ventures
and create incredible value rather quickly.
Just a little bit about me before we start. My name is Murali Mallina.
About seven years ago I joined Softrams,
and since then we have been helping our teams to
build and operate a few mission critical software
systems for our federal agency customers. I'm also
the founder and CEO at Teaching for Good, which is
an edtech nonprofit with a twist. We build
edtech and support systems to help anybody in the
world to teach, train, mentor or coach,
and use that passion for education as the vehicle
to raise money for a nonprofit of your own choice and
the platform itself is 100% free to use.
I am kind of just shy of celebrating 25 years
of professional software development and I have been very
fortunate to work with and build teams
across the world from Germany,
China, France, India, UK, Russia, and of course
in the United States itself. I started my career
in telecom switching software and then
went on to work on edtech, supply chain,
healthcare and civic services. If you
want to reach out anytime, please hit me up on LinkedIn.
I do have a Twitter account, but I don't tweet that much,
so please use LinkedIn for all practical purposes.
As part of my presentation, I would like to use these
five personas to illustrate various
aspects and kind of tell you the story from different angles.
Avery is our new full stack developer who has just joined
our team and is eager to contribute.
Layla is our senior DevOps engineer who loves
and often fights for Emacs as the best IDE ever.
Then Lucy, the staff engineer who is
responsible for governance, security and various
other aspects related to compliance. Meet Sneha,
our full stack developer who is incredibly talented end to end.
And finally Emma, our product manager who
always advocates for the customer and end users to
build great experiences across the products and services.
Of course, all these names are fictitious and I
have a vested interest in picking all female personas for
this presentation. It is totally intentional.
I want to take a moment to acknowledge the diversity,
or I mean the lack thereof, in the tech industry, and
I believe that each of us must lend support to make a difference.
Let me start with an example. I'm pretty
sure you can easily see and relate to some of
these in your organizations as well. Emma is our product manager
and she got a great idea for a new service while
talking to the customers, and she believes that
it could make a huge difference to our customers. So she
quickly jotted down the idea, ran it by
a few friendly customers, and added a few more details
so that the entire team can now visualize exactly what this means.
I would like to ask each of you to think just for
a minute in the context of your work, your organization
and your teams, what will happen next.
Assume for the sake of simplicity that
every stakeholder agrees that there is a value in this idea.
What does it take from here to build
and deploy that specific feature
that Emma has been talking about to a sandbox
or to your lower level environments, to be able to
do a demo to your customers? And
let us take it to the next step. What does it take to push
this version all the way to production and
make it available to your customers? Just take a pause
and think about this.
Next we will take a look at the full stack developer, Sneha.
Sneha has been keeping herself up to date with
what's going on in the ecosystem, and she has been learning about lots of different
things: ChatGPT and LLMs, for example.
With her extensive experience on the product itself and
relationships with customers, she identified a
few opportunities to bring in ChatGPT
to change the way customers experience the product, and
there is definitely lots of potential to bring additional revenue as
well if everything works out. Given that some
of these new services are not fully vetted in
the organization, Sneha will probably need a
readily available sandbox environment that could access
some of the deidentified data, to be able to build
this PoC and show value to your internal teams
first, as well as to the customers.
Think for a minute. There is nothing unusual about this
use case or the need for such an environment.
What does it take in your organization to
provide Sneha with an environment,
access to some of the data, as well as, if needed,
integrations with other systems in a safe and secure manner,
to be able to build this PoC? And if the
PoC works out great, with minimal changes
we should be able to productize the idea, integrate it into other
systems and go for a launch as well. So take
a minute to think about what it takes in your current organization
for Sneha to get all the access, the environment, and the
CI/CD pipelines to be able to build and demo the
PoC to internal teams and possibly to some
friendly customers.
Next, let us take a look at a slightly different use case.
This may look isolated, but it's a totally important
piece of the puzzle and a very impactful factor for
teams in many organizations across the board.
Avery is our newly joined developer and
we need to onboard her to the team and
would love to have her contribute to an important product release
that is coming up fast. This requires Avery to
understand the system and know where everything is:
code, integrations, development processes, delivery mechanisms,
just to name a few. I want to request one more time that you
pause just for a minute and think: what do you do today to
onboard? And what does it take for Avery
to contribute her first PR to the product?
Again, please take a minute to think about all these
three scenarios. These are pretty common in
many organizations. Everybody has lots of ideas.
Developers are always learning and always want to try out new
things so that they can bring that innovation and those concepts
into whatever they're working on. And we are always hiring people.
You always have new team members joining your team, and I'm pretty
sure some of you are personally responsible for onboarding
new members as well. In each of these scenarios,
it used to take a really long time, on the order of months,
for example. And in many places,
while that number may be shrinking and getting
faster and faster and getting better, it is still
a few months to a few weeks for most,
and only a small fraction of organizations, based on my
experience, can do this in
a matter of weeks to days. Of course,
there are lots of factors that will influence this
aspect. And while I'm
truly acknowledging what each of you will go
through to make anything happen in your organization,
it is possible to build these supporting systems
and create a culture where we could definitely bring
these numbers down to a week, to days,
and I would like to share our journey of bringing that down
all the way to a day, to a few hours, to
get things moving. Let me take you back
to 2018. We
had been discussing certain things that we wanted to do as
a team at the time and evaluating a few alternatives,
and we had started working on some of these as well.
And I sent this message that morning to bring
everything together, requesting an
end-to-end demo to be able to show that value.
And there are lots of things going on in this message.
So let me break it up and we will go step by step.
I just zoomed in so that we can actually see and read this
Slack message, and let us go step by step.
All I'm asking is that I should be able to request
a new project to be created for my idea.
I will specify a name, and the
system, the tools, or the platform, whatever you call it, should
be able to go ahead and create a repository for me.
And this is not just a blank repository.
I'm asking for a specific service. In this case,
I wanted to build a Rust API microservice, and there
are a variety of boilerplates available, and I'm going to choose
one of the boilerplates that matches what I'm looking
to build, and so it should copy
that boilerplate and create a branch. At the
time, dev was our main branch. We use
trunk based development.
So create that branch for me and
set up all the hooks needed for that repo to be able
to do CI as well as CD, continuous integration and
continuous deployment. As
part of the CI, it should be able to build it
when I push the code up and, as part of the PR,
run tests. And once all the tests have
passed, go ahead and deploy it to an environment
and make it available for
all the internal teams to start with, to be able to use this and
test it and do the demos. We also
want to configure a CNAME for that sample service that
has just been deployed, and of course share
that URL with the team so that now at least we can start
with the health check. And now all this setup
is available for me to go and start building
and iterating on my idea. Everything is all set up.
Every iteration can quickly go to deployment and everybody in
the team should be able to access that really quickly.
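The request described above can be read as an ordered checklist. Below is a minimal Python sketch of that checklist, not the actual platform from this talk: the `ProjectRequest` fields, the wording of each step, and the `example.internal` domain are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectRequest:
    """A self-service request for a new project (hypothetical shape)."""
    name: str
    boilerplate: str                  # e.g. "rust-api-microservice"
    main_branch: str = "dev"          # trunk based development on one main branch
    domain: str = "example.internal"  # placeholder domain, an assumption


def provisioning_plan(req: ProjectRequest) -> list[str]:
    """Return the ordered steps a platform would run for this request."""
    host = f"{req.name}.{req.domain}"
    return [
        f"create repository '{req.name}'",
        f"copy boilerplate '{req.boilerplate}' into the repository",
        f"create branch '{req.main_branch}'",
        "set up CI and CD hooks for the repository",
        "on each PR: build and run tests",
        "when all tests pass: deploy to an internal environment",
        f"configure CNAME {host}",
        f"share health check URL https://{host}/health with the team",
    ]


if __name__ == "__main__":
    request = ProjectRequest(name="my-idea", boilerplate="rust-api-microservice")
    for number, step in enumerate(provisioning_plan(request), start=1):
        print(f"{number}. {step}")
```

Writing the plan down as data first makes it easy to see which steps are still manual before any of them are automated.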
There are lots of things going on here.
So to understand a little bit better, I would like
to go ahead and group these as capabilities that
we can build into a system or a platform,
whatever you call it. I extracted some of those capabilities
and grouped them into four different segments, starting with the
first quadrant labeled as knowledge. This brings
together the domain knowledge of the organization
into a common place, one single place where everybody can
go and get it. It collects all kinds of services,
applications, libraries, components, design systems,
starter templates, et cetera. And in
some organizations, one team doesn't
know exactly what the other team is working on.
And this part of the puzzle is instrumental
for product teams, like Sneha, the full stack developer, or Emma,
the product manager, to know the lay of the land, to be able
to understand the whole system and what is currently available.
This allows them to think about how
they could take this new idea forward. They don't need to build it from scratch.
It could probably be composed from existing services,
or use one of these services and build on top
of them, for example. And the next segment going
clockwise is infrastructure, beginning with
the code repositories, creating those sandbox environments,
as well as the lower and production environments
that are required to host these applications,
as well as the physical and virtual environments, in the cloud for example,
where these can be developed, built, hosted and deployed.
Typically in many organizations,
a DevOps team will handle a large portion of
this responsibility to be able to build and operate these systems.
Going to the third segment, it's about the workflow,
the eventual delivery itself, CI CD pipelines,
like I mentioned, configuration for each environment to be
able to go to production. And another critical aspect of
operations of a production system is observability: all kinds
of metrics, logs,
and also the infrastructure itself, to be able to
scale and perform reliably. The last
one, but one of the trickiest segments of all, is the
governance segment: all kinds of safeguards, cost controls,
access controls, the new oil of our era, data,
and a variety of processes and workflows with
guardrails built in. While this diagram
illustrates all the key segments, it doesn't fully cover the
list of capabilities we need to support the entire
lifecycle of any product or any idea.
So I put together this diagram and
you can see I have added two more sections to it,
if you will, research and design as well.
Typically, many products and discussions in the ecosystem
that talk about software factories, platform engineering or internal
developer portals do not sufficiently cover this
aspect, the research and design. But I believe it
is really important to include them to cover the full lifecycle.
You can see here, in the research area,
it's very important for the product teams to have access to data
and analytics as well as any existing research,
with all the information cataloged, as well as all
the organization level policies that are available, to be able
to iterate on the idea quickly as part of the research.
And when it comes to design, to be able to build
products with a consistent user experience, you need appropriate
organizational branding or product branding, the design system
itself, assets, composable UI libraries, and
the guardrails that I'm talking about. So these two are in
addition to the build, deploy and operate capabilities that we looked at as four different segments,
starting with the starters, generators, libraries and
services that already exist,
the repository for source code,
infrastructure, environments, CI/CD and the whole nine yards that we
just mentioned. So a good
ideal state of the system must include both
research and design as well, so that we can support the product teams
to take an idea and go all the way
down to deploying it into the environments where customers can access
it.
And this brings us to the next step: putting
all these things together and giving it
a name. There are multiple different aspects
and multiple concepts people refer to in
the ecosystem. In my view, this is
what we call a software factory. Specifically,
there are three important aspects:
tools, processes and the content itself.
And of course we want to build on
top of the existing knowledge instead of reinventing the wheel every single
time. So we are going to adapt,
assemble and configure these tools,
processes and content to make everything work.
I would like to move away from that theoretical definition and
bring a little bit of extra focus to some of these
aspects and talk about how we can evolve as
a team. And these are arranged as concentric circles
for a reason, based on what we learned. Again,
totally opinionated and biased based on our context and
what we are building. Our suggestion is to start by
pulling together the catalog at first and
go outward and build each capability or system
as part of that evolution or maturity in your implementation.
And treat this whole exercise like you're building a product.
And do this in an iterative fashion, focusing on the most
important use case first, and then keep extending
it, starting with the catalog. Make sure it is fully
self service. Approval based mechanisms
are okay, but make sure that they do not introduce
extra friction or delay the whole process.
And if you focus on self service systems
with guardrails built in, you get the best of both worlds.
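One way to read "self service with guardrails built in" is that requests inside agreed limits are approved automatically, and only the exceptions wait for a person. The Python sketch below illustrates that idea under stated assumptions: the limits, field names and environment names are hypothetical examples, not the rules of the platform described in this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrails:
    """Organization-level limits (all values here are made-up examples)."""
    max_monthly_cost_usd: float = 200.0
    allowed_environments: tuple[str, ...] = ("sandbox", "dev")
    allow_identified_data: bool = False


@dataclass(frozen=True)
class EnvironmentRequest:
    requester: str
    environment: str
    estimated_monthly_cost_usd: float
    needs_identified_data: bool = False


def review(request: EnvironmentRequest, rails: Guardrails) -> tuple[str, list[str]]:
    """Auto-approve requests inside the guardrails; otherwise list what needs review."""
    violations = []
    if request.environment not in rails.allowed_environments:
        violations.append(f"environment '{request.environment}' is not self-service")
    if request.estimated_monthly_cost_usd > rails.max_monthly_cost_usd:
        violations.append("estimated cost exceeds the self-service limit")
    if request.needs_identified_data and not rails.allow_identified_data:
        violations.append("only deidentified data is available without review")
    decision = "auto-approved" if not violations else "needs-review"
    return decision, violations
```

With a check like this, the common case has no waiting at all, and a reviewer only sees requests that actually cross a limit, along with the reason.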
Next layer is of course building the overall infrastructure that is required
to support the entire lifecycle, starting with the sandbox
environment all the way to the production environment with observability
built in. But it is super important to automate this
part as well. This not only brings
speed to your workflows, but also brings the consistency
that is required at the organizational level. And of course
you can scale it once you automate it. And last, but the
most important aspect, is governance. We must
tread really carefully here. We must include guardrails
and set up some thresholds to make sure that
every aspect of the software is built and delivered according
to your organizational security and policies, but make
sure that they do not get in the way of accelerating innovation
itself. That's the key. And while we are here,
I also want to bring up two other common concepts in this context,
platform engineering and internal developer portals
or internal development portals. Depending on who you speak
to and the variety of products that you're looking at, these concepts
are used interchangeably. However,
we do see these things slightly differently. That's why
I referred to this presentation itself as software
factories instead of just calling it platform engineering
or IDP. And there is no wrong or right answer here. This is
how we are interpreting it. So I would like
to go to a Wardley map and
then show some of these practices to illustrate the
evolution and the concept
of these software factories, to be able to put all these
concepts in context. Many organizations
have CI/CD at the minimum, and then bring the
culture of workflows as well as processes around
it. Along with the team, automation and the tools,
you will be able to build a DevSecOps culture, and both
of these are very mature and most organizations
are very familiar with these as well. Platform engineering is
referred to as the capability to build common
infrastructure, workflows and delivery mechanisms,
distinctly maintained by a separate team,
typically called a platform engineering team, and offered
as a service or product to the rest of the organization. Based on
our understanding, platform engineering,
as discussed in the ecosystem, is focused more around the DevSecOps
area as well as building these delivery mechanisms.
But take this idea to the next level and look
at internal developer portals or internal development platforms,
which bring together that knowledge, the self service aspect
of the portals, as well as a single
pane of glass for observability, so that product teams have access at a
single point not only to the knowledge,
the catalog, but also, for all the deployed
services, to the visibility or observability and analytics that
they can use to come up with the next iterations of
those ideas and products. In our internal implementation,
which we call by the codename Eagle, we bring both
platform engineering and IDP together as one product or
service because both are equally
important for this innovation platform to
be successful in your organization.
So next, let us look at how we can build a
software factory for your own team and organization.
And before we go further in building that software
factory, I would like to remind you that if your
teams currently use low code or no code platforms,
you may already have a great working version of a
software factory. However, in many organizations
these low code or no code platforms only cover a fraction of the
workloads and applications. So we will go
ahead and look at other aspects of a software factory
that you can build. As explained before,
please do start with the catalog first,
and this is probably the easiest part to get everything
together, but it's also the most important and
impactful part and you don't need a
fancy tool to be able to do it. Just start collecting
all kinds of scripts, tools, generators,
starters, as well as your wiki pages and
Confluence pages, all together, along with some appropriate documentation.
That itself will give you a good chunk of what
you are expecting in a software factory. In the
next few iterations you will look at other
use cases and other needs.
Then you can look for more scalable approaches and bring a little bit of automation,
for example, put a workflow around it if
you want. And if your organization currently builds
infrastructure in a single cloud like us,
or relies on serverless workloads like our teams
do, you're already better off
leaning on a platform approach that is already available in
the cloud itself. For example, we use AWS
really heavily, and AWS Service Catalog
actually takes care of almost every capability that you're
expecting in your platform or software factory.
So before you start looking at a completely
new platform or a new product you want to bring
in to build your software factory, take a look at the service
catalog as a starting point as well. And of course,
if you're looking at a multicloud environment or
you require a lot more flexible control over various aspects
of the platform, you will be evaluating some of
these other products, starting from Cloud Foundry,
which was well known and mature for a long time before
we even started talking about software factories and platform
engineering, to the most modern
products like Humanitec, for example, or OpsLevel. These are
some of the new platforms that we have seen
coming up in the ecosystem, so please
evaluate some of these before you go and build your own.
When we started in 2018, some of these modern platforms
didn't exist, and also, based on our needs and
our focus on serverless workloads, we went ahead
and started working on building our own.
We started with AWS Amplify, for example,
Terraform modules,
and some experimentation with a tool called Terrafi that we wanted
to build internally. Then we ventured into AWS
CDK, for example, and finally put together our platform, which we
call Eagle, on top of Kubernetes.
I'm going to quickly bring up another Wardley map
showing the evolution as well as the availability of
various aspects in the ecosystem.
And the reason
I wanted to bring this up is that the more you
use mature products or managed
products in this ecosystem, the better off you will be, and
you will be able to build your software
factory rather quickly. You may have seen the
variety of services available, the variety of CNCF products,
for example. With all the projects in CNCF,
you have the flexibility to use them.
But my suggestion is to first look for managed
services within your cloud before you venture into bringing
them onto your own servers, and essentially go
no code or low code if you can. And most
of you already know that there are
a range as well as a variety of services available in
the CNCF landscape that covers pretty much
all use cases that you may be looking at to support your
workloads. However, with every single service that
you bring in, you are increasing your total
cost of ownership exponentially.
This requires setting them up, configuring them,
running them, operating them, taking backups,
and all the adaptive maintenance that goes with each of these software
units: managing their versions, migrations and
security. It is a really, really
expensive risk to bring in all
these services and manage them by yourself.
We definitely suggest that you do
not attempt this unless it is your core business,
or, if you must, do this as the last resort,
not as the first idea, to build your software
factory using this approach. So of course
we also use a fair number of AWS services
beyond the things that can be run inside a container.
We use Kubernetes as our core platform,
where anything that is containerized can run.
But what happens to other services? We
use a lot of serverless workloads as well as managed services
like S3. So we ran into
this product called Crossplane. This is a great framework
to bring cloud specific services and all these managed
services under the same umbrella, so that you can provide a
uniform workflow to manage all your resources and all your services
like one system. Also, in
our system we used Backstage as the
portal and the knowledge base, and Tekton
for most of our CI/CD workflows. And I'm not here to suggest
that any of these products are the
only products available. There are a range of these
products available, so choose based on what fits
in your context, in your organization and the experience of
your teams. So I'm going to leave the discussion on the
exact stack that you would like to look at.
However, I believe now that we
have an idea about these capabilities and
why we want to bring all these capabilities together. Whatever you
want to call it, platform engineering, IDP, or a software
factory like we call it, what we want to do
now in the next couple of minutes is to share
what works based on our experience.
As we conclude this presentation,
the most important part I would like to start with is
do not try to build your own cloud.
Whether you want to use the public cloud
or a private cloud, make sure that you use as many
serverless services as possible, as well as fully managed services,
rather than creating your own cloud. There
are great CNCF projects available for you to bring in and
run on your own Kubernetes clusters, but with the
control and flexibility they provide,
those systems are also exponentially expensive to build, operate,
own and continuously manage. So I
would say focus on the value rather than that control and flexibility
and start with the most important golden path.
Start by researching your current ecosystem with a journey
map, for example, or a value stream map or a service blueprint
if you will. First document the journey.
What does it take in your teams, in your organizations,
to take that idea and go through all these steps
to be able to deploy your system? Then
look at the most expensive steps in the process in
terms of time and effort, and automate them.
And of course you do it in an iterative fashion to
learn more about the use cases, what is working? What is not working?
Then go ahead and optimize, rinse and repeat.
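To make "look at the most expensive steps" concrete, the documented journey can be treated as a simple table of steps and elapsed time, then ranked. This is a minimal Python sketch; the step names and hour values are invented for illustration and are not measurements from any real team.

```python
def most_expensive_steps(journey: dict[str, float], top: int = 2) -> list[tuple[str, float]]:
    """Rank documented steps by elapsed hours, longest first, and return the top few."""
    ranked = sorted(journey.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


def share_of_total(journey: dict[str, float], step: str) -> float:
    """Fraction of the end-to-end lead time spent in one step."""
    return journey[step] / sum(journey.values())


# Hypothetical journey from idea to first demo, in elapsed hours per step.
journey = {
    "request repository": 16.0,
    "wait for sandbox environment": 80.0,
    "set up CI/CD pipeline": 40.0,
    "write the first feature": 24.0,
    "deploy for demo": 8.0,
}

if __name__ == "__main__":
    for step, hours in most_expensive_steps(journey):
        share = share_of_total(journey, step)
        print(f"{step}: {hours:.0f}h ({share:.0%} of lead time)")
```

Re-running the same ranking after each round of automation shows which step has become the new bottleneck, which is the "rinse and repeat" part of the loop.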
And since a software factory brings
a huge change across the board, I would like to conclude
my presentation with this quote: You never change things
by fighting the existing reality. To change something,
build a new model that makes the existing model obsolete.
So when you introduce it, you may expect a ton of resistance or
skepticism. To be able
to move forward, instead of directly
going and making a pitch for the entire product, build that
initial MVP version and show the difference, and
you will get the buy in. I hope this helps.
Thank you very much for joining this presentation.
I would appreciate it if you take a minute to share your thoughts and questions.
Have a great time at Conf42. Thank you very much.