Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, thank you for joining us.
Our talk is about CICD and how a number of SRE and
DevOps practices are revolving around it.
Let me start with a quick introduction.
So this is pretty much who I am. For most of my
core career I have been in SRE and DevOps.
Just a small and shameless promotion: my book
on architecting cloud native serverless solutions is coming out this June.
If you are interested, please watch out for the release or connect with me.
Now let us move on to the talk.
SRE principles. I won't go into the definition of
SRE and DevOps. We are in an SRE conference and I'm pretty sure all
of you are aware of them. I'm also sure that we all
acknowledge the fact that SREs and SRE organizations come in
all shapes and forms, and what they do and how they do
things are mostly defined by their organization and their
organization's culture to a large extent.
But there are certain principles that all SRE organizations
draw from our foundational book that is the Google SRE book.
I will do a quick recap of those in this slide.
Embracing risk. SREs estimate the
cost of reliability, assess and manage the risk involved
in improving the reliability, and use the error budget smartly. SLOs,
or service level objectives. An SLO is a direct measure of the customer's experience
and hence it translates to the reliability of your service.
Toil. Eliminating toil is
an important pillar of SRE. Toil is the repetitive and
manual work required to keep your production services up and
running, and it can be eliminated using automation. Monitoring.
Well, nowadays we call it observability, and it is the key to measuring all
the critical vitals of your systems. Release engineering.
Standardizing the build and release of software into production. Automation.
Automating the repetitive work to improve developer velocity and productivity,
including building platforms.
Simplicity. We all work with complex distributed
systems. Breaking them down into simplified services is
key to managing a better infrastructure and bringing better reliability.
Now, there is a lot more to the SRE function than these seven
points, but these are the most important and the most often
prioritized ones.
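To make the error budget idea from the embracing risk and SLO points concrete, here is a minimal Python sketch. It assumes a hypothetical availability SLO measured over a fixed window; the function names, the 30-day window and the numbers are illustrative assumptions and are not from the talk.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over a window.

    slo_target is a ratio such as 0.999 (hypothetical example value).
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be a ratio between 0 and 1")
    total_minutes = window_days * 24 * 60
    return total_minutes * (1.0 - slo_target)


def budget_remaining(slo_target: float, downtime_minutes: float,
                     window_days: int = 30) -> float:
    """Fraction of the error budget still unspent (negative means overspent)."""
    budget = error_budget_minutes(slo_target, window_days)
    return (budget - downtime_minutes) / budget
```

With a 99.9% target over 30 days, the budget works out to roughly 43.2 minutes of downtime, so 21.6 minutes of outage would leave about half of the budget for further changes.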
DevOps practices. So in this slide we are going to look into the DevOps practices.
But unlike the SRE book, there is no one standard list
of recommended DevOps practices. So I have condensed some of
those commonly found practices into this list.
Communication and collaboration. By definition,
DevOps bridges the gap between traditional dev and ops teams.
Now, this is achieved through collaboration at all stages of the software development lifecycle.
Agile methodologies like Scrum and Kanban are very critical
in this phase. Continuous improvements.
So continuous improvements involve gradually rolling out small changes so that
the development teams can iterate their products and services fast.
And this involves tools and practices like continuous integration,
test driven development, continuous delivery, et cetera.
Monitoring. Here we see a recurring theme. We need
observability to assess whether the steps we take for continuous improvements
are fruitful, and we also need it to ensure that we
have sound production systems, and it gives us a continuous
feedback loop. Automation. Similar to
the SRE principle, DevOps also builds tools and platforms
for continuous improvements. Remember,
just like the SRE principles, this is not an exhaustive list,
but rather the most common and the most prioritized practices. And some of these points
could even be broken down further into their own lists.
Now that we have looked into both SRE and DevOps principles, let us
take a quick look into how they relate to each other.
Class SRE implements DevOps. Now, when we
discuss the relationship between SRE and DevOps, this is a statement that comes
up very often. The idea this statement projects is that
DevOps is a set of high level principles that should guide the SDLC
and the agile practices, and SRE implements many
of these principles by adapting them to distributed production
services. There are many key features and
key areas where SRE and DevOps align. For example,
both value collaboration and communication, and they use them to build teams and
set organizational culture, and both operate on a shared ownership
model along with the developers.
Both accept change as the medium for business and organizational
progress, and understand and accept the risk that comes
with the changes. So this is a key principle:
changes are necessary and changes can bring failures.
Change management and software release with the right tooling and controls, using CI
CD, is a critical piece of the entire software development lifecycle and hence
part of SRE and DevOps. Automation is
key for toil reduction and developer productivity. Building developer
tools as well as platforms for managing production services is part
of the automation initiative. Now, this is where platform
engineering also comes into the picture.
SRE and DevOps use data for decision making, along
with observability tools. Now, SRE focuses
mostly on SLOs while DevOps focuses on DORA metrics.
Now this is just a primary focal point, but observability covers
a lot more than this and is required for running production services.
There are more areas where SREs and DevOps practices align,
but this should give you a good idea.
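The DORA metrics mentioned above are commonly listed as deployment frequency, lead time for changes, change failure rate and time to restore service. As a rough illustration, the first three could be computed from deployment records as in the Python sketch below. The `Deployment` record, its field names and the `SAMPLE` data are hypothetical and not taken from the talk or from any particular tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    caused_incident: bool    # whether it led to a production incident


def deployment_frequency(deploys: List[Deployment], window_days: int) -> float:
    """Average number of production deployments per day in the window."""
    return len(deploys) / window_days


def median_lead_time_hours(deploys: List[Deployment]) -> float:
    """Median time from commit to production, in hours."""
    if not deploys:
        raise ValueError("no deployments to measure")
    return median(
        (d.deployed_at - d.committed_at).total_seconds() / 3600 for d in deploys
    )


def change_failure_rate(deploys: List[Deployment]) -> float:
    """Share of deployments that caused an incident."""
    if not deploys:
        raise ValueError("no deployments to measure")
    return sum(d.caused_incident for d in deploys) / len(deploys)


# Hypothetical sample: four deployments, one of which caused an incident.
_base = datetime(2024, 1, 1, 9, 0)
SAMPLE = [
    Deployment(_base, _base + timedelta(hours=hours), failed)
    for hours, failed in [(1, False), (2, False), (3, True), (10, False)]
]
```

An SLO, by contrast, is measured on the running service rather than on the delivery process, which is why the two views complement each other.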
Now if you observe these relations, and DevOps and SRE
in general, you will see that there is a recurring theme emerging
and that is change. Now these changes could be code,
configuration or infrastructure. Now why
is change important? Let's take a quick look.
So change is what brings business value. Any new features,
any stability improvements, and any other sort of changes, they all
work towards this one goal. Ultimately, if a change
doesn't bring value directly or indirectly to the business, that is
not a change worth pursuing. Now, does this mean that
change is always positive? Let us see.
So changes can lead to production outages,
bugs, SLA breaches and a lot more.
While all changes are well intended, they don't always bring value. Sometimes code
and config changes can introduce bugs or even cause production outages.
Now, the outages, or incidents as we call them in the SRE world, can directly
impact customers. Sometimes it can also set back the engineering team
by a few hours or even days while they are busy fixing
those incidents. Now, SRE builds incident management
practices to effectively deal with incidents. But in a software
system, as long as there are changes, the chances of them causing
incidents will also remain strong.
So if we need to avoid service disruptions and customer
impact as much as possible, we need to take one step back
to the stage before the code hits
production. Let us see how that is done.
So we roll out changes that have positive impact on our products or services.
But once we do decide that these are changes
that we need to push through, then we need to go into the discipline
of change management. Now, change management allows SREs
and developers to assess the risk of a change, evaluate the acceptable
risk, and roll it out to production.
Now, tracking of changes, along with their evaluation
and acceptance, is part of the necessary bookkeeping in change management.
Now, this is usually achieved using a software change management repository service
like Git, along with pull requests and peer reviews. And I'm sure
all of you are familiar with this workflow. Now, the SCM process
stops at this point. From here, SREs and DevOps have
to take it further with release management. Now, this involves testing,
building, and deployment.
Now this takes us to the central point in our talk: CI CD.
So the central theme of everything we discussed so far is change.
Now, with all that we covered about incident management, change management,
SRE and DevOps principles, all of this revolves around
changes. But how do we manage and roll out changes effectively?
The answer is obvious, and that is where CI CD comes into the picture.
It is the vehicle of all changes in your software infrastructure.
Now, SCM is the foundation of CI CD, but most
of the tooling for SCM is built around your issue tracker and code repositories.
CI CD will integrate with these tools, but it involves
a lot more than change management at a high level. It involves
verifying and testing the code, ensuring compliance,
building artifacts, and deploying the code into different environments.
And finally, that code will make its way into production if it is production
worthy.
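The stages just described (verify, test, check compliance, build an artifact, deploy) can be pictured as an ordered list of gates that a change must pass. The Python sketch below is only a toy model under that assumption: the stage functions, the keys of the `change` dictionary and the artifact naming are all hypothetical, and a real CI CD system would run these as jobs rather than as in-process functions.

```python
from typing import Callable, Dict, List, Tuple

Change = Dict[str, object]
Stage = Tuple[str, Callable[[Change], bool]]


def run_pipeline(change: Change, stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first one that fails.

    Returns (success, names of the stages that passed).
    """
    passed: List[str] = []
    for name, step in stages:
        if not step(change):
            return False, passed
        passed.append(name)
    return True, passed


# Hypothetical stage implementations, for illustration only.
def verify(change: Change) -> bool:
    return bool(change.get("reviewed"))


def run_tests(change: Change) -> bool:
    return change.get("tests_failed", 0) == 0


def check_compliance(change: Change) -> bool:
    return not change.get("license_violations")


def build_artifact(change: Change) -> bool:
    change["artifact"] = f"app-{change['commit']}.tar.gz"
    return True


def deploy(change: Change) -> bool:
    # Only a change that produced an artifact can be deployed.
    return "artifact" in change


STAGES: List[Stage] = [
    ("verify", verify),
    ("test", run_tests),
    ("compliance", check_compliance),
    ("build", build_artifact),
    ("deploy", deploy),
]
```

A change with failing tests stops after the verify stage and never produces an artifact, which is the sense in which only production-worthy code makes its way to production.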
In the title of our presentation, we called CICD the SRE DevOps
overlay. Now what do we imply with that? It is
the idea that a large spectrum of SRE and DevOps
responsibilities are influenced by or even a
byproduct of the changes, and the change cycle is managed
through CI CD pipelines. Now let us do a dissection
of various aspects of CI CD and how it relates to DevOps and SRE.
Before we move into those important points, recall that both SRE
and DevOps advise rolling out small incremental changes
to avoid impact and manage changes effectively.
Now, the only way to achieve this is through a fully automated
CI CD pipeline. Testing.
Now this is an obvious one, but we never do this sufficiently.
Functional testing improves our confidence in the functionality of our application.
Nonfunctional testing, on the other hand, improves our confidence
in the application's stability, scalability and security.
Now, most of the functional tests, like unit and integration tests, are automated as
part of the continuous integration phase, whereas most non-functional tests are automated
at various stages of deployment to various environments,
including production, staging, UAT and whatever else you might have.
Now, some non-functional tests like performance testing are also done
post production deployment, and this also
generates a lot of reports on code quality, test coverage, all those
things. While DevOps drives
the adoption of most of the functional tests, SRE concentrates on the non-functional
part of the equation. Now this is not a set boundary, but in
an organization that has both SRE and DevOps teams, this is how it
usually evolves.
Now there is no single standard for application configuration.
You could use any of these listed methods or combine many of them.
But remember, one of the key SRE principles is simplicity in
design, configuration, business logic, et cetera.
Whatever you choose, be consistent with it and make sure that
any changes to your configuration values can be versioned and audited.
Infrastructure
and configuration. There are different ways in which you can provision your
infrastructure and configure the application. In traditional IT infrastructure,
configuration management was the standard way to provision and configure
your servers and applications.
Now, if you have workloads running on bare metal or VMs, make sure to
use a configuration management tool and commit your recipes to version control.
Now, these recipes could be Ansible playbooks or they could be
Salt states in the SaltStack ecosystem, but you get the idea.
Now, the era of cloud management and Kubernetes has brought in newer
ways to provision and configure infrastructure.
Infrastructure as code allows you to declare your infrastructure components as code
and then have an automated system apply those changes to your infrastructure.
Network as code is a subset of this and applies the same principles
to network device configurations. Now there are
generic tools like Terraform as well as vendor-specific tools like AWS CloudFormation
that are used for infrastructure as code management.
But irrespective of the technologies used, this enables SRE
and DevOps to treat their infra resources as code and manage their lifecycle.
GitOps and CI CD. Now, the evolution of IaC,
or infrastructure as code, gave birth to another idea,
GitOps. It is a new philosophy for managing
systems and resources. Now, this philosophy can be broken into the following
elements. Desired state. The declaration of the
desired state defines the end state of
your infrastructure and its resources. Now, irrespective of the
tech used to declare these resources, the changes should
be versioned and immutable, which naturally leads us to storing
the declarations in Git. Now there will be system
specific software agents that will pull any changes automatically and apply them
to the destination system. These same agents will also watch the state
of the system in real time and reconcile any drift to
the desired state.
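The reconcile step described above can be sketched as a comparison between the declared state and the observed state. The Python below is a minimal illustration under simplifying assumptions: both states are plain dictionaries keyed by resource name, and the function names `plan` and `reconcile` are hypothetical. Real agents talk to the target system's API instead of mutating a dictionary.

```python
from typing import Dict, List, Tuple

State = Dict[str, dict]  # resource name -> declared or observed spec


def plan(desired: State, actual: State) -> List[Tuple[str, str]]:
    """List the actions needed to move the actual state to the desired state."""
    actions: List[Tuple[str, str]] = []
    for name, spec in desired.items():
        if name not in actual:
            actions.append(("create", name))
        elif actual[name] != spec:
            actions.append(("update", name))
    for name in actual:
        if name not in desired:
            actions.append(("delete", name))
    return actions


def reconcile(desired: State, actual: State) -> State:
    """Apply the plan to a copy of the actual state and return the result."""
    converged = dict(actual)
    for action, name in plan(desired, actual):
        if action == "delete":
            del converged[name]
        else:  # create or update: take the declared spec
            converged[name] = dict(desired[name])
    return converged
```

Running the agent repeatedly is safe in this model: once the states match, the plan is empty and nothing is changed, which is what correcting drift amounts to.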
Now, GitOps can be implemented in a number of ways. As long as you
have a Git-based version system that allows either pull or push based notification,
that should be enough for agents to discover the changes and apply them.
But what is the most natural way to do this? Of course
it is CI CD. If you commit the changes to the desired state into
Git, the changes can be picked up by a CI CD system, which can
treat them like any other application code and proceed to apply those
changes. Now, this is quite convenient and uses all the tools
and workflows that SREs use in application deployment. GitOps
and the X-as-code revolution. Now, the evolution of
GitOps and infrastructure as code brought out more practices that can
be declared as code and implemented.
Policy as code allows you to enforce organizational and security
policies on your code, configuration and resources.
SLO as code. Defining and tracking
SLOs for a large number of services is very tricky.
Codifying them as SLO as code makes it much easier.
Dashboard as code is something that has existed for a while. You might have
seen this with Grafana and similar tools. Now similarly
you could configure your observability sidecars also using code.
Even the CI CD pipelines themselves can be defined as code and this
will help you when you onboard new projects.
Now there is more to this list, but this is how you bring SRE and
DevOps through CI CD.
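As one possible illustration of SLO as code combined with a policy check, the Python sketch below treats each SLO as a small data record that lives in version control and is validated by a CI step before it is accepted. The record keys (`service`, `sli`, `objective`, `window_days`) and the rules are assumptions made for this example and do not follow any specific SLO specification.

```python
from typing import List

# Hypothetical schema for an SLO record kept in the repository.
REQUIRED_KEYS = {"service", "sli", "objective", "window_days"}


def validate_slo(slo: dict) -> List[str]:
    """Return a list of policy violations for one SLO record (empty if valid)."""
    missing = REQUIRED_KEYS - set(slo)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    errors: List[str] = []
    if not 0 < slo["objective"] < 1:
        errors.append("objective must be a ratio between 0 and 1")
    if slo["window_days"] <= 0:
        errors.append("window_days must be positive")
    return errors


def validate_all(slos: List[dict]) -> bool:
    """CI gate: pass only if every SLO record is valid."""
    return all(not validate_slo(slo) for slo in slos)
```

A pipeline step could call `validate_all` on the committed records and fail the build when it returns `False`, so a malformed SLO never reaches the tracking system.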
So, to summarize what is discussed so far, a large number of DevOps and
SRE practices can be implemented via CICD.
This helps standardize the best practices and improve the reliability
posture of your services. Make sure that your CICD
evolves to accommodate these practices as your SRE and engineering maturity grows.
So there are a lot of advanced SRE DevOps practices with CI CD.
And in the second part of this talk Garima will discuss the more advanced concepts
and a futuristic look into CICD. But before we
go into that, I would like to take a minute to talk about our mutual
association with CDF, the continuous delivery foundation.
So CDF is a foundation under the Linux Foundation umbrella, similar
to the CNCF. CDF is a vendor neutral body
that hosts a number of popular open source projects in the CI CD space.
Its mission is to bring together vendors, developers and end
users to advance the standardization and best practices of
CI CD. Now this is the official definition of
what CDF is, but the next slide might give you a better idea.
These are the projects that are currently managed by CDF.
Jenkins, Jenkins X, Screwdriver, Spinnaker and Tekton power
CI and CD pipelines. Pyrsia is a decentralized package
network based on blockchain. Ortelius is
a microservice catalog with supply chain intelligence and domain driven design
support. Shipwright is a framework for building container
images, and CDEvents is a common specification
for continuous delivery events. I'm sure this would give you a
better idea of what CDF is. Garima and I are
community ambassadors of the CDF and work towards
better community adoption and standardization of our projects
as well as the general CI CD best practices. Over to
you, Garima. Thank you everyone.
Hello everyone,
I'm Garima Bajpai. I'm here to talk about continuous
integration and delivery, the SRE DevOps overlay.
This is a joint topic which we have taken up together,
Safir and myself. You must have
a good view of what the topic is all about through Safir.
Now I would like to talk about it from a high level,
futuristic perspective. Why should you care
about this topic? So before we actually get going and
get started on the conversation I'm going to have with you today,
I would like to introduce myself. I'm Garima Bajpai, based
out of Ottawa, Canada. I am the founder
of the DevOps Community of Practice here in Canada, which has
several chapters. It has around 1500
members and it is at various
locations. If you have not checked out this community of practice,
I do recommend doing that. I'm also the chair
of the Continuous Delivery Foundation Ambassador Group,
which is a group of practitioners primarily
in the continuous delivery space which is fostering change,
evolution and the future perspective of continuous delivery
when it comes to open source technology and tools.
I'm also a course creator and content provider with various
affiliated organizations. If you want to check out my work,
you can go to DevOps Institute as well as the Continuous
Delivery Foundation for references, content
and courses created primarily on DevOps and SRE.
I am also nominated for the DevOps Dozen
2022 Community Awards for top DevOps
Evangelist, and one
of my core assignments for this year
is that I will be publishing a book on strategizing
continuous delivery in the cloud with Packt. It is coming out in
July. If you haven't checked that out, I would
recommend doing that as well. It is available on Amazon for
pre-ordering. So now I get started
with site reliability engineering and DevOps.
It is amazing and hard to believe that we only started a decade
ago with all this, right? With most of
the practices and concepts which are in the continuous delivery space,
there's exponential growth of tools. There is also
increasing complexity. It is inevitable that
the complexity is increasing with the rapid adoption,
and it is bringing new operational challenges,
risks, overhead and cognitive load on the practitioners.
Moreover, cost. With more and more services and applications
moving to the cloud, financial practices and engineering practices
are getting highly integrated. So if you have
not heard of FinOps or cost optimization drives
in big and small organizations, I think it's high time to
check out that movement as well. And lastly, I would
say skill shortage. There are many aspects of learning where people
are falling behind due to lack of self development, leading to a
major skill shortage. So there
are challenges and opportunities for both sides,
DevOps and site reliability engineering. But before
we actually move forward, we would also like to understand
how we reached this stage. So if you think about
how these two movements got started,
from a DevOps perspective, the DevOps practices were primarily
developer centric. It was
a push for developer productivity,
and obviously the evangelists started
looking at how we shorten the lead time for
our software delivery or incremental deliveries. And that's where
a lot of practices got introduced and adopted.
Whereas there was a specific set of industry practitioners and evangelists
who were looking at how you bring an enhanced reliability
posture when you talk about decentralizing, or when you
talk about bringing DevOps practices to the table with flow,
feedback and experimentation.
So there was a constant push on how
you bring customer experience, reliability posture
and increasing stability along with
the increase in developer productivity in the same
context. So we see that these two mutually reinforcing
practices have been in the making for several years now,
and we are at a stage where we would like to
discuss how site reliability engineering and
DevOps would be
pivoting to an optimal operating model.
To fully realize the potential of, let's say, DevOps at scale,
the integration of SRE practices is essential. Everybody agrees
on that today. Balancing investment in tools and
upskilling for reliability vis-à-vis rapid innovation
is needed to bring that optimal operating model
into place. And could continuous delivery
be that common core, the common trigger,
the binding force behind that optimal operating model?
Let's explore more. So, before we
actually go further into this conversation, it is also important
that we talk about the law of diminishing returns.
And why do we do so here in this
talk? It is because if you think about DevOps practices
or SRE practices, there's a certain set of
output which is envisioned, which is expected,
or which is aligned to the business goals. And there's a
substantial amount of time and effort needed as
an input to steer that, right? But the
more we actually move towards incremental
deliveries or enforcing these practices,
we will realize that there is a point where
we reach the point of maximum yield.
From then on,
however much input we provide,
with time, with people, with effort,
with practices, with tools,
we have reached the maximum yield point. And then
from there our productivity would be dropping
to negative returns because of
the complexity which is getting introduced, the number
of tools which we have adopted,
and also due to other factors which we have talked about,
like cost and skill gap. So we need
to understand that there is an optimal
operating model which needs to be
put in place for individuals, communities and systems
to be sustainable. What can
that optimal operating model be? And how do
we know, or how do I know, that my organization
is ready for that SRE DevOps overlay which
we are talking about here, which can bring that optimal operating
model into existence?
So there are a few things which we can do
as individuals, as teams, as organizations,
and these are questions which we will probably have to ask ourselves.
So, first of all, the main business objective:
why are we deploying and what is
our mission and vision? What kind of applications are we
deploying? Are they monolithic or microservices? Where are we deploying these
applications? What are the core objectives for the organization? How many cloud
providers are involved? How could we keep them converged?
And lastly, how often do I want
to deploy and why? If we start answering
these questions, we will come to a point where we would be
able to assess or analyze our state of the nation
from an organizational, individual or community perspective,
whether we are ready for that DevOps SRE overlay, and
start talking about the common
core. The common core is continuous delivery and associated
practices. And when we talk about that
common core, essentially we are talking about four principles.
The first is declarative. The entire system has to
be described declaratively. The second
principle is versioned and immutable. The canonical
desired state is versioned, and it does not matter which tool
it is. The third principle is pulled automatically.
So how much operational overload do we
have in the system? If we pull changes
automatically and apply them to the system, that would be
one of the principles which can help you get to that optimal
operating model. And the fourth point is continuously
reconciled: software agents ensure correctness and
alert on divergence. So if you think
these four principles make sense for your delivery,
let's continue this dialogue.
So we have reached a point where we have
built some consensus on the common core. Continuous delivery
and associated practices like continuous integration
and deployment can provide that common
core for the SRE and DevOps overlay.
Now, we also have,
I would say, a challenge in terms of how we measure
this progress. And primarily, if you
think about it from a measurement perspective, a lot of things have
been done, and mostly we talk about industry
practices or best practices around DORA.
And I would also like to highlight through my
talk that if you are looking at that
common core, the optimal
operating model, it's time for you to go beyond
DORA, and I would highlight some of the
functional advancements which are associated with this.
The first is the functional advancement of supercloud, sometimes referred to as
cross cloud, providing interoperability
specifically for continuous delivery capability. So we'll have
to assess that posture moving forward. Using
an event driven approach and introducing a high level of
reusability, flexibility and full stack interoperability
for the complete software lifecycle is
the second one. The third functional advancement I
would emphasize is progressive delivery with machine learning capability,
reducing the challenge of adopting progressive
delivery. The fourth bullet item, which I
call the silver bullet, is SBOM for the future.
As SBOM, the software bill of materials, continues to evolve,
so does the framework for data exchange and the need for a standard format.
So these are the functional advancements which you can correlate
to your measurement of success in getting
to that optimal operating model.
Now I would highlight SBOM
as one of the core foundational capabilities
which will help you manage the complexity and security
of modern software deployment. Gartner believes that
in 2025, 60% of the organizations
procuring mission critical software solutions will mandate SBOM disclosure
in the license and support agreement, up from less than 5%
in 2022. It is essential to make
it a measurable part, a tangible part
of your delivery as you go along. There are
operational advancements as well, which we should look at for
measuring, portability being one of them.
There is a variety of languages, platforms and frameworks being
used today. How do we make portability
one of the measuring criteria for our software delivery
components, to reduce the cognitive workload
for not only developers but also for SRE practitioners?
Flow optimized and observable deliveries. So whenever
we are going to decentralize our delivery, we go from monolithic
to microservices. We have a distribution management system.
We need to have a flow optimized and observable system.
So have you introduced that component in your software delivery?
That can also be one of the measurable aspects of the
common core. I will also talk about resource optimized
and resilient. That means an optimized posture of infrastructure.
It is not okay to add fixed cost to your products as
you go along, so you will have to look at that optimization
at the infrastructure layer. And lastly,
I would say real time and dynamic. So how real time and
dynamic are your software delivery capabilities,
which can support rapid scale up, scale down and address
the real time requirements?
When we talk about all these changes, we also talk
about infinite bandwidth and zero latency in the
demand and supply value chain system. So we need to
ensure that we have some supporting,
measurable general purpose
pipelines and services which can serve
our fundamental needs and can be reusable.
We cannot forget small and medium business services.
So think small before you think big. And high
performance community collaboration hubs are
mostly needed for entrepreneurs,
for evangelists, for practitioners to foster
collaboration and create a possible futuristic
approach on all this.
I will not talk more about this, but
state-of-the-art AI capabilities, going beyond
cloud native to edge native, full stack interoperability
and enhancing reusability can be some
of the key areas of focus
for the next generation practitioners.
When we talk about the overlay of DevOps and SRE,
we also think about net zero commitment, and SRE can
lead the way there. One of the critical threads which ties
everything back together is net zero commitment. SRE can
lead the way. And what if we can create a marketplace of
carbon neutral products and services where we can cascade the impact?
If we intend to do so, we might have
to consider future evolutionary changes, and some of the next steps
could be as follows: mapping your carbon footprint for
products and services and identifying hotspot features, guidelines
for financing of ICT products and services, and
certification and decarbonization of ICT services and
products through tools and processes. So we
have tried to ensure that we bring the here
and now aspects of the SRE DevOps overlay through
Safir's conversation, and also tried to show
what is in store for the future from an optimal operating model
perspective, when we talk about the SRE DevOps overlay and how
the continuous delivery stack can create
that pivot or create that binding
together perspective around all this
for the future. If you liked this conversation, do follow me
on LinkedIn or get in touch with me or connect
through our social handles. But for
now I would say goodbye and thank you for
listening to our talk.