Transcript
Welcome everyone to my talk on securely unifying deployments in an
organization for increased governance.
I'm Hariharan and today we'll explore how secure deployment practices can
enhance governance within organizations.
Let's get started.
A quick introduction about myself.
I'm currently a software engineer at AMD and have previously worked at Athena
Health, Bain Capital and Bose Corporation.
My interests span DevSecOps, Distributed Systems, Applied AI and Embedded Systems.
Feel free to connect with me on LinkedIn or check out my GitHub repository
for more insights into my work.
So here's an agenda for today.
We'll start with the organizational journey and then discuss
the importance of security.
Next, we'll dive deep into the unified deployment model we implemented, look at case studies and governance practices, and wrap up with future directions and key takeaways.
So let's begin.
Organizational journey.
The deployment landscape before the unification was fragmented and posed several challenges, including inconsistent processes and vulnerabilities.
This drove us to embrace a DevSecOps approach to unify deployments and address these challenges proactively and comprehensively.
If you look at this slide, with multiple teams working on deployments, each team had its own strategy to deploy artifacts or containers.
Some teams deployed to EKS, some to ECS, and some had their own homegrown deployment strategies.
We had some metrics and monitoring tools to track the number and type of deployments, but we realized that we did not have a unified way to govern all these deployments, set rules or strategies, or address vulnerabilities, and so on. That's what drove us to the DevSecOps journey.
Our DevSecOps journey focused on shifting security to the left by integrating it into CI/CD pipelines.
Collaboration between developers, security, and automation teams was essential to achieve security across the software supply chain, commonly referred to as the SSC, especially with an increasing reliance on open source software.
There was a rapid increase in the usage of open source software, and the number of open source packages and versions released per year was also rapidly increasing.
So let's try to really understand: why is security so important? What is the need for security in deployment pipelines? And how does all this help with governance?
First, security is crucial because modern applications heavily depend on open source software and third-party components at every stage of the SDLC.
If you take a look at the next slide, at every stage of the SDLC, third-party components play a pivotal role.
This dependency exposes us to vulnerabilities, and addressing them proactively is non-negotiable.
So that's a growing attack surface.
When you look at the attack vectors in deployment pipelines, they can be supply chain attacks, secrets management attacks, or misconfigurations such as granting excessive permissions or using unverified images.
For example, a real-world incident like Log4j had tremendous impact.
So the idea is to follow some key security practices by ensuring that we care about security at each stage of the SDLC.
For example, we can have immutable infrastructure and container scanning; we can have RBAC, role-based access control rules; we can have digital signing of artifacts; and we can have automated compliance checks.
By shifting security to the left, we can run SAST and DAST scans at very early stages of the pipeline.
And finally, we should also inculcate a DevSecOps culture across the entire organization.
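To make one of those practices concrete, here's a minimal sketch of an automated compliance gate in a CI/CD pipeline, written in Python. The report file name and its JSON layout are assumptions for illustration, not the output format of any particular scanner.

```python
#!/usr/bin/env python3
"""Hypothetical CI gate: block a deployment when a container scan report
contains more critical or high findings than policy allows."""
import json
import sys

# Example policy: no critical or high severity findings may pass.
SEVERITY_LIMITS = {"CRITICAL": 0, "HIGH": 0}


def gate(report_path: str) -> int:
    with open(report_path) as f:
        findings = json.load(f)  # assumed shape: [{"id": ..., "severity": ...}, ...]

    counts = {}
    for finding in findings:
        severity = str(finding.get("severity", "UNKNOWN")).upper()
        counts[severity] = counts.get(severity, 0) + 1

    violations = {
        severity: counts.get(severity, 0)
        for severity, limit in SEVERITY_LIMITS.items()
        if counts.get(severity, 0) > limit
    }
    if violations:
        print(f"Blocking deployment, severity limits exceeded: {violations}")
        return 1
    print("Scan gate passed")
    return 0


if __name__ == "__main__":
    sys.exit(gate(sys.argv[1] if len(sys.argv) > 1 else "scan-report.json"))
```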
The software development lifecycle begins when anything enters the ecosystem and ends in production. Threats like CVEs, malicious packages, and supply chain attacks underline the need for robust reduction and remediation strategies.
But when you zoom out and look at it at a very high level, all these threats that come into the SDLC can be classified as source threats, dependency threats, or build threats.
Some examples of source threats are submitting an unauthorized change, compromising the entire source repository, or building from a modified source, for example a forked repo.
Dependency threats are essentially using a compromised dependency.
Build threats can be a compromised build process, a compromised package, or someone inadvertently uploading a modified package, and so on.
But the current approach towards security as a whole is to detect and remediate. The idea that should trickle down is to prevent, so that it doesn't even happen in the first place.
So let's dive a bit deeper into understanding vulnerabilities and CVEs, and then how we could potentially address them.
The rate of public CVEs is rapidly increasing every day, creating constant pressure on dev and security teams. This data is publicly available, and the irony is that many of the critical CVEs that are reported are not fully exploitable.
Only a fully exploitable CVE is really dangerous, but the majority of CVEs are not exploitable at all.
For example, if you look at the screenshot here, you can notice the message saying that the vulnerability has been modified since it was last analyzed and is awaiting reanalysis, which may result in further changes.
In the last few years, given the rise of ML models and generative AI, even machine learning models can be weaponized.
Public repositories for models have become targets where attackers embed malicious code that executes when the model is loaded.
This shows the sophistication of modern threats and the importance of caring about security at every stage of the software development lifecycle.
For example, take a look at this model called "totally harmless model", named that way for the irony. It looks like a perfectly legitimate model, but when the model loads on the machine, the malicious code actually executes. The malicious code is hidden in the binary data.
Generative AI tools are now introducing vulnerabilities as well, such as when they hallucinate and a developer inadvertently installs malicious packages.
For example, if you take a look at this flow: the attacker asks ChatGPT, or any equivalent generative AI tool, a question, and when it hallucinates, it responds with a non-existent package. The attacker then decides to publish that package, with malicious code in it, into the respective registry.
At a later point in time, a developer asks the same question, the generative AI model replies with the same malicious package name, and the developer goes and grabs it from the registry.
This flow of injecting malicious packages and affecting developers highlights the urgent need for secure coding practices and dependency management.
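One defensive measure against this pattern is sketched below, under the assumption that the organization maintains a vetted allow-list of package names (allowlist.txt is a hypothetical file): nothing an AI assistant suggests gets installed unless it is already on that list.

```python
"""Minimal sketch of a dependency allow-list check before installation."""
import sys


def load_allowlist(path: str = "allowlist.txt") -> set[str]:
    # One vetted package name per line; blank lines are ignored.
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}


def check(packages: list[str]) -> int:
    allowed = load_allowlist()
    rejected = [p for p in packages if p.lower() not in allowed]
    if rejected:
        print(f"Refusing to install unvetted packages: {rejected}")
        return 1
    print("All requested packages are on the allow-list")
    return 0


if __name__ == "__main__":
    sys.exit(check(sys.argv[1:]))
```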
The next natural question that comes to mind is: can generative AI tools write secure code? Can they ensure that the code they write is itself devoid of vulnerabilities?
On the left, if you ask ChatGPT or any other generative AI tool to write an endpoint that returns a file, the generated code is vulnerable to path traversal. And even if you explicitly ask it to write a secure endpoint, it still doesn't help, because the result is still vulnerable.
So we should really be careful when using generative AI tools as well, and analyze these scenarios from a security posture.
Taking a look at the various software supply chain threat types: supply chain vulnerabilities include known CVEs, zero-day attacks, and even human errors leading to malicious injections.
Understanding and mitigating these risks is vital for building secure systems. We'll talk more about this in the upcoming slides, but how did all this happen?
By this point, it must be pretty evident that the code a particular developer writes probably contains the code artifact, some packaging material, and some security aspects as well. But the other stuff, the dependencies that are pulled in during the build, can definitely have a lot of vulnerabilities.
Moving on to the next slide. So what can we do better?
To combat these issues, I've put together a seven-step process: we educate ourselves, we don't rely solely on public reports, we manage our dependencies and permissions, we continuously maintain our code base, we regularly scan our libraries and packages, and we optimize our continuous integration and continuous deployment (CI/CD) pipelines with regular scanning.
All of these steps are equally critical, and I would say they are the first step in the right direction because they help in combating common vulnerabilities.
If you take a look at common vulnerabilities, they are cross-site scripting, SQL injection, and CSRF.
Organizations must prioritize awareness and adopt guidelines, such as those provided by OWASP, to mitigate those risks.
But this also sets the path, or rather rewrites, how developers operate on a day-to-day basis.
This diagram underlines the importance of declaring dependencies as we write, build, and run code.
If you see, the first and third columns are just highlighting the fact that we have to declare dependencies before writing code and before building code, and only then is code run and actually deployed.
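As an illustration of enforcing declared dependencies at run time, the sketch below compares the installed Python distributions against a pinned requirements file; the file name and the name==version format are assumptions for this example.

```python
"""Illustrative check that what is installed matches what was declared."""
from importlib import metadata


def check_pins(requirements_path: str = "requirements.lock") -> list[str]:
    mismatches = []
    with open(requirements_path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue  # skip comments and unpinned entries
            name, pinned = line.split("==", 1)
            try:
                installed = metadata.version(name)
            except metadata.PackageNotFoundError:
                mismatches.append(f"{name}: not installed (expected {pinned})")
                continue
            if installed != pinned:
                mismatches.append(f"{name}: installed {installed}, pinned {pinned}")
    return mismatches


if __name__ == "__main__":
    for problem in check_pins():
        print(problem)
```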
Diving a bit deeper into software supply chain threat types: CVEs are generally reported on vulnerabilities, which can be intentional or unintentional.
Unintentional vulnerabilities are security bugs; intentional vulnerabilities are a backdoor into the system. Malicious payloads, on the other hand, are a separate threat type.
This distinction is critical, and we'll talk more about it in the upcoming slides.
So how can we shift left with developers?
This screenshot is from one of the previous organizations where I worked; we extensively used Artifactory and its ecosystem.
Artifactory comes with a tool called Xray, and Xray has its own plugin that can be installed in IDEs.
I actually installed the Xray plugin in my PyCharm and CLion IDEs, and as I write code, the Xray plugin looks out for vulnerabilities in my code and reports them in my developer console so that I can fix them before actually checking my code into the code repository.
Another big component in ensuring security and governance during unified deployments is the software bill of materials.
So what is a software bill of materials? Let's try to understand what that is.
A Software Bill of Materials (SBOM) is like a list of ingredients that makes up what's inside the software, and different people look at the SBOM with a different lens.
It includes libraries and modules, which can be open source or proprietary, free or paid, and it can also contain data that's widely available or access restricted. It will also contain characteristics like settings, versions, environment variables, and so on.
As I mentioned, there are multiple consumers of SBOMs.
For those who produce software, SBOMs are used to assist in the building and maintenance of their software, including its upstream components.
For those who purchase software, SBOMs are used to inform pre-purchase assurance and can help in negotiating discounts, and so on.
For those who operate software, SBOMs can inform vulnerability management, asset management, and license and compliance management, and help quickly identify software or component dependencies and supply chain risks.
The overall benefits of SBOMs: they help in identifying, mitigating, and avoiding known vulnerabilities, and in quantifying and managing licenses.
They also enable quantification of the risks inherent in a software package.
In the long run, they can also contribute to lower operating costs because of improved efficiencies and reduced unplanned and unscheduled work.
Overall, an SBOM provides comprehensive information about the software package, its environment, and the setting it's actually being used in.
But we'll talk about how SBOMs play a critical role in the unified deployment pipeline as well, or rather how they fit into it, when we look at a detailed diagram.
So this was the diagram I was talking about. This is one of the architectures we had implemented at one of the previous places I'd worked.
Since multiple teams had different deployment strategies, like I previously mentioned, when we started talking about Unified Deployment Models there was a movement to ensure that all teams deployed to EKS, or Kubernetes-style deployments in general.
We created this flow where a developer checks in a piece of code. Of course, while checking in the code, the developer has an IDE furnished with the Xray plugin that can detect vulnerabilities at a very early stage.
The code is checked into the application code repository, which can be GitHub or Bitbucket, and once code is checked in, our CI tool, Jenkins, which has a webhook listener on the application code repository, starts running the CI build for that branch.
Once the CI build completes, it publishes an artifact into the container registry; here we use Artifactory. Then, depending on the environment we would like to deploy that artifact to, it runs Xray scans based on rules that we have set for that environment.
Once the scans finish, an SBOM is generated, which contains all the details I previously mentioned and informs the developer and the team whether that particular artifact is deployable or not, or what we could do with that artifact.
Assuming everything is great with that artifact, there is a deployment orchestrator as well. You can visualize the deployment orchestrator as the choreographer of all deployments.
The deployment orchestrator has a webhook listener as well, and once the SBOM is generated, it kickstarts the act of deployment.
The deployment orchestrator then goes and talks to a compliance advisor service. The compliance advisor service here is an Open Policy Agent that has a set of rules for any kind of deployment. These rules are written in a language called Rego, and any deployment across the org has to adhere to these rules.
So the deployment orchestrator, our choreographer of deployments, goes and asks the compliance advisor: hey, can I actually deploy this artifact to this environment, is it following all the rules?
Once it gets a response, and assuming the response says you can deploy, it begins checking in the Helm release manifest, or the appropriate Kubernetes manifest, to the GitOps repository.
Once that manifest gets checked in, we typically use Argo CD or Flux CD to manage our deployments. Argo CD or Flux CD constantly polls the GitOps repository, every 30 or 60 seconds, and applies that manifest onto the cluster, by which resources actually get deployed into AWS, Azure, or any other environment the team cares about.
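To make the orchestrator's role a bit more tangible, here is a minimal Python sketch of that decision step, assuming an OPA server reachable at localhost:8181 with a policy published under deploy/allow, and a local GitOps checkout; the policy path, input fields, and repository layout are all illustrative, not the exact implementation we used.

```python
"""Illustrative deployment-orchestrator step: ask OPA for a decision,
then promote the artifact by committing a manifest to the GitOps repo."""
import json
import pathlib
import subprocess
import urllib.request

OPA_URL = "http://localhost:8181/v1/data/deploy/allow"  # assumed policy endpoint


def deployment_allowed(artifact: str, environment: str) -> bool:
    payload = json.dumps({"input": {"artifact": artifact, "environment": environment}})
    req = urllib.request.Request(
        OPA_URL, data=payload.encode(), headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        decision = json.load(resp)
    return bool(decision.get("result", False))


def promote(artifact: str, environment: str, gitops_repo: str = "gitops") -> None:
    if not deployment_allowed(artifact, environment):
        raise SystemExit(f"Compliance advisor rejected {artifact} for {environment}")

    # Stand-in for a real Helm values update; Argo CD / Flux CD picks up the change.
    manifest = pathlib.Path(gitops_repo) / environment / "values.yaml"
    manifest.parent.mkdir(parents=True, exist_ok=True)
    manifest.write_text(f"image: {artifact}\n")

    rel_path = str(manifest.relative_to(gitops_repo))
    subprocess.run(["git", "-C", gitops_repo, "add", rel_path], check=True)
    subprocess.run(
        ["git", "-C", gitops_repo, "commit", "-m", f"Deploy {artifact} to {environment}"],
        check=True,
    )
    subprocess.run(["git", "-C", gitops_repo, "push"], check=True)


if __name__ == "__main__":
    promote("my-service:1.2.3", "staging")
```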
This is a very zoomed-out, high-level flow, but as we dive deep into every component of interest in this diagram, there are multiple layers of policies that map to security, governance, unification of deployment, and so on.
For example, if you zoom a bit into Artifactory, the folder structure is neatly organized based on the environment the artifact should be deployed to. Each folder tightly follows the RBAC policies that have been set.
And access to Artifactory itself is highly restricted and managed by IAM policies and credentials that also have restrictive access, and so on.
The point I'm trying to drive home is that this is a very zoomed-out view, and there are multiple layers to this diagram as we dive deeper.
Moving on to the next slide, let's talk a bit about governance and how it fits into this entire puzzle, or rather, if you visualize this as Lego pieces, how all the pieces fit together.
Governance is not just a buzzword; it's about ensuring that every deployment aligns with security standards and has clear ownership. This minimizes risks and streamlines operations.
On one side, unified deployments help ensure governance by incorporating auditability, compliance, and accountability into the pipelines, as we saw in the previous diagram. The unified deployment framework allows us to detect issues early and enforce standards effectively.
Governance plays a pivotal role in ensuring that unified deployments align with organizational and regulatory requirements.
Let's talk about some key components of governance, along with some real-world examples of how they are actually implemented or enforced.
When we talk about governance, we have to talk about auditability, compliance with standards, clear ownership and accountability, centralized logging and metrics, policy enforcement, and continuous monitoring.
When we talk about auditability, it's about the ability to track who performed specific actions, what changes were made, who made those changes, and when they occurred.
Why does auditability matter? It provides transparency in the deployment process, aids in troubleshooting issues by tracing errors to specific changes, and it's essential for regulatory and internal audits.
Recollecting from memory, a typical example: deployment logs are stored in an immutable data source, for example Amazon S3 with versioning, and linked to specific commits in GitHub. So these are auditable.
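For instance, here is a minimal sketch of writing such an audit record, assuming a versioned S3 bucket (the bucket name, key layout, and record fields are illustrative) and boto3 with AWS credentials available in the environment.

```python
"""Sketch: append a deployment audit record to a versioned S3 bucket."""
import datetime
import json

import boto3


def record_deployment(actor: str, artifact: str, environment: str,
                      bucket: str = "org-deployment-audit") -> str:
    entry = {
        "actor": actor,
        "artifact": artifact,
        "environment": environment,
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    key = f"deployments/{environment}/{entry['timestamp']}.json"
    # With bucket versioning enabled, every put creates a new object version,
    # so past records cannot be silently overwritten.
    boto3.client("s3").put_object(Bucket=bucket, Key=key, Body=json.dumps(entry))
    return key


if __name__ == "__main__":
    print(record_deployment("hariharan", "my-service:1.2.3", "staging"))
```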
Now let's talk about compliance and standards.
The reason compliance and standards matter is that regulatory frameworks like GDPR and HIPAA mandate adherence to specific security and privacy standards. Non-compliance with these standards can result in hefty fines, reputational damage, and even legal repercussions.
So it's essential that we encrypt sensitive data and ensure there are retention policies for logs and other artifacts, aligned with the regulatory requirements.
There are tools available to ensure this happens; for example, HashiCorp Sentinel or AWS Config rules can automate compliance validation.
A real-world example would be Kubernetes admission controllers enforcing SOC 2 policies, ensuring containers are scanned and certified before the actual deployment takes place.
Moving on, let's talk about clear ownership and accountability.
This is very evident from the name itself: it assigns very clear roles to ensure unauthorized changes do not happen, so that overall accountability is ensured.
We do that by setting RBAC, role-based access control, rules and by defining ownership using IAM policies or Kubernetes namespaces.
For example, one of the ways this is enforced is that developers have access only to staging environments, while operations teams handle production and production deployments.
And then we have three more: policy enforcement, centralized logging, and continuous monitoring.
For policy enforcement, as shown in one of the previous diagrams, we had a separate tool called the Compliance Advisor Service. Policy enforcement is typically done through Open Policy Agent, which enforces policies across Kubernetes clusters, CI/CD pipelines, and APIs.
We can also have AWS Config, which continuously assesses configurations against a given set of compliance rules, and, as previously mentioned, Kubernetes admission controllers that validate deployments at runtime.
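To give a flavor of what such an admission check could look like, here is a minimal validating-webhook sketch in Python using Flask; the approved registry prefix is an assumption, and a real deployment would additionally need TLS, a Service, and a ValidatingWebhookConfiguration wired up in the cluster.

```python
"""Minimal validating admission webhook: reject Pods with unapproved images."""
from flask import Flask, jsonify, request

app = Flask(__name__)
APPROVED_REGISTRY = "registry.internal.example.com/"  # assumed trusted registry


@app.route("/validate", methods=["POST"])
def validate():
    review = request.get_json()
    pod = review["request"]["object"]
    images = [c["image"] for c in pod["spec"].get("containers", [])]
    offending = [img for img in images if not img.startswith(APPROVED_REGISTRY)]

    # Respond with the AdmissionReview verdict the API server expects.
    return jsonify({
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": {
            "uid": review["request"]["uid"],
            "allowed": not offending,
            "status": {"message": f"Unapproved images: {offending}" if offending else "ok"},
        },
    })


if __name__ == "__main__":
    app.run(port=8443)  # real webhooks must be served over TLS
```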
With regard to centralized logging, some teams have their own single source of truth to aggregate logs and metrics, but the most commonly used tools are Datadog, AWS CloudWatch, or a typical ELK stack.
For example, in a Kubernetes setup, logs from the Kubernetes pipelines are aggregated using the ELK stack, providing a single pane of glass for analysis, dashboarding, generating metrics, and so on.
I also put the final one at the top layer, which is continuous monitoring and feedback loops.
If you look at the diagram, it's like a top-down approach, and the top one is probably one of the most important: governance should not be a one-time activity; it should be an ongoing process.
We need feedback loops where we regularly review governance policies and adapt them based on changing requirements and environments, and we have to incorporate feedback from security teams, developers, and auditors to refine the overall process.
With regard to continuous monitoring, we can use tools like Prometheus, which monitors resource utilization, while Datadog and others alert on unusual deployment patterns. Moving on to the next slide.
I hope that painted a picture of what we mean by governance and how it fits into unified deployment pipelines and the whole security angle.
So what is our way forward, or the future direction we want to take? Looking ahead, we can enhance DevSecOps pipelines by integrating threat modeling, container security, and continuous feedback loops.
These make security an intrinsic part of every stage of the SDLC: we plan and code, build and test, release and deploy, operate and monitor, and feed back and improve.
At every stage, we care about security, governance, and unifying all the deployments, so that we have a single source of truth, can generate metrics, and have established feedback loops to iteratively improve our overall deployment pipelines.
The overall key takeaways: security is everyone's responsibility, and automation and integration are key to embedding security in CI/CD pipelines.
We have to design for failure and adopt strategies like multi-AZ or multi-region deployments.
We need to have well-architected reviews, put some sort of chaos engineering into the mix, and continuously improve our practices to ensure robust and reliable deployments.
I added a few references that helped me, and I encourage you to explore resources like OWASP, the Kubernetes documentation, and JFrog conferences for deeper insights into securing deployments and improving overall governance.
Thank you so much for attending my talk.
Thank you.