Transcript
This transcript was autogenerated.
Hi there, my name is Ismail. I am a cloud native developer at Wescale. In today's talk, we are going to present the notion of compliance in the cloud and how to think about it efficiently. So, first of all,
what is compliance? When we think about the cloud, we think about a cool place where we can have an idea and implement it into a business in a matter of hours or even minutes, because the cloud will provide us with hardware and managed services so that we can focus on the business side of our code.
But it could also be a very dangerous place: where is my data located? Is it encrypted? Does the workload actually do what it says? All those questions must be answered from a client's point of view, because we need some guarantees that we are not in a malicious situation, that we are in a safe place, and this guarantee is represented by compliance from the service provider. Compliance is all about
managing risk. It's a set of rules to abide by in order to prove that we took the necessary steps to protect our clients in the consumption of our service. It is usually legally driven: we all know GDPR, but also PCI DSS, for instance. It could also be internally driven, with human resources or environmental policies.
As a set of rules, compliance can be seen as an obstacle to your business or innovation process. But you should really create a win-win situation where you embrace compliance as being part of your business, where you think about it continuously and in an automatic manner, so that any audit is a non-event, you can get rid of countless hours of meetings and, in the end, you have a kind of compliance governance.
Of course, compliance is nothing new for cloud providers such as AWS. They already provide you with a shared responsibility model that tells you that the hardware and managed services provided are compliant with regard to GDPR, for instance, but you are still in charge of implementing your part of compliance on the cloud platform, for each piece of data or workload that you deploy on this platform. So the question really becomes
how to implement this compliance into
the platform. And in this schema, what is interesting is that we want to really bake compliance into the cloud platform, so that each time we deploy data or a workload, it is compliant by design. And by breaking down the notion of compliance into policies as code, we begin to see that we can apply already-proven approaches such as DevOps, which will enable us to apply the CI/CD process to our policies and, in the end, to our compliance. So one question
that we can ask is: how does the code that we store in these repositories translate in terms of cloud vocabulary?
With Tangi, we see a three-layer model. The first one is identity and access management, which is all about trust: you give trust to identities so that they can act on the cloud platform. It is cloud-platform provided, and we obviously need to control that trust. And this is where the policies intervene. We distinguish two kinds
of them. The first one, passive platform policies, is also cloud-platform provided and is about configuration that you can set on the different services provided by your platform. For instance, you can forbid the creation of an object bucket inside a given region. The problem is that it is closely tied to what the platform offers, and we may lack expressivity when we think in business terms; that is to say, the cloud platform won't be able to follow you in all of your needs. And this is where the reactive platform policy
shines: because the platform produces events when something happens, we have the capacity to consume those events in order to trigger actions that will implement one of our policies and, in the end, enable compliance. And we can take a simple example
on GCP, for instance: we may want to stop our SQL instances each evening and start them up again each morning. Usually we will see the following implementation; we call that choreography: we choreograph different services with one another in order to get a certain policy in place. And in the case of GCP, we could very well implement two entries inside a scheduler. One of them would be in charge of calling a stop-SQL-instances Cloud Function, and the second one would be in charge of calling another Cloud Function that would start the SQL instances each morning. Of course, we would use infrastructure as code, through Terraform, in order to easily reproduce this architecture inside another project.
So far so good, but we see three main drawbacks. The first one is that we have to think about this architecture, and it's very subjective; in fact, some of you may have another approach with the same result. The second drawback is about the code that we put inside those functions: it's up to you to code these functions, and you could very well introduce a bug or bugs. And last but not least, we have a risk of duplication between different teams because of a lack of communication. With Tangi, we discovered a nice tool that can provide you with answers to the different problems we previously mentioned. First of all, it enables you
to switch from an imperative paradigm to a declarative one. Cloud Custodian will provide you with a domain-specific language that enables the describing of policies through a YAML file, in your own business terms.
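As an illustration, a policy for the earlier GCP use case (stopping SQL instances each evening) might look roughly like this; the mode type, schedule syntax and names below are indicative sketches, not verbatim from the talk:

```yaml
policies:
  - name: stop-sql-instances-each-evening
    description: Stop all Cloud SQL instances at 7pm.
    resource: gcp.sql-instance        # the Cloud SQL resource type
    mode:
      type: gcp-periodic              # deployed as a scheduled Cloud Function
      schedule: every day 19:00       # Cloud Scheduler syntax (indicative)
    actions:
      - stop                          # stop every matched instance
```

A mirror policy with a start action would bring the instances back up each morning.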
This is really interesting, because from there you don't have to do anything but write a YAML file: the deployment of the infrastructure will be taken care of by Cloud Custodian, as well as the code that will be deployed inside those functions. So we tackle the
problem of the architecture: we don't have to think about it. We tackle the problem of the code inside the cloud functions: we don't have to think about it either. And we tackle the problem of duplication, because now we are able to express, in a few files, our policies in business terms, and those files are easily shareable among all your teams, so that they can benefit from the same treatment and the same approach to your policies. So Tangi, can you introduce yourself and tell us more about what Cloud Custodian is? So, Cloud Custodian
is an open source initiative launched by Capital One. It mainly consists of a Python library driven by YAML files: as input, each YAML file will describe a list of policies that will help you to set up and ensure the two types of compliance mentioned by Ismail just before, reactive compliance and passive scanning compliance. In a few words, Cloud Custodian can be interfaced with the three main public cloud providers mentioned here: AWS, Azure and Google Cloud Platform. Note that Google Cloud Platform support is in alpha stage for now, but soon to be fully released. Since it's written in Python, it can run everywhere. The project is currently in the CNCF sandbox and has a release frequency of about a month.
So, Cloud Custodian under the hood. A Cloud Custodian policy can be described as shown here. We can see the type of resource that will be targeted by the policy (here, we're talking about Microsoft Azure disks); filters, to select only the resources that matter for us (here, we want to select all the Azure disks that are not attached to any other resource); and actions to perform on the found resources (here, we want to delete them). We can draw a quick parallel with the SQL SELECT FROM WHERE syntax: SELECT is the action we want to perform, FROM is the resource we want to target, and WHERE holds all our filtering conditions.
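The Azure example just described could be sketched as the following policy; the exact filter key is an assumption based on the Azure disk API:

```yaml
policies:
  - name: delete-unattached-disks
    resource: azure.disk              # FROM: the resource we target
    filters:                          # WHERE: keep only unattached disks
      - type: value
        key: properties.diskState     # assumed key for the disk state
        value: Unattached
    actions:                          # SELECT: what we do with them
      - type: delete
```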
Now let's talk a bit about numbers. Here is a summary of all the resources, actions and filters available for each cloud provider in Cloud Custodian. The project was launched for AWS in the first place, so that's why there are a lot more settings for this cloud provider, but it kind of represents the market shares between cloud providers. The big point here is to show you that the coverage of all the possibilities is already substantial, so most of your compliance cases can be fulfilled with Cloud Custodian.
Let's talk a bit about execution modes. Cloud Custodian is designed to be completely agnostic of where it runs, as long as you have a Python virtual environment. The most interesting thing here is that the project can also rely on the cloud providers' serverless services to set up more complex workflows. It's really easy to wire it all together with Cloud Custodian, just by specifying a mode with several arguments in your policy: Cloud Custodian will automatically deploy, on the cloud provider, the multiple resources needed to cover the need described in the policy. Depending on what is specified, the deploy action will provision triggers and serverless applications directly in the cloud. For AWS, as you can see, we can use CloudWatch Events coming from CloudTrail, or scheduled events, to trigger AWS Lambda functions. For Google Cloud Platform, Cloud Custodian will use Cloud Security Command Center, logging, or Cloud Scheduler to trigger Google Cloud Functions. Eventually, on Azure, the same workflow is applied using Event Grid, a scheduler, and Azure Functions.
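For instance, on AWS, a mode block is enough to make Cloud Custodian deploy both the trigger and the Lambda function itself; this is a sketch, and the role ARN and tag names are placeholders:

```yaml
policies:
  - name: react-to-new-instances
    resource: aws.ec2
    mode:
      type: cloudtrail                # Lambda triggered by a CloudTrail event
      events:
        - RunInstances                # shorthand for a well-known event
      role: arn:aws:iam::123456789012:role/custodian   # placeholder
    actions:
      - type: tag
        key: owner-review             # placeholder tag
        value: pending

  - name: nightly-scan
    resource: aws.ec2
    mode:
      type: periodic                  # Lambda on a CloudWatch Events schedule
      schedule: "rate(1 day)"
      role: arn:aws:iam::123456789012:role/custodian   # placeholder
```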
So, moving on to filters. Let's talk a bit about filter types. There are about three filter types. You can filter on a specific value with a value filter: these are filters that will make an API call to the cloud platform to check a specific setting on an identified resource. You can also specify an event filter, which will only check incoming events and do the verification against the value provided in the policy. And there are also specific filters, to do more complex filtering.
Let's see a little example that uses those three types of filters. So here we have a little policy. This policy will check and enforce that log file validation is enabled on a CloudTrail trail; at least, what we want is to have a notification when the CloudTrail trail is updated to disable log file validation. So you can see here that the resource targeted is AWS CloudTrail. The mode is a cloudtrail mode, so we will react on an event, and the event we are watching is UpdateTrail. Then we can see a list of some filters. The first filter here is a filter on a tag, so it is a value filter: here we want to know if the tag trail-to-watch is set to true. After that, there is a little specific filter, specific to the trail, just to see if the trail is logging or not. Then there are two value filters: the first one is to know if the trail is multi-regional, and the second one is to ensure that this CloudTrail trail will log all actions to a specific S3 bucket name.
The last filter is an event filter: as you can see, we use the JMESPath syntax to reach, inside the CloudTrail API event, a specific setting and to check its value. Okay, so now let's do a more complex example. Let's extend this a bit.
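The CloudTrail policy walked through above might be written roughly as follows; the tag name, bucket name, `ids` expression and the spelling of the trail-specific filter are assumptions for illustration:

```yaml
policies:
  - name: watch-log-file-validation
    resource: aws.cloudtrail
    mode:
      type: cloudtrail
      events:
        - source: cloudtrail.amazonaws.com
          event: UpdateTrail
          ids: requestParameters.name       # assumed resource-id path
    filters:
      - type: value                         # value filter on a tag
        key: tag:TrailToWatch               # assumed tag name
        value: "true"
      - type: status                        # trail-specific filter: is it logging?
        key: IsLogging
        value: true
      - type: value                         # is the trail multi-regional?
        key: IsMultiRegionTrail
        value: true
      - type: value                         # logging to the expected S3 bucket?
        key: S3BucketName
        value: my-audit-bucket              # placeholder
      - type: event                         # event filter using JMESPath
        key: detail.requestParameters.enableLogFileValidation
        value: false
```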
In some cases you can't apply remediation on a resource at creation time: the resource takes some time to become available and cannot be modified until it is. What we can do here is create a mark-for-op workflow for delayed actions. First, we need a policy to detect the creation of a resource and check that the resource is wrongly configured. This policy will apply a mark-for-op tag on the resource, with an operation to perform and a minimum time period before applying the execution. Then we wait for the resource to become available, using a periodic policy which contains a filter on the resource state. Once the resource is available, the periodic policy will apply the remediation. To finish the workflow, we have to unmark the remediated resource. And then, finally, you can take a coffee, because any resource that is wrongly configured will be remediated.
Let's see a little example of this workflow. So here I made a little use case for you. The need we are trying to answer here is that all our RDS DB clusters must have their backup retention period set to its maximum, which is 35 days. So here you have the first policy. The first policy is a cloudtrail policy reacting on two separate events, CreateDBCluster and ModifyDBCluster, because we also want to apply the remediation when someone updates this setting. It has only one filter, which is a value filter on the backup retention period, with an operation and a value: it says that we select only DB clusters that have a backup retention period of less than 35 days. Here we have two actions. The first action is a mark-for-op action: we put a tag named backup-retention-compliance on the RDS cluster, with the retention operation to perform. And, as I said, we have to set up a minimum delay until the remediation is applicable; here, we want it to be applied as soon as it can be. And there is also a second action, which is notify: we also want to notify that Cloud Custodian has found a non-compliant RDS cluster. This will help you to treat, afterwards, things like CI/CD deployments that may have wrongly configured resources.
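That first policy could be sketched like this; the `ids` expressions, the operation name, the tag name and the notify transport are illustrative assumptions:

```yaml
policies:
  - name: mark-rds-clusters-low-retention
    resource: aws.rds-cluster
    mode:
      type: cloudtrail
      events:
        - source: rds.amazonaws.com
          event: CreateDBCluster
          ids: requestParameters.dBClusterIdentifier   # assumed JMESPath
        - source: rds.amazonaws.com
          event: ModifyDBCluster
          ids: requestParameters.dBClusterIdentifier
    filters:
      - type: value
        key: BackupRetentionPeriod
        op: less-than
        value: 35
    actions:
      - type: mark-for-op
        tag: backup-retention-compliance
        op: modify                 # indicative name for the delayed operation
        days: 0                    # apply as soon as possible
      - type: notify               # requires the c7n-mailer companion tool
        to: ["ops@example.com"]    # placeholder
        transport:
          type: sqs
          queue: custodian-notify  # placeholder
```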
Here is the second policy. This policy is not triggered by CloudTrail: it's a periodic policy, triggered every two minutes. The main goal of this policy is to find marked-for-op DB clusters, with the tag we saw before, and DB clusters that are available. So we need here three different filters: a specific filter, which is a marked-for-op filter; a value filter for the backup retention period; and also a value filter on the status of the DB cluster. Here the only action to do is to enforce the backup retention period to 35 days. And to finish this workflow, as I said before, we have another periodic policy that will just filter on resources that are now compliant but weren't before. Since they weren't compliant before, they have a tag, and the only action of this policy is to remove the tag, to prevent unwanted actions on compliant resources. So another big objective
of cloud compliance and auditability is to bring GitOps into the game. With this library, you can also have compliance driven by your favorite VCS. The main goal here is to reassure auditors that conformity is applied, since your code repository is the source of truth and also reflects the true state of your platform.
There are also different business and operational needs that you can answer. With Cloud Custodian, you can achieve FinOps fulfillment by starting and stopping development instances at night.
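As a sketch, Cloud Custodian's built-in off-hours filters can express this FinOps case on AWS; the tag convention and time zone are assumptions:

```yaml
policies:
  - name: stop-dev-instances-at-night
    resource: aws.ec2
    filters:
      - type: value
        key: tag:env              # assumed tagging convention
        value: dev
      - type: offhour             # built-in off-hours filter
        offhour: 19
        default_tz: Europe/Paris
    actions:
      - stop

  - name: start-dev-instances-in-the-morning
    resource: aws.ec2
    filters:
      - type: value
        key: tag:env
        value: dev
      - type: onhour
        onhour: 8
        default_tz: Europe/Paris
    actions:
      - start
```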
Then you can also use this library to detect malicious actions made from within the cloud platform and send an alert. As you can see, there are multiple integrations available, using Slack, Splunk or Datadog.
The main goal here, for compliance, is to help you leverage Cloud Custodian to bake your own compliance rules into your cloud platform. Cloud Custodian is open source by nature: if you identify a specific need, it's up to you to develop a new feature and give it back to the community. The first thing you have to do is fork the GitHub repository, then develop your feature, make and pass the tests, and open a pull request. A little story here: with Ismail, we figured out that there was no start action on Google Cloud Platform SQL instances. Let me show you how we managed to develop this feature and add it to Cloud Custodian.
So, the library is written in Python, so it's really easy for you, if you develop a bit using this language, to understand how to use this library and how to add some features. First, we made good use of Cloud Custodian's prepared classes, functions and registries to add the new action here. Then we also developed the related test; we can find it here. The test is really easy to understand: we have to create a policy (this policy is the in-code translation of what we can write using YAML), then we run the policy, and we assert that the number of resources identified is equal to one.
All the information for development is really well explained in the developer manual, even the stubs. Test stubs are what you can see here: they are recordings of API calls, the responses of API calls made to the Google Cloud Platform. So the first thing you have to do is use a function named record_flight_data, which will output the API call results. Then, in your test, you use replay_flight_data to only replay the responses of the API calls, to mock the actions made on the platform by your policy. These stubs are really useful, because you only have to record them once, and then the recorded API call results are used on each test run.
Now I leave you in good hands with Ismail, who will show you a little live demo of what we can do with Cloud Custodian on GCP. Yes, thank you Tangi for your Cloud Custodian presentation. So right now we are going to show you how to use it with a concrete case, which is, on the Google Cloud Platform, to forbid the creation of compute instances with a public IP. To illustrate the case, we will use Cloud Shell, which is a dev environment as a service, provisioned with a certain number of tools, and we have also installed Cloud Custodian in it to deploy our policy. Before deep diving inside this policy, I want to show you that we don't have any Cloud Functions deployed (so, in this case, yes, zero items), nor any Cloud Scheduler jobs. Why do I want to do that? To show you that it is indeed Cloud Custodian that will be in charge of thinking the architecture and deploying the code on the cloud platform.
So before doing the Cloud Custodian part, we have to mention that we have to use a certain number of APIs on the cloud platform. So we used Terraform in order to activate some APIs, and also to create an identity that Cloud Custodian can use, so that, on our side, we can apply the least-privilege principle to the workloads that Cloud Custodian will act with. So, our policy file, called forbid-public-ip on compute instances, is made of four different policies. Before going into the explanation, I want to deploy it, because it takes a certain amount of time to provision the resources. So let's do that. We already have a virtual environment with Cloud Custodian installed, so we run our forbid-public-ip policy. Okay,
and okay, it's running. What happens under the hood is that Cloud Custodian will implement different resources: we are thinking about Cloud Functions, but also scheduler jobs, and we'll see why. So, what the chaining of our policies is doing is the following. The first policy is about listening to the audit log for the insertion event of a compute instance. Basically, it says: for every creation of a compute instance, do this action. We don't have any filtering, because we want to apply the action of setting labels on all compute instances. Those labels are state, on the one hand, and next-policy: check-public-ip on the other. The second
policy, this time, will also act on GCP instances, but is of type gcp-periodic, meaning that every minute it will apply this specific filter, looking for instances with the label next-policy set to the value check-public-ip, but also exposing a public IP. This time, the action will be a mark-for-op, which is syntactic sugar in order to apply an already-formed label that we can use in the next policy. We also notify, through Pub/Sub, a dedicated email address,
but it could really be whatever you want. A third policy will still act on GCP instances and is still periodic, so we have a scheduler that will trigger a workload every minute, and we filter on instances marked-for-op with stop. So we notice that we also have stop here: what is happening is that we are chaining those two policies with one another, and this time we effectively apply the stop action.
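The chaining of those first three policies might look roughly like this; the label keys, schedule syntax and filter keys are reconstructed from the description, not copied from the demo file:

```yaml
policies:
  - name: label-new-instances            # policy 1: label every new instance
    resource: gcp.instance
    mode:
      type: gcp-audit                    # react to audit-log events
      methods:
        - v1.compute.instances.insert
    actions:
      - type: set-labels
        labels:
          next-policy: check-public-ip   # chains to the second policy

  - name: check-public-ip                # policy 2: mark instances with a public IP
    resource: gcp.instance
    mode:
      type: gcp-periodic
      schedule: every 1 minutes          # indicative schedule syntax
    filters:
      - type: value
        key: labels.next-policy
        value: check-public-ip
      - type: value
        key: networkInterfaces[0].accessConfigs   # assumed key for a public IP
        value: present
    actions:
      - type: mark-for-op
        op: stop

  - name: stop-marked-instances          # policy 3: stop what was marked
    resource: gcp.instance
    mode:
      type: gcp-periodic
      schedule: every 1 minutes
    filters:
      - type: marked-for-op
        op: stop
    actions:
      - stop
```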
A final policy is here to handle a specific case: when my GCP instance is called unstoppable, followed by five digits, we consider that we want to start the instance again and remove the labels that were chaining the policies, in order to avoid falling into an infinite loop. This time we are not periodic: we are of type gcp-audit, meaning that we are listening for the dedicated stop events in order to apply the filters and, finally, the actions. So let's see it in action. In the above window, we have a compute instances list command that will display the different instances and the states they are in.
So this time we want to show you that the functions are provisioned. So, functions list: okay, we have four different functions, each of which with a different trigger. We have HTTP triggers, but also event triggers. The HTTP triggers are for the periodic policies, whereas the event triggers are for the policies that are listening to the audit log, so for the gcp-audit type. We also have two different scheduler jobs (jobs list), corresponding to the HTTP triggers that we just mentioned: custodian-auto-check-public-ip, which corresponds to the policy that checks each instance for a public IP, and stop-instances-with-public-ip, which corresponds to the third policy, the one with mark-for-op, that is triggered every minute. So let's create
an instance called toto, okay? It will provision on my cloud platform a compute instance, exposing by default a public IP. So, if we follow the workflow that we previously described, we have indeed this new label applied on the instance, next-policy: check-public-ip, so that it can be filtered by the next policy. Here we go: that one will be applying the mark-for-op, so we have this specific label related to Cloud Custodian, with our source policy and the operation, stop. And it will serve the filtering of the third policy, to apply the stop operation. So if we run it manually (we can see, okay, this one, okay), it will trigger the function that will be looking for instances marked-for-op, and we see that toto is now stopping, indeed, because it was marked-for-op with the operation stop.
And now, if we create an unstoppable instance, okay, followed by the digits, this time it will follow the same workflow: it will start, it will be branded with the labels, and then it will be stopped, because it exposes a public IP. But because it is called unstoppable12345, it will be restarted, and we will also see the different labels that are necessary to apply the stopping policy disappear; to avoid falling into an infinite loop, we will be removing the labels. So we are going to manually run our jobs. So, it seems that the check was already triggered, and we have the mark-for-op for the stop operation appearing. And this time, when we stop the instance, this instance will be filtered and stopped. Okay, we see that. But because it has this name, the final policy will be called, and it will start the instance again. So it takes a certain amount of time, but not so much.
Let's see it in action. So right now it's stopping; it's terminated. And because the fourth policy is listening for the stop event and filtering on this specific kind of name, it is now restarting our instance again, which is now running. And because we don't want to fall into an infinite loop, we should also see the labels being removed in the end. So it may take a certain amount of time, but in the end it will finish the job. In the meantime, I want to show you a dedicated policy for a specific use case of Cloud Custodian. When we want to batch operations on the cloud platform, we usually go through scripts, but Cloud Custodian, through the filtering and the actions, is really able to batch your actions. For instance, here I have a policy to filter on GCP instances labeled devfest, in order to delete them. And right now it will serve us to clean our cloud platform of those specific instances.
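Such a batch cleanup policy could be as small as the following sketch; the label key and value are assumptions following the demo's convention:

```yaml
policies:
  - name: delete-devfest-instances
    resource: gcp.instance
    filters:
      - type: value
        key: labels.state       # assumed label key
        value: devfest
    actions:
      - delete                  # batch-delete every matched instance
```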
Before applying this policy, we can see that the extra labels that we were mentioning were indeed removed from the unstoppable instance, so we avoid the infinite loop. And now, let's apply this dedicated policy to remove those instances labeled with state equal to devfest; both instances are in this case. So if I run this policy, we can see that the filter is indeed counting two instances, and it is stopping the instances and then removing them. That would conclude our demo, and I leave the lead to Tangi now. Tangi, up to you. Thank you very much, Ismail. I hope this demo
has shown you all that is possible using Cloud Custodian, with a unified language across cloud providers. I will now present a bit who we are at Wescale. Born in 2015, Wescale is a company that has built a community of 50 experts who help you become cloud native. We advise and help our clients to think, build and master their own cloud-native architecture, always in correlation with their maturity in the cloud. We are currently a CNCF service provider and also a HashiCorp, AWS and GCP partner. We are actively hiring in France and remotely. Wescale also has a training program for cloud enthusiasts: training journeys about GCP, AWS, Kubernetes and HashiCorp technologies like Vault and Terraform will help you master cloud technologies and the DevOps methodology. Thanks a lot for your attention. If you have any questions, feel free to contact us; we will be more than happy to answer.