Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, and welcome to my talk entitled "Use Falco and eBPF to protect your applications". First, who am I? I'm Thomas Labarussias. I'm currently OSS and Ecosystem Advocate at Sysdig, the original creator of Falco. I was an SRE for over eight years, so I know what it is to run stuff in production. I'm also a contributor to Falco and the creator of Falcosidekick and Falcosidekick UI, two major components of the Falco ecosystem, and you can reach me on these social networks if you want.
First we need to define what runtime security is. Runtime security is all the tools and procedures you can put in place to secure an application, in a container or not, during its lifetime in production. It's different from what we currently do in our CI pipelines with image scanning. It's also different from what we can do with Kyverno or Gatekeeper to create policies to enforce good practices in our clusters. It's totally focused on what happens when your application is serving real customers, is receiving real traffic.
For that, Falco relies on syscalls. Syscalls, or system calls, are basically the way your program asks the kernel for access to resources. For example, if your application needs to create a process, access the network, or read or write into a file, your application needs to ask the kernel for the access, and to ask for these accesses you use system calls. Basically, you can see the system calls as the kernel API. If you are familiar enough with the Linux ecosystem, you already know about glibc, or musl for Alpine. Basically, glibc is the library used by your applications to call the system calls. You can see the syscalls as an API and glibc as an SDK.
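As an illustration of the syscalls-as-API idea (a minimal sketch, not taken from the talk), here is how ordinary file access in a high-level language bottoms out in syscalls:

```python
import os
import tempfile

# Every file access below is ultimately a syscall -- the "kernel API" --
# reached through the libc wrappers (glibc or musl).
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)  # -> openat(2)
os.write(fd, b"hello from user space")               # -> write(2)
os.close(fd)                                         # -> close(2)

fd = os.open(path, os.O_RDONLY)                      # -> openat(2)
data = os.read(fd, 64)                               # -> read(2)
os.close(fd)                                         # -> close(2)

print(data.decode())  # prints: hello from user space
```

Running a script like this under `strace` would show the exact syscalls that Falco's probe observes from the kernel side.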
So, Falco. Falco is a CNCF incubation-level project. It's a cloud native project in the CNCF landscape for securing running applications. Right now it's the most advanced threat detection engine you can run inside Kubernetes. eBPF, for extended Berkeley Packet Filter, is the Linux kernel feature which allows you to run a program in the kernel without any change to the kernel code and without loading a kernel module like we did before. It enforces stability and security. It's really useful for security, for monitoring, for troubleshooting. You also have to know that right now the core maintainers of Falco are developing a new Falco eBPF probe. Basically the features will be exactly the same as the current ones, but it will also use the CO-RE paradigm: Compile Once, Run Everywhere. Right now you need to build the eBPF probe for the exact version of your kernel. In the future, starting with kernel version 5.8, you will use the same probe for any kernel. You just have to download it or build it only once and it will
run everywhere. The eBPF probe does the collection of events. Basically, in the eBPF world you have hooks. Hooks are endpoints: you can hook your probe on them and collect events. These events can be syscalls, they can be related to the file system, they can be related to the network, almost anything. If a hook is not already there by default, you can create your own. It's really convenient. And to ensure stability and security, all the code you write for your eBPF probe will be verified by the Linux kernel. So you code your probe with everything, the hook you want to use, the data enrichment, everything, and it will be checked by the kernel. If the code is approved, it will be compiled into bytecode, injected into the kernel, and run inside a sandbox. The verification is there to ensure you don't have any security flaws, you don't create infinite loops, you don't create overhead and bad performance in your system. Everything is there by default, by design, to ensure stability and security. For Falco itself, the architecture is as follows: you have the kernel, and the eBPF probe is there to collect the syscalls from the kernel. Then Falco, thanks to its rule set, will trigger alerts: if one event from the kernel, from the syscalls, matches a rule, Falco will output an alert. This alert can go to stdout, a file, a program, syslog, be sent to an HTTP endpoint, or go over gRPC.
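As a sketch of how these output channels are wired up, the corresponding section of falco.yaml looks roughly like this (key names follow the upstream falco.yaml; the file path and URL are placeholders):

```yaml
# falco.yaml (excerpt) -- several alert channels can be enabled at once
json_output: true

stdout_output:
  enabled: true

file_output:
  enabled: true
  filename: /var/log/falco_events.json   # placeholder path

syslog_output:
  enabled: true

http_output:
  enabled: true
  url: http://falcosidekick:2801/        # placeholder endpoint

grpc_output:
  enabled: true
```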
If we take a deeper look at the Falco architecture, Falco is composed of three key elements: two libraries, libscap and libsinsp, and the rule engine. libscap is in charge of the event collection, libsinsp of the data enrichment and the extraction of fields. You can see we have the eBPF probe in the kernel space and Falco itself in user space. It's really important for us, as Falco is a security component, to be as secure as possible. This is why Falco itself is running in user space, so with fewer privileges. The eBPF probe is running in kernel space, but thanks to eBPF it is secured and stable by default. So we have the first library, libscap, aka library for system capture. libscap is a user space library. It communicates with the drivers: basically it reads the syscall events from a ring buffer exposed by the driver, and then these events are forwarded to libsinsp. libsinsp, aka library for system inspection, is in charge of receiving the events from libscap and enriching them with machine state. Basically, if your application is running inside a container, and this container is part of a pod in a Kubernetes cluster, you will have, for your rules and for the alerts, the container ID, the container name, the pod name, the pod namespace, the pod labels. All these elements will be there to let you create nice rules and to be able to know the context of the alert. libsinsp will also perform some event filtering and extract fields from the events.
These fields are then used by the rules. So if we take a look at our first rule, for example this one, Terminal shell in a container: we have the name of the rule; the description, for us human beings (it will not be used by any system and it will not be the final output); the condition, which we'll see later; and an output. The output is the exact message we will get at the end. You can see some fields starting with a percent sign: these fields will be automatically replaced by Falco in the output. It means that at the end, in the alert, you will get the real user name and not this token. Each rule comes with a priority, in this case Notice. These priorities are useful for you to filter which alerts you want to receive. And we also have tags. The tags are useful to understand the context of the rule, what it is supposed to detect, and you can also set Falco to enable just a subset of rules. For example, you can enable only the rules which concern the containers, or the network, or else.
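For reference, the rule discussed above looks roughly like this in YAML (paraphrased from the upstream default rules; the exact condition has evolved across Falco versions):

```yaml
- rule: Terminal shell in container
  desc: A shell was used as the entrypoint/exec point into a container with an attached terminal.
  condition: >
    spawned_process and container
    and shell_procs and proc.tty != 0
  output: >
    A shell was spawned in a container with an attached terminal
    (user=%user.name container_id=%container.id container_name=%container.name
    shell=%proc.name parent=%proc.pname cmdline=%proc.cmdline)
  priority: NOTICE
  tags: [container, shell, mitre_execution]
```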
So for the rules you can use lists and macros. Lists are pretty obvious: just an array of items, in this situation a list of possible shells you can have in your system. Remember, Falco rules are YAML files, basically, so you can override anything, and you can also append items to lists, or append to rules or macros. It's really convenient and it will allow you to reuse macros across your rules and not copy-paste or duplicate code. We also have this macro, shell_procs, and you see proc.name. proc.name is a built-in field from Falco you can use in your rules. Even if you are not really familiar with Falco, if you're not familiar with Linux, syscalls, that sort of stuff, it's quite easy to understand that proc.name means the name of the process. You also have proc.pid for the ID of the process, or proc.ppid for the ID of the parent of the process. It's really convenient and easy to read even if you are not a specialist. We also have this macro, container: container.id, also a built-in field, is different from "host". It just means that if we have something different from a hash, the application, or the event, happened inside a container. We also have the macro spawned_process, with evt.type: execve and execveat are real system calls, and you can see these exact names inside the kernel code base if you want. And we have evt.dir: it just specifies whether we want the request to the kernel or the response from the kernel. Even if the rules are convenient and easy to read, we know it can be complicated to create new rules.
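The lists and macros mentioned above could be sketched like this (close to, but not verbatim from, the upstream default rules):

```yaml
- list: shell_binaries
  items: [ash, bash, csh, ksh, sh, tcsh, zsh, dash]

- macro: shell_procs
  condition: proc.name in (shell_binaries)

# container.id is "host" when the event happened outside any container
- macro: container
  condition: container.id != host

- macro: spawned_process
  condition: evt.type in (execve, execveat) and evt.dir=<
```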
This is why Falco comes with a default rule set. Right now it has almost 70 stable rules, and they cover most of the techniques and practices used by attackers: privilege escalation, reading or writing sensitive files or directories, spawning a shell, exfiltrating data, starting a ransomware, that kind of pattern. For example, right now we have all these rules, 79 of them, and we can see some of them are disabled by default. It's just because they can be noisy if you don't append to the exception list with your own context. So we prefer to disable them, but they are there and you can use them. We also have tags. So if we take a look at the full rule, the condition is a little bit different because my slide is quite old now, but basically the idea is the same: we have macros, the spawned_process macro is there, container, shell_procs, et cetera, et cetera, and the output with the tokens to replace. Everything is there. Falco rules have tags, and if you are familiar with the MITRE ATT&CK framework: we are trying to cover as many techniques as possible, and you can find which rule is related to which technique with the tags, mitre_ followed by the tactic, and the T-number of the technique. Having alerts is nice, but we need to use them, we need to exploit these alerts.
Here comes Falcosidekick. Falcosidekick basically forwards the alerts from your Falco instances to your ecosystem, so you can forward the alerts to a chat system; a log system like Elasticsearch or Loki; or a queue or streaming system like Kafka, NATS, or Pub/Sub. You can also forward the alerts to a function-as-a-service, serverless platform. Falcosidekick also exposes a Prometheus endpoint.
It's useful if you want to do some statistics about the number of alerts and so on, for the SREs or DevSecOps, or for the health of your setup. You can also trigger your on-call system with Falcosidekick: right now we have PagerDuty, Opsgenie and Grafana OnCall, and you can also do cold storage in S3, for example.
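As a sketch, Falcosidekick's config.yaml enables outputs like this (key names from the upstream project; URLs and keys are placeholders):

```yaml
# Falcosidekick config.yaml (excerpt)
slack:
  webhookurl: "https://hooks.slack.com/services/XXXX/YYYY/ZZZZ"
  minimumpriority: "notice"

elasticsearch:
  hostport: "http://elasticsearch:9200"
  index: "falco"

pagerduty:
  routingkey: "XXXX"
  minimumpriority: "critical"
```

The per-output minimumpriority setting is what lets you send everything to a log store but only critical alerts to on-call.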
Basically we have one Falco instance per node, because it relies on the kernel and the kernels are not distributed. So we have one Falco instance per node, and they can all forward their events to a single deployment of Falcosidekick. You can poll Falcosidekick to get metrics, and you can send all the events to Elasticsearch for data analysis and long-term storage, but only the alerts with a priority above critical to your on-call system. You can also add StatsD or else: really convenient. So with Falco we have the detection; with Falcosidekick we have the notification.
If you forward these events to a serverless or function-as-a-service system, you can react, as long as you are able to write your own reaction. You can do whatever you need with Lambda, OpenFaaS, Knative, Argo Workflows, Google Cloud Functions, everything. For example, you can terminate a pod, you can create a network policy to isolate a pod, you can also scale in or scale out an autoscaling group: whatever you need, as long as you are able to write your own function.
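As a sketch of such a reaction function (not an official example): it parses the JSON alert Falcosidekick forwards, whose rule and output_fields layout matches Falco's JSON output, while delete_pod is a hypothetical stand-in for a real Kubernetes API call:

```python
import json

def delete_pod(namespace: str, pod: str) -> str:
    # Hypothetical reaction: a real function would call the Kubernetes API
    # here (e.g. via the official client library).
    return f"deleted {namespace}/{pod}"

def handle_alert(body: str) -> str:
    """React to a Falco alert forwarded by Falcosidekick."""
    alert = json.loads(body)
    fields = alert.get("output_fields", {})
    # Only react to shell detections that happened inside a pod.
    if alert.get("rule") == "Terminal shell in container" and "k8s.pod.name" in fields:
        return delete_pod(fields["k8s.ns.name"], fields["k8s.pod.name"])
    return "ignored"

# Example payload, shaped like a real Falco JSON alert.
sample = json.dumps({
    "rule": "Terminal shell in container",
    "priority": "Notice",
    "output": "A shell was spawned in a container ...",
    "output_fields": {"k8s.ns.name": "default", "k8s.pod.name": "my-app-0"},
})
print(handle_alert(sample))  # prints: deleted default/my-app-0
```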
Falcosidekick comes with a specific output called Falcosidekick UI. Basically it's a simple interface with statistics, with pie charts and so on, to get in a few minutes an overview of what has been detected by Falco in your environment. It's not meant for long-term storage or else, but at least you have a quick overview. It's pretty convenient to use.
At the beginning, Falco was only for system calls. Then we introduced a web server to collect the Kubernetes audit logs, but it came with a lot of drawbacks. So in the last year we also introduced a plugin framework. Right now we are able to collect syscalls thanks to eBPF, but Falco is also able to collect any kind of events you may have. By events we often think about logs, for example. So plugins are shared libraries used by Falco to collect and extract fields from more events. Right now we have plugins to collect Amazon EKS audit logs, Okta logs, GitHub webhooks, Docker events, and even Nomad events; we developed this last plugin with HashiCorp.
So, with eBPF you collect the syscalls, and with eBPF and Falco you protect your applications. With the plugins, for example the Kubernetes audit plugin, you enable Falco to protect your Kubernetes clusters. With the AWS CloudTrail plugin, you are able to detect suspicious behaviors at your account level. And with the GitHub plugin, you are able to detect strange situations in your CI, in your pipelines, or in your repositories. It means that right now, with Falco, you can protect all stages, from development to production.
So the situation now with Falco is: we have the eBPF probes for the syscall collection, we have the plugins for the event collection, and Falco and its rule engine. And to manage the plugins and the lifecycle of the plugins and of the rules, we introduced a few months ago a tool called falcoctl. Basically it will install plugins and rules, and it will also track new versions of the rules to automatically download them and reload Falco, so your cluster, your Falco fleet, will always be up to date. So, another view of the architecture: basically the same idea as before. And once again, the plugins are running in user space, so without any big privileges, once again for security purposes. Time for a demo.
So in this demo cluster I have two nodes, and like I said, Falco relies on the kernel, so two nodes means two Falco pods. Basically they are deployed as a DaemonSet to have one Falco pod per node; it's quite obvious. I also installed Falcosidekick, Falcosidekick UI (the front end and its storage backend, which is a Redis), and another deployment of Falco with the EKS plugin. So imagine you have this pod; it's your critical application. It can be WordPress, Drupal, anything you can run and expose to the Internet. So an attacker gains access to this pod, to this container.
As you can see, when I created my shell, it was detected immediately. We have the priority, we have the exact output message with the user root, the namespace default, the pod name, even the container ID, and which shell has been used and which command line has been used to start the shell. All these elements are also there as output fields; they are used by Falco and Falcosidekick for routing. So now I will install curl. You can see it's automatically detected, in real time once again, thanks to eBPF. So right now it's an alert about a package management process launched in the container, and once again the user and the exact command that has been run; the container name is there, the image, everything. So, we'll try to reach the Kubernetes API now. Thankfully, in that situation, the API is protected, but at least we have detected it: unexpected connection to the Kubernetes API server from a container. We have the exact command, and once again the namespace and the pod name. Imagine overwriting a critical file: a file below /etc has been opened for writing, and we have once again all the elements, the container name, the image, the pod, et cetera. And if we take a look at Falcosidekick UI.
So we have everything that happened in the last five minutes, fifteen minutes. We have the pie charts, the statistics, by priority, by tags, by source. We can filter on the source, we see what I did, and if we want more details, they are there. Right now we also have "Terminal shell in a container": it's exactly what we saw in the logs, but in a more formatted and nicer way. With the tags we can filter, for example, on the namespace. Then we have the installation of curl, the attempt to reach the Kubernetes API, and the override of the file. Everything is there. And we also have access to the Kubernetes audit logs: we have the details about someone attaching to or executing something into a pod. We have all these details. In the real world it would be a web shell or else, but in my example, since I did an exec, we can detect it. And once again we have the container name, the pod name, the date, the namespace, and so on. If you want to start with Falco, the easiest way to install and start with Falco is to use the official Helm chart. By setting these values you will install Falco, Falcosidekick, Falcosidekick UI, and use the eBPF probe, in a namespace called falco. In less than two minutes everything will be up and running, and you will be able to access the web UI with a port-forward.
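As a sketch, the install can look like this with the falcosecurity Helm chart (value names vary between chart versions, so check `helm show values falcosecurity/falco` first):

```shell
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update

# Falco + Falcosidekick + its web UI, with the eBPF driver
helm install falco falcosecurity/falco \
  --namespace falco --create-namespace \
  --set driver.kind=ebpf \
  --set falcosidekick.enabled=true \
  --set falcosidekick.webui.enabled=true

# Reach the UI locally
kubectl -n falco port-forward svc/falco-falcosidekick-ui 2802
```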
If you want to contribute or know more about Falco, you can join us in our Falco Slack channel. You can take a look at our new website: a total revamp has been made in the last months, so we hope it's better for everybody. And we are also on GitHub. Thank you and have a good day.