Abstract
“Did you know that, every day across the Internet, each IP address is scanned hundreds of times? Or that more than 2,000 attacks are perpetrated, stealing 1.4 million personal records? That’s right, every single day! Today, there is a way to rebalance the odds and protect our resources through crowdsourced security and reputation.
In 2020, our ways of living and working turned completely upside down in a matter of days. We all brought our companies home and our homes into our companies’ systems. Staying connected to our colleagues, friends and family became a critical necessity, which opened the door for hackers to cause disruption, and we saw a huge increase in attacks all around the world.
Even though worldwide spending on cybersecurity is predicted to reach $1 trillion in 2021 according to Forbes, the game will remain asymmetrical, and companies will keep getting hacked regardless of their security budgets. Expensive security doesn’t mean better security. A new approach is needed, and this is why we created CrowdSec.
Join us for this talk so we can explore why a collaborative approach to security could contribute to solving the problem and how we could make the Internet safer together.”
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everybody, glad to be here with you today. I will be speaking
about CrowdSec. CrowdSec is an open source security engine that can
help detect and respond to attacks in real time. The project then aims at
building a global crowdsourced reputation system around it.
Why did we create CrowdSec? Because we believe that a few elements are in the attacker's favor.
First, time is on the attacker's side. The delay between the publication of a vulnerability and a weaponized exploit is often much shorter than the delay between the publication of that vulnerability and the application of the patch on all systems.
Then, unfiltered access. As we have seen in high-profile hacks recently, a lot of compromises happen through accesses and applications that are not filtered, which makes firewalls useless in many situations, or at least firewalls as we know them.
Then the perimeter. With the advent of public cloud and various distributed architectures, et cetera, it's a lot harder to have one central point of control for your architecture and to filter out the malevolent traffic.
Then money. Hackers use their own time, stolen resources and free or stolen software, while defenders need teams and licenses, and need to maintain systems.
And last but not least, when you are attacking you need to be right only once, while when you are defending you need to be right every time.
So we believe that the castle strategy is as outdated as the Walkman, and that every asset on the information system needs to be able to defend itself on its own. We are betting that HTTP is the most common language, spoken by both the most venerable Unix and the latest smartwatch. This is why we created CrowdSec: we aim at creating the Waze of firewalls. We combine local behavior detection, available in the open source software, with a global reputation system that we redistribute and share with the community by aggregating signals sent by users all around the world. So it's software built by SecOps for DevOps.
How do we aim at achieving this? CrowdSec itself, the open source software, can be seen a bit like Fail2ban: it reads logs in order to detect attack patterns and then reacts to them. When we speak about reading logs here, it can be something as simple as a log file on a web server, but it can be more or less any stream of information that we consider to be logs, such as your AWS CloudTrail. Those logs then need to be normalized and enriched before being matched against scenarios.
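To make the read, normalize, enrich and match pipeline concrete, here is a minimal Python sketch. It is not CrowdSec's implementation (CrowdSec describes parsers and scenarios in YAML, with grok patterns for parsing); the regex assumes the default Nginx "combined" format, and the `is_static` enrichment and the `matches_probing` filter are hypothetical names chosen for illustration.

```python
import re
from datetime import datetime
from typing import Optional

# Nginx "combined" log format; an assumption, since log_format is configurable.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<user_agent>[^"]*)"'
)
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".ico", ".svg")


def parse(line: str) -> Optional[dict]:
    """Normalize one raw line into a flat event dict, or return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    event = m.groupdict()
    event["status"] = int(event["status"])
    event["time"] = datetime.strptime(event["time"], "%d/%b/%Y:%H:%M:%S %z")
    return event


def enrich(event: dict) -> dict:
    """Add a derived field that scenarios can filter on (illustrative only)."""
    path = event["path"].split("?", 1)[0].lower()
    event["is_static"] = path.endswith(STATIC_EXTENSIONS)
    return event


def matches_probing(event: dict) -> bool:
    """Toy scenario filter: a 404 on a non-static resource."""
    return event["status"] == 404 and not event["is_static"]


line = ('203.0.113.7 - - [10/Jan/2019:13:55:36 +0000] '
        '"GET /wp-login.php HTTP/1.1" 404 162 "-" "Nikto/2.1.6"')
event = parse(line)
if event is not None:
    event = enrich(event)
    print(event["ip"], event["path"], matches_probing(event))
```

Keeping parsing and enrichment separate from matching is what lets the same normalized event feed many scenarios.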
This is where the community aspect kicks in: besides being under a permissive open source license, CrowdSec aims at building a community. We have a hub where people can find scenarios and parsers that fit their needs, either because they need to cover a given technical stack or because they need to address a specific business need.
Once the logs have been normalized and a pattern has been detected by a scenario, the user wants to react to the attack that was just detected. We believe that, first of all, most of the time you don't react at the same place where you detect. And second, how you react to a given attack will vary a lot depending on your environment, either technical or business. For example, someone doing e-commerce is not going to react to an attack the same way as someone managing mail servers.
Hence the approach of the bouncers, which are software components that react to a given attack and let the user choose. Sometimes you want to ban an IP; sometimes you simply want to present a captcha to ensure the user is not a robot. And in more corporate environments, you might want to reinforce the security of the target rather than trying to ban the offending IP, which might be just one node of a botnet. In that case, your action might be to enforce multi-factor authentication on the user being targeted.
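The idea that detection and reaction are decoupled can be illustrated with a small policy function. This is a hypothetical sketch: the environment labels (`ecommerce`, `corporate`), the `Alert` fields and the remediation names are invented here; in CrowdSec itself the choice is made by which bouncers and profiles you deploy, not by this code.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Remediation(Enum):
    BAN = "ban"
    CAPTCHA = "captcha"
    ENFORCE_MFA = "enforce_mfa"


@dataclass(frozen=True)
class Alert:
    ip: str
    scenario: str
    target_user: Optional[str] = None  # account under attack, if known


def choose_remediation(alert: Alert, environment: str) -> Remediation:
    """Map an alert to a reaction according to a hypothetical environment profile."""
    if environment == "ecommerce":
        # Banning a real customer costs sales: challenge instead of blocking.
        return Remediation.CAPTCHA
    if environment == "corporate" and alert.target_user is not None:
        # The source IP may be one botnet node among many: harden the target instead.
        return Remediation.ENFORCE_MFA
    return Remediation.BAN
```

Detection produces the same `Alert` in every case; only the reaction changes with the environment.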
Last but not least, and this is the main point of the project, you share your own sightings. Don't be afraid: logs are not shared. The only information shared is that, whenever you block an IP, you tell us: "I blocked this IP at this time because it triggered this specific scenario." This data is then crunched, curated and redistributed to all the users. So if you are using WordPress-specific scenarios, for example, you are fed in real time with the IPs that have been reliably reported attacking other WordPress sites.
How does the software architecture itself work? As with Fail2ban, we aim at a very low technical barrier in terms of installation. However, every component is stitched together through an API, which enables more distributed architectures, and we are already seeing users deploy it that way. The CrowdSec agent itself is in charge of parsing logs, enriching them and matching them against scenarios.
The local API is in charge of taking decisions based on the alerts it receives, giving those decisions back to the bouncers and syncing them with a central API. Bouncers can sit at various levels of your application stack, because we want to speak to a lot of different personas. For example, you can have a CrowdSec bouncer in your application that injects a captcha for users who have been caught behaving like bots, while you might also want to filter directly at the firewall level if you are protecting larger infrastructures. You are also pushing the signals, the metadata I was just talking about, to the central API, which in exchange shares back with you the signals and the reputation feeds.
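The role split described here (alerts in, decisions out to bouncers, signals out to the central API) can be sketched as a toy in-memory class. This is not the real local API: the class, its method names and the 4-hour default duration are assumptions for illustration. The one property taken from the talk is that a signal carries only the IP, the scenario and the timestamp, never log lines.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Decision:
    ip: str
    type: str          # e.g. "ban" or "captcha"
    scenario: str
    expires_at: float  # epoch seconds


@dataclass
class LocalAPI:
    """Toy stand-in for the local API: alerts in, decisions and signals out."""
    default_duration: float = 4 * 3600  # assumption: 4-hour ban
    decisions: Dict[str, Decision] = field(default_factory=dict)
    outbox: List[dict] = field(default_factory=list)  # signals bound for the central API

    def receive_alert(self, ip: str, scenario: str, now: Optional[float] = None) -> Decision:
        """Turn an alert from the agent into a ban decision and queue a signal."""
        now = time.time() if now is None else now
        decision = Decision(ip, "ban", scenario, now + self.default_duration)
        self.decisions[ip] = decision
        # Only the IP, the scenario and the timestamp leave the machine, never the logs.
        self.outbox.append({"ip": ip, "scenario": scenario, "timestamp": int(now)})
        return decision

    def query(self, ip: str, now: Optional[float] = None) -> Optional[Decision]:
        """What a bouncer asks: is there an active decision for this IP?"""
        now = time.time() if now is None else now
        decision = self.decisions.get(ip)
        if decision is not None and decision.expires_at <= now:
            del self.decisions[ip]  # expired decisions are dropped
            return None
        return decision
```

Because bouncers only ever call `query`, the detection side and the remediation side can live on different machines.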
The behavior engine itself aims at detecting various scenarios. You can detect obvious things such as brute force, but the engine is powerful enough to detect more advanced attacks such as distributed denial of service, web vulnerability scans, specific targeted exploitation, or even more business-focused threats such as credit card or credential stuffing.
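CrowdSec scenarios are commonly expressed as leaky buckets, configured with a `capacity` and a `leakspeed` in the scenario YAML. The sketch below shows the idea in Python for something like SSH brute force; it is not CrowdSec's engine, and the overflow rule (level strictly above capacity) and the numbers are assumptions.

```python
from collections import defaultdict
from typing import Dict


class LeakyBucket:
    """Per-key leaky bucket: each event adds 1, the level drains 1 unit per `leak_seconds`."""

    def __init__(self, capacity: int, leak_seconds: float) -> None:
        self.capacity = capacity
        self.leak_seconds = leak_seconds
        self.level: Dict[str, float] = defaultdict(float)
        self.last_seen: Dict[str, float] = {}

    def pour(self, key: str, ts: float) -> bool:
        """Add one event for `key` at time `ts`; return True when the bucket overflows."""
        if key in self.last_seen:
            drained = (ts - self.last_seen[key]) / self.leak_seconds
            self.level[key] = max(0.0, self.level[key] - drained)
        self.last_seen[key] = ts
        self.level[key] += 1
        if self.level[key] > self.capacity:
            # Overflow: this is where an alert would be raised. Reset the bucket.
            self.level[key] = 0.0
            return True
        return False
```

A burst of failed logins overflows the bucket, while an occasional mistyped password drains away before it accumulates, which is why leaky buckets suit rate-based behaviors.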
The software itself is truly open source, under the MIT license, as free as it can be. We are aiming at building a true community here, not simply pushing open source software to users. So the technical barrier is kept as low as possible, and we try to foster contributions around us. We have already managed to receive external contributions for things such as scenarios, parsers or even bouncers.
A short demo often being worth a thousand slides, let's jump directly into a practical example of deploying and using CrowdSec. Here I'm going to deploy CrowdSec on a very typical setup: a Linux machine with Nginx, MySQL, SSH and so on. I'm simply installing CrowdSec from the repositories. As you are going to see, the setup is fairly automated, so that the technical barrier for the user is as low as possible. Through the wizard, the setup process identifies the services deployed on my machine: Nginx, SSH, MySQL and the Debian distribution. For each of those, it spots the logs and installs what we call a collection, a curated bundle of configuration that helps you protect against these attacks.
Out of the box, I can immediately take a look at the logs, and we are going to simulate an attack on the web server using good old Nikto. It's a web application scanner that might not be very modern, but it has the very typical behavior of a web application vulnerability scanner. Here I can see that the tool is detected for coming with a known bad user agent, trying to access a lot of files that don't exist, trying to crawl non-static resources, and even accessing sensitive files. Through cscli, the main command-line tool for system administrators to interact with CrowdSec, we can see in the decisions that my IP is going to be banned for a few hours.
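Unlike a brute-force bucket, a bad user agent needs no counting: a single matching request is enough, which CrowdSec calls a trigger scenario. A minimal Python sketch of that check follows; the marker list is hypothetical and far shorter than any real one maintained on the hub.

```python
from typing import Iterable, Optional

# Hypothetical, tiny list of scanner signatures; real lists are much longer.
BAD_USER_AGENT_MARKERS = ("nikto", "sqlmap", "masscan")


def bad_user_agent(user_agent: str,
                   markers: Iterable[str] = BAD_USER_AGENT_MARKERS) -> Optional[str]:
    """Return the matching marker if the user agent looks like a known scanner, else None."""
    ua = user_agent.lower()
    for marker in markers:
        if marker in ua:
            return marker
    return None
```

Returning the matched marker rather than a bare boolean gives the alert something to show as evidence.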
That is first of all because of this bad user agent, but we can also look at the other alerts that were triggered. Here we can see the various alerts, and we can even deep dive into a specific one. For example, let's say I want to see in more detail what happened and why the sensitive files scenario was triggered, which here is alert number five.
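Inspecting an alert is useful because the scenario keeps the evidence that made it fire. The sketch below shows that idea for sensitive files: it groups the sensitive paths each IP requested and reports IPs above a threshold. The path list, the threshold and the function name are assumptions, not the real scenario.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical examples of paths no legitimate visitor should request.
SENSITIVE_PATHS = {"/.env", "/.git/config", "/.htpasswd", "/wp-config.php.bak", "/backup.sql"}


def sensitive_file_alerts(requests: Iterable[Tuple[str, str]],
                          threshold: int = 3) -> Dict[str, Set[str]]:
    """Return {ip: sensitive paths requested} for IPs that tried at least `threshold` distinct ones.

    The path sets play the role of the evidence shown when inspecting an alert.
    """
    hits: Dict[str, Set[str]] = defaultdict(set)
    for ip, path in requests:
        if path in SENSITIVE_PATHS:
            hits[ip].add(path)
    return {ip: paths for ip, paths in hits.items() if len(paths) >= threshold}
```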
Here we can see, for example, the various sensitive files that the scanner tried to access on my web server. However, if I try to access my web server as the attacker, I still can, because CrowdSec itself is only in charge of detection. So I'm going to jump into the hub and find a bouncer suited to my technical environment. As I'm using Nginx, I'm going to use the Nginx bouncer, which leverages Lua integration within Nginx to provide the blocking capabilities.
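The real Nginx bouncer does this in Lua against the local API; the Python sketch below only mirrors the flow. The injected `fetch` callable stands in for the HTTP request, and the cache duration is an assumption, as is the response shape: a JSON list of decisions with a `type` field, or `null` when there is none.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple

# `fetch` stands in for an HTTP call to the local API and returns the raw JSON
# body for an IP; it is injected so the sketch stays self-contained.
Fetch = Callable[[str], str]


class Bouncer:
    """Toy bouncer: ask the local API about unknown IPs and cache the answer."""

    def __init__(self, fetch: Fetch, cache_seconds: float = 60.0) -> None:
        self.fetch = fetch
        self.cache_seconds = cache_seconds
        self.cache: Dict[str, Tuple[bool, float]] = {}  # ip -> (blocked, cached_at)

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        cached = self.cache.get(ip)
        if cached is not None and now - cached[1] < self.cache_seconds:
            return cached[0]
        decisions = json.loads(self.fetch(ip)) or []  # "null" means no decision
        blocked = any(d.get("type") == "ban" for d in decisions)
        self.cache[ip] = (blocked, now)
        return blocked

    def handle(self, ip: str) -> int:
        """HTTP status to return: 403 for banned IPs, 200 otherwise."""
        return 403 if self.is_blocked(ip) else 200
```

The cache avoids one local API round trip per request, at the cost of a short delay before a removed decision takes effect.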
I simply download the provided tarball and deploy it on my machine. Now I can restart my Nginx service. Thanks to the bouncer, whenever Nginx sees an IP it doesn't know, it interrogates the CrowdSec local API and asks whether or not it should let this IP through. So now, if I again try to access my website from the attacker's point of view, I'm blocked and I get a 403, because my IP is still banned. What I'm going to do is remove the existing decision on my IP, and now we can see that I am able to access my website again.
I have also configured my iptables firewall to log new or attempted connections, which is a great source of insight from a security point of view. So again, we jump into the hub and look for a collection that lets us take advantage of this. Of course, there's a collection for iptables: it includes a parser for iptables logs as well as a scenario that detects port scans. As you can see, installing a new collection is as simple as running cscli to install it. This reflects the fact that technical stacks are changing faster and faster, so you need to be able to easily adapt your security software to changes in your infrastructure. Now that we have restarted the service, we can launch a port scan with good old Nmap and see that CrowdSec is now able to detect this kind of behavior. Installing the collection taught CrowdSec how to understand these logs and how to detect these kinds of patterns.
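The parser and port-scan scenario shipped in the iptables collection are YAML on the hub; what follows is only a Python sketch of the same logic. It assumes each kernel LOG line arrives with its timestamp already extracted, and the thresholds (`min_ports`, `window`) are arbitrary.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Extract the source IP and destination port from an iptables LOG line.
FIELDS = re.compile(r"SRC=(?P<src>\S+) .*?DPT=(?P<dpt>\d+)")


def detect_port_scans(lines: Iterable[Tuple[float, str]], min_ports: int = 15,
                      window: float = 5.0) -> List[str]:
    """Flag source IPs that hit at least `min_ports` distinct ports within `window` seconds.

    Each source's events are assumed to arrive in time order.
    """
    seen: Dict[str, List[Tuple[float, int]]] = defaultdict(list)
    flagged: List[str] = []
    for ts, line in lines:
        m = FIELDS.search(line)
        if m is None:
            continue
        src, port = m.group("src"), int(m.group("dpt"))
        # Keep only the events still inside the sliding window.
        seen[src] = [(t, p) for t, p in seen[src] if ts - t <= window]
        seen[src].append((ts, port))
        if src not in flagged and len({p for _, p in seen[src]}) >= min_ports:
            flagged.append(src)
    return flagged
```

Counting distinct ports rather than packets is what separates a scan from a client repeatedly connecting to one legitimate service.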
It's simply an example to show how extensible the software is. My IP is banned again, but this time for a different reason, and again, trying to access Nginx is going to stop us.
One more thing I want to showcase is visualization. We know that a dashboard is often missing in open source software, and for some users it's very important to be able to visualize the data. Here we are using Metabase, a great piece of open source software, a bit like Kibana for those of you who are not familiar with it, that allows you to create fancy dashboards. We use Metabase in combination with Docker to deploy some dashboards out of the box, so the user can see the activity timeline and what is going on. Metabase is now deployed and we can simply access it through the web interface. Credentials are provided and, as you will see, it gives us a very good synthetic view of the activity of the machine. Of course, there is not much activity to be seen right now, because we just deployed the machine, but we can see our attack from my IP and, funnily enough, another IP that attacked during the demo. We can see the timeline, the kinds of attacks, the sources of the attacks and so on.
One more thing I want to showcase: we know that false positives are often very frightening for users. CrowdSec can work on cold logs just as it does on hot logs. So whenever you are trying out CrowdSec or writing your own scenarios, you can replay those scenarios on past logs and see whether they would have led to false positives or false negatives, or simply get an overview of the activity. Here I'm going to ingest my 2019 web server logs into the existing CrowdSec instance. As we can see now, the attacks being detected are the attacks that happened in January 2019.
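Replaying cold logs works because detection runs on the timestamps written in the logs rather than on the wall clock. A minimal sketch, assuming events are already parsed into (timestamp, IP) pairs; the threshold and window are arbitrary, and this is not how CrowdSec's replay mode is implemented internally.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta
from typing import Deque, Dict, Iterable, List, Tuple


def replay(events: Iterable[Tuple[datetime, str]], threshold: int = 10,
           window: timedelta = timedelta(seconds=30)) -> List[Tuple[datetime, str]]:
    """Run a sliding-window detector over historical events.

    The clock is the timestamp of each log line, never the wall clock, so
    attacks from January 2019 are reported as January 2019 attacks.
    """
    recent: Dict[str, Deque[datetime]] = defaultdict(deque)
    detections: List[Tuple[datetime, str]] = []
    for ts, ip in sorted(events):
        q = recent[ip]
        q.append(ts)
        while ts - q[0] > window:
            q.popleft()
        if len(q) >= threshold:
            detections.append((ts, ip))
            q.clear()  # start counting again after an alert
    return detections
```

Running the same detector over months of past logs is a cheap way to count how often a new scenario would have fired on traffic you know to be legitimate.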
And of course, if we jump back to our dashboard and change the time period to the last three years, we see the activity that was ingested: all my historical events and the reconstructed timeline, which gives us that visibility. So I guess it's a great way for users to familiarize themselves with the software.
One last thing I want to show is the metrics. Here it's through the command line, but CrowdSec is actually instrumented with Prometheus, a tool that ops people love, and it gives us good metrics on what is going on. For example, we can see the various log sources we are reading, how many lines are read, how many lines are parsed, and how many lines are poured into buckets, which are instances of scenarios. This gives you a good idea of whether your configuration is appropriate or not. The same goes for all the other components.
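How to read those counters can be sketched as a small heuristic. The field names echo the read, parsed and poured figures mentioned in the talk, but the class, the 90% threshold and the messages are assumptions, not CrowdSec output.

```python
from dataclasses import dataclass


@dataclass
class AcquisitionMetrics:
    """Per-source counters, similar in spirit to the ones exposed to Prometheus."""
    lines_read: int = 0
    lines_parsed: int = 0
    lines_poured: int = 0  # lines that reached at least one scenario bucket

    @property
    def lines_unparsed(self) -> int:
        return self.lines_read - self.lines_parsed

    def diagnose(self, min_parse_ratio: float = 0.9) -> str:
        """Rough heuristic: a low parse ratio hints at a missing or wrong parser."""
        if self.lines_read == 0:
            return "no lines read: check the log source path"
        ratio = self.lines_parsed / self.lines_read
        if ratio < min_parse_ratio:
            return f"only {ratio:.0%} of lines parsed: check the parser for this source"
        if self.lines_poured == 0:
            return "lines parsed but none poured: no installed scenario uses this source"
        return "ok"
```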
Now that we have seen the open source software part, what are we trying to achieve here? We're trying to create a crowdsourced CTI: crowdsourced cyber threat intelligence. We believe that, rather than only running honeypots, having thousands of real users exposing real services and facing real attacks every day will significantly increase the efficiency and the accuracy of our cyber threat intelligence. With thousands of users, we are going to create a very accurate CTI mechanism.
So how do we mix all this information together? And why do we believe the crowd is so important? Because context is key, and context can be gained through the crowd. An attack is very time dependent. An IP that is malevolent right now was legitimate a few weeks, days or even hours ago: it was most likely a legitimate asset that got hacked and is now taking part in attacks. Once the legitimate owner is made aware of the behavior of their asset, they will clean up the mess, and the IP will become good again. That's why the crowd is so important: it lets us evaluate who's good and who's bad at a given time. Currently, we consider that an IP we haven't seen attacking for 72 hours becomes good again. Being able to curate all the reports from users in real time is a real game changer here.
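The 72-hour rule is easy to state as code. A minimal sketch: the dictionary of last report times is a hypothetical stand-in for whatever storage the central platform actually uses.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# From the talk: an IP not seen attacking for 72 hours is considered good again.
DECAY = timedelta(hours=72)


def is_malicious(last_reported: Dict[str, datetime], ip: str,
                 now: Optional[datetime] = None) -> bool:
    """True if `ip` was reported as attacking within the last 72 hours."""
    if now is None:
        now = datetime.now(timezone.utc)
    seen = last_reported.get(ip)
    return seen is not None and now - seen < DECAY
```

Keying on the most recent report means any fresh sighting automatically restarts the 72-hour clock.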
How do we deal with false positives? How do we deal with poisoning? Users participating in the network are given a trust rank, based on things that are hard to fake. It might be persistency: for how long have you been sending information? Or consistency: are the IPs you report also being actively reported by other users? Are they being reported by users with a higher trust rank? And so on. This is one mechanism we use to fight poisoning.
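The talk does not give the actual formula, so the following is a hypothetical scoring function that only shows how persistency and consistency could be combined into a trust rank. The equal weights, the 90-day saturation and the data layout are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Reporter:
    days_active: int        # persistency: how long this user has been sending signals
    reported_ips: Set[str]


def trust_rank(reporter: Reporter, others: Dict[str, Set[str]],
               ranks: Dict[str, float]) -> float:
    """Hypothetical score in [0, 1] mixing persistency and consistency.

    `others` maps other reporters to the IPs they reported; `ranks` gives their trust.
    """
    persistency = min(reporter.days_active / 90, 1.0)  # saturates after ~3 months (assumption)
    if not reporter.reported_ips:
        return 0.5 * persistency
    corroboration = 0.0
    for ip in reporter.reported_ips:
        # Best trust rank among other users who also reported this IP.
        support = [ranks.get(name, 0.0) for name, ips in others.items() if ip in ips]
        corroboration += max(support, default=0.0)
    consistency = corroboration / len(reporter.reported_ips)
    return 0.5 * persistency + 0.5 * consistency
```

A new account reporting IPs nobody else sees scores close to zero, so a poisoning campaign needs both time and corroboration from already trusted users.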
We also have our own honeypot network. At the beginning, it was a way for us to bootstrap the consensus. Now it is used for two purposes: artificially increasing our presence in some parts of the Internet, and filling in the technological gaps we might have if, for example, a dramatic vulnerability comes out tomorrow on Drupal. Our honeypot network allows us to quickly deploy hundreds of vulnerable machines across various public clouds in order to capture emerging signals and threats.
Then we have the canaries, an analogy with the canary in the coal mine, which are part of both the consensus and the open source software. First of all, they are a promise to the user that false positives are dealt with easily. You can tell the user, out of the box: you are never, ever going to ban Google's SEO bot or your ISP because of CrowdSec, because they are part of the whitelist. For us, it's a good way to fight false positives, and more specifically to easily identify which community-created scenarios are prone to false positives. It's also a good way to fight poisoning attempts.
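The canary idea can be sketched with Python's standard `ipaddress` module: never act on whitelisted ranges, and count which scenarios keep flagging them. The ranges below are RFC 5737 documentation blocks used as placeholders; the real whitelist (search engine crawlers, ISPs and so on) is not reproduced here, and the function names are hypothetical.

```python
import ipaddress
from collections import Counter
from typing import Iterable, Tuple

# Placeholder ranges (RFC 5737 documentation blocks), not a real whitelist.
WHITELIST = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")]


def is_whitelisted(ip: str) -> bool:
    """True if `ip` falls inside any whitelisted (canary) network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in WHITELIST)


def false_positive_counts(reports: Iterable[Tuple[str, str]]) -> Counter:
    """Count, per scenario, how often it flagged a whitelisted (canary) IP."""
    return Counter(scenario for ip, scenario in reports if is_whitelisted(ip))
```

Because canary IPs are known to be benign, every report against them is a false positive by construction, which makes noisy community scenarios easy to spot.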
The predictive algorithm is a way for us to address the threats that fly under the radar. A lot of groups use various assets to perform an attack. One IP comes and does the scouting or fingerprinting of your web application, another one exploits the vulnerability in one shot, and a third one accesses the backdoor. Based on the huge amount of data we're dealing with, the predictive algorithm aims at identifying these weak signals and aggregating them to identify more advanced actors. These are the various mechanisms that lead to this reputation database: how we reach a consensus, curate the data and then redistribute it to the community.
In case my accent didn't give me away yet, we are French, so GDPR and data privacy are a big topic here. CrowdSec never, ever sends your logs to the platform. Your logs are never exported to us. The only information you give us is the IPs you decide to block, the scenario and the timestamp. That is everything we need to reach this consensus and then redistribute the information to the network.
You might legitimately ask what our monetization and business model are. On one side, there will be fleet features for people managing huge fleets of CrowdSec installations, as well as things such as self-monitoring. And most of all, and this is the core case here, API access: people who want to access the reputation database without contributing to it.
Thank you very much for your time. I hope you enjoyed the presentation and hope to see you soon. Hop onto our GitHub or our Discourse; you can find us on GitHub.com. See you!