Transcript
This transcript was autogenerated. To make changes, submit a PR.
Thank you for joining us for the zero trust security with IoT session.
I am Syed Rehan Armasi, IoT developer evangelist
within AWS IoT service team. You have my QR code on the right hand
side here. You can connect with me on LinkedIn or Twitter.
Ask me any questions related to AWS, AWS IoT, or zero trust, or
in general. So let's begin. So in
this session today we will look at
zero trust and its protection principles. We will also
look at NIST and NCSC zero trust design principles.
These are cybersecurity organizations in the US and the UK, respectively.
We will also look at AWS IoT security best practices.
These are the ones which I advocate to
my customers, and which AWS in general advocates to customers,
as security best practices to adhere to. Then we will look at
a demo, and I'll walk you through using AWS IoT Greengrass
and AWS IoT Device Client. These are two open source AWS
IoT projects which you can use to emulate real
hardware, a real device, a real IoT thing.
AWS IoT Greengrass is an edge
runtime environment where you can run Lambda
components to compute at the edge.
Within this demo we will look at how to secure these devices using AWS
IoT policies, and how
AWS IoT adheres to zero trust. Then I'll
take questions if there are any. So what is zero trust?
So NIST, which is a cybersecurity
organization in the US, describes
zero trust as a cybersecurity paradigm
that moves defenses from network-based perimeters to focus on users, assets,
and resources, right? And you need to make sure that there is no
implicit trust granted to assets or users based
on their physical or network location or on asset ownership. For example,
if somebody is a root user or admin user,
and you give them complete network access, allowing
them to access your backend, that is against zero trust guidelines and
principles, whether they are connecting over VPN or
doing it physically from an office location
and whatnot. Similarly,
NCSC, which is a cybersecurity organization,
a government agency in the UK, says
that inherent trust in the network
should be removed. Always assume the network is hostile, and
verify each request. So for example, the network is assumed hostile: you never know
who's snooping and eavesdropping. And each
request is verified, backing that up with policy and
with authentication and authorization to make sure it's the right user
and the right level of privileges has been applied when the user is accessing
the environment. So protection principles
for zero trust. So these are some of the protection principles.
When I have a discussion with my customers, these are the ones I
tell them to look out for within their own protection
guidelines or security guidelines. Make sure
you're always paranoid, in the sense that internal and external
threats always exist, right? Never assume that
users, whether internal or external, are verified and
valid, and that there won't be any compromise
situation, right? Assume hostility.
Always assume each device is hostile. So if we take zero trust principles
in hand, we treat
every device as hostile, right? And this goes hand in
hand with the rest of the principles which we're going to talk about.
Whether the device is owned by
an IT director, an IT administrator, a developer,
or whatnot, you need to treat every device as hostile,
because we never know which device could be used
as a bed or a conduit for an attacker
to gain access to your environment. Gatekeeper.
This is the one where we
authenticate and authorize every user and
every request, to make sure that you have a gatekeeper in place.
Most organizations do have a gatekeeper in place, but some
of them have a caching mechanism where they say, okay,
we have authorized and authenticated this user or device, and we
will keep this valid for the next five days and whatnot. You shouldn't do
that. You should always make sure that you authenticate and authorize every
single time, so that your gatekeeper is always up
to date. Trust issues. I think we talked about this earlier, from the NIST
principles as well as NCSC:
a trusted device is never trusted. So for example, as I mentioned earlier,
if somebody in management, or a
developer, or anybody within the company
is trying to connect to the environment, and they have a trusted device
and they have root access, they shouldn't automatically,
inherently gain access to the back-end environment,
because that is against zero trust principles and can lead to
your back end being compromised.
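As a minimal sketch of the gatekeeper idea, assuming a purely in-memory policy store: the class name, the device IDs and the action strings below are hypothetical and are not part of any AWS API. The point it illustrates is that every call is checked against the current state, with no cached decision carried over from an earlier request.

```python
from dataclasses import dataclass, field


@dataclass
class Gatekeeper:
    """Hypothetical gatekeeper: every request is checked against current state."""

    # device_id -> set of allowed actions (illustrative in-memory policy store)
    policies: dict = field(default_factory=dict)
    revoked: set = field(default_factory=set)

    def authorize(self, device_id: str, credential_ok: bool, action: str) -> bool:
        # Authenticate first: a failed credential check or a revoked device
        # is rejected, regardless of who owns the device.
        if not credential_ok or device_id in self.revoked:
            return False
        # Authorize against the policy as it is right now; nothing is cached.
        return action in self.policies.get(device_id, set())

    def revoke(self, device_id: str) -> None:
        # Dynamic policy change, for example after a device is reported stolen.
        self.revoked.add(device_id)


gate = Gatekeeper(policies={"sensor-1": {"publish:telemetry"}})
first = gate.authorize("sensor-1", True, "publish:telemetry")
gate.revoke("sensor-1")
second = gate.authorize("sensor-1", True, "publish:telemetry")
print(first, second)
```

Because the decision is recomputed on each call, the revocation takes effect on the very next request rather than after a cached approval expires.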
And then we can also look at updating and changing policies dynamically.
For example, if a device has been compromised, you need to make
sure that you are updating the policy dynamically. If a user's laptop has
been stolen, you're updating the policy dynamically. If an
IoT device, an Internet of Things device, has been compromised, you're updating
the policy dynamically. Taking what I've just mentioned,
let's apply security best practices when
talking about an Internet of Things device or an asset trying
to connect to your environment, whether it's a cloud or an endpoint.
Always decouple ingestion from processing. For example,
we have two types of plane. We have a data plane and we have a
control plane. The data plane is where the device is sending you sensor data,
telemetry data. And then we have the control
plane, where you control authentication, authorization,
and whatnot, and which category of device is allowed to
connect to the cloud or endpoint. This is what we decouple
from processing. For example, let's say
a device becomes compromised. Because you have a separate control
plane, you can lock the device out, and
we will look at the rest of the best practices below.
You can make sure the device is out
of the environment but still functioning offline.
That gives a segue into the next point which I was going to talk about: design
for offline behavior. Make sure your device is capable of
running offline. Let's say, as I mentioned earlier, a device is compromised and
you want to use the control plane to lock it out of the environment,
from the cloud or your endpoint or your network, and make sure it
still works offline. For example, I have a humidifier
connected to the Internet. I can control it using my
mobile app. I want to make sure that if the device becomes
compromised, I still have the normal, regular functionality
available from the device, and that its function isn't
compromised. I shouldn't end up with a device which is
not performing the task it is designed to
do. Design for lean data, enriched in the cloud.
So for example, any IoT device, or in fact
any asset, whether it's connected to the cloud or an on-prem
environment or endpoint or whatnot, will always generate
tons of data. We could go to petabytes,
zettabytes and whatnot. Right? And we need to make sure that
we separate the right data, the rich data, from
the noisy data. So for example, if you have a device connecting
to the cloud, you need to make sure that you are taking
the lean data, sending it to the cloud, and
using it to do BI on it. You don't want to
have noisy data as well as rich data coming
to the cloud and only then enrich it in the cloud. No, you should
apply the design principle of having lean data at the edge.
So you could use something like AWS IoT Greengrass,
which works on the edge and offline, to give you
enriched data. You can run compute at the edge
using AWS IoT Greengrass, as an example of that. So you
could have lean data coming to the cloud, which you can enrich by
applying, I don't know, normalization. You could apply Kinesis,
Firehose, or put it into any sort of data lake, and enrich
it for your BI users, analytics and whatnot.
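A minimal sketch of what "lean data at the edge" could look like, assuming the device buffers raw temperature samples and only forwards a small summary. The function name, the valid range and the sample values are illustrative assumptions, not something shown in the talk.

```python
from statistics import mean


def lean_summary(readings, low=-40.0, high=85.0):
    """Drop out-of-range (noisy) samples and reduce the rest to one small record."""
    valid = [r for r in readings if low <= r <= high]
    if not valid:
        # Nothing trustworthy to send for this window.
        return None
    return {
        "count": len(valid),
        "min": min(valid),
        "max": max(valid),
        "avg": round(mean(valid), 2),
    }


raw = [21.4, 21.5, 999.0, 21.7, -273.0, 21.6]  # made-up temperature samples
summary = lean_summary(raw)
print(summary)
```

The cloud side then receives one compact record per window instead of every raw sample, and can enrich it further for analytics.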
Personalization. You need to handle
each device differently. You can't have one
solution for all. For example, you may have a connected car,
you may have a connected fridge, you may have a connected microwave.
They all have different functionality. You need to
handle personalization, and tie it in with the offline behavior
and with decoupling ingestion from processing.
For example, a car is a totally different machine compared
to a microwave or a fridge. A fridge's main task is
keeping things cold and making sure they don't rot.
Similarly with a microwave and similarly with a car. So for example, if a car
is compromised, you wouldn't want to lock that car
down so that it is not working at all. I mean, when a car is
moving, it will create telemetry data in terms of where it is
going, the motor, the battery,
the brake pads, the fluids, the pressure,
so many things. The moving parts within the car will give you telemetry
data, which you can enrich at the edge, normalize,
and then send to the cloud for you to utilize. And you
can also apply personalization to it. For example, take an EV,
an electric car, whose battery is running out: you're
notifying the user and routing them to the closest
EV charger. You can't do that, obviously, with a
microwave; it's a totally different story. So you need to handle personalization,
and also make sure that you allow the
devices to work offline, so there is
no compromise of the features and usage
the device is designed for. And also ensure devices
regularly send status checks. For example, some customers
send status checks every 24 hours.
Well, that's fine, you can send every 24 hours. But then
that depends on whether the asset is moving, whether it is a portable asset,
what the priority of the asset is, and whatnot.
Some customers send it every five minutes. Some customers send it every hour.
It all depends on how you want to make sure the
device is regularly sending its status check.
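As a rough sketch of the cloud-side half of a status check, assuming you already record when each device last reported in: the function name, device IDs and timestamps are hypothetical. The allowed silence is a parameter precisely because, as described above, the right interval depends on the asset.

```python
from datetime import datetime, timedelta, timezone


def stale_devices(last_seen, max_silence, now):
    """Return the IDs of devices whose last status check is older than max_silence."""
    return sorted(
        device_id
        for device_id, seen_at in last_seen.items()
        if now - seen_at > max_silence
    )


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "car-1": now - timedelta(minutes=3),
    "fridge-1": now - timedelta(hours=2),
}
overdue = stale_devices(last_seen, timedelta(minutes=5), now)
print(overdue)
```

With a five minute window only the fridge is flagged; with a 24 hour window neither device would be.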
Also look at the device security lifecycle holistically. The device security
lifecycle ties in very well with one of the other
principles, which is identity lifecycle management.
So for example, you have a device, and it is a greenfield device.
Once it has gone into the field, you want to be able to take
proactive action on the device, whether the certificate becomes compromised, or somebody's
trying to clone the certificate of the device, or the device has
become too chatty and whatnot. You restrict
the device's connectivity to the cloud by saying, okay, I'm going
to apply privileges and policy to lock it down, and
then implement device identity lifecycle management.
For example, if a device becomes chatty, you want to
put the device into a quarantine group or quarantine zone,
where you can apply restricted access to
the device, in the sense that it can send you data but cannot receive
data from other devices or from the cloud and whatnot.
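A restricted, send-only policy of the kind described above could be sketched like this. The document follows the AWS IoT Core policy format, but the region, account ID and topic prefix are placeholder assumptions, and the helper function is hypothetical. AWS IoT policies deny by default, so leaving out subscribe and receive permissions is what keeps the device from getting data back.

```python
import json


def quarantine_policy(region, account_id, telemetry_prefix):
    """Build a publish-only AWS IoT policy document (no subscribe or receive)."""
    base = f"arn:aws:iot:{region}:{account_id}"
    # Policy variable resolved by AWS IoT to the connecting thing's name.
    thing = "${iot:Connection.Thing.ThingName}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "iot:Connect",
                "Resource": f"{base}:client/{thing}",
            },
            {
                "Effect": "Allow",
                "Action": "iot:Publish",
                "Resource": f"{base}:topic/{telemetry_prefix}/{thing}",
            },
        ],
    }


# Placeholder region, account ID and topic prefix, for illustration only.
policy = quarantine_policy("us-east-1", "123456789012", "telemetry")
print(json.dumps(policy, indent=2))
```

Each device can only connect as itself and publish to its own telemetry topic, which is also a reasonable starting point for a least-privilege policy outside quarantine.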
And it is functioning,
but it is not functioning at 100%. You have
not made the device redundant. It is capable
of carrying out the function it is designed to do. And
in terms of device security,
you could use, for example, the AWS IoT secure
tunneling feature to log into your device using an SSH tunnel
over MQTT. And the device will say,
okay, you're connected to me. You can restore its firmware, you could
restore its state to the previous working state, or look at what
actually caused this anomaly. Right? So these tie
in very well together, hand in hand. Least privilege.
Again, start with least privileges and grant elevated
privileges as and when needed. You shouldn't give them everything.
If you're talking from an AWS policies
perspective, applying a star for everything is just
not the right way to do it. You need to understand what
the device is connecting to, what it is sending, what it is publishing,
what it is receiving and whatnot. So apply the bare minimum
policies and permissions for the device to connect and be
functional in your environment. Secure device credentials
at rest. So for example, you may have a device which
has a TPM, which is a hardware module for you
to encrypt the device credentials or certificates used when
it's trying to connect to the cloud.
If you don't have a hardware TPM, you could also set up
a software-based TPM within your environment or within your devices
and whatnot. And this will encrypt the security credentials,
making sure that they don't get compromised at rest.
I've already touched on implementing device identity lifecycle management.
It goes hand in hand with device security, where I mentioned that you need
to take a complete, holistic view of identity lifecycle management.
Also tied in with the previous practice
I've talked about is that each category should be handled
differently, whether it's a car, whether it's a fridge, whether it's a microwave,
kitchen appliances, external lights or whatnot. They all should
be maintained and managed differently based on
their identity lifecycle management. And machine learning.
So as humans, our level of reaction
and response can never be equivalent to
a computer's. So if you have, let's say, machine learning, right,
and you have a compromise situation within your environment, or your
devices get compromised, machine learning can immediately
notify you, apply the lock, and quarantine
the device so it doesn't create a security
risk for you, for other devices, and within your environment. So this
is where I always say to my customers: use machine learning where you can.
Always apply machine learning where you can, unless the devices are not
capable of it. I mean, there are two types of machine learning we're talking about
here. One is cloud-side machine learning,
where you have the capability to train the model and apply cloud-side machine
learning knowledge. And then we also have something called machine learning inference,
where you take a machine learning model trained in the
cloud and apply inference with it on the edge devices. These devices
could be an MCU or a small Raspberry Pi.
They are capable of running machine learning inference because you're not training the model,
you're applying the inference at the edge.
Finally, take a holistic view of data security.
Whether the device is connected to the cloud or
connected to your environment, on-prem or whatever endpoint,
you need to make sure the data never gets compromised, because you
don't want to lose the faith and trust of the user if
the data gets compromised. So have a holistic view of
data security, and make sure it ties in with securing the device credentials
by using a TPM, the hardware module. Make sure the device never
comes to a state where it can be compromised by
a snooper or attacker who's trying to obtain the identity of the
device and gain access to your environment.
Let's look at telemetry before machine learning. I mentioned earlier, apply machine
learning. If a customer is unable to apply machine
learning due to the capability of the device,
or the device at the edge is not capable of handling machine learning, then you
can use something called AWS IoT Device Defender.
It gives you the capability, through a security profile, to apply
the data set and the algorithm to say that if
the device becomes chatty, or if the credentials have been compromised and whatnot,
notify the user through Amazon SNS and
log it into CloudWatch. And then also send a trigger using Lambda
back to the device to say that it needs to stop
sending this data. Also, on the cloud side, you can
use Lambda to move the device into a quarantine zone.
But obviously these are manual steps which you will need to do as part
of your setup if machine learning cannot be applied. Now let's
look at defending devices with Amazon machine learning, right?
So if you have a device which is capable of sending you data to
the cloud, or an asset which is sending you data to the cloud,
you could take the data and put it through Amazon
SageMaker after cleaning it up in AWS IoT Analytics,
using the channel, the pipeline, and the data store, and creating a data set. This data
set will be used by Amazon SageMaker to create the model.
You can take the model and store it in Amazon S3, and
then this model can be sent back to the devices at the edge to
say, okay, we have created this ML model
for you to run inference with locally. This could be a microcontroller,
it could be a Raspberry Pi, it could be
a full-blown industrial edge environment running
a normal operating system. The model can be sent back down to
the device to handle this. You can see here we also talked
about AWS IoT Device Defender, which will give you the capability
to look at the certificates and the policies, and audit them in terms of
whether the device has been compromised or not. Make sure you grant permissions
based on what the user needs, on a need-to-know basis.
Finally, let's look at automating and eliminating risk by
applying an inbuilt feature of AWS
IoT Device Defender, ML Detect. In this feature you will see
that AWS IoT Device Defender ML Detect takes policies,
permissions, and audit, takes the holistic view, and feeds
all of that into a machine learning model. It
creates the model for you and takes proactive action. It gets
to work for you. So for example, if there's been a compromise, the machine
learning model will notify the user using Amazon Connect or
Alexa devices, or notify the user through
a mobile app which you have through AppSync and whatnot, and
take mitigation action for you, notifying you that, okay,
the device has been compromised, there has been a violation,
and it has taken the action and moved the device out
of the normal zone into a quarantine zone until you can rectify
the situation. With that, I'm going to move into a demo.
So this is the demo architecture, which shows
what we're going to look at. I'm going to use
AWS IoT Greengrass and AWS IoT Device Client to represent
AWS IoT things. You can use these
open source services, open source software I should
say, to emulate IoT things.
We will look at device policies and permissions,
applying least privilege, and then take this
and look at the visibility of the violations this device is
creating, for example by looking at Device Defender ML Detect or
Rules Detect, which apply the algorithm
to see if there has been any compromise so you can take
action. Finally, we will look at Security Hub,
AWS Security Hub, which is one of the security services
we have in AWS, where you get a complete single pane
of glass, whether it's an IoT device, whether it's an asset,
or a computer connecting to the environment, across your whole AWS organization.
You can see the findings showing up in there, whether it's EC2 and
whatnot. So it gives you a complete holistic view in a
single pane of glass. So with that, let's switch over to AWS
IoT Core in the AWS console to look into this.
From the AWS IoT console, let's look at
connecting a device to the environment.
You have an option to go through this as a wizard,
creating a new thing and giving it a thing name.
The wizard will guide you through setting up the SDK and
platform. If you need, you can set it up for Node.js, Python, or
Java. So I'll just go ahead with Linux and Mac,
download the connection kit, and complete
it, and
we're done with that. Okay, so we also
have a test client. This will allow us to
look at the data coming into our cloud environment.
You can see here, I'm using AWS IoT Device Client to
send some sensor data over here in terms of temperature,
pressure, humidity, and date and time.
These devices are backed by a
security certificate. So if I go into security,
and go into certificates, each device has
a certificate, and the certificate ties in with a policy.
So here, if you look at the deny-all
policy, it means that we are blocking the device here
and not giving it any actions or resources, by
denying all actions. Right. We also have
something called the AWS IoT Device Defender audit feature.
And I usually say to my customers, schedule an
audit. So for example, if I go ahead and create an audit here,
we give you 14 best-practice checks for
you to run the audit against. You can run it ad hoc, which is right now,
or you can run it daily, weekly, biweekly, or monthly. So I'll just do
one ad hoc now, and
I'll look at the one which I created earlier, and we can see which
checks are non-compliant. If I go into that one,
we can see that there are some devices sharing a certificate,
there is an overly permissive policy, there are role aliases,
and so on. If you want to mitigate a finding, you
go ahead and start a mitigation action, and this
overly permissive policy will get mitigated. We
can run this mitigation action to say, go ahead, do this,
give the reason for it, and so on. It will complete the task
and lock down the policy, making sure that it
is not overly permissive. Right. So let's go into a security
profile, which I mentioned in the machine learning part. We have two
options here when it comes to detection. First, we have the rule-based setup
for a security profile. If I go ahead
and say it needs to apply to all the things, all the devices which
are talking to AWS IoT, I give it a name, let's say Conf42
profile, and I can
select the metrics. We support cloud-side metrics,
device-side metrics, and custom metrics. So for example, if you
have devices where you want to understand the CPU usage and whatnot, you can do
that; it's a custom metric. And remember the architecture
diagram I showed you earlier where you had SNS topics: you can send
SNS notifications to the end user.
I'll just leave that for now. And then the significant
difference between rule-based and machine learning
based is here. So, for example, for authorization
failures, you're creating an algorithm to say: notify
if this happens, absolutely or relatively,
based on greater than, less than, or equal to, let's say, a value of
three in the last five minutes, ten minutes, one hour, whatnot.
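The rule just described (more than three authorization failures in five minutes) can be sketched as a behavior definition. The field names follow the Device Defender behavior format as I understand it and should be checked against the current API reference; the behavior name and the local `violates` helper are hypothetical and only illustrate how such a rule is evaluated.

```python
import operator

# Comparison operators this sketch knows how to evaluate locally.
_OPS = {
    "greater-than": operator.gt,
    "greater-than-equals": operator.ge,
    "less-than": operator.lt,
    "less-than-equals": operator.le,
}


def auth_failure_behavior(threshold=3, window_seconds=300):
    """Rule-based behavior: alarm when authorization failures exceed a threshold."""
    return {
        "name": "TooManyAuthFailures",  # hypothetical behavior name
        "metric": "aws:num-authorization-failures",
        "criteria": {
            "comparisonOperator": "greater-than",
            "value": {"count": threshold},
            "durationSeconds": window_seconds,
            "consecutiveDatapointsToAlarm": 1,
            "consecutiveDatapointsToClear": 1,
        },
    }


def violates(behavior, observed_count):
    """Evaluate one observed count against the behavior's static threshold."""
    criteria = behavior["criteria"]
    compare = _OPS[criteria["comparisonOperator"]]
    return compare(observed_count, criteria["value"]["count"])


behavior = auth_failure_behavior()
print(violates(behavior, 4), violates(behavior, 3))
```

In a real setup a dictionary of this shape would be supplied as one of the behaviors of a security profile; here it is only built and evaluated in memory.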
Right. And you have the same for disconnects, for message
size, and for the rest of the metrics. I'm going to cancel
this one for now and show you the difference between this one
and the machine learning based one. So let's look at the machine learning based one.
I select it for all the things, give it a name,
and let's select all the metrics. And in here,
similarly, you can set up an SNS notification. Let's
go into the next section. In the next section you can see there's a stark
difference. We just set the number of data points
needed to trigger the alarm and to clear the alarm;
the rest of it is taken care of by machine learning.
So let's go ahead and complete this one.
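For contrast with a rule-based behavior, an ML-based behavior carries no threshold at all, only the datapoint counts and a confidence level. The sketch below assumes the `mlDetectionConfig` field and the LOW, MEDIUM and HIGH confidence levels from the Device Defender behavior format; the helper and the behavior name are hypothetical.

```python
def ml_behavior(name, metric, confidence="HIGH", to_alarm=2, to_clear=2):
    """ML-based behavior: no static threshold, only datapoint counts and confidence."""
    if confidence not in {"LOW", "MEDIUM", "HIGH"}:
        raise ValueError(f"unsupported confidence level: {confidence}")
    return {
        "name": name,
        "metric": metric,
        "criteria": {
            "consecutiveDatapointsToAlarm": to_alarm,
            "consecutiveDatapointsToClear": to_clear,
            # The model, not a hand-written rule, decides what counts as anomalous.
            "mlDetectionConfig": {"confidenceLevel": confidence},
        },
    }


ml = ml_behavior("AuthFailuresML", "aws:num-authorization-failures")
print(sorted(ml["criteria"]))
```

Note there is no comparison operator, value or duration in the criteria, which is the difference visible in the console.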
If I create this ML profile, it
gets a machine learning model created for you.
If I go in here and look at the behavior
ML training, you can see that it's pending build; it's not been created yet.
It needs 25,000 data points for it to
get triggered and become active. So I usually say to
my customers: while the model is getting built, create a
rule-based profile for the time being, until this kicks in.
The moment you have 25,000 data points, it will trigger the
action and start taking actions for you and notifying
you. Okay. So the other thing I
also wanted to show you is what happens if the device
becomes chatty. We can look at alarms.
Let's say
I look at the historic alarms and see which alarm behaviors
have happened in the last 24 hours. I have been very good;
I've tried to make sure my devices aren't compromised in any shape or form.
So let me look at a previous alarm, once it
loads up. We can see here DDML six is
a previous alarm. Let's dive into the
profile here. I go in and look
at it to see why it triggered an alarm.
Right. So let's go into the IoT thing itself and
look at the certificate that's attached to it.
And within the certificate, let's look at the policy it's using,
which is causing the problem. So let's look at this policy.
We can see here that this policy is a complete
violation of zero trust. We need to
make sure that this device isn't allowed
all actions on all IoT resources. Right. You need to
lock this down. So we can run a
mitigation action on this device by creating a mitigation
action, let's say action.
You have multiple options here. I'll choose add things
to a thing group, and I'll use an IAM
role which I created earlier, allowing me to carry out
this action. And we can use the DDML group. This is where
the device existed. If you run this,
we can block the device.
So let's come over to thing groups.
I have something called a quarantine group, which is where
I make sure all my devices which
are compromised go, and
it has a policy attached to it to make sure that it has a deny
all. So that means the device is isolated and can't do anything.
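The two policies seen in this part of the demo can be sketched side by side: an allow-everything policy of the kind that violates zero trust, and a deny-all policy like the one attached to the quarantine group. The policy contents are reconstructed from the description, not copied from the console, and the `overly_permissive` check is a hypothetical local helper, not the Device Defender audit itself.

```python
ALLOW_ALL = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "iot:*", "Resource": "*"}],
}

DENY_ALL = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Deny", "Action": "iot:*", "Resource": "*"}],
}


def _as_list(value):
    # Policy fields may be a single string or a list of strings.
    return value if isinstance(value, list) else [value]


def overly_permissive(policy):
    """Return Allow statements that grant every IoT action on every resource."""
    flagged = []
    for stmt in policy.get("Statement", []):
        actions = _as_list(stmt.get("Action", []))
        resources = _as_list(stmt.get("Resource", []))
        if (
            stmt.get("Effect") == "Allow"
            and any(a in ("*", "iot:*") for a in actions)
            and "*" in resources
        ):
            flagged.append(stmt)
    return flagged


print(len(overly_permissive(ALLOW_ALL)), len(overly_permissive(DENY_ALL)))
```

A check like this only catches the most blatant wildcard; narrower but still excessive grants need a review of what the device actually publishes and subscribes to.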
Now let's look at the final part of the diagram for the demo,
where I mentioned having a single pane of glass. Here we
are looking at AWS Security Hub. If I go into Security Hub,
you have options to create integrations with many,
many different services. In here, if I click on
integrations, you have Chatbot, GuardDuty, Firewall,
Detective, Health, and also AWS IoT Device Defender
Audit and Detect, Macie and whatnot. Right? I have some
of them enabled, and you can see how that shows
up: if the button says stop accepting findings,
that means it's accepting findings, just like it says over here
as the status. Right. So if I go into findings,
this is where we get a single pane of glass of everything which
is happening within the AWS account,
whether it's EC2, whether it's S3, whether it's IoT. We
can see here we have an EC2 environment security group.
We also have IoT devices
which are writing to a security bucket and whatnot.
So you can look at all of these in a single pane of glass,
to have a single security view that lets you
get notified if any situation has been happening and
take action on it. So with that,
let's switch back over to our presentation.
And within the presentation, we can see here
we just completed the demo. Now, I wanted
to share some links with you. On the right
hand side, you will see a YouTube channel. As
part of the AWS IoT developer
advocate team, within the AWS IoT service team, we have our
AWS channel, where we
routinely publish videos. So it will
be good for you to subscribe. You also have our dev.to space, where you
can look at the microblogging which we do,
on the left hand side. The right hand side is the QR code,
and the left hand side has the workshop links and some other links. There
are three distinct workshops which I wanted
to mention. The first one, get started with the awsiot.com,
is where you can get started and learn how
to connect devices to the cloud and how you can leverage
and set up IoT, if you want to learn IoT.
Then we have the security-focused workshop, AWS Iot.
Zerotrustworkshop.com. It is designed
from the ground up to be focused on zero trust.
And then finally we have the Greengrass workshop, for what I mentioned earlier,
our open source software,
Greengrass, which runs on the edge. You can also look at
the source code for these open source projects which we have, which are
AWS IoT Device Client and AWS IoT
Greengrass. Both of these are used within these workshops.
Finally, thank you very much from me for listening
to me. And as I mentioned earlier,
if you want to connect with me on LinkedIn, this is my QR code, and for
Twitter as well. I'm always happy to help anyone who's
looking for information or has any blockers where I can unblock
them. Thank you very much. Have a great day.