Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to this session. I am Nidal Albeiruti.
I am a solutions architect with Amazon Web Services.
Our topic is one of the important concepts in machine learning, which is feature engineering, but with a special focus on how to extract more from a specific type of sensor: binary sensors, which we use in the Internet of Things context.
Therefore, in our agenda, I will take these concepts and discuss them, starting from the general topic down to the specific. So I will start by discussing IoT solutions in general and the different challenges that must be considered to build such solutions. We will funnel down to the role of the devices used there, mainly sensors. After that, binary sensors will be defined, to clarify what these are and what makes them different from other sensors. Then we will move on to introduce what feature engineering is, by first knowing what a feature is.
Later, we will discuss the additional features that
can be created or extracted to help us to
build more efficient machine learning models when dealing
with binary sensors. Finally, we will introduce Amazon SageMaker Data Wrangler as an efficient tool that can be used to make feature engineering tasks easier and repeatable. Let's make a start. So, the first topic is the Internet of Things (IoT). IoT solutions build on data received from the deployed sensors, and to receive this data, a connectivity solution, in one way or another,
must be established. The received data will then
be analyzed to act on the different outcomes found in this
data. The data or the actions can be forwarded
to the integrated solutions to draw insights
from them, and those insights can then support
decision making or go back to the edge devices
to trigger actions through the actuators.
And that shows us two things: first, how IoT solutions can be complex and multidimensional; and second, how important the role of sensors in IoT solutions is. The explosive growth in IoT use cases and the sheer number and diversity of devices out there has been phenomenal. In this slide are just a few examples of how AWS IoT is helping a lot of customers solve their business problems, to mention a few.
There is the use case of optimizing manufacturing. Then we have the use case of remotely monitoring patients in healthcare, in the context of the medical field; tracking inventory levels and managing warehouse operations; connecting homes, in the context of ambient intelligence, or connecting buildings or cities; growing healthier crops with greater efficiency; managing energy resources; transforming transportation; enhancing safety in working environments, such as worker safety; and, of course, monitoring and managing electricity and water networks to achieve energy efficiency. As you can see, our customers have different use cases, but all use cases are taking data from sensors. The different types of IoT devices
available are powered by embedded processors.
We call them microprocessors in the case of sensors, which appear in the middle of the slide, or microcontrollers in the case of actuators, which appear at the left-hand side of the slide.
Both of these devices have the necessary device software
that enables them to integrate with AWS IoT.
So where do we get these sensors and devices from?
AWS has a device qualification program
and qualified devices get listed in the AWS
partner device catalog, and that helps you discover qualified hardware that works with AWS IoT services, so you never have to worry whether your selected device will work with AWS IoT. The URL for this catalog is devices.amazonaws.com.
So binary sensors are part of the bigger group of sensors
used in IoT solutions. They report
the state of the monitored entities back to
the IoT solutions. The peculiarity of
their readings or values is that they can be only
one of two mutually exclusive values,
hence the name binary, which is demonstrated
in the examples on the slide.
So now here are some examples of binary sensors that are widely used in IoT environments. First, we have the passive infrared (PIR) sensors. These are motion detectors, and they can report back movement or no movement. Similarly, we have pressure sensors, which can report pressure or no pressure. Connectivity sensors can report whether there is a connection or no connection. Same for vibration sensors. And finally, we have the example of smoke sensors, which are smoke detectors; they can report smoke or no smoke. Now we'll move on to the topic of feature engineering, which falls under the context of machine learning. So feature
engineering is part of the process to get data ready for machine
learning modeling. After locating data, you first need to work on the various formats from the different identified sources, such as databases or data warehouses, which may require creating complex queries. Alternatively, data may exist as CSV or compressed-format files on S3, for example, in data lakes. As part of
this step, exploratory data analysis (EDA) is executed, and it's about exploring
and analyzing the raw data even without
domain knowledge. Once data is collected, it needs
to be transformed into a usable format.
Transforming your data requires you to write code
to do these tedious tasks, for example, converting numbers into floating point, dates into timestamps, or category text labels into integers. All of those are well-known feature engineering tasks.
After the data is transformed, you then write more code
to create visualizations to inspect and analyze data
such as quickly detecting outliers or
extreme values within a data set, which is part of feature
engineering as well. Once you have prepared your data
in your development environment, you must make the data preparation
work in production. This requires help from the IT operations team to schedule the data preparation to occur as needed, such as on a regular calendar schedule or when new data is available, or to translate
the data preparation code into a more scalable language.
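As a minimal sketch of the transformation tasks just mentioned (numbers into floating point, dates into timestamps, category labels into integers), here is what they might look like in pandas; the DataFrame and its column names are invented for illustration:

```python
import pandas as pd

# Hypothetical raw data: numbers stored as strings, dates as text,
# and a categorical text label.
raw = pd.DataFrame({
    "reading": ["3.5", "4.1", "2.9"],
    "recorded_at": ["2023-01-01", "2023-01-02", "2023-01-03"],
    "state": ["on", "off", "on"],
})

# Convert numbers into floating point.
raw["reading"] = raw["reading"].astype(float)

# Convert date strings into timestamps.
raw["recorded_at"] = pd.to_datetime(raw["recorded_at"])

# Convert category text labels into integers.
raw["state_code"] = raw["state"].astype("category").cat.codes
```

Each of these one-liners replaces code that would otherwise be tedious to write and maintain by hand.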
Machine learning algorithms take a
representation of the reality as vectors of
features, which are aspects of the reality
over which the algorithm operates.
Pedro Domingos says: some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used. So the definition of a feature you see here concurs with what Pedro has said. And on the right-hand side of this slide you can see the different types of features.
So what makes a good feature? A good feature must be informative: it describes something that makes sense to a human. A good feature must be available: you must have it for as many instances as possible, or you need to deal with the missing data. A good feature must be discriminant: it divides instances into the different target classes or correlates with the target value. Good features allow a simple model to beat a complex model. You also want features to be as independent from each other and as simple as possible, as better features usually mean simpler models. We have to make the difference clear between features and hyperparameters.
Hyperparameters are a set of parameters that
are not determined by the learning algorithm, but rather
specified as inputs to the learning algorithm.
Hopefully the difference is clear. So, back to features.
Now let's see some examples of features. In this slide you can see examples of features that can be created or extracted from your collected data, whether it's coming from sensors or other sources in non-IoT solutions. The examples include: in the case of images, we can extract the colors, the texture, and the contours; in the case of signals, the frequency, the phase, the samples, and the spectrum; for time series, trends and self-similarity between different time windows of the time series; in the biomedical context, the DNA sequence and genes. For text, one of the well-known features to be extracted is POS tags, which refers to the parts of speech: the process of applying word classes to words within a sentence. For example, you take nouns, verbs, and prepositions and tag them within statements.
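To make the POS tagging idea concrete, here is a toy illustration; real systems use trained taggers (such as those in NLTK or spaCy), and this tiny lookup table of word classes is invented purely for the sketch:

```python
# Toy word-class lookup; a real tagger is statistically trained.
WORD_CLASSES = {
    "the": "DET", "cat": "NOUN", "sat": "VERB",
    "on": "PREP", "mat": "NOUN",
}

def pos_tag(sentence):
    """Tag each word in the sentence with its word class."""
    return [(word, WORD_CLASSES.get(word.lower(), "UNK"))
            for word in sentence.split()]

tags = pos_tag("The cat sat on the mat")
```

The output pairs each word with its part-of-speech tag, e.g. ("cat", "NOUN") and ("sat", "VERB").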
Great, now we know what a feature is. So what is feature engineering?
Feature engineering is the process of representing a problem
domain to make it amenable for learning techniques.
This process involves the initial discovery
of features and their stepwise improvement,
all based on domain knowledge and the observed performance
of a given ML algorithm over specific
training data. So feature engineering helps improve results by modifying the data's features to better capture the nature of the problem. And feature engineering tends to bring performance gains beyond tweaking the algorithms themselves in the machine learning context. Sometimes feature engineering is referred to as data munging or data wrangling. Regardless,
feature engineering is a process to select and transform variables when creating a predictive model using machine learning or statistical modeling. And feature engineering is an art, like engineering is an art, like programming is an art, like medicine is an art. There are well-defined procedures in feature engineering; these procedures are methodical and provable, and they are widely known and understood. And feature engineering is sensitive to the machine learning algorithm being used, as there are certain types of features (for example, categorical) that fare better with some algorithms (for example, decision trees) than others (such as SVMs). This slide lists the following
subproblems, which are also known as subdomains of the feature engineering process. I will go through some of them, giving examples. Starting with feature creation: feature creation identifies the features in the data set that are relevant to the problem at hand.
Moving on to feature importance, which is
related to wrapper methods, it is possible to
take advantage of some of the algorithms that do
embedded feature selection to obtain a feature importance
value as measured by the ML algorithm.
For example, random forests produce a feature importance score for each feature as part of their training process.
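As a small sketch of this embedded feature importance, here is scikit-learn's RandomForestClassifier on synthetic data; the data is invented so that only the first feature is informative, and the second is pure noise:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic data: the first feature determines the label,
# the second feature is noise.
X = rng.random((200, 2))
y = (X[:, 0] > 0.5).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# One importance score per feature, produced by training itself.
importances = model.feature_importances_
```

On data like this, the informative feature receives a much higher importance score than the noise feature, which is exactly the signal you can use for feature selection.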
Feature transformation manages replacing missing features
or features that are not valid. Some techniques
include forming cartesian products of features,
nonlinear transformations such as binning numeric variables
into categories, and creating domain
specific features. Moving on
to feature extraction, it is the process of creating
new features from existing features,
typically with the goal of reducing the
dimensionality of the features. And in the feature engineering world, the curse of dimensionality is a well-known term, referring to having numerous features without restricting or reducing the dimensionality of the features that you have. Feature selection, meanwhile,
is the filtering of irrelevant
or redundant features from your data sets,
this is usually done by observing variance or correlation
thresholds to determine which features to remove.
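A minimal sketch of that variance- and correlation-threshold filtering with pandas, assuming an invented DataFrame in which one column is constant and another is a duplicate:

```python
import pandas as pd

# Hypothetical feature table: "constant" has zero variance and
# "temp_copy" is perfectly correlated with "temp".
df = pd.DataFrame({
    "temp":      [20.1, 21.3, 19.8, 22.0],
    "constant":  [1.0, 1.0, 1.0, 1.0],
    "temp_copy": [20.1, 21.3, 19.8, 22.0],
})

# Drop near-zero-variance features.
kept = df.loc[:, df.var() > 1e-6]

# Drop one of each pair of highly correlated features.
corr = kept.corr().abs()
to_drop = set()
cols = list(kept.columns)
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        if corr.iloc[i, j] > 0.95:
            to_drop.add(cols[j])

selected = kept.drop(columns=sorted(to_drop))
```

The 1e-6 variance threshold and 0.95 correlation threshold here are arbitrary illustrative choices; in practice you would tune them against your own data.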
So let's discuss now: what are the suggested feature engineering techniques? To do that, we'll introduce a first set of well-known feature engineering techniques, such as imputation, handling outliers, binning, log transform, one-hot encoding, grouping operations, feature split, scaling, and extracting dates. There are other ways of doing feature engineering, but those are good and well-known examples. Now, on to feature engineering techniques for binary IoT sensors: the related feature engineering techniques that we use with sensors.
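Before moving to the sensor-specific techniques, a few of the general-purpose techniques just listed (imputation, log transform, binning, one-hot encoding) can be sketched with pandas on invented data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: a skewed numeric column and a categorical one.
df = pd.DataFrame({"income": [10.0, 100.0, 1000.0, 10000.0],
                   "city": ["a", "b", "a", "c"]})

# Imputation: fill a missing value with the median.
s = pd.Series([1.0, None, 3.0])
imputed = s.fillna(s.median())

# Log transform: squeeze a skewed numeric range.
df["log_income"] = np.log1p(df["income"])

# Binning: turn a numeric variable into categories.
df["income_bin"] = pd.cut(df["income"], bins=[0, 100, 1000, 100000],
                          labels=["low", "mid", "high"])

# One-hot encoding: expand a categorical column into binary columns.
df = pd.get_dummies(df, columns=["city"])
```

The bin edges and column names are illustrative only; the point is that each technique is a one- or two-line transform once the data is in a DataFrame.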
So what can we do with the data that we get from sensors
in the IoT world? This slide shows an example of applying the Fourier transform, which is a mathematical transformation. It is widely used to take the sensor readings, which are distributed along the time domain, and transform them to the frequency domain. This is a very helpful transformation in the case of any sensors, including binary sensors. But what can we do for binary sensors? Is there more than the on/off indication only?
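As a minimal sketch of the time-to-frequency transformation just described, here is NumPy's FFT applied to a synthetic binary signal invented to repeat every 8 samples:

```python
import numpy as np

# Synthetic binary sensor trace sampled once per second:
# an on/off pattern that repeats every 8 samples.
signal = np.tile([1, 1, 0, 0, 0, 0, 0, 0], 8)  # 64 samples

# Move from the time domain to the frequency domain.
# Subtracting the mean removes the DC (constant) component.
spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
freqs = np.fft.rfftfreq(len(signal), d=1.0)  # frequencies in Hz

# The strongest frequency bin reveals the repetition rate:
# 1/8 Hz for a pattern that repeats every 8 seconds.
dominant_freq = freqs[np.argmax(spectrum)]
```

The dominant frequency, 0.125 Hz here, recovers the 8-second period of the on/off pattern, turning a raw binary trace into a compact periodicity feature.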
Let's see, specifically for binary sensors, the techniques I'm suggesting in this slide and the upcoming slides. First is a way to cut up the observations into a series of time windows. Then, inside each window, you ignore the temporal parts of your sensor data. This technique can include the same or different types of sensors in one bag, hence its name: bag of features. Another way: a series of sensor events can be time-windowed, and the challenge then is to select the appropriate length of this time window. So we are segmenting the series of sensor events.
Please note that you should do some exploratory data analysis
to find the typical sensor activity durations,
and then you can select the time window length. It can
be 5 seconds, 10 seconds, or more. After segmenting the series into time windows, you take each time window and abstract the whole of it by representing it as a single value out of the values available in that window. For example, you can take the median of the values within the time window, or in other cases the presence of one value can be taken to represent the whole of the time window, for example by applying minimum or maximum operations.
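The windowing-and-abstraction step can be sketched with pandas resampling; the one-reading-per-second event series and the 5-second window length are invented for illustration:

```python
import pandas as pd

# Hypothetical binary sensor readings, one per second.
events = pd.Series(
    [0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
    index=pd.date_range("2023-01-01 00:00:00", periods=10, freq="s"),
)

# Segment into 5-second time windows and abstract each window
# into a single value.
window_max = events.resample("5s").max()        # "any activity in window?"
window_median = events.resample("5s").median()  # dominant state in window
```

Max answers "was there any activity at all in this window?", while the median reports the dominant state; which abstraction you pick depends on what the downstream model needs.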
Let's see more techniques that we can apply to binary sensors, in this case as shown in the slide. Binary sensors can be used to identify the location of the monitored entity by applying the localization-in-regions technique. Binary sensors can detect the presence or absence of a particular target in their sensing regions, so they can be used to partition a monitored area and provide localization functionality.
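A tiny sketch of this localization idea: each binary sensor is mapped to the subregion it covers, so the set of firing sensors localizes the target. The sensor and region names here are invented:

```python
# Hypothetical mapping from binary presence sensors to the
# subregion each one covers.
SENSOR_REGION = {
    "pir_1": "kitchen",
    "pir_2": "hallway",
    "pir_3": "bedroom",
}

def localize(firing_sensors):
    """Return the subregions in which a target is currently detected."""
    return {SENSOR_REGION[s] for s in firing_sensors}

regions = localize(["pir_1", "pir_3"])
```

The resulting region set is itself a feature: it tells you where in the partitioned area activity is happening, not just that something happened.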
Even more, if many of these sensors are
deployed to monitor an area, the area itself can
be partitioned into subregions,
and each subregion is characterized by the sensors
detecting targets within that region. Let's see more techniques as well: some of the new features that can be extracted to help your machine learning models. We have the location area, which may be associated with a unique sensor number within the ensemble of deployed sensors. There is also the total elapsed time of each continuously happening event once the sensor has switched to the on state; because we're speaking about binary sensors here, this will indicate the length of the detected continuous sensor state. We spoke about time windows; there is also time window aggregation, where we group events together. We can then compare similar periods of the day together, for example.
Finally, merging the binary sensor events with other sensor events adds context to the binary sensor event, to extract new features: for example, having a binary sensor associated with a light sensor to differentiate between an event happening in the daytime and another event happening at night.
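A minimal sketch of such merging with pandas; the timestamps, lux values, and the 50-lux day/night threshold are all invented for illustration:

```python
import pandas as pd

# Hypothetical binary motion events and ambient light readings.
motion = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 02:00", "2023-01-01 14:00"]),
    "motion": [1, 1],
})
light = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 02:00", "2023-01-01 14:00"]),
    "lux": [2, 800],
})

# Merge on timestamp, then derive a day/night context feature.
merged = motion.merge(light, on="time")
merged["daytime"] = merged["lux"] > 50
```

The same motion event now carries context: one fired at night and one in the daytime, which a model can treat very differently.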
Now, taking the PIR sensor case specifically: when detecting activity using passive infrared (PIR) sensors, here are some examples of the features that can be extracted. The activity level within a region or area, which is the number of events firing from a PIR sensor, can be one of the extracted features.
We also have the elapsed time between sensor events. If we have two consecutive same-value events from the same sensor, this means that the entity in that region or subregion is active within the region; the time between those two events can be extracted, as well as the inactivity time within the same region or subregion. In another way, if we have two consecutive same-value events from different sensors, this means that the entity being monitored is moving from one region to another, and this active time, or changing of region, can be another feature, and a very helpful one to identify activity between different regions. The frequency of movement events within the same region, as we said, can be extracted as an activity time or no-activity time. This is very useful
in the case of ambient intelligence solutions, in home monitoring contexts or environments. Finally, we have a very important event, which I found to be very helpful in ambient intelligence and behavioral modeling techniques: the no-sensor-events case, which means that we don't have any PIR sensor firing from any sensor in the deployed ensemble. So no sensor firings at all can be considered a feature, meaning that the monitored entity is not present, which helps behavioral modeling.
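Some of these PIR-derived features can be sketched with pandas on an invented event log: the gap between consecutive events, and whether consecutive events came from the same sensor (activity within a region) or different sensors (movement between regions):

```python
import pandas as pd

# Hypothetical PIR event log: which sensor fired, and when.
events = pd.DataFrame({
    "time": pd.to_datetime(["2023-01-01 08:00:00",
                            "2023-01-01 08:00:30",
                            "2023-01-01 08:05:00"]),
    "sensor": ["pir_kitchen", "pir_kitchen", "pir_hall"],
})

# Elapsed time between consecutive events, in seconds.
events["gap_s"] = events["time"].diff().dt.total_seconds()

# Same sensor twice in a row -> activity within the region;
# a different sensor -> movement between regions.
# (fill_value makes the first row compare against itself.)
events["moved_region"] = (
    events["sensor"] != events["sensor"].shift(fill_value=events["sensor"].iloc[0])
)
```

Here the 30-second gap between two kitchen events indicates activity within the kitchen, while the later hall event marks movement between regions.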
So we spoke about feature
engineering. We defined what a feature is, and we defined what feature
engineering is. Now we're going to introduce Amazon SageMaker Data Wrangler as part of Amazon SageMaker Studio.
Amazon Sagemaker is a service with a lot of different
features and capabilities in it. We typically
talk about those capabilities as falling into four
categories. We have, on the left-hand side of the slide, the data preparation and then the model build phase. Then we move on to training and tuning, and deployment and management, or hosting, of the model. These four categories really address the needs that machine learning builders have when dealing with each stage of a model's lifecycle. Since we are discussing feature engineering, the highlighted column on the left shows the prepare phase and the available services that are used there. Related to that, we can see SageMaker Data Wrangler and SageMaker Feature Store.
Amazon SageMaker Data Wrangler is the fastest and easiest way to prepare data for machine learning. SageMaker Data Wrangler gives you the ability to use a visual interface to access data, perform exploratory data analysis (EDA) and feature engineering, and seamlessly operationalize your data flow by exporting it into an Amazon SageMaker pipeline.
An Amazon SageMaker pipeline is one way of exporting your data, but you can export as well to an Amazon SageMaker Data Wrangler job, a Python file, or a SageMaker feature group. Amazon SageMaker Data Wrangler also provides you with over 300 built-in transforms and custom transforms using the Python, PySpark, or Spark SQL runtimes. Built-in data analyses such as common charts and custom charts, and useful model analysis capabilities such as feature importance, target leakage, and model explainability, are all available for you as part of SageMaker Studio.
Finally, SageMaker Data Wrangler creates a data flow file that can be versioned and shared across teams for reproducibility. With the SageMaker Data Wrangler data selection tool, you can quickly select data from multiple data sources such as Amazon Athena, Amazon Redshift, AWS Lake Formation, Amazon S3, and Amazon SageMaker Feature Store.
Recently, Snowflake has been added as a data source for Amazon SageMaker Data Wrangler, so you can now quickly and easily connect to Snowflake without writing a single line of code. For other sources, you can write queries and import data directly into SageMaker from various file formats such as CSV files, Parquet files, and database tables. This data is imported into a secure central data preparation environment where users have access to a variety of pre-built tools to prepare their data. To transform your data,
SageMaker Data Wrangler offers a rich selection of pre-configured data transforms. For example, you can convert a text column into a numerical column with a single click, or author custom transforms in PySpark, SQL, or pandas to provide flexibility across your organization. This streamlines the process of cleaning, verifying, and visualizing data without writing a single line of code. Once the data is transformed, SageMaker Data Wrangler makes it easy to clean and explore the data with data visualizations in SageMaker Studio.
These visualizations allow you to quickly identify inconsistencies in the data preparation workflow and diagnose issues before models are deployed into production. Finally, Data Wrangler makes it easy to create a pipeline for data preparation. You can export your data preparation workflow to a notebook or code script with a single click, which efficiently brings your data preparation workflows into production without manually sifting through and translating hundreds of lines of data preparation code. So this slide summarizes all the steps taken by SageMaker Data Wrangler and provides details on what functions Data Wrangler performs at each step.
The key takeaway here is that these steps can
be iterated on in whole or in part,
to quickly build a strong set of data transformation code
and features to improve the performance and quality
of your machine learning models.
As we mentioned before, you can import from multiple data sources; here Amazon SageMaker Feature Store is mentioned as an example only. At the right-hand side of the slide, you can see that to operationalize your data, you can integrate with the other Amazon services available, or you can export the tasks that you have created in Data Wrangler to pipeline them and convert them to a script or Python file, as we have described before.
So we have reached the end of this session.
Thank you for taking the time to listen to
this session. We have highlighted the different additional
features that can be extracted and created based
on binary sensor data, and of course all of this happens by utilizing different feature engineering techniques. So binary sensors are not about zeros and ones only; much more can be extrapolated and extracted to help your machine learning models. Please use these
additional features to enhance your models and
get more accurate results depending on the selected ML
algorithm. Thank you very much.
Have a nice time. Thank you.