Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. My name is Milecia McGregor,
and I'm a developer advocate at Iterative.ai.
I specifically work on DVC, which is
Data Version Control, an open
source tool you can use to, well,
version control your machine learning projects. But today
I'm going to talk to you about convolutional neural networks
in action. If you have any questions at any point,
feel free to reach out to me personally on Twitter
at @FlippedCoding, or you can reach out to the whole team
on Twitter at @DVCorg. But
just so you have a little background about me, I have my
master's in mechanical and aerospace engineering.
Then I did some machine learning work in robotics,
where I was able to work on this cool autonomous car that
interfaced with pedestrians and passengers.
And from there I've done things on front end,
back end, DevOps, database
admin stuff. I've just kind of been all over the place in
tech. But convolutional neural networks are actually
something that I work with a lot on my personal projects,
which probably says a lot about what I do with my free time.
But I wanted to talk about them with you all so that you
can see how they're actually used in action.
So just a quick overview of what we'll be talking about today.
I'll give a quick background on neural networks in general.
Then we'll go over some basics of CNNs,
and I'll touch on a few use cases for CNNs,
and we'll actually make one. Well, I'll walk
through the code that you would use to make a CNN in Python.
And we'll run just one quick training experiment
with DVC so we can look at how good our
model actually is. And finally, I'll wrap up with
just a few key takeaways, some stuff that I really
hope helps you after this is over. So to
get started, a little background on neural networks.
These are basically just algorithms that can be used to make
predictions. So they're made of these
multiple layers of nodes. And this is
what one node looks like. So the goal of
a neural network is to take advantage of deep
learning to try and imitate the
way our brain works. So each node can be
like a node in your brain or a neuron or something.
You give it a certain number of inputs,
and then a value, a weight, is assigned to how important each of those
inputs is to the problem you're trying to solve.
Then some crazy math happens, and you go through an
activation function to finally get your output
or your prediction or whatever it is you're looking
for. Now let's talk about some basics of CNNs.
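Going back to that single node for a second, here is a minimal sketch of it in plain Python. The bias term, the choice of ReLU as the activation, and the names relu and node_output are assumptions for illustration. The talk itself only mentions inputs, importance values, and an activation function.

```python
def relu(value):
    """ReLU activation: keep positive values, clamp everything else to zero."""
    return max(0.0, value)


def node_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the activation."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs)) + bias
    return relu(weighted_sum)


# Two inputs, each with its own importance value (weight).
print(node_output([1.0, 2.0], [0.5, 0.25], bias=0.1))
# A negative weighted sum is clamped to zero by the activation.
print(node_output([1.0, 2.0], [0.5, -1.0]))
```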
So with convolutional neural networks, the network has convolutions.
And these are just math. A convolution is a
linear operation that multiplies
a set of weights with the inputs.
That set of weights is the filter, and the filter is smaller
than the input data. So you take the
multiplication between your filter
and a filter-sized patch of the input, the filter
patch. That's the little part of the image that your
CNN is going over. You take the filter patch and the
filter and you get the dot product. But here's a
picture to show what that looks like a little bit better. So these
squares that you see in orange or yellow,
I'm not sure which color that is. But the squares
that you'll see in there are actually the
filter-sized patch of the image. And the
filter itself is this three by three
matrix that is over that image. And as
we perform these convolutions, you go across
your image in a step called a stride.
So if you have a stride length of one,
you would just scan this three by three
section, and then you'd shift the whole three by three matrix over
to the next set of three squares until you
run out. And then you drop down to the next row and you
repeat until you have scanned the whole image with
that filter. So when you do have your convolved
feature at the end, basically this is just a smaller
representation of what that image is.
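That scan can be sketched in plain Python. The 5 by 5 image and the 3 by 3 filter values here are made up for illustration, not read off the slide, and convolve2d is a hypothetical helper name. There is no padding in this sketch, so the convolved feature comes out smaller than the input.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square kernel over the image and take the dot product at each stop."""
    k = len(kernel)
    rows = (len(image) - k) // stride + 1
    cols = (len(image[0]) - k) // stride + 1
    output = []
    for r in range(rows):
        output_row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            # Dot product between the filter and the filter-sized patch.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[top + i][left + j] * kernel[i][j]
            output_row.append(total)
        output.append(output_row)
    return output


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

print(convolve2d(image, kernel))  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

With a stride of one, the 5 by 5 input shrinks to a 3 by 3 convolved feature. A larger stride skips positions and shrinks it further.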
So usually we're using convolutions to pick out
the features in the photo, like edges or maybe
large landscape features, things that
are really big and help define the
overall image. Basically, the gap that CNNs
fill in is that when you have just
a regular neural network or you're using some kind of other algorithm
to classify images, they might
take that image and make it into this one dimensional thing,
and if you take an image and make it 1D,
it takes a lot away from the context.
So you need to have at least two dimensions with
your images so that you get that spatial and temporal
perspective of what's actually happening in the image.
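One way to see the lost context is to look at where neighboring pixels end up once an image is flattened row by row. This is a small sketch, assuming a 28 pixel wide image (the MNIST size) and a hypothetical helper called flat_index.

```python
def flat_index(row, col, width):
    """Position of pixel (row, col) after flattening the image row by row."""
    return row * width + col


width = 28  # assumed image width, as in MNIST
pixel = flat_index(10, 10, width)
right_neighbor = flat_index(10, 11, width)
lower_neighbor = flat_index(11, 10, width)

# The pixel to the right stays adjacent in the 1D version.
print(right_neighbor - pixel)  # 1
# The pixel directly below ends up a full row away.
print(lower_neighbor - pixel)  # 28
```

Two pixels that touch vertically in the image are 28 positions apart in the flattened version, so a model that only sees the 1D vector has to relearn that they were neighbors.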
And when we use convolutions, it helps us
both take advantage of a lot of the preprocessing
that we get with CNNs, and it helps us
get through our data faster because it takes the image
and it squeezes it down into the features that
really matter. That's what we're doing with our convolutions. So something
else that's a major part of CNNs
is the max pooling layer, or multiple layers
if that's what your model needs.
But max pooling is actually how we decrease
our computational load that we need to process all
of our data. So the way that this works, it returns
the max value from the portion of
the image covered by the kernel. So if I
go back to this, the filter that's over
our orange area is the kernel. So our kernel size
is three, because we have this three by three chunk
out of this element. And what this is saying is that
it returns the largest value from that
kernel. So with this one, this might not be
the best example. This is a picture for something else, but this would
return a one here just because that would be the largest
number in this filter-sized patch.
So that's what's meant by max pooling.
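Here is a minimal sketch of max pooling in plain Python. The 2 by 2 window with a stride of 2 is an assumption (a common default, not something stated in the talk), and the feature map values and the name max_pool2d are made up for illustration.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep only the largest value from each window of the feature map."""
    rows = (len(feature_map) - size) // stride + 1
    cols = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for r in range(rows):
        pooled_row = []
        for c in range(cols):
            top, left = r * stride, c * stride
            window = [
                feature_map[top + i][left + j]
                for i in range(size)
                for j in range(size)
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [3, 1, 4, 7],
]

print(max_pool2d(feature_map))  # [[6, 2], [3, 9]]
```

The 4 by 4 input becomes 2 by 2, which is where the drop in computational load comes from: the next layer has a quarter as many values to process.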
And when it's choosing this maximum
value from the kernel, this helps act as a noise suppressant,
so that max value represents some really bold
feature in an image. For example, that might
be a large tree that really takes up
a good chunk of an image. Or it could be something
like a person that's standing in
the foreground of an image. It's going through and
picking out the most important features in each of
those kernels as we go through the
image with our convolution scan.
So there are a lot of different types of CNNs. It just
depends on the model you need and the problem you're trying to tackle.
And 1D CNNs are usually used on time series
data. Like I mentioned a little bit earlier,
1D isn't the greatest to process images
just because you lose a lot of context of what's happening in an image.
But if you have some kind of time series data,
like maybe, I don't know why you would be interested in
the weather changing in a way that you would need a CNN,
but I'm sure there's an application for it. So anything that
is time dependent, 1D CNNs
will probably do a good job. And then 2D CNNs,
which is what we've been talking about the most. These are used
with image labeling, classification problems.
It's kind of like the standard when we're trying
to classify images now. And then there
are three dimensional CNNs. This is getting
into some more high tech imagery. So you'll see
these a lot in healthcare with things like CT scans
and MRIs. You'll probably see it in some kind of
crazy advanced scientific labs where they're doing
stuff with electric fields, maybe.
Not sure. Whatever it is that they use to make
3D images, you might be able to process it
with a 3D CNN. And you've seen what a 2D
CNN looks like when you're scanning through with the convolutions and
doing your max pooling. But a 3D CNN would
look like a cube, and you would scan through different chunks
of that cube and do your convolutions and your max
pooling to get the most important features out of it.
So that's how they use 3D CNNs for things
like tumor detection or weird
tissue issues. Tissue issues,
sorry. But that is a real practical
use for convolutional neural nets.
And just so we're clear, there are a few differences between
convolutional neural nets and regular ones. But a big one
is that CNNs save time on preprocessing
data. So when they're scanning through the
image and looking at the kernels,
they're actually doing some feature extraction for us.
So you don't have to have a predefined
set of features that you're looking for.
CNNs do this discovery as they're going
along. And I just spoke a little bit ahead of my
second bullet point, but that's okay. But like I was saying,
CNNs, they figure out the important characteristics
as they go through that convolutional part
of the process. But there is
one thing where regular neural nets do shine a little bit
more than CNNs, and that's when you don't have super
large data sets. So typically with anything
under, I think it was 10,000 images,
you might not get the best accuracy with CNNs.
So convolutional neural nets do need a
lot more data than regular neural nets to be super effective.
So here are a few use cases for CNNs.
Maybe you want to recognize different handwriting,
which kind of segues into the MNIST example that
we'll be going through shortly. Or maybe you're
working on something like an autonomous car.
Or maybe you're trying to get a computer to
identify certain parts as they pass a camera.
This is one of those times you would consider a
convolutional neural net. And you
might also use this to help prevent bank fraud.
So reading the digits on checks is actually a really important thing.
If you've ever deposited a check in an ATM
or on a mobile app, there's probably been some
kind of CNN behind that. And again,
post offices have a lot of mail that they
handle throughout the day, so they need some help
when things are going down conveyor belts, I imagine.
So, you know, it needs to make sure that the
zip codes make sense and addresses make
sense. Handwritten letters or even
labels that have been printed off still need to be processed.
And this is where CNNs really shine. So now for
the fun part. I tested this literally
1 minute before I started this talk.
So this live demo should work,
but we'll see how things go. Let me
go ahead and switch screens.
Okay, so you can see my instance of VS Code.
And I want to make sure to emphasize right now,
you don't have to use Visual Studio Code to
do any of the things that I'm doing. You can use whatever IDE you
prefer. Nothing I'm doing is VS Code specific.
You can even run all of these commands that I'm about to run
in a regular old console, but I just
like VS Code. Anyways, I have this example
convolutional neural net set up for an MNIST
example, but I also have it inside
of a DVC pipeline because this is how I like to track
my experiments, to see what models or
what code changes, what hyperparameter values or
what data sets really make a big difference when I'm training
my model. It just makes it easier to track stuff. But let's
break down this convolutional neural net. So to
start with, I'm just using PyTorch, and this
is all built in. So this torch neural net module,
all of these convolution, max pool, linear
things, this ReLU activation function,
all of this is just part of PyTorch. But what we're doing initially
is creating the neural
net itself. So up here, this is where you'll have
to know something about your data. So the data set
that I'm working with, I know that I have one
in channel for my image. You might have multiple
channels for your images if you're working with RGB
images, like colored images.
But if you're working with something that's black and white,
you probably can get away with just having one channel in.
And then we'll have eight channels going out,
and we have a kernel
size of three. So the reason we have eight channels
going out is just because I know the size of my images,
and this is about how much we'll
be able to scan over and get from our convolution.
And then we've added this padding because we only have
eight, and we want to make sure that we're capturing the edges
of the image in case there's some important information there.
So that's how we made our first convolution.
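To see what the padding buys you, here is a small sketch of the standard output-size arithmetic for a convolution. The 28 pixel input is the MNIST image size. The padding of 1 and stride of 1 are assumptions, since the talk doesn't say which values are on screen, and conv_output_size is a hypothetical helper name.

```python
def conv_output_size(size, kernel, padding=0, stride=1):
    """Output height or width of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


# Kernel size of three with no padding: the edges get trimmed.
print(conv_output_size(28, kernel=3))             # 26
# Same kernel with one pixel of padding: the size is preserved.
print(conv_output_size(28, kernel=3, padding=1))  # 28
```

Without padding, every three by three convolution shaves a pixel off each side, so the pixels along the border are covered by fewer filter positions than the ones in the middle.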
Next we have our max pool layer. And then
this is where we'll go through that kernel size and
make sure we have the max value for
each kernel size patch.
So this is where we get our noise suppression.
This is how we make sure we're pulling out the more
important features of our image. And then we have our
next convolution. So you can have as many layers of
convolutions and max pools as you need.
You can throw in some batches if you need. They can get really
intense if you're working with some crazy images,
which it doesn't take much. So don't
be afraid to play around with multiple layers of convolutions
and max pools. But one thing to keep in
mind is that the input for the next convolutional
layer matches what the output was for
the previous layer. So in this case, you see we're
getting more defined
on the image area that we're going over. So now we've
started with eight channels coming in and
we're going to go out to 16.
So we're trying to get more definition and
figure out what those important features are. But we'll
keep the same kernel size. We'll do the padding just
to make sure we're not losing anything. And then we just have
a few linear layers at the end just
to normalize some things. And last,
we have our forward network. And this
is how we really get the features and build
the model. So we have an activation layer for
this first one, and that's just a way of getting a
positive value or a zero if there isn't
a positive value. And you'll see ReLU activation
used pretty much as standard in neural networks
just because it gives you that impulse
value, it's either positive or zero, so you
don't have to filter out as much stuff. So we're
doing the activation function, we're handling the max pooling
on the image, we're going through activation for
the next convolution, we're doing some more stuff,
and then finally we return the model.
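To follow what happens to one image on its way through those layers, here is a sketch in plain Python that only tracks shapes, not pixel values. The channel counts (1 to 8 to 16) and the kernel size of three come from the walkthrough. The 28 by 28 input, the padding of 1, and the 2 by 2 max pool are assumptions, and trace_shapes is a hypothetical helper, not part of PyTorch.

```python
def output_size(size, kernel, padding=0, stride=1):
    """Output height or width of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(layers, channels, height, width):
    """Return the (channels, height, width) shape after every layer."""
    shapes = [(channels, height, width)]
    for layer in layers:
        if layer["type"] == "conv":
            channels = layer["out_channels"]
            height = output_size(height, layer["kernel"], layer["padding"])
            width = output_size(width, layer["kernel"], layer["padding"])
        else:  # "pool": window size and stride are the same value here
            height = output_size(height, layer["kernel"], stride=layer["kernel"])
            width = output_size(width, layer["kernel"], stride=layer["kernel"])
        shapes.append((channels, height, width))
    return shapes


layers = [
    {"type": "conv", "out_channels": 8, "kernel": 3, "padding": 1},
    {"type": "pool", "kernel": 2},
    {"type": "conv", "out_channels": 16, "kernel": 3, "padding": 1},
    {"type": "pool", "kernel": 2},
]

shapes = trace_shapes(layers, channels=1, height=28, width=28)
for shape in shapes:
    print(shape)

# Number of values handed to the first linear layer after flattening.
final_channels, final_height, final_width = shapes[-1]
print(final_channels * final_height * final_width)  # 784
```

Under these assumptions the first linear layer has to accept 16 * 7 * 7 = 784 inputs, which is why the numbers in the convolution layers and the linear layers can't be picked independently of each other.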
So that is how you make a convolutional neural
net in Python using PyTorch.
And a lot of it really is just dependent on how well you
know your data. So if you're a little bit confused
about what numbers you should use for your convolutional
layers or your max pool, take a look at the PyTorch docs
and then just play around with them. Pretty much all of
model making and machine learning is just experimenting
with different values. And that's why I like
DVC. So for example,
I'm just going to run an experiment to show you what it looks like.
I told you, I tested this not too long ago,
like a literal minute before,
and okay, it's not broken, but I haven't
made any changes. So let's say I didn't know how
many out channels to have, and I'm going
to put nine here just to see what happens. So when
I run this experiment, DVC detects
the change and it runs this training stage.
But you'll notice that we have
an error for this because I changed from eight to nine.
And it'll show you what the expected value is for
whatever our given image is. And we just change
this back. Maybe we change the kernel size and I
try to run that again. Let's see. No, that didn't
work. So we'll change that back.
Let's change something down here. Let's say we want
ten instead of 16. Wonder if that'll work.
No, but the good thing about
DVC is that if any of those
code changes were to have run, maybe let's try
changing this to 64, see what
happens. Did that run? No.
Good. So you see, you really have to know the data you're
working with to get the right values. But let
me just try one more thing and see if that works.
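The error messages from those failed runs aren't captured in the transcript, but one plausible cause is that changing the out channels of one convolution without updating the in channels of the next one breaks the chain between layers. Here is a small sketch of that check in plain Python. The (in_channels, out_channels) pairs mirror the numbers from the demo, and find_channel_mismatches is a hypothetical helper.

```python
def find_channel_mismatches(conv_layers):
    """Compare each layer's in channels with the previous layer's out channels.

    conv_layers is a list of (in_channels, out_channels) pairs.
    Returns (layer_index, previous_out, current_in) for every broken link.
    """
    mismatches = []
    for index in range(1, len(conv_layers)):
        previous_out = conv_layers[index - 1][1]
        current_in = conv_layers[index][0]
        if previous_out != current_in:
            mismatches.append((index, previous_out, current_in))
    return mismatches


# The original setup: 1 -> 8, then 8 -> 16.
print(find_channel_mismatches([(1, 8), (8, 16)]))  # []
# After changing eight to nine in the first layer only.
print(find_channel_mismatches([(1, 9), (8, 16)]))  # [(1, 9, 8)]
```

The same idea applies one step later: changing the 16 in the second convolution changes how many values reach the first linear layer, so that layer's input size would need to change too.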
All right. This is exactly what it's like
sometimes when you're training a model. But basically
I'm going to come change a hyperparameter,
going to change my learning rate, just to show you all what it's like
to run an experiment, because this is something else you might change if
you're working with a convolutional neural net or a lot
of other machine learning problems.
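To get a feel for why the learning rate is worth experimenting with, here is a toy sketch in plain Python. It is not the training code from the demo. It runs plain gradient descent on a made-up one-parameter loss, (weight - 3) squared, and the function name and learning rate values are assumptions chosen for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize the toy loss (weight - 3) ** 2 and return the final weight."""
    weight = start
    for _ in range(steps):
        gradient = 2.0 * (weight - 3.0)  # derivative of the toy loss
        weight -= learning_rate * gradient
    return weight


careful = gradient_descent(learning_rate=0.1)
too_large = gradient_descent(learning_rate=1.1)

# A small learning rate settles close to the best value of 3.
print(abs(careful - 3.0) < 1e-3)   # True
# A learning rate that is too large overshoots further on every step.
print(abs(too_large - 3.0) > 1000)  # True
```

A real CNN has far more parameters and a much messier loss, but the trade-off is the same: too small and training crawls, too large and it never settles.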
So I changed that learning rate because I'm trying to figure
out the best model for my MNIST data set.
And I'm going to do this a lot.
It's a common problem in machine learning to just do this,
changing values, changing code, maybe adding
new data to your training set and seeing how that
affects your model. So in this case, you can see we
have some epochs that are running, we have our loss,
we see how accurate it is, and that's
with this learning rate. So I'm going to stop this training
run and just show you our table real quick.
So, yeah, if you look up
here, you'll see my last run with this
learning rate.
Yeah, it's not as great as we
would want it to be, but 83% accuracy isn't that bad.
But now that you've seen kind of how
we make a CNN, how we run an experiment,
I'm going to switch back over to the
slides and we can go ahead and finish up.
So a few things I hope that you'll take away from this.
Make sure that you compare a few different algorithms before
you decide what's best for your application.
So a CNN might be great for most images,
but maybe you find something else that works better for your data set.
It's okay to play around with different things to find what
gives you the best results. And then take advantage of
what's already out there. You don't have to write everything from
scratch to prove that you're just this great ML
engineer. It's fine. We already know that you're great.
Just take it easy. You don't have to do the
hard math stuff anymore. Use those existing
libraries like PyTorch and maybe even TensorFlow
if you want to take the time to learn that. And then when you have
a problem, especially in machine learning,
try breaking it down into multiple steps.
So that's something that a tool like DVC can
really help with is just you're able to reproduce
every experiment you run. So if you're looking at your
model and you're trying to figure out what value you changed
to make it this good,
then you'll have a record of the exact changes
you made to make such a great model.
But that's all I have for you today. I hope you
were able to learn a lot about CNNs. And again,
if you have any questions for me, feel free to reach out
on Twitter at @FlippedCoding. Thank you.