Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone, and welcome to this session, History Meets AI: Unveiling the Secrets of Ancient Coins. My name is Nico. I work with the EMEA public sector at Amazon Web Services. Today we're going to split this session into two parts. The first part is going to be about this unveiling of the secrets of ancient coins: we are going to explore the challenge that we had at hand, and we are also going to explore the solution that we came up with. The second part of this session is going to be dedicated to hands-on examples, where I'm going to show you some ways that you can build your own application to solve challenges similar to this. We're going to be focusing on three things. Number one is image classification. Number two is background removal and image segmentation. And number three is how to build a visual search engine.
So with that, let's dive right into the challenge that we had at hand. First, let's start with the why. The University of Oxford houses 21 million objects in the collections of its Gardens, Libraries and Museums, GLAM for short. One aspect of their mission is that they want to preserve these assets and make them accessible to the world for education and research. But of course, there is only so much space that you can have for this. The organization only has enough space to display about 10% of its holdings at a single time, and there's an enormous backlog of artifacts still waiting to be cataloged. So to optimize access to these collections for digital teaching and research, GLAM asked the question: can we maybe use machine learning to help us?
If we are successful, that will reduce the time that a research department needs to identify and catalog an object. But before we even think of that, the first thing that we had to identify is a suitable, well-cataloged collection that would become the prototype candidate. That candidate was the Roman Provincial Coinage digital collection. This is a world-renowned research project in numismatics. The team included a curator with previous experience in developing digital collections from the ground up. This person is Jerome Mairat, the curator of the Heberden Coin Room at the Ashmolean Museum. So the first step in any machine learning project is to decide what you want to predict. In this case, Anjanesh Babu, who is the systems architect and network manager at GLAM, wanted to predict a very simple outcome: heads or tails. That is, is the specimen that I have in front of me, the one I'm looking at in this photograph, the obverse or the reverse of a coin? Which is another way of saying: given known training data, can we have a machine learning solution predict the right side of a coin with a high degree of accuracy?
So now that we have the why, let's move on to what are the actual things that we want to solve for. This is the moment when the Ashmolean Museum came to AWS and together we started discussing: what does a normal day look like for the people who are working at these museums? What are the challenges that they are facing? What are the limitations and constraints that they have? We knew from before that the Ashmolean Museum has built the world's largest digital collection of Roman provincial coinage, and it is open to anyone to browse online for free. Now, getting an item into this collection requires expert input from curators, but these people are highly skilled and very scarce, making this task very difficult to scale.
So the way that this works is that, for example, you may have a multitude of physical specimens and you want to catalog them as items. Maybe the item that you want to catalog this specimen under already exists in the collection, and this is just another specimen of that item. Or maybe the item is completely new to the digital collection. In some cases you may have all the information available for the specimen, or maybe in other cases you may lack some of that information. Also, something that might happen is that other research institutions, or maybe even individuals, might reach out to the Ashmolean Museum with a simple question: look, we have this item, do you know what it is? And the answer to this question is at times very complex because of the sheer volume of items that need to be processed. Oftentimes, groups of people who want to help out with the mission of the university volunteer to help out with this task. But normally, because of the way that this is established, in some cases even the simplest task cannot be accomplished by a single person or a small group of individuals. And when I say task, I mean getting a specimen and identifying the right item that this specimen belongs to.
So what we wanted to do is not automate this task, but augment the humans behind it: build tools that can support the people who are working with this every day, in a way that lets them focus on more relevant tasks, avoiding, for example, spending hours and hours rotating photos so that they are aligned perfectly before they move on to the next task. In this case, the customer's objective is to reduce the time that it takes for the correct appraisal of a single specimen. Currently, this is estimated at between 10 minutes and several hours for each item. You can imagine that you get an item and you want to spend some time corroborating that the information you have at hand matches the one that you have in the collection. And if it doesn't match, then you need to figure out what that missing information is. For this, you can have a multitude of combinations, making it exponentially more difficult when you have items that are not the standard ones. So the difficult items will require an enormous amount of time, and normally they will require an expert, who is very scarce in this sense.
So let's take a deeper look at what we are talking about. These are two screenshots from the digital collection. What you see on the left are three items, so three coins, and you can see that we have the obverse and the reverse on the left. Then we have information on the right, where you can see, for example, the inscription that is written on the obverse and the reverse, the city, the region it belongs to, the province, even the person that is in the image. Now, the images that you see on the right are different specimens of this same item, and you can see how the quality of the specimens varies in a big way. What you can see here are four photos of the same coin. We have a very high quality photo on the left, where we can make out the text that is written and very easily identify the person. But when we look at examples like the ones in the middle, at the top and the bottom, we have a pretty difficult time discerning what is what in the picture. So when you're presented with an item like this and you have to match it to the image on the left, this quickly becomes a very difficult task.
So this comes back to the question: can we use machine learning to solve this? Now we know the why and we know the what; let's move on to how we solved this. The first thing that you can see is that these images, right, let's take the image on the right, don't quite look the same as this image. This one on the left, for example, has been taken with professional equipment and without a background, so you can see that the illumination is very constant, the resolution is very high, there's no blurriness, and everything is in focus. This is not always the case, especially when we get images from elsewhere. The Ashmolean Museum gets images that belong to individuals or other research institutions who might not have the same resources for capturing this information. So some of the technical challenges that we can face are, first, that the image may be very low resolution, for example if you take it with a smartphone, and maybe it's blurry or noisy. There can also be very inconsistent illumination across the image, so some areas might be darker than others. Also, the physical condition of the coin plays a role: the coin might be highly deteriorated, and this will work against actually finding similar items. And the problem itself is very hard, because we are talking about coins, objects that are more than 2,000 years old in some cases. So in short, photos that are taken by non-museum personnel look very different from the images within the digital collection,
making visual search very challenging. So the way that we thought about this is that we should split the task into two. The first part is: let's improve the base image quality and make this coin look as similar as possible to the ones that we have in the digital collection. For example, in this case we have a blurry background, we have a rotation, we have low resolution, so we want to account for all of those things and create an image that is very similar to the ones on the right. Once we have this image, we can extract features out of it and search the collection to bring back the most similar-looking items. So this is an example of what we're doing here. You can see that we are detecting the shape of the coin, and then we are doing all of these activities at the same time: we are removing the background, we are rotating the image, and we are also increasing the resolution of the image. That way we end up with the item on the right, which looks much more like the images in the digital collection than the one we had on the left.
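To make that pipeline a bit more concrete, here is a rough sketch of the idea using classical OpenCV operations. This is not the project's actual code (the real solution used dedicated machine learning models, for example for deblurring and super-resolution); the file name and the simple thresholding are illustrative assumptions.

```python
import cv2
import numpy as np

# Illustrative only: a crude version of the preprocessing idea with classical OpenCV.
img = cv2.imread("coin.jpg")  # hypothetical input photo

# 1. Find the coin: threshold and keep the largest contour as the region of interest.
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
_, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
coin = max(contours, key=cv2.contourArea)

# 2. Remove the background: keep only the pixels inside the coin contour.
clean_mask = np.zeros_like(gray)
cv2.drawContours(clean_mask, [coin], -1, 255, thickness=cv2.FILLED)
result = cv2.bitwise_and(img, img, mask=clean_mask)

# 3. Rotate: use the angle of the minimum-area rectangle around the coin.
(cx, cy), _, angle = cv2.minAreaRect(coin)
rot = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
result = cv2.warpAffine(result, rot, (img.shape[1], img.shape[0]))

# 4. Upscale: a simple resize stands in for the ML-based super-resolution step.
result = cv2.resize(result, None, fx=2, fy=2, interpolation=cv2.INTER_CUBIC)
cv2.imwrite("coin_processed.jpg", result)
```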
Once we have this image on the right, we come back to the metadata that we have also extracted from the image. So we know whether the image is heads or tails with 95% accuracy. In this case, we know that we are looking at the obverse of a coin, so we can scan through all the images in the collection but only look at the obverses; we don't need to look at the backs as well. We can also use other information, like, for example, the material, the region, the city, the province, and the person who is on the coin, to make the task of identifying this item easier. So with this, let's move to a very quick demo of what this proof of concept was. I just have to say that this demo was produced more than a year ago. There is a new open source solution in the works; it's not going to be restricted only to coins, but rather to any collection object that you have, either physical or digital, say gems or fossils, any object where the core concept is that you want to visually search for similar items inside a collection. If you're interested in something like this, keep an eye on this video; any news that comes out, anything that is released, will be added in the comments below the video. So with this, here is a web application created using Streamlit, and let me show you. The idea is that we can interact with the models that we have created: for example, we can upload a picture. The first thing that we want to do is either choose an example from a library or upload a picture. In this case, we choose the image that we saw before, with the blurry background, the rotation, and the low resolution. What we want to do is first find a region of interest in this image, remove the background, auto-rotate it, and then finally apply some deblurring and upscaling to the image. Once we have finished this, and this is all happening in real time, you will have this image. That is the output of this process.
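As a sketch of how a front end like this can be wired together, here is a minimal Streamlit example. The preprocess_image function is a placeholder I'm assuming for the real pipeline (region of interest, background removal, rotation, deblurring, upscaling); it is not the actual demo code.

```python
import streamlit as st
from PIL import Image

def preprocess_image(img: Image.Image) -> Image.Image:
    # Placeholder for the real pipeline:
    # region of interest -> background removal -> auto-rotate -> deblur -> upscale
    return img

st.title("Coin identification demo (sketch)")

uploaded = st.file_uploader("Upload a coin photo", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    image = Image.open(uploaded)
    st.image(image, caption="Original upload")
    if st.button("Process image"):
        st.image(preprocess_image(image), caption="Processed image")
```

You would run this with `streamlit run app.py` and put the real model calls inside preprocess_image.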
Once we have this, we also want to extract metadata out of the image. So, for example: is this the obverse? Who is the person in the image? What is the material that this coin is made of? What is the region that it belongs to? And so on. Once we have this metadata, we are going to use the features that we have extracted out of the image, for example the faces, the eyes, and the way that these are placed in the image, and we are going to use this to look inside our collection. You can see that we come back with eight results that are similar to the image that we are looking for. So in this case, a volunteer, for example, doesn't have to go through thousands of images; they only have to focus on eight images, and they also have information that can point them in the right direction. They have the region, the person who is in the picture, and also similar items. So when they see an item that already exists, and this is just another specimen of it, they can quickly attach this one to that one.
So what are the benefits of using AWS for this? The first one is very quick and easy experimentation: they built and deployed eleven machine learning models in about ten weeks. There's also a smaller workload. You can imagine that saving minutes on every task in a pipeline, when you have a large volume of items that you have to digitize, adds up to a lot of time. In this case, it's estimated that they will save up to three years of work cataloging a collection of 300,000 coins. Less time: the coin analysis is expected to take just a few minutes, versus times ranging from 10 minutes to maybe hours. And also more value: this is complementing the work that is already being carried out by volunteers. This is not automating anything; this is augmenting the people, the humans, who are behind this. So here are some quotes about this. "I thought this project would be complex and time consuming, but using AWS made it easy." Another one comes from Jerome: "Now we can focus our volunteers on other steps that add value. The machine learning process improves the workflow and productivity and adds value for the public."
With this, let's have a look at one very small task as an example: we want to remove the background of this image, and we are going to see in one of the hands-on examples how we can actually do this technically. Doing this doesn't have to be all done by yourself. For example, there are some solutions available in the AWS Marketplace that you can use out of the box for background removal. In this case, with this one, for example, you have a price for every API call and you can just subscribe to it. So if you have images that you want to remove the background from, you just subscribe to this API and run them through this service, through this endpoint. Another way that you can do it is, of course, to build your own algorithm, and we're going to see an example of this, where you pick up, for example, a dataset that is an image segmentation dataset, Open Images, with more than 600 classes for which segmentation masks are available. Then you use an algorithm, in this case Mask R-CNN, and you can use different machine learning frameworks, PyTorch, TensorFlow, MXNet, together with SageMaker and different ways of doing the training. With this, you can also very easily build your own custom pipeline. This slide shows some resources; I'm going to add these to the description of the video anyway, but just to give you an idea of other things that you can do, there is a recent collaboration between Hugging Face and SageMaker that is very useful, very robust and secure, and I've also added some documents and repositories for deploying your very own web application for machine learning.
So with that, I'm going to change the focus to the second part of the presentation, and I'm going to move to this one. Okay, so now we want to explore this idea of building your own machine learning solution. We want to build the same thing: we want to create this heads-versus-tails classifier, we want to remove the background, and we also want to visually search these images in a collection. So how can we do it? Let's focus first on the first one: image classification.
So for that, I'm going to show you SageMaker Studio now. This is an end-to-end platform for machine learning from AWS. In this case, I'm not going to go into a lot of detail about what each piece does, but I am going to show you that there is something called JumpStart. And what is that? Basically, when you click here, let's take the first one, right, these are popular models for image classification. Okay, that's exactly what we want to do: we want to build an image classifier. Let's do some more exploration. When we explore it, you see that, for example, I particularly like this architecture: EfficientNet has very good performance, and you can see how you have different versions of it available out of the box. These ones are feature vector extractors, and we'll get to why that is important in a second, but just keep them in mind for now. What we want to choose is the biggest variation, the B7, the most performant one, and we want to use it for our model. Once you have clicked and selected this model, we can either deploy the version that is available without any changes; this model has been trained on ImageNet. So let's go back for a second. What is this? JumpStart is a repository of solutions and models that you can quickly deploy with one click. In this case, we want to look at vision models, and we want to look at solving the task of image classification. We also see the dataset that each model has been trained on, and we know whether the model is fine-tunable or not. In this case it is, the same as this model that we have here. So how can you fine-tune it? Well, you just go here to Fine-tune model. You choose the data source and find your S3 bucket, you choose it and the directory name where you have your data, then you choose the instance that you want to use to train and the parameters that you want to use, and you train it. Once it is trained, you can deploy it as an endpoint and use the model for inference.
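The same fine-tuning can also be driven from code instead of the Studio UI. Here is a rough sketch using the JumpStart classes in a recent version of the SageMaker Python SDK; the model ID, S3 path, and instance types are placeholders, so copy the exact values from the JumpStart model card.

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Placeholder model ID: use the exact ID shown on the JumpStart model card in Studio.
model_id = "tensorflow-ic-efficientnet-b7-classification-1"

estimator = JumpStartEstimator(
    model_id=model_id,
    instance_type="ml.p3.2xlarge",  # training instance; pick what fits your budget
)

# The training channel points at an S3 prefix with one subfolder per class,
# for example obverse/ and reverse/, as described in a moment.
estimator.fit({"training": "s3://your-bucket/coins/"})

# Deploy the fine-tuned model as a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
```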
So how should you structure your data? You would have your input directory, which is the S3 bucket that we were talking about before, and in it you will have two folders: the first one will be obverse, with your examples inside, and then reverse, again with examples. With that, you don't have to do anything else; you can directly train it from this screen. Once it is trained, you can deploy it, and what you're going to see is something like this.
You can see that this takes maybe around 10 minutes to deploy, or even less than that. This is using a CPU instance in this case, so you don't have to worry about GPU versus CPU; you can use both. And you get an endpoint, plus a notebook that shows you how you can use this endpoint. In this one, you see that we have two pictures. In this case we are using the original model, so the only thing it has to do is pick up that this is a cat and a dog, and you can see here the top five model predictions: tabby, et cetera, and so on. If you were using your own model, these classes would have been obverse and reverse.
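Once the endpoint is up, calling it from your own code looks roughly like this. The endpoint name is a placeholder, and the content type follows the pattern used in the JumpStart image classification notebooks, so check the generated notebook for the exact request and response format.

```python
import boto3
import json

runtime = boto3.client("sagemaker-runtime")

with open("coin.jpg", "rb") as f:   # hypothetical photo to classify
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName="my-coin-classifier",   # placeholder endpoint name
    ContentType="application/x-image",   # raw image bytes, as in the JumpStart notebooks
    Body=payload,
)
result = json.loads(response["Body"].read())
print(result)  # e.g. class probabilities for obverse vs. reverse
```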
So with that, let's actually move to the second model that we want to build. We finished an image classifier, and now we want to move to a segmentation model; we want to remove the background. You can train your own segmentation model, or you can check other solutions that are open and available. This is a website that I really like, Papers with Code. The task that we want to solve for is saliency detection, and you can see that things like U-2-Net are available here. This one is very successful at detecting the background and removing it, that is, choosing the most important object in the image and then removing the background. This is also something that you can use with SageMaker and then deploy as an endpoint.
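If you just want to try U-2-Net-style background removal quickly, one convenient option, not shown in the session, is the open source rembg package, which wraps U-2-Net:

```python
# pip install rembg
from rembg import remove

with open("coin.jpg", "rb") as src:        # any photo with a clear foreground object
    input_bytes = src.read()

output_bytes = remove(input_bytes)         # U-2-Net-based salient object extraction

with open("coin_no_background.png", "wb") as dst:
    dst.write(output_bytes)                # PNG with a transparent background
```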
Because we want to build something that is very custom, we are going to go a different route and use an open dataset, in this case the Open Images dataset. We're going to look for coins, but it could be other things as well, and you can see that the segmentation masks are available here. So we are going to use these segmentation masks to train our model. For this, we're going to use this repo that we have here; the link is in the presentation and will be made available in the description of the video, and I'm just going to walk you through some of the steps that you would do here. We're going to use a library called IceVision. This is built on top of PyTorch, PyTorch Lightning, and also fastai, so it uses both of those for training, and at the same time it has many, many algorithms available out of the box, for example Faster R-CNN or Mask R-CNN. This is the library that I'm going to be using for training in this session. The first thing that we want to do is download that dataset and all the images, but only for the class coin. There are more than 600 classes available, things like person, piano, et cetera, but we only want to use this one, coin. We only want to train this model on coins.
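The session trains this with IceVision; as a rough equivalent of the same idea in plain torchvision (an assumption on my part, not the exact training code from the repo), fine-tuning a COCO-pretrained Mask R-CNN for a single coin class looks like this:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

# Two classes: background + coin
num_classes = 2

# Start from a Mask R-CNN pretrained on COCO
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

# Replace the box head for our class count
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Replace the mask head as well
in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask, 256, num_classes)

# model is now ready to fine-tune on the coin images and their instance masks
```

From there, training is the standard torchvision detection loop over the images and their segmentation masks for the coin class.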
Okay, so the first thing that we do is download the data and extract the images and the segmentation masks, and we save them locally. Then we convert the annotations, because originally they use one vocabulary for the annotation and we want to move to another one: we go from something called Pascal VOC to something called COCO, Common Objects in Context. Once we do this, we upload the data with this one line of code to S3, which is our object storage.
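That one line of code is essentially the SageMaker SDK's upload helper; a minimal sketch, with the local path and key prefix as placeholders:

```python
import sagemaker

sess = sagemaker.Session()

# Upload the local folder with images and COCO-style annotations
# to the default SageMaker bucket for this account and region.
inputs = sess.upload_data(path="data/coins", key_prefix="open-images-coins")
print(inputs)  # something like s3://sagemaker-<region>-<account>/open-images-coins
```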
And then we define the resources that we want to use for training. In this case we want to use a GPU instance, so we use an ml.p3.2xlarge. I'm not going to use Spot here, but you can think about Spot as a way of going into an auction for unused compute capacity. You bid for this unused capacity, and normally the savings range from 60% to 90%; so whatever the on-demand price is, you pay 60% to 90% less than that. The only caveat is that, because you are bidding for these resources, once someone wants to use that capacity on demand, your capacity will be taken away and given to them, so effectively your training will stop. The good thing is that all of this is already taken care of on AWS, and you save checkpoints as you move along with your training. That way, if your training suddenly stops, you can pick it up again once compute capacity becomes available. So I would recommend that you use this, because with only three lines of code you can save maybe 60% to 90% of the cost.
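Those three lines are roughly the following arguments on the SageMaker estimator; the time limits and checkpoint location here are placeholder values:

```python
# Managed Spot Training settings, passed to the estimator below, e.g. PyTorch(..., **spot_kwargs)
spot_kwargs = dict(
    use_spot_instances=True,                            # request Spot capacity instead of on-demand
    max_run=2 * 60 * 60,                                # max training time, in seconds
    max_wait=4 * 60 * 60,                               # total wait incl. Spot delays (must be >= max_run)
    checkpoint_s3_uri="s3://your-bucket/checkpoints/",  # checkpoints survive Spot interruptions
)
```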
So once we have set up that configuration, we come here and we create something called an estimator. We take our training script, which is this one, and the source directory where everything is; let me show you, in this case it's only two files, requirements.txt and train.py, and we pass the parameters as arguments to this training job. What this is effectively going to do is create a new container, different from what you're seeing here, on another instance used only for this task, and you only have to pay for the amount of time that you've been training, not more than that. So with that, you can see that we create this estimator and then we fit it to the data that we had; inputs is the data that we downloaded and then uploaded to S3. After some time, this is going to finish and tell us that it was successful.
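In code, the estimator and the call to fit look roughly like this; the IAM role, framework versions, and hyperparameters are placeholders rather than the notebook's exact values:

```python
from sagemaker.pytorch import PyTorch

inputs = "s3://your-bucket/open-images-coins"  # the S3 prefix returned by upload_data earlier

estimator = PyTorch(
    entry_point="train.py",         # the training script
    source_dir="src",               # folder containing train.py and requirements.txt
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder execution role
    instance_count=1,
    instance_type="ml.p3.2xlarge",  # single GPU training instance
    framework_version="1.9",        # example PyTorch container version
    py_version="py38",
    hyperparameters={"epochs": 10},
    # **spot_kwargs,                # optionally add the Spot settings from above
)

estimator.fit({"training": inputs})  # launches the training job on a separate instance
```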
Of course, you can also track this if you go to the AWS console, where you can see the training jobs; for example, you can see how long this training took. This one is around 22 minutes, and we were charged for 22 minutes. If we were using Spot instances, we would have had a reduction of around 70% of the cost in this case. So once this has finished training, we want to deploy this model and run our predictions. For that we can use this other example, where what I'm actually doing is creating a container and running the model in it. You can see all the steps; I don't want to stay on the details too much, you can explore this in your own time, but I do want to show you the results. You can see that the actual time it takes for a prediction is quite quick, and the quality is quite good. We have the image on the left, and we only want to pick up one coin, so we pick up the one on the right, and you can see how the background has been removed completely and the image is clean.
So with that, and conscious of time, I'm going to move to the last item today, and that is how we can build a visual search engine. For that we are going to follow this blog post, Building a visual search application with Amazon SageMaker and Elasticsearch. This uses an open dataset of fashion and clothing images, but of course you can swap those images for the images that you have, for example coins. What this is going to do is run a convolutional neural network against these images and extract the feature vectors. And this goes back to that model we were talking about, the JumpStart model: it has a feature vector extractor, so we could actually deploy that and not have to do any custom modeling at all; we have the model right there. So once we have these vectors, we put all of them into Elasticsearch, and then we do something called k-nearest neighbors (KNN) search: we look for the images whose feature vectors have the lowest distance to the feature vector of the reference image that we have.
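To make "lowest distance" concrete, here is a toy illustration of the idea with random vectors; the real application does this inside Elasticsearch rather than in NumPy:

```python
import numpy as np

# Each image is represented by a feature vector; "similar" means small distance.
collection = np.random.rand(1000, 2048)   # stand-in vectors for 1,000 collection images
query = np.random.rand(2048)              # stand-in vector for the uploaded photo

distances = np.linalg.norm(collection - query, axis=1)  # Euclidean distance to each image
top_k = np.argsort(distances)[:8]         # indices of the 8 closest matches
print(top_k)
```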
If you go through the steps in the blog post, you will see that clicking Launch Stack opens up this screen, where basically we just create the resources that you need to run this: it will create an S3 bucket and a SageMaker notebook. Then the only thing that you need to do is, let me show you, open the notebook that was just created, and you are going to be presented with this repo. The repo is this one; again, you can find the link in the description of the video. Let's dive right into it. So we have this notebook, visual image search. The first thing that we want to do is get the training data; this is almost 10,000 high-resolution images. In your use case, these will be your images; it wouldn't be these 10,000 images, it would be yours. And you can see that the first step is that we get this data, we do some transformations, and then we upload the data to S3, where we will have it. That will be the location that we read from when we want to train our model.
that comes included in the Keras libraries.
This case, it will be Resnet 50. But like we
were seeing before, you can actually, instead of doing all of these steps, you can
just use the model that we saw before.
Right. Let me go again. So this model, you can just, once you deploy
it, you click deployment, you will be presented with
an endpoint URL. And that is the one that you can use to do
this task. Otherwise, let's continue with this custom
implementation. So we take this Resnet 50
and we want to deploy it as
an endpoint, right. So that's what we do now
we use this piece of script. It's the one
that we are going to be using to pick up the model, load it into
memory, and then run all of these images
through this and only return the feature vectors,
not the actual label, out of it, just the feature vectors.
So that is what we do here.
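The heart of that script is essentially a headless ResNet-50. A minimal sketch of the idea, not the blog's exact inference script:

```python
import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input
from tensorflow.keras.preprocessing import image

# ImageNet-pretrained ResNet-50 with the classification head removed;
# global average pooling turns the last feature map into a single 2048-d vector.
model = ResNet50(weights="imagenet", include_top=False, pooling="avg")

img = image.load_img("coin.jpg", target_size=(224, 224))  # hypothetical input
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

features = model.predict(x)
print(features.shape)  # (1, 2048) -- the feature vector we index and search on
```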
So here we are going to deploy the model as a SageMaker endpoint. This normally takes around 10 minutes. You can see that I'm using a CPU instance, and in this case I'm going to deploy only one, but you can change that if you want this to be quicker; for now, all the requests will be routed to one instance. Then we use an example image, and this is the result that comes back from that input: these are the feature vectors. Once we have tested that this actually works, we want to build the index. So we first get all the images, all the keys of these files on S3, and then we process all of those images, get the feature vectors out of all of them, and load them into this Elasticsearch index.
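A sketch of what the index and the KNN search can look like with the Elasticsearch k-NN plugin. The endpoint, index name, and field names are placeholders, and in the real notebook the connection to Amazon Elasticsearch Service is signed with AWS credentials:

```python
import numpy as np
from elasticsearch import Elasticsearch

es = Elasticsearch("https://your-es-domain:443")   # placeholder Elasticsearch endpoint

# Create an index with a knn_vector field sized to the 2048-d ResNet-50 features.
es.indices.create(index="coins", body={
    "settings": {"index.knn": True},
    "mappings": {"properties": {
        "image_vector": {"type": "knn_vector", "dimension": 2048},
        "image_key":    {"type": "keyword"},
    }},
})

# Index one document per image: its S3 key plus its feature vector.
vector = np.random.rand(2048).tolist()   # stand-in for a real feature vector
es.index(index="coins", body={"image_key": "images/coin-001.jpg", "image_vector": vector})

# KNN query: return the 8 images whose vectors are closest to the query image's vector.
results = es.search(index="coins", body={
    "size": 8,
    "query": {"knn": {"image_vector": {"vector": vector, "k": 8}}},
})
print([hit["_source"]["image_key"] for hit in results["hits"]["hits"]])
```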
So once we have done that, you can see that this is what we are doing here: we are importing these features into Elasticsearch. And the next thing that we can do is run a test. You see that we have the first image, the query image, here, and now we say, okay, bring me back examples out of your index, bring me back the most similar-looking images. You can see that we're only returning outfits that have all of these patterns, so they are very similar to each other. Then the same thing with a different method, but it's the same result, and you can see what this looks like. In your case, using your own data, this would be presenting one coin as the reference image, the input image, and then returning the most similar-looking images in the collection. The good thing about this application is that it also involves deploying a full-stack visual search application, so this is great if you're doing a demo. There are several steps for creating the architecture, but basically once you run all of these steps, you're going to be presented with an application that looks like this.
And I actually have it running locally here. You see that, for example, you choose how many items you want returned, then you can choose an image, and you just submit your job and get the results back. Let me show you, for example, in here; the way that this will look is something like this. At the end of your experimentation, if you want, you can delete all of the resources that we created and just finish there; you wouldn't have any extra cost out of this. So, with that, I actually wanted to come back to the original presentation, and I wanted to thank you for staying with us so long. I hope you find this presentation useful. Thank you very much.