Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Kevin Scott, and I'm excited to talk to you all about
UpscalerJS, a new open source tool that I wrote for doing
image upscaling in the browser using machine learning.
UpscalerJS is built with TensorFlow.js, and it lets you upscale
images to 2x, 3x, or even 4x, all in your
browser. In this talk, I'm going to discuss a
little bit about how I built it and how you can start to leverage the
power of neural nets in your apps. I want
to start by outlining a use case that I think is really appropriate
for something like this. This use case is inspired by a situation
I encountered at work. Let's say you're working on
an ecommerce platform. Images are critical for attracting
people to products. Images sell. It could be real
estate, fashion, software. Almost anything performs
better with images. But if you're dealing with user
generated content, you know how difficult it can be to get high quality imagery.
A lot of the time, you take what you can get, which frankly, is not
much, and you design that site with high quality, beautiful images in
mind, and your design looks great. And then you get to deploying the actual
site and suddenly your users are uploading low quality,
pixelated images. It's not their fault. It's probably all that they
have, but it kills the design. So this actually happened to me.
I was working with a team and we put up a site that was extremely
image dependent. It looked amazing in the designs, but when we
actually got to deploying the site, it fell flat. Without high
quality imagery, the design just didn't work. So what's a nontechnical
solution to this problem? Well, I can tell you, because we do this today,
you go back and you ask for better images, and sometimes people can oblige,
but often they can't. The images they've given you are all they've got.
Maybe it's an image screenshotted from a PDF, or maybe it's
an old image and they can't get a better one. Even if they can
get better images, it's a labor intensive ask of your
users to go back and fix their images for you, even if
it's for their benefit. So what else can we do?
Well, there's a whole realm of research in machine
learning called super resolution. The idea is to
take a low resolution image and make it look, well,
higher resolution. "Can you enhance it? Can we enhance this?
Can you enhance it?" "Hold on a second, I'll enhance." If you've watched CSI,
you've probably seen this fake technology in action.
This technology is now real, and though it's not perfect,
it's pretty good and it's getting better. One option is
to apply it on the back end. There are lots of
techniques for doing this, and most of them are in Python.
Applying this technology on the backend has a number of things going for
it. For one, backend code can benefit
from beefy hardware. This lets you run the most accurate,
most powerful models. If getting the most highly
accurate images is important, this is probably the way to go.
Also, a lot of use cases are upload once,
display often. So processing the images ahead
of time, even if it takes a while in processing time,
is not a big deal, because you only do it once.
But there are two big drawbacks that I see to deploying
this on the back end. One, it takes a lot longer to
get immediate feedback. If I'm a user of your site,
I upload an image, it has to go to your server, get processed
there, and then get sent back down to my computer.
There's also the issue of deployment. This can be nontrivial,
especially because so many cutting edge implementations are at
the bleeding edge, with unstable dependencies and changing requirements.
And if your deployment requires GPUs,
that could end up being an expensive proposition as well, and hard to scale.
So what about deploying this on the front end? Would that work?
The issues with deploying this on the back end motivated me to explore
whether it'd be feasible to perform upscaling in the
browser using JavaScript. And it turns out that it is. And I'll
discuss some of the technical hurdles and the code in a minute. But first,
let's talk about the advantage of running this technology in JavaScript.
First, that issue of deployment from the previous slide
is gone. Running it in JavaScript means relying
on your user's browser. There's no GPU to provision or keep running.
It all happens on your user's computer, and in fact you
can go to this link right now and upscale an image without
installing anything.
That's a really powerful argument, particularly if you don't have any machine
learning experts on your team. The second big argument
is immediacy. In the back end example, whenever a
user uploads an image, they have to wait for the round trip experience.
They have to upload it to your servers, get it processed,
which, depending on your technology, might mean a server needs to be
provisioned, or a GPU or a Lambda needs to be spun
up, or any number of other technical delays, before you send it back
down the pipe to their computer. If you do it in JavaScript,
it's already there in your browser. No waiting around. It only
takes as long as it takes to do the inference.
The third, and in my mind most compelling argument for
doing this in your browser is bandwidth savings.
In the back end example, we can upscale images ahead of
time, but the image we're sending down is still the full resolution image.
It could be a megabyte or multiple megabytes large.
If you do it on the front end, you can send a smaller image,
sometimes a much smaller image. That's huge. Let's say
you're doing 4x scaling. That's 4x in each dimension, so that's an image
with potentially one sixteenth the file size.
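The arithmetic behind that claim can be sketched in a couple of lines. Note this is about pixel counts: actual file sizes depend on compression, so the savings are roughly, not exactly, proportional.

```javascript
// Pixel-count ratio between an upscaled image and its source.
// File size scales roughly with pixel count for a given format,
// though compression means it won't be exactly proportional.
function pixelRatio(scale) {
  return scale * scale; // width and height each grow by `scale`
}

console.log(pixelRatio(4)); // 16
```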
That's a huge file savings. But that's only assuming
that, one, the front end can perform decently fast, and two,
that the image that we upscale looks good.
Can we depend on that? Turns out that mostly we can,
and where we can't, things are getting better. So now, of course,
there are some drawbacks to doing this on the front end.
One drawback is that if you do have those coveted machine learning
experts on your team and those capabilities, front end performance could
be worse, particularly if bandwidth is not a concern.
Maybe your users primarily use desktops, then keeping
things on the back end will probably perform better, and doing so
gives you access to all the latest cutting edge techniques that might not
translate yet to the browser. The second big
concern is that neural networks running on devices benefit
significantly from cutting edge hardware.
The good news here is that consumer companies, namely Apple and Google,
have invested huge sums in increasing the power of their
devices' hardware, specifically the ability to process neural networks
on device, what's also known as edge AI.
The downside is that because the improvements are so
significant year after year, it makes the disparity for users running
older devices that much more significant. Some older
devices will just be awful. So if you want a consistent experience,
you'll want to keep that disparity in mind. So while there are
absolutely tradeoffs to be made between back end and front end,
the point is that JavaScript is absolutely a first class citizen
when it comes to applications of machine learning and neural networks.
No longer are you forced into some heavy duty back end
solution. You can run this technology right now,
today in your browser, and in this case in
particular, doing it client side can be a much better choice
than keeping things on the back end. Now, if you're a JavaScript
developer and you're ensconced in the world of JavaScript,
maybe you're wondering how you go about hearing about new machine learning
technologies, how you know whether they're
applicable to your work or whether you can use them. How would you even know
super resolution is a thing unless you happened to see it in
that CSI clip? "Hold on a second, I'll enhance." So I want to briefly
touch on how I became familiar with this research and how you
might leverage a similar strategy to learn about opportunities
that might be relevant to you. The first thing to know is that almost all
machine learning research gets posted publicly and is
accessible for free. These are academic research
papers that tend to be theory and math heavy and
sometimes pretty hard to penetrate. And this can scare off a lot of
people. It certainly scared me off at first. I don't want to minimize the
importance of fully understanding the research. If you have a deep
understanding of the theory, that can often lead you to novel insights
and new development that's relevant to your field.
But you don't necessarily need a
full understanding of the technology to use it in your work.
Particularly if you're focused on implementing prebuilt models, like
we are here, you can rely on others to evaluate the research as
well as implement a lot of the code for you.
I like to rely on a website called Papers With
Code. This website lists cutting edge research
organized by topic. You can see the latest papers measured
against metrics. You can also see available code implementations,
as well as information about the frameworks that they're using. In our
specific example, super resolution, there's actually a whole
category dedicated to that research, and we can see the various implementations.
Metrics are a tricky thing
for something like super resolution. The two most common metrics are
called PSNR and SSIM. They're both
measurements of how different one image is from another.
But as humans, we perceive images differently than a computer does.
A set of pixels that is, say, less saturated but
sharper may lead to a lower metric score from
the computer, but a more aesthetically pleasing result for a human.
And this is not just a theoretical concern. At a certain
point, people rate images as more similar when they're more aesthetically
pleasing, even when the computed metrics disagree. And in fact,
for popular metrics, the authors often note that better
performing filters can tend toward a blurry, washed out kind
of look. So metrics are absolutely important, but it's
also important to bring your critical eye to them and consider
your own use cases. For our purposes with super
resolution, we're looking for good accuracy, yes, but not necessarily
the best accuracy. Just as important is that it
be fast and that it be compatible with JavaScript, because not
all of the code that we're looking at is. The paper
I ended up exploring was something called ESRGAN, and the particular
implementation was this one. You can check out my blog
for more information on how I went about evaluating the different
implementations out there. So with a viable
architecture in hand, we can take the model offered by the author
and see how to make it work in JavaScript.
So we can start off by converting our model. For this example
you can go to a website called Google Colab, which is a
free notebook environment for running Python code in the browser.
It also offers a GPU if you don't have access to one.
And so along the bottom here is a link where you can
run this in your browser. And so here I've set
up a number of cells that demonstrate the
code running and upscaling in
this notebook. This cell in
particular is very important. This saves the full model
and not just the weights. You can do either,
but for our purposes, we need the full model
to be saved and converted, otherwise our
JavaScript code won't know how to interpret the
model that we give it. Another thing to note here is
that I found I needed this highlighted line.
I'm not sure if this is something I'm doing wrong
or a bug in the software,
but I found that I needed to manually change this
bit of JSON configuration in order to get my JavaScript
code to run. So if you run into a similar issue,
just know that you may need
to run this bit of code as well. So once
we have our model saved,
we can zip it up, we can download it, and we can then
upload it in the next step in JavaScript.
Then over in JavaScript this is code sandbox
and here's a link you can follow along in your browser yourself.
So what we can do here is create a folder that'll hold
our model and we can then upload
the files into it.
And there they are. So we can check that they
uploaded correctly. So now on the right is
the panel showing our code running.
This image of a baboon is our source image.
This is what we're going to be upscaling.
And so we load our model here,
and the entry point is the JSON file, not the
bin files. The bin files contain information about
the weights and they're sharded to basically enable
caching in the browser. But you always want to give it the
JSON file here. So on button click, we set up this
function that will start a timer and then do the
conversion of the image into a tensor.
A tensor is sort of the core data structure
that all machine learning works with. You can think of it as a
multidimensional numeric array. And so
we need to convert our image into a tensor
so that we can put it through our model.
So that's what this bit of code is doing is it's
taking the original image and then making it into a
tensor. So then we await the promise of our model
if it hasn't loaded yet. And then we put the tensor through our model
predict function that will return a new tensor,
which represents what it thinks is the upscaled image.
We then put that through this tensor as base 64 function,
which will take that tensor and turn it back into a base
64 source representation, which we can attach
to the image, and we can run it.
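The flow just described might be sketched like this. Note this is a hedged sketch, not the exact sandbox code: the model is assumed to be a TensorFlow.js model loaded from the JSON entry point, and `toPixelTensor` and `tensorAsBase64` stand in for the image-to-tensor and tensor-to-base64 helpers, whose real names and signatures may differ.

```javascript
// Sketch of the button-click handler: time the run, convert the image
// to a tensor, await the model if it hasn't loaded yet, run predict,
// and convert the predicted tensor back into an image source.
async function upscale(modelPromise, img, toPixelTensor, tensorAsBase64) {
  const start = performance.now();          // start a timer
  const input = toPixelTensor(img);         // image -> tensor for the model
  const model = await modelPromise;         // await the model if not loaded
  const output = model.predict(input);      // returns the upscaled tensor
  const src = await tensorAsBase64(output); // tensor -> base64 image source
  console.log(`upscale took ${performance.now() - start}ms`);
  return src;                               // attach this to an <img> element
}
```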
And voila, we've got an upscaled image.
That's really cool. It worked. We can
see that it took north of 900 milliseconds.
So one thing that's really interesting here is that the
first time takes 900 milliseconds, but subsequent
runs are a lot faster. They take around 100
milliseconds or so. So what's going on here?
So, if you're using TensorFlow.js,
you'll want to know about something called warm ups.
Based on how TensorFlow.js interacts
with your GPU, the first invocation of a model
tends to be significantly slower than subsequent invocations.
So the way around this is, when
your site loads up, to send some initial dummy data into
your model, and that will warm it up and avoid the
cold start. For this to work,
the image has to be the same size, which will be a particular problem here,
as we probably won't have consistently sized images. And on
top of this, this technique doesn't help the fact that the UI
blocks. Another option is to
explore web workers, and they help somewhat,
but they're not a silver bullet. Again, you can check out my blog post.
I go into more technical detail on exactly what's going
on there and why they're not a silver bullet.
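Going back to the warm-up technique: a minimal sketch might look like the following. The dummy input here is an assumption for illustration; in TensorFlow.js, `makeDummyInput` might be something like `() => tf.zeros([1, 64, 64, 3])`, sized to match the images you expect.

```javascript
// Warm up the model once, at page load, with dummy data so the first
// real request doesn't pay the GPU cold-start cost.
async function warmUp(model, makeDummyInput) {
  const dummy = makeDummyInput();       // e.g. tf.zeros([1, 64, 64, 3])
  const output = model.predict(dummy);  // run a throwaway inference
  await output.data();                  // force the computation to execute
  dummy.dispose();                      // free the memory backing the tensors
  output.dispose();
}
```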
So at this point, we've got a working implementation. We're able to upscale
an image in our browser, which is really cool. When I
first got this working, I was blown away that this is even possible,
but it's still pretty slow. And the solution that
we have in place for speeding it up only works if we're giving it
consistently sized images which we can't
really rely on. Plus it's still locking up the GPU,
so we still have a number of roadblocks before we're able to use
this in a consumer app. So what
if, instead of feeding our image directly
into the model at its full size,
we cut the image into pieces and try
to process those pieces one by one?
If we subdivide our image into sections, we can take a
single long task and break it into four smaller
tasks, where after each one we can release the UI thread.
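That chunked loop can be sketched like this. It's a simplified version of the idea, not the talk's exact code; `processPatch` stands in for the per-patch model inference.

```javascript
// Process patches one at a time, yielding back to the event loop between
// each one so the UI thread is released instead of blocking on a single
// long task.
async function processInChunks(patches, processPatch) {
  const results = [];
  for (const patch of patches) {
    results.push(await processPatch(patch));
    // Release the UI thread before starting the next patch.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
  return results;
}
```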
But we run into a new problem. Now, these upscaled images tend
to have artifacting around the edges. This is common
with a lot of upscaling algorithms; it's sort of inherent
to how they work. It's generally not very
obvious in a fully upscaled
image, but when we cut the image into pieces like this, it becomes very obvious.
To fix it, what we can do is add padding to each of
our chunks. And the interesting thing about this solution is
that, going back to the issue of
warming up, we said our images have to be the same size.
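The patch-and-padding math might look roughly like this. This is a simplified sketch with illustrative field names, not the exact implementation: each patch is expanded by a few pixels of padding so the edge artifacts land in the padding, which gets trimmed after upscaling.

```javascript
// Split an image into fixed-size patches, each padded on every side
// (clamped to the image bounds), and record which region of each
// upscaled patch to keep once the padded edges are trimmed away.
function getPatches(width, height, patchSize, padding) {
  const patches = [];
  for (let y = 0; y < height; y += patchSize) {
    for (let x = 0; x < width; x += patchSize) {
      // Expand the patch by `padding`, clamped to the image bounds.
      const x0 = Math.max(0, x - padding);
      const y0 = Math.max(0, y - padding);
      const x1 = Math.min(width, x + patchSize + padding);
      const y1 = Math.min(height, y + patchSize + padding);
      patches.push({
        // Region to crop from the source and run through the model.
        crop: { x: x0, y: y0, width: x1 - x0, height: y1 - y0 },
        // Region of the upscaled patch to keep, relative to the crop.
        keep: {
          x: x - x0,
          y: y - y0,
          width: Math.min(patchSize, width - x),
          height: Math.min(patchSize, height - y),
        },
      });
    }
  }
  return patches;
}
```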
So long as we set our patch size small enough,
smaller than the smallest image we expect to receive, we'll always
be able to pass a consistent image in and avoid hitting the warm
up. And I have an implementation at
this link here where you can see
the math to take
an incoming image and split it into smaller chunks of a
consistent size, which allows us to avoid
hitting that warm up. So there are also other things
we can do if we want to make this run faster in the
browser. We can quantize our model, which means
making it smaller and also easier to compress
as you pass it down the wire. We can prune our model,
which means dropping poorly performing weights during
training, which makes it run faster. We can also improve the accuracy
of our model by giving it more data or training on a specific domain.
And these are all things that, if you're looking to deploy
this in production, are absolutely worth pursuing.
But the point that I want to emphasize is JavaScript is absolutely
a first class contender for an application of this technology,
and it's a contender that, in my opinion, is arguably a
better option in a lot of cases than the pure Python
solution. You don't need machine learning experts,
although it probably doesn't hurt, and you don't need to be a machine
learning expert yourself, although again, it probably doesn't hurt.
All the code I showed today is available in UpscalerJS,
the open source tool that I built using TensorFlow.js. You can head to
NPM right now and install the package and then run an image
through it, and voila, you've got a working upscaler
in your browser. As I continue to work in this area,
I'll keep improving the package, as well as the models that ship with it,
exploring domain specific models
like perhaps face specific models
or illustration models. Imagine the
implications of something like this for video technology. What if
we could reduce the size of a video coming down the pipe by
80 to 90 percent? What if we got improved models
that could, instead of upscaling by 4x, do 8x
or 16x in the browser? Those
improvements aren't outlandish; they're very
feasible. I wouldn't be surprised if we see them in the next year,
and that's all applicable in JavaScript. That's all technology
that is very feasible in JavaScript. That's really
exciting. That's huge savings that we could be seeing in the browser.
I hope you've enjoyed this talk and learned a little bit about
machine learning and JavaScript today. If you're interested,
you can find me on Twitter, on GitHub, and at
my website where I write and talk about this technology.
Thanks for listening.