Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone and welcome to the conference. Thank you
very much for attending this session today on using WebAssembly
for in-database machine learning.
My name is Akmal Chaudhri and I'm a technical evangelist at SingleStore.
In terms of the agenda today, I'll walk you through some steps
to show you how we can configure a WebAssembly development environment
on our workstation or laptop, and then walk you
through some steps to show you how we can build a function for sentiment
analysis, which we'll upload into a database system.
A little bit later, at the end of the presentation, I'll show you an actual
live demo on the Large Movie Review Dataset to show you how
this actually works in reality.
First let's do a very brief introduction to SingleStore, and then
we'll talk through a couple of slides and look at the motivations and reasons
why we might want to use WebAssembly with a database system.
So SingleStoreDB is an example of a real-time distributed
SQL database system. What we've seen over the last decade
or so is the development of these scale-out distributed SQL
database systems, taking advantage of off-the-shelf hardware
and cluster-based, cloud-based systems. SingleStoreDB
also offers a unified data engine, which means
it supports both transactions and analytics.
It is also multi-model, so it offers a wide
range of choices and possibilities. Wasm is one of
the things it provides support for, and we'll look at an example of how
we can do that today.
When working with a database system, there is a wide range of tools that we
could potentially use with that database system to utilize
machine learning capabilities. One of these, for example, could be Apache
Spark. There are lots of great Python libraries out there
as well. We could also use many
external tools and technologies; OpenAI is a very popular one today.
Specific to SingleStore, we have this concept of vector functions as
well. That's something perhaps for another presentation. Today our focus
is going to be on WebAssembly, and specifically Code
Engine using WebAssembly. The example we'll look at
uses something called VADER (Valence Aware Dictionary and sEntiment
Reasoner). More about that very shortly.
Previously we may have built applications for specific
platforms. Now we have this concept of write
once, deploy anywhere. So we build our Wasm application and
we can run it on the web, on the desktop, mobile,
edge, or on the server.
So WebAssembly really helps us to solve a number of
problems that we might have with database systems. For example,
writing procedural SQL is sometimes very difficult.
Equally, there may be other requirements that we have, such as machine learning
scoring or fuzzy text matching, which the database engine simply doesn't support.
So how do we extend its capabilities to meet these requirements?
Moving data around is also costly, and we would prefer not to
send it to the application all the time. Instead, we keep the code and
the data itself close together, colocated in the database engine,
and that really gives us great benefits.
Another benefit that we get with WebAssembly is the opportunity to use existing
code that we may have written in other programming languages.
We can transform this into Wasm modules and have these
stored and executed safely within the database engine.
That extends the capabilities and functionality of the database engine
in many different ways that we couldn't achieve before.
So in the example that we're going to work through today, we'll see
some code written in Rust, and then we'll build
and load that code into SingleStoreDB.
It exists in a sandbox environment within the database engine, and
as far as the database engine is concerned, it will treat it just as though
it's a UDF.
Let's now walk through the steps of how to set up a local Wasm development
environment. We'll need an SDK as well as some
other software, and we'll see how to install this and prepare our environment.
This part is entirely vendor-neutral and can be used with other vendors'
products as well.
So first we'll download the WASI SDK.
Here's the link. In this
particular case we are going to install it
in the /opt directory. So we download
the software, substituting your own path wherever
/path/to appears.
We unpack it and
then we make sure that we add the bin directory to the path
as shown here. Here we
can see the next two steps. The first one is to install the
Rust toolchain.
There we go.
And the second step there is to install wit-bindgen
as such.
So here we can see the next couple of steps. The first one is
to add the wasm32-wasi target to the Rust toolchain,
as it's not installed by default.
Very straightforward, as you can see.
The next step after that is to
install this pushwasm tool. We need it in order
to deploy our Wasm module to
SingleStoreDB. So very straightforward.
Here all we need to do is clone the
GitHub repo and change
to the pushwasm directory. We build it and
then we make sure that we add the path
to where the software has been built to our $PATH,
so that when we call it, it gets picked up correctly in our environment.
And obviously please amend the path
to match your environment there.
In case there are some errors, usually to do with
libssl, then it may be necessary to install this
as such.
Let's now switch to our Rust code. So the first
thing we'll do is make a directory
in our home folder and switch to that
directory. Then within that directory
we'll create a skeletal Rust source tree using
this command.
Here we'll
also now create an interface definition file, and
we're going to call it sentiment table wit
and inside that file we're going to place the
following code.
And what this code says is that we are going to have a function
called sentiment table. It's going
to take an input string and
it's going to produce an output consisting of these polarity
scores. And these polarity scores
are just above here. As you can see we've got compound
positive sentiment, negative sentiment
and neutral.
The next step is to replace the existing contents
of the Cargo.toml file. Here you can see we've
got a few things that reference back to
what we are trying to do: the sentiment table
specifically, and also the VADER sentiment crate
which we are utilizing here, which has already been
ported to Rust and actually does the hard work for
us.
Now we actually need to provide the implementation of the
function in Rust. And here is the
code. You can see we are referencing the
interface definition file here.
We provide the name of the function
here, the fact that it takes a string
as input and provides the polarity
scores as output. And we'll see this in the
next slide.
So here is the remaining code in our Rust file.
And here we can see the polarity
scores which we referenced earlier as well.
So the compound positive sentiment,
negative sentiment and neutral.
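
As a rough illustration of the shape of the function just described, here is a
minimal, std-only Rust sketch. It is not the code from the slides: the real
implementation relies on the interface definition file and the VADER sentiment
crate, neither of which is reproduced here. The name sentiment_table, the
PolarityScores struct layout, and the tiny toy_valence lexicon are all
assumptions made for illustration. What it shows is the contract: a string
goes in, and a list of rows comes out, each carrying the compound, positive,
negative and neutral scores.

```rust
/// Mirrors the four polarity scores named in the talk.
/// (Struct and field names are assumptions for this sketch.)
#[derive(Debug, Clone, PartialEq)]
pub struct PolarityScores {
    pub compound: f64,
    pub positive: f64,
    pub negative: f64,
    pub neutral: f64,
}

/// Hypothetical stand-in for a real sentiment lexicon: +1 for a few
/// positive words, -1 for a few negative words, 0 for everything else.
fn toy_valence(word: &str) -> f64 {
    match word {
        "great" | "good" | "excellent" => 1.0,
        "bad" | "awful" | "terrible" => -1.0,
        _ => 0.0,
    }
}

/// Takes a string and returns rows of polarity scores, which is the
/// shape a table-valued function needs. Scoring here is a toy word
/// count, NOT the VADER algorithm.
pub fn sentiment_table(input: String) -> Vec<PolarityScores> {
    let (mut pos, mut neg, mut neu) = (0.0_f64, 0.0_f64, 0.0_f64);

    for token in input.split_whitespace() {
        // Strip punctuation and lowercase before the lexicon lookup.
        let word: String = token
            .chars()
            .filter(|c| c.is_alphanumeric())
            .collect::<String>()
            .to_lowercase();
        if word.is_empty() {
            continue;
        }
        let v = toy_valence(&word);
        if v > 0.0 {
            pos += v;
        } else if v < 0.0 {
            neg += -v;
        } else {
            neu += 1.0;
        }
    }

    let total = pos + neg + neu;
    if total == 0.0 {
        // No usable words: return a single all-zero row.
        return vec![PolarityScores { compound: 0.0, positive: 0.0, negative: 0.0, neutral: 0.0 }];
    }

    vec![PolarityScores {
        compound: (pos - neg) / total,
        positive: pos / total,
        negative: neg / total,
        neutral: neu / total,
    }]
}
```

Calling sentiment_table("The movie was great".to_string()) returns a single
row with no negative sentiment and with positive and neutral proportions that
sum to one. In the real module the body would delegate to the VADER crate
rather than to this toy lexicon.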
So we're now ready to build the Wasm module.
All we need to do is go up one directory level
and then issue the cargo build command as such.
Then obviously we need to deploy that into
SingleStoreDB. But before we do that we need to create
a database. So here
we're going to use a MySQL client to
connect to SingleStore, and
this is going to be running in the cloud. So I'll substitute the
value for host here, for example,
as well as my password. And then
once I've done that I can do
a CREATE DATABASE, and I'll just call it demo, and
then switch into that database as well with USE demo.
So I'll exit from the MySQL client back to the terminal
window, and now I'll use the pushwasm
tool to actually load that Wasm module into SingleStoreDB.
You can see here it looks like a busy statement,
but let's walk through it. So firstly, we have the
prompt for the password.
And then here is the name of the
function that I want to load, sentiment table,
then the interface definition file that we created
earlier, and the location of
the actual Wasm module that was built using
the cargo build command. And then the connection
details: where is the database system
actually running? Again, for this particular
example, simply substitute the value
of host here. It's in
the cloud. And you can see here
that demo is the database name referenced
here, which is what we created on the previous slide. So once
this command is run, and if it's successful,
which hopefully it should be, there shouldn't be any problems, we should
see this message output: Wasm function
was created successfully.
So we'll use the MySQL client to connect to
our demo database system and then
we can do a couple of quick tests. Okay, so the first one here is
just calling that function directly as
such, select star from sentiment table.
And then here is the string that we are passing in. We're saying the movie
was great and then based upon that it's
returning this to us.
So it's giving us the fact that there's
no negative sentiment.
Positive seems quite reasonable, neutral seems quite
reasonable there, and a compound score as well.
Now one of the things that Vader can do is it can consider things like
capitalization, for example. So for example,
just below here, you can see that I've now
put "the movie was great",
but with great in caps and
an exclamation mark at the end. And here
you can see that the results
returned show
the positive going
up, the compound going up, and the neutral going
down. And obviously negative is still zero there.
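
To make the effect of capitalization and exclamation marks a little more
concrete, here is a small std-only Rust sketch of how a VADER-style compound
score can rise with emphasis. It is a simplified illustration, not the code
used in the demo. The constants (a caps boost of 0.733, 0.292 per exclamation
mark up to four, a normalization constant of 15, and a valence of 3.1 for
"great") follow my understanding of the reference VADER implementation and
should be treated as assumptions. The function names and the one-word lexicon
are hypothetical.

```rust
// Constants as understood from the reference VADER implementation
// (treat them as assumptions for this sketch).
const CAPS_BOOST: f64 = 0.733;
const EXCLAMATION_BOOST: f64 = 0.292;
const MAX_EXCLAMATIONS: usize = 4;
const ALPHA: f64 = 15.0;

/// Hypothetical one-word lexicon; a real lexicon has thousands of entries.
fn toy_valence(word: &str) -> Option<f64> {
    match word {
        "great" => Some(3.1),
        _ => None,
    }
}

/// True if the word has letters and none of them are lowercase.
fn is_all_caps(word: &str) -> bool {
    word.chars().any(|c| c.is_alphabetic()) && !word.chars().any(|c| c.is_lowercase())
}

/// Squashes an unbounded sum of valences into the range (-1, 1).
fn normalize(score: f64) -> f64 {
    score / (score * score + ALPHA).sqrt()
}

/// Simplified compound score: sums word valences, boosts ALL-CAPS
/// sentiment words when the text mixes cases, then adds emphasis for
/// exclamation marks before normalizing.
pub fn toy_compound(text: &str) -> f64 {
    let words: Vec<&str> = text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|w| !w.is_empty())
        .collect();

    // Caps only signal emphasis when some, but not all, words are in caps.
    let upper_count = words.iter().filter(|w| is_all_caps(w)).count();
    let mixed_case = upper_count > 0 && upper_count < words.len();

    let mut sum = 0.0_f64;
    for w in &words {
        if let Some(mut v) = toy_valence(&w.to_lowercase()) {
            if mixed_case && is_all_caps(w) {
                v += if v > 0.0 { CAPS_BOOST } else { -CAPS_BOOST };
            }
            sum += v;
        }
    }

    // Each exclamation mark (up to a cap) pushes the score further
    // in the direction it already points.
    let bangs = text.matches('!').count().min(MAX_EXCLAMATIONS) as f64;
    if sum > 0.0 {
        sum += bangs * EXCLAMATION_BOOST;
    } else if sum < 0.0 {
        sum -= bangs * EXCLAMATION_BOOST;
    }

    normalize(sum)
}
```

With this sketch, toy_compound("The movie was GREAT!") comes out higher than
toy_compound("The movie was great"), which matches the direction of the change
seen in the demo. The positive, negative and neutral proportions are left out
to keep the example short.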
So let's now actually look at using that function on
the Large Movie Review Dataset and see
how that works.
So here I have my MySQL client connected
to my cloud based database system.
And let's take a look at the tables.
So in my demo database I have this table called IMDb
reviews, which has got two columns.
We can see that. There we go,
describe. So I've got the sentiment, and
the text, the review, if you like,
of the particular movie, stored in
the text column.
And we've got 25,000
rows here, as you can see.
And now what I can do is here I have a
query. Substring is just to limit
the amount of data that it shows. Otherwise it becomes a little bit
too much to see. But in and amongst
all of this, if we look down at
the bottom here, we can see there is our sentiment
table function, and I'm
going to use that in this query and then just limit it to
ten results, ten rows. Okay, so let's take a look at that.
So there we go. So that's now been applied to
that particular table. So again, we've got
the compound, the positive, the negative and the neutral results
there. So previously we saw that
we were just passing a string in. Now we're actually able to utilize
this directly with a table, as you can see, and it has
many more practical benefits doing it this way. The other thing
we can do is ask the database
system to show us the functions that it knows about.
And here you can see that it's reporting to us
that it knows about sentiment
table,
that it's a table valued function,
and that the runtime is wasm.
Just to summarize our presentation then. Wasm UDFs
give us great power and extensibility, the opportunity to
really extend the capabilities of the database engine in many new
directions that might be difficult to achieve otherwise,
and we can do so in a variety of different languages. Today we
saw an example using Rust, but support for other languages such as C, C++
and others is being developed as well. It also
gives us near-native speeds, so it's very fast. And because
it's a sandbox environment, it is also safe.
A couple of resources to highlight. The first one there is an article
that I published on Medium in October last
year, which essentially does a walkthrough of the example
that we've covered today. The only difference
is the pushwasm tool. The syntax of that has changed slightly
since my article last year, but the rest of the instructions
you should be able to follow without any problems.
And the second thing there is the Bytecode Alliance. It's worth checking out
what they're up to, including some of their blogs and articles.
It's very useful to keep an eye on them in terms of the development of Wasm
overall.
Thank you very much for attending this session and for watching
the video. Hope you found it useful. And if you'd like
to contact me, please just send an email to team at
and just mention my name, marked for my attention, and it will get
forwarded on to me and then I can respond to you directly.
Have a great conference, enjoy the rest of your day.
Thank you very much.