Conf42 Machine Learning 2023 - Online

Using WebAssembly for in-database Machine Learning

Abstract

Very few WebAssembly examples exist that show how to embed it within a database system. This session will walk through a code example that uses WebAssembly with a database system for a machine learning application.

Summary

  • Session on using WebAssembly for in-database machine learning. Akmal Chaudhri will walk you through some steps to show you how we can configure a WebAssembly development environment. At the end of the presentation, he'll show a live demo on the Large Movie Review Dataset.
  • SingleStoreDB is an example of a real-time distributed SQL database system. There is a wide range of tools that we could potentially use with that database system to utilize machine learning capabilities. That's something perhaps for another presentation.
  • Our focus, though, is going to be on WebAssembly and specifically Code Engine using WebAssembly. WebAssembly really helps us to solve a number of problems that we might have with database systems. Let's now walk through the steps of how to set up a local Wasm development environment.
  • We're going to use a MySQL client to connect to SingleStore, which is going to be running in the cloud. We're now ready to build the Wasm module. And then obviously we need to deploy that into SingleStoreDB. Let's take a look at how that works.
  • Thank you very much for attending this session and for watching the video. If you'd like to contact me, please just send an email to team at and just mention my name. Have a great conference, enjoy the rest of your day.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone and welcome to the conference. Thank you very much for attending this session today on using WebAssembly for in-database machine learning. My name is Akmal Chaudhri and I'm a technical evangelist at SingleStore. In terms of the agenda today, I'll walk you through some steps to show you how we can configure a WebAssembly development environment on our workstation or laptop, and then how we can build a function for sentiment analysis which we'll upload into a database system. At the end of the presentation, I'll show you a live demo on the Large Movie Review Dataset to show you how this actually works in reality. First let's do a very brief introduction to SingleStore, and then we'll talk through a couple of slides and look at the motivations and reasons as to why we might want to use WebAssembly with a database system.
SingleStoreDB is an example of a real-time distributed SQL database system. What we've seen over the last decade or so is the development of these scale-out distributed SQL database systems, taking advantage of off-the-shelf hardware and cluster-based, cloud-based systems. SingleStoreDB also offers a unified data engine, which means it supports both transactions and analytics. It is also multi-model, so it offers a wide range of choices and possibilities. Wasm is one of the things it provides support for, and we'll look at an example of how we can do that today.
When working with a database system, there is a wide range of tools that we could potentially use with that database system to utilize machine learning capabilities. One of these, for example, could be Apache Spark. There are lots of great Python libraries out there as well. We could also use many external tools and technologies; OpenAI is one of the very popular ones today. Specific to SingleStore, we have this concept of vector functions as well.
That's something perhaps for another presentation. Today our focus, though, is going to be on WebAssembly and specifically Code Engine using WebAssembly. The example we'll look at uses something called VADER, the Valence Aware Dictionary and sEntiment Reasoner. More about that very shortly.
Previously we may have built applications for specific platforms. Now we have this concept of write once, deploy anywhere: we build our Wasm application and we can run it on the web, on the desktop, mobile, edge, or on the server. WebAssembly really helps us to solve a number of problems that we might have with database systems. For example, writing procedural SQL is sometimes very difficult. Equally, there may be other requirements that we have, such as machine learning, scoring or fuzzy text matching, which the database engine simply doesn't support. How do we extend its capabilities to meet these requirements? Moving data around is also costly, and we would prefer not to send it to the application all the time. Keeping the code and the data close together, colocated in the database engine, really gives us great benefits.
Another benefit that we get with WebAssembly is the opportunity to use existing code that we may have written in other programming languages. We can transform this code into Wasm modules and have them stored and executed safely within the database engine, which extends the capabilities and functionality of the database engine in many different ways that we couldn't achieve before.
So in the example that we're going to work through today, we'll see some code written in Rust and then we'll build and load that code into SingleStoreDB. It exists in a sandbox environment within the database engine and, as far as the database engine is concerned, it is treated just as though it's a UDF. Let's now walk through the steps of how to set up a local Wasm development environment.
We'll need an SDK as well as some other software, and we'll see how to install this and prepare our environment. This is entirely vendor-neutral and can be used with other vendors' products as well. First we'll download the WASI SDK. Here's the link. In this particular case we are going to install it in the /opt directory. So we download the software, substituting /path/to wherever it appears with the value for your environment. We unpack and then we make sure that we add the bin directory to the path as shown here.
Here we can see the next two steps. The first one is to install the Rust toolchain. There we go. And the second step is wit-bindgen. Here we can see the next couple of steps. The first one is to add wasm32-wasi to the Rust toolchain, as it's not installed by default. Very straightforward, as you can see. The next step after that is the pushwasm tool, which we need in order to deploy our Wasm into SingleStoreDB. Again very straightforward: all we need to do is clone the GitHub repo, change to the pushwasm directory, build it, and then ensure that we add the path to where the software has been built to our $PATH, so that when we call it, it is picked up correctly in our environment. Obviously, please amend /path/to to match your environment. In case there are some errors, usually related to libssl, then it may be necessary to install that as shown.
Let's now switch to our Rust code. The first thing we'll do is make a directory in our home folder and switch to that directory. Then within that directory we'll create a skeletal Rust source tree using this command. Here we'll also create an interface definition file, which we're going to call sentiment table, with a .wit extension, and inside that file we're going to place the following code. What this code says is that we are going to have a function called sentiment table.
It's going to take an input string and it's going to produce an output consisting of these polarity scores, which are defined just above here. As you can see, we've got compound, positive sentiment, negative sentiment and neutral. The next step is to replace the existing contents of the Cargo file. Here you can see we've got a few things that reference back to what we are trying to do: the sentiment table specifically, and also the VADER sentiment crate which we are utilizing here, which has already been converted to Rust and actually does the hard work for us.
Now we need to provide the implementation of the function in Rust, and here is the code. You can see we are referencing the interface definition file here. We provide the name of the function here, and the fact that it takes a string as input and provides the polarity scores as output. We'll see this in the next slide. So here is the remaining code in our Rust file, and here we can see the polarity scores which we referenced earlier as well: compound, positive sentiment, negative sentiment and neutral.
We're now ready to build the Wasm module. All we need to do is go up one directory level and then issue the cargo build command as shown. Then obviously we need to deploy that into SingleStoreDB. But before we do that, we need to create a database. So here we're going to use a MySQL client to connect to SingleStore, which is going to be running in the cloud. I'll substitute the value for host here, for example, as well as my password. Once I've done that I can do a CREATE DATABASE, which I'll just call demo, and switch into that database as well with USE demo. Then I'll exit from the MySQL client back to the terminal window, and now I'll use the pushwasm tool to actually load that Wasm module into SingleStoreDB. It looks like a busy statement there, but let's walk through this. So firstly, prompt for the password.
And then here is the name of the function that I want to load, sentiment table; the interface definition file that we created earlier; the location of the actual Wasm that was built using the cargo build command; and then the connection details. Where is the database system actually running? Again, for this particular example, simply substitute the value of host here; it's in the cloud. And you can see that demo is the database name referenced here, which is what we created on the previous slide. Once this command is run, and if it's successful, which hopefully it should be, we should see this message output: Wasm function was created successfully.
So we'll use the MySQL client to connect to our demo database and then we can do a couple of quick tests. The first one here is just calling that function directly as such: SELECT * FROM sentiment table, and then here is the string that we are passing in. We're saying the movie was great, and based upon that it's returning this to us. So it's telling us that there's no negative sentiment; positive seems quite reasonable, neutral seems quite reasonable, and there is a compound score as well. Now, one of the things that VADER can do is consider things like capitalization. For example, just below here, you can see that now I've put the movie was great, but great is in caps with an exclamation mark at the end. And here you can see that the results returned show the positive going up, the compound going up, and the neutral going down. And obviously negative is still zero there.
Let's now look at using that function on the Large Movie Review Dataset and see how that works. Here I have my MySQL client connected to my cloud-based database system. Let's take a look at the tables. In my demo database I have this table called IMDb reviews, which has two columns. We can see that. There we go, describe.
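The effect of capitalization and the exclamation mark described in the quick tests can be illustrated with a toy scorer. This Rust sketch is not the VADER algorithm and not the API of the crate used in the talk: the two-word lexicon, the boost constants, and the names PolarityScores and toy_scores are all assumptions made for illustration. It only shows why emphasis can push the positive and compound scores up and the neutral score down while negative stays at zero.

```rust
/// The four scores named in the talk: compound, positive, negative, neutral.
#[derive(Debug, Clone, PartialEq)]
pub struct PolarityScores {
    pub compound: f64,
    pub positive: f64,
    pub negative: f64,
    pub neutral: f64,
}

// Illustrative constants chosen for this sketch (assumptions, not library values).
const CAPS_BOOST: f64 = 0.7;
const EXCLAMATION_BOOST: f64 = 0.3;
const MAX_EXCLAMATIONS: usize = 3;
const ALPHA: f64 = 15.0;

/// Hypothetical two-word lexicon; a real lexicon has thousands of entries.
fn valence(word: &str) -> f64 {
    match word {
        "great" => 3.0,
        "terrible" => -3.0,
        _ => 0.0,
    }
}

/// Toy scorer: sums word valences, boosts all-caps words and exclamation marks.
pub fn toy_scores(text: &str) -> PolarityScores {
    let (mut pos, mut neg, mut neu) = (0.0_f64, 0.0_f64, 0.0_f64);

    for token in text.split_whitespace() {
        // Strip punctuation so "GREAT!" is looked up as "GREAT".
        let word: String = token.chars().filter(|c| c.is_alphabetic()).collect();
        if word.is_empty() {
            continue;
        }
        let mut v = valence(&word.to_lowercase());
        if v == 0.0 {
            neu += 1.0;
            continue;
        }
        // An all-caps sentiment word is treated as more intense.
        if word.chars().count() > 1 && word.chars().all(|c| c.is_uppercase()) {
            v += v.signum() * CAPS_BOOST;
        }
        if v > 0.0 {
            pos += v;
        } else {
            neg += -v;
        }
    }

    // Exclamation marks amplify whichever polarity dominates.
    let emphasis =
        text.matches('!').count().min(MAX_EXCLAMATIONS) as f64 * EXCLAMATION_BOOST;
    if pos > neg {
        pos += emphasis;
    } else if neg > pos {
        neg += emphasis;
    }

    // Squash the signed sum into the open interval (-1, 1).
    let raw = pos - neg;
    let compound = raw / (raw * raw + ALPHA).sqrt();

    let total = pos + neg + neu;
    if total == 0.0 {
        return PolarityScores { compound: 0.0, positive: 0.0, negative: 0.0, neutral: 0.0 };
    }
    PolarityScores {
        compound,
        positive: pos / total,
        negative: neg / total,
        neutral: neu / total,
    }
}
```

Comparing toy_scores("The movie was great") with toy_scores("The movie was GREAT!") gives a higher compound and positive score and a lower neutral score for the second call, in the same direction as the results shown in the talk. The actual numbers will differ from the ones on the slide.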
So I've got the sentiment and the text, the review if you like, of the particular movie stored in the text column. And we've got 25,000 rows here, as you can see. Now, here I have a query. Substring is just there to limit the amount of data that it shows; otherwise it becomes a little bit too much to see. But in amongst all of this, if we look down at the bottom here, we can see there is our sentiment table function, and I'm going to use that in this query and then just limit it to ten rows. Okay, so let's take a look at that. There we go. So that's now been applied to that particular table. Again, we've got the compound, the positive, the negative and the neutral results there. Previously we saw that we were just passing a string in. Now we're able to use this directly with a table, as you can see, and it has many more practical benefits doing it this way. The other thing we can do is ask the database system to show us the functions that it knows about. Here you can see that it's reporting that it knows about sentiment table, that it's a table-valued function, and that the runtime is Wasm.
Just to summarize our presentation, then. Wasm UDFs give us great power and extensibility, the opportunity to really extend the capabilities of the database engine in many new directions that might be difficult to do otherwise, and we can do so in a variety of different languages. Today we saw an example using Rust, but support for other languages such as C, C++ and others is being developed as well. It gives us near-native speeds, so it's very fast. And because it's a sandbox environment, it is also safe.
A couple of resources to highlight. The first one there is an article that I published on Medium in October last year, which essentially does a walkthrough of the example that we've covered today. The only difference is the pushwasm tool.
The syntax of that has changed slightly since my article last year, but you should be able to follow the rest of the instructions without any problems. And the second thing there is the Bytecode Alliance. It's worth checking out what they're up to, and some of their blogs and articles are very useful for keeping an eye on the development of Wasm overall.
Thank you very much for attending this session and for watching the video. I hope you found it useful. If you'd like to contact me, please just send an email to team at and just mention my name and mark it for my attention, and it will get forwarded on to me and then I can respond to you directly. Have a great conference and enjoy the rest of your day. Thank you very much.
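The table demo in the talk applies a table-valued function to every row of a reviews table, shows only a short preview of each text, and limits the output to a few rows. Conceptually that can be sketched in Rust as follows. This is only an illustration of the idea, not how the database engine executes the query: the Review struct, its column types, the apply_to_table helper, and the stand-in text_stats function (which counts words and exclamation marks rather than scoring sentiment) are all hypothetical.

```rust
/// One row of a hypothetical reviews table (column types are assumptions).
pub struct Review {
    pub sentiment: i32,
    pub text: String,
}

/// Output record of the stand-in function below.
#[derive(Debug, PartialEq)]
pub struct TextStats {
    pub words: usize,
    pub exclamations: usize,
}

/// Stand-in for a table-valued function: one input string in,
/// zero or more records out. It is NOT a sentiment scorer.
pub fn text_stats(input: &str) -> Vec<TextStats> {
    vec![TextStats {
        words: input.split_whitespace().count(),
        exclamations: input.matches('!').count(),
    }]
}

/// Pairs each table row with every record the function returns for its text,
/// keeps only a short preview of the text, and stops after `limit` output rows.
pub fn apply_to_table<R>(
    rows: &[Review],
    tvf: impl Fn(&str) -> Vec<R>,
    preview_chars: usize,
    limit: usize,
) -> Vec<(i32, String, R)> {
    rows.iter()
        .flat_map(|row| {
            // Plays the role of the substring in the talk's query.
            let preview: String = row.text.chars().take(preview_chars).collect();
            tvf(&row.text)
                .into_iter()
                .map(move |record| (row.sentiment, preview.clone(), record))
        })
        .take(limit)
        .collect()
}
```

Because the iterator chain is lazy, the function is only called for as many rows as are needed to fill the limit, which mirrors the intent of limiting the demo query to ten rows.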
...

Akmal Chaudhri

Senior Technical Evangelist @ SingleStore
