Search through your data with Weaviate and out-of-the-box machine learning models
Abstract
This talk is an introduction to the vector search engine Weaviate. You will learn how storing data using vectors enables semantic search and automatic data classification. Topics like the underlying vector storage mechanism and how the pre-trained language vectorization model enables this are touched on. In addition, this presentation includes live demos to show the power of Weaviate and how you can get started with your own datasets. No prior technical knowledge is required; all concepts are illustrated with real use case examples and live demos.
Most data is unstructured. Additionally, data is often stored without context, meaning, or relation to concepts in the real world. This makes the data difficult to index, classify and search through. While this is traditionally solved by manual effort or expensive machine learning models, Weaviate takes another approach to this problem. Weaviate is a vector search engine, which stores data as vectors and automatically adds context and meaning to new data. This makes it possible to search through the data without using exact matching keywords. Moreover, data can be automatically classified.
Weaviate is completely open source, has a built-in machine learning model, has a graph-like data model, is completely API-based and is cloud-native. Weaviate uses a GraphQL API next to RESTful endpoints to interact with the data in an intuitive manner. Additionally, Python, Go, Java and JavaScript clients are available to facilitate interaction between Weaviate and your applications. GraphQL and client examples will be shown in the presentation.
Summary
- Weaviate is a cloud native, modular, real time vector search engine. It uses machine learning to understand the data that is in it. You can combine semantic search with traditional search. It has full CRUD support for both data and vectors.
- Weaviate uses semantics and context rather than exact matching keywords. With Weaviate you can combine these kinds of vector searches with traditional scalar search. We can also ask how certain Weaviate is about the results it shows.
- You can use Weaviate for a wide variety of use cases thanks to the flexibility of choosing your own machine learning model. In this presentation you learned that with the open source vector search engine Weaviate you can search through unstructured data. Thank you and see you.
Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. Welcome to my presentation on the vector search engine Weaviate.
I'm Laura. I am a community solution engineer at SeMI Technologies,
and in this presentation I will introduce you to our
open source vector search engine, Weaviate.
First, let's take a look at data, and particularly at
unstructured data. Unstructured data are forms of data that
are not organized in a predefined manner. Take for
example, big pieces of text. We learned
that 93% of your data stays unused and
is unstructured, and that 80% of businesses don't
know how to use their unstructured data in favor of their business.
Why is this so difficult? What's so difficult about unstructured
data? One thing that is difficult is searching through unstructured data,
for example to answer business questions. Let me give you
a simple example of searching through unstructured data.
If you want to find information from unstructured text,
you will need to use exact matching of keywords to find an answer.
If you look, for example, for a wine that fits with your seafood dinner,
while the wine in your database is only described as
good with fish, you will most likely not find this
wine. If you instead use a vector search
engine like Weaviate, you can find information in unstructured
data based on semantics. Compare this with Google Search.
If you ask Google a very abstract question, it might
find an answer. The question here, "What color of wine is Chardonnay?",
is very abstract. Still, Google search finds exactly
this answer from a particular data node.
So the question is, how does Google find exactly this answer from
exactly this data node? And how can we predict the relation
between this answer and the question that we asked?
And in addition, how can we do this so fast? So the
main question here is:
what if you could do the same with your own data in a simple
and secure way? So the answer that we came up with is
Weaviate. Weaviate is a database that uses machine learning to
understand the data that is in it. Weaviate is a cloud native,
modular, real time vector search engine that is built
to scale your machine learning models. So as I said,
Weaviate is a vector search engine. So first, let's dive into what
vector search actually is. Weaviate stores data as vectors,
which are placed in a space in relation to other data objects.
Machine learning models are used to compute a vector for each data object
and also for each semantic query. So Weaviate really
tries to understand your data and your queries in more detail.
This is how Weaviate works with a text vectorization module.
A pretrained model, for example a fastText or BERT transformer
model, can compute vectors from known concepts. This is, for example,
our daily human language. You can add your own data
to a Weaviate instance, and all this data will be
vectorized using these machine learning models and be placed as
vectors, together with the data objects themselves, into Weaviate.
The data object will then be indexed by the machine
learning models and be placed in the high dimensional vector space.
Then you can perform, for example, a search query, which
will also be vectorized using the machine learning models of Weaviate.
For example, let's find a wine that fits with
seafood. Weaviate
does a nearest neighbor search and finds the objects that lie nearest
to your search vector. So the answer that lies closest to the vector of
this question will be returned. With Weaviate, you can do the following
tasks with unstructured data. You can search through data,
you can discover answers to your specific questions.
You can classify and label your data
automatically with machine learning models and Weaviate
can predict relations in your database.
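The nearest neighbor lookup described above (vectorize the query, then return the object whose vector lies closest) can be sketched in a few lines of Python. This is a toy illustration only: the three-dimensional vectors are invented by hand rather than produced by a language model, cosine similarity is assumed as the closeness measure, and a brute-force scan stands in for whatever index a real vector search engine uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical data objects with hand-made vectors (not real embeddings).
WINES = {
    "Crisp white, good with fish": [0.9, 0.1, 0.2],
    "Bold red, pairs with steak": [0.1, 0.9, 0.3],
    "Sweet dessert wine": [0.2, 0.3, 0.9],
}


def nearest(query_vector, objects, limit=1):
    """Return the `limit` object descriptions closest to the query vector."""
    ranked = sorted(
        objects.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:limit]]


# Pretend this is the vector of "a wine that fits with seafood".
query = [0.8, 0.2, 0.1]
print(nearest(query, WINES))
```

The query never contains the word "fish", yet the white wine is returned because its vector points in nearly the same direction as the query vector.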
The vector database Weaviate has full CRUD
support for both data and vectors, and you can combine vector
search and scalar filters, which means you can combine semantic
search with traditional search. It has a GraphQL and RESTful
API, and Weaviate supports multiple
data types, like text but also images.
And this is all possible through Weaviate modules.
So modules can be attached to
the Weaviate core vector database to enable the features
I just described in the previous slide. For example,
you can choose to use an image vectorization module
to index images and search through these images, but you
can also attach a question answering module. You can also attach any
transformer NLP model or you can even attach
your own machine learning or NLP models. So this
allows you to use Weaviate really for scaling your own machine learning models to
a production scale as well. So now let's move on to a demo.
I will use a data set of news articles for this demo
and you can also find this demo data set running
on our website. You can get there
from any code example or query example.
So over here there's a really simple question
or query to first just see
what kind of articles we have in this data set. So I can perform a
get query to get all the articles. And here
I just want to see their titles, their URLs and their word count.
And here I get a list in random order of
all the articles. So now of course,
nothing special or nothing magic is happening here.
And just to show you that it is a vector database, you can also
query the whole vector of a data object.
So you get a long list of numbers.
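The two queries shown so far might look roughly like the GraphQL text this small Python helper builds. The class name `Article`, the property names `title`, `url` and `wordCount`, and the `_additional { vector }` field are assumptions about the demo schema; the talk does not show the query text itself. The helper only builds the query string and does not contact a server.

```python
def build_get_query(class_name, properties, include_vector=False):
    """Return a GraphQL Get query that lists the given properties."""
    fields = list(properties)
    if include_vector:
        # Assumption: the stored vector is exposed under _additional { vector }.
        fields.append("_additional { vector }")
    body = "\n      ".join(fields)
    return (
        "{\n"
        "  Get {\n"
        f"    {class_name} {{\n"
        f"      {body}\n"
        "    }\n"
        "  }\n"
        "}"
    )


# Plain scalar listing: titles, URLs and word counts.
print(build_get_query("Article", ["title", "url", "wordCount"]))

# Same class, but also asking for each object's vector.
print(build_get_query("Article", ["title"], include_vector=True))
```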
So now let's only show the title. And as
I said, this is just scalar search.
So I don't do any machine learning magic here.
But now let's add a semantic
filter. So for example, let's see if
the data set has any articles regarding housing prices.
I can perform a nearText query, and this filter
is added by a specific text vectorization module.
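As a rough sketch, a nearText query of this kind could be assembled as below. The `Article` class and `title` property are assumed names for the demo schema, and the helper is hypothetical; it just produces the GraphQL text one would send to the server.

```python
import json


def build_near_text_query(concepts, class_name="Article", properties=("title",)):
    """Return a one-line GraphQL query with a nearText semantic filter."""
    # json.dumps renders the list as ["housing prices"], which is also a
    # valid GraphQL list of strings.
    concept_list = json.dumps(list(concepts))
    fields = " ".join(properties)
    return (
        "{ Get { "
        f"{class_name}(nearText: {{concepts: {concept_list}}}) "
        f"{{ {fields} }} "
        "} }"
    )


print(build_near_text_query(["housing prices"]))
```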
So let's see if there are articles about housing
prices. And you can see this is a very abstract question.
Here, the list of articles is ordered by relevance
to the search query. So we have for example
something about housing becoming
the biggest asset class, something else about housing prices,
expensive housing, et cetera. And note that the
query, housing prices, is not
an exact match of any of the words here
in the title. So here you can see that Weaviate
uses semantics and context rather than exact matching
keywords. We can also ask
how certain Weaviate is about the results it shows.
This is called certainty. So here you see that the first result is
around 87% certain
that it is matching the search query. And then we
can also make a filter based on this. So now
I will show only results that are above
80% certain. Now you can see that this is
a very abstract query and we can also make this a bit more concrete.
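The certainty threshold used a moment ago can be sketched as an extra argument on the semantic filter, with the certainty of each hit requested alongside the title. The field layout (`certainty` inside `nearText`, and `_additional { certainty }` in the result) and the `Article` class name are assumptions for illustration, not text taken from the demo.

```python
import json


def build_certainty_query(concepts, min_certainty, class_name="Article"):
    """Return a nearText query that keeps only hits above min_certainty."""
    if not 0.0 <= min_certainty <= 1.0:
        raise ValueError("certainty must be between 0 and 1")
    concept_list = json.dumps(list(concepts))
    near_text = f"nearText: {{concepts: {concept_list}, certainty: {min_certainty}}}"
    return (
        f"{{ Get {{ {class_name}({near_text}) "
        "{ title _additional { certainty } } } }"
    )


# Only results that are at least 80% certain.
print(build_certainty_query(["housing prices"], 0.8))
```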
For example, to see the prices of
houses in Greece,
this is a bit more concrete. And you can see that there's only one result
returned now because we made the query more concrete, and
this is about Athens.
So yeah, we can see that Weaviate matches Greece here
with its capital, without the result saying anything about
Greece. So as I said, with Weaviate you
can not only store vectors; it also stores the
whole data object. So this means you can combine these kinds
of vector searches with traditional scalar search.
And yeah, for example, I will show you
this, I will add some properties
first. So we have for example each article
appearing in a publication.
So we can see that this article for example appeared in the
Financial Times. This is a graph relation in
the database. And now I can add a
scalar filter, combining it with this already
existing vector query. So this
is a where filter.
So now we can see I'm querying for housing prices again, and I want
the result to appear in this publication, the Economist.
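A combined query of this shape, a semantic filter plus a scalar filter on the linked publication, might be built as follows. The reference property `inPublication`, the `Publication` class with a `name` property, and the `where` filter layout with `path`, `operator` and `valueString` are all assumptions about the demo schema and query syntax; check them against the instance you are querying.

```python
import json


def build_filtered_query(concepts, publication, class_name="Article"):
    """Return a query combining nearText with a scalar where filter."""
    near_text = "nearText: {concepts: " + json.dumps(list(concepts)) + "}"
    # Assumed path: Article -> inPublication -> Publication.name
    where = (
        'where: {path: ["inPublication", "Publication", "name"], '
        "operator: Equal, "
        "valueString: " + json.dumps(publication) + "}"
    )
    fields = "title inPublication { ... on Publication { name } }"
    return (
        "{ Get { " + class_name + "(" + near_text + ", " + where + ") "
        "{ " + fields + " } } }"
    )


print(build_filtered_query(["housing prices"], "The Economist"))
```

The semantic part decides how results are ranked, while the scalar part acts as a hard filter: an article from another publication is excluded no matter how close its vector is.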
So let's see now. So for this first
result, it's 87% sure, this
is the title, and we can see that it appeared in the Economist. We
can also ask questions to Weaviate, or
to the data in Weaviate, if we have a question answering module
available or enabled. I can
also show this, so I will remove the previous filters so that
I can ask a question, also in a filter.
For example,
what was the monkey doing in the
Neuralink video?
And I will not query the certainty here, but I
want to see the answer.
And the answer here is: he
was playing MindPong. So if we limit the
result to one and we
also ask for the summary, this is the
summary of the article. So the
result is that the monkey was playing MindPong, and the
answer was found somewhere in this whole summary,
which is of course the bit of unstructured text that we have here.
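The question answering step can be sketched the same way. The snippet below builds the JSON body one would typically POST to a GraphQL endpoint; the `ask` filter, the `_additional { answer { result } }` field, and the `Article`, `title` and `summary` names are assumptions based on what the demo shows on screen, and the query only works if a question answering module is enabled on the instance.

```python
import json


def build_ask_payload(question, class_name="Article", answer_property="summary"):
    """Return a JSON request body holding a question answering query."""
    ask = (
        "ask: {question: " + json.dumps(question)
        + ", properties: " + json.dumps([answer_property]) + "}"
    )
    query = (
        "{ Get { " + class_name + "(" + ask + ", limit: 1) "
        "{ title " + answer_property + " _additional { answer { result } } } } }"
    )
    # GraphQL over HTTP conventionally wraps the query text in a JSON object.
    return json.dumps({"query": query})


payload = build_ask_payload("What was the monkey doing in the Neuralink video?")
print(payload)
```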
Okay, so now let's go back to the presentation.
So you can use Weaviate for a wide variety of
use cases thanks to the flexibility of choosing your own machine learning model
or also keeping it very general. So this was my
presentation. Thank you for listening and watching.
In this presentation you learned that with the open source vector
search engine Weaviate you can search
through unstructured data, and in addition you can use it to
bring your own machine learning models to production scale. Thank you
and see you.