Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everybody. Welcome to my talk. I'm Daniel.
I used to be a tech lead at YouTube for seven years, and now I
work in the vector ops space with Superlinked.
Let's just dive right in. So,
in today's talk, we'll cover the three most important questions.
Why are vectors useful? Why use them to
make your app better? We'll talk about which
features of your app benefit from being vector powered,
and then finally, we'll talk about actually
building a vector powered feature of your app in a
way that makes it easy to put it into production
in a reliable way. And finally,
you unlock the benefit for your end users, which is why
we are doing this. We are using new technology to make your app
better. So let's go into the why first,
though. Because I think vectors can be not so intuitive,
and it helps to have a mental model for why
they are different from perhaps other representations of data that you
have used before.
So we have to kind of start from really the basement here,
and first look at when humans adopted language: we kind of lost something in that process.
And I'll try to illustrate that with a few pictures here.
So, if you look at this slide, you see, okay, on the left side,
we have kind of a famous landscape picture for the Windows
fans, and then on the right side, we have a
representation of that picture in the RGB
values of the individual pixels of that
photo, right? And we have, let's say, about 1 million pixels
in there, and each is defined with these three values. So we have a lot
of data. We have a book's worth of
data in there that represents what each of these pixels
looks like. Right? Now,
a human would translate this picture into
words, right, in order to communicate with others or kind of
remember it. And so this is very efficient
on the surface, right. We would say, let's say, field of grass to describe
what this picture contains. And this is just
three words, right? So we went from a million values into something
very short. This looks pretty efficient,
right. However, the problem
is that when
you say field of grass, you may mean lots of
different things. We have lost resolution: many
different pictures end up being called the same thing.
All of these different photos kind of collapse to those three words,
right? So let's say I did this experiment with
Midjourney, and I took field of grass and put
it in as an input, right? And I got this field of
grass, right. Looks very different from where we started.
So then you might want to sort of start refining that, right?
To kind of regain the resolution that we lost by going
from a million values to three words. So we start to specify more words,
right? So we say field of grass on a summer day that kind
of looks a bit closer to what we had before, but still worlds apart.
And so we can kind of keep adding more and more words to try to
regain that resolution. And after a
while we get to a field of
grass on a summer day, a few clouds in the blue sky,
and it's slightly hilly, right? And it sort of like gets somewhere,
but it's still a very different picture.
So on the left we have the original, and on the right we
have the kind of approximate reconstruction from those
16 words or whatever that was. And in this
demonstration we see how the natural language is a bottleneck of
communication, right? It's a very imperfect
way to describe stuff, and it misses the nuance,
right? It's very hard to describe small changes
in the input in a way that is preserved in the output.
So this is a problem of language, right?
It's not only hard to reconstruct what we described,
but also the descriptions are ambiguous,
right? All of these three images are
constructed from those same 16 words,
field of grass, hilly, summer day and so on.
So basically what I'm saying is words are
not very useful for representing information
for computers. And it's not just
the words, it's any sort of structured data that you
are using to represent stuff out there in the real world,
right? Once you go from analog audio recordings
and pictures, and you try
to represent them with some structured information, you'll just basically
lose a bunch of the resolution there.
So that makes those kind of structured representations not very useful
for actually exploring the different concepts
and for example, training machine learning models.
And here is where we introduce vectors, basically because
vectors give us the resolution that we need, right? So just
imagine we have a set of two dimensional points
on the xy axis, and we have some sort of
function that given a picture,
outputs xy coordinate, right?
And we feed the pictures that we have been
looking at into that function. And that function can
represent similarities and differences in that space
of different pictures of a grassy field.
And this kind of resolution and
nuance is what words are, or any
sort of structured data is basically lacking.
And this is why a vector representation is kind of closer
to the truth in a way. And so,
just to summarize, the reason vectors are useful
is they are expressive, right? They allow you to capture
differences between things that you cannot otherwise express.
They're smooth. So things that are close
by in the vector space are similar items, and they
gradually get less and less
similar as you go further in that space. And so it's smooth,
which means you can explore it, right? You can kind of navigate, you can get
closer to certain parts of the space,
and this makes it very useful for any sort of machine learning that
is basically doing the exploration, right.
The downside is that vectors are very difficult to work with because
they are just basically a list of
numbers, right? And when you print out a vector,
it doesn't really tell you anything, right?
So that
sort of answers the question of why vectors, right? They're useful representations
of data because of those properties.
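To make that concrete, here is a minimal sketch, with made-up 2D coordinates standing in for real embeddings, of how closeness in the vector space tracks similarity:

```python
import numpy as np

def cosine_similarity(a, b):
    # angle-based similarity: 1.0 = same direction, 0.0 = orthogonal
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy 2D "embeddings" for the grass-field pictures from the talk
summer_field = np.array([0.9, 0.1])
summer_field_hilly = np.array([0.8, 0.3])  # a small change in the picture...
autumn_field = np.array([0.1, 0.9])        # ...versus a very different picture

print(cosine_similarity(summer_field, summer_field_hilly))  # high
print(cosine_similarity(summer_field, autumn_field))        # low
```

The point is the smoothness: nearby pictures land on nearby vectors, and small changes to the input move the vector only a little.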
Now, in terms of what you can build with vectors:
obviously the world of search and recommendations existed
before this kind of latest discovery
of the transformer models that help us generate these vectors,
and before this whole new hype with vector databases. Obviously the search
and recommendation space has existed for decades.
So let's just have a look at how that space is doing sort
of pre this type of technology,
and let's look at search and recommendations specifically. So before
using vector representations to build search interfaces,
it was all about keywords, right? So you
had some sort of natural language processing pipeline that tried
to normalize the keywords in your documents and
in your search queries, try to perhaps expand them to
synonyms and do all kinds of precomputation
and expansion. So you had a bunch of natural
language processing functionality in there.
Then you had some sort of index that, given a
set of keywords or a keyword, pointed you to documents
that contain those, right? So that
was the core aspect that powered your search.
So that's why it's really important to get those keywords right and
normalized and processed. And then finally,
these types of systems are very much fine tuned
by hand, right? Observing queries that you haven't seen
before, trying to figure out how to tweak your keyword based
rules to get good results. And this has been
happening for decades. And the outcome
is that it still kind of doesn't really work that well.
And here I have Walgreens; I could have used any example.
So this is an online pharmacy, basically, and I type in splitting headache,
and I'm basically getting one result, which is essential oil,
and no pain medication, nothing for
migraines, right? Because I used keywords that
are not exactly matching the results. And so
a keyword based system gets it
wrong. Right? Now,
how is this looking on the recommendation side of the problem? Because search
and recommendations are very much linked, right? This is kind of two sides
of the same coin. So I'm sure you have been
to LinkedIn and you have seen your jobs recommendations,
and this is a screenshot of mine and they're particularly funny.
So I not so rarely
get recommendations for intern jobs after being in the industry
for 15 plus years.
And then also some sort of stealth, like this stealth
company. It's kind of a meme. If you search for
stealth on LinkedIn, you'll see that there are tens of thousands of people
working at stealth and this entity shouldn't
really be recommended. So basically we still
get recommendations wrong, same as
the search. The problems are somewhat
related, but actually recommendations have their own pile of
problems. Let's just look at a few of them. So how would you build
a recommender system? Before vectors, you would
try to combine two aspects of your data, right? You would have
content features. So you try to have metadata for your content that
help you understand what that content is about. You would
also have that for your users, right, stuff they tell you during sign
up or any
metadata that they create about themselves. So that's
the content features. Then you would have collaborative features, which is
features based on users behavior. So kind of similar,
users liking similar things, type of signal,
right? And you would sort of build those collaborative features, you would build the content
features, and you would try to marry them together in
a model that responds well if you
don't have enough behavioral data for the user, and responds well if you don't
have enough content features, and kind of gets the best of
both worlds. And usually this work is quite custom,
right? So companies do this in house with a lot of
effort and time. After
being able to marry those two aspects, a recommender engine typically
has kind of two steps, right? So first you
use some sort of feature
store, some sort of database to retrieve candidates that roughly
match what you think would be a good recommendation,
right? But because it's a rough kind of search,
kind of broad-strokes type constraints,
you have to retrieve a lot of candidates, right, because you know that the candidates
will be low quality. So for the content you want to recommend, you kind of
take a broad brush and somehow fetch 10,000 candidates
for that recommendation that you hope that
some of them actually will make sense. So that's the retrieval typically
then, and then you have the ranking or the scoring,
right? So you have a model that given a candidate piece
of content and the user predicts the
probability that this user will actually like the content or will
engage with it, or let's say if it's a job that they'll apply,
you have this model that you run over all the candidates,
right, 10,000 times. And then you sort the candidates by
this score and then finally find three
jobs, let's say, out of those 10,000 that have the highest score,
and then those you return to the user, right. This
whole system is a project that takes months to build,
is typically done in a custom way, and even
for a very large company, it can still miss
terribly. Right. So basically,
search and recommendations are still an open problem. And this
is the before-vectors world. Obviously, I'm not saying that with
vectors it's a solved problem, but we have some new approaches
to tackle it, and we can see what we can do
there. Right. So let's look at the kind
of new world. How do you build search and recommendations with
vectors?
If, let's say you want to treat your vectors as just features
that you extracted, and then build the
old school stack with retrieval and ranking on top,
that's fine, that works, right. What I want to look at here is
new ways that you can use the fact that we can actually
index these vectors and find similar vectors really
fast. Right. So basically, we'll kind
of focus on that aspect of the problem.
And if you look at the stack of this kind of
new generation search and recommender system,
we'll look at three aspects of it. So how to basically
strengthen the retrieval part, given that we can do the nearest
neighbor search. We'll look at representations of
content and also users with vectors
that allow us to do that. And then we'll talk about maybe
having a component on top of this that will be much simpler
and faster than the normal kind of ranking that
sits on top of the retrieval. We'll talk about probably still
a need of having that component, but it's much simpler.
The benefits of this is that the
whole stack is much simplified, because you basically just have the retrieval.
The results are better: you don't rely on this
kind of candidate retrieval up front, which probably misses a
lot of the stuff you actually wanted to recommend, with the ranking
then running only on those candidates. That is, in terms of recall,
not ideal, right, because what if you missed something out there,
even though you retrieved the 10,000 items? So in that sense,
the approximate nearest neighbor search is closer
to a global optimum, right, because you are basically indexing the whole
database. You are not just running your
ranking on some 10,000 candidates.
So in that sense it's better and then finally faster,
right, because you are not doing ranking, you are not running some big model on
top of the candidates, you are not fetching 10,000 candidates from a database.
So you push everything into the approximate nearest neighbor index.
And this is then much faster to
actually generate the recommendations, which if there is a user waiting
for the feed to load or an ad to serve,
or maybe you have a bot detection use case where if
there is a misbehaving user, you want to catch them as fast as possible
before they do damage. The speed really matters.
Right? So this is kind of the basic setup.
And let's dive into each of these three aspects
of a vector powered surgeon recommendation system.
All right, so first of all, I mentioned
approximate nearest neighbor search. So what the hell is that?
Right? We have seen on some of the previous
slides this sort of two dimensional space with the pictures
and similar pictures were closer to each other. And then kind
of an autumn looking grass field was a bit further
apart. Right? So the purpose
of the nearest neighbor index is to
help us find similar pictures or similar vectors
in the space really fast, right?
And we have to talk about what nearest means and what fast means,
right? Those are the important aspects here.
So in terms of quantifying
the difference between vectors, normally what we use
is actually we just look at the angle between the vectors,
right, because that's kind of the distinguishing feature.
This obviously depends on the model that you use to produce,
given a piece of content, the vector representation, right? This model
has certain properties that, for example, say that
the distance should be scale independent, right? So if there are two
vectors pointing the same way, they just have different lengths, they should
be basically considered the same. And then this
translates into us using the angle between vectors as
the distance measure, right? So let's say a cosine similarity.
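As a small illustration of that scale independence, here is a minimal sketch: cosine similarity between two vectors pointing the same way is 1 regardless of their lengths.

```python
import numpy as np

def cosine_similarity(a, b):
    # the angle between the vectors, expressed as a similarity in [-1, 1]
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

v = np.array([1.0, 2.0, 3.0])
w = 10 * v                      # same direction, ten times the length
u = np.array([3.0, -1.0, 0.0])  # a different direction

print(round(cosine_similarity(v, w), 6))  # 1.0: length is ignored
print(round(cosine_similarity(v, u), 6))  # much lower
```

So two vectors that point the same way are treated as the same item, exactly the property these embedding models are trained for.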
So that's what we mean by nearest, right? And then
fast: basically, there is a bunch of vector databases out there,
and there are benchmarks you can review.
But the rule of thumb is that you can do thousands
of queries per second per machine for tens of thousands
of vectors in the index, right? So this is now maturing technology
that can store a bunch
of vectors on each node, and then you can kind of shard this and you
can get to millions and even billions of vectors, right?
So that's definitely possible. And there is a bunch of progress happening in
this kind of approximate nearest neighbor index layer.
And what we, the production builders who want to build on top of
the vector databases, are interested in is
the question of: okay, how do I take my problem, search and recommendations,
and how do I express it as a vector search,
right? Because then there is a bunch of databases out
there that can help me with that. So let's
talk about that. But before we get there, I kind of want
to reflect on something that's happening in the space, which is
these benchmarks, right? So typically
when you identify a metric that is easy to
measure, like a query per second benchmark,
everything will kind of coalesce around the metric, right? So right
now there is this explosion of different vector databases,
and somehow there is a lot of emphasis on the speed.
When you choose a vector database to work with, I wouldn't focus
so much on the speed, right. You just need good
enough. Basically, just to give you an
idea, this chart shows
the recall on the x axis. What do we mean by
recall? This is approximate
nearest neighbor search, right? So if you look at a data set of 10,000 vectors,
then you look at the actual 100 nearest neighbors for
each vector. The recall tells you,
for this index, what percentage of the actual nearest
neighbors it was able to retrieve
within the first hundred results, right. Because it's doing approximation, it's going to miss some,
right. What you care about in search and recommendation use cases
is recall around 80 to 90%; it depends really on
your use case. So you kind of look at this area here,
and then the y axis is queries per second. Here
we have 1,000 queries per second, and here we
have 10,000 queries per second, right? So in this region,
and these are 800-dimensional
vectors, right? Which is kind of a
good rule of thumb: you want your vectors to be up to, let's say, 2,000
dimensions for a good, healthy system.
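For reference, the recall being plotted can be computed like this; the result lists below are made up for illustration:

```python
def recall_at_k(exact_ids, approx_ids, k=100):
    # fraction of the true k nearest neighbors that the ANN index returned
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

# hypothetical result lists for one query
exact = list(range(100))                          # ground-truth 100 nearest neighbors
approx = list(range(90)) + list(range(200, 210))  # the ANN index missed 10 of them

print(recall_at_k(exact, approx))  # 0.9
```

In practice you average this over many queries, which is exactly what the benchmark charts report.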
All right, perfect. Okay, we talked
about how a search and recommendations use
case can make progress by representing
your content and users as vectors and then doing
this vector search that basically for a given user finds
the nearest content. Now the devil is
in the details, obviously, and it's in how you construct
these vectors, right? So we are doing vector search,
nearest neighbor search, but how are we constructing the vectors that we are
doing it with, right?
And so here is how, basically.
What you want to do is you'll want to capture the aspects
of the content, and also of the user, that
are relevant for the trade-off of what the user
then sees in your app, for example, right?
So let's talk about a use case with social
network. In this use case,
there is a feed of content and you care about certain properties.
As a user, you have certain expectations of what you'll see
in your feed, right? You expect it to be sort of roughly chronological.
Let's just imagine LinkedIn, right? I already gave the example
with the jobs recommendation. So let's talk about the LinkedIn
feed, right? You kind of expect it to be roughly
ordered by recency, or kind of by the age of the content,
right. You expect that it's going to contain
content that is topically relevant for you, right?
So it's some sort of combination of the content that you liked
or engaged with before. You also expect that
the platform will recommend you content that
is interesting for users that behave similar to you. This is the
collaborative aspect of it, right? And there is maybe
some measure of quality,
maybe certain users are more sensitive or less
sensitive to the quality of the content. And so you'll want to capture
this as well. Now, this is nothing new.
Like in the past, you would have these features extracted, you would assemble
them, you would do some filtering on them to generate the candidates,
then ship it to the ranker. There you go. Right?
All built custom, in-house recommender engine stuff,
right? This is normal. The new thing with vectors is that
you can actually take all these features and
literally just concatenate them into the content vector,
right? So you kind of normalize each of these vector parts and
then you literally just concatenate them together.
And this allows you to do cool stuff later.
Right. It allows you to do a search that actually balances
between these different objectives and it completely
offloads that navigation of the trade off to
the nearest neighbor search, right. Because in the end of the day, you just
have one vector and you just do nearest neighbor search on it. But it's assembled
from these different parts that are important
for your specific use case. So that's the content
vector construction. And then similarly,
in the same vector space,
you do that for the user as well. Right. So the user
will have some topical preference. Right. So some representation
of the content they liked before in terms of topics,
they'll have some popularity preference, right? Is this user
mostly interested in sort of the highest popularity
content or are they kind of venturing into not
just the most sort of hyped content, but also other parts,
how well they tolerate quality,
let's say, degradation, right? And this might come from moderation
signals and so on. And then also the recency preference,
right? Are they after seeing only the
most recent kind of news-type stuff, or are they happy to
venture broader into the catalog, being
kind of more driven by the topic and lower popularity measures
than by recency, right? So basically you
can again represent all these different preferences
of the user into one vector that
also has those parts like we did for the content,
right? And that means that they are aligned. And now you can
basically just take the user vector and search with it into the
content space, right. And then that's
your retrieval. Basically, what you can also do
with this (I have this note here in the bottom right corner) is user-to-user
search, right. You can discover users that behave similarly.
This is useful for lookalike audiences,
for bot detection, for recommending
people to each other to engage. Right? I mean,
dating apps, obvious use case. So you have a bunch of
opportunities on that front. But the core idea is that I'm
extracting different features of both the user and the content, and
I'm stuffing them all together into one vector, I'm putting different
weights on them, I'm normalizing them so they can be combined in
this way. And that gives me the basis for my
content and user vector representations.
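A minimal sketch of that construction, with made-up part dimensions and weights, might look like this:

```python
import numpy as np

def normalize(v):
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def build_vector(parts, weights):
    # normalize each part, apply its weight, then concatenate into one vector
    return np.concatenate([w * normalize(p) for p, w in zip(parts, weights)])

# hypothetical parts; the dimensions are made up for illustration
topic = np.random.rand(8)    # semantic embedding of the content
recency = np.array([0.9])    # freshness encoding
popularity = np.array([0.4])
quality = np.array([0.7])

weights = [1.0, 0.5, 0.3, 0.3]
content_vec = build_vector([topic, recency, popularity, quality], weights)
print(content_vec.shape)  # (11,)
```

A user vector built the same way, with the same part layout and weights, lives in the same space, so you can search with it directly against the content vectors.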
Here is just a representation of how then we use these vectors.
So if then I have a user coming in and I
want to generate a feed of content for them, I literally
just take the user vector and I search in
the content space and then this is the content that will come up.
And there is a key thing to understand that's kind of
different from just these very basic overviews of vector
powered retrieval that you see online.
Typically people just use one model,
right? They would just have the relevance part, for example,
right? So you use some semantic embedding
model and you just do the
semantic embedding and then if you visualize it,
you'll have kind of topically
similar content clustered together, right. This is
what you see out there.
But what people are kind of realizing now is that you
can blend all those other features in there, right? So this
is not just kind of topically relevant content
for user two, but it's also very
recent content. And then the user two likes that. And so that's
why they are closer together, right? So there are all these other things
that can be expressed in the space.
Obviously, this here is kind of a projection of that space into 2D,
so it's hard to visualize,
but that's kind of what it's doing in its original 1,000-dimensional
space: kind of navigating all
those different trade offs. And then you can let
the approximate nearest neighbor index actually do the heavy lifting,
right. All right, cool. So that's the retrieval
step, right. That covers the "okay, for a given user,
here is a bunch of content that really matches
their preference" part. Now there are still
some aspects of this recommendation and search problem
which can't quite be represented
or as easily represented with the vector
embedding. And for this, you might want a
module on top of your retrieval that is managing the queries,
right, and potentially manipulating the search vector.
So user is coming in, I grab their user vector,
I tweak it and then use it to search into the content
space. This tweak could, for example,
turn a recommender engine into a search engine because
I will basically just get a search query for the user.
I will create a semantic embedding of the query
and I will use it to overwrite or augment
the topical preference of the user. Right. So I'm
doing personalized search out of the box because all the other
preference aspect of the user are still in the user vector.
And I'm now also putting in the current context of,
all right, this user is searching for splitting
headache or whatever example we saw before.
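As a sketch of that idea (the segment layout and the blend parameter are assumptions for illustration, not a fixed recipe):

```python
import numpy as np

TOPIC_DIMS = 8  # hypothetical: the first 8 dims hold the topical preference

def personalize_query(user_vec, query_embedding, blend=1.0):
    # replace (or blend into) the user's topical segment with the embedding
    # of what they are searching for right now
    out = user_vec.copy()
    out[:TOPIC_DIMS] = (1 - blend) * out[:TOPIC_DIMS] + blend * query_embedding
    return out

user_vec = np.random.rand(11)           # topic + recency/popularity/quality parts
query_emb = np.random.rand(TOPIC_DIMS)  # embedding of "splitting headache"

search_vec = personalize_query(user_vec, query_emb)
# the non-topical preferences are untouched, so the search stays personalized
assert np.allclose(search_vec[TOPIC_DIMS:], user_vec[TOPIC_DIMS:])
```

With blend below 1.0 you get a mix of the query and the user's historical topical taste instead of a full overwrite.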
So that's kind of search vector manipulation. That's how we can build the
search on top of the same idea. And then
other aspect of this is, let's say you want to improve
the diversity of authors for the content that you recommend.
This is sort of difficult to express
as a set of linear constraints. And so you might want
to have this query manager on top issue multiple queries, for example,
into different clusters of the
author users for the content, and then assemble the result together.
Right. So this would be sort of kind of like
creating multiple searches from that one initial search to satisfy
some additional constraint around recommendation
variation, diversity, that sort of thing.
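One way to sketch that multi-query approach, assuming we already have author clusters and using a brute-force search helper in place of a real vector database:

```python
import numpy as np

def search(index, query_vec, k):
    # brute-force nearest-neighbor search; a vector DB would do this for you
    sims = index @ query_vec / (np.linalg.norm(index, axis=1) * np.linalg.norm(query_vec))
    return np.argsort(-sims)[:k]

def diversified_search(index, query_vec, author_clusters, k_per_cluster=3):
    # one query per author cluster, restricted to that cluster's items,
    # then assemble the per-cluster results together
    results = []
    for cluster_ids in author_clusters:
        sub = search(index[cluster_ids], query_vec, k_per_cluster)
        results.extend(np.array(cluster_ids)[sub])
    return results

index = np.random.rand(20, 4)  # 20 content vectors (made up)
clusters = [list(range(0, 10)), list(range(10, 20))]  # 2 hypothetical author clusters
out = diversified_search(index, np.random.rand(4), clusters)
print(len(out))  # 6: guaranteed representation from each cluster
```

The interleaving or re-merging policy is exactly the kind of logic that belongs in the query manager rather than in the index.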
And then finally it's an approximate system,
it's approximate nearest neighbor retrieval. So you can't
right away give a guarantee that there won't be something weird
in the result set. To actually have guarantees, you need to
post filter the results. Right. You should do this minimally.
Right. This part shouldn't do the heavy lifting,
but you might still want to combine some of the results,
or filter out a small percentage of them that slipped through
the nearest neighbor approximation. And for that,
again, the query manager on top of your vector based retrieval
is useful. Okay,
so now we talked about why vectors. We talked
about the types of things you can build.
We focused on search and recommendations.
We will actually touch on the generative
AI in a minute, but let's talk about
how you will actually get this done. Right. This is the interesting part.
So we are motivated. We want to build a vector powered system.
I split it into three parts, this section. So first,
just what do you need to get started? Right, this is
some basic demo. You are just playing with vector embeddings.
You want to experience how they work. Right? For this,
you'll need these four items, basically, very simple. You'll need your data,
right? Ideally unstructured text or images.
Basically, you need to load this data from wherever you
have it right now, maybe on your computer or in a database.
Ideally, the simplest is a
Python notebook: Colaboratory or
one of the online providers of Python notebooks will totally
work. Once you load the data, you need to
turn it into vectors. Obviously, for this you'll need a
vector embedding model, right? There is a bunch of open source
models out there. You can, for example, go to Hugging Face and find
something that's popular, and from that you kind of
know that maybe it's an interesting place to
start. There are
also APIs, right? So for example, the famous OpenAI
API that can for a given piece of content,
generate the vector embedding. Of course there you'll have to pay.
But you can embed thousands and thousands of
pieces of content for cents or tens of
cents of dollars. The cost is
minimal until you start to
work with millions or tens of millions of pieces of data.
And then finally, okay, so you have your content, you have the
vectors that represent the content and you want
to find similar vectors to just sort of see. Okay, for this
image, which ones are similar? You don't need any extra infrastructure
for this. There's a bunch of libraries in Python, like
scikit-learn, that have built-in cosine similarity.
So you literally just have vectors as numpy arrays and you
do a similarity calculation with
kind of all your content. Right. So there is no indexing
going on. This is brute force, and it totally works for even
thousands of vectors.
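Put together, the whole getting-started setup fits in a few lines; the random vectors below are placeholders for real embeddings from whatever model or API you picked:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# pretend these came from an embedding model (a Hugging Face model,
# the OpenAI API, etc.); here they are random placeholders
vectors = np.random.rand(1000, 384)  # 1,000 items, 384-dim embeddings

query = vectors[0]  # "find items similar to item 0"
sims = cosine_similarity(query.reshape(1, -1), vectors)[0]

top10 = np.argsort(-sims)[:10]  # brute force, no index needed
print(top10[0])  # 0: the item is most similar to itself
```

That's the whole demo loop: load data, embed it, compare everything against everything, eyeball the neighbors.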
And this is a great way to get started and get a feel for how
this works. You'll have access to the slides, and I
have these examples linked. So there is a Colaboratory
Jupyter notebook that basically contains
an example like that. Okay, so this is the getting started, the first
steps right now, the second part. What will you need to add to
this to build an MVP? And I would kind of
advocate for two parts. Sometimes people just kind of
go with the vector database and call it a day. So a vector
database is a place where, once you create your vectors,
you store them there, you index them there, and then in
your product you can do a query into the vector database instead
of the cosine similarity directly in the notebook.
Right? So that's kind of the basic setup for the MVP.
But what I would want to also add to that basic package
is some approach to evaluation,
right. We are working with vectors. We are experimenting
with vectors because we want to improve the end user experience, right.
This is the whole point. And so you need some way to keep an
eye on the quality, and also on what the users
think about this, right. So obviously the first step
will be just eyeballing the results. Sanity checking: do they
make sense? Let's look at 20 random inputs. Let's look
at the top ten results.
This sanity checking is priceless,
right? You definitely need to start there. You'll find a bunch of issues.
You'll need to tweak your models, choose different embedding models and so on.
Then the second step is kind of quantitative evaluation,
right? If you have some kind of data
labels or some annotations from before, let's say for
searches, which results people are actually clicking on, or for
recommendations, a similar kind of data set, you can back
test the vector powered implementation
of your search or of your recommendations and then kind of see: okay,
are the things that people tend to click on appearing
high up often enough in the vector
powered results? Right. Is there a chance that this will actually work?
And then once you get into the product, once you have the MVP out there,
it's useful to collect user feedback, right. Are people
interested in the results? Do they think they make sense?
And then finally, and probably most importantly, the analytics,
are people actually interacting more with the content?
Are you achieving your goal? Are you getting more people to
apply for a job or whatever the recommendation, use case
or search use case that you have, right? The analytics is kind of the be-all,
end-all. That's where A/B testing comes in and all of that other
stuff, but maybe not for the MVP. Some basic analytics
setup, however, is something that I would definitely recommend, because
that's how you learn whether your MVP should
transition into the next stage. And the next stage is where this all
gets very complicated, right? Because you
are not just vectorizing pieces of content and letting
those vectors be, you are now assembling all those different
constraints. So previously we might have just worked with
one embedding model, right? The semantic embeddings:
vectorizing the content, vectorizing the queries, matching the two. Simple,
right? But if you want to do
the stuff that I described before, where you assemble
content signals and collaborative signals and all this other
stuff into one vector and then have a state of the art system,
you need a few more components, right? So I'll just quickly
run through this.
You'll need a sort of serving system that has APIs,
and on one side you'll push the data
into the API, right. You'll push your content and user metadata,
and you'll push in your events. So this is what the users are actually
doing in the app, right. So we are actually using the user history as
well, not just the semantic embedding of the content and the query.
And we are also using the user metadata. Right. So that's why the
user data is also on the input. So this should all be coming into
the kind of vector powered system that you have
built. And then the
next step from there is the transformation. So all of this data
will have properties, will have different parts of it that
ultimately should make it into the vectorization.
Right. Different aspects. Let's say you have a job, you want to
classify seniority required for that position.
So the job title is probably pretty important for that. And the transform is
just about pulling that seniority key from
the content object and making sure that it comes into
the classifier vectorizer that resolves: okay, what is the seniority required for this job? Right. So that's the transform step.
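As a rough sketch of that transform step: pull the relevant field out of the raw content object and hand it to whatever resolves the derived property. The field names and the keyword-based seniority rule below are purely illustrative stand-ins for a real classifier.

```python
def transform_job(job: dict) -> dict:
    """Extract the pieces of a raw job object that feed each vectorizer.

    `job` is a hypothetical content object; the keyword matching below is
    a toy placeholder for a trained seniority classifier.
    """
    title = job.get("title", "").lower()
    if any(k in title for k in ("principal", "staff", "senior", "sr.")):
        seniority = "senior"
    elif any(k in title for k in ("junior", "jr.", "intern")):
        seniority = "junior"
    else:
        seniority = "mid"
    return {
        "text_for_embedding": f"{job.get('title', '')}. {job.get('description', '')}",
        "seniority": seniority,
    }

print(transform_job({"title": "Senior Data Engineer", "description": "Build pipelines."}))
```

The point is the shape of the step, raw object in, vectorizer-ready properties out, not the particular rule used here.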
Then finally you have the vectorization, which is where you'll be loading the embedding models from Hugging Face. But also you'll
probably have some of your own models, because with your data there is always that aspect that's kind of special to your product, and your embeddings model needs to be able to
support that, right? Sometimes this is just
loading a pretrained large language model, but sometimes it's
just vectorizing recency like we have
shown in one of the previous slides, right, because your users care about fresh
content, so you want to add that as one of the features. And now you
have a problem. How do I do that? How do I express recency in a way that I will not have to recompute the age of the content? I don't want to put the age of the content into the content vector and then have to run a pipeline every night to update it, right? That's not good. So you need to be
really mindful about how each of these data
properties actually makes it into that vector, right? So that
it is available during the nearest neighbor lookup to guide the
recommendation. So that's the vectorization step.
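One way to solve that recency problem, offered here as an assumed approach rather than the only one, is to encode the publish timestamp periodically. The dot product of two such encodings depends only on the time difference, so the content vector is written once at publish time and never goes stale; the "now" reference lives entirely in the query-side vector.

```python
import math

def time_vec(timestamp: float, period: float = 90 * 24 * 3600) -> list[float]:
    """Encode an absolute timestamp as a 2-D point on a circle.

    The dot product of two such encodings is cos((t1 - t2) / period),
    a function of the age *difference* only, so content vectors never
    need a nightly refresh. `period` (~90 days here) is an assumed
    tuning knob, not a standard value.
    """
    angle = timestamp / period
    return [math.sin(angle), math.cos(angle)]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

now = 1_700_000_000.0
fresh = time_vec(now - 24 * 3600)        # one day old
stale = time_vec(now - 60 * 24 * 3600)   # two months old
q = time_vec(now)                        # query-side "now" vector
print(dot(q, fresh) > dot(q, stale))     # fresher content scores higher
```

Appended (with a weight) to the rest of the item vector, this makes freshness just another dimension of the nearest-neighbor lookup.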
And then finally, okay, we have the database.
It has the user and content vectors. So that's the thing that
obviously you are building this on top of the vector database.
You have your user and content vectors in there. They're up to date. Ideally,
if you want TikTok level recommendation performance,
then these vectors are updated in real time, so that when the user clicks on something or does something, you immediately feed that signal, the event, into recomputing your vectors. And then finally you do the retrieval, right.
You do the nearest neighbor search to power your recommendations.
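A toy sketch of that real-time loop, using a made-up exponential-moving-average update for the user vector (real systems vary widely in how they fold events in):

```python
import numpy as np

def update_user_vector(user_vec, item_vec, alpha=0.2):
    """Fold one click event into the user vector as an exponential moving
    average, one simple (assumed) way to keep user vectors fresh without
    a nightly batch job."""
    v = (1 - alpha) * user_vec + alpha * item_vec
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def recommend(user_vec, item_matrix, k=2):
    """Nearest-neighbor retrieval by dot product over all item vectors
    (a real system would use an approximate index instead of brute force)."""
    scores = item_matrix @ user_vec
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
items = rng.normal(size=(100, 8))                       # 100 fake item vectors
items /= np.linalg.norm(items, axis=1, keepdims=True)   # unit length
user = np.zeros(8)
for clicked in (3, 3, 3):                # the user keeps clicking item 3
    user = update_user_vector(user, items[clicked])
print(recommend(user, items))            # item 3 should rank first
```

The key property is that each event updates the stored user vector immediately, so the very next retrieval already reflects it.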
And then through an API layer again, you surface it
into your product search recommendations. We talked about bot detection
through the user to user similarity,
user segmentation, and other use cases, right? Sometimes when
the user is, for example, searching, you might have to kind
of feed the query back through the vectorization
before you can do the retrieval.
But this is kind of the anatomy of the system that you would need to really pull this off. In production at scale, you can expect something looking like that.
All right, and finally, let's look at the generative.
What does all this have to do with generative AI?
Right, this is the current hype. That's actually
for a large part driving the vector hype.
So let's kind of connect the dots here, right?
So what is generative AI? And I'll just talk about
the text, but this obviously applies to other modalities
like images. But if you think about these large
language models, basically they are a function that
on one side takes text, a prompt, and then outputs
some sort of response to that prompt.
Now, the next kind of
step that happened after people kind of played with this very
basic setup, right? I prompt you, you give me a response.
Cool. The next thing you can do is kind of take that response and
feed it back into the model,
right? And that's where, for example, ChatGPT really
took off, right. Because you had that sustained conversation back and
forth, the model saw its own previous responses
to you at every step of the way and therefore was
able to build a response that feels like you are
actually talking to the model. But it's critical
to understand that the model itself doesn't have any memory,
right. It only responds to the
text that it receives on the input. And so in the chat use case,
you are kind of constantly feeding in the whole chat
history or a part of it, right. If the chat is too long,
then it's just the recent part. This is why, for example, in ChatGPT,
if the conversation is long enough, it forgets
the stuff that was said a while ago, right?
Because the context window can only take so
much of the history and it has no other memory, right? So it
has no other place to store,
let's say, things that it learned about you, right? So this is the ChatGPT situation.
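The statelessness described above can be sketched as a tiny prompt builder. `max_chars` here is an illustrative stand-in for a real token limit, and the truncation rule is a toy version of what chat frontends do.

```python
def build_prompt(history, max_chars=2000):
    """Rebuild the model input from the chat history on every turn.

    The model itself is stateless, so we re-send the transcript each
    time; once it exceeds the context budget we silently drop the
    oldest turns, which is exactly why long chats "forget".
    """
    kept, total = [], 0
    for role, text in reversed(history):        # keep the most recent turns
        line = f"{role}: {text}"
        if total + len(line) > max_chars:
            break
        kept.append(line)
        total += len(line)
    return "\n".join(reversed(kept))

history = [("user", "My name is Ana."), ("assistant", "Hi Ana!")]
history += [("user", "filler " * 50), ("assistant", "ok " * 50)] * 10
prompt = build_prompt(history, max_chars=500)
print("Ana" in prompt)  # False: the early turns fell out of the window
```

Anything that scrolled out of the window is simply gone unless you store it somewhere else, which is where vector memory comes in.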
There are libraries that are useful for working with and building on top of generative AI use cases. I would recommend checking out LangChain. There is a 13-minute explainer video that I found really useful. So again,
check out the link that I added to most slides.
There is something useful there. All right,
but what's up with this stuff? So here I demonstrate that the system kind of pretends that it has memory, just to illustrate what I was saying: that for it to sort of feel like a chat, it has to constantly feed back the whole chat history and then generate the next response.
However, you can sort of play a game with it where you
ask it to make up a number and not tell you, and then you
guess it and it gives you feedback if the secret number is higher or lower.
And you can actually play through a game like that, right? You can
guess. It tells you if you are too high or too low.
And then finally, eventually you get the result.
It's just that technically it couldn't have
made up a number and then kept it somewhere
in memory, because the chat is its only memory,
right? So this is an interesting case of the model sort of pretending
that it thought about the number and didn't tell you, but again,
it has no place to store that number, right? So you can
try this yourself. It's actually quite interesting.
And so, okay, obviously I kind of set it up. The touch point between what we talked about before, vector representations of data, and generative AI is this aspect of memory, right? It's this aspect of being able to use the large language model to generate a vector representation, because that can also be a byproduct: it doesn't have to be text, it can be a vector that you then store. Let's say you are processing a large document that doesn't fit into the input text all at once, right?
right? You might want to chunk it up by chapter or
paragraph and then generate vectors that then
you store in some vector database, right? So for each paragraph
you'll have a vector that represents its meaning.
And then you can do clustering, do similarity
search, all kinds of different things on this kind of corpus
of memory that you build out with the large language model.
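That chunk-embed-store-search flow can be sketched end to end. The `embed` function below is a hashing stand-in for a real embedding model, used only so the example is self-contained; in practice you would call an actual model here.

```python
import hashlib
import math

def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: hash words into a fixed-size bag-of-words vector.

    Only preserves word overlap, which is enough to demonstrate the
    flow; a real embedding model would capture meaning far better.
    """
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[int(hashlib.md5(word.encode()).hexdigest(), 16) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def search(query, index, k=1):
    """Rank stored chunks by cosine similarity (vectors are unit length,
    so a plain dot product suffices)."""
    qv = embed(query)
    scored = sorted(index, key=lambda it: -sum(a * b for a, b in zip(qv, it[1])))
    return [text for text, _ in scored[:k]]

document = [
    "Vectors represent the meaning of text numerically.",
    "Recommendation systems suggest relevant content to users.",
    "Databases store and retrieve structured records.",
]
index = [(p, embed(p)) for p in document]   # one vector per paragraph
print(search("how do recommendation systems suggest content", index))
```

Swap the toy `embed` for a real model and the list `index` for a vector database, and this is the basic shape of LLM memory built on embeddings.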
And there has been this word floating around: agents. Sometimes the agents can be autonomous, and they kind of feed the output text back into the large language model and also use the large language model's output to actually decide what the next action should be. Should I retrieve something from memory? Should I feed another new input to the model?
So those would be autonomous agents, but any sort of agent or
kind of controller module that you run on top of
the large language model sort of use case can just
have manually controlled logic of. All right, first I'll chunk
up a document, I'll generate those paragraph vectors, and then I'll use
that for search. Right? And that's the touch
point between vector embeddings and generative
AI use cases. Right?
Again, here I recommend checking out AutoGPT if you haven't yet. That's the most famous example of an autonomous agent.
All right, so today we covered why vector
embeddings, vector representations of your content,
but also of your users, are useful; how you can use them to build search and recommendation
features for your product, the different levels of
software stack that you need to pull this off.
And then finally, we also looked at the connection
of vector embeddings and generative AI. Thanks a lot for
joining me. Obviously I like
to talk about this topic and would love
to learn about your use cases for vector embeddings
and vector ops and learn about the things you
are struggling with in the space. As I mentioned,
I'm a founder at a company called Superlinked.
We are building a vector ops platform that makes building search and recommendations on top of vector embeddings easier, right. And so we are interested in talking to people who are in the space, experimenting with search and
recommendations and we would love to learn from
each other and deliver something useful. So let's connect
on LinkedIn and take it from there.