Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone.
Welcome to my session.
I am super excited to be a speaker at the Conf42 Prompt Engineering conference.
In this talk, you will learn what generative feedback loops are and how to use them to create personalized recommendation solutions and generate targeted ads based on real-time information.
This session can be especially useful for Python developers, data engineers, or Supabase and Weaviate users looking for solutions for detecting changes on a primary database like Supabase or PostgreSQL, streaming these changes, and continuously updating vector databases for AI-powered applications.
My name is Bobur.
I'm a developer advocate at Glassflow.
I'm also a Microsoft MVP for Azure AI Services. If you have any questions, please scan the QR code and connect with me on LinkedIn. I will be more than happy to address all of your questions. Here's what we will be covering today.
We'll start with an introduction: what is a generative feedback loop, followed by real-world use cases and a real-time GFL pipeline architecture.
I'm going to explain and show you the components: how to detect changes on your primary database, stream these changes, and continuously update your destination vector database when you are building AI-powered applications.
You will also see a demo of building a typical pipeline with technologies like Glassflow, Supabase, and Weaviate for a simple Airbnb listing application, to optimize and personalize these property listings for Airbnb. This will be a live demo of getting a production-ready pipeline running in 15 minutes.
By the end of this session, you will be able to process simple Airbnb listing data that lives in Supabase, enrich it with AI, and store it in a vector database like Weaviate, and you can also search through all these enriched listings using Weaviate's GraphQL console.
I'm incredibly happy to have such a knowledgeable audience in this session.
So let me start by understanding what a feedback loop is.
A feedback loop relates to the use of current outputs to optimize future results.
As you can see in the diagram, it involves utilizing outputs to generate better inputs in the next stage, and feedback loops are also critical components in training AI models.
When you give an AI model data, you can train the model, but user input can also impact the AI model's responses in real time.
Generative feedback loops, or GFL for short, take the same approach as a generative AI model, but go a step further by introducing a continuous improvement cycle.
As you can see in the diagram, in a GFL the output generated by the AI when you do prompt engineering is not just a final product; it is part of an ongoing process.
Usually a GFL takes the results generated by GenAI or by your language models like GPT, vectorizes them, and saves the result back to the vector database, so you can use this generated data for future AI processing.
This output is also used to improve future results from the AI models.
In other words, the outputs become the inputs for the next loop cycle, and AI outputs are analyzed to optimize the training data or the algorithm's parameters.
The goal is to improve the quality of future content generation.
For example, AI can generate code snippets based on the user's requirements and create a feedback loop with tests to improve the code over time.
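As a tiny sketch of that idea: the `generate` and `run_tests` callables below are hypothetical stand-ins for an LLM call and a unit-test harness, not part of any real library.

```python
def refine_until_passing(generate, run_tests, max_rounds=3):
    """Feedback loop: generate code, test it, feed failures back as context.

    `generate(feedback)` and `run_tests(code)` are hypothetical callables,
    e.g. an LLM prompt and a unit-test runner.
    """
    code, feedback = None, None
    for _ in range(max_rounds):
        code = generate(feedback)           # output of the previous round...
        passed, feedback = run_tests(code)  # ...becomes input to the next one
        if passed:
            break
    return code
```

The point is the wiring, not the callables: each round's test failures become the next round's input, which is exactly the loop structure a GFL builds around a generative model.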
Some people always confuse GFL with GenAI.
GenAI, as we know, primarily operates in one direction: it generates content based on the input data you provide to the AI.
GFL, on the other hand, incorporates a feedback loop where the AI solution continues to learn and improve from the outputs it generates.
GenAI is typically used for creating new content or new data and summarizing existing data, but GFL is focused on improving AI outputs over time through the cycle of feedback-loop learning.
GFL solutions are more complex than standard GenAI models because they require mechanisms for collecting feedback from users or humans, analyzing it, and adjusting the AI models accordingly.
Let me bring up a couple of examples.
A good example of GFL in action is personalized recommendations, where AI might suggest products to a user based on browsing history, clicks, or purchases on an online store.
The users can then interact with these recommendations, providing feedback to the AI to refine future suggestions.
Another example is the real estate industry, where property listings are often updated with new information if you want to buy or rent an apartment: the price always changes, availability changes, and additional features change too, like adding a sofa or removing the refrigerator.
A GFL pipeline can automatically update the descriptions of these real estate listings, making sure they are always optimized for search engines.
If you open Google and search for a specific apartment, and the apartment data changes in real time, you will see this reflected in your search results as well.
You can apply GFL to other industries too, like real-time job listing optimization.
For example, platforms like LinkedIn can use GFL: if the AI detects that someone is searching for remote work or flexible hours, it can update the relevant job listings to emphasize these aspects, thereby increasing the visibility of the job posting to the candidate and making it more attractive to them.
Or you can use the same GFL solution to customize travel itineraries.
This is one example I really like: let's say you frequently search online for new trips or cultural experiences in different countries; the AI can generate itineraries that include activities perfectly aligned with the user's preferences.
Another example is TV shows: Netflix can use this to create a personalized viewing experience.
On Netflix, you always see quite relevant movies based on your watch history; it will always find something relevant to your taste.
If you want to learn more about GFL, I found this article by the author Connor.
He explains very nicely the concept of generative feedback loops with large language models: how you can retrieve information from a vector database such as Weaviate, prompt a generative model, and then vectorize and save the results of the AI-generated content back to the database.
Let's focus now on GFL with real-time data.
As you can see in the diagram, real-time data transformations are the backbone of an effective GFL automation.
It's about detecting real-time data changes and continuously updating the vector storage on the right side.
In the context of Airbnb listings, this means that as soon as a new room is listed, the description for the room can be created by the AI and vectorized for better search.
Then you can store the vectorized result in a vector database.
Let's say a user is searching for a cozy apartment in Paris using Airbnb; they can see all up-to-date options in that location.
As for what the real-time transformation does here: if you look at the yellow square, which is the pipeline, we are detecting changes from APIs, files, and databases, wherever the Airbnb listings are located.
Then, say a new Airbnb listing is added: we generate a description based on the Airbnb listing attributes by calling OpenAI or other models, such as the completions endpoint.
Then, for the content generated by the AI, we calculate vector embeddings, and we send the vector embeddings to be stored in the vector database.
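Those two steps, generating a description and then embedding it, could look roughly like this in Python. This is a minimal sketch assuming the official `openai` v1 client; the model names and listing fields are illustrative choices, not taken from the demo code.

```python
import os

def build_prompt(listing: dict) -> str:
    """Turn raw Airbnb listing attributes into a prompt for the generator."""
    return (
        "Write a short, engaging description for this Airbnb listing:\n"
        f"Name: {listing.get('name')}\n"
        f"Neighbourhood: {listing.get('neighbourhood')}\n"
        f"Room type: {listing.get('room_type')}\n"
        f"Price: ${listing.get('price')}/night"
    )

def enrich_and_embed(listing: dict) -> dict:
    """Generate a description with a chat model, then compute its embedding."""
    from openai import OpenAI  # deferred import; requires the openai package
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
    description = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": build_prompt(listing)}],
    ).choices[0].message.content
    vector = client.embeddings.create(
        model="text-embedding-3-small",
        input=description,
    ).data[0].embedding
    return {**listing, "description": description, "vector": vector}
```

The returned dict carries both the generated text and its embedding, which is exactly what gets shipped to the vector database in the next step.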
Why are we storing this in the vector database? Because in the next step we would like to build a simple application that gives a booking.com-like experience to users, where they can search for apartments using human language.
This human language query can also be converted to vector embeddings, and we can compare it against the data corpus we created in the vector database to find the apartments matching the user's query.
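Under the hood, that comparison is a vector similarity search. Weaviate handles this for you, but the core idea can be sketched with plain cosine similarity over small example vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_matches(query_vec, corpus, k=3):
    """corpus is a list of (listing_id, vector) pairs; return the k closest ids."""
    ranked = sorted(
        corpus,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [listing_id for listing_id, _ in ranked[:k]]
```

Real embeddings have hundreds or thousands of dimensions and the database uses an approximate index rather than a full sort, but the ranking principle is the same.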
Everything here is happening in real time. As you saw in the previous slide, data can change maybe every minute or every five minutes, and this data is captured and becomes visible in the vector database within milliseconds.
This is the concept of real-time GFL and how it works.
Now let's focus on some of the technologies we can use to build these pipelines.
One of those technologies, which I also work on, helps you build real-time data pipelines for AI use cases.
You can build a pipeline such as a real-time GFL using Glassflow, and Glassflow also simplifies the creation process of your real-time data processing pipeline.
As we have seen previously, you may spend up to 15 minutes setting everything up, and your new pipeline is ready to run in a production environment.
With Glassflow, you can integrate with various data sources such as PostgreSQL, MongoDB, message brokers like Google Pub/Sub, or message queues like Amazon SQS, and you can apply transformations in the middle.
Then you can store the results in databases, BI analytics tools, or vector storages like Weaviate.
So this is how Glassflow works, and why we decided to build this solution, especially for Python developers.
We are trying to offer an all-in-one platform that focuses on the easy creation of data pipelines for data engineers and data teams, and especially for data scientists: you don't have to worry about the infrastructure under the hood.
In other words, it removes the complexity of real-time data processing pipelines, be it Kafka plus Flink; you can do everything in a single serverless infrastructure.
Let me explain how building a pipeline with Glassflow works.
You start with connecting to your live data sources, like Airbnb listings, using built-in integrations, or you can build your own integration using the Python SDK.
Then you start to build your pipeline within the Glassflow web app, or using the CLI if you prefer the CLI option.
Then you implement your transformation function, which is the very heart of the pipeline, in Python.
After your transformation function is ready, you can deploy it to the serverless execution engine, where your transformation can scale up to processing billions of records, and you don't have to worry about scaling manually.
Then, when the transformation is done, you can send the output events to the intended destinations using the same built-in integrations or your own integrations.
Here are some of the use cases you can achieve with Glassflow.
You can build a pipeline, for example, to enrich your data with predicted future prices using AI, or to detect changes in your database and send these data changes, after transforming them, to their destinations.
You can also build, let's say, a real-time clickstream analytics dashboard to analyze clickstream data from your website and send it to other downstream applications.
If you want to know more about Glassflow and Glassflow use cases, you can scan this QR code; it will bring you to the GitHub repository, where you can try some of our real-world examples and run them right from a Jupyter notebook.
Now let's switch back to our real-time generative feedback loop automation.
As part of this session, we're going to build a sample pipeline for GFL, as you can see here in the diagram.
We have a data source; let's assume that the Airbnb data is always stored in Supabase.
Why Supabase? First of all, it is open source, and it is an alternative to Google Firebase, which works quite nicely, especially when you have real-time data and your data is always changing in your database.
Let's say a new Airbnb listing is added, or you updated an existing one.
Supabase can trigger an event to send this change directly to the Glassflow pipeline, using its webhook data source connector.
Every change happening to the Airbnb listings is sent automatically to the Glassflow pipeline.
Then, when the Glassflow pipeline reads it, you can do the AI and vectorization work: you can write a transformation function to apply some AI-driven solution in Python.
In this case, let's say the AI model can call OpenAI to enrich the listings by generating a more descriptive description from the Airbnb listing attributes.
You can summarize and transform all the descriptions.
At the same stage, they will be vectorized, converted to vector format, and sent to a vector database like Weaviate, as you can see in the diagram.
I will show you why we take this approach and why we are calculating vector embeddings and storing them in a vector database.
You will see in the next slides; I will give you some example queries.
It gives users a much clearer querying option: they don't have to use SQL. If the data lives in the vector database, they can query it using human language.
Let me explain what our sample dataset is.
To build the pipeline, we can use a simple CSV dataset of room listings, say for New York City, from Airbnb in 2023.
This dataset includes the usual Airbnb listing attributes, such as listing name, host name, location details, room type, price, availability, and so on.
Once we have the dataset, what can we do by running this pipeline and saving the data to Weaviate? We can run typical queries: we can fetch, let's say, the top five most reviewed listings in New York or in Brooklyn; this could be quite useful to identify popular listings in the area.
Or you can find listings in Brooklyn that are budget-friendly, let's say less than 100 US dollars, and have positive reviews.
Or you can search with the kind of queries a user might be asking, looking for listings described like "I want an apartment with a great view."
This query would return the most relevant results based on the user's query, and the conversion from human language to the vector query operation is handled automatically by the vector database, in this case Weaviate.
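For reference, a nearText query of the kind you would type into Weaviate's GraphQL console can be rendered like this. A sketch only: the returned fields (`name`, `description`, `price`) are my assumptions about the collection schema, not the demo's exact fields.

```python
def near_text_query(collection: str, concepts: list[str], limit: int = 5) -> str:
    """Render a Weaviate GraphQL nearText query as a string."""
    concept_list = ", ".join(f'"{c}"' for c in concepts)
    return (
        "{\n"
        "  Get {\n"
        f"    {collection}(nearText: {{concepts: [{concept_list}]}}, limit: {limit}) {{\n"
        "      name\n"
        "      description\n"
        "      price\n"
        "    }\n"
        "  }\n"
        "}"
    )
```

Calling `near_text_query("AirbnbNYC", ["apartment with a great view"])` produces the query text; the collection's vectorizer module embeds the concept string and ranks listings by similarity.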
Now that you understand the pipeline and GFL, let's build this pipeline step by step.
Here is how we build our pipeline: we start by setting up our vector storage, which is our final destination; we create a pipeline with Glassflow; and we set up Supabase with simple Airbnb data.
Then we run the pipeline.
Once the data is already in Weaviate, I will show you how you can query Weaviate using its GraphQL console.
Let me bring your attention to the demo.
I will start by setting up Weaviate Cloud: you create your first cluster on Weaviate.
Then, once the cluster is up and running, you can start to create a new collection inside your cluster.
Let's call it AirbnbNYC, since our data is from New York.
Then you choose a vectorizer type; in my case I'm using text2vec-openai, and you choose a model like text-embedding-3-small or text-embedding-3-large.
For me, 3-small is enough, and you can keep the rest of the configuration at the defaults.
As the next step, now that Weaviate is ready, I'm going to create the pipeline with Glassflow.
You can sign up for free and get a free account, and then you can create your first pipeline easily.
Let's create a new pipeline and choose a webhook as the data source, because we are getting data from Supabase, which sends events through the webhook.
The third step is defining your transformation function, which you define by writing Python code.
As you can see, I have already created one simple transformation function.
It simply receives the Airbnb listing data from Supabase and, using OpenAI, generates a description from the Airbnb attributes.
Then, after the AI responds with a generated Airbnb listing description, we create the vector embeddings.
As you can see, it has a handler function, which is an important function for Glassflow: it automatically detects whatever logic is inside the handler function.
With that, our transformation function is more or less ready.
Also, don't forget to include your dependencies, like the OpenAI dependency for my transformation function, in the requirements file.
Next, I'm going to define the data sink operation, where my transformed data will be sent.
I'm going to choose a webhook, because I will send the output data to the Weaviate collection that we created together before.
So first I'm going to find the Weaviate Cloud URL: I need the URL for my cluster, and I just copy-paste it.
Then I will bring in the admin key; let me find the admin key in the Weaviate console.
I also define the content type, like application/json.
It is mandatory to define the API key too, to make sure that we are securely connected to the Weaviate cluster.
Then we send our data from the Glassflow pipeline.
Here we go: I have the Bearer token and authentication is done.
I will click on the next step, and you can see the overview of your pipeline.
When you click to create the pipeline, your pipeline is ready to run in a serverless environment.
The last step is setting up Supabase.
Make sure that you have a Supabase account.
Assume that I have already created a simple database table, called AirbnbNYC, in Supabase.
You can see I had some different datasets; this one is from 2019 and has attributes that map the Airbnb listings, like host name, location, room type, and price.
Once I have this data in place, I can navigate to the table editor to see the sample data.
As you can see, I now have existing sample data, five listings in place, to give you an understanding of how the data looks.
Now I will create a webhook trigger on Supabase, because Supabase triggers the Glassflow pipeline using the webhook, right? Let's create our webhook.
You can give it any name; in my case, maybe airbnb-listing or airbnb-data-change-capture, because it detects the changes and sends them to the Glassflow pipeline.
In the next step, you choose the database table and the events, like insert: which new entries should trigger it, and so on.
The next step is that we also need to define the Glassflow pipeline access token in the webhook URL, because when Supabase calls Glassflow, we need to make this connection securely using the access token.
I put the Glassflow access token in the header of the Supabase webhook.
Now, as you can see, our webhook for the pipeline is ready.
Also make sure you enable real time for your database, because it needs to get real-time updates.
Now everything is set up.
As the next step, I'm going to send some more sample data to Supabase, because, as you remember, we had only five listings.
Let's add maybe 10 or 20 more listings.
As you can see, I have a bunch of them in the sample dataset, and I also have one Python script to populate Supabase with sample data.
It just inserts rows of data in batch mode.
We are just simulating some incoming or newly registered listings, but in reality any service can call this to insert more data.
Let's run this Python script and generate some input data; I'm going to create 20 more rows for the Supabase listings.
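The populate script can be as small as this. A minimal sketch assuming the `supabase-py` client; the table name, CSV filename, and environment variable names are placeholders for this demo setup, not the exact ones used on stage.

```python
import csv
import os

def rows_from_csv(path: str, limit: int = 20) -> list:
    """Read up to `limit` listing rows from the sample CSV as dicts."""
    with open(path, newline="") as f:
        return [row for _, row in zip(range(limit), csv.DictReader(f))]

def main():
    from supabase import create_client  # requires the supabase package
    client = create_client(os.environ["SUPABASE_URL"], os.environ["SUPABASE_KEY"])
    rows = rows_from_csv("airbnb_nyc_sample.csv", limit=20)
    # One batch insert; Supabase fires the webhook per inserted row,
    # which triggers the Glassflow pipeline downstream.
    client.table("airbnb_listings_nyc").insert(rows).execute()

if __name__ == "__main__":
    main()
```

Because the webhook fires on every insert, a single run of this script pushes 20 change events through the whole GFL pipeline at once.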
There we go: I added 20 more, and everything was successful.
Now I can switch back to Supabase and check if this data is already in place.
Yes, as you can see, the data is in place.
Now we can go to Glassflow and check if the data was received by Glassflow after we did the insert on the Supabase side, to see what's happening in real time.
Yes.
Here we go.
As we inserted data, every insert was detected by Supabase and sent to the Glassflow pipeline, which is where our magic happens.
Glassflow has already, within milliseconds, transformed the data and sent it onward.
After this transformation, I am now ready to query and search for Airbnb listings in the Weaviate GraphQL console.
I ran one query to find the five most reviewed listings in Brooklyn.
Yes.
Here we go.
Here are some listings.
Now let's try a query using human language.
For that, you also need the OpenAI key, because it's going to use AI for human-interactive searching.
I'm going to pass my OpenAI API key.
Let's say the user asks for a "luxury apartment with a nice view".
As you can see, based on the human language, it found relevant data in our database.
The summary is actually generated by the AI: in the beginning we had only the raw data about the Airbnb listings, and this generated summary can always be changed and enriched based on what happens on the property side.
If I now find any property and change its price, it will be reflected immediately in Weaviate.
This is how real-time, continuous vector embedding generation and database updating works.
So that was my demo. In summary, we have presented the concept of generative feedback loops.
This means not only using the results from the database to answer users' queries, but also saving the results back to the vector database for future reference.
This is what we call GFL, and real-time GFLs use real-time data to receive user input and change the AI output based on the user's interactivity.
So you can always get the most relevant and updated content using real-time GFLs.
I hope you found the session interesting.
If you have any questions, you can ask them now in the Q&A session or leave them in a comment.
Scan the QR code to find the GFL use case I showed you, as well as other use cases you might want to try out.
Thanks for your attention, and have a nice day.