Conf42 Prompt Engineering 2024 - Online

- premiere 5PM GMT

Real-Time Data for Generative Feedback Loop (GFL) Automation

Abstract

Explore diverse real-world applications of GFL across different industries.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone. Welcome to my session. I am super excited to be a speaker at the Conf42 Prompt Engineering conference. In this talk, you will learn what generative feedback loops are and how to use them to create personalized recommendation solutions and generate targeted ads based on real-time information. This session can be especially useful for Python developers, data engineers, and Supabase or Weaviate users looking for ways to detect changes on a primary database like Supabase or PostgreSQL, stream those changes, and continuously update vector databases for AI-powered applications. My name is Bobur. I'm a Developer Advocate at GlassFlow, and I'm also a Microsoft MVP for Azure AI Services. If you have any questions, please scan the QR code and connect with me on LinkedIn; I will be more than happy to address all of your questions. Here's what we will be covering today. We'll start with an introduction to what a generative feedback loop is, followed by real-world use cases and a real-time GFL pipeline architecture. I'm going to explain and show you the components: how to detect changes on your primary database, stream these changes, and continuously update a destination vector database when you are building AI-powered applications. You will also see a demo of building a typical pipeline with technologies like GlassFlow, Supabase, and Weaviate for a simple Airbnb listing application, to optimize and personalize property listings for Airbnb. This will be a live demo of running a production-ready pipeline in 15 minutes. By the end of my session, you will be able to process simple Airbnb listing data that lives in Supabase, enrich it with AI, store it in a vector database like Weaviate, and search through all these enriched listings using the Weaviate GraphQL console. I'm incredibly happy to have such a knowledgeable audience in this session.
So let me start by explaining what a feedback loop is. A feedback loop relates to the use of current outputs to optimize future results. As you can see in the diagram, it involves utilizing outputs to generate better inputs in the next stage. Feedback loops are also critical components in training AI models: when you give an AI model data, you can train it, but user input can also impact the model's responses in real time. Generative feedback loops, or GFL for short, take the approach of a generative AI model a step further by introducing a continuous improvement cycle. As you can see in the diagram, in a GFL the output generated by the AI when you do prompt engineering is not just a final product; it is part of an ongoing process. Usually a GFL takes the results generated by GenAI or by your language models like GPT, vectorizes them, and saves the results back to the vector database, so you can use this generated data for future AI processing. This output is also used to improve future results from the AI models. In other words, the outputs become the inputs for the next loop cycle, and the AI outputs are analyzed to optimize the training data or the algorithm's parameters. The goal is to improve the quality of future content generation. For example, AI can generate code snippets based on a user's requirements, and a feedback loop with tests can improve the code over time. Some people confuse GFL with GenAI. GenAI primarily operates in one direction: it generates content based on the input data you provide. GFL, on the other hand, incorporates a feedback loop where the AI solution continues to learn and improve from the outputs it generates.
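The cycle I just described — outputs saved back to a store and fed in as inputs for the next generation — can be sketched in a few lines of Python. This is a toy illustration with a stub generator standing in for a real LLM call; the function names and the list-as-store are assumptions for the sketch, not anything from a real SDK:

```python
# Minimal sketch of a generative feedback loop (GFL). A stub generator
# stands in for the LLM; a plain list stands in for the vector database.

def generate(prompt: str, context: list[str]) -> str:
    """Stand-in for an LLM call: reports how much prior output it could
    draw on. A real GFL would prompt a model with the retrieved context."""
    return f"answer to '{prompt}' (informed by {len(context)} prior outputs)"

def gfl_cycle(prompts: list[str]) -> list[str]:
    store: list[str] = []            # plays the role of the vector database
    outputs = []
    for prompt in prompts:
        output = generate(prompt, context=store)
        store.append(output)         # the output becomes input for the next cycle
        outputs.append(output)
    return outputs

results = gfl_cycle(["cozy apartment", "great view"])
```

The second call sees one prior output in its context, which is the whole point of the loop: each generation enriches the store that future generations draw from.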
GenAI is typically used for creating new content or new data and summarizing existing data, but GFL is focused on improving AI outputs over time through the cycle of feedback-loop learning. GFL solutions are more complex than standard GenAI models because they require a mechanism for collecting feedback from users or humans, analyzing it, and adjusting the AI models accordingly. Let me bring up a couple of examples. A good example of GFL in action is personalized recommendations: AI might suggest products to a user based on browsing history, clicks, or purchases on an online store. Users further interact with these recommendations, providing feedback that the AI uses to refine future suggestions. Another example is the real estate industry, where property listings are often updated with new information if you want to buy or rent an apartment: the price changes, availability changes, and additional features come and go, like adding a sofa or removing the refrigerator. A GFL pipeline can automatically update the descriptions of real estate listings, making sure they are always optimized for search engines. If you open Google and search for a specific apartment, and the apartment data changes in real time, you will see this reflected in your search results. You can apply GFL to other industries too, like real-time job listing optimization. Platforms like LinkedIn can use GFL: if the AI detects someone searching for remote work or flexible hours, it can update the relevant job listings to emphasize these aspects, thereby increasing the visibility of a job posting to the candidate
to make it more attractive to them. You can use the same GFL solution to customize travel itineraries. This is an example I really like: if a user frequently searches online for new trips or cultural experiences in different countries, AI can generate itineraries that include activities perfectly aligned with the user's preferences. Another example is TV shows: Netflix can use GFL to create a personalized viewing experience, which is why on Netflix you always see quite relevant movies based on your watch history; it will always find something relevant to you. If you want to learn more about GFL, I found a nice article by the author Connor that explains the concept of Generative Feedback Loops with large language models (LLMs): how you can retrieve information from a vector database such as Weaviate to prompt a generative model, and then vectorize and save the results of the AI-generated content back to the database. Now let's focus on GFL with real-time data. As you can see in the diagram, real-time data transformations are the backbone of effective GFL automation. It's about detecting real-time data changes and continuously updating the vector storage on the right side. In the context of Airbnb listings, this means that as soon as a new room is listed, a description for the room can be created by the AI and vectorized for better search, and then the vectorized result is stored in a vector database. Say a user is searching for a cozy apartment in Paris on Airbnb; they can see all up-to-date options in that location.
Here is what the real-time transformation does. If you look at the yellow square, that is our pipeline: we are detecting changes from APIs, files, and databases where the Airbnb listings live. Say a new Airbnb listing is added: we generate a description based on the listing's attributes by calling OpenAI or another model, for example through a completions endpoint. Then, for the content generated by the AI, we calculate vector embeddings and send the embedding to be stored in the vector database. Why are we storing it in the vector database? Because in the next step we would like to build a simple application that gives a Booking.com-like experience to users, where they can search for apartments using human language. That human-language query is also converted to vector embeddings and compared against the data corpus we created in the vector database, to find the apartments matching the user's query. Everything here happens in real time: as you can see in the previous slide, the data can change maybe every minute or every five minutes, this change is captured, and within milliseconds the data is visible in the vector database. This is the concept of real-time GFL and how it works. Now let's focus on some of the technologies for building these pipelines. One of those technologies, which I also work with, helps you build real-time data pipelines for AI use cases. You can build a pipeline such as a real-time GFL using GlassFlow, and GlassFlow simplifies the creation process of your real-time data processing pipeline. As we have seen previously, you may spend up to 15 minutes setting everything up, and your new pipeline is ready to run in a production environment.
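The pipeline stages I just walked through — detect a new listing, generate a description, embed it, store the vector — can be sketched as plain Python functions. Both the OpenAI call and the embedding model are replaced with stand-ins here (a formatted string and a hash-based toy embedding) so the sketch runs offline; the field names on the listing are assumptions modeled on typical Airbnb data:

```python
# Sketch of the "yellow square" pipeline: change detected -> description
# generated -> embedding calculated -> vector stored. The generator and
# embedder below are offline stand-ins, not real model calls.
import hashlib

def generate_description(listing: dict) -> str:
    # Real pipeline: an OpenAI completions call using the listing attributes.
    return (f"{listing['room_type']} in {listing['neighbourhood']} "
            f"for ${listing['price']}/night")

def embed(text: str, dims: int = 8) -> list[float]:
    # Deterministic toy embedding derived from a hash, standing in for a
    # model such as text-embedding-3-small.
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dims]]

vector_store: dict[str, list[float]] = {}   # stands in for Weaviate

def on_listing_added(listing: dict) -> None:
    """Called for every detected change; real pipelines get this event
    from a database webhook rather than a direct function call."""
    description = generate_description(listing)
    vector_store[listing["id"]] = embed(description)

on_listing_added({"id": "42", "room_type": "Private room",
                  "neighbourhood": "Brooklyn", "price": 89})
```

A user query would go through the same `embed` step and be compared against the stored vectors, which is exactly the search experience described next.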
With GlassFlow, you can integrate various data sources such as PostgreSQL, MongoDB, message brokers like Google Pub/Sub, or message queues like Amazon SQS, apply transformations in the middle, and then store the results in databases, BI analytics tools, or vector storages like Weaviate. That is how GlassFlow works. Why did we decide to build this solution, especially for Python developers? We are trying to offer an all-in-one platform focused on the easy creation of data pipelines for data engineers, data teams, and especially data scientists, so you don't have to worry about the infrastructure under the hood. In other words, it removes the complexity of real-time data processing pipelines, be it Kafka plus Flink; you can do everything in a single serverless infrastructure. Let me explain how building a pipeline with GlassFlow works. You start by connecting to your live data sources, like Airbnb listings, using built-in integrations, or you can build your own integration using the Python SDK. Then you build your pipeline within the GlassFlow web app, or using the CLI if you prefer that option. Next, you implement your transformation function, which is the very heart of the transformation, in Python. After your transformation function is ready, you can deploy it to a serverless execution engine, where your transformation can scale up to processing billions of records, and you don't have to worry about scaling manually. Then, when the transformation runs, you can send the output events to different destinations using the same built-in or custom integrations. Here are some of the use cases you can achieve with GlassFlow. You can build a pipeline, for example, to enrich your data with predicted future prices using AI, or to detect changes
in your database and send these data changes, after transformation, to their destinations. You can also build, let's say, a real-time clickstream analytics dashboard to analyze clickstream data from your website and send it to other downstream applications. If you want to know more about GlassFlow and its use cases, you can scan this QR code; it will bring you to the GitHub repository, where you can try some of our real-world examples and run them right from a Jupyter notebook. Now let's switch back to our real-time generative feedback loop automation. As part of this session, we're going to build a sample GFL pipeline, as you can see here in the diagram. We have a data source; let's assume the Airbnb data is stored in Supabase. Why Supabase? First of all, Supabase is open source, and it is an alternative to Google Firebase that works quite nicely, especially when you have real-time data and your data is always changing in your database. Whenever a new Airbnb listing is added, or you update an existing one, Supabase can trigger an event and send this change directly to the GlassFlow pipeline using its webhook data source connector. Every change happening to the Airbnb listings is sent automatically to the GlassFlow pipeline. Then, when the GlassFlow pipeline receives it, you can do the AI enrichment and vectorization: you write a transformation function in Python to apply some AI-driven insight. In this case, the transformation can call OpenAI to enrich the listing by generating a more descriptive description for the Airbnb listing. You can summarize and transform all the descriptions, and at the same stage they will be vectorized, converted to vector format, and sent to a vector database like Weaviate, as you can see in the diagram.
Let me show you why we take this approach of calculating vector embeddings and storing them in a vector database. You will see in the next slides; I will give you some example queries. It gives users a much clearer querying option: they don't have to use SQL. If the data lives in the vector database, they can query it using human language. Let me explain our sample dataset. To build the pipeline, we can use a simple CSV dataset of room listings in New York City from Airbnb, from 2023. This dataset includes the typical Airbnb listing attributes, such as listing name, host name, location details, room type, price, availability, and so on. Once we have the dataset, what can we do by running this pipeline and saving the data to Weaviate? We can run typical queries: for example, we can fetch the top five most reviewed listings in New York or in Brooklyn, which could be quite useful for identifying popular listings in that area. Or we can find listings in Brooklyn that are budget-friendly, say less than 100 US dollars, and have positive reviews. Or a user might search with natural-language queries, like "I want an apartment with a great view." Such a query would return the most relevant results for the user, and the conversion from human language to the vector query operation is handled automatically by the vector database, like Weaviate. Now that you understand the pipeline and GFL, let's build this pipeline step by step. Here is how we build it: we start by setting up our vector storage, which is our final destination; we create a pipeline with GlassFlow; and we set up Supabase with simple Airbnb data. Then we run the pipeline.
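The two structured queries above are easy to picture against a tiny in-memory slice of the listings data. The field names here (`neighbourhood`, `price`, `number_of_reviews`) are assumptions modeled on the public Airbnb NYC CSV, and the data rows are made up for illustration:

```python
# The example queries, run against a toy in-memory slice of the dataset.
listings = [
    {"name": "Sunny loft",     "neighbourhood": "Brooklyn",  "price": 95,  "number_of_reviews": 210},
    {"name": "Midtown studio", "neighbourhood": "Manhattan", "price": 180, "number_of_reviews": 340},
    {"name": "Quiet room",     "neighbourhood": "Brooklyn",  "price": 60,  "number_of_reviews": 48},
    {"name": "Park view flat", "neighbourhood": "Brooklyn",  "price": 140, "number_of_reviews": 512},
]

# Top five most-reviewed listings in Brooklyn.
brooklyn = [l for l in listings if l["neighbourhood"] == "Brooklyn"]
most_reviewed = sorted(brooklyn, key=lambda l: l["number_of_reviews"], reverse=True)[:5]

# Budget-friendly Brooklyn listings under 100 USD.
budget = [l for l in brooklyn if l["price"] < 100]
```

The third, natural-language query ("an apartment with a great view") is the one that needs the vector database: instead of filtering on fields, it compares the query embedding against the stored listing embeddings.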
Once the data is in Weaviate, I will show you how you can query it using its GraphQL console. Let me bring your attention to the demo. I will start by setting up Weaviate Cloud, where you can create your first cluster. Once the cluster is up and running, you can create a new collection inside your cluster; let's call it Airbnb_NYC, for New York City. Then you choose a vectorizer type; in my case, I'm using text2vec with OpenAI, and you choose a model like text-embedding-3-small or the large variant. For me, 3-small is enough, and you can keep the rest of the configuration at its defaults. As the next step, now that the Weaviate side is ready, I'm going to create the pipeline with GlassFlow. You can sign up for free, get a free account, and create your first pipeline easily. Let's create a new pipeline and choose a webhook as the data source, because we are getting data from Supabase, which sends events through the webhook. The third step is defining your transformation function, which you define by writing Python code. As you can see, I have already created one simple transformation function. It simply receives the Airbnb listing data from Supabase and, using OpenAI, generates a description from the Airbnb attributes. Then, after the AI responds with a generated Airbnb listing description, we create the vector embeddings. As you can see, it has a handler function, which is the important function for GlassFlow: it automatically executes whatever logic is inside the handler function. With that, our transformation function is more or less ready. As the next step, don't forget to include your dependencies, like the OpenAI dependency for the transformation function, in the requirements file.
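A minimal sketch of what such a transformation function can look like is below. I'm assuming a `handler(data, log)` entry point as described in the demo, a Supabase-style payload that wraps the changed row in a `record` key, and a fake OpenAI-style client so the sketch runs offline; the real function in the demo imports the actual OpenAI SDK and calls the completions and embeddings endpoints:

```python
# Sketch of the GlassFlow transformation function from the demo, with a
# fake client standing in for OpenAI so the code runs without network
# access. Payload shape and client methods are assumptions.

class FakeOpenAI:
    """Offline stand-in for the OpenAI client used in the real pipeline."""
    def complete(self, prompt: str) -> str:
        return f"Generated description for: {prompt}"

    def embed(self, text: str) -> list[float]:
        return [float(len(text) % 7), 1.0, 0.5]   # toy 3-dim embedding

client = FakeOpenAI()

def handler(data: dict, log=None) -> dict:
    """Entry point GlassFlow invokes for each incoming event."""
    listing = data["record"]   # assumed Supabase webhook payload shape
    prompt = f"Describe this Airbnb listing: {listing['name']}, {listing['room_type']}"
    description = client.complete(prompt)
    return {
        "id": listing["id"],
        "description": description,
        "vector": client.embed(description),
    }

event = {"record": {"id": 1, "name": "Sunny loft", "room_type": "Private room"}}
result = handler(event)
```

The returned dictionary — enriched description plus embedding — is exactly the shape the next step sends onward to the Weaviate sink.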
Next, I'm going to define the data sink, where my transformed data will be sent. I'm going to choose a webhook again, because the output data will be sent to the Weaviate collection we created together before. So first I need the Weaviate Cloud URL and API key: I find the URL for my cluster and just copy and paste it, and then I bring in the admin key. Let me find the admin key in the Weaviate console. I also define the content type, application/json. Then it's mandatory to define the API key, to make sure we are securely connected to the Weaviate cluster before we send our data from the GlassFlow pipeline. Here we go: I have the Bearer token and authentication is done. I will click on the next step, and you can see the overview of your pipeline. When you click Create Pipeline, your pipeline is ready to run in a serverless environment. The last step is setting up Supabase. Make sure you have a Supabase account. Assume that I have already created a simple database table called Airbnb_NYC in Supabase. You can see I had some different datasets; this one is from 2019 and has attributes that map the Airbnb listings, like host name, location, room type, and price. Once I have this data in place, I can navigate to the table editor to see the sample data. As you can see, I now have five existing sample listings in place, to give you an understanding of what the data looks like. Next, I will create a webhook trigger on Supabase, because Supabase triggers the GlassFlow pipeline through the webhook, right? Let's create our webhook. You can give it any name; in my case, maybe "Airbnb listing" or "Airbnb data change capture," because it detects the changes and sends them to the GlassFlow pipeline. In the next step, you choose a data type.
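The sink configuration entered in this step boils down to a URL plus two headers. As a sketch, with a hypothetical cluster URL and a placeholder key (both are assumptions, not real credentials):

```python
# Sketch of the Weaviate sink settings from the demo: cluster URL,
# JSON content type, and Bearer-token authorization. Values below are
# placeholders for illustration only.
WEAVIATE_URL = "https://my-cluster.weaviate.network/v1/objects"  # hypothetical cluster URL
WEAVIATE_ADMIN_KEY = "wv-admin-key-placeholder"                  # hypothetical admin key

sink_headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {WEAVIATE_ADMIN_KEY}",
}
```

These are the same values pasted into the GlassFlow webhook sink form in the demo: the cluster URL as the target, and the admin key sent as a Bearer token.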
Specifically, the database table we are watching, and the events: for example, should inserts of new entries trigger the webhook, and so on. The next step is that we also need to define the GlassFlow pipeline access token in the webhook URL, because when Supabase calls GlassFlow, we need to secure this connection using the access token. I put the GlassFlow access token in the header of the Supabase webhook. Now, as you can see, our webhook is ready. Also make sure you enable real-time for your database, because it should get real-time updates. Now everything is set up. As the next step, I'm going to send some more sample data to Supabase because, as you remember, we had only five listings. Let's add maybe 10 or 20 more. As you can see, I have a bunch of them in the sample dataset, and I have one Python script to populate Supabase with sample data. It simply inserts rows of data in batch mode; we are just simulating some incoming or newly registered listings, but in reality any service could call this to insert more data. Let's run this Python script and generate some input data. I'm going to create 20 more rows for the Supabase listings. There we go: I added 20 more, and everything was successful. Now I can switch back to Supabase and check if this data is already in place. Yes, as you can see, the data is in place, and now we can go to GlassFlow and check if the data was received by GlassFlow after we did the insert on the Supabase side, to see in real time what's happening. Yes, here we go: every insert was detected by Supabase and sent to the GlassFlow pipeline, which is where our magic happens.
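The population script mentioned here can be sketched as follows. The table name, column names, and row values are assumptions for illustration; only the payload-building part is executed in the sketch, with the actual Supabase insert shown as a comment:

```python
# Sketch of the demo's population script: build a batch of sample
# listing rows for Supabase. Column names and values are hypothetical.
import json

def make_sample_rows(n: int, start_id: int = 100) -> list[dict]:
    """Build n synthetic listing rows, simulating newly registered listings."""
    rows = []
    for i in range(n):
        rows.append({
            "id": start_id + i,
            "name": f"Sample listing {i}",
            "neighbourhood": "Brooklyn" if i % 2 == 0 else "Manhattan",
            "room_type": "Private room",
            "price": 50 + 5 * i,
        })
    return rows

rows = make_sample_rows(20)          # the demo adds 20 more rows
payload = json.dumps(rows)           # batch payload for the insert

# The real script then inserts the batch, e.g. with supabase-py:
#   supabase.table("airbnb_nyc").insert(rows).execute()
```

Each inserted row fires the Supabase webhook, which is what pushes the new listings through the GlassFlow pipeline in the demo.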
GlassFlow has already, within milliseconds, applied the transformation and sent the data on to Weaviate, and now I am ready to query and search for Airbnb listings in the Weaviate GraphQL console. I ran one query to find the five most reviewed listings in Brooklyn. Yes, here we go: here are some listings. Now let's try a query using human language. For that, you also need the OpenAI key, because it's going to use AI for human-language searching. I'm going to pass my OpenAI API key. Let's say a user asks for a "luxury apartment with a nice view," and as you can see, it found relevant data in our database based on the human-language query. The summary shown is actually generated by the AI: in the beginning we had only the raw data about the Airbnb listings, and this generated summary can always be changed and enriched based on the property. If I now find any property and change its price, it will be reflected immediately in Weaviate. This is how real-time, continuous vector embedding generation and database updating works. So that was my demo. In summary, we have presented the concept of generative feedback loops. This describes not only using the results from the database to answer users' queries, but also saving the results back to the vector database for future reference. This is what we call GFL, and real-time GFLs use real-time data to receive user input and change the AI output based on the user's interactivity. So you can always get the most relevant and updated content using real-time GFLs. I hope you found the session interesting. If you have any questions,
we can now jump into the Q&A session, or you can leave your questions in a comment. Scan the QR code to find the GFL use case I showed you, as well as other use cases you may want to try out. Thanks for your attention, and have a nice day.

Bobur Umurzokov

Developer Relations Manager @ GlassFlow



