Conf42 Kube Native 2024 - Online

- premiere 5PM GMT

Unleashing the Potential of Cloud Native Open Source Vector Databases

Abstract

A quick and easy way to run a true cloud native, open source vector database for RAG, semantic search, image search, and many more use cases.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, and welcome to my talk today. It is unleashing the potential of cloud native open source vector databases. It would have been longer, but I really can't fit that. I'm Tim Spann. I'm the developer advocate here at Zilliz, along with my partner in Berlin and other people all over the world. I cover Milvus, the open source vector database, which is pretty awesome. We're going to show you a bunch of slides, as we tend to do. Kube native is complex: you're trying to get everything running natively in the cloud in Kubernetes. But we'll show you the easy way to run an advanced open source distributed vector database. We'll show you some cool demos first, really easy. Is there anything special going on? We're including a chatbot we created that uses Milvus and lets you chat with a lot of different documentation. Why is that a thing? There's a lot of unstructured data, and let's show you some of it now. It could be something like, I want to do something with Kafka, a popular streaming system. If you're down in Austin, we've got some of my colleagues down there and some of my friends from StreamNative and from a bunch of other cool companies. If you're in Austin, I definitely say check out Current, a pretty cool event. And here's a really easy way you can learn how to use it: a very easy project, all running on Milvus, and we give you the source code. Now, if you want to do something like a search, you could do that. What is Milvus? If you didn't know, it's also a bird, but we're talking about the database, a high performance vector database, and other demos. You can see these are all public URLs; you don't need anything special to get there. You can also do something like get a cat, but you should have one, and search by the cat. Oh, those are pretty cool cats. You get the idea. Also, we could do something a little smarter and ask a question: show me ceiling cats. Let's see, that one might be a little hard, but we've got text and we've got an image. Oh, actually, they're doing pretty good. Now, the one thing we could do is use a re-ranking algorithm to adjust the results based on percentages. It's pretty cool. One is doing a search by the vectors for my cat picture and one is doing the vectors of my text. It'll re-rank them here, and that is the best one, because you can see it's a ceiling cat. So that's pretty awesome. We've got a tutorial you could follow to do your own, just to give you an idea. Let's get back to the slides for a little bit; then we'll go to my custom demo and source code. Any of this data that's not typical: maybe it changes, maybe it's variable, maybe it's things like images. They change a lot; one pixel changes and it's a different encoding. Videos. Text. Molecules. There's a lot of it, just getting more and more, and most of it's never analyzed. You don't query it. You don't use it to find things. Now you can, which is pretty awesome. This data is growing and will continue to grow. But what's nice is, thanks to the power of these new deep learning models over the last couple of years, we can take this unstructured data, apply deep learning models, and map it to vectors. Thank you, neural networks. And then make it available in a vector database like Milvus. Or if you want it fully Kubernetes, fully managed for you, Zilliz Cloud does all that. You don't have to worry about setting up your own Helm charts and doing all that work. But what is a vector database? It's built so you can store, index, and query all these vector embeddings of whatever that unstructured data is.
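For readers who want to try that re-ranking demo themselves, here is a minimal sketch of a weighted text-plus-image hybrid search, assuming a recent pymilvus (2.4 or later); the collection name, field names, dimensions, and weights are illustrative, not taken from the demo.

```python
from pymilvus import MilvusClient, AnnSearchRequest, WeightedRanker

client = MilvusClient(uri="http://localhost:19530")

# Stand-ins for real embeddings of the query text and the uploaded photo.
text_vec = [0.1] * 512
image_vec = [0.2] * 512

text_req = AnnSearchRequest(data=[text_vec], anns_field="text_vector",
                            param={"metric_type": "COSINE"}, limit=10)
image_req = AnnSearchRequest(data=[image_vec], anns_field="image_vector",
                             param={"metric_type": "COSINE"}, limit=10)

# Re-rank the two candidate lists by percentage weights: 60% text, 40% image.
results = client.hybrid_search(collection_name="cats",
                               reqs=[text_req, image_req],
                               ranker=WeightedRanker(0.6, 0.4),
                               limit=5,
                               output_fields=["url"])
print(results)
```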
But it adds things on top of what you might get from other ways to do this. So you get filtering. You get a hybrid search like you saw: a dense vector plus some text. It's durable. It's going to be saved; it's going to make sure you don't lose data. Things can be replicated, highly available in a cluster. Things that make sense for native Kubernetes: sharding, aggregations. Things will be backed up. You can do full CRUD: you can create, read, update, and delete things. Indexing, re-indexing, multiple people using the same app. These are cloud things we all know about. Vector search libraries: you can do vector searches very quickly, with a lot of options there. And there's a lot beneath the covers to support things like GPUs and custom chips, to be able to scale out to billions of vectors, to insert and delete fast, and to handle a lot of people querying at once, regardless of what that unstructured data is. We mentioned the fun ones: images and video and your emails and all kinds of PDFs and random documents and audio. But it could be logs; it can be any data. Maybe it's data that doesn't always look the same. Maybe it changes. Maybe it's semi-structured. Maybe it's structured, but maybe there's an image with it. There's a lot of different data that works well in a vector database, and a lot of reasons why you may need to use this data as part of a RAG approach against a large language model. So having a vector database makes sense. Again, take the data, make it a vector embedding, get it in the vector database so it's available and safe. Perform an approximate nearest neighbor similarity search, get your results, and then people can query it and do what they need with it. Very often, like we said, to use in various applications, especially in AI. Now, could you get by with something else? There are some really good libraries out there, and we know this because we use them underneath the covers: the ones from Facebook, the Hierarchical Navigable Small Worlds one. But these are really designed for writing your own little apps, testing things out, learning, working with a couple hundred thousand vectors. A lot of functionality there, but it's a library. You could take some existing database that does something else, add one of those libraries, and that kind of does a little bit. Does it scale out as big as you need? No. Does it support the full life cycle of a vector? Do you get real-time search, top-K results, range searches, hybrid searches, multimodal, multi-vector? We could have 10 vectors in a single row in our collection. Fully distributed, with compute and storage fully separated, so you can easily do Kubernetes: that's what Milvus does. Now, Milvus has been around for a while. It is part of the Linux Foundation AI and Data. It is Apache licensed. Almost 30,000 stars; a lot of people using this. Download numbers go up all the time. You could start off with just a pip install in a Jupyter notebook and be ready to go. The code is very reusable: write it against something that just runs on your laptop, then run out to the biggest Kubernetes cluster ever. And it integrates with all the things that you're used to using already, whether it's OpenAI, LangChain, LlamaIndex, whatever it is, with all the features you expect from a modern vector database. Dense and sparse encodings, filtering, re-ranking; as new things come out, they get added there. Very scalable, very elastic. Again, using the power of Kubernetes to shrink down when you need to.
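As a rough sketch of that pip-install-and-go experience in a notebook, assuming pymilvus with the bundled Milvus Lite (pip install -U pymilvus); the collection name and toy vectors are illustrative.

```python
from pymilvus import MilvusClient

# A local .db file path gives you Milvus Lite; swap in a server or
# Zilliz Cloud URI later and the same code runs against a full cluster.
client = MilvusClient("milvus_demo.db")

client.create_collection(collection_name="quickstart", dimension=8)
client.insert(collection_name="quickstart",
              data=[{"id": 0, "vector": [0.1] * 8, "subject": "cats"}])

hits = client.search(collection_name="quickstart", data=[[0.1] * 8],
                     limit=1, output_fields=["subject"])
print(hits)
```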
Also, you can prototype in Docker, or run with Milvus Lite in your notebook, not incurring that cost until you're ready, which is nice. Support for all the hardware acceleration that's available in all the different clouds, different ways to search, different ways to tune the consistency. So depending on what your needs are, you have a lot of options. If you need to improve performance regardless of anything else, you can do that. Support for all the big indexes: things like sparse indexes, disk-based indexes, GPU indexes, especially with NVIDIA; they're doing some great stuff with CAGRA. The Hierarchical Navigable Small Worlds one, HNSW, is very fast and does a lot of things. Different types of searches and the ability to group them and filter them. Really important to optimize, speed things up, and get just what you need. Again, multiple people using everything, all the hardware you need; very straightforward there. And there is someone out there with almost a hundred billion vectors. And it can be done: ten billion vectors, with 1500-dimension vectors in there, exists in a single instance on the cloud. Again, decoupling compute and storage, as we expect in everything here. And it is done because we need to scale everything. A proxy to access things. Coordination of what's going on with all the elements in your cluster, whether it's data, querying, or indexing. All those workers making sure you get all your queries done, no matter how many you need; scale up more if those slow down. Index as data is coming in; make sure it's indexed and ready to go. Make sure your data is processed and then stored, and permanently stored without being lost. We store this in native object storage, whether it's S3 or what have you. Support for all the major vector index types out there, approximate nearest neighbor search type stuff. Everyone's used HNSW; it is very good, graph based. CAGRA by NVIDIA is a GPU version, which is incredibly fast, which is important. If you want things super accurate, though it might not work on a super large collection, you could do FLAT; there is a GPU version of that. The quantization-based IVF_FLAT is a nice balance, a little faster there, though accuracy is not as high as pure FLAT. SQ8 quantization reduces your disk load. With PQ, another one of the quantization-based indexes, you get high query speed if you're willing to take a little bit of accuracy off. Again, you get those options for whatever you want to do, and being able to index on disk is nice, so you don't have to put a lot of stuff in RAM; traditionally, things have to go in RAM to get to a high level. As you can see, this will look pretty familiar to people who run Kubernetes: having the separation of different services and stores, like we mentioned, and workers, so we can scale out as large as we need or shrink back down. We have messaging in the middle, your choice of Kafka or Pulsar, and both of those are very powerful. This again makes sure you don't lose data, makes sure things are distributed, makes sure things aren't stuck waiting for something, which is great. And again, getting those workloads distributed and out to however many people need them, keeping indexing, querying, and data processing and storage separate, so things can scale out. We have etcd in there, great for coordination and metadata. You really need that, and etcd is really good in our environments, so we use that. Again, a different way to look at it.
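Here is a hedged sketch of choosing one of those index types with pymilvus (2.4 or later); the collection and field names and the tuning values are assumptions, not recommendations from the talk.

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # a full Milvus server

index_params = client.prepare_index_params()
# Graph-based HNSW: fast and accurate; M and efConstruction trade build
# time and memory for recall. Swap index_type for "IVF_FLAT", "FLAT",
# "DISKANN", or "GPU_CAGRA" to make the trade-offs discussed above.
index_params.add_index(field_name="vector",
                       index_type="HNSW",
                       metric_type="COSINE",
                       params={"M": 16, "efConstruction": 200})
client.create_index(collection_name="docs", index_params=index_params)
```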
Obviously, we're running on Kubernetes here on top of different servers, but we can take advantage of different processors and different systems depending on where you're running, and this shows you a little more of how they connect to each other. Again, keeping things separated so we can scale them out pretty easily. Remember I said the install? That is the full install, and that will give you the libraries to talk to Docker or the cloud or wherever your cluster is running, and also give you the ability to run on your small device, just in a notebook. You don't need anything more advanced. Under the covers it does all the magic. Like I said, the magic of taking unstructured data and turning it into something that AI can understand and find quickly is what these various embeddings are used for. Retrieval Augmented Generation, RAG: you're going to hear that term a lot. A ton of people are doing this because it's helpful. It saves you money. It makes your results more accurate and gives you back more relevant results. It significantly reduces hallucination. And you can provide something specific to a domain. Like, I gave it air quality data, gave it just articles about Milvus, so we get better results if someone asks a question. I send that to the vector similarity search first, search my collections to get domain-specific data, things that maybe are private to me or that I know you're going to need for that question, and use that to give you better answers from all the major models out there, whether you're hosting on Ollama, or OpenAI, or Hugging Face, or Google, or Microsoft, or Cohere, or Mistral, or all those great options out there. Get your answers back to your applications. Pretty straightforward. And again, what's nice is you can keep your data secure. No one has to see it. And you can even run your own local models, so you don't have to worry that the model will be retrained on your specific stuff. Data sovereignty is a cool word, meaning I own it. I'm not having my data leak out somewhere if it is my secret sauce. And you've got options. Now, if you are a Kubernetes expert, and I'm imagining we have some people at the conference who are, you could set up everything yourself. Everything's open source, including things to make it even better. I'll show you Attu, a nice GUI for managing it, so you don't need to pay for anything. Knowhere is the vector search engine embedded into Milvus that uses all those popular libraries, making it so it can be expanded out to support more. It makes things run fast, again distributed, and again another open source project. When you saw that OSS Chat, we found out that calling models like OpenAI's is pricey; GPT is not cheap. So we came up with a good way to cache your queries and give you optimal results: save money with a very simple open source tool. If you want to know how fast things are, we publish a benchmarking tool in the open source, and you can run it yourself. If you just want to create collections and let someone else do the work, not a bad idea: have things a hundred percent secure and have Zilliz Cloud do it. They'll put it where you need it, and store it. Or if you already have things built out, you can run on top of that. All the embedding models we mentioned before. All the resources. I'm going to go into the demo next; I want to get the slides out of the way. If you're watching this later, fast forward to the demos if you want. This is important information: give us a star to make sure we're on the right track for the open source project.
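A minimal sketch of that RAG retrieval step, assuming pymilvus, sentence-transformers, and a hypothetical "docs" collection with a text field; the model and names are illustrative.

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

client = MilvusClient("milvus_demo.db")
encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small, CPU-friendly

question = "How does Milvus separate compute and storage?"

# Step 1: vector similarity search over your own domain-specific data.
hits = client.search(collection_name="docs",
                     data=[encoder.encode(question).tolist()],
                     limit=3, output_fields=["text"])

# Step 2: pack the retrieved chunks into the prompt for the LLM.
context = "\n".join(hit["entity"]["text"] for hit in hits[0])
prompt = (f"Answer using only this context:\n{context}\n\n"
          f"Question: {question}")
# The prompt then goes to whichever model you host: Ollama, OpenAI,
# Hugging Face, Cohere, Mistral, and so on.
```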
A lot of other databases are not open source. Also, if you've got problems, Discord is where we usually hang out. We've got engineers in there. I'm in there. Our friends all over the world are in there, other community members, power users. We also have a really good AI chatbot that has been trained on our stuff, trained on these problems, and is exceptionally good. So definitely check us out on Discord if you're having problems. If you're in New York, or near New York, or Princeton, or Philly, I run a meetup. I do record it and sometimes stream it. We try to stream it on Zoom or YouTube, depending on network traffic and whatever's going on at the time; sometimes it's easier than others. I'm running one in a couple of days, usually once a month, so definitely check it out. We also have ones in California and in Berlin and in Asia, so check those out. If you want to learn more about generative AI, we keep a really good selection of knowledge articles with notebooks and how to get started, regardless of where you want to run it. We are friends with everybody, so like I mentioned before, all those different names, we've got examples with them. I like to use Ollama so I can run it on the laptop for free; I don't have to pay anyone. I've got a ton of articles on how to do this. I've got an interesting one on using street camera data in New York. That is not AI; that's mine. And I've got another one showing you that we don't just store vectors. We have a ton of support for different numbers and other fields, because, as you'll see, the vector is important, but unstructured data often sits next to structured or semi-structured data. You may want to put a whole chunk of JSON in there. Maybe you've got a couple of metadata fields. Really important: if I have an image, I might want a link to the original source, some metadata about the size, maybe a description. Things we don't need, but nice to have. The more data you have, the better. If you have data about an image, why not use it? Pre-compute things; make things faster and easier. Again, if you want to see the new Milvus Lite and how you can use that in a notebook, we've got an article there. And I'm doing some cool stuff with Milvus at the edge, whether it's a client, or running Milvus Lite there, or even the Docker version. We could do that. In this one I'm doing some stuff with some AI kits, and it's fun; you can check that article out. I put out a weekly newsletter. It covers everything, all the fun stuff in open source, plus some streaming stuff and some other stuff I've worked with in the past. If you want to reach out, this is where we are. You can always find me on LinkedIn, more than I would like, but I'm there. Same with Twitter. Let's get into some demos. Hopefully everything in the world didn't time out. We showed you these demos; as you can see, you can go right to them. Let's see if all our other stuff timed out. I hope it did not. Okay, so I have a cluster here hosted on Zilliz for free; I'm using the free plan for myself here. And I have a couple of collections. It's pretty easy to get to. You can also use Attu, remember that one, regardless of which version. So if I want to connect to the Docker one, I can do that. If I want to connect to Zilliz Cloud, I can do that. And this shows me all the fields I have. You can see here, I've got one vector field with the FLAT index, but I also have a bunch of fields here, and these are important ones.
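A sketch of what a schema like that might look like, with structured metadata sitting next to the vector; the field names and sizes are assumptions, not the actual demo schema.

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient("milvus_demo.db")

schema = MilvusClient.create_schema(auto_id=True, enable_dynamic_field=True)
schema.add_field("id", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=384)
schema.add_field("source_url", DataType.VARCHAR, max_length=1024)  # link back to the original image
schema.add_field("meta", DataType.JSON)  # size, description, anything nice to have

client.create_collection(collection_name="images", schema=schema)
```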
I've got a text field in here, so I can chunk a big piece of text in there, so I can directly get that, and I can do vector searches. If you don't know what to search on, this will let you figure that out, or you can just browse the data, which is pretty helpful. What's nice with this tool is I can pick how consistent I want the data. Right now I don't have anything inserting, so that's fine, but if I wanted to query on stuff as it's coming in, I could just weaken my consistency there. Again, you don't have to think about that if you don't care. Like for this one, let me find ones where the doc source is a certain value. You can see here we can look inside the JSON for that. Can I do that? Yes, I can. Okay, I still get a couple of results here. You can add another condition: what if I want the primary category to be vector database? And now we've limited that further, which is nice, and you can copy all these things, pretty easy. You can also add data here if you want. If the data is partitioned, and as you can see here this one is, it tells you how many entries are in each partition. This is for performance. This one is partitioning based on a key, and that key is that category, so there's a bunch of different ones. Now we can do a vector search based on a specific thing. As you saw with those examples, you upload an image, and that'll be your vector; it gets turned into the appropriately sized array of numbers, and you can limit it however you want here. What's nice with this one is it will give you the code, so if you want to use an app to run this, you can just grab that code. You can also export the data as comma-separated values. Pretty easy. Just to give you an idea of how you can browse your data. Let's go to our other thing here. Hopefully I didn't make this too big here. Let's get out of here. Okay. So you can see I can search it a couple of different ways, import data. What's nice with Zilliz Cloud is there's a REST endpoint, so you can try things out, looking at a query and seeing what you'd get back, however you want to do that. You can also browse the data. This is cool: you find data you want and use that to search. And when I do this search, you can see what's got the number one rank: this one, because it's an exact match, because that's the one I searched on. That sort of thing. Now we have a bunch of different notebooks here that do a lot of different things. Again, like I said, they're documented to make it straightforward. They show you what libraries you need. Here, the stuff for Milvus. Here, I'm saying where I'm getting my data from. Connect to my Milvus. If you're connecting to Zilliz Cloud, we have a token-based system to log in, so I've got that in my environment. Check to see if I need my collection. This is how we add fields; you can add a ton of them. I add some indexes, create my index, see what the partitions are. I've got a thing to take out my HTML; I'm looking at changing that for a model that's coming next. Then I go through here and iterate through the top feeds from Medium, which is RSS, parse that out, and get the fields I want. This is all it takes to insert into Milvus. Again, you'll notice here I'm using the model function to vectorize. That was listed up here, and that is a very simple one. That's using this smaller model and my CPU, running on a laptop, and using Hugging Face Sentence Transformers to do that. There are a ton of different models; it doesn't look any different to you. And then you can check the results, run a query, run a search here. Very straightforward.
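Outside of Attu, the same kind of filter can be expressed in code. A sketch, assuming a hypothetical "articles" collection with doc_source and primary_category scalar fields; the filter values are placeholders.

```python
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")

# Scalar filtering on metadata fields, matching the two conditions
# built up in the Attu UI above.
rows = client.query(
    collection_name="articles",
    filter='doc_source == "medium" and primary_category == "vector database"',
    output_fields=["title", "link"],
    limit=10,
)
print(rows)
```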
You can also do things like RAG, very simply; see the sketch after this paragraph. Again, for this one, LangChain. I make sure I use the right things in LangChain, connect to Milvus, there's my embedding, there's some text, a primary key, make sure we connect, and then we just go through a loop and we can ask questions. If you look here, when I get the result, I send it to a Slack channel. You can see here: question, answer. Pretty straightforward to do that, and not hard. We give you all the source code if you go to my GitHub. You'll see I have a lot of examples. Lots of cool stuff, whether you're interested in partitioning data, air quality, street cameras; I've got a knowledge base, parsing news, doing multimodal, doing stuff on a Raspberry Pi, looking at Olympics data, running on a Jetson, doing travel advisories, running on some other devices, looking at, oh, I forgot about the vehicle collisions; that one's interesting stuff in New York. If you saw something you like, definitely reach out. We are always trying to help people get started, whether it's at the meetup, a webinar, or in person. Definitely check it out. Thanks for coming to my talk. Let me show you the Medium page here; we've got a ton of articles, so check them out. If you want a deeper dive into the Kubernetes internals, we've got some materials on that if you really want to see it. All the Helm charts are available, everything you need to get running, and it's pretty straightforward. So thanks a lot.
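As a companion to the LangChain demo described above, here is a rough sketch of that connection, assuming the langchain-milvus and langchain-huggingface packages; the model, URI, and collection name are assumptions, not the talk's actual configuration.

```python
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_milvus import Milvus

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Milvus(
    embedding_function=embeddings,
    connection_args={"uri": "http://localhost:19530"},  # or a Zilliz Cloud URI plus token
    collection_name="docs",
)

# Retrieve the chunks most similar to a question, ready for the LLM step.
docs = vectorstore.similarity_search("What is Milvus?", k=3)
for doc in docs:
    print(doc.page_content[:120])
```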
...

Tim Spann

Principal Developer Advocate @ Zilliz



