Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey guys, welcome to my talk.
Unlock the future: built-in AI in browsers.
We'd better start learning to speak robot, right?
AI is rising.
Everybody knows AI is everywhere.
AI is even in our everyday life: virtual assistants, recommendation
systems, autonomous vehicles, even food.
We have Coke with AI generated flavor.
Come on.
AI these days is not just a buzzword.
Everything is focusing on AI because it is shaping the future of our industry.
My name is Mohamed, and I'm a Senior Software Engineer at ING Bank in the Netherlands.
This is the QR code to my page, so we can connect on LinkedIn, have a
chat, and keep up with each other.
Why not?
I love to connect with enthusiastic developers, and you are one, since
you are here at this conference.
So what's the plan?
First of all, we will focus on local LLMs and compare them to cloud LLMs,
because we have two types of LLMs in this case, and we are focusing on
the built-in ones, which are a kind of local LLM.
Then we will make everything technical: focus on the API and go over how the API works.
Then we have some live coding. Fingers crossed for me, because live coding is always... so.
Next step: yeah, what lies ahead?
What is the next step that we can take, and what's going on behind the scenes?
So let's start with local LLMs.
Let me define an LLM in simple words. In simple words, an LLM is so simple:
we have a prompt, like, where is Amsterdam?
It goes to the LLM, and then it gives us a response: the Netherlands, right?
Converting a prompt to a response: that's how we are going to use an LLM.
In a more technical way, LLM stands for large language model.
It's a type of artificial intelligence that uses deep learning techniques
to understand, generate, and manipulate human language.
So this kind of model focuses, first of all, on human language, which it
can generate, understand, and manipulate.
So yeah, like the question that we asked: it's going to tell us where Amsterdam is.
So let's see the differences between local LLMs and cloud LLMs.
When we use local LLMs, the LLM is on our machine.
So the whole neural network, the artificial intelligence,
all the technology that lies behind it, is on our machine.
And so we are also limited to our machine in terms of performance.
But the thing is, there are some benefits.
The first thing is privacy.
Why privacy?
Because when we are using local LLMs, the data stays on your machine.
It doesn't go anywhere else.
You don't have to send data over the wire to the cloud and expect, I don't
know, Azure or GCP or some other service to get back to you with results.
So the thing is, yeah, privacy, which matters a lot.
It's one of the biggest advantages of local LLMs.
Then low latency: because everything is on your machine, you don't have
to wait for the network, an internet connection, and so on.
Nothing like that is required in this case, because everything is on your machine.
And then low cost, because no subscriptions are required.
Just imagine the LLM as a database, and that database is on your machine.
So everything is going to work.
Some examples I can mention: Mistral is one of them, and Llama from Meta.
These are local LLMs.
Cloud LLMs.
Yeah, of course, with cloud LLMs you have a superpower. As I mentioned,
with local LLMs the model is on your machine, but with cloud LLMs,
yeah, the model is somewhere in the cloud.
So it can be super big, which means they trained it with more data.
They have a better superpower there.
The other thing is network latency.
Yeah, it is up to date with a lot of data.
It can give you better answers, but it has latency, because you
have to send the data over the wire.
Bigger data is slower; with bigger data, it has to compute more.
And also high cost.
It costs money, because we are using third-party services like
Azure or GCP, Google Cloud Platform.
That's it.
So those are the trade-offs.
But yeah, let's have a local LLM on our machine, thanks to the
browsers, which are installed on all machines, and where the web
applications that we develop live.
It means that, yeah, if we put an LLM inside the browser, wow.
Some kind of superpower on our machine; we can perform some magic.
So let's see.
How does the API look behind the scenes? First of all, a disclaimer
for the things that you are going to see: no machine learning is required.
So don't worry if you don't have any knowledge of ML and these other things.
No ML, no machine learning, is required, and no AI knowledge is required either.
You're going to see how it works, so stay tuned.
The API is super simple.
First of all, you should create a session.
Since it is asynchronous, yeah, definitely, we need await.
So: await ai.assistant.create().
If you grab the ai object from window and call assistant.create, you're
going to create a session for yourself.
So in this way, you can start prompting, asking questions of the LLM.
The next step is prompting.
As you see, with the session that I created, yeah, perfect,
I called .prompt and then the text, my question, actually my prompt.
So: where is Amsterdam, in this case.
Do you remember the other diagram I showed you?
Yeah, the answer is the Netherlands.
Based on the data that the model has been trained on, the answer might
be different, and there are also a lot of different parameters, but
generally the data matters a lot.
For me, in the test that I did, it gave me this sentence:
Amsterdam is the capital city of the Netherlands.
Nice.
So with two lines of code, I could ask the LLM something.
Just imagine how creative you could be in your applications: interact
with the user in a better way, give them better suggestions, and so on.
So we will see some use cases later on.
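Put together, those two lines look roughly like this. This is a sketch of the experimental API as described in the talk; the shape of window.ai has changed between Chrome versions, so treat the names (window.ai.assistant.create, session.prompt) as assumptions, and the guard makes the function safe to call where the API is missing.

```javascript
// Sketch of the built-in Prompt API (experimental; names are assumptions
// based on the talk: window.ai.assistant.create() and session.prompt()).
async function ask(promptText) {
  // Guard: window.ai only exists in a flag-enabled Chrome build.
  if (typeof window === "undefined" || !window.ai?.assistant) {
    return null; // API unavailable (e.g. Node or an unsupported browser)
  }
  const session = await window.ai.assistant.create();
  return session.prompt(promptText); // resolves with the full response text
}
```

In a supporting browser, ask("Where is Amsterdam?") would resolve with something like the sentence from the talk; elsewhere it resolves with null.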
The other API that matters a lot is prompt streaming.
LLMs use generative AI technology, and generative literally means
generating: when you call prompt, it just waits for the LLM to generate
all the content, the whole response.
But if you use prompt streaming, it's different.
It's the same thing, but as soon as the LLM predicts the next token,
which is the next word, just imagine, in this case it's going to give
you the output.
This is a standard stream in the browser, so you can just await it and
expect chunks from the stream.
As you see, the output would be something like this: the first chunk,
then "is", then "is the capital city", "is the capital city of",
"is the capital city of the", "is the capital city of the Netherlands".
That's how it works.
When you are dealing with prompt or prompt streaming, just imagine how it works.
This is a cool API from the built-in AI API.
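A minimal sketch of the streaming variant, under the same assumptions as before; the promptStreaming name and the cumulative growing-chunk behavior are taken from the talk's description of the experimental API.

```javascript
// Sketch of prompt streaming (experimental; promptStreaming and the
// cumulative-chunk behavior are assumptions based on the talk).
async function askStreaming(promptText, onChunk) {
  if (typeof window === "undefined" || !window.ai?.assistant) {
    return false; // API unavailable
  }
  const session = await window.ai.assistant.create();
  // promptStreaming returns a standard stream; each chunk is the response
  // so far ("Amsterdam", "Amsterdam is", ...), growing token by token.
  for await (const chunk of session.promptStreaming(promptText)) {
    onChunk(chunk); // e.g. replace the UI text with the latest chunk
  }
  return true;
}
```

So instead of waiting for the whole answer, the UI can update on every predicted token.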
Yeah, no ML expertise required, no AI expertise required, no neural networks,
some magic. But to be honest, prompt engineering can make it much better.
There are a lot of prompt engineering resources out there that you can check out.
First of all, how to write a better prompt, how to ask LLMs, what
tricks you can use, and how LLMs can answer you better.
These are all things that you can learn.
For instance, one of them is n-shot prompting.
N-shot prompting is like giving some examples to the LLM: please use
these examples and get back to me like I asked.
Let's start from the top.
I'm using ai.assistant.create again.
Then I'm passing two parameters: temperature and topK.
These two parameters are quite detailed; I recommend you read about them
in the documentation. I'm going to share the link later.
But generally, the temperature, for instance, is something
like how creative the LLM can be.
Most of the time, if you set it to zero, it's going to give you
more or less the same answer every time.
And then I initiate prompts: some system prompts, some kind of persona.
I give a persona to the system: pretend to be a tour guide.
And then I give some examples.
For instance, for the Netherlands, if the user asks, the assistant
says: let's go to the Rijksmuseum.
If I talk about Italy: let's do the Colosseum.
See, these are well-known museums, tourist places.
Now, what if I ask something new?
Tell me something about Iran.
The thing is, the LLM just learned that, yeah, my user
expects this kind of sentence.
Let's go to somewhere!
So yeah, the response in this case would be: yeah,
let's go to the Grand Bazaar.
Why not?
So nice.
That's how it works.
So it's one of the techniques to improve your prompts: n-shot prompting.
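The session from this example could be sketched like this. The option names (temperature, topK, systemPrompt, initialPrompts) follow the early Prompt API previews and may differ in your Chrome build, so treat them as assumptions; the example pairs are the ones from the talk.

```javascript
// Sketch of n-shot prompting with the experimental Prompt API.
// Option names are assumptions from the API's early previews.
async function createTourGuide() {
  if (typeof window === "undefined" || !window.ai?.assistant) {
    return null; // API unavailable
  }
  return window.ai.assistant.create({
    temperature: 0, // low temperature: roughly the same answer every time
    topK: 3,
    systemPrompt: "Pretend to be a tour guide.",
    // The n shots: user/assistant pairs the model should imitate.
    initialPrompts: [
      { role: "user", content: "Netherlands" },
      { role: "assistant", content: "Let's go to the Rijksmuseum!" },
      { role: "user", content: "Italy" },
      { role: "assistant", content: "Let's go to the Colosseum!" },
    ],
  });
}
```

With such a session, prompting "Iran" should come back in the same "Let's go to ..." shape, like the Grand Bazaar answer in the talk.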
So it's time for some live coding.
Fingers crossed.
Let's see what we have.
I have a simple form for a blog post.
I'm going to put in my content and the tags for that blog post.
So let's use the API that we just learned to simply improve our application.
You remember what the API was, right?
It means that if I want to run the API, I should run it on the text change.
So let's see the code that I have.
Really straightforward: add an event listener on the input event of my text element.
Nice.
So far so good.
In the prompt function, I have an asynchronous function.
I am awaiting to create a session.
Simple.
Then I'm prompting my LLM, my local LLM.
My prompt is something like this:
you are an author helping people find the right tags for their blog posts.
Nice.
Some kind of a persona.
Consider the following text and suggest tags for it in JSON format.
Yeah, every developer's favorite format, right?
Is it?
Yeah, anyway.
So I'm going to put the text inside an HTML tag, an XML-like tag.
It could be anything; it could be backticks or something else.
But to be honest, I prefer this format, because most of
the time LLMs understand it better.
So I'm awaiting the response, getting the response, and just logging it.
So let's see what we get from the text.
The first text is "AI is awesome".
Let's see the console.
Come on... Oh yeah, as you see, it took a little bit of time, and it makes
sense, because it was computing on my machine with all that stuff.
Yeah, it is somehow in JSON format. As you see, it gave it to me in a
markdown format, which we are going to trim.
But yeah, we have an object full of tags for "AI is awesome".
It detected artificial intelligence.
Nice, an AI tag.
Yeah, nice.
So we are going to show the tags here in our application.
"I am so happy to be here." Let's see.
Okay.
Yeah, "food"?
Come on. Because I'm just typing, it's going to run the function
on each of my keypresses.
So that's why you can see a lot of logs here.
Yeah, the last one is this one: happiness.
Yeah, it could be.
Okay.
A good tag for my blog post, which contains this text at the moment.
It has a lot of others as well: sentiment, tone, emotions. Nice.
But anyway, this is the last one and the most accurate one, because, right?
Yeah, at this point it had all the text.
So let's use it.
So I want to extract the tags, right?
My response contains some backticks and "json" at the front.
So let's replace them.
I'm going to use a regex to find all the backticks, and everything
it finds, replace it with nothing.
So these are the tags.
Yeah, the tags string.
So let me parse it with JSON.parse as well to make it an object.
And let's log the tags for myself.
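That cleanup step can be a small pure function. The fence-stripping regex matches what the demo does; the fallback between a bare array and an object with a tags property reflects the output shapes that come up later in the demo, and is an assumption about the model, not a guarantee.

```javascript
// Turn the model's markdown-wrapped response into an array of tags.
// The model often answers with ```json ... ``` fences, so strip those first.
function extractTags(response) {
  const cleaned = response.replace(/```(json)?/g, "").trim();
  const parsed = JSON.parse(cleaned); // throws if the model ignored the format
  // Accept either a bare array or an object with a `tags` property,
  // since the model's output shape is not guaranteed.
  return Array.isArray(parsed) ? parsed : parsed.tags;
}
```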
Let's see what we have in the browser.
So text is happy.
Yeah, nice.
Thanks.
Happiness.
Great.
So far so good.
What else?
So let's have a for loop over our tags.
Nice.
I have a UL here named "tags"; I'm getting it by ID.
We have a tag element.
So let me create an li.
Oh, nice.
Nice.
Thanks, Copilot.
We have an li; set the text content to the tag itself, and append it to the tags list.
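The loop described here can be sketched like this; it also includes the list-clearing that comes up a bit later in the demo. The element id "tags" matches the demo's markup as described, but is an assumption.

```javascript
// Sketch: render suggested tags into the <ul id="tags"> from the demo
// markup (the id is an assumption). Returns how many <li> were rendered.
function renderTags(tags) {
  if (typeof document === "undefined") return 0; // no DOM, e.g. in Node
  const tagElement = document.getElementById("tags");
  tagElement.innerHTML = ""; // clear the previous suggestions on each run
  for (const tag of tags) {
    const li = document.createElement("li");
    li.textContent = tag; // textContent, so tags can't inject HTML
    tagElement.appendChild(li);
  }
  return tags.length;
}
```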
Let's see the result.
Refresh.
Happy.
Yeah, live coding.
You should have wished me luck. Why didn't you?
Anyway, let's see.
Let's see.
Let's see.
What's the problem?
Let me put a log here.
Nice tag element append.
Okay, li create element li.
Yeah, everything makes sense.
I just want to make sure that we have some content for the tags.
So yeah.
Oh yeah, guys: tags.
It has the property "tags".
So yeah, I'm going to just use .tags there.
So right now we have the tags.
Everything makes more sense.
Let me go with "happy".
Cool, cool.
Yeah, happy.
Yeah, see? Since I was typing, we get "music", "happiness".
It's all those prompts that it was running during my typing.
So I can do two things.
Let me go to the tag element in our HTML and empty it.
Yeah, because each time I should clear the list.
I could even set a loading state or something, which doesn't matter in this case.
Yeah, "happy". So, okay.
Oh yeah, it's working.
It's working.
Anyway, you can see "happiness" here.
So if you want to create a tag, you can, you know,
use this suggestion link to create a tag.
And also, these prompts are just redundant, right?
So I have already loaded lodash here for myself.
So, easily, let's use debounce from lodash, set it to 200 or 300
milliseconds, something like that, to just debounce all of them.
So: "happy".
Yeah, just one, and super fast, because yeah, there aren't a lot of
prompts going to my LLM, which is my computer, slowing it down.
And so on and so forth.
So easy peasy.
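The demo uses _.debounce from lodash; to show what it does, here is a minimal hand-rolled equivalent. The wiring line at the bottom is commented out because the element and handler names (textElement, suggestTags) are assumptions about the demo code.

```javascript
// Minimal equivalent of lodash's _.debounce, as used in the demo: the
// wrapped function only fires after `wait` ms with no further calls.
function debounce(fn, wait) {
  let timer;
  return function (...args) {
    clearTimeout(timer); // a new call cancels the pending one
    timer = setTimeout(() => fn.apply(this, args), wait);
  };
}

// Wiring it up as in the demo (names are assumptions):
// textElement.addEventListener("input", debounce(suggestTags, 300));
```

This way, typing a whole word produces one prompt to the local LLM instead of one per keypress.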
So as you see, with just, I don't know, 20, 25, 30 lines of simple code,
I could prompt the LLM, get some tags for my blog post, and suggest
them to the user, making my user's life easier.
So everything is great.
Let's try it out for the last time.
"AI is awesome, I am presenting at a conf."
Let's see what tags it suggests.
Conference.
Yeah: technology, artificial intelligence, conference.
Yeah, good.
So it means our application is working the way that we want.
And yeah, that's it.
So let's move on.
So maybe you wonder: how did I set up this API in my browser?
First of all, download Chrome Canary or Dev.
It doesn't matter which, but for sure the latest version.
Then go to chrome://flags, one of the internal URLs of Chrome, and if you
search for "Prompt API for Gemini Nano", just set it to Enabled.
There is also another flag on the flags page, "Optimization Guide On
Device Model"; you should enable that one as well, with "Enabled
BypassPerfRequirements". Who named it like that?
Anyway, the last thing that you should do, and actually it is
one of the most important ones, is downloading the model itself.
If you go to chrome://components, you will find this name:
Optimization Guide On Device Model.
Just click "Check for update". It's going to check, see that there is
no model on your machine, and start downloading it.
So we just need an internet connection the first time, to download the model.
It means that later on, if somebody wants to use the API, the model
has already been downloaded and can be used.
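Because the model may still be downloading, it can help to check readiness before creating a session. This is a sketch only: the capabilities() call and its "readily" / "after-download" / "no" values are assumptions from the experimental API's early previews and may not match your Chrome build.

```javascript
// Sketch: check whether the on-device model is ready before creating a
// session. The capabilities() call and its return values are assumptions
// from the experimental API's early previews.
async function modelStatus() {
  if (typeof window === "undefined" || !window.ai?.assistant?.capabilities) {
    return "unavailable"; // flags not set, or not a supporting browser
  }
  const { available } = await window.ai.assistant.capabilities();
  return available; // e.g. "readily" once the component download finished
}
```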
So what's the next step?
What lies ahead?
One of the most important steps after my talk, after the
things that we just learned, is you.
I want you to join the built-in AI program from Google.
This is the URL, and this is the QR code to that page.
Just join the program so we can help each other, understand more,
and watch what's going on.
If you face an issue... because we are talking about the future, right?
Since we are talking about a future API that is coming, for sure it's
not stable; even I face issues from time to time.
So keep that in mind; it is very important.
Just get your hands dirty with the code and experiment with it.
So this is the URL and the QR code.
Thanks for joining.
And also, from time to time people ask me: so how do you see this AI?
Where does it go?
Will AI replace us?
To be honest, I always believe in this sentence: the best way to
predict the future is to create it.
Right now we have the API, we have the AI, we have the LLMs,
we have the APIs; everything is there.
Come on guys, we are living in a world full of innovations
and APIs, so let's keep it up.
Just try to use the latest APIs and stuff, start learning, create
something. It doesn't have to be production ready; it could
even be one simple page for yourself.
Yeah, why not?
Let's start using these features.
And in this way, you can predict the future.
You can see what features are coming.
That way, yeah, you can adapt yourself to the future.
So sooner or later, if something happens to our industry because of AI,
you are ready for it; at least you predicted it beforehand.
So let's see some use cases of this API.
For instance: classification, tagging, keyword extraction, helping users
compose text, summarization, generating titles and headlines for
articles, and answering questions based on unstructured data.
You know that the data in web pages or PDFs and so on is somehow unstructured.
The LLM understands it better than us; most of the time the structure
is not tangible to our eyes, but to the LLM it is.
Also, translation between languages is one of those APIs that is coming.
So a lot of features are coming, a lot of use cases for this API.
So yeah, it's us who should think about it and start innovating.
If we know that this API exists, we can adapt ourselves to the newest changes.
So yeah, AI is everywhere.
Let's be part of it.
Thanks for listening!
If you have any questions, my email is here, and also the QR code to my
page; all my social media links are there.
I would be super happy to have a chat with any of you.
Happy to see you guys!
Hope to catch up next time!