Conf42 Python 2025 - Online

- premiere 5PM GMT

Implementing Agentic AI Solutions in Python from scratch


Abstract

The use of AI and AI agents in everyday Django can be viewed as "AI as API", where we can create powerful agentic APIs on the Django side.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, everyone, and welcome to Implementing Agentic AI Solutions in Python from Scratch. Here is the repo, and you have full access to it; it has all the code samples. I'm using my notes here, which you'll also have access to, as well as an HTML version if you need it.

So who am I? I'm one of us, a regular Pythonista. I was in tech in the early 2000s as a business information architect and certified Microsoft SQL Server DBA for about four years, and then I returned in 2017 via WordPress and JavaScript frameworks, moving to Python and machine learning in 2021. Currently I'm working on a project, AI-powered knowledge systems, building a book framework similar to my pytest full stack book, which is at pytestcookbook.com. I've also got some useful notes on Django full stack testing, and that is my main project at the moment. I'm based in Brighton in the UK, down on the south coast, and here is our lovely beach. I'm a volunteer coach at codebar.io, which I find very rewarding; we meet every two weeks. I've just got myself a new fox red Labrador pup, Leo, much earlier than planned, and locally we have a red fox that is quite tame and seems to check Leo out; they both stare at each other, wondering who's who. My first computer was in 1979, with a paper tape reader and a teletype printer for output, and cut and paste really was cut and paste.

So what are AI agents? The word "agent" is the subject of much discussion in academic circles, but if we look at AI agents, we can see that Pydantic has its version (a primary interface for interacting with LLMs), Anthropic defines workflows and agents, and Hugging Face says AI agents are programs where LLM outputs control the workflow. What we're going to do is look at examples of code to see what AI agents are and what they can do. For example, if we look at this link, we will see the range of AI agents being created: 46 categories and 825 different agents across different areas, so it can be quite overwhelming to know which framework to use and what they all are.

What's the aim of my talk? It is to demystify AI agents and AI programming, because it can seem like another, different world of development for us Pythonistas. What I'd like to propose is this: what if AI agents are just Python code with a REST API call, admittedly to a very magical API? Then we could use day-to-day Python design patterns to handle the responses we get back from these API calls to the LLMs. So the main focus of this talk is to demystify and simplify, not to focus on actual real-world applications. With that in mind, we don't need to fully grasp the code this time round; it may take a few iterations to fully grasp it. Perhaps look at the high-level view, see how it is different from just regular Python, and realize that AI agents really are Python code with REST API calls to an LLM, admittedly a very magical REST API. In fact, I'd like to propose that there's no real difference from our regular day-to-day Python code. It is very much like a mouse that we turn around 180 degrees: it's still the same actions, up, down, left, right, but in a different way, in a different paradigm, and that can be a little bit tricky to get to grips with initially. In this regard, there are three areas that I consider to be part of agentic AI.
First, it seems that we can almost create on the client side the endpoints, the routes, that we would normally build on the server side. Second, we use natural, human language (in my case English) to create code, very much like pseudocode. And third, we give it a sense of autonomy, in the sense that the LLM can direct the flow of the app. That could be within bounds that we have created, but basically the next step our app takes can be determined by the LLM.

Before we go into some code examples, let's refresh ourselves on what a REST API call is before we use any library implementations, by using just the requests library, so we can see how we actually do a POST request to our LLM rather than using, say, an OpenAI library that hides the implementation of the request. We have our model and we have our endpoint, and it's worth noting that we only have one endpoint, one route; we'll see why that matters later on. We need to send our token, our API key, in the headers. We send our payload: for example, the model, a list of messages, whether we're streaming, the temperature. And with requests, what we do is send a POST with all these details to get a JSON response. Bear in mind that the request body is a string of characters and doesn't contain any objects or other data types; basically it's a json.dumps, or if we were in JavaScript it would be a JSON.stringify, where all that information is serialized as a string.

So let's look at our very first file, 01, to see this in action and to see the basics of an agent application. The first thing we do is load in our imports (os, json, requests) and read our keys from the .env file. I'm going to keep mine secret here, but you have a copy as .env.sample; just paste your OpenAI key in there. We then load in our API key and check that we have it, which we see we do. We select our model, which in this case is GPT-4o-mini, and check that it's there. For the demonstration, we're going to create our own class to show how we can make a request to the endpoint; later on we'll use one of the libraries from OpenAI, but here we see, in a very raw form, how we would make our own request to an LLM directly at its endpoint.

So we're making a POST request to this endpoint, and in this class we use this one single endpoint; there's only ever one endpoint, it's not that we have different routes for the different tasks we want to do. We have the temperature, a hyperparameter that controls how the token probabilities are used: zero means it's as strict and as deterministic as possible, and if we want to be more creative in generating text or images we would raise the temperature to, say, 0.5 or 1; the range is between 0 and 2. We have our system prompt (we'll get into that), our API key, and our headers for authentication, including the content type. This is our request. When we want to generate some text, we pass in a prompt, our query, and we pass in the payload: the model, the messages (the system prompt and our prompt), whether we're streaming (in this case false), and the temperature. Then we make our request to that endpoint with the headers and the payload, and for future reference we can pass the endpoint explicitly as url= to make it a little clearer. A minimal sketch of this kind of raw call is shown below.
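Here is that sketch: a minimal, illustrative version assuming the OpenAI chat completions endpoint, the gpt-4o-mini model, and python-dotenv for reading the key. The class name and method names are my own stand-ins, not the exact code in the repo.

```python
import os

import requests
from dotenv import load_dotenv

load_dotenv()  # read OPENAI_API_KEY from a local .env file
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-4o-mini"


class LLMClient:
    """Tiny wrapper that POSTs directly to the one chat completions endpoint."""

    def __init__(self, model, system_prompt, api_key, temperature=0.0):
        self.url = "https://api.openai.com/v1/chat/completions"  # the single endpoint
        self.model = model
        self.system_prompt = system_prompt
        self.temperature = temperature
        # the API key travels in the Authorization header
        self.headers = {
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        }

    def generate_text(self, prompt):
        payload = {
            "model": self.model,
            "messages": [
                {"role": "system", "content": self.system_prompt},
                {"role": "user", "content": prompt},
            ],
            "stream": False,
            "temperature": self.temperature,
        }
        # requests serializes the payload to a JSON string (json.dumps under the hood)
        response = requests.post(url=self.url, headers=self.headers, json=payload)
        response.raise_for_status()
        # the answer lives in choices[0] -> message -> content
        return response.json()["choices"][0]["message"]["content"]


if __name__ == "__main__":
    client = LLMClient(
        model=MODEL,
        system_prompt="You give concise answers to questions with no more than 100 characters.",
        api_key=OPENAI_API_KEY,
    )
    print(client.generate_text("What is Pydantic?"))
```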
Now that we've got that class, let's create an instance. We pass in the model we want and the system prompt, which gives the character, the personality, the role of our agent; in this case, "You give concise answers to questions with no more than 100 characters." We can get into more complex system prompts later; that's the territory of prompt engineering. We can then make a request asking "What is Pydantic?", and if we print the response we get back, it is originally a stringified JSON object with all this information. But if we drill down through choices, message and content, we end up with the response we want: Pydantic is a data validation and settings management library for Python. You can see how it's respecting the 100 characters. So this is our base, and we can see that without using any of the AI libraries, merely using requests and an env loader, this is how we would send a POST request to the LLM with all the details it needs, and this is what we get back. If we hide this and show the full response, we can see it's quite complicated and quite detailed, but choices is where the answers come back (we want the very first one), and as we work through we see message, and we look for content, where the answer to our query lies. So this is our base template, and from here on we'll replace it with the OpenAI library for our calls.

Let's now move on to the second file, 02 API. Here we're going to get a joke back, but we're also going to do a little bit of prompt engineering to get the LLM to rate the joke and give us the next step. We again load in our usual imports (I'm using rich to colorize the console output), and we know we can make a request to some random joke API to get a joke using the requests library; we won't go into that. We load in our OpenAI key and get our model as usual, and then we set up a more elaborate system message. We start with our basic system message: "You are an assistant that is great at telling jokes." But we want something a little more advanced. It's almost as if we were sending the endpoint a new endpoint that would do something totally different for us, except that we do this on the client side.

So let's add this extra prompt. I'm calling it prompt engineering because it's separate from the system message; it's totally customizable and doesn't have to be called prompt engineering. We're giving it a set of instructions, and we could almost consider this to be pseudocode: a joke worthy of publishing is a joke that has a rating of 8.5 out of 10 or above; if the joke is worthy of publishing, also include the next step, whether to publish it; otherwise tell us what the next step would be, which would be to retry. We give it an example of what we want back. This is called one-shot prompting, or multi-shot prompting when you give more than one example; basically we're giving it an example of what we would like to get back: "Please supply the response in the following format", in JSON. And to help it (although it works anyway), let's be more specific and say "the following JSON format", because we want to give clear instructions.
And here we see we get the setup and the punchline, we ask it to give us a rating, and we also tell it, based on the rating and whether it deems the joke worthy of publishing, to send the next step, which we pass higher up the app chain: either publish or retry. We're also giving it some further instructions: remove all backticks and any unnecessary characters. Once again I wrote JSON FORMAT in capitals; using capitals can be a useful way to emphasize things for the LLM. We also say that if it suggests a retry, it should not repeat the joke, and we can even say thank you.

So this is our pseudocode. This is almost a new endpoint that we would like to have, where we could send a request and it would process something different rather than just returning us a joke. But what we actually do is send that "code" from the client side up to the one endpoint, and that's the mouse turned 180 degrees: it's like we are now creating our own REST endpoint, our own route, here. So we add those two strings together so that it's now the system message. We also have our user prompt, which is "Tell a light-hearted joke for an audience of Pythonistas." When we send this to the LLM, we send a list of all these messages, and the convention is: a role of system, whose content is the system message; a role of user, whose content is the user prompt; and we can also have a role of assistant, the AI's response back from the LLM, because we may want to filter the messages. That's the practice with these LLMs: a system message, a user message, and an assistant message.

We complete as before, but this time we are using OpenAI. If we scroll up to the top, we're importing from openai, so we can just create the client here, an instance of OpenAI. Therefore, when we send the request, we don't need to be as detailed; we can use a convenience method, client.chat.completions.create. We send the model we want and all our messages (these prompts), and when it comes back we get a response. Again it's in choices, then message, then content, and we display it here. So we now get a JSON object with the setup, the punchline, the rating, and what it advises to do next in the flow, which here is publish as opposed to retry.

Now, in our app we might have a state object that holds all of this information, because we've made the REST API call to the LLM, but what we do next is day-to-day Python; it's any system design: pub/sub, actor model, finite state machine. We've been told exactly what to do next. We can pass that over to another agent, or we can act on it ourselves. What I'm doing here is extracting it out nicely as JSON data, and when I look at it I can say: if the result's next step equals publish, for example, I can load it into our state object and go on to publish it, sending a message to another part of our program or to another agent. The main thing is that we're getting almost an event-driven application, but we've asked the LLM to decide the next step. That's the autonomy, because it could have come back with retry, in which case the flow of the program would go in a different direction. And so what I've just done here is extract out the next step, which is publish. A condensed sketch of this whole flow follows below.
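Here is that condensed sketch, using the openai package this time. The prompt wording and the JSON field names (setup, punchline, rating, next) are my own stand-ins for whatever the repo uses, and a try/except around the JSON parsing would be sensible in real code.

```python
import json
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"

system_message = "You are an assistant that is great at telling jokes. "

# the client-side "endpoint": instructions plus an example of the JSON we want back
prompt_engineering = """
A joke worthy of publishing has a rating of 8.5 out of 10 or above.
Reply ONLY in the following JSON FORMAT, with no backticks or unnecessary characters:
{"setup": "...", "punchline": "...", "rating": 0.0, "next": "publish or retry"}
If you advise a retry, do not repeat the same joke.
"""

messages = [
    {"role": "system", "content": system_message + prompt_engineering},
    {"role": "user", "content": "Tell a light-hearted joke for an audience of Pythonistas."},
]

response = client.chat.completions.create(model=MODEL, messages=messages)
joke = json.loads(response.choices[0].message.content)
print(joke)

# from here on it is day-to-day Python: the LLM's "next" field directs the flow
if joke["next"] == "publish":
    print("Publishing:", joke["setup"], "...", joke["punchline"])
else:
    print("Rating too low, retrying with a fresh request")
```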
So this is how our app can have its flow and direction directed by the LLM, as opposed to by us imperatively. To recap: the main thing is that we're structuring what we would like to get back. Rather than just getting a joke, we're sending up an endpoint, we're sending up instructions. This is our pseudocode, our REST "endpoint" that we send from the client up to the server along with our payload, and then we get the response back in the shape we want. We're going to see various forms of these prompts; they can become more advanced and more structured as we go along. When we get the response back, we can then decide what we want to do next, and if we have many agents, we may keep track of where each agent is. So that was 02 API: an initial start into prompt engineering, and into how we use this client-side REST API coding in natural language, enabling autonomy to take place in our AI agent. We've now seen two of the three areas of this reversed process, creating our route, our endpoint, on the client side and using natural language, and we've briefly looked at autonomy.

So how do we handle this autonomy of the flow of an AI agent app? Well, we asked our LLM to give us not just a rating but the next step, and in this case it was publish. What we'd like to do now is begin to see how we can handle this in our app, and our next example leads into the idea of an FAQ, or a sort of router pattern, a sort of if/else. What I like about the FAQ pattern, which we're going to see now, is that we can introduce a little bit of AI into our app. So let's look at this example, 03 FAQ. It's an example of a sort of retrieval-augmented generation. Now, we're not querying documents, but RAG is basically supplementing our query with additional information that we haven't fine-tuned our model with, or that's important for our LLM query. So we're going to give it a list of frequently asked questions and have a little chatbot experience, and this paves the way for the next file, which will be a sort of router pattern.

Once again we load in our imports, we get our key, we get a utility client from the OpenAI library to ease our connection to the LLM, and we pick our model. We create a function that takes some history (all the previous messages) along with the system message and the user prompt. The history is important because it gives us a record of what went on before: every request is stateless, as if the LLM is seeing it for the first time, so we need to pass the history back with each request so that it has context and a certain degree of memory. So in our chat function we have our system role, we add in the history of all the previous messages from our chatbot, and then our user message. We can print out the history and the messages. We're using the stream option for our chatbot, which we'll see in a minute, and we then chunk out the responses. This is where we start to build more context into our chatbot agent: we're saying it's a helpful assistant for a shoe store, and if a user asks a question, please be as helpful as possible, in as courteous and professional a manner as possible; you are provided with the following facts to help you; please be verbose and suggestive. A sketch of this little chatbot is shown below.
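Here is a sketch of that chatbot, with a handful of made-up shop facts standing in for retrieved content and Gradio providing the interface. Note that the shape of history depends on the Gradio version; this assumes the older pair-style format, so treat it as an illustration rather than the repo's exact code.

```python
import os

import gradio as gr
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"

system_message = (
    "You are a helpful assistant for a shoe store. Answer questions in as courteous "
    "and professional a manner as possible. Please be verbose and suggestive. "
    "You are provided with the following facts to help you:\n"
)

# stand-in for retrieved content; in a real app this might come from a database
shop_facts = [
    "Opening hours: Monday to Friday, 9am to 5pm. Closed on Sundays.",
    "We sell shoes exclusively; we do not stock belts or other accessories.",
]
system_message += "\n".join(shop_facts)


def chat(message, history):
    """Rebuild the full message list every turn, because each request is stateless."""
    messages = [{"role": "system", "content": system_message}]
    for user_msg, assistant_msg in history:  # pair-style history; newer Gradio uses dicts
        messages.append({"role": "user", "content": user_msg})
        messages.append({"role": "assistant", "content": assistant_msg})
    messages.append({"role": "user", "content": message})

    # stream the reply back chunk by chunk for a chatbot feel
    stream = client.chat.completions.create(model=MODEL, messages=messages, stream=True)
    reply = ""
    for chunk in stream:
        reply += chunk.choices[0].delta.content or ""
        yield reply


gr.ChatInterface(chat).launch()
```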
So now I'm changing the character and the nature of the prompt, adding in a little more verbosity and suggestiveness. Here is a list of some basic facts about our shop, and bear in mind this could be retrieved from a database, or be the result of selecting options from a form for further information. The point is that we pass in some extra, retrieved content (admittedly it's already in the file, but it could have come from a database or some other source) and we join it onto the system message. We can now use Gradio to set up a little chatbot interface to see how this works. And looking at the code while this runs, it hasn't taken a lot to introduce a fairly sophisticated little chatbot based on very limited information, and we could increase this greatly in our app.

So, as that runs through, we have our Gradio interface; let's open it up. Here's our chatbot. Type a message. It's quite simple: I could just type "Sunday", and we see the reply come through: thank you for your inquiry, our hours are Monday to Friday, 9 to 5, and unfortunately we're closed on Sundays. Notice how it's quite verbose and suggestive: "We look forward to welcoming you soon." If we come back to our code, we can see the facts; we don't have very many. This could be a much more complex document, it could be a Markdown file, it could be driven from a database. Let's try another example: "green belts". It thanks us for reaching out, explains that we deal exclusively in shoes, and gives us the address. So this little example, an agentic AI with a minimal amount of RAG, a minimal amount of extra context, can produce a nice small AI feature inside your Python app without the whole app having to be AI. This is what I call a bit of AI programming.

Now, this is quite an interesting pattern, because the next one is the agent router, and it relates to something that happened when I was at Codebar. Somebody said they would like to get a job in AI; they were doing Python. I asked whether they had an AI department where they work, and they said no. I asked what the company does, and they said insurance. And what do you do? They said they don't write the reports themselves; they're the go-to person who, when somebody wants a report, knows which one it is and can run it for them. I thought: brilliant, you can do that. There were concerns ("it would put me out of a job"), and I said yes, but you'll then be the head of the AI department.

So we're going to have a little variation on this FAQ; it's a similar kind of thing. We load in all the usual imports and set ourselves up with the same chat message. A useful tip: caps, italics and even Markdown in one's prompts actually have an impact with the LLM; it's been trained on so much of this that it begins to recognize the emphasis. I'm basically setting up a report agent: you are a report selection agent, you are very good at returning the best report to answer a user's question. For example, if a user wants a joke, you reply (and I'm just using this format for demonstration purposes, it will make it nice and bold) that the tool they need is the get joke report.
If they want total sales, the get sales report would be the best report. So I've made a list of reports here, get weather, hotel booking and so on, very much like we did in the last FAQ, adding them all in. And if I run all of that, we will see that we now have a report selection agent that will pick the right report. If we combine that with information like the date range or any other properties, we could even run the report for them and send it, all through agents. So let's check we've got this working. Lovely. Say I want to take a plane, and notice I typed "plane" rather than "flight": "plane to Rome and auto to Paris". Let's just check the typos there. It comes back with those two reports. Straight away, if we had the extra information sent along with it, say through a form selection, we could then run the right report. So this is an example of a sort of router (what do we want to do next?) achieved through a very simple addition of some context to our system message.

I'm going to return to this slide a number of times, because it's easy, when going through these different patterns and function calling, to lose sight of the essence of an AI agent: AI agents are Python code with API requests to LLMs. We can only pass a string in the API request, and in that string we create a job description. This could be what the role is, what they do, these are the tools you have, here is the data to work on, this is what we want returned, and in what format. This is prompt, or flow, engineering. How we make use of this in terms of design patterns is then day-to-day Python. Fundamentally, it is a function with inputs, LLM magic, and some output returned in the form we want. We'll return to this slide as we proceed through some of the more involved examples.

Now, we've seen how an agent can make decisions about the next step (the autonomy), how we've created our sort of client-side API, and how we use natural language. But an agent may need tools. It may need to make a request to the internet, or call upon a function in our code base to calculate something and use that as part of its response. So in 05 we're going to look not just at how we define tools, but at how an agent can decide which one to use. Usually it's better for an agent to do just one single thing, but sometimes an agent might need to decide, for a particular task, which tool to use. So the agent may need to determine which tool to use, and basically what happens is that we're constantly adding new messages. Now, where do those functions run? We'll see once we've given it the prompt, which we're going to look at now in 05. We do the standard setup, but when we come to our tool prompt, we're saying: you are a system that is very good at determining what tool to use to solve a certain query. Our AI programming here is a description, using Markdown to emphasize that these are the tools. We have two tools, and we describe our calculator tool, which does basic arithmetic and responds in JSON.
We give it an example of the JSON format: when it picks the calculator tool, the signal for what to do next will be to use the do-calculation function, and, for example, what arguments should be passed. That's one scenario. We have a second tool, a joke tool, something totally different, again with a JSON format showing what the result of that tool would look like.

So, for example, we can ask it: what is 10 times 9? When we send that out, it determines that the tool it needs (the response it sends back to us) is the calculator, the do-calculation step, and the arguments that are needed. We can strip all of that out and then say: if the do-next was do-calculation, run those functions; if it was, say, the joke, we might make an internet request to get a joke. Let's go back to the flow of messages. We send all the usual messages, but when the LLM decides it needs a particular tool, it sends back the tool signature and the arguments it has extracted from the query. We then, on our own box (not the LLM's), run that function, get the result, add it back to the list of messages, and send it to the LLM for the next step. That's the pattern we'll see later with planning, where it's thinking, taking an action, getting a result, and putting it back into the loop. This particular example, 05 tool, is just the first step: showing how it can determine which tool to use.

So I'll run everything. As you see, we load in our imports, we get our key, and we build our messages; the tool description is added into the system message at the very end here. This is our code, our endpoint on the client side, in natural human language. It's like the very clear description you give to somebody when they join a company: this is how you do your job, and the more detail and clarity, the better. So we asked, what is 10 times 9? We've added the messages in and got back this response: it has determined the tool it needs is the calculator tool, it knows what to do next (the do-calculation), and it knows the arguments, 10 and 9, and that the operation is multiplication. Notice how it has mapped "times" to multiplication. We can then strip that all out, and in this particular agent we can run some code, or we could pass it on to another agent, or run it through a loop again, which we'll see later on. Because it knows that do-next says do-calculation, it extracts all the information (the tool, the arguments) and then just carries out these functions to produce the answer.

If we do that again with some different numbers, 102 times 3, and run it all, it now picks out arguments 102 and 3 and the operation multiply. Let's do addition: what is 102 plus 3? We run that and see it pick out the arguments and the tool it needs: 102 and 3, and addition. When we come back down to the answer, it produces 105. Let's see whether it notices when it needs the other tool. For example: "Tell me a joke. I'm doing this at a builders' conference." Let's run it all, and now it should determine that the right tool is the do-joke tool, a different tool. A minimal sketch of this selection-and-dispatch step is shown below.
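Here is that sketch. The tool names, the do_next field and the joke API are my own illustrative choices, but the shape is the same: the LLM only ever returns a string describing which function to run, and the function itself runs on our machine.

```python
import json
import os

import requests
from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"

tool_prompt = """
You are a system that is very good at determining which tool to use to solve a query.

## Tools
1. calculator: does basic arithmetic. Respond in this JSON format:
   {"tool": "calculator", "do_next": "do_calculation",
    "arguments": {"a": 10, "b": 9, "operation": "multiply"}}
2. joke: fetches a joke for an audience. Respond in this JSON format:
   {"tool": "joke", "do_next": "do_joke", "arguments": {"audience": "builders"}}

Reply ONLY with JSON, no backticks.
"""


def do_calculation(a, b, operation):
    # runs on our box, not the LLM's
    return {"add": a + b, "subtract": a - b, "multiply": a * b, "divide": a / b}[operation]


def do_joke(audience):
    # any public joke API would do here; this one returns JSON with setup and punchline
    joke = requests.get("https://official-joke-api.appspot.com/random_joke").json()
    return f"For the {audience}: {joke['setup']} {joke['punchline']}"


query = "What is 10 times 9?"
response = client.chat.completions.create(
    model=MODEL,
    messages=[
        {"role": "system", "content": tool_prompt},
        {"role": "user", "content": query},
    ],
)
decision = json.loads(response.choices[0].message.content)

# day-to-day Python routing based on the LLM's decision
if decision["do_next"] == "do_calculation":
    print(do_calculation(**decision["arguments"]))
elif decision["do_next"] == "do_joke":
    print(do_joke(**decision["arguments"]))
```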
And there it is: do-joke. The audience has been picked up as one of the arguments. It doesn't run a calculation here; because the do-next is do-joke, it just goes off to the internet, gets a joke, and then we see the answer. Now, you may be thinking these all seem a little bit the same, the router, the tool use, the basic query, and I suppose they are, because they're just function calls. At the end of the day, we're just writing Python functions, getting a response back from an API, then doing something with that response. The difference is that we create our endpoint, our route, on the client side, ship up that pseudocode, ship up the query, and get the answer back. We also enable a certain level of autonomy, because we can ask the LLM what it should do next based on the prompt we sent it. So this is what is called tool use, or simply function calling, and it's a mechanism for how we do the function calling. Once again, the function runs on our box; we don't run it on the LLM's box. When we get the result, we pass it back to the LLM. We'll see that when we come to the ReAct (reason and act) type pattern for a planning agent.

So we've seen a few small ways to create simple AI agents, and some can be quite powerful utilities, like the frequently asked questions bot or the report selector. What we're going to do now is look at the four main patterns. Andrew Ng, in his lecture listed here, talked about four main patterns: reflection, where the LLM re-examines its own work (it sends the output back to itself with a request to critique what it has just produced); tool use, which we've seen an example of; planning, where it comes up with a plan to execute a multi-step goal; and multi-agent collaboration. We've seen a number of these examples, and now we're going to look at the reflection pattern, the tool pattern, planning, and the multi-agent pattern. So let's go into the code and start with the reflection pattern.

We're now going to look at the reflection pattern. Let us not forget what we're doing in essence: we're creating a big string, a job description, that we send; we get some response; we may append that to a new request or start a new request afresh. In the reflection pattern, we generate a response with our first query, then add that content to a second query where we ask for a critique and further refinement. In some sense the first request can be considered almost like RAG: we're generating some content to add to a new query, augmenting it with a new set of instructions, and getting a response. In this one we're going to ask it to generate some Python code, and then ask it to critique that code and make adjustments to produce a final response. We use our usual standard opening, getting the key and setting the model, and we set the very first system message, the first role, as a Python programmer tasked with generating code. We build our chat history: we append our system content (the job description, as it were), then add to that list the user query, which in this case is to generate a Python implementation of requesting an API with the requests library, and we send that to the LLM to get our response back.
And here we get our response. Before the next step, we also add what we got back to the history. We can see the response from the LLM: it gives us some sample code with an explanation. Now we want to reflect on that: send it back again for a critique or refinement. So in our chat history we now add another system message, saying: you are an experienced and talented Pythonista, tasked with generating a critique and recommendations for the user's code. All of these messages are attached together and sent, because we must remember that each request is stateless; it needs to know what went on before, so we must pass in that history of the conversation. We then get our critique back and display it. Here is the output: "Your code demonstrates a solid approach...", going through it all, all the way down. Then we add this critique to our chat history as well and send it all in again to get a final summary. Here's our final summary, in the form of an essay; scrolling down, here is our final output as requested. Now, how many times we go through this is up to us, but the pattern is literally just making requests, getting a response, taking that response, adding it back into our chat history along with a new prompt to do something with it, and getting a response. It's one function after another: inputs produce outputs, which are fed into the next function as inputs that produce outputs. So this is the reflection pattern, and as we scroll down we see the final answer and the key improvements. Of course, we could make this a class and have the input come through a form field or a chatbot. And once again, if we go up to the very top, we can remember that in essence this is really what we're doing: making a function, passing in inputs, getting an output, and chaining one request into the next.

Next, we're going to look at using planning together with reflection and tool calling. Once again, let's remind ourselves that in this talk we're looking at what AI agents are in terms of their simplicity before we move on to frameworks. We're going to use a pattern called ReAct, reason and act, and essentially we pass the output of each step as an input to the next request to the LLM, like we did in reflection, but we're also adding in some tool calling and, effectively, some routing, because the agent will determine which tool to use and how to use it. We'll do this with file 20, the planning agent with a loop (a .py file), but we'll also do it in the notebook, where we do the looping manually so we can see how it works. In the Python file, we run everything as usual and create an Agent class where we set the client and the system role; we use a dunder call method, and an execute function where we invoke the LLM to get a response. What we want to do in this example is calculate the total price for an item. The two tools we have are calculate total, which, given a price, adds on the VAT, and get product price, which, for a given argument, returns the price of the product. If we scroll down, we see these two functions, calculate total and get product price. So let's look at our system prompt, where we tell it how we want it to work.
We want it to think, take an action, get an observation, and repeat the loop. We give it examples of what the tools are and the kind of response we'd like back for both calculate total and get product price, and we also give it an example session. So a user asks: what is the total cost of a bike, including VAT? We want the AI response in this format: a thought, "I need to find the cost of a bike". We're pipe-delimiting the output so we can extract the function name and the arguments, and it will be an action type as opposed to an answer: we get the tool call and we get the argument. We then send back the response as an observation, the actual return value of that function, which is used as the input to the next query, where it now needs to calculate the total price including the VAT. It's an action; it knows to use calculate total, and it has an argument passed into it. This is just a sample session so it can see the format of what it needs to do, and so it knows we will always be passing in the result of our function calls as, for example, "observation | 240"; that's what it can expect. Then we tell it that once it has the answer, it should print out an answer for us in this form.

So let's run this to see what it looks like. If we scroll down, we can see we've got a loop. We're not going to go through each line of the code, but basically it loops around: if the response is an action, it runs that function, gets a value, and passes it back through to be used again; if it determines it has an answer, it prints the answer and exits the loop. We've got three questions here: the cost of a bike, a TV, and a laptop. Let's run that and see what happens. It starts the loop; it has a thought; it gets an observation passed back in; and each time we get the result, and we can see the summary answers. If we look at one particular loop: it starts, it has the thought "I need to find the cost of a TV", we extract the action, we extract that the function call is get product price along with the parameter, and we get an observation of 200. That gets fed back in, and it now knows it needs to calculate the total including the VAT. It uses calculate total with the 200 we passed in; the observation comes back as 240; and that goes back into the loop to produce the answer.

Now, if you notice, we've missed one here, although we have the others. That's the nature of LLMs sometimes: things do go wrong, because it's probabilistic. We get the first one, the price of the bike is 120, and we do get the result 240 for the TV, but it isn't printed out as an answer. If we run it again, we'll probably see a different result: starting the loop, 120, answer found, the price of the bike is 120; starting again, the price of the TV is 240; and now we've got the price of the laptop, 360, and we get all three answers. So this shows we can't take things for granted; things will get missed, it is not 100 percent deterministic, and we need to put in some checks and balances if we're going to use these results. But this is an example of how we can loop through. What we're going to do now is go to the notebook version and see how we would do this manually, to see what's actually going on.
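Before the manual walkthrough, here is a compact sketch of that loop. The Agent class, the pipe-delimited format, the 20 percent VAT and the made-up price list are my own reconstruction of the idea rather than the exact code in the repo.

```python
import os
import re

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"

system_prompt = """
You run in a loop of Thought, Action, Observation and finally Answer.
Available actions:
  get_product_price, e.g. Action | get_product_price | bike
  calculate_total,   e.g. Action | calculate_total | 200
I will run the action and reply with: Observation | <result>
When you know the answer, reply with: Answer | <the answer>
"""


class Agent:
    def __init__(self, client, system):
        self.client = client
        self.messages = [{"role": "system", "content": system}]

    def __call__(self, message):
        self.messages.append({"role": "user", "content": message})
        reply = self.execute()
        self.messages.append({"role": "assistant", "content": reply})
        return reply

    def execute(self):
        # the only place any "AI" happens: one chat completion call
        response = self.client.chat.completions.create(model=MODEL, messages=self.messages)
        return response.choices[0].message.content


def get_product_price(product):
    # hypothetical price list, running locally on our box
    return {"bike": 100, "tv": 200, "laptop": 300}.get(product.strip().lower(), 0)


def calculate_total(price):
    return float(price) * 1.2  # assume 20% VAT


TOOLS = {"get_product_price": get_product_price, "calculate_total": calculate_total}


def run(question, max_turns=5):
    agent = Agent(client, system_prompt)
    prompt = question
    for _ in range(max_turns):
        result = agent(prompt)
        print(result)
        if "Answer |" in result:
            return result
        match = re.search(r"Action\s*\|\s*(\w+)\s*\|\s*(.+)", result)
        if match:  # run the tool locally and feed the result back in
            tool, argument = match.group(1), match.group(2).strip()
            prompt = f"Observation | {TOOLS[tool](argument)}"


run("What is the total cost of a laptop including VAT?")
```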
In the notebook we use the same setup, the same system prompt, and the same functions, but now we create an instance of this planning agent with the client and the system prompt and pass through the first question: what is the cost of a laptop, including the VAT? When we run that, we get back the result, which is a thought. We can then split on the pipe to extract exactly whether it's an action, what the function name is, and what the argument is. As you can see, when we do that, we find that we need to run get product price, and we need to do it for a laptop. So manually, this is what we would do: if the next function (which we've just determined by extracting this item from the output) is get product price, run get product price with that value; the argument is the next item along. We get 300, and we said in our prompt that we would send it back in the form of "observation | result", so that's what we send as the next prompt. We run it with our agent, and it determines that it still needs another action, but this time it's calculate total, and it needs the price of 300 to be calculated with VAT. So once again we strip out the function and the argument, calculate total and 300, and once again we run this manually. There are different ways we could run it (using an eval-style dispatch, for example), but for simplicity we just run calculate total by hand and get the answer, 360. Again we send that back in as "observation | result", as we specified in the system prompt; that's the next prompt, passed into the agent as observation 360. We print the result: now that it has determined it has an answer, it prints it out in the format we want: answer, the price of the laptop, including VAT, is 360.

If we go back up and change it to work out the price of the TV, we see exactly the same again. Running through: the action is get product price, but for a TV; it runs the function and gets the price, 200; we send it back in, saying the observation from the previous action is 200; it extracts what it needs and now wants to calculate the total with the VAT on 200; we run the function with the extracted argument and get 240; we send this observation back in the form we said we would, "observation | result", into the next query; and we print the result: answer, the price of the TV, including VAT, is 240. Now, there are many ways you could refactor this, for example using an eval-style dispatch so you don't have to spell out each function name step by step; that's effectively what we do in the loop, where we extract all that information and execute it appropriately. So I hope this shows that although there are many different patterns, essentially they involve reflection (sending back a previous output as an input), tool calls, and planning, breaking a task into separate steps that it executes. They all overlap and are interconnected.
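As a companion to the loop sketch above, here is a minimal sketch of the reflection pattern from earlier, generate, critique, refine, again with my own wording for the prompts rather than the repo's.

```python
import os

from dotenv import load_dotenv
from openai import OpenAI

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
MODEL = "gpt-4o-mini"


def complete(messages):
    """One plain chat completion call; everything else is ordinary Python."""
    response = client.chat.completions.create(model=MODEL, messages=messages)
    return response.choices[0].message.content


chat_history = [
    {"role": "system", "content": "You are a Python programmer tasked with generating code."},
    {"role": "user", "content": "Generate a Python implementation of requesting an API "
                                "with the requests library."},
]

# 1. generate a first draft
first_draft = complete(chat_history)
chat_history.append({"role": "assistant", "content": first_draft})

# 2. reflect: send the whole conversation back with a critic's job description
chat_history.append({
    "role": "system",
    "content": "You are an experienced and talented Pythonista. Critique the code above "
               "and recommend improvements.",
})
critique = complete(chat_history)
chat_history.append({"role": "assistant", "content": critique})

# 3. refine: ask for a final version that applies the critique
chat_history.append({"role": "user", "content": "Apply the critique and return the final, improved code."})
print(complete(chat_history))
```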
And I hope this explains how AI agents work. If we ask where the AI is in all of this, it is literally just in the execute function that calls the LLM. The AI part is where we invoke the instance of the agent with the next prompt to produce these results; the rest is day-to-day Python, stripping things out, sending them back into the loop, and then making another API request to the LLM.

Well, we've just been through quite a lot of code, and it's really something to digest offline. As I've said, it took me quite a while to work through it and get it clear in my own mind so I could explain it to people, so if you don't feel it has sunk in, that's not a problem at all; it's probably to be expected. That's why I've given you the repo and all the code examples, so you can work through them, chop them up, and change them around until you understand them. We've seen reflection, we've seen tool use, and we've seen a sort of planning where we break a problem down into steps and work through them. We haven't really seen multi-agent collaboration, and the reason is that it's essentially a design decision about how one wants to use what one gets back from each agent with another agent: is it the actor pattern, pub/sub, a finite state machine, as in, for example, LangGraph?

What I'd like to talk about in closing is the libraries and frameworks that are available. I like to think of libraries as frameworks without the framework, as it were: lots of convenience, tools, and simplicity, on top of which we can build the overall design pattern. Pydantic AI is a new one that arrived towards the end of 2024; Pydantic is well known in the Python community, and as you can see, structured output is very important, because we want to pass structured output from one agent, or one part of our app, to the next. Hugging Face smolagents is another very small, lightweight library with which one can create one's own overall framework or design pattern. Then we see many different crews and swarms and frameworks, and these are basically design patterns for intercommunication between bits of code, our AI agents. We saw earlier in the AI agents directory how many different frameworks are out there. If we start with the simplicity, as we have been doing, we might refactor it, build our own library, realize other people could benefit from it, customize it to make it easier to use, and then add another framework to the AI agents directory.

In terms of frameworks, there's LlamaIndex, LangChain, LangGraph, AutoGen, CrewAI, and very many more, all very good and all very useful. Hopefully, having gone through the simplicity of these AI agents, not only could you design your own mini framework, but when you come to use these frameworks you will actually understand what they're doing and how they're doing it. So, in summary, I hope AI agents have been demystified and that this helps us understand what they can do, enabling us either to build our own frameworks or to use existing ones with a deeper appreciation and understanding of how they work. Anthropic, in their blog, have set out when and when not to use agents, and I'll just read it out, because it sums it up better than I could.
"When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all. Agentic systems often trade latency and cost for better task performance, and you should consider when this trade-off makes sense. When more complexity is warranted, workflows offer predictability and consistency for well-defined tasks, whereas agents are the better option when flexibility and model-driven decision-making are needed at scale. For many applications, however, optimizing single LLM calls with retrieval and in-context examples is usually enough."

Thank you very much for letting me present this topic on AI agents, their simplicity and their power. My name is Craig West. Thank you very much.

Craig West

Backend Pythonista

Craig West's LinkedIn account


