Conf42 Prompt Engineering 2024 - Online

- premiere 5PM GMT

Prompt Engineering Simplified

Abstract

Everything is going well until a prompt that usually works suddenly goes off the rails. Sound familiar? Prompts can be tricky, and models are non-deterministic, but with a few prompt engineering basics, you can regain control, improve consistency, and achieve more reliable outputs with AI.

Summary

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, how's it going? My name is Dan Cleary. I'm the co-founder of PromptHub, and today we'll be talking all about prompt engineering. We'll talk about why prompt engineering matters (or is it dead?), system messages versus user messages, how different models require different prompts and prompting methods, a variety of ways you can try to get better outputs through prompt engineering methods and best practices, whether persona prompting even works, meta prompting, and a whole bunch of templates and takeaways throughout.

I like to start with this question: why prompt engineering? This was a pretty popular opinion early on, that prompt engineering wasn't really a thing. But over the months and years since ChatGPT came out, I think people have found that it can actually be quite hard to get the type of output you're looking for from a model consistently, and that's where prompt engineering comes into play. Small changes end up making a big difference because of how a model's latent space works. If you say "write code to render this image" versus "write secure code as if you were John Carmack" (a famous software engineer), you'll get drastically different outputs just from those small tweaks. I think that will always be there in some capacity. You may have to do less of the engineering and method-type work in the future, but there's always going to be at least a small place for this.

Another reason it's important is that it's one of the three major ways to get better outputs from LLMs, and it's the starting point. You do a bunch of prompt engineering, see where you're at and what problems you're running into, and then decide whether you need to turn to other methods to solve the remaining problems. It's the starting point for a lot of teams, and it's very accessible: you can be technical or non-technical and get up and running very quickly. And lastly, you really can't avoid it. For all those reasons, that's why I believe it's important, at least for the time being and into the near future. In the same way that good UI/UX and product experience is a competitive advantage, having prompts that work well is a similarly important competitive advantage for AI teams.

Now we'll talk about the different types of messages that models support, namely system versus user. The system message, as you can see here, is something like "You are a helpful assistant." Then we have the user message, which is the prompt you're sending to the model to get an output, and so on. The system message is optional when you're calling the model via the API; it's the stuff behind ChatGPT's interface, what OpenAI has programmed the chatbot to sound like and how to think. It's used to set context and rules, the higher-level things rather than low-level instructions: setting the role, providing context, guiding model behavior, controlling format, things like that. The prompt, or user message, is where you get more specific: the contextual info, the focus, and so on. We have a couple of examples in the wild here from companies like OpenAI. Anthropic also published theirs in their documentation, so you can see the system messages that power their Claude chatbots.
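To make that system-versus-user split concrete, here's a minimal sketch of what it looks like through the OpenAI chat completions API in Python. This is illustrative only; the model name and message contents are placeholders, not something from the talk.

```python
# Minimal sketch: the system message sets the high-level role and rules,
# the user message carries the specific task. Assumes the OpenAI Python
# SDK (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        # System message: role, context, behavior, output format
        {"role": "system",
         "content": "You are a helpful assistant. Always respond in JSON."},
        # User message: the specific task and its contextual info
        {"role": "user",
         "content": "Classify the sentiment of this review: 'Setup was painless.'"},
    ],
)

print(response.choices[0].message.content)
```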
And so now we'll talk about how different models require different prompts. If you're switching between providers, or even between models from the same provider, you've probably run into this: each of them has its own differences in the way it handles tasks and the way it sounds and responds.

For example, take chain of thought. This is thinking step by step, prompting the model to do some reasoning before giving an output. There was an experiment that found chain of thought actually reduced the performance of PaLM 2, which is an older Google model at this point, but it goes to show that doing what's considered a best practice didn't help performance; in this case, it actually made things worse. That just goes to show that one size really isn't going to fit all, which brings us to the second paper we'll talk about, from VMware. They tested a wide variety of prompt openers, descriptions, and closers and assembled prompts from all of those different parts. Here are a couple of examples from a math dataset with various system messages; some have a role, some don't, and so on. Then they had the model generate the best prompt for that specific task, letting it do its own metaprompting based on the task and the outputs it was receiving. This is what Llama 2 created, and you can see it includes some Star Trek language. That's after running hundreds of prompts and hundreds of rounds of metaprompting in an experimental setting with a high degree of statistical significance. Versus Llama 2 13B, same family, same provider, there's no mention of anything Star Trek; it's all very cut and dry. And then there's the one that was pretty popular last year, the "take a deep breath and work on this problem step by step" instruction. That came from a similar experiment where they had the model figure out a top instruction for itself and ran a bunch of tests to see which one rose to the top. These are the top instructions for a variety of models, and you can see GPT-4's is something like five times larger than PaLM 2's. It just goes to show that everything is going to be a little bit different depending on the model, and that's why it's important to test these things. And if you're looking for more information on models, like max tokens, context windows, costs, features, and functionality, we have a directory that we just launched with all of that information for the most popular model providers.

Alright, so moving on to my two favorite prompt engineering practices, the ones we tell the teams we work with to focus on. The first is giving the model room to think. This is a popular one and relates to chain of thought to a degree: you want to let the model think, not force it to give an answer immediately or overly constrain it. You want it to go through some sort of reasoning process and come to an answer on its own. The second is using delimiters, or some other way to better structure your prompt. I can't tell you how many times I've had a team say, "Hey, we're struggling with this prompt, can you look at it?" and I look at it and can't even see what's going on. A good litmus test is to have someone else look at your prompt and see if they can follow your organization. If not, start providing some of that structure via delimiters, backticks, quotes, whatever is going to help structure the prompt better.
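As a rough illustration of both practices together, structure via delimiters plus room to think, here's one way a prompt template could be laid out. The section names, wording, and example input are assumptions for the sketch, not a template from the talk.

```python
# Illustrative only: a prompt that separates instructions, context, and
# input with delimiters, and asks the model to reason before answering
# rather than forcing an immediate verdict.
feedback = "The new dashboard is slower, but support resolved my ticket fast."

prompt = f'''### Instructions
Classify the overall sentiment of the customer feedback as positive,
negative, or mixed.

### Reasoning
First, think through the distinct points the customer makes and whether
each is positive or negative. Then weigh them before deciding.

### Feedback
"""{feedback}"""

### Output format
Reasoning: <your reasoning>
Sentiment: <positive | negative | mixed>'''

print(prompt)
```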
Now we'll look at a few prompting methods. Just to set the stage, a zero-shot prompt is just a normal prompt; if you ever hear that referenced, it's basically a typical prompt like this. A few-shot prompt is when you include examples inside your prompt. So this would all be one prompt, but we show the model examples inline: we're classifying the sentiment of this feedback, this person's positive, then negative, then positive, and then we let the model fill in the blank. You can also do this via multiple messages, sending an array of messages through the API, and that's technically what few-shot really is, I think, rather than having it all in one message, because the model handles examples differently when they're in its history versus reading them all in a single prompt. Both ways are effective, and both ways are worth testing.

Few-shot prompting is really helpful in a variety of domains. It can help with structure, format, content, style, and tone; those are the big areas where I've seen it be helpful. How many examples? The great benefit here is that you get a lot of the gains from just one or two examples, and then it plateaus and can even degrade in a lot of situations. We say start with anywhere from two to five examples, and if you're still not getting the performance you're looking for, you might need to look elsewhere, because you risk starting to degrade performance. A couple of other important best practices: use diverse examples (if you're doing sentiment analysis, don't use only positive ones; use a combination that covers the range of what you expect in your application, including edge cases), randomly order them (so you don't have all the positives in one section and all the negatives in another), and make sure they follow a common format so the model can better learn in context. We have a whole guide on this as well, with lots of examples and templates.
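Here's a rough sketch of the message-array variant of few-shot prompting, where the examples sit in the conversation history rather than inside one long prompt. It assumes the OpenAI Python SDK; the labels and example feedback are placeholders, not examples from the talk.

```python
# Illustrative sketch: few-shot sentiment classification passed as a
# message array. Examples are diverse, not grouped by label, and follow
# a consistent format. Assumes the OpenAI Python SDK (v1.x).
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "Classify feedback as positive or negative."},
    # Few-shot examples as prior user/assistant turns
    {"role": "user", "content": "Feedback: 'Onboarding took five minutes.'"},
    {"role": "assistant", "content": "positive"},
    {"role": "user", "content": "Feedback: 'The export keeps timing out.'"},
    {"role": "assistant", "content": "negative"},
    {"role": "user", "content": "Feedback: 'Support answered right away.'"},
    {"role": "assistant", "content": "positive"},
    # The actual input we want classified
    {"role": "user", "content": "Feedback: 'I can't find the billing page.'"},
]

response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```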
Alright, next up is "according to" prompting, which is basically trying to ground the model in a specific set of information. You can see in this original prompt that it's asking a question and then just adds "according to Wikipedia," trying to guide the model to the type of answer you're looking for. This can be helpful, especially if you've done some fine-tuning, and here's an example of that. Again, all the templates are available in PromptHub under the templates tab, for free.

Then the last one here is called step-back prompting. This is a similar, I'd say, variant of chain of thought prompting: you send a question, you have the model first think step by step about the abstract concepts behind it, and then use those abstractions to reason through the question or the task. You prompt it to do that thinking first, and it's similar to what you see with o1-preview and o1-mini, where the model thinks about the steps, reasons about them, and then solves the problem. Again, these are all linked in PromptHub at app.prompthub.us/templates.

Next up, and my favorite one, is persona prompting. This has been very popular for a long time now; it's giving the model a persona to solve a certain task. There are a lot of papers on both sides of this in terms of how effective it is, and I've come out the other side thinking it's actually not that effective in certain use cases. The main reason comes from a learnprompting.org paper, which is linked here. I had the intuition that it wasn't great for accuracy-based tasks, and this really reinforced that. They set up an experiment, ran 2,000 prompts on MMLU, so a knowledge-based task, and gave the model a bunch of different roles. The long and short of it is that when they told the model it was a genius, it scored lower than when they told it it was an idiot, or something like that; the genius was actually the worst performing role here. So how can you reconcile that and still think role prompting works? I don't know; we'd have to look at other data, but this seems pretty strong. It may just be anecdotal, but I'd say persona prompting is helpful for tone and style, so content generation and things like that, but not for increasing accuracy.

Last up, we'll talk a little bit about meta prompting. What is meta prompting? It's a prompt engineering method that uses the LLM to help you write your prompt, so using ChatGPT to help write your prompt. We're big proponents of this; we think this is how prompt engineering should really be done, alongside the model you're working with. The same way people use AI and LLMs for writing and coding, you should use them for prompt engineering as well: work together to form a good prompt for your use case, then go test it, and continue that iterative loop. There are a bunch of tools out there to do this. We launched one that runs a different meta prompt to generate your prompt depending on the provider you're using, because as we saw before, every provider is different, so we've baked those differences into the meta prompt for each provider. It leverages best practices, it's free, and you can use it in our app without an account. Anthropic was one of the first to do this, and I think they have a really great grasp on prompt engineering in general; you can use it in the Anthropic console, it has a bunch of best practices built in, it's open source, and it does charge, but the cost is nominal. And then OpenAI actually just released one in the past month or so. You can use it in the playground; it generates system messages only, but it's still usable and fun. We did a little bit of prompt injection to get the prompt behind it, because it wasn't open source, and that was really cool to see. It's always interesting to see how the model providers write prompts. That's available in PromptHub as well.
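To give a feel for the idea, here's a rough meta prompting sketch, asking the model to draft a prompt for you and then iterating on it. The wording, task, and model name are illustrative assumptions, not the meta prompt behind any of the tools mentioned above.

```python
# Illustrative meta prompting sketch: have the LLM write the first draft
# of your prompt, then review, test, and iterate alongside the model.
# Assumes the OpenAI Python SDK (v1.x); the task text is a placeholder.
from openai import OpenAI

client = OpenAI()

task = "Summarize customer support tickets into a title, a one-line summary, and a priority."

meta_prompt = f"""You are an expert prompt engineer.
Write a prompt for the task below. The prompt should:
- set a clear role and context,
- structure instructions and input with delimiters,
- include 2-3 few-shot examples,
- ask the model to reason briefly before giving its final answer.

Task: {task}

Return only the prompt."""

draft = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[{"role": "user", "content": meta_prompt}],
)
print(draft.choices[0].message.content)  # review, test, and keep iterating
```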
Wrapping up, here are four things you can do today. Structure your prompts with headers and delimiters; that's going to be a big help. Be specific: the more specific you are in your instructions, the better your prompt will be. You could throw out all the other methods, and if you just nail that part, that's great; if you don't nail it, all the other stuff really isn't going to help you. Add examples via few-shot prompting; I think meta prompting plus few-shot plus chain of thought is really going to be the winning formula going forward. And finally, don't overly constrain the model.

Thank you, I hope you enjoyed this. If you want to talk about any of it, feel free to reach out; we're active on LinkedIn. Have a great rest of your day.

Dan Cleary

Co-founder @ PromptHub

Dan Cleary's LinkedIn account


