Conf42 Prompt Engineering 2024 - Online

- premiere 5PM GMT

Prompt Engineering for Test Automation: Enhancing AI-Driven Quality Assurance

Abstract

Discover how prompt engineering can revolutionize test automation by improving the accuracy, speed, and reliability of AI-driven quality assurance in modern software development pipelines.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome everyone. I'm Ludovico, and today we're going to explore how prompt engineering can enhance test automation by improving the accuracy, speed, and overall reliability of AI-driven quality assurance.

Before starting, let me introduce myself. I'm a Senior Test Engineer at NearForm, a SheTech ambassador, and a Grafana Champion. I've also co-founded different startups in the tech field, I contribute to open source, mainly in the educational space, and I have some passions outside work: cars, animals, and video games. A quick word about my company: NearForm is a software company from Ireland, and we really love contributing to open source. On this slide you can find the numbers behind our open source contributions; if you're interested in seeing more, check out the website. And we're hiring, so check that out too.

Let's start with the talk. This session will address critical challenges in test automation, introduce some practical techniques, and show how prompt engineering can help you build the skills to future-proof QA processes with AI. Why am I bringing you this talk? Because we want to solve real challenges we face in the QA world, gain practical, hands-on skills to check whether prompt engineering can solve our problems, and stay ahead of the AI-driven QA trends that are emerging right now.

So, what is prompt engineering? I created a simple, quick definition: for me, it is designing precise, contextual inputs to guide AI models towards specific outputs. In other words, it is the art and science of crafting the right inputs so that the model's responses align closely with your goals.

And what is the connection between prompt engineering and QA? In the context of quality assurance, prompt engineering is particularly valuable because it allows us to create test cases, simulate real-world user actions, handle various edge cases, and validate the outputs produced by the model itself. This is essential to ensuring that AI-driven products are reliable and perform as expected in different scenarios.

Before starting, let's clarify what success looks like here. First, define measurable outcomes that align with your use case. Ask yourself what a successful response from the AI model should look like in your specific context; that can mean accuracy, relevance, clarity, or any output format that is essential for your case. Second, with success criteria in place, set up evaluation methods to measure how well your prompts are performing. For example, if you write down in plain text the outputs you want for a test case, say, "I want test cases for this button, and the output of my prompt should be these particular test cases", then you have an expected outcome on paper (a minimal sketch of this follows below).
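To make that concrete, here is a minimal sketch in Python of what written-down success criteria for a test-case-generation prompt could look like. The field names, the example criteria, and the check function are my own illustration, not a standard format.

# A minimal sketch of success criteria for a test-case-generation prompt.
# The structure and field names are illustrative assumptions, not a standard.

prompt_cases = [
    {
        "input": "Generate test cases for the 'Add to cart' button.",
        # Terms a successful response must cover to count as passing:
        "must_mention": ["disabled state", "out of stock", "quantity"],
    },
    {
        "input": "Generate test cases for the login form.",
        "must_mention": ["invalid email", "empty password", "lockout"],
    },
]

def meets_criteria(response: str, case: dict) -> bool:
    """Check a model response against the written-down success criteria."""
    return all(term.lower() in response.lower() for term in case["must_mention"])

Writing criteria down like this is what turns a vague wish into something you can actually evaluate.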
This is exactly what it means to test a prompt: you check whether the prompts you are using actually give you the output you targeted. It's important to track that performance over time, because it directly improves both your expectations and the performance of the prompt itself.

The last thing you need to do to reach your goal is to start simple, with a first draft prompt, and refine it through many iterations. Your initial prompt doesn't have to be perfect; think of it as a baseline you can improve upon over time. You begin with some kind of idea, verify that the AI you are using can accomplish it, and iterate from there. If you struggle while structuring a prompt, you can also use tools like the Anthropic Console, which you can find online; Anthropic offers a prompt generator, and tools like this can help you generate and improve prompts that align with your needs.

Moving on, we have the prompt engineering workflow, a screenshot from the Anthropic documentation. It is a step-by-step workflow for effective prompt engineering. We start with "develop test cases", because, as I said before, we begin by defining test cases that cover a variety of scenarios and typical cases. Next is "engineer preliminary prompt": with the test cases in mind, we create an initial version of the prompt that will guide us towards the desired outputs. Then we move to the test step, where we actually run the prompt against our test cases to see how well it performs, evaluate the model's responses for each scenario we defined, and note where improvements are needed. After executing the prompt, we write down every output and see where it can improve, which takes us to "refine prompt": based on those results, we refine the prompt to address any issues found during testing; this can mean rephrasing, adding more context, or simply adjusting instructions to make the prompt more effective. The last step is "share polished prompt": once the prompt consistently produces the desired outcome across all the test cases we defined, it is ready to be shared with the other team members, so they can say, "okay, this prompt works, and it gives us what we need." That final polished prompt is the version we rely on in production, and reaching it is the goal of this process. This cycle of testing and refining is also called evals.
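As a rough illustration of that develop-test-refine cycle, here is a self-contained eval-loop sketch in Python. The generate function is a hypothetical stand-in for whatever model API you call; the loop structure, not the stub, is the point.

# Minimal eval-loop sketch: score each prompt draft against fixed test
# cases and keep the best one. `generate` is a stand-in for a real LLM call.

cases = [
    {"input": "test cases for the login form", "must_mention": ["invalid email"]},
    {"input": "test cases for 'Add to cart'", "must_mention": ["out of stock"]},
]

def generate(prompt: str, case_input: str) -> str:
    # Hypothetical stand-in: a real implementation would call a model here.
    return f"(model output for: {case_input})"

def score(prompt: str) -> float:
    """Fraction of cases whose required terms appear in the response."""
    passed = sum(
        all(term in generate(prompt, c["input"]).lower() for term in c["must_mention"])
        for c in cases
    )
    return passed / len(cases)

drafts = [
    "You are a test engineer. List test cases for: {input}",
    "You are a senior test engineer with 10 years of experience. "
    "List edge-case-heavy test cases for: {input}",
]
best = max(drafts, key=score)  # keep the draft that passes the most cases

Tracking a score like this per prompt version is what lets you refine with evidence instead of guessing.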
Evals are crucial, for me and also for Anthropic, for achieving a high level of accuracy and reliability in AI responses. If every time we talk to the AI we create a prompt from zero, from scratch, the results will not be consistently good, because every time we execute a new prompt we have no grounds to trust the outcomes, and over time that becomes a real pain. So keep this process in mind and work that way.

Now I'll show you an example I created of what a good prompt should look like. This is a simple prompt, and as you can read, I give the context: the AI will act as a test engineer with ten years of experience. After that, I say that I will give some screens as input, and once the screens are given to the prompt, it will output all the test cases in the format I defined, both in the chat and in an Excel file. You will see the result in a moment. This is the input I inserted into the prompt: just a simple e-commerce website. And here is the output: a table with plenty of test cases, formatted the way I defined, and on the right you can see the Excel file along with the link to download it. This is just a quick example.

Now we can move to the challenges in the traditional test automation field, because over time the test automation world has encountered different problems. Traditional test automation faces multiple challenges in real life, from adaptability issues to the high manual effort required to maintain stability in CI/CD pipelines. I think prompt engineering can help address these challenges by enabling more adaptive test cases and by increasing the speed of testing and of fixes as well.

Here are some key techniques we can use in prompt engineering. Each of them, zero-shot, few-shot, and chain-of-thought prompting, is essential for creating precise and adaptable responses in different testing scenarios.

Moving on to going beyond manual prompt engineering: while manual prompt engineering relies on trial and error, frameworks like DSPy allow us to structure prompts like code. Instead of writing prompts as plain text, we can use DSPy to code the prompt, achieving more data-driven results and outputs that are much better than a single hand-written prompt (a minimal sketch follows below). Manual prompting can be a starting point, but in the future, give the DSPy framework a try.

How can we efficiently manage and track the performance of different prompts in a dynamic QA environment? This is a question different people have asked me, and my answer is PromptLayer. PromptLayer is a tool that allows you to create and manage different prompt versions in one platform easily. You can track different versions of a prompt, log and analyze how a prompt is actually performing, and see which of the different prompts performs better over time. I think a platform like this is a really good way to iteratively upgrade and improve your prompts.
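Before going deeper into PromptLayer, here is a minimal DSPy sketch of the "structure prompts like code" idea mentioned a moment ago. The model name is an assumption, the configuration API can differ between DSPy versions, and an API key is assumed to be configured, so treat this as a sketch to check against the DSPy docs rather than a definitive implementation.

# Minimal DSPy sketch: declare the task as a typed signature instead of a
# hand-written prompt string. Model name is an assumption; the configuration
# call may differ across DSPy versions. Assumes OPENAI_API_KEY is set.

import dspy

dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class GenerateTestCases(dspy.Signature):
    """Generate test cases for a described UI feature."""
    feature = dspy.InputField(desc="feature or screen to test")
    test_cases = dspy.OutputField(desc="numbered list of test cases")

generator = dspy.ChainOfThought(GenerateTestCases)
result = generator(feature="e-commerce 'Add to cart' button")
print(result.test_cases)

Because the task is declared as code, the framework can optimize the underlying prompt against your eval cases instead of you rephrasing it by hand.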
Coming back to PromptLayer: for QA, for example, we can use it to identify which prompts are performing well and quickly apply updates, for the cases we wrote down using the workflow I presented earlier. Here are some screens of PromptLayer. As you can see, it's a simple platform with different features: creating prompt templates, logging requests, and reusing templates. And this is the editor you can use to actually write a prompt and then check whether it performs well.

Now let me give you some advice from my own practice. I build a custom GPT for each project I'm working on, because I think custom GPTs are currently the easiest way to create something that can really help you with your work. Let me break down what a custom GPT is: ChatGPT itself, if you pay for the plan, allows you to create custom GPTs, which are chatbots you can customize by inserting the instructions the chatbot will execute for you, plus the data it will use to generate better outputs. Taking the user story, the technical details, and the screens of a project, and building a custom GPT from them, increases the chance of getting outputs that fit the context of your project, so please give it a try. And, as the second line here says, every prompt you then create, maybe using PromptLayer or other tools, becomes more project-specific, so you get better outputs as well.

Another thing I find really valuable is creating documentation chatbots with custom GPTs. For each tool, like Maestro or Playwright, what I do is create a chatbot that takes the markdown pages of the tool's official documentation; after that, I have a powerful chatbot that knows everything about that single tool and can answer my questions based on the latest version. Sometimes, when you ask ChatGPT a question, it gives you outdated code and outdated versions of the tools you are asking about, so I think this approach improves the quality of the code ChatGPT actually generates. Here is an example with Maestro: I give some instructions, for example, to answer questions about Maestro and generate code based on the official documentation I uploaded. I took every single page of the official Maestro documentation and put it into this chatbot, and it actually works: the outputs I receive from it are much better than what I get from prompting in a plain ChatGPT session. This is something you can try, to check whether it works for you.

Another topic is agents. With plain prompts you get a static result, but with agents you get much more dynamic interaction, because agents can pull real data from the outside, interact with APIs, and adapt to different contexts, making the responses more precise (see the sketch below).
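To make the idea of an agent concrete, here is a deliberately tiny loop sketch in Python. Everything here is stubbed and hypothetical: ask_model stands in for a real LLM call and fetch_latest_test_results for a real CI API; only the observe-act loop shape is the point.

# Tiny agent-loop sketch. `ask_model` and `fetch_latest_test_results` are
# hypothetical stubs; a real agent would call an LLM and a CI system API.

import json

def fetch_latest_test_results() -> dict:
    # Stand-in for a real call to your CI system.
    return {"passed": 42, "failed": 3, "flaky": ["checkout.spec.ts"]}

TOOLS = {"fetch_latest_test_results": fetch_latest_test_results}

def ask_model(messages: list) -> dict:
    # Stand-in for an LLM call: first request a tool, then answer.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "fetch_latest_test_results"}
    return {"answer": "3 failures; checkout.spec.ts is flaky, look there first."}

def run_agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(5):  # cap iterations to avoid a runaway loop
        reply = ask_model(messages)
        if "answer" in reply:
            return reply["answer"]
        result = TOOLS[reply["tool"]]()  # execute the requested tool
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "No answer within the step budget."

print(run_agent("What should QA look at first today?"))

The difference from a plain prompt is that the answer is grounded in data the agent fetched during the conversation, not only in what you pasted into the prompt.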
We don't have time to look at specific agents, but I know there are agents out there that are starting to target the software testing field as well, so check them out, because they can further improve your outputs if you need something like that.

Let's move to the topic of knowledge memory. Knowledge memory, as we first saw with custom GPTs in ChatGPT when we passed in the markdown files, enables the AI to remember past interactions: previous prompts, previous outcomes, and previous data you passed into the chat. That is really beneficial for the tests you have in mind, especially for workflows involving sequences of actions or complex flows. Knowledge memory matters because, as we said before, it iteratively improves the outcomes of your prompts.

Let me also define the term "prompt tuning", which you may hear in the future. It is a technique that optimizes the prompt itself, not the model. We can't improve the model by ourselves; the model has rules and formulas behind it that we can't touch, but we can tune the prompt. This technique is really useful when we want to expand our detailed test cases, or evolve the test cases we use to check whether our prompts are actually working well.

These are, for me, the best prompting techniques you can use: few-shot, chain-of-thought, and ReAct. Few-shot, as you can read, uses examples for nuanced tasks; chain-of-thought helps the AI expand its reasoning; and ReAct produces dynamic responses based on observations.

Now we move to the four phases I've prepared for you, starting with design. When we in the QA world write down test cases, we think through all the scenarios and check whether anything planned during user story creation is missing. Across these phases, I'm showing you possible ways to use prompt engineering to help you as a test engineer in real-world scenarios. In the design phase, prompt engineering can be used to score the automatability of the test cases we want to implement (see the few-shot sketch below). If you have a list of possible test cases, you can ask the AI which are the main test cases you should start automating, and this works even better with a custom GPT like the one I showed you before, one that has the context of the project. It can really help when you can't easily tell whether some test cases can be automated. Another design-phase use is generating diverse user personas, because a possible user of your website may not be a "typical" one: maybe they browse using only one hand, for example. Having different user personas is useful when you need to create more inclusive and comprehensive test cases for your test plan.
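Picking up the automatability-scoring idea from the design phase above, here is a sketch of a few-shot prompt for it. The 1-to-5 scale and the two worked examples are my own illustrative assumptions.

# Few-shot prompt sketch: score how automatable a test case is.
# The 1-5 scale and the two worked examples are illustrative assumptions.

FEW_SHOT_PROMPT = """You are a senior test engineer. Score each test case for
automatability from 1 (manual only) to 5 (trivially automatable), with a
one-line reason.

Test case: Verify the login button is disabled while the fields are empty.
Score: 5 - deterministic UI state, easy to assert in any UI test framework.

Test case: Verify the promotional email 'feels on-brand'.
Score: 1 - subjective judgement, needs a human reviewer.

Test case: {test_case}
Score:"""

def build_prompt(test_case: str) -> str:
    """Fill the few-shot template with the test case to be scored."""
    return FEW_SHOT_PROMPT.format(test_case=test_case)

print(build_prompt("Verify checkout totals update when quantity changes."))

The two worked examples are what make this few-shot: they show the model the exact scale and answer shape before it sees your case.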
Moving to the implementation phase, when we actually want to implement our code, our tests: during implementation, prompt engineering helps in identifying UI elements and generating diverse test data dynamically. As you can see, there is the XPath name suggester, something I've tried: you pass it some images with some rules, and it tells you the name to use for each of the possible components on the page. There is also the data-driven test generator, which can help you generate diverse test data dynamically; not the test itself, but the test data, so the possible configurations of users and the possible ways a user might interact with your software. And there is automated code generation, because tools like Auto Playwright can actually generate Playwright code based on prompts. If you use this keyword-driven approach, it starts replacing hand-written code, and combined with tools like ChatGPT it lets you implement test automation faster than writing everything by hand.

Then there is the reporting phase, which is a really important one. Here, prompts help you detect things like a recurring failure: if you pass in the logs and the reports produced by the frameworks you use, such as Playwright on Node.js, you can ask the AI to prioritize issues, to fix one thing before another, which makes it easier to focus on the critical areas for remediation instead of checking everything manually. You can also build a strategy around the failures, defining fixing practices based on risk, based on the issues that keep occurring.

The last phase is reading reports. Here, the AI can translate test results into accessible summaries for people who are not technical, and I think this is a really great capability: you take something really technical and translate it in seconds, so that others on the team, like stakeholders, can read what you have done, what the reports say, and what the data means beyond the raw report (see the sketch below). This can really help you communicate the impact your tests are having, and it can also guide the QA focus of the team.
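As a sketch of that last idea, here is one way to condense a machine-readable test report into a prompt that asks for a stakeholder-friendly summary. The report structure below is a simplified stand-in, not Playwright's actual JSON report schema.

# Sketch: turn a test report into a plain-language summary request.
# The report shape is a simplified stand-in, not a real framework schema.

import json

report = {
    "total": 120, "passed": 112, "failed": 6, "skipped": 2,
    "failures": [
        {"test": "checkout > applies discount code", "error": "TimeoutError"},
        {"test": "login > locks account after 5 tries", "error": "AssertionError"},
    ],
}

SUMMARY_PROMPT = f"""You are helping a QA team report to non-technical
stakeholders. Summarize the test run below in three short sentences:
overall health, the riskiest failures, and one recommended next step.
Avoid jargon.

Test run results:
{json.dumps(report, indent=2)}
"""

# Send SUMMARY_PROMPT to your model of choice; the prompt text is the point.
print(SUMMARY_PROMPT)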
Finally, there is one more use: suggesting a testing focus for new features. If we have all of this data stored in one place, and the AI can remember it with the memory we saw before, it can also guide us in improving the strategy behind a new feature. Once we have collected all this information and want to create another feature, we can ask the AI, with all this data, whether it is a good idea or not, because we now have the data to judge whether that feature addresses the issues we have found.

To close, here are three takeaways from this talk. First, prompt engineering transforms QA, and I hope you will take the chance to try the things I've shown you, maybe the custom GPTs, maybe some prompts in PromptLayer, because in my opinion prompt engineering can really help you do your job, also as a QA. Second, effective tools and techniques already exist, like the Anthropic Console and PromptLayer that we saw before; many more tools exist out there. You can start by writing ChatGPT prompts yourself and later improve with an auto-prompting strategy, that is, a way to feed prompts to the AI and iterate automatically until you get the output you need, or with an agent, which is the next step if you don't yet have many skills in this field; but you can certainly start with the ChatGPT or PromptLayer tools. The last thing to remember is that these techniques are really cost-efficient for QA processes: a lot of people around the world write test cases every day, and with these technologies we can reduce that manual effort and focus ourselves more on automating things and on checking whether the software is good, manually or in an automated way.

These are the main points I want you to remember: experiment with prompt engineering in your testing workflow, and start small and iterate, because if you try to do everything at once you will fail; it's not something you can accomplish in one step. Thank you, I hope you're happy to have discovered these things. These are my contacts; you can write me a message, and I'll be really happy to hear from you. Bye!
...

Ludovico Besana

Senior Test Engineer @ Nearform
