Conf42 Site Reliability Engineering 2021 - Online

How To Use Voice AI For Incident Reporting, Monitoring and Alerts

Abstract

From this talk you'll learn how to automate incident reporting, monitoring, escalations and alerts using the Dasha voice AI platform. This novel approach lets an SRE automate all of these workflows.

Collect incident information, do follow-up calls, acknowledgements, escalations and more in 30 minutes.

Summary

  • Today's session is on how you can use conversational voice AI to handle incidents in the site reliability engineering line of work. I'll do a live demonstration of a Dasha AI conversational AI application. You can enable your DevOps for reliability with Chaos Native.
  • Conversational voice AI is a set of technologies that lets you create automated conversations powered by machine learning and artificial intelligence services. There are a few use cases. One is notifying you about incidents. Another is resolving those incidents over the phone. A third is handling incidents on the go.
  • Dasha Studio is an extension for VS Code. A demo server takes inbound webhooks from Better Uptime and activates the Dasha application from Node.js code, which then calls me, talks to me, and helps me resolve issues in real time.
  • There are three parts. Dasha Studio is where you write out the conversation flow. The second part is the Dasha SDK, which lets you integrate with APIs. And the third is the Dasha Cloud, which gives you the AI-as-a-service component. Let's now look at the actual code that makes these conversational AI applications work.
  • AI conversations can either acknowledge, resolve, or ignore the event. The resolve, acknowledge and ignore nodes are all labeled not as nodes, but as digressions. This is a way to give Dasha apps that human-like feel.

Transcript

This transcript was autogenerated.
Are you an SRE, a developer, or a quality engineer who wants to tackle the challenge of improving reliability in your DevOps? You can enable your DevOps for reliability with Chaos Native. Create your free account at Chaos Native Litmus Cloud.

Folks, my name is Arthur, with Dasha AI. Today's session is on how you can use conversational voice AI to handle incidents in the site reliability engineering line of work. These two concepts, voice AI and incident handling, might seem like they exist in different worlds, which they may, but the fact of the matter is that at least one of them, incident handling, can gain a lot of benefit from the other, voice AI. Here's what you should expect from today's session. We're going to start off with some definitions, so basically terminology, and we'll talk about why you might want to add conversational voice AI technologies to your set of tools in your line of work. I'll do a live demonstration of a Dasha AI conversational application which I built and integrated with Better Uptime. It handles incidents, closes them completely, resolves them if needed, acknowledges them, notifies me about them, and gives me updates on the status of the Kubernetes cluster, et cetera. Then we'll talk about how it works, and I'll finally give you a rundown of Dasha Studio, the tool set that I used to build this conversational app and which you can use as well to add conversational AI to your set of site reliability engineering tools.

Let's start with the definitions. Conversational voice AI is a set of technologies that lets you create automated conversations powered by machine learning and artificial intelligence services. At Dasha AI, we call these automated conversations Dasha apps, and we'll go into a bit more detail as to how they are run and how they work a little bit later, after the demo, in the how-it-works section.

Why might you want to use conversational voice AI in site reliability engineering? There are a few use cases. One is notifying you about incidents. Another is resolving those incidents over the phone. The third is handling incidents on the go: essentially, if you're away from your machine, you get notified about an incident, maybe you can check some of the statuses online and acknowledge the incident, and then resolve it when you get back to your computer. You can also handle incidents quicker while at your desk, because you don't have to switch back and forth. You can ask the AI app what the status of your vital technologies is, for example: what's the status of your TLS certificates, what's the status of the Kubernetes cluster? And you can tell it to acknowledge the incident, to resolve the incident, et cetera, all online, with your voice over, say, speakerphone. You don't have to tap around on your computer to do this, and actually that's one of the big benefits. Another is that you can tell the AI to quickly notify your colleagues, all of them or specific ones, if you feel that there is a need. You can get updates on the status of vital services, as we already discussed. You can take detailed notes of every incident: as you are reporting the incident, you can dictate notes to the AI app and get those notes sent to your email address, or push them wherever you keep information about incidents; in the demo I show you today, we'll just send them via email. And finally, you can resolve incidents quicker.
Moving into the demo: you're seeing Visual Studio Code and Better Uptime side by side. Why Visual Studio Code? Dasha Studio is an extension for VS Code, so if you're familiar with VS Code, you'll feel right at home with Dasha Studio. I'll go over all the different components and how you use them to build a Dasha app a little bit later; right now I'll just run a live demo for you. What we've got here is two servers running. One is just a simple hello-world server, and it's being monitored by Better Uptime. The other is a server which takes inbound webhooks from Better Uptime and activates the Dasha application from the Node.js code, which then calls me, talks to me, and helps me resolve these issues in real time. You might ask: why do this with Better Uptime, since Better Uptime already does voice notifications? It does do that, indeed, but those are just "hey, you've got an incident" and "hey, the incident was resolved" types of notifications. What I'll show you today is significantly more advanced. So I just killed the server that we're monitoring, and any second we're going to get a webhook coming into our server here and activating the Dasha application. There it goes.

This is Arthur. Hello. Hello Arthur. This is Dasha calling you regarding your website. There has been an incident. You can acknowledge or resolve the incident right on the call with me. Please note I will listen and take notes until you mention that you are ready to resolve or acknowledge. Okay, let me take a look here. Okay, I'll wait. What's the status of the Kubernetes cluster? Kubernetes is up and running. And what's the status of the TLS certificate and health check? The TLS certificate is active; site health checks are not responding. Okay, I can acknowledge the incident. Can you please confirm that you want me to acknowledge the incident? Yes, I can. Got it. I have set the status in Better Uptime to acknowledged. The next step is to resolve the incident. Would you like me to notify other team members that there is an incident which may affect them? Yes, please. Okay, team members notified. I will wait for you to let me know when the incident is resolved. Okay. Oh crap. What happened, Arthur? Did you use the wrong terminal again? Yeah, something like that. Hold on. Okay, incident resolved. Can you please confirm that you want me to resolve the incident? Yes, I can. Well done, Arthur. I have set the status in Better Uptime to resolved. Thank you and take care. Goodbye. Bye.

Right, well, there you have it. As I mentioned earlier, we're also taking down the transcription of the conversation, and I should have just received an email with the transcript. There it is. This is the transcript of the conversation that we have just had. Obviously, as you saw, the incident was handled in real time, acknowledged and resolved, and we were able to pull data from external functions regarding the Kubernetes status, et cetera. Pretty much anything that you might need at your fingertips, or ear tips, you can get with Dasha, because you can run HTTPS requests; anything that you can do with Node.js code, you can do with Dasha.
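The webhook server from the demo isn't shown line by line in this transcript, but a minimal sketch of the listener described here might look like the following. The route path, the payload field names, and the runDashaCall helper are illustrative assumptions, not Better Uptime's documented schema; check its webhook docs for the real payload shape.

```js
// Minimal sketch of the webhook listener described above. Field names and
// runDashaCall() are illustrative assumptions, not Better Uptime's documented API.
const express = require("express");

const server = express();
server.use(express.json());

server.post("/webhook", async (req, res) => {
  res.sendStatus(200); // acknowledge quickly so the sender doesn't retry

  const incident = req.body.data; // assumed payload shape
  const isNewIncident =
    incident &&
    !incident.attributes.acknowledged_at &&
    !incident.attributes.resolved_at;

  // Only place a call for newly created incidents, not ack/resolve webhooks.
  if (isNewIncident) {
    await runDashaCall(incident.id); // hypothetical helper, sketched below
  }
});

server.listen(3000, () => console.log("Webhook listener on :3000"));
```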
So I'm going to switch back over to the deck and give you a quick overview of the architecture, and then we'll come back to Visual Studio Code, where I'll take you through the actual code which makes this app tick, and through the architecture of Dasha apps. There are three parts. The first is Dasha Studio. That's what you have just seen me use to run the application, and it provides you such tools as analytics, a debugger, a visual editor and a code editor. Essentially, the studio is where you write out the conversation flow using DashaScript, a domain-specific language specifically designed as a Turing machine with nodes and states, where each node is responsible for something happening in the conversation. You might have nodes that don't show up in the conversation, but that you use to do calculations or to call up external functions. You implement external functions in your index.js file, and from index.js you can call upon any external services. The second part is the Dasha SDK. That's essentially what you import into your Node.js file, and it lets you integrate with APIs, handle your telephony, et cetera. And the third is the Dasha Cloud. This is the part of the whole system which gives you the AI-as-a-service component: it lets your conversations have digressions, lets you customize intents, entities and slot filling, and provides out-of-the-box natural language generation, natural language understanding, text-to-speech and speech-to-text, best in class, all proprietary technology. We're actually rolling out (it's already in live testing, and we'll be pushing it into production) emotionally charged speech synthesis, so you can define what types of emotions you want to give the talker, if that's the type of thing that you're into; and by the talker, I mean the AI. And this is how the entire thing works at an overview level: you write the killer app in the studio, it's loaded into the Dasha Cloud platform through the SDK, and then the conversation happens with the user through a telephony provider.

We've gone over the architecture; let's now look at the actual code that makes these conversational AI applications work and interface with all of the services that you use, in this case with Better Uptime. As I have mentioned, we've got a few main parts that we'll be looking at. The first is main.dsl. DSL is DashaScript language, a domain-specific language used specifically to construct conversations; it denotes the structure of a conversation. The second file that we'll be looking at is data.json, the set of data which is used to train the Dasha AI neural networks in the Dasha Cloud to recognize a specific intent or specific named entities that the user mentions. We'll look over that as well. The third, which we'll barely look at, is phrasemap.json; I'll show you a couple of things in it and tell you what it's all about. And finally, index.js is the file that puts this all together, and this is where we will actually start today. As I mentioned when I went into the demo, I had two applications running. One is hello world js, about as simple a Node.js server as you can set up, and the other is index.js. I didn't mention this during the demo, but obviously these are all running on my local machine, so I used ngrok to give them a web address so that Better Uptime can actually monitor one and interface with the other.
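To make the three-part architecture concrete, here is a hedged sketch of the Node.js side, roughly following Dasha's public @dasha.ai/sdk examples: deploy the DSL app to the Dasha Cloud, wire up an external function, and place the call. The method names and the Better Uptime resolve endpoint should be verified against the current documentation.

```js
// Sketch of the SDK flow, assuming the method names from Dasha's public examples.
const dasha = require("@dasha.ai/sdk");
const axios = require("axios");

async function runDashaCall(incidentId) {
  // Load the app folder (main.dsl, data.json, phrasemap.json) into the Dasha Cloud.
  const app = await dasha.deploy("./app");

  // Implement the `resolve` external function declared in main.dsl.
  app.setExternal("resolve", async () => {
    if (incidentId == null) return "no incident";
    // Assumed Better Uptime v2 resolve endpoint; verify the path in its API docs.
    await axios.post(
      `https://betteruptime.com/api/v2/incidents/${incidentId}/resolve`,
      {},
      { headers: { Authorization: `Bearer ${process.env.BETTERUPTIME_TOKEN}` } }
    );
    return "resolved";
  });

  await app.start({ concurrency: 1 });

  // Input variables flow into the DSL's context block.
  const conv = app.createConversation({
    phone: process.env.PHONE,
    name: process.env.NAME,
  });
  conv.on("transcription", console.log); // capture the call transcript as it runs

  const result = await conv.execute();
  console.log(result.output);

  await app.stop();
  app.dispose();
}
```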
index.js starts off by importing the Dasha AI SDK, and obviously we're importing Express to run the server here; we're using a few other things as well. Here is where the webhook listener app begins, so this is where our server actually starts. It gets data via webhook from Better Uptime. The most important piece of data for us is the incident id, but we also want to know whether the incident is acknowledged or resolved. The thing is that Better Uptime sends webhooks no matter what happens, but we only want to get a call from Dasha when the incident is created, not when it is resolved or acknowledged. So once we get that type of webhook, for an incident that's just been created, we launch the Dasha application; the Dasha application calls me, and the conversation begins.

Let's look at the body of the conversation. We start off with two input variables, phone and name. You can look at index.js and see right here in the Dasha app where we've got these input variables; there they are. I'm storing these in the .env file along with all the other things that I don't want to store directly in my code. We also declare external functions here. As mentioned earlier, external functions are a way for you to call up code within index.js from the body of your AI conversation, which can then go on to call any manner of external services. The conversation starts with the node root. We wait to connect to the phone, and the application waits until the user says something. Then it greets the user and, in this case, tells them: you've had an incident, by the way, and you can let me know when you're ready to resolve or acknowledge it. From there we can take the conversation in a few directions. This is a pretty simple script as far as AI conversations are concerned: essentially, we can either acknowledge, resolve, or ignore the event. We can also ask the application about the status of some vital services, specifically the TLS certificates, the Kubernetes cluster and the site health check. You could also ask it to wait or to repeat the last question, and, as a bit of an Easter egg, ask: did you use the wrong terminal again?

Here is what I want to draw your attention to: the resolve, acknowledge and ignore nodes are all labeled not as a node, but as a digression. What is a digression in the context of Dasha? It's a node that can be called up at absolutely any point in the conversation. We developed this for two reasons. One, it's a great way to navigate if you've got a huge, giant menu; and two, it's a way to give Dasha apps that human-like feel. When you're talking to a person about, say, site reliability engineering, and suddenly your friend says, "hey, by the way, what's the weather like where you are?", you're able to reply to that. To pass a sort of Turing test, to give the user the feeling that they're talking to a human, we want the AI applications to be able to do the same thing, and digressions do really well with that. Digressions are activated by intents. You can see here the condition: on message has intent such and such, the digression is activated. In this case, we're looking for the intent ignore, so when these phrases, or any number of phrases built on them, show up, the digression is activated.
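The talk's exact main.dsl isn't reproduced in the transcript, but a stripped-down sketch of the structure just described (input variables, an external function declaration, a root node, and an intent-triggered digression) could look like this. Built-in names such as #connectSafe, #sayText and #messageHasIntent follow Dasha's documented DSL; the node names and phrases are illustrative.

```dsl
context {
    // input variables, passed in from index.js via createConversation()
    input phone: string;
    input name: string;
}

// implemented in index.js via app.setExternal("resolve", ...)
external function resolve(): string;

start node root {
    do {
        #connectSafe($phone);        // dial the on-call engineer
        #waitForSpeech(1000);        // give the user a moment to speak first
        #sayText("Hello " + $name + "! There has been an incident. " +
                 "You can acknowledge or resolve it right on this call.");
        wait *;                      // from here, digressions drive the conversation
    }
}

// A digression can fire at any point in the conversation.
digression resolve_incident {
    conditions { on #messageHasIntent("resolve"); }
    do {
        #sayText("Can you please confirm that you want me to resolve the incident?");
        wait *;
    }
    transitions {
        confirmed: goto do_resolve on #messageHasIntent("yes");
    }
}

node do_resolve {
    do {
        external resolve();          // POSTs to Better Uptime from index.js
        #sayText("I have set the status in Better Uptime to resolved. Goodbye!");
        #disconnect();
        exit;
    }
}
```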
data.json is a way for you to easily feed data into the Dasha AI-as-a-service neural networks, which are then trained, over and over across ongoing conversations, to recognize a variety of phrases which may include these words, or which may not even sound exactly like these words, but which have been identified to carry the same weight of meaning as these words do. Once the digression is activated, in this case we ask to confirm the action, and the user can either confirm, with the intent yes, that they do indeed want to ignore the incident, or say no, in which case they are moved over to the node waiting, where Dasha says that she will wait for additional instructions from the user.

Finally, I want to show you how we identify named entities when we check statuses from external services. This digression, status, is activated on two conditions: the message has to have the intent status, and the message has to carry some data, specifically the entity status_entity. What is status_entity? It's a named entity with a number of values. It's not an open set of data; it's a closed set, which means that only these values will be identified. If this were an open set, Dasha might substitute any number of words placed in the proper position by the user, but in this case we're looking to identify some very specific services: Kubernetes, TLS and health check. And here are the instructions provided to the neural network for when the message has the intent status and the data status_entity; these are the types of phrases the user might say: "What is the status of the Kubernetes cluster?" "What's the status of Kubernetes and TLS?" "Tell me about the status of," or "give me an update on the status of," this or that. This is how, in the course of the conversation, we get to check the status of these things: we call up an external service, and the service returns the status to us.

Going back: we looked at the digression for ignore; now let's look at the digression for resolve, for example. It's the same workflow. If the digression gets called up, the Dasha app asks to confirm that the incident is ready to be resolved, and if it is confirmed, it calls up the external function resolve. Here it is in our code: resolve checks whether the incident id is null, and if it's not, it authorizes with the Bearer API token and sends an HTTPS POST request with the incident id, instructing Better Uptime to resolve the incident, after which Dasha tells the user that the incident has been resolved, take care and goodbye. By the same token, you could do literally any type of activity that you currently do manually with a Dasha app tailored to your specific needs. To put into perspective how easy it is to build with: it took me probably around five or six hours to build this entire thing, and I'm not a very experienced software engineer. I hate to be the person who says "it's that easy," but it really is that easy to build your own apps to make your site reliability engineering workflow even more efficient. The source code for this application will be attached in the YouTube description below, and if you go to the GitHub repo you will find in the readme a bit of a tutorial on how to actually put all of this into action. It will also be up on the Conf42 website for you to review, download, use, and build your own applications on top of.
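For reference, a data.json in the shape Dasha's tutorials use, covering the status intent and the closed-set status_entity described above, might look like the snippet below. Treat the exact keys and the (value)[entity] annotation style as assumptions to verify against Dasha's current documentation; the phrases themselves are only examples.

```json
{
  "version": "v2",
  "intents": {
    "status": {
      "includes": [
        "what is the status of (kubernetes)[status_entity]",
        "what's the status of (kubernetes)[status_entity] and (TLS)[status_entity]",
        "give me an update on the (health check)[status_entity]"
      ]
    },
    "resolve": {
      "includes": ["incident resolved", "resolve the incident", "it is resolved"]
    }
  },
  "entities": {
    "status_entity": {
      "open_set": false,
      "values": [
        { "value": "kubernetes", "synonyms": ["kubernetes", "kubernetes cluster", "k8s"] },
        { "value": "TLS", "synonyms": ["TLS", "TLS certificate", "certificate"] },
        { "value": "health check", "synonyms": ["health check", "site health check"] }
      ]
    }
  }
}
```

Because the entity is a closed set, only these three services will ever be extracted, which is exactly the behavior the talk describes for the status digression.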
I hope this was as exciting for you to watch as it was for me to create. Good luck making your site reliability workflows ever more efficient. Thanks, everybody.

Arthur Grishkevich

Citizen Developer Advocate @ Dasha.ai



