Conf42 Observability 2024 - Online

The Next Frontier: Observability in Machine Learning Systems

Abstract

Discover the power of observability in machine learning systems! Learn how real-time monitoring and debugging enhance reliability and performance, driving better decisions and customer satisfaction.

Summary

  • Software engineer with over five years of experience developing digital banking solutions for financial institutions in West Africa. Personally, I founded two companies in Nigeria. One is a waste management company called Dustbin Boy and the second is a music company.
  • Model observability is the practice of validating and monitoring ML model performance and behavior. It involves measuring critical metrics, indicators, and processes to ensure models work as expected in the production environment. Why is model observability important?
  • Model observability enables engineers to perform root cause analysis, identifying the reasons behind specific issues. The obvious benefits of this are continuous performance improvements. It ensures expected behavior in production and streamlined machine learning workflows.
  • In this presentation, I will talk about data drift, performance degradation and data quality. Data drift occurs when the statistical properties of the training data change over time. Maintaining consistent data quality in production is challenging.
  • Large language models, otherwise known as LLMs, face some unique issues. A tailored model observability strategy can help address these challenges and improve evaluation. Common techniques include user feedback, embedding visualization, prompt engineering, retrieval systems, and fine-tuning.
  • So now let's go to challenges in computer vision. Here we have image drift, which is changes in image properties over time. Also, we should ensure high quality labeling with automation and regular reviews. Lastly, domain adaptation: we should indicate when to fine-tune models based on data divergence.
  • So let's talk about explainability techniques in standard ML systems. Explainability is the capability of observability tools to provide clear, understandable insight into system behavior and performance. There are two techniques one can use to interpret the model's decision-making process: SHAP and LIME.
  • We talked about model observability, which is the validation, measurement, and monitoring of our models' performance. We also talked about explainability, the degree to which our model can be explained and to which insights can be derived from the model. And this brings us to the end of the presentation.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Before we go ahead, I would like to talk about myself. I'm a software engineer with over five years of experience developing digital banking solutions for financial institutions in West Africa, that is Lagos, Nigeria and Freetown, Sierra Leone. Personally, I founded two companies in Nigeria. One is a waste management company called Dustbin Boy, which basically leverages technology to deliver waste management services in Lagos, Nigeria. And the second is a music company, that is, a record label, because one of my hobbies is music.

So now let's start by talking about what model observability is. Model observability is the practice of validating and monitoring ML model performance and behavior. It involves measuring critical metrics, indicators, and processes to ensure models work as expected in the production environment. Simply put, model observability is the process of validating, evaluating, measuring, monitoring, and ensuring our model performs the way we expect it to perform in production environments.

Before we move forward, let's talk about how model observability is different from model monitoring. While model observability provides real time insights, model monitoring collects and analyzes metrics over time. Model monitoring detects anomalies and trends, while model observability diagnoses issues within processes. Model observability also reviews underlying system dependencies and understands why anomalies occur, while model monitoring ensures the model operates within thresholds and generally focuses on system health.

So why is model observability important? Why should we practice model observability? Easily at the top of this list is transparency, because oftentimes AI functions as a black box that lacks transparency in its processes. Model observability is a way to gain transparency and to shed light on some of these processes. Model observability also helps with error detection, because users may not notice when large language models like GPT-4 make mistakes. By detecting these mistakes, providing transparency, and identifying errors, the credibility of our model is increased. And by understanding and getting insights into why errors are occurring, we gain an understanding of our model. All of this, maintaining visibility, understanding our model, and keeping up its credibility, helps to sustain user trust and helps us gain more trust in the AI system.

So now let's talk about why model observability is important with a practical example, using Google's chatbot Bard. Right after its launch, Bard claimed in a promotional campaign that the James Webb Space Telescope took the first ever image of an exoplanet. This was not true, and the consequences were that customers raised a lot of doubt about the model's efficiency, and Google reportedly lost $100 billion in market value because of this blunder.

So now that we understand, with a practical use case, what can happen without model observability, let's talk about how model observability helps and the benefits we get from it. Model observability enables engineers to perform root cause analysis and identify the reasons behind specific issues. That means it doesn't just generalize the errors or give us a basic overview of them. It helps engineers go down to the root cause and understand the specifics behind any specific issue.
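To make that concrete, here is a minimal sketch, not from the talk, of how one might log structured prediction events so that an engineer can later trace an anomaly back to specific inputs, model versions, and latencies. The field names, logger setup, and example values are illustrative assumptions only.

```python
# A minimal sketch of structured prediction logging to support root cause
# analysis; field names and values below are hypothetical, not from the talk.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = logging.getLogger("model_observability")

def log_prediction_event(model_version: str, features: dict,
                         prediction, confidence: float, latency_ms: float):
    """Emit one structured event per prediction; downstream tooling can
    aggregate these logs to detect anomalies and drill into root causes."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
        "confidence": confidence,
        "latency_ms": latency_ms,
    }
    logger.info(json.dumps(event))

# Example usage with made-up values:
log_prediction_event("credit-risk-v2.3", {"age": 41, "balance": 1250.0},
                     "approve", 0.87, 12.4)
```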
So the benefits of this, obviously, are continuous performance improvements. It ensures expected behavior in production and a streamlined machine learning workflow, and, of course, scalability, and it reduces time to resolution.

So what are the key components of machine learning observability? We have event logging, tracing, model profiling, bias detection, and anomaly identification. Event logging deals with detailed logs of model activities. Tracing involves tracking data through stages. Model profiling involves performance analysis, while bias detection involves identifying and mitigating biases, and anomaly identification involves detecting unusual patterns. The diagram here shows the flow from discovery to analysis to diagnosis and then to resolution.

So what are the key challenges facing model observability? In this presentation, I will talk about data drift, performance degradation, and data quality. Data drift occurs when the statistical properties of the training data change over time. It can include covariate shift, which is changes in input feature distributions, and model drift, which is changes in the relationship between inputs and target variables. Causes of these drifts include changes in customer behavior, shifts in the external environment, demographic changes, and product updates and upgrades.

Another key challenge, like we said, is performance degradation. Over time, as machine learning applications gain more users, model performance can decline due to model overfitting, the presence of outliers, adversarial attacks, and changing data patterns. Lastly, another key challenge is the quality of the data. Maintaining consistent data quality in production is challenging due to reliance on various input factors such as data collection methods, pipelines, storage platforms, and preprocessing techniques. Some of the possible issues we can encounter here are missing data, labeling errors, disparate data sources, privacy constraints, inconsistent formatting, and lack of representativeness.

So now let's move to model observability challenges in large language models. Large language models, otherwise known as LLMs, face some unique issues. We have hallucinations, which is when the model generates nonsensical or inaccurate responses. We also have no single ground truth, which is when multiple possible answers are generated for the same question, which makes evaluation difficult. There is response quality: responses may actually be correct but irrelevant or poorly tuned. We have instances of jailbreaks, where some prompts can bypass security, leading to harmful outputs. And there is the cost of retraining, because ensuring up-to-date responses over time requires expensive retraining.

So now that I've spoken about the challenges model observability faces in large language models, let's talk about some of the evaluation techniques for large language models. A tailored model observability strategy can help address these challenges and improve evaluation. Some of the common techniques that we use include user feedback, embedding visualization, prompt engineering, retrieval systems, and fine-tuning. With user feedback, we collect and assess reports of bias and misinformation. With embedding visualization, we compare response and prompt embeddings for relevance.
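As a rough illustration of that embedding-based relevance check, here is a minimal sketch. The sentence-transformers package, the all-MiniLM-L6-v2 model, the threshold idea, and the example strings are my own assumptions, not part of the talk.

```python
# A minimal sketch of comparing prompt and response embeddings for relevance.
# Assumes the sentence-transformers package is installed; model name and
# example prompts are illustrative only.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

def response_relevance(prompt: str, response: str) -> float:
    """Return the cosine similarity between prompt and response embeddings;
    low scores can flag off-topic or poorly tuned answers for review."""
    embeddings = encoder.encode([prompt, response], convert_to_tensor=True)
    return util.cos_sim(embeddings[0], embeddings[1]).item()

score = response_relevance(
    "What are the side effects of ibuprofen?",
    "The Eiffel Tower is 330 metres tall.",
)
print(f"relevance score: {score:.2f}")  # a low score suggests an irrelevant answer
```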
With prompt engineering, we test various prompts to enhance performance and detect issues. With retrieval systems, we ensure our LLMs fetch correct information from relevant sources. And with fine-tuning, we adjust the model with domain-specific data instead of full retraining.

So now let's go to challenges in computer vision. Here we have image drift, which is changes in image properties over time, like lighting and background. We have occlusion, which, as seen in this diagram, is objects blocking the primary object, leading to misclassification. There is the lack of annotated samples, which is the difficulty of finding labeled images for training. And we have, of course, sensitive use cases, where the cost of making errors and mistakes is disastrous, like in medical diagnosis and self-driving cars.

So here are some ways to address these challenges in computer vision. At the top of the list is monitoring metrics, which means we should measure image quality and model performance. We should also use a specialized workforce, which means we should involve domain experts in the labeling process. For edge devices, we should monitor devices like cameras and sensors in real time. For label quality, we should ensure high quality labeling with automation and regular reviews. And lastly, domain adaptation: we should indicate when to fine-tune models based on data divergence.

Now, some monitoring techniques that we use in machine learning observability. We have the standard ML metrics like recall, precision, F1 score, and MAE. We have the large language model metrics like BLEU, METEOR, and CIDEr for automated scoring. We also use human feedback, custom metrics, and RLHF for human-based assessment. And we have the computer vision metrics like mean average precision, intersection over union, and panoptic quality for tasks like object detection, classification, and segmentation.

So let's talk about explainability techniques in standard ML systems. Explainability is the capability of observability tools to provide clear, understandable insight into system behavior and performance, enabling stakeholders to easily interpret and act on the data. There are two techniques one can use to interpret the model's decision-making process. Here we have SHAP and LIME. SHAP, which is SHapley Additive exPlanations, computes the Shapley value of each feature, indicating feature importance for global and local explainability, while LIME, Local Interpretable Model-agnostic Explanations, perturbs input data to generate new predictions. It then trains a simpler model on the generated values to measure feature importance. Explainability, simply put, is the capacity of our observability to generate insights which we can easily interpret and act on, to enable us to make decisions about the data and our model.

So now let's talk about explainability techniques in large language models. Here we have the attention-based techniques, where we visualize which words the model considers most important in an input sequence. This is useful in models like ChatGPT, BERT, and T5 that use the transformer architecture. We also have the saliency-based techniques, which compute gradients with respect to input features to measure their importance. Masking features and analyzing changes in the output can also reveal crucial features.
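Here is a minimal sketch of that gradient-based saliency idea: take the gradient of the model's output with respect to the input and treat the gradient magnitudes as importance scores. The tiny model and random input are placeholders I've assumed, not anything from the talk.

```python
# A minimal sketch of gradient-based saliency on a toy model; the network
# architecture and random input are illustrative assumptions only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
model.eval()

x = torch.randn(1, 8, requires_grad=True)   # one input with 8 features
score = model(x).sum()                      # scalar output for this input
score.backward()                            # fills x.grad with d(score)/d(input)

saliency = x.grad.abs().squeeze()           # gradient magnitude per feature
ranking = torch.argsort(saliency, descending=True)
print("feature importance (|gradient|):", saliency.tolist())
print("most influential features first:", ranking.tolist())
```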
So now let's talk about explainability techniques in computer vision. Here we have integrated gradients, XRAI, and Grad-CAM. I will show you the difference between the three on the next slide. Grad-CAM generates a heatmap for CNN models, highlighting important regions by overlaying the heatmap on the original image. Integrated gradients builds on a baseline image and adds features gradually, computing gradients to identify the important features for object prediction, while XRAI enhances integrated gradients by highlighting pixel regions instead of single pixels, segmenting similar image parts and computing saliency for each region. So, as you can see here: integrated gradients starts from a baseline image, XRAI is an extension of it that works on regions, and Grad-CAM generates its own heatmap.

So let's give a quick summary of everything we've discussed before we end. We started by talking about model observability, which is the validation, measurement, and monitoring of the performance of our models. I talked about how observability is different from model monitoring. Then we talked about why observability is important, and we used the case of Google's chatbot Bard, the wrong information it gave, and the effect on Google. Then we talked about the components of model observability, which involve things like event logging, bias detection, and model profiling. Then we talked about the key challenges of model observability in machine learning, where we have data drift, performance degradation, and data quality. We also talked about the key challenges in large language models, like hallucinations and jailbreaks, and we talked about the challenges in computer vision, such as occlusion and image drift. Then we talked about some measuring techniques. And we finally talked about explainability in model observability, which is the degree to which our model can be explained and to which insights can be derived from the model.

So finally, let's talk about future trends in model observability. Here we have user-friendly XAI, which is developing techniques to generate simple, understandable explanations. We also have AI model fairness, which is using XAI to visualize learned features and detect bias. We also have human-centric explainability, which is combining insights from psychology and philosophy for better explainability methods. And we have causal AI, which is highlighting why a model uses particular features for predictions, adding value to explanations and increasing robustness.

So this brings us to the end of the presentation. Thank you so much again for having me, and bye for now.
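As a closing illustration of the integrated-gradients technique mentioned in the computer-vision explainability section, here is a minimal sketch: it interpolates from an all-black baseline to the input image, accumulates gradients along the path, and averages them. The small CNN and random image are stand-ins I've assumed, not anything from the talk.

```python
# A minimal sketch of integrated gradients on a toy CNN; the model and the
# random "image" below are illustrative assumptions only.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
model.eval()

def integrated_gradients(model, image, target_class, steps=50):
    baseline = torch.zeros_like(image)               # all-black baseline image
    total_grads = torch.zeros_like(image)
    for alpha in torch.linspace(0, 1, steps):
        # Gradually add features by interpolating baseline -> image.
        interpolated = (baseline + alpha * (image - baseline)).requires_grad_(True)
        score = model(interpolated)[0, target_class]
        score.backward()
        total_grads += interpolated.grad
    avg_grads = total_grads / steps
    return (image - baseline) * avg_grads            # attribution per pixel/channel

image = torch.rand(1, 3, 32, 32)
attributions = integrated_gradients(model, image, target_class=3)
print("attribution map shape:", attributions.shape)  # same shape as the input
```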

Omotayo Alimi

Senior Software Engineer @ Ceedees Investments

Omotayo Alimi's LinkedIn account


