Transcript
This transcript was autogenerated. To make changes, submit a PR.
Before we go ahead, I would like to talk
about myself. I'm a software engineer with over five years
of experience developing digital banking solutions
for financial institutions in West Africa, that is, in Lagos,
Nigeria and Freetown, Sierra Leone.
Personally, I founded two companies in Nigeria. One is
a waste management company called Dustbin Boy, which basically
leverages technology to deliver waste management
services in Lagos, Nigeria. And the second is a music company,
a record label, because one of my hobbies is
music. So now let's start
by talking about what model observability is. Model observability
is the practice of validating and monitoring ML model performance
and behavior. It involves measuring critical metrics,
indicators, and processes to ensure models work as expected in the production
environment. So, simply put, model observability is the process
of validating, evaluating, measuring, monitoring,
and ensuring our model performs the way we expect
it to perform in production environments.
So before we move forward, let's talk about how model observability is
different from model monitoring. While model observability
provides real time insights, model monitoring collects
and analyzes metrics over time. Model monitoring
detects anomalies and trends, while model observability
diagnoses issues within processes. Model observability also
reviews underlying system dependencies and understands
why anomalies occur, while model monitoring ensures models
operate within thresholds and generally focuses on
system health. So why is model observability
important? Why should we practice model observability?
At the top of this list is transparency, because oftentimes
AI functions as a black box that lacks transparency in its processes.
So model observability is a way to gain transparency
and to shed light on some of these processes. Model observability also
helps in error detection because users may not notice when large
language models like GPT-4 make mistakes.
By detecting these mistakes, providing transparency, and
identifying errors, the credibility of our model
is increased. And by understanding
and getting insights on why errors are occurring, we gain an understanding
of our model. And of course, all of this,
maintaining visibility, understanding our model,
and keeping up its credibility, helps users to sustain
their trust and helps us to gain
more trust in the AI system.
So now let's talk about why model observability is important
with a practical use case and a practical example
using Google's chatbot Bard. Right after its launch,
Bard claimed in a promotional campaign that the James Webb
Space Telescope took the first-ever image of an
exoplanet. This was not true, and the consequence of this was,
well, customers raised a lot of doubt about the model's
efficiency and, well, Google reportedly
lost $100 billion in market value because of this blunder.
So now that we understand, with a practical example,
what can happen without model observability, let's talk
about how model observability helps and the benefits
that we get from it. Model observability
enables engineers to perform root cause analysis and identify the reasons behind
specific issues. That means it doesn't just generalize the errors or
give us a basic overview of them. It helps engineers drill
down into the root cause and understand the specifics behind
any specific issue. So the obvious benefits of this are continuous performance
improvement, expected behavior in production,
streamlined machine learning workflows,
scalability, and reduced time to resolution.
So what are the key components of
machine learning observability? We have event logging,
tracing, model profiling, bias detection, and anomaly
identification. So event logging deals with the detailed
logs of model activities. The tracing involves tracking
data through stages. The model profiling involves performance analysis,
while bias detection involves identifying and mitigating biases,
and anomaly identification involves detecting unusual patterns.
So here we have it in the diagram. It shows the flow from discovery to
analysis to diagnosis to resolution.
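To make the event logging component concrete, here is a minimal Python sketch, assuming a generic scikit-learn-style model with a predict method and a numeric prediction; the wrapper function and the log fields are hypothetical choices, not from the talk.

```python
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("model_observability")

def predict_with_logging(model, features):
    """Wrap a prediction call with a structured event log entry."""
    event_id = str(uuid.uuid4())
    start = time.time()
    prediction = model.predict([features])[0]  # assumes a scikit-learn-style model
    latency_ms = (time.time() - start) * 1000

    # Structured log of the model activity: inputs, output, and latency
    logger.info(json.dumps({
        "event_id": event_id,
        "timestamp": time.time(),
        "features": list(features),
        "prediction": float(prediction),  # assumes a numeric prediction
        "latency_ms": round(latency_ms, 2),
    }))
    return prediction
```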
So what are the key challenges that face our model observability?
Here in this presentation, I will talk about data drift,
performance degradation and data quality. So for data
drift, this occurs when the statistical properties of the training data change over
time. It can include covariate shift, which is changes
in input feature distributions, and concept drift,
which is changes in the relationship between inputs and target variables.
Causes of these drifts include changes in customer behavior,
shifts in the external environment, demographic changes,
and product updates and upgrades.
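To illustrate how covariate shift in a single input feature might be detected, here is a small sketch using a two-sample Kolmogorov-Smirnov test from SciPy; the significance level and the simulated data are assumptions for the example, not values from the talk.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_covariate_shift(train_feature, live_feature, alpha=0.05):
    """Two-sample KS test on one feature: a small p-value suggests
    the live distribution has drifted away from the training one."""
    statistic, p_value = ks_2samp(train_feature, live_feature)
    return {"statistic": statistic, "p_value": p_value, "drifted": p_value < alpha}

# Simulate a shift in a single input feature
rng = np.random.default_rng(42)
train = rng.normal(loc=0.0, scale=1.0, size=5000)
live = rng.normal(loc=0.4, scale=1.0, size=5000)  # mean has shifted
print(detect_covariate_shift(train, live))
```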
Another key challenge, like we said, is the performance degradation,
which is basically over time, as machine learning applications gain more
users, their model performance can decline
due to model overfitting, presence of outliers,
adversarial attacks, and changing data patterns.
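As one way to picture catching this early, here is a hypothetical sketch that tracks rolling accuracy over recent predictions and flags degradation; the window size and threshold are illustrative assumptions.

```python
from collections import deque

class PerformanceMonitor:
    """Track accuracy over a sliding window of recent predictions
    and flag degradation against a fixed threshold."""

    def __init__(self, window_size=500, threshold=0.90):
        self.window = deque(maxlen=window_size)
        self.threshold = threshold

    def record(self, prediction, ground_truth):
        self.window.append(prediction == ground_truth)

    def check(self):
        if not self.window:
            return None
        accuracy = sum(self.window) / len(self.window)
        if accuracy < self.threshold:
            print(f"ALERT: rolling accuracy {accuracy:.3f} is below {self.threshold}")
        return accuracy
```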
Lastly, another key challenge is the quality of the data.
Maintaining consistent data quality in production is challenging
due to reliance on various input factors such as data collection
methods, pipelines, storage platforms, and preprocessing
techniques. Some of the possible issues we can encounter here
are missing data, labeling errors,
disparate data sources, privacy constraints,
inconsistent formatting, and lack of representativeness.
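As a small example of checking data quality on an incoming production batch, here is a hypothetical pandas sketch that reports missing values, duplicate rows, and column types; the column names are made up for illustration.

```python
import pandas as pd

def data_quality_report(df: pd.DataFrame) -> dict:
    """Basic production data-quality checks: missing values,
    duplicate rows, and column types."""
    return {
        "missing_per_column": df.isna().mean().to_dict(),  # fraction missing
        "duplicate_rows": int(df.duplicated().sum()),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "n_rows": len(df),
    }

# Illustrative incoming batch with a missing value
batch = pd.DataFrame({"age": [34, None, 29], "country": ["NG", "SL", "NG"]})
print(data_quality_report(batch))
```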
So now let's move to model observability challenges in large language models.
Large language models, otherwise known as LLMs,
face some unique issues. We have hallucinations, where the
model generates nonsensical or inaccurate
responses. We also have no single ground truth, which is when multiple
possible answers are generated for the same question, which makes
evaluation difficult. There's also response quality: responses may
actually be correct but irrelevant or poorly tuned. And we
have instances of jailbreaks, where some prompts can bypass security,
leading to harmful outputs. And there's the cost
of retraining, because ensuring up-to-date
responses over time requires expensive retraining.
These are some of the issues faced by large language models.
So now that I've spoken about the challenges that model observability
faces in large language models, let's talk about some
of the evaluation techniques for large language models. A tailored model
observability strategy can help address challenges and improve evaluation.
Some of the common techniques that we use include user feedback,
embedding visualization, prompt engineering,
retrieval systems, and fine-tuning. With user feedback, we collect
and assess reports of bias and misinformation.
With embedding visualization, we compare response and prompt embeddings
for relevance. With prompt engineering, we test various
prompts to enhance performance and detect issues. With
retrieval systems, we ensure our LLMs fetch correct information
from relevant sources. And with fine-tuning, we adjust the model with
domain-specific data instead of full retraining.
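As a small illustration of the embedding-based relevance idea, here is a hypothetical sketch that compares a prompt embedding and a response embedding with cosine similarity; the embed argument stands in for any sentence-embedding model, and the threshold is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_relevance(embed, prompt, response, min_similarity=0.5):
    """Flag responses whose embedding is far from the prompt embedding,
    which can indicate an off-topic or irrelevant answer."""
    score = cosine_similarity(embed(prompt), embed(response))
    return {"similarity": score, "flagged": score < min_similarity}
```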
So now let's go to challenges in computer vision. Here we have
the image drift, which is, you know, changes in image properties over time,
like lighting and background. We have occlusion, which, as seen in
this diagram, is objects blocking the primary objects,
leading to misclassification. We have lack of annotated samples, which is
the difficulty of finding labeled images for training. And we have,
of course, sensitive use cases, where the cost of making
mistakes is disastrous, like in medical diagnosis
and self-driving cars. So here are some
ways to address the challenges in computer vision.
Well, at the top of the list is monitoring metrics, which means we
should measure image quality and model performance.
We should also use a specialized workforce, which means we should
involve domain experts in the labeling process. For
edge device quality, we should monitor
edge devices like cameras and sensors in real time. For label quality,
we should ensure high-quality labeling with automation and regular
reviews. And lastly, domain adaptation: we should indicate
when to fine-tune models based on data divergence.
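To illustrate the monitoring-metrics idea, here is a minimal, hypothetical sketch that tracks simple image quality statistics (brightness and contrast) so a drift away from a baseline, for example from changing lighting or a degrading camera, can be flagged; the tolerance value is an assumption.

```python
import numpy as np

def image_quality_stats(image: np.ndarray) -> dict:
    """Per-image quality metrics that can be tracked over time."""
    gray = image.mean(axis=-1) if image.ndim == 3 else image
    return {"brightness": float(gray.mean()), "contrast": float(gray.std())}

def detect_image_drift(baseline_brightness, recent_images, tolerance=15.0):
    """Compare recent average brightness against a baseline value."""
    recent = np.mean([image_quality_stats(img)["brightness"] for img in recent_images])
    return {"recent_brightness": float(recent),
            "drifted": abs(recent - baseline_brightness) > tolerance}
```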
So some monitoring techniques that we use in machine learning observability:
we have the standard ML metrics like recall, precision, F1 score,
and MAE. We have the large language model metrics
like BLEU, METEOR, and CIDEr for automated scoring.
We also use human feedback, custom metrics, and RLHF
for human-based assessment. And we also have the computer
vision metrics like mean average precision, intersection over
union, and panoptic quality for tasks like object detection,
classification, and segmentation.
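As a quick illustration of the standard ML metrics mentioned above, here is a minimal scikit-learn sketch computing precision, recall, F1 score, and MAE on a small batch of made-up logged predictions.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, mean_absolute_error

# Classification metrics on a small batch of logged predictions (made-up values)
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print("precision:", precision_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))
print("f1:", f1_score(y_true, y_pred))

# MAE for a regression model monitored alongside it (made-up values)
print("MAE:", mean_absolute_error([3.2, 5.0, 2.1], [3.0, 4.6, 2.4]))
```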
So let's talk about explainability techniques in standard ML systems.
Explainability is the capability of observability tools to
provide clear, understandable insight into system behavior and performance,
enabling stakeholders to easily interpret and act on the data.
There are two techniques one can use to interpret the model's decision-making
process. Here we have SHAP and LIME. SHAP, which is
SHapley Additive exPlanations, computes the Shapley value
of each feature, indicating feature importance for global and local explainability,
while LIME, which is Local Interpretable Model-agnostic Explanations,
perturbs input data to generate predictions. It then trains a simpler
model on the generated values to measure feature importance. Here, explainability,
simply put, is the capacity of our observability tooling to generate
insights which we can easily interpret
and act on, to enable us to make decisions about the data
and our model.
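To make SHAP concrete, here is a minimal sketch using the shap library with a tree-based model on a public scikit-learn dataset; the specific model and dataset are illustrative choices, not from the talk.

```python
import shap
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Illustrative model and dataset
X, y = load_diabetes(return_X_y=True, as_frame=True)
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree models
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:200])

# Local explainability: contribution of each feature to one prediction
print(dict(zip(X.columns, shap_values[0].round(3))))

# Global explainability: feature importance summarized over many rows
shap.summary_plot(shap_values, X.iloc[:200])
```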
So now let's talk about explainability techniques in large language models. Here we have
the attention based techniques where we visualize which word
the model considers most important in an input sequence.
It is useful in models like ChatGPT, BERT, and T5 that use the
transformer architecture. We also have the saliency-based techniques, which
compute gradients with respect to input features to measure
their importance. Masking features and analyzing output relations
can reveal crucial features. So now let's talk
about the explainability techniques in computer vision. Here we
have the integrated gradient, Xari and Gradcam.
I will show you the difference between the three on the next slide. For
Grad-CAM, it generates a heatmap for CNN models, highlighting
important regions by overlaying the heatmap on the original image.
For integrated gradients, it builds from a baseline image and adds features
gradually, computing gradients to identify important features for object
prediction, while XRAI enhances integrated gradients
by highlighting pixel regions instead of single pixels,
segmenting similar image regions and computing saliency for
each region. So as you can see here, we have the integrated
gradients, the XRAI, and the Grad-CAM outputs. The integrated gradients method
basically starts from a baseline image, XRAI
is an extension of it, and, as you can see, Grad-CAM generates
a heatmap of its own.
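As one concrete example of these techniques, here is a minimal integrated gradients sketch using the Captum library with a pretrained torchvision ResNet; the model choice, the random input image, and the all-black baseline are illustrative assumptions.

```python
import torch
from torchvision import models
from captum.attr import IntegratedGradients

# Pretrained CNN and a dummy input image (batch of 1, 3 x 224 x 224)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
image = torch.rand(1, 3, 224, 224)
baseline = torch.zeros_like(image)  # black image as the baseline

# Integrated gradients accumulates gradients along the path
# from the baseline image to the actual image
ig = IntegratedGradients(model)
target_class = model(image).argmax(dim=1).item()
attributions = ig.attribute(image, baselines=baseline, target=target_class)

print(attributions.shape)  # per-pixel importance scores, same shape as the image
```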
So let's give a quick
summary of everything we've discussed before we end.
We started by talking about model
observability, which is the validation and, you know,
measuring and ensuring the performance of our models.
We talked about how model observability is different from model
monitoring. And then we talked about why observability is important.
And we used the example of Google's
chatbot Bard and the wrong information it gave
and the effect on Google. Then we talked about the components of model
observability, which involve things like
event logging, bias detection, and model profiling.
And then we talked about the key challenges in model
observability in machine learning, where we have the
data drift, performance degradation and data quality.
We also talked about the key challenges in large language models, like hallucinations
and jailbreaks, and we talked about the challenges in
computer vision, like occlusion and image drift.
And then we talked about some monitoring techniques. And we
finally talked about explainability in model
observability, which is the degree to which our model
can be explained and to which insights can be gotten
from the model. So finally, let's talk about
future trends in model observability.
Here we have user-friendly XAI, which is developing techniques
to generate simple, understandable explanations. We also have AI
model fairness, which is using XAI to visualize
learned features and detect bias. We also have the
human centric explainability, which is combining insights from
psychology and philosophy for better explainability methods.
And we have causal AI, which is highlighting why a model uses
particular features for predictions, adding value to explanations and
increasing robustness.
So, this brings us to the end of the presentation.
Thank you so much again for having me, and bye for
now.