Transcript
Hi. My name is Tempest van Schaik, and I am a machine learning engineer at Microsoft in a team called Commercial Software Engineering. And today I'm going to speak about responsible AI in
health from principles to practice. So, an overview
of what I'll speak about is the historical journey of
responsible AI, lessons from biomedical research
in responsible AI, some ethical questions to
ask about projects, and some tools to
help when working on AI projects.
So I'll start with the historical journey of responsible AI
as I see it. So, when new
technology is developed and unleashed, safety and
responsibility considerations usually follow.
So, for example, the development of cars. We take
car safety for granted these
days, but once there was the first car on the
street, and then there was the first car fatality,
and then car manufacturers started adding things like
windshields and headlights and traffic lights on the street and
seatbelts, and eventually a driver's test,
which only came into being in 1955.
So we take that all for granted, but it wasn't always there.
And I feel like responsible AI is
in a similar position to the very early days of cars,
where the technology has been released. And now
there's a lot of focus on safety and responsibility. And what's
been interesting to observe over the last couple of years
is that machine learning researchers and practitioners have
moved from asking, can we do it?
To should we do it? For an example, look at
facial recognition. A couple of years ago, this was really exciting.
It's a genuinely exciting engineering breakthrough where
we can get computers to recognize human faces.
It's something that people have been working on for decades. So there was
a lot of excitement about whether we could have this breakthrough. But now
that we've seen the consequences of releasing facial recognition
technology, people, even companies,
are asking, should we be doing this?
Should we be using this technology at all? So I will
now speak about my personal journey with responsible AI.
Now, my background is in biomedical research because
I'm a biomedical engineer.
It's been interesting to see that some of the concerns with AI today are actually quite familiar from biomedical research.
So one of the most important documents that was written
in the medical research ethics world was the Belmont Report in 1979. And this established new norms for ethical research. It was written in response to some really bad research that had happened, mistakes that had been made. So this was decades
ago. So medical research, it has a couple of decades
of a head start on doing things in a more ethical way.
So that's quite interesting. So some of the lessons that
we can use from biomedical research, so some of the standards
that we see there that are now considered part of responsible AI
are the concerns of data transparency.
So when you publish a medical research paper with human subjects,
you have to be very clear about who the human
subjects were, how many people were there,
what their race and sex was, what level of education they attained, and what part of the country they were from,
that kind of thing. You have to state that really explicitly when you publish a
paper. And that's now considered part of responsible AI: being transparent about who was in your data set and who was not.
Another standard is informed consent. So initially it was informed consent to have your data used in a study. And in terms of responsible AI, we're obviously now asking, have people consented to having their data used by this machine learning algorithm?
There's also the concept of group harm and individual harm.
So a study could harm an individual, or even harm the whole group that that individual is from. Likewise, in responsible AI, we're starting to
consider both of these types of harm and how to avoid them,
and then privacy. So with medical research, privacy is obviously of utmost importance: keeping that research data private. And likewise in responsible AI, we now recognize that it's our responsibility to keep people's personal data private.
Because I had this background in biomedical research, I think it had primed
me, and I had this kind of lens for
working in AI, and I
came across a project to work on. But I had ethical
concerns with this particular project. And thankfully,
I was supported by my leadership to ask "should we do it?" kinds of questions: should we do this project?
And the support that I got from my team and
the tools that I discovered for addressing responsible AI issues
have prepared me for recognizing and addressing responsible
AI issues on future projects. So I wanted to share some of
the learnings that I've gathered along the way, in case they're
helpful for you. So now I'm going to talk a little bit about responsible AI
reviews. So we have a responsible
AI review board on my team because we work across really
diverse customer AI projects. They're very complex,
they're in different industries, each one is different, and we see a huge
variety. So we have this responsible
AI review board, and it's a sounding board for people to
express different views and explore different ideas
and ultimately provide responsible AI recommendations for
our projects. And I find that the following
questions are very helpful to ask when thinking about the ethical
implications of an AI project. So, first of all,
let's remember that AI is not magic. So is
this problem actually solvable with AI, or can
it be solved in a simpler way? So, for example,
sometimes a SQL query will do the job.
So can we just write a SQL query? Or do we need
an advanced machine learning algorithm that needs a lot of maintenance
and has a lot of responsibility of its own?
And similar to this is, does this problem have
a technical solution, or is this a problem that could be solved with
some kind of social intervention? So get that out of the way. Do we need technology at all? Do we need AI
at all? If yes, then it's helpful
to think of who are the stakeholders in this project.
So think about each different group that this AI impacts.
And think especially if there are any vulnerable groups.
So vulnerable groups might be children, the elderly,
immigrant groups, or any groups that have been
historically oppressed. And other stakeholders
might be regulators, lawmakers, even companies and their brands. Think about all the different stakeholders
that could be affected by this technology. And once we've
identified a map of different stakeholders,
it can be helpful to think about the possible benefits and harms to each stakeholder. So exhaustively list the benefits and harms to each one. It's useful to ask:
does the data used by this code contain personally identifiable
information? Most of the time when we're
training a model, we don't need to know people's names and addresses and telephone
numbers. So really we don't need to work with that data.
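As a minimal sketch (the file and column names here are just hypothetical examples), stripping direct identifiers before any modelling might look like this:

```python
import pandas as pd

# Hypothetical file and column names, purely for illustration.
df = pd.read_csv("patients.csv")

# Direct identifiers that the model does not need for training.
pii_columns = ["name", "address", "telephone_number", "email"]

# Drop PII before any modelling work; errors="ignore" tolerates absent columns.
df_model = df.drop(columns=pii_columns, errors="ignore")
```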
If for some reason we really need that data, it needs to be handled in the appropriate ways. It's also useful to ask, does this model impact consequential decisions, like blocking people from getting jobs or loans or healthcare?
In these situations, we have to be extremely careful, and often in these
situations, the model needs to be explainable to
explain why that decision was made.
A couple more questions to ask are how could
this technology be misused and what could go wrong? And I
like to call this Black Mirror brainstorming. And the idea is named after the UK TV series called Black Mirror, where they explore how technology goes very, very wrong.
Does the model treat different users fairly? Is the model
accurate overall? But is there a particular group that
it's performing very badly for? How does the training data compare
to production data? So if we train a language
model using tweets,
that is not appropriate for doing a medical literature search,
because it's a very different language. So it's
our responsibility to make sure that those align appropriately.
Another question is, what is the environmental impact of the solution?
And there's more and more interest in this topic recently.
So, for example, if we have a
huge language model that takes days or weeks to train, that's using a lot of computational power and a lot of electricity. So what's the environmental impact of that? It's worth thinking about.
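One way to start measuring this (my own suggestion, not a tool mentioned in this talk) is the open source codecarbon package, which estimates the energy use and carbon emissions of a training run. A minimal sketch:

```python
from codecarbon import EmissionsTracker

def train_model():
    # Placeholder for your actual training loop.
    pass

tracker = EmissionsTracker()   # estimates energy use from CPU/GPU/RAM
tracker.start()
train_model()
emissions_kg = tracker.stop()  # estimated emissions in kg of CO2-equivalent
print(f"Estimated training emissions: {emissions_kg:.4f} kg CO2eq")
```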
And then, how can we address the concerns that arise? Through answering all these
questions, what concerns have arisen? Do we need to reformulate
the problem, rethink it?
And are there some risks that we can mitigate? And I'm going
to discuss some tools that we might use shortly.
I did want to highlight some special considerations for healthcare
and responsible AI. And the first
one that might come to mind for people is privacy.
So health data is
extremely sensitive and private, especially genetic data,
which tells us so much about a person and even
their family members. So it's extremely important to
maintain that privacy and not let information leak
through the model somehow. And on a similar note is security.
So we need to follow the right
regulatory requirements to handle data securely.
And sometimes that means doing a lot of training, like HIPAA
training, to be compliant with handling that kind of
sensitive data. I think an important aspect of responsible AI
in health is collaborating with domain experts.
This is crucial, I believe, for all machine learning practitioners,
especially so in healthcare. Are there doctors or
nurses or even patients who can do
a sense check?
Do you have access to that domain expert? That's really important if
you're working on a healthcare project. And then there
is this idea of open versus closed science, so we want to get the balance right. So on one hand, we have open science where, say, we have sequenced a genome for cardiovascular research. But hey, this data
set could actually be really useful for respiratory research
as well. So could we share it with those researchers?
Because that could benefit everyone.
So that's open science, and we've got to balance that
with keeping people's data private
and secure, so we have to get that balance right.
There's also the issue of unequal access to healthcare. So that's really something
that we have to keep in the forefront of our mind.
People in wealthier parts of the world have better access to healthcare. And something that I have found in the USA, because I've recently moved to the USA, is how important it is to consider the bias that's introduced by the cost of healthcare, because healthcare
is so expensive in the USA.
We have to be really careful with any data about billing, costs, or prices, because it can contain a lot of bias due to unequal access to healthcare. And I'm going to show an
example of that shortly. And then lastly, race and
sex can be extremely important disease
predictors, as we've seen with COVID, which affects
different groups differently. However,
race and sex can also introduce a lot of bias into the model,
because historically these groups have been unfairly
treated in healthcare. What I've found works quite well, actually, is not just throwing away race and sex and ignoring them completely, because a model can still be biased without these features,
as I'll show in the next slide. But what works really well
is to capture that data and keep that data so that you can actually
audit how fair your model is
for those different groups. But you can only do that if you have the data.
That's my recommendation: these features are actually really helpful to have.
So this is a really interesting paper by Obermeyer et al. It's called Dissecting racial bias in an algorithm used to manage the
health of populations. And they show how an
algorithm that was actually used in production in the
USA did not use race as
a feature at all, but it was still
very racially biased. So I think this paper is definitely worth checking out. So now I'm going
to talk about some responsible AI
tools that you could find helpful.
So the first tool helps us answer the question,
is there a good representation of all users? And this tool
is called data sheets for data sets. And I really like the
idea because it comes from electronic
engineering, where if you buy
an electrical component, like a little microcontroller,
you always get a data sheet with it,
and the data sheet tells you all about that component, how to connect it,
what the operating temperatures are, and so on. And the idea
is that when you build a data set, you should compile
a data sheet too, explaining how the
data set was collected, who was in the
data set, who was not in the data set, what limitations there are,
so that every data set is accompanied by a data sheet.
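As a minimal sketch of the idea (the fields here are illustrative assumptions, loosely based on the questions in the Datasheets for Datasets paper), a datasheet can be as simple as a structured record checked in alongside the data:

```python
# Illustrative fields only; a real datasheet would answer these questions in full.
datasheet = {
    "motivation": "Why was this data set created, and by whom?",
    "composition": "Who is in the data set, and who is not represented?",
    "collection_process": "How and when was the data collected? Was consent obtained?",
    "sensitive_attributes": "Does it contain race, sex, health, or other sensitive data?",
    "limitations": "Known gaps or biases, and uses the data set is not suited for.",
}

for field, question in datasheet.items():
    print(f"{field}: {question}")
```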
Another tool I would recommend helps us answer the question of whether a model treats different users
fairly. And one particular tool is called Fairlearn.
Fairlearn is produced by Microsoft. It's an open source Python
package. And here I've used
it to look at overall accuracy in the first line: this was a model which had an area under the ROC curve of 92%, which is great. But then it helps you
break down accuracy by different groups. So we can see how
well the model performed for female and non-female
people in this example,
and we could also break down the accuracy for different races and see
how accurate it is for each of these different races.
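A minimal sketch of that kind of breakdown with Fairlearn's MetricFrame, on a tiny synthetic example (the labels and the sensitive feature here are made up for illustration):

```python
import numpy as np
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame

# Tiny synthetic example: true labels, model predictions, and a sensitive feature.
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])
sex = np.array(["female", "male", "female", "male",
                "female", "male", "female", "male"])

# MetricFrame computes the metric overall and broken down per group.
mf = MetricFrame(metrics=accuracy_score,
                 y_true=y_true,
                 y_pred=y_pred,
                 sensitive_features=sex)

print(mf.overall)   # overall accuracy
print(mf.by_group)  # accuracy for each group of the sensitive feature
```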
And you can use any sensitive feature here to check
how your model is performing. And then the
last tool I wanted to mention helps us deal
with models that need to be explainable. This tool is called
InterpretML. It's also an open source Python package developed
by Microsoft, and it's a whole suite of functionality
and visualizations, and also a model called the explainable
boosting machine, which prioritizes explainability without
sacrificing accuracy, actually. And here's
an example of the explainable boosting machine applied
to the adult income data set where we're trying to predict who
earns more than $50,000 a year. And you can see
that it gives a weighting to each of the different features
to say how important that feature was for this prediction.
So for this person, we can see why the model decided
whether or not they earn more than $50,000. So in
orange we can see what counted for that decision, like the number of years of education, and in blue we can see what counted against that decision, for example marital status and age,
and we can see what positively affects and negatively
affects the decision about whether someone earns more than $50,000.
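A minimal sketch of training an explainable boosting machine and inspecting its local explanations with InterpretML, using synthetic data rather than the actual adult income data set (the feature names are illustrative):

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the adult income data; feature names are illustrative.
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ebm = ExplainableBoostingClassifier(
    feature_names=["age", "education_years", "marital_status", "hours_per_week"]
)
ebm.fit(X_train, y_train)

# Local explanations: per-feature contributions for and against each prediction.
local_explanation = ebm.explain_local(X_test[:5], y_test[:5])
show(local_explanation)  # interactive visualization of the contributions
```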
So it's very transparent and explainable, which is
great. So I wanted to share some resources.
I am a machine learning practitioner, and I look to machine learning researchers who are at the forefront of thinking about this topic. I like to read and follow Cathy O'Neil, Hannah Wallach, Timnit Gebru, Rachel Thomas, Deborah Raji, Kate Crawford, and Arvind Narayanan, just to name a few people. I would definitely recommend
watching the Coded Bias documentary on Netflix.
This is a great primer if you're new to this idea of responsible
AI. Kate Crawford's book Atlas of AI just came out. I'm also looking forward to reading the Redesigning AI book by Daron Acemoglu, which is a collection
of essays from some of these people. And here is
a link to the Obermeyer racial bias article,
also to the different Microsoft responsible AI tools.
And another resource is the GitHub repo
that my team has where we share our best
practices for engineering and machine learning fundamentals,
including responsible AI. So we've shared that
at this link. And then lastly, my team, the applied machine learning team within Commercial Software Engineering, is hiring in a number of different places. You can find our
open job roles at this link and
thank you very much. It's very easy to find
me on Twitter and LinkedIn. And thanks very much to
Conf42 for having me.