Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone. My name is Peter. I'm CEO
at Bloom, where we are
researching multiparty computation protocols and
applying them to different tasks in fintech.
I'm really happy to be here with you at the conference,
talking about such an important topic as data privacy
and how we can enhance it.
Okay, let's jump into the talk.
No one will argue that some information should
be kept private. For example, your bank account
balance or your medical records. But what
exactly does it mean to be kept private? There are plenty
of different information security procedures and tools
that protect your data quite well. But is it enough
to just store the information securely?
Well, it's not. Privacy and security are
closely related concepts, but there is quite an important difference between
them. The point is that security measures protect
your data from being stolen. They protect you from
bad guys. But not everyone who wants to get access to
your private data is bad. Just the opposite.
Most of them wish you well.
They just want to create better services for you, and that's
why they need your data. So data privacy
is a wider concept. Here are a couple of
examples. This is Bob, a nice guy who has
a credit card. This card stores
some information that is quite sensitive for Bob because
it's connected directly to Bob's bank account.
Now, let's imagine that someone who is not as nice as Bob
has stolen Bob's card. That is bad,
because Bob could lose money.
Honestly, it sounds like a disaster.
But this is a classic information security incident,
and there are clear procedures that must be followed to
fix the situation. For instance, Bob can
call the bank and block the card, or he can
just press a couple of buttons in his bank application.
Anyway, it won't take longer than ten minutes to
close the issue. Well, the next example,
that's Bob's annual income.
Bob knows it. Bob's employer knows it.
Bob's bank knows it. There are a lot of those who know,
and there are more who would like to know as well.
For instance, other banks, e-commerce companies,
and so on. And most of them
aren't bad guys, at least in the usual sense.
They are just developing their businesses.
Now, let's say someone has found out Bob's income.
That doesn't sound as bad as a credit card being stolen,
but if you think about it, it can be quite
unpleasant. Bob ends up receiving
banking promotions, which can be quite intrusive,
or seeing advertisements he doesn't want to see.
And worse, it's not easy to stop.
Even if Bob's income changes, it will
take a while before they find out about it.
So I hope this is a good example of
a data privacy violation. Not as tragic
as a stolen credit card, but still unpleasant.
And okay, another example,
that's Bob's face.
And he uses it not only to conquer the
world with his smile, but also to unlock his phone,
to pay for a flat white at a coffee shop on the corner,
or even to pass through a biometric gate in
an airport. Quite a lot of important things,
aren't they? If someone steals Bob's biometrics,
that would be a real disaster. This is a tragic blend
of security breach and privacy violation that can have really long-lasting
consequences. Bob can easily block his bank
card. He can even change his annual income.
But it's really hard to change one's face.
So, to back up my words, I will give a
bit of data from the Cost of a Data Breach annual report
by IBM. As you can see,
not only is the total cost of all breaches increasing,
but the per-record cost is as well.
This may indicate two things. There are
more leaks, and the cost of each individual
piece of leaked data has also gone up.
Anyway, it doesn't sound good, does it?
We have been building information security systems for
decades, but something went wrong.
Well, if you ask me, I will answer that
it's because we didn't expect to have to
protect our data from ourselves. We didn't
expect our data to be used so widely, and we
didn't expect us to give our data away to anyone
so easily. We have created a lot of brilliant
security measures, but ended up with data security
without data privacy.
Okay, not completely without, but with quite weak data
privacy. And good news:
we are changing that. There are some technologies
that aim to enhance your data privacy. And today
we are going to take a close look at the most promising
of them, secure multiparty computation,
or MPC for short. And to
kick off, here are a couple of words
about the math that stands behind it.
MPC is based on a quite famous cryptographic
primitive called secret sharing.
The idea is simple. Take a secret number and
break it up into pieces so that each individual piece makes
no sense on its own. These pieces are called secret
shares. And there are several ways to do secret sharing.
The most obvious of them is on the slide.
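The slide is not reproduced in this transcript. As a rough Python sketch (not the speaker's code), splitting a number into a sum of random numbers could look like the following. The names `share` and `reconstruct` and the toy range `bound` are assumptions made for illustration, and plain bounded integers are used only to keep the numbers readable.

```python
import secrets


def share(secret: int, n: int = 3, bound: int = 100) -> list[int]:
    """Split `secret` into n integers that sum to it (toy version)."""
    # Draw n - 1 shares at random from [-bound, bound] ...
    parts = [secrets.randbelow(2 * bound + 1) - bound for _ in range(n - 1)]
    # ... and pick the last one so that the total equals the secret.
    parts.append(secret - sum(parts))
    return parts


def reconstruct(shares: list[int]) -> int:
    """Recover the secret; every single share is needed."""
    return sum(shares)


shares = share(3)
print(shares, "->", reconstruct(shares))
```

Dropping any one share from the list leaves a sum that is off by a random amount, which is why all shares are required.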
Let's just split the number into a sum of
random numbers, or shares. If you send a share
to anyone, they can't extract any meaningful
information from it because, again, it's just a random
number. To get the original number,
or as they say, to reconstruct the secret,
you must have all its shares. Even if you have
all but one, it's impossible to know the original
number. Now let's see how
we can add numbers using only their secret
shares. You already know Bob. Meet his friends
Alice and Mallory. Let's say Bob has
a secret number, three. Alice has six
and Mallory has one. Assume they
want to compute the sum of their numbers without
revealing any information about the numbers themselves.
Friends should not learn anything about each other's numbers,
and neither should any third parties. No one
should. It sounds like a trick, kind of,
but it's quite easy to do using the secret sharing from the previous
slide. Step one: Bob breaks up
his number into a sum of three random numbers.
Let them be two, minus five and
six. He keeps two, sends minus five to Alice and
sends six to Mallory. In other words,
Bob shares his secret with his friends,
and no one can restore it because all they get is
just random numbers. Okay,
Alice and Mallory do the same and share their secrets.
Well, step two: everyone just adds
up the random numbers they hold and gets a result,
which is random as well. Bob gets zero,
Alice gets minus three, and Mallory
gets 13. And finally,
step three. Bob sends his zero to Alice and
Mallory. Alice sends her minus three to Mallory
and Bob. And Mallory sends his 13
to Bob and Alice. So that's
basically it. Everyone just adds up all they
get, and here it is: the result is
ten. The friends managed to sum up their numbers,
learning nothing but the result. Rearranging the terms
does not change the sum. All we did was make the terms
random. Okay, let's figure
out how to multiply numbers without seeing them.
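Before moving on to multiplication, the three addition steps above can be replayed as a short Python sketch. It is an illustration under assumptions, not the speaker's code: `share` and `secure_sum` are hypothetical helper names, and since the talk only gives Bob's split, all shares here are drawn at random, so only the final total is fixed.

```python
import secrets


def share(secret: int, n: int, bound: int = 100) -> list[int]:
    """Split `secret` into n random integers that sum to it (toy version)."""
    parts = [secrets.randbelow(2 * bound + 1) - bound for _ in range(n - 1)]
    return parts + [secret - sum(parts)]


def secure_sum(inputs: dict[str, int]) -> tuple[int, dict[str, int]]:
    names = list(inputs)
    # Step 1: every party splits its input; share number j goes to party j
    # (one share stays with the party itself).
    dealt = {name: share(value, len(names)) for name, value in inputs.items()}
    # Step 2: each party adds up the shares it holds, one from every party.
    partial = {
        receiver: sum(dealt[sender][j] for sender in names)
        for j, receiver in enumerate(names)
    }
    # Step 3: the partial sums are exchanged and added together.
    return sum(partial.values()), partial


total, partial = secure_sum({"Bob": 3, "Alice": 6, "Mallory": 1})
print(partial, "->", total)
```

The partial sums differ from run to run, but their total is always the sum of the inputs.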
Multiplication is a bit more complex,
and there are a few different protocols.
We are going to look at probably the simplest one
based on a special method of secret sharing
called replicated sharing. Assume Bob,
who has three, and Alice, who has six,
want to multiply their numbers in some private manner.
To do so, we will use three servers that are
not going to collude. These servers can be controlled
by different organizations, or they can even
be virtual private servers hosted by
different cloud providers. Anyway, let's execute
the protocol. Step one: Bob breaks up his
secret into three random shares. Let them be
one, four and minus two.
He then sends one and four to server one,
four and minus two to server two, and
minus two and one to server three.
The reason why it's called replicated sharing is
because each server gets two out of
three of Bob's secret shares. It's still
not enough for one server to restore the secret,
but any two of them can do that.
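A minimal Python sketch of this distribution, using Bob's shares from the talk, might look as follows. The function names and the zero-based server indices are assumptions for illustration, and `replicated_share` shows how fresh shares could be dealt in the same pattern.

```python
import secrets


def replicated_share(secret: int, bound: int = 100) -> list[tuple[int, int]]:
    """Return three pairs; server i gets shares i and i + 1 (wrapping around)."""
    x = [secrets.randbelow(2 * bound + 1) - bound for _ in range(2)]
    x.append(secret - sum(x))
    return [(x[i], x[(i + 1) % 3]) for i in range(3)]


def reconstruct(pairs: dict[int, tuple[int, int]]) -> int:
    """Rebuild the secret from the pairs held by the given servers."""
    known = {}
    for server, (first, second) in pairs.items():
        known[server] = first             # server i holds share i ...
        known[(server + 1) % 3] = second  # ... and share i + 1
    if len(known) < 3:
        raise ValueError("a single server is missing one share")
    return sum(known.values())


# Bob's shares 1, 4, -2 as distributed in the talk.
bob = [(1, 4), (4, -2), (-2, 1)]
print(reconstruct({0: bob[0], 2: bob[2]}))  # any two servers are enough
```

Calling `reconstruct` with a single server's pair raises an error, mirroring the point that one server alone cannot restore the secret.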
That's replication. Alice does the same.
She distributes her shares among the three servers.
Step number two: our servers perform some calculations.
It's all on the slide. I won't read it out
loud number by number, but pay attention to the results:
13, 10 and minus five.
If we sum them up, we will get 18.
That's exactly what we expect to get when multiplying three by
six. So it seems like our servers should
just exchange their numbers. But first,
I will give you a visual explanation of
why this way of multiplication works.
Let's say we have a rectangle with sides a
and b. Okay? Then the area of
the rectangle is a times b. If we
divide our large rectangle into nine small ones,
then compute the area of each and add them all up,
we get the total area of our rectangle, which is,
as we already know, a times b.
This is exactly what we did in the previous slide.
We just asked each of the three servers to compute the
areas of three small rectangles.
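The rectangle picture can be checked numerically with a short sketch. Bob's shares are taken from the talk. Alice's shares are not given in the transcript, so the values 2, 3, 1 are an assumption, chosen so that the per-server results match the 13, 10 and minus five quoted earlier. The assignment of rectangles to servers is likewise an assumed, commonly used one.

```python
# Replicated sharing: server i holds shares i and i + 1 (indices wrap around).
x = [1, 4, -2]  # Bob's shares from the talk: 1 + 4 - 2 = 3
y = [2, 3, 1]   # ASSUMED shares for Alice: 2 + 3 + 1 = 6


def local_product(i: int) -> int:
    """Sum of the three small rectangles assigned to server i."""
    j = (i + 1) % 3
    # Server i only touches shares i and j, which are the ones it holds.
    return x[i] * y[i] + x[i] * y[j] + x[j] * y[i]


results = [local_product(i) for i in range(3)]
# All nine small rectangles together give the area of the big one.
all_nine = sum(a * b for a in x for b in y)
print(results, "->", sum(results), "and all nine:", all_nine)
```

Each of the nine products appears in exactly one server's result, so the three results add up to the full product.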
But there is an issue.
Each server can compute more than the area of three
rectangles. There are some areas that can
be computed by more than just one server,
so it could be insecure to simply exchange results.
In some cases, it could lead to data leaks,
which is what we aim to avoid.
Instead, we should consider the results of each
server as a private input and carry out
the addition protocol that
we looked at a moment ago. Servers distribute
shares of the results and sum them up.
As you can see on the slide, the result is still
correct. Three times six is 18.
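As a sketch of this last step, the per-server results quoted in the talk can be treated as private inputs and pushed through the same addition protocol. The helper name `share` and the toy range are assumptions for illustration.

```python
import secrets


def share(secret: int, n: int = 3, bound: int = 100) -> list[int]:
    """Split `secret` into n random integers that sum to it (toy version)."""
    parts = [secrets.randbelow(2 * bound + 1) - bound for _ in range(n - 1)]
    return parts + [secret - sum(parts)]


local_results = [13, 10, -5]  # per-server results quoted in the talk

# Each server splits its own result and hands one share to every server.
dealt = [share(z) for z in local_results]
# Server j adds the j-th share from every server; only these sums are exchanged.
published = [sum(dealt[i][j] for i in range(3)) for j in range(3)]
product = sum(published)
print(published, "->", product)
```

The exchanged values are freshly randomized on every run, yet they still add up to the product.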
By the way, this example shows the interesting feature
of multiparty computation. We can
compute quite complex functions by chaining
up the calculations. And we don't have to
reveal any intermediate results.
We operate with secret shares all the way to a final
step,
which gives us the function result. And this
result is the only thing that gets revealed. I want
you to just take a look at this slide.
You will immediately recognize this equation. It's a
linear regression, a real machine learning
model. And as you can see, it's no more
than a blend of addition and multiplication.
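The slide itself is not reproduced in this transcript. In its standard form, with assumed symbols (features $x_i$, trained weights $w_i$, prediction $\hat{y}$), linear regression reads:

```latex
\hat{y} = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_n x_n
        = w_0 + \sum_{i=1}^{n} w_i x_i
```

Each term $w_i x_i$ is one multiplication, and the terms are combined by addition.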
We can add and multiply numbers by operating on their
secret shares. In other words, we don't have
to access the data to run inference on, or even train,
some machine learning models, such as regression.
Let's imagine that there are two banks
and each has some important data about their clients.
That data is more than just sensitive,
it's bank secrecy, and it's
often protected by law, so it
could be simply illegal to reveal it. On the
other hand, banks can profit a lot from data
collaboration. Let's also imagine that
there is another participant. Let's call it the
model owner. Let him hold the trained regression
weights. That regression consumes data from
both banks and produces, let's say, a really accurate
credit score. So we can compute the regression
in a quite private way by using three servers,
just like we did when studying secret-shared multiplication.
Banks will keep their data private,
and the model owner won't reveal anything about
his model. It sounds pretty good, doesn't it?
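A toy Python sketch of this setup is given below. Everything in it is an assumption made for illustration: the feature values, the integer weights and bias, and the helper names `split3`, `local_term` and `private_score`. It runs all three servers inside one process just to show the arithmetic. A real deployment would also need an encoding for non-integer values.

```python
import secrets


def split3(value: int, bound: int = 1000) -> list[int]:
    """Split `value` into three random integers that sum to it (toy version)."""
    parts = [secrets.randbelow(2 * bound + 1) - bound for _ in range(2)]
    return parts + [value - sum(parts)]


def local_term(x: list[int], w: list[int], i: int) -> int:
    """Server i's part of x * w, using only shares i and i + 1 of each."""
    j = (i + 1) % 3
    return x[i] * w[i] + x[i] * w[j] + x[j] * w[i]


def private_score(features: list[int], weights: list[int], bias: int) -> int:
    xs = [split3(v) for v in features]  # dealt by the banks
    ws = [split3(v) for v in weights]   # dealt by the model owner
    b = split3(bias)
    # Each server sums its local terms over all features, plus its bias share.
    partial = [
        sum(local_term(x, w, i) for x, w in zip(xs, ws)) + b[i]
        for i in range(3)
    ]
    # Partial results are re-shared with the addition protocol before opening.
    dealt = [split3(p) for p in partial]
    published = [sum(dealt[i][j] for i in range(3)) for j in range(3)]
    return sum(published)


bank_a, bank_b = [4, 2], [7]  # hypothetical client features
score = private_score(bank_a + bank_b, weights=[3, -1, 2], bias=5)
print(score)  # same as 3*4 - 1*2 + 2*7 + 5
```

Only the final score is opened. The features and the weights appear in the computation solely as shares.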
Of course, that example
is a bit toyish, but I believe it
gives an intuition of how multiparty computation
can be applied to real world tasks. And by
the way, we are solving quite similar tasks
in Bloomtech every day. Okay,
I hope I managed to convince you that secure
multiparty computation is a great technology that may
change the way we process our data and
make it more privacy-preserving. But as always,
there are some limitations. Firstly,
it requires infrastructure that can cost
more than what you need for commodity computations.
Secondly, the MPC protocols are complex. They really
are. We have only covered the very basics.
The truly secure and efficient protocols you can
use in the real world are way harder.
Thirdly, MPC has a significant computation overhead.
You definitely noticed that participants in
the computation kept sending each other messages
with random numbers. In real life, we use
networks for this, and networks are comparatively
slow. To do one arithmetic
operation, we have to do many.
That's computation overhead.
So in the end, MPC is slower
than commodity unencrypted computations.
But good news: we are working on it.
Today's protocols are way more efficient than
they were yesterday. Moreover,
sometimes it's worth waiting a bit when it
comes to your personal data privacy.
Well, we are almost done. Let's finish it up
with a couple of quite important notes. Let's talk about
some security considerations. The first thing
I would like to draw your attention to
is randomness. This may be surprising,
but it isn't easy at all.
In cryptography, we use special random number generators
based on some physical source of randomness.
And remember, using simple generators like
the one from the Python standard library can be completely
insecure. The second is modular
arithmetic. In our simple examples, we used
ordinary numbers, assuming that they
are infinite in some way. In reality,
this is not the case, and we are forced to use
more complex algebraic structures like cyclic groups.
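To make these first two points concrete, here is a hedged sketch of additive sharing done modulo a fixed number. The random draws come from Python's `secrets` module, which uses the operating system's cryptographically strong source, rather than the general-purpose `random` module. The modulus `2**61 - 1` is an assumed choice for illustration, and the sketch is not a vetted implementation.

```python
import secrets

MODULUS = 2**61 - 1  # assumed choice: a prime, so all values stay in a finite range


def share(secret: int, n: int = 3) -> list[int]:
    """Additive sharing modulo MODULUS; every share is uniform in [0, MODULUS)."""
    parts = [secrets.randbelow(MODULUS) for _ in range(n - 1)]
    parts.append((secret - sum(parts)) % MODULUS)
    return parts


def reconstruct(shares: list[int]) -> int:
    return sum(shares) % MODULUS


def add_shares(a: list[int], b: list[int]) -> list[int]:
    """Each party adds its two shares locally, wrapping around the modulus."""
    return [(s + t) % MODULUS for s, t in zip(a, b)]


bob, alice = share(3), share(6)
print(reconstruct(add_shares(bob, alice)))  # 9
```

Because every share wraps around the modulus, a single share is uniformly distributed and its size says nothing about the secret, which the unbounded-integer toy examples cannot guarantee.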
The third is maliciousness.
Any protocol is a strict sequence of actions that
anyone who gets involved has to follow.
If someone intentionally deviates, it can lead
to negative effects. We can't let it happen, so we
have to build our protocols to be maliciously secure.
And finally, there is always a risk that
some participants can collude to cheat others.
For example, if two servers in our multiplication
protocol collude, they can restore all secrets.
If that risk is unacceptable for your case, there are
other, more complex protocols that aim to mitigate it.
Anyway, all this is quite an advanced topic
worthy of its own talk, and maybe
someday we will discuss it in detail.
Or feel free to reach out to me anytime and ask
your questions. I will be happy to answer.
And if you would like to play around
with MPC and probably get some
hands on experience on the topic,
I can recommend a brilliant library by Meta
called CrypTen. Again, you absolutely shouldn't
use it in production due to the security considerations
we have covered above, but it's a good start to get
the right intuition on the MPC protocols.
Well, here are some conclusions.
They are on the slide. In short,
modern MPC protocols are more practical and
efficient than other privacy-preserving methods.
They can be implemented completely in software to
execute on commodity hardware, which makes them vendor
independent, and they provide
privacy guarantees while keeping data useful.
So that's basically it. And I hope
my talk encourages you to investigate the topic on your
own contributing to a more secure digital world.
And thank you. Thank you very much for joining
me today. As I said, you can reach out
to me anytime. I will be happy to chat. And bye
for now. Live long and prosper and keep
an eye on your data.