Conf42 Python 2024 - Online

Next-Generation PeT: Exploring the Future of Multi-Party Computation Software

Abstract

Dive into the future of privacy-enhancing technologies! Join me on a thrilling exploration of Next-Gen PeT, unveiling the Python-powered innovations shaping the evolution of Multi-Party Computation. A must-attend for Python devs eyeing the forefront of privacy tech!

Summary

  • Petr Emelianov is CEO at Bloomtech, where his team researches multiparty computation protocols and applies them to different tasks in fintech. The talk covers data privacy and how we can enhance it.
  • Security measures protect your data from being stolen. They protect you from bad guys. But not everyone who wants to get access to your private data is bad. Most of them just want to create better services for you. So data privacy is a wider concept.
  • MPC is based on a well-known cryptographic primitive called secret sharing. The idea is simple: take a secret number and break it up into pieces so that each individual piece makes no sense on its own. The talk takes a close look at secure multiparty computation, the most promising of the privacy-enhancing technologies.
  • Secure multiparty computation is a great technology. But as always, there are some limitations. It requires infrastructure that can cost more than the one you need for commodity computations. The truly secure and efficient protocols you can use in the real world are way harder.
  • Security considerations include randomness, modular arithmetic and maliciousness. Modern MPC protocols are more practical and efficient than other privacy-preserving methods. The speaker hopes the talk encourages you to investigate the topic on your own, contributing to a more secure digital world.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, everyone. My name is Petr. I'm CEO at Bloomtech, where we are researching multiparty computation protocols and applying them to different tasks in fintech. I'm really happy to be here with you at the conference, talking about such an important topic as data privacy and how we can enhance it. Okay, let's jump into the talk.
No one will argue that some information should be kept private, for example, your bank account balance or your medical records. But what exactly does it mean to be kept private? There are plenty of information security procedures and tools that protect your data quite well. But is it enough to just store the information securely? Well, it's not. Privacy and security are closely related concepts, but there is an important difference between them. The point is that security measures protect your data from being stolen. They protect you from bad guys. But not everyone who wants to get access to your private data is bad. Just the opposite: most of them wish you well. They just want to create better services for you, and that's why they need your data. So data privacy is a wider concept.
Here are a couple of examples. This is Bob, a nice guy who has a credit card. This card stores some information that is quite sensitive for Bob because it's connected directly to Bob's bank account. Now, let's imagine that someone who is not as nice as Bob has stolen Bob's card. That is bad, because Bob could lose money. Honestly, it sounds like a disaster. But this is a classic information security incident, and there are clear procedures that must be followed to fix the situation. For instance, Bob can call the bank and block the card, or he can just press a couple of buttons in his banking application. Anyway, it won't take longer than ten minutes to close the issue.
The next example is Bob's annual income. Bob knows it. Bob's employer knows it. Bob's bank knows it. A lot of people know it, and there are more who would like to know it as well.
For instance, other banks, e-commerce companies, and so on. And most of them aren't bad guys, at least in the usual sense. They are just developing their businesses. Now, let's say someone has found out Bob's income. That doesn't sound as bad as a stolen credit card, but if you think about it, it can be quite unpleasant. Bob ends up receiving banking promotions, which can be quite intrusive, or seeing advertisements he doesn't want to see. And worse, it's not easy to stop. Even if Bob's income changes, it will take a while before they find out about it. So I hope this is a good example of a data privacy violation: not as tragic as a stolen credit card, but still unpleasant.
Okay, another example: Bob's face. He uses it not only to conquer the world with his smile, but also to unlock his phone, to pay for a flat white at the coffee shop on the corner, or even to pass through a biometric gate at an airport. Quite a lot of important things, isn't it? If someone steals Bob's biometrics, that would be a real disaster. This is a tragic blend of security breach and privacy violation that can have really long-lasting consequences. Bob can easily block his bank card. He can even change his annual income. But it's really hard to change one's face.
To back up my words, here is a bit of data from IBM's annual Cost of a Data Breach report. As you can see, not only is the total cost of all breaches increasing, but the per-record cost is as well. This may indicate two things: there are more leaks, and the cost of each individual piece of leaked data has also gone up. Anyway, it doesn't sound good, does it? We have been building information security systems for decades, but something went wrong. Well, if you ask me, I will answer that it's because we didn't expect to have to protect our data from ourselves. We didn't expect our data to be used so widely, and we didn't expect to give our data away to anyone so easily.
We have created a lot of brilliant security measures, but ended up with data security without data privacy. Okay, not completely without, but with quite weak data privacy. The good news is that we are changing that. There are some technologies that aim to enhance your data privacy, and today we are going to take a close look at the most promising of them: secure multiparty computation, or MPC for short.
To kick off, a couple of words about the math that stands behind it. MPC is based on a quite famous cryptographic primitive called secret sharing. The idea is simple: take a secret number and break it up into pieces so that each individual piece makes no sense on its own. These pieces are called secret shares. There are several ways to do secret sharing. The most obvious of them is on the slide: let's just split the number into a sum of random numbers, or shares. If you send a share to anyone, they can't extract any meaningful information from it because, again, it's just a random number. To get the original number, or as they say, to reconstruct the secret, you must have all of its shares. Even if you have all but one, it's impossible to know the original number.
Now let's see how we can add numbers using only their secret shares. You already know Bob. Meet his friends Alice and Mallory. Let's say Bob has a secret number, three. Alice has six and Mallory has one. Assume they want to compute the sum of their numbers without revealing any information about the numbers themselves. The friends should not learn anything about each other's numbers, and neither should any third parties. No one should. It sounds like a trick, kind of, but it's quite easy to do using the secret sharing from the previous slide.
Step one. Bob breaks up his number into a sum of three random numbers. Let them be two, minus five, and six. He keeps two, sends minus five to Alice, and sends six to Mallory. In other words, Bob shares his secret with his friends, and no one can restore it because all they get is just random numbers.
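The splitting and reconstruction described above can be sketched in a few lines of Python. This is a minimal illustration, not a secure implementation: the function names `share` and `reconstruct` and the `bound` parameter are assumptions made for this sketch, and it works on plain integers as the talk's simplified examples do. With a bounded random range the shares leak a little about the size of the secret, which is one reason real protocols use modular arithmetic, as the talk notes later.

```python
import secrets


def share(secret: int, n_parties: int = 3, bound: int = 1000) -> list[int]:
    """Split `secret` into `n_parties` integers that add up to it.

    `bound` is a hypothetical parameter for this sketch: the first
    n-1 shares are drawn uniformly from [-bound, bound].
    """
    shares = [secrets.randbelow(2 * bound + 1) - bound
              for _ in range(n_parties - 1)]
    # The last share is whatever makes the total equal the secret.
    shares.append(secret - sum(shares))
    return shares


def reconstruct(shares: list[int]) -> int:
    """Recover the secret; this needs every share."""
    return sum(shares)


bob_shares = share(3)                # random on every run
print(reconstruct(bob_shares))       # always 3
print(reconstruct([2, -5, 6]))       # the shares from the talk: also 3
```

Dropping any single element from `bob_shares` changes the sum by a random amount, which is the intuition behind "all but one is not enough".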
Alice and Mallory do the same and share their secrets.
Step two. Everyone just adds up the random numbers they hold and gets a result, which is random as well. Bob gets zero, Alice gets minus three, and Mallory gets 13.
And finally, step three. Bob sends his zero to Alice and Mallory. Alice sends her minus three to Mallory and Bob. And Mallory sends his 13 to Bob and Alice. So that's basically it. Everyone just adds up all they get, and here it is: the result is ten. The friends managed to sum up their numbers, learning nothing but the result. Rearranging the terms does not change the sum, and all we do is shuffle the terms among the parties.
Okay, let's figure out how to multiply numbers without seeing them. Multiplication is a bit more complex, and there are a few different protocols. We are going to look at probably the simplest one, based on a special method of secret sharing called replicated sharing. Assume Bob, who has three, and Alice, who has six, want to multiply their numbers in a private manner. To do so, we will use three servers that are not going to collude. These servers can be controlled by different organizations, or they can even be virtual private servers hosted by different cloud providers. Anyway, let's execute the protocol.
Step one. Bob breaks up his secret into three random shares. Let them be one, four, and minus two. Then he sends one and four to server one, four and minus two to server two, and minus two and one to server three. The reason why it's called replicated sharing is that each server gets two out of three of Bob's secret shares. It's still not enough for one server to restore the secret, but any two of them can do that. That's replication. Alice does the same: she distributes her shares among the three servers.
Step two. Our servers perform some calculations. It's all on the slide; I won't read it out loud number by number, but pay attention to the results: 13, 10, and minus five. If we sum them up, we get 18.
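The slide with the per-server calculations is not part of the transcript, so the following is a sketch under two assumptions. First, each server is assumed to compute the usual replicated-sharing local term a_i*b_i + a_i*b_next + a_next*b_i from the two shares it holds of each input. Second, Alice's shares are not given in the talk; the values 2, 3, 1 are chosen here because, under that formula, they reproduce the quoted results 13, 10, and minus five. The helper names `replicate` and `local_product` are made up for this illustration.

```python
def replicate(shares):
    """Server i receives the pair (share i, share i+1), wrapping around."""
    n = len(shares)
    return [(shares[i], shares[(i + 1) % n]) for i in range(n)]


def local_product(a_pair, b_pair):
    """What one server computes from its two shares of a and of b."""
    a_i, a_next = a_pair
    b_i, b_next = b_pair
    return a_i * b_i + a_i * b_next + a_next * b_i


bob_shares = [1, 4, -2]    # from the talk: 1 + 4 - 2 = 3
alice_shares = [2, 3, 1]   # assumed for this sketch: 2 + 3 + 1 = 6

bob_pairs = replicate(bob_shares)      # [(1, 4), (4, -2), (-2, 1)]
alice_pairs = replicate(alice_shares)  # [(2, 3), (3, 1), (1, 2)]

results = [local_product(a, b) for a, b in zip(bob_pairs, alice_pairs)]
print(results)        # [13, 10, -5]
print(sum(results))   # 18
```

Each of the nine share-by-share products appears in exactly one server's term, which is why the three local results add up to the full product.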
That's exactly what we expect to get when multiplying three by six. So it seems like our servers should just exchange their numbers. But first, I will give you a visual explanation of why this way of multiplication works. Let's say we have a rectangle with sides a and b. Then the area of the rectangle is a times b. If we divide our large rectangle into nine small ones, then compute the area of each and add them all up, we get the total area of our rectangle, which is, as we already know, a times b. This is exactly what we did on the previous slide: we just asked each of the three servers to compute the areas of three small rectangles.
But there is an issue. Each server can compute more than the areas of its three rectangles. There are some areas that can be computed by more than one server, so it could be insecure to simply exchange the results. In some cases, it could lead to data leaks, which is what we aim to avoid. Instead, we should consider the result of each server as a private input and carry out the addition protocol that we looked at earlier. The servers distribute shares of their results and sum them up. As you can see on the slide, the result is still correct: three times six is 18.
By the way, this example shows an interesting feature of multiparty computation. We can compute quite complex functions by chaining up the calculations, and we don't have to reveal any intermediate results. We operate on secret shares all the way to the final step, which gives us the function result. And this result is the only thing we reveal.
Now just take a look at this slide. You will immediately recognize this equation. It's linear regression, a real machine learning model. And as you can see, it's no more than a blend of addition and multiplication. We can add and multiply numbers by operating on their secret shares. In other words, we don't have to access the data to run inference on, or even train, some machine learning models.
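The final step of the multiplication, where each server's result is treated as a private input to the addition protocol, can be sketched as a single-process simulation. The function `secure_sum` and its `bound` parameter are hypothetical names for this illustration; in a real deployment each party would run on its own machine and the list `sent` would be network messages. As in the talk's simplified examples, plain integers are used.

```python
import secrets


def secure_sum(private_inputs, bound=1000):
    """Simulate the three-step addition protocol for n parties."""
    n = len(private_inputs)

    # Step 1: every party splits its input into n random shares.
    # sent[i][j] is the share that party i hands to party j
    # (sent[i][i] is the share party i keeps for itself).
    sent = []
    for value in private_inputs:
        shares = [secrets.randbelow(2 * bound + 1) - bound
                  for _ in range(n - 1)]
        shares.append(value - sum(shares))
        sent.append(shares)

    # Step 2: every party adds up the shares it now holds.
    partial_sums = [sum(sent[i][j] for i in range(n)) for j in range(n)]

    # Step 3: the partial sums are exchanged and added together.
    return sum(partial_sums), partial_sums


# The servers' multiplication results from the talk, kept private.
total, partials = secure_sum([13, 10, -5])
print(partials)   # random-looking on every run
print(total)      # 18

# The same function covers the friends' example: 3 + 6 + 1.
print(secure_sum([3, 6, 1])[0])   # 10
```

Only the partial sums are exchanged in the last step, so the individual values 13, 10, and minus five never leave their servers in the clear.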
Regression is just one such model. Let's imagine that there are two banks, and each has some important data about its clients. That data is more than just sensitive: it's bank secrecy, and it's often protected by law, so it could be simply illegal to reveal it. On the other hand, the banks can profit a lot from data collaboration. Let's also imagine that there is another participant. Let's call it the model owner. Let him hold the trained regression weights. That regression consumes data from both banks and produces, let's say, a really accurate credit score. So we can compute the regression in a quite private way by using three servers, just like we did when studying secret-shared multiplication. The banks will keep their data private, and the model owner won't reveal anything about his model. It sounds pretty good, doesn't it? Of course, that example is a bit of a toy, but I believe it gives an intuition of how multiparty computation can be applied to real-world tasks. And by the way, we are solving quite similar tasks at Bloomtech every day.
Okay, I hope I managed to convince you that secure multiparty computation is a great technology that may change the way we process our data and make it more privacy-preserving. But as always, there are some limitations. Firstly, it requires infrastructure that can cost more than what you need for commodity computations. Secondly, MPC protocols are complex. They really are. We have only covered the very basics. The truly secure and efficient protocols you can use in the real world are way harder. Thirdly, MPC has a significant communication overhead. You definitely noticed that participants in the computation kept sending each other messages with random numbers. In real life, we use networks for this, and networks are comparatively slow. There is computation overhead too: to do one arithmetic operation, we have to do many. So in the end, MPC is slower than commodity, unencrypted computations. But the good news is that we are working on it.
Today's protocols are way more efficient than they were yesterday. Moreover, sometimes it's worth waiting a bit when it comes to your personal data privacy.
Well, we are almost done. Let's finish up with a couple of important notes on security considerations. The first thing I would like to draw your attention to is randomness. This may be surprising, but it isn't easy at all. In cryptography, we use special random number generators based on some physical source of randomness. And remember, using simple generators like the random module from the Python standard library can be completely insecure. The second is modular arithmetic. In our simple examples, we used ordinary numbers, assuming that they are infinite in some way. In reality, this is not the case, and we are forced to use more complex algebraic structures like cyclic groups. The third is maliciousness. Any protocol is a strict sequence of actions that everyone involved has to follow. If someone intentionally deviates, it can lead to negative effects. We can't let that happen, so we have to build our protocols to be maliciously secure. And finally, there is always a risk that some participants can collude to cheat the others. For example, if two servers in our multiplication protocol collude, they can restore all the secrets. If that risk is unacceptable for your case, there are other, more complex protocols that aim to mitigate it.
Anyway, all this is quite an advanced topic worthy of its own talk, and maybe someday we will discuss it in detail. Or feel free to reach out to me anytime and ask your questions. I will be happy to answer. And if you would like to play around with MPC and get some hands-on experience on the topic, I can recommend a brilliant library by Meta called CrypTen. Again, you absolutely shouldn't use it in production due to the security considerations we have covered above, but it's a good start to get the right intuition about MPC protocols.
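The first two considerations, randomness and modular arithmetic, can be illustrated with a small variation on additive sharing. This is a sketch under stated assumptions, not a production scheme: the modulus 2**61 - 1 and the names `share_mod`, `reconstruct_mod`, and `decode_signed` are choices made for this example. It draws shares with Python's `secrets` module, which uses the operating system's cryptographic random source, rather than the `random` module, which is not intended for security purposes. Because every share is uniform over the whole range, a single share reveals nothing about the secret.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus for this sketch (a Mersenne prime)


def share_mod(secret, n_parties=3, modulus=MODULUS):
    """Additive sharing over the integers modulo `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % modulus)
    return shares


def reconstruct_mod(shares, modulus=MODULUS):
    """Add all shares and reduce modulo `modulus`."""
    return sum(shares) % modulus


def decode_signed(value, modulus=MODULUS):
    """Interpret the upper half of the range as negative numbers."""
    return value - modulus if value > modulus // 2 else value


print(reconstruct_mod(share_mod(18)))                 # 18
print(decode_signed(reconstruct_mod(share_mod(-5))))  # -5
```

Negative values such as the minus five from the multiplication example wrap around the modulus, so a decoding step like `decode_signed` is needed after reconstruction.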
Well, there are some conclusions; they are on the slide. In short, modern MPC protocols are more practical and efficient than other privacy-preserving methods. They can be implemented completely in software and executed on commodity hardware, which makes them vendor-independent, and they provide privacy guarantees while keeping data useful. So that's basically it. I hope my talk encourages you to investigate the topic on your own, contributing to a more secure digital world. And thank you. Thank you very much for joining me today. As I said, you can reach out to me anytime. I will be happy to chat. And bye for now. Live long and prosper and keep an eye on your data.
...

Petr Emelianov

CEO @ Bloomtech

Petr Emelianov's LinkedIn account


