Conf42 DevSecOps 2024 - Online

- premiere 5PM GMT

The Race You Don't Want to Win - The Hidden Danger in Financial Transactions

Abstract

As the CTO of a fast-growing financial app, I never imagined that a tiny flaw in our system could open the door to attackers. But one day, a hidden issue called a race condition allowed hackers to drain over 70 million in just minutes. We were blindsided.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good morning, afternoon, or evening, everyone. My name is Paul Edward, and I have over 12 years of experience in enterprise software development. I love the open source community and I contribute to it. I love writing. I speak, I build, and most importantly, I'm a boxing advocate. Today I'll be talking to you about a topic that is very dear to me: the race you don't want to win. I've been fortunate to work on systems that handle sensitive transactions, which means security is always a top priority. Today, I'll share a story about a breach we faced, a flaw very subtle yet devastating that shook our system and our confidence. This is the race you don't want to win. So let's dive in.
So imagine two kids running to grab a cookie from a jar. The jar is supposed to hold just one cookie, but if the kids grab at the exact same time, the jar might magically let both of them win. That is a race condition: two processes competing to access the same resource, and most of the time the system gets very confused. In software development, this happens when two processes try to access or modify the same database record at the same time. Imagine your system is updating the user's wallet balance and two withdrawal requests hit the database simultaneously, and there are no checks in place. In that case, the database might end up processing both of them, even though there was only enough money for one. So it is like your code is running just too fast for its own good.
And here is the kicker. It doesn't matter what programming language you're using: Java, Python, even SQL databases aren't immune. A race condition doesn't discriminate; it can make a mess of any system if you're not careful. And race conditions have cost the world a lot of money.
Companies like Knight Capital have lost over 440 million dollars. Even this year, GTBank has had the same issue, according to news reports online, and I think they had two weeks of transactional failures. So the smallest oversights, most of the time, I think all the time, have very catastrophic consequences. That is why we are talking about it.
So here's what happened in our system. Our app processes wallet transactions. Let's say a user has 1,000 in their wallet. Then a clever attacker discovered he could send two withdrawal requests at the exact same time. The system, thinking it was handling things one by one, approved both requests. The same 1,000 was withdrawn twice. It was like the system was just too polite to say, hey, wait your turn. Now, multiply this by hundreds of transactions and you are going to be staring at a massive loss: a financial loss of over 70 million and a lot of sleepless nights.
So how did we tackle this? We immediately launched an investigation. Step one was to identify the breach entry points. We combed through the logs and replicated the issue in a controlled environment. Tools like analyzers and debugging frameworks were our best friends at that point in time. It wasn't just about fixing the problem. We needed to fully understand what really happened to be able to prevent future exploits, and I really advise you to do so as well.
So what countermeasures did we take? Our first move was damage control. We disabled the affected feature first of all and implemented a temporary fix just to stop the exploit. Long term, we redesigned the transaction flow to be able to handle this. We also added much stricter monitoring and anomaly detection, ensuring that subtle irregularities would trigger an alert. And we also implemented an accounting feature, the principle of double entry, just to make sure that for every credit, there's a debit.
For every debit, there's an equivalent credit as well.
As for the preventive measures we took: we make sure we test for edge cases, those weird scenarios where your code might break. You always have to test for edge cases and, in a controlled environment, simulate very high traffic and stress test your application to see if it holds up. You always have to think like an attacker. If you were trying to break your system, how would you do it? Ask yourself that question and review your code with concurrency in mind. Ask yourself, what happens if two people click the same button at the same time? Secure coding is all about asking the right questions. Regularly test for vulnerabilities, anticipate edge cases, and test concurrency under high loads as well.
So I'm going to walk you through a live demo to explain in real terms how this race condition works. I spun up a PHP application. This wallet controller has a withdraw method, and this withdraw method takes in a payload: the user ID and the amount. The user ID lets us identify the wallet that we intend to make a withdrawal from. Here is where we try to get the wallet that belongs to that user. If that wallet is not found, we immediately throw an error that the wallet cannot be found. If the wallet is found, we check whether the balance is sufficient to make the withdrawal. If there are not enough funds, we trigger "insufficient balance" straight away. This code looks very good because you're doing all the right checks; all the things are rightly done here. Immediately after that, we deduct the money from the user's wallet balance and update our database record. In that instant, we also notify you that the withdrawal was successful. And if anything goes wrong, we throw an error telling the user that something went wrong.
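The check-then-deduct flow described above can be sketched in a few lines. This is not the PHP controller from the demo, which is not reproduced in this transcript. It is a minimal Python simulation under stated assumptions: threads stand in for concurrent HTTP requests, a dictionary stands in for the wallet row, and the name `simulate_unsafe_withdrawals` is hypothetical. A barrier deliberately widens the race window so the outcome is repeatable.

```python
import threading


def simulate_unsafe_withdrawals(request_count, start_balance, amount):
    """Run `request_count` check-then-deduct withdrawals with no locking.

    Returns (approved_count, final_balance).
    """
    wallet = {"balance": start_balance}  # stand-in for the wallet row
    approved = []

    # Assumption for the sketch: the barrier holds every write until all
    # requests have passed the balance check, which makes the race repeatable.
    # Every thread reads the same starting balance, so either all of them
    # reach the barrier or none do, and nobody is left waiting.
    all_checked = threading.Barrier(request_count)

    def withdraw():
        balance_seen = wallet["balance"]           # 1. read the balance
        if balance_seen < amount:                  # 2. check (stale by write time)
            return
        all_checked.wait()
        wallet["balance"] = balance_seen - amount  # 3. write back a stale result
        approved.append(amount)

    threads = [threading.Thread(target=withdraw) for _ in range(request_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(approved), wallet["balance"]


if __name__ == "__main__":
    print(simulate_unsafe_withdrawals(10, 500, 100))
```

With 10 requests of 100 against a balance of 500, this sketch approves all 10 and leaves 400 in the wallet, so 1,000 goes out while only 100 is deducted. A real server will interleave less neatly, but the same read, check, write gap is what gets exploited.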
If you're looking at this code, we have all the right checks in place. Everything is done the way it is expected to be done. And I'm sure if you try this and throw a request at it, it will work very smoothly. The amount of money you're requesting is going to be deducted accordingly. But if we attack this with a race condition in mind, let's see what is going to happen.
So here I spun up a Python code base, and I'm making a call to that endpoint. We have a user ID of one. I think I need to show you what is happening in the database here, so let's look at the wallet table. If you look at this, we have a user ID of one and the wallet balance is there. Let's make this 600 and save it quickly. So this is the Python request, just a very basic Python script that sends multiple concurrent requests to this endpoint. We're going to be sending 10 simultaneous requests to this endpoint instead of one.
First, let me simulate just one request and see what happens. We're going to be deducting 100 from the 600 bucks we have in our database. If I run this Python script, what do we get? As you can see, your new wallet balance is 500 now. And if we refresh this, you will see that the balance has been updated to 500. Everything works very smoothly.
So what about if we decide to make this 10 simultaneous requests, like we are trying to call the endpoint 10 times? With those 10, each trying to deduct a hundred bucks, we should not be able to deduct this, because a hundred multiplied by 10 gives us 1,000, and we just have 500 bucks in our wallet balance. So let's see what happens. I'm going to be initiating this.
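A script of the kind described here, one that fires several withdrawal requests at once, might look like the sketch below. The actual demo script is not shown in this transcript, so the URL, the JSON field names `user_id` and `amount`, and the helper names are all assumptions for illustration. It uses only the standard library.

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical endpoint; replace with the withdraw route of your own test app.
WITHDRAW_URL = "http://localhost:8000/api/withdraw"


def post_withdraw(url, user_id, amount):
    """Send one withdrawal request and return (status_code, body_text)."""
    payload = json.dumps({"user_id": user_id, "amount": amount}).encode()
    request = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.read().decode()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses still carry a body worth printing.
        return error.code, error.read().decode()


def fire_concurrently(send, count):
    """Call `send()` from `count` worker threads at roughly the same time."""
    with ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(send) for _ in range(count)]
        return [future.result() for future in futures]


if __name__ == "__main__":
    # 10 concurrent withdrawals of 100 for user 1, as in the demo.
    for result in fire_concurrently(lambda: post_withdraw(WITHDRAW_URL, 1, 100), 10):
        print(result)
```

Only run something like this against a test environment you control. The point is to see whether more withdrawals are approved than the balance allows.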
As you can see, I was able to make this call 10 times: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. And it's still telling me here that I have a hundred bucks as my balance. I was able to make a withdrawal of 1,000 bucks, ten times a hundred bucks. If I refresh this, we have a balance of a hundred bucks. What does this mean? I have cheated the system. So I would advise you to run a script like this against your own code base, and the results might be very shocking to you.
So, on the lessons we learned from this (I'm coming back to explain how you can fix things like this): you need to know that small flaws have very big consequences. A single bug almost brought down our system. Prevention is better than cure: test, test proactively, and test repeatedly. Monitoring could also have caught this earlier, but I don't think we had implemented all of this at the earlier stage. And security is a team sport. It's a collective effort. Developers, testers, even users have roles to play in keeping systems safe. But the real takeaway here is this: treat every bug like it could be your next big breach. Let me repeat this: treat every bug like it could be the next big breach. Because sometimes it will be. So I really emphasize secure design and testing at every stage as well.
So before I go to the conclusion, let me quickly go back to the live demonstration. How did we fix this? One of the ways we can prevent a race condition is to use locks to be able to control resource access.
Use transaction logs and rollback mechanisms, like your DB transactions, to make sure that when there is a breach or a failure at any point, the system can roll back to the previous record as it was. And like I said earlier, introduce rigorous testing and, in a controlled environment, simulate highly concurrent scenarios to show that your code runs smoothly.
So let's go back to the code base, and I think that will shed more light on what I've just said. Now we have a hundred bucks. I'm going to make this 500 and save it. I implemented another method here, which we call the withdraw fix. We are taking in the same payload, the user ID and the amount, but what did we do here? Just like I mentioned on the slide, you should use transaction logs and rollback mechanisms, and that is what we did here. Laravel has a feature like this. Other frameworks and other languages have their own methods for handling your transaction log and your rollback mechanism as well, but this is strictly for Laravel.
So we started the transaction here, so that if there is an issue with anything inside this transaction, it should immediately roll back anything it has done previously. Sorry, I was not showing you the code base. So yes, we started the transaction. Like I said, this is strictly for Laravel. Now let me close this so we can have a full screen.
Another thing we did, while we were checking for the user that owns the wallet, is what we call lock for update. So we're saying that when this query is being run on the database, lock that row against any further updates until I'm done with what I need to do here.
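The transaction-and-rollback flow described here can be sketched outside Laravel too. The following is a minimal sketch using Python's built-in `sqlite3` module, not the code from the demo. The table layout and function names are assumptions. SQLite has no `SELECT ... FOR UPDATE`, so `BEGIN IMMEDIATE`, which takes the database write lock before the balance is read, is swapped in as the closest stand-in for locking the row.

```python
import sqlite3


def open_demo_db(start_balance):
    """Create an in-memory wallets table with one wallet for user 1."""
    # isolation_level=None puts the connection in autocommit mode so the
    # sketch can issue BEGIN / COMMIT / ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE wallets (user_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO wallets (user_id, balance) VALUES (1, ?)", (start_balance,))
    return conn


def withdraw_fixed(conn, user_id, amount):
    """Check and deduct inside one transaction; roll back on any failure."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT balance FROM wallets WHERE user_id = ?", (user_id,)
        ).fetchone()
        if row is None:
            conn.execute("ROLLBACK")
            return "wallet not found"
        if row[0] < amount:
            conn.execute("ROLLBACK")
            return "insufficient balance"
        conn.execute(
            "UPDATE wallets SET balance = balance - ? WHERE user_id = ?",
            (amount, user_id),
        )
        conn.execute("COMMIT")
        return "withdrawal successful"
    except Exception:
        conn.execute("ROLLBACK")  # undo partial work, then surface the error
        raise


if __name__ == "__main__":
    db = open_demo_db(500)
    for _ in range(6):
        print(withdraw_fixed(db, 1, 100))
```

Because the lock is taken before the read, a second writer cannot read the balance until the first has committed or rolled back, which closes the gap between the check and the deduction.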
So when this query is successful, lock this row against further updates. I think this is the first place where we actually started solving the issue of the race condition. If the wallet is not found, you notify the user that the wallet is not found. Then we check in the database whether you have enough funds, and we do a rollback here when your wallet balance is less than the amount. We do that just in case any edit has been made to the database up to this point, so we want it to be rolled back. At this point, we make the wallet deduction on the database and we commit the transaction. If everything is successful, we return that success message to the user, and in case of any error, we can also roll back here. I think I could also take some of these rollbacks out and probably just have one. This is just to prove a point about how we can solve this.
So now we are going to call the second endpoint, which I think I called withdraw fix. And don't forget, we just have 500 bucks in our wallet balance. I'm going to run this again. We're still going to simulate 10 concurrent requests, and let's see what happens. Ten concurrent requests would still be 1,000. I should not be able to withdraw more than 500, because that is what I have in my wallet. So we'll try it again.
Good. Now, as you can see, some requests returned "something went wrong". But I was able to make the first 100-buck withdrawal requests: 400, 300, 200, 100, zero, and then it starts saying "insufficient balance", as you can see. So we could not make that same withdrawal of 1,000 bucks like we did earlier.
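The effect of the lock can be reproduced with a small simulation. This is a sketch under stated assumptions rather than the Laravel code from the demo: a `threading.Lock` stands in for the database row lock, threads stand in for concurrent requests, and the function name is hypothetical.

```python
import threading


def simulate_locked_withdrawals(request_count, start_balance, amount):
    """Run concurrent withdrawals where read, check and write share one lock.

    Returns (approved_count, rejected_count, final_balance).
    """
    wallet = {"balance": start_balance}  # stand-in for the wallet row
    row_lock = threading.Lock()          # stand-in for the row lock
    outcomes = []

    def withdraw():
        # Only one request at a time may read, check and update the balance.
        with row_lock:
            if wallet["balance"] < amount:
                outcomes.append("insufficient balance")
                return
            wallet["balance"] -= amount
            outcomes.append("ok")

    threads = [threading.Thread(target=withdraw) for _ in range(request_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes.count("ok"), outcomes.count("insufficient balance"), wallet["balance"]


if __name__ == "__main__":
    print(simulate_locked_withdrawals(10, 500, 100))
```

With 10 requests of 100 against 500, exactly five are approved, five are rejected, and the balance ends at zero, regardless of the order in which the threads run.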
And if we check the database record, I think it should be zero right now. So we've prevented the race condition. I really hope this is clear.
So let's go back to the conclusion. Security, at this point, is a marathon. It is not just a checkbox. It is a mindset. We need to have security as a mindset. The race you don't want to win is the race to fix your system after an exploit. So let's stay vigilant, test everything, and remember: it is better to prevent the fire than to put it out. Thank you.

Paul Edward

Senior Fullstack Engineer @ Compado
