Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Today we'll talk about my favorite topic: anti-fraud platforms.
We'll mainly talk about the FinTech case study, but we'll also talk about
other industry examples and philosophies about generalized frameworks.
It would not be an exaggeration to say that I've been working on anti-fraud
for more than four years in a variety of B2C product companies.
I successfully applied the math I studied at university and built
effective ML anti-fraud pipelines in Yandex advertising, worked with
cheaters at Playrix and fraudsters at Axios.
So, the plan for today. To begin with, we'll look at examples of what fraud
is and why anti-fraud is needed.
We'll analyze payment fraud in the FinTech industry
and cheaters in games.
Next, we'll go from simple to complex, as any ML engineer would
when solving the anti-fraud problem: what are the pitfalls
and how do we overcome them?
We'll talk about ML system design, real-time processing, and even the
other steps of anti-fraud pipeline development beyond ML.
To begin, let's delve into the concept of anti-fraud
and figure out why a company can benefit from it.
To understand what anti-fraud is, you first need to define fraud, of course.
Let's say fraud consists of customer actions that are not intended by the
company and that result in a deterioration of key metrics.
If the client exploits an internal inefficiency of the company's mechanisms,
we call the fraud endogenous; otherwise, exogenous.
Accordingly, we call anti-fraud the mechanisms and processes that
prevent these malicious actions.
I'd say that the fight against fraud can be largely automated, with decisions
made based on data and an understanding of the product, but it is almost
impossible to run an anti-fraud platform completely without manual work.
The definition through metrics was not chosen by chance.
It's convenient for us to formalize fraud as a set of complex actions by
an agent against our internal systems that eventually has an effect on our metrics.
So for a unique user ID, for our fraudster, we can measure
the corresponding drop in money, retention, and so on.
Let's start with a simple example of fraud in mobile game development.
Usually free-to-play games are built with paywalls: a difficult place
that only a very skilled player can pass, or the player must pay in the
in-game shop for additional benefits.
A fraudster can use software to break the game and bypass
the most difficult levels.
At the same time, if the game has online competition with other
participants, that same player gains a huge advantage over the others,
which certainly spoils the experience for everyone else.
Here the key detection metric will be the number of attempts per level.
How quickly did a person pass a difficult level?
Did it happen on the first try, for example?
If yes, it's worth looking into.
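As a minimal sketch, a rule like "passed a notoriously hard level on the first attempt" can be expressed as follows; the level IDs, threshold, and data are hypothetical, for illustration only:

```python
# Flag players who clear a known-hard level in suspiciously few attempts.
# Level IDs, threshold, and log data are hypothetical.

HARD_LEVELS = {42, 57}        # levels designed as paywalls
MIN_EXPECTED_ATTEMPTS = 3     # honest players rarely pass on the first try

def suspicious_players(attempts_log):
    """attempts_log: list of (player_id, level, attempts_to_pass)."""
    flagged = set()
    for player_id, level, attempts in attempts_log:
        if level in HARD_LEVELS and attempts < MIN_EXPECTED_ATTEMPTS:
            flagged.add(player_id)
    return flagged

log = [
    ("alice", 42, 17),   # struggled, as expected
    ("bob",   42, 1),    # first-try pass on a paywall level -> suspicious
    ("carol", 10, 1),    # easy level, first try is fine
]
print(suspicious_players(log))  # {'bob'}
```

In practice such a rule would be one feature among many, not a verdict on its own.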
Now it's getting a bit harder.
Why? Any FinTech company has, of course, a place where
customers' money is stored, which means there must also be
a mechanism for how that money gets there.
We'll discuss a case study in payments: transactional fraud linked to
foreign exchange conversion rates.
Let's imagine that a techie from Europe used a payment system
to move euros into a USD account.
At the time of submitting the application, the billing system generates a
document with a fixed EUR/USD price.
However, the person still has the choice of whether to actually pay it or not.
Therefore, he has time to see where the EUR/USD rate will move.
If the euro becomes cheaper, the person makes the transaction.
If it becomes more expensive, he simply doesn't send the money at all:
no transaction.
Thus, the payment billing creates an option for the client to buy dollars
and sell euros, where the premium of this option costs the client zero.
All that remains for the client is to monitor
the fixed rate and commit fraud with a positive expected value.
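A toy calculation (all numbers are hypothetical) shows why this free option has positive expected value for the client:

```python
# Toy expected-value calculation for the "free FX option" fraud.
# Rates, probabilities, and notional are hypothetical.

fixed_rate = 1.10            # EUR/USD locked in the billing document
scenarios = [
    (0.5, 1.08),             # 50%: euro weakens -> pay at the stale 1.10
    (0.5, 1.12),             # 50%: euro strengthens -> walk away, pay nothing
]
notional_eur = 10_000

ev = 0.0
for prob, market_rate in scenarios:
    if fixed_rate > market_rate:          # stale rate beats the market
        gain_usd = (fixed_rate - market_rate) * notional_eur
    else:                                 # client abandons the transaction
        gain_usd = 0.0
    ev += prob * gain_usd

print(f"client's expected gain: {ev:.2f} USD")  # 100.00 USD
```

Because the downside branch always contributes zero, the expectation is non-negative no matter how the rate moves, which is exactly what makes the option "free" for the client and costly for the company.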
Okay, let's build an automatic anti-fraud.
Assume that we have a payment cost metric that takes into account the money
that comes to us in one currency, euro or Indian rupee or whatever,
and the money that goes out to the payment in another currency, plus
commissions and many other indicators, normalized somehow: a complex, big metric.
We draw a time series for this metric with a blue line.
Let's take classic gradient boosting, train a prediction of the
value, and predict the variance of the time series by autoregression.
Then we look at segments and at people in particular.
Draw an orange line for a specific segment; let's call it the "India"
country segment. Already at that stage, if we made a good classical ML
prediction of the time series, we can identify the anomalous segment in detail.
Now we have a hypothesis that payment fraud may be occurring between
rupee and dollar.
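The talk proposes gradient boosting plus autoregression; as a self-contained stand-in, here is the same idea with a plain least-squares autoregression on synthetic data, flagging points where the series deviates far from its prediction. All numbers, the injected "India" spike, and the 5-sigma threshold are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "payment cost" series: seasonality plus noise, with an
# injected fraud-like spike at the end (all numbers are hypothetical).
t = np.arange(200)
series = 100 + 10 * np.sin(t / 10) + rng.normal(0, 1, size=t.size)
series[180:] += 25                      # anomalous segment, e.g. "India"

# Fit an AR(3) model by least squares, a lightweight stand-in for the
# gradient-boosting predictor mentioned in the talk.
order = 3
X = np.column_stack([series[i:len(series) - order + i] for i in range(order)])
A = np.column_stack([X, np.ones(len(X))])
y = series[order:]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

resid = y - A @ coef
sigma = resid[:150].std()               # noise level on the clean prefix

# Flag timestamps whose residual exceeds 5 sigma: the transition into
# the spiked region should stand out.
anomalies = np.nonzero(np.abs(resid) > 5 * sigma)[0] + order
print(anomalies)
```

In production the same logic runs per segment (country, payment system), and the flagged window becomes a hypothesis for analysts rather than an automatic block.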
A few words about ML. What about ML?
I will be blunt here and suggest that you don't waste your time trying to
optimize neural networks to catch fraud.
There is a very simple and understandable reason why:
probability theory, and ML algorithms in particular, work when the same
event can happen many times under similar conditions.
If that holds, then we can build probability functions, compute statistics,
test hypotheses, whatever.
Obviously, it is quite difficult and slow to train a neural network,
but it's even harder to make it stable.
Here I can compare anti-fraud with a crazy gladiatorial arena
in which you never know what will happen in the next moment.
The same fraud can have many different complex variations;
it adapts to anti-fraud very quickly, and therefore neural networks
go bad faster than you can set them up.
So it's better to forget about them here.
At first glance, it's more or less simple.
The model is trained offline on a data lake.
Kubernetes launches the model in a persistent pod that
listens to the data from Kafka; this data is user data.
Why is it so easy?
Because we don't have heavily loaded services and we need only client data.
Technically, any ML engineer can set this up nowadays.
But it's not that easy in practice.
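A minimal sketch of that scoring loop might look like this. The field names and the scoring rule are assumptions, and the Kafka consumer is stubbed with an in-memory list so the example is self-contained and runnable:

```python
import json

# Stub for a Kafka consumer: in production this would be a real consumer
# subscribed to a topic; here it is an in-memory list of JSON strings
# so the sketch runs anywhere.
def consume(messages):
    for raw in messages:
        yield json.loads(raw)

def score(event):
    """Hypothetical model stand-in: how stale is the fixed FX rate?"""
    gap = event["fixed_rate"] - event["market_rate"]
    return 1.0 if gap > 0.01 else 0.0

THRESHOLD = 0.5
stream = [
    '{"user_id": "u1", "fixed_rate": 1.10, "market_rate": 1.05}',
    '{"user_id": "u2", "fixed_rate": 1.10, "market_rate": 1.10}',
]
flagged = [e["user_id"] for e in consume(stream) if score(e) >= THRESHOLD]
print(flagged)  # ['u1']
```

The real pod would load the offline-trained model instead of the rule above, but the shape of the loop, consume, score, act, stays the same.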
In a realistic situation, prediction is usually quite a difficult task,
and we need a lot of different, representative data to increase
accuracy, reduce variance, and so on.
In our case, market data is necessary by the very definition of our fraud.
What then?
Then we need to hire a team of C developers who will design an
efficient reader of market quotations.
You can imagine how much the load on services will increase, along with
the complexity of data security and so on.
At this point it is already worth thinking about the infrastructure,
so that the anti-fraud system doesn't add load that slows down the product
itself and customers do not experience delays.
Therefore, we'll have a market provider and an aggregator block in our ML design.
In fact, this is an extremely simplified scheme, since aggregation itself
includes many processes, starting from data quality checks and a fair
microservice architecture that lets you combine data in a convenient
tabular form with correct timestamps.
What other problem could there be? It's too generalized, isn't it?
Of course, we are trying to come up with a function that reflects the
bad actions of customers well, but often an ensemble of models is needed.
You need to clearly understand that in order to achieve high accuracy in
anti-fraud with a decent recall, it's not enough just to train a model well.
It's necessary to study very carefully what kinds of fraud you have and
associate a separate set of metrics with each kind accordingly.
So in our example, the generalized payment cost can be divided into
commission costs, market arbitrage across different platforms, arbitrage
over time, costs of interacting with the payment system and counterparties,
and so on.
Thus our infrastructure expands and requires an individual analyst and
data-scientist approach to identify the clusters of metrics and models
that will give the best forecast accuracy.
On top of the models' answers, you need to choose the algorithmic
solution that matches your risk appetite, namely, to answer the question:
how often can we afford to make a mistake?
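One way to picture this is an ensemble of per-metric detectors whose combined score is thresholded according to risk appetite. All names, rules, and thresholds below are hypothetical stand-ins for trained models:

```python
# Hypothetical ensemble: one detector per sub-metric of payment cost.
# Each detector returns a fraud probability for a user.

def commission_detector(user):      # stand-in for a trained model
    return 0.8 if user["commission_cost"] > 100 else 0.1

def fx_arbitrage_detector(user):    # stand-in for a trained model
    return 0.9 if user["rate_gap"] > 0.01 else 0.1

DETECTORS = [commission_detector, fx_arbitrage_detector]

def ensemble_score(user):
    # Simple average; in practice weights come from validation data.
    scores = [d(user) for d in DETECTORS]
    return sum(scores) / len(scores)

# Risk appetite: a lower threshold catches more fraud but makes more
# mistakes on honest users.
RISK_THRESHOLD = 0.6

user = {"commission_cost": 150, "rate_gap": 0.02}
print(ensemble_score(user) >= RISK_THRESHOLD)  # True: 0.85 >= 0.6
```

The threshold is exactly where "how often can we afford to make a mistake" is encoded: it is a business decision, not a modeling one.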
It's time to talk about how to make this ensemble perform well.
In the beginning, CPU Spark was used as the default solution,
but it was sorely lacking.
Now Trino is used for modeling and computing in my current company,
and it shows itself much better due to more careful work with memory.
Another option here would be GPU Spark computation, but it's
sometimes not trivial to run properly.
A huge problem, especially with transactional and market data, is
the uneven load on infrastructure.
We have periods with a high load; in our case it's during news,
but in your company it may be a different period.
At these high-priority moments, all resources are prioritized
for the vital functions of the product, making transactions and so on,
while the model only politely receives information.
That's just not efficient, and we can't compute things properly.
From the ML specialist's point of view, we have two solutions: either to
intelligently adapt the model to the high-load intervals, or to make a
separate, non-real-time model that looks for fraud over the high-load
window in a second wave of calculation.
In total, we'll have a fast, efficient, and accurate collection almost always.
But in the most difficult moments for the product, we reduce the amount of
resources allocated to the anti-fraud block and either run a simple version,
without wasting time, catching only the biggest fraudsters, the most painful
in terms of the metric, or recalculate this entire block over the past
few minutes after the load stabilizes.
A flying support team, so to speak.
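The two-solution idea can be sketched as load-aware routing between a full model and a light fallback, with deferred re-scoring once the load drops. All names, thresholds, and models are hypothetical:

```python
# Sketch of load-aware routing between a full model and a light fallback.
# Names, thresholds, and models are hypothetical.

def full_model(event):
    return 0.7                       # expensive, accurate score (stub)

def light_model(event):
    # Cheap rule that only catches the biggest, most painful fraudsters.
    return 1.0 if event["amount"] > 1_000_000 else 0.0

deferred = []                        # events to re-score after load stabilizes

def route(event, system_load):
    if system_load < 0.8:            # normal conditions: full model
        return full_model(event)
    deferred.append(event)           # re-check later, a few minutes behind
    return light_model(event)        # meanwhile, a cheap real-time screen

print(route({"amount": 5_000_000}, system_load=0.95))  # 1.0, and deferred
print(len(deferred))  # 1
```

The deferred queue is the "recalculate this block into the past" part: nothing is lost, it is just scored a few minutes late.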
In fact, we come to a very important part of today's story: what about
the parts before and after the ML model?
These are people: analysts and anti-fraud officers standing there.
Look at the probability formula, please.
If we competently build a pipeline for filtering good users from fraudsters
in which the filters are independent, then with the right processes we'll
make a huge improvement in the accuracy of the entire anti-fraud platform,
because our analytical, ML, and operational-team filters each go their own
independent way.
All the different teams have their own research and their own
thoughts on how to capture fraudsters.
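The "probability formula" point can be illustrated with a toy calculation: for independent filters, the miss rates multiply, so even three moderately good filters become very hard to slip past. The 90% catch rates below are hypothetical:

```python
# Independent filters: the probability of slipping past all of them
# is the product of the individual miss rates. Rates are hypothetical.
catch_rates = {"analytics": 0.9, "ml_model": 0.9, "ops_team": 0.9}

miss = 1.0
for rate in catch_rates.values():
    miss *= (1.0 - rate)            # P(fraudster passes this filter too)

overall_catch = 1.0 - miss
print(f"overall catch rate: {overall_catch:.4f}")  # 0.9990
```

This is exactly why the talk insists the teams research independently: correlated filters share blind spots, and the multiplication stops working.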
Let's look at the full pipeline using an example.
The analytical department signals that strange activity is happening in
India, for example. Digging a little deeper into our metric, they notice
that the people associated with one particular payment system have
dropped it significantly on average.
Okay. Now ML engineers step into the game.
They look at the metric, study the data, predict the metric, and catch
the quantiles of people with particularly strange behavior.
After correct validation of the model, it becomes obvious that some of
them can be blocked immediately: these are the clear-cut fraudsters.
But in the part where the accuracy is close to average, the ML engineers
are not sure, and they hand these cases over to the operational team.
The team sorts out the borderline cases, digging into the deepest essence
of why the fraud occurred and making decisions with their heads and hands.
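That triage by model confidence can be sketched as follows: auto-block the certain cases, send the borderline ones to the operational team. The thresholds and scores are hypothetical:

```python
# Triage by model confidence: auto-block the certain cases, route the
# borderline ones to human review. Thresholds are hypothetical.

BLOCK_ABOVE = 0.95      # confident enough to block automatically
REVIEW_ABOVE = 0.60     # uncertain zone goes to the operational team

def triage(scored_users):
    blocked, to_ops = [], []
    for user_id, score in scored_users:
        if score >= BLOCK_ABOVE:
            blocked.append(user_id)      # clear-cut fraudster
        elif score >= REVIEW_ABOVE:
            to_ops.append(user_id)       # borderline: humans decide
    return blocked, to_ops

scores = [("u1", 0.99), ("u2", 0.7), ("u3", 0.1)]
print(triage(scores))  # (['u1'], ['u2'])
```

Everyone below the review threshold is left alone, which is how the pipeline protects the experience of honest users.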
What next? Monitoring. Also a very important part.
Monitoring both online and offline metrics is crucial to ensure the
system operates efficiently.
One of the most important online metrics should reflect the
potential business cost of fraud.
Such costs related to the payment system may help us prevent fraud or
account for it in an efficient way. In particular, tracking not just the
value but also the behavior of the cost can provide an early-detection
mechanism for fraud, helping the business balance fraud prevention
with the operational team as well.
Offline metrics provide a deeper evaluation of the model's quality over time.
Given the class imbalance often present in fraud detection, metrics like
PR AUC are particularly useful for assessing the model's performance.
Additionally, monitoring the false positive rate is crucial, because we
want to minimize false positives: we need to maintain a positive user
experience while effectively fighting fraud.
Visualization of these metrics allows the teams, analysts and the
operational team, to identify emerging fraud patterns and activity spikes.
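The offline metrics mentioned above are easy to compute by hand; here is a pure-Python sketch of the false positive rate, precision, and recall on toy labels (the data is invented for illustration):

```python
# Pure-Python computation of the monitoring metrics mentioned above:
# false positive rate, precision, recall. Labels are toy data.

def confusion(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

y_true = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]   # heavy class imbalance
y_pred = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]

tp, fp, fn, tn = confusion(y_true, y_pred)
fpr = fp / (fp + tn)            # false alarms among honest users
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(fpr, precision, recall)
```

Sweeping a score threshold and tracing precision against recall at each point is what produces the PR curve whose area (PR AUC) the talk recommends for imbalanced fraud data.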
Therefore, as you can see, online metrics, through visualization and special
tools, go directly to rapid-response people to check complex edge cases,
either when the model is uncertain or when the metrics are contradictory
and need detailed analysis. Offline metrics, as usual, help us determine
the degree of degradation of the model, and over time we'll retrain it
once we need to.
So, thank you for your attention, and thanks to Conf42 for the opportunity
to talk about ML in anti-fraud.
The main conclusion to draw from my story: don't overcomplicate things
with mathematics where it isn't required, while a proper process approach
allows you to improve the quality of the model multiple times over.
Thank you.