Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to my talk, where I want to discuss
options for achieving data consistency in a distributed
system. This is an introduction to the saga pattern
for distributed transactions. First,
let's recall what a database transaction is and why it is
important. It is a series of operations
performed as a single unit of work, and
it can include business logic checks.
On success, it commits all operations to the database,
and on failure, it rolls back any completed operations,
keeping data consistent.
These requirements are usually summarized by the ACID acronym.
Transactions within a single service are usually ACID, but cross-service
data consistency requires a cross-service transaction management
strategy. So what is ACID, exactly?
Transactions must be atomic: all
changes to data are performed as if they are a single operation.
That is, all of the changes are performed, or none of them are.
They are consistent: data is in a consistent state
when a transaction starts and when it ends.
They are isolated: the intermediate state of a transaction is invisible to
other transactions. As a result,
transactions that run concurrently appear to be serialized.
And they are durable: after a transaction successfully completes,
changes to data persist and are not undone,
even in the event of a system failure.
So yeah, some information about myself. I'm a software
engineer with eight-plus years of experience. I write articles
about exciting back-end technologies on Medium.
I'm a certified Node.js engineer, and I'm also very well
versed in PHP development and DevOps.
Okay, so let's start with a monolithic application
example. When we have a single database, we can write our transactions
with the help of the tooling that the database provides,
and that is all: modern databases have
mechanisms to commit and roll back transactions automatically.
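As a rough illustration of that commit-or-roll-back behavior, here is a minimal in-memory sketch in TypeScript. This is not a real database driver; in practice the database itself handles this (for example with BEGIN / COMMIT / ROLLBACK in SQL), and all the names here are made up for the example.

```typescript
// Hypothetical in-memory "database" illustrating commit/rollback semantics.
type Balances = Record<string, number>;

function runTransaction(db: Balances, ops: Array<(db: Balances) => void>): boolean {
  const snapshot = { ...db };          // remember the starting state
  try {
    for (const op of ops) op(db);      // apply each operation in order
    return true;                       // all succeeded: changes stay (commit)
  } catch {
    Object.assign(db, snapshot);       // any failure: restore the snapshot (rollback)
    return false;
  }
}

const db: Balances = { alice: 100, bob: 50 };
const ok = runTransaction(db, [
  (d) => { d.alice -= 200; },
  (d) => { if (d.alice < 0) throw new Error("insufficient funds"); },
  (d) => { d.bob += 200; },
]);
// ok === false, and db is back to { alice: 100, bob: 50 }
```

The check throws midway through, so none of the changes survive: that is the atomicity property from the ACID list above.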
So let's review a stock trading application example. Here,
the front-end application on the left routes user requests
to our backend service, which is a monolith. The
order service processes the request and coordinates
calls to two other services, the fiat
currency service and the stock service.
The fiat service checks whether the user who is purchasing
a stock has enough balance, and the stock service checks
whether the user who is selling their stock has enough stocks.
And if everything is good, our application,
or rather, the services within one single application, will write four
records to the database as a single transaction.
The order service will be notified of the success and
will call a dependent audit service that will log four audit
records to the database, one per performed operation.
Now let's imagine we had a race condition, and one of our
customers did not have enough funds at the time
the transaction started. This case
is very simple to handle in a monolithic application, because rollback is
handled for us by the database.
In case of an error, our application
can usually just throw an exception, and the transaction will be rolled back.
The funds, or the stock in this example,
were not transferred, and the audit
logs were not created. So it's all good, and
our database is consistent.
Let's now review this application, but
split into microservices, one for each of the steps.
First, the order service receives a request from
the customer to purchase some stocks.
The order service calls both the fiat currency and stock services
to request operations on the users' balances.
After everything has gone well, the order service asks the audit service
to create the corresponding audit logs. The happy path
here looks good, but this design is flawed,
because if any of the steps fails,
we don't have a way to bring our system back to a consistent
state. Let's review an
example. In this case, our order service was
not able to connect to the currency
service and confirm the operation,
but the stock service operation executed as usual.
We have written stock logs to the audit service,
and we actually did transfer stocks from user to user,
but the currency balance never changed for either user.
So, yeah, we have a problem here.
So, the saga design pattern: it is a way to manage data
consistency across microservices in distributed transaction
scenarios. A saga is a sequence of
transactions that updates each service
and publishes a message or event to trigger the next
transaction step. If a step fails, the saga
executes compensating transactions that counteract
the preceding transactions. Let's review an example.
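Before the diagram walkthrough, the core idea can be sketched in a few lines of TypeScript. This is an illustrative sketch, not a specific saga framework: a saga is modeled as a list of steps, each paired with a compensating transaction, and on failure the completed steps are undone in reverse order.

```typescript
// Minimal saga runner sketch; all names are illustrative.
interface SagaStep {
  name: string;
  action: () => void;       // the local transaction
  compensate: () => void;   // the compensating transaction
}

function runSaga(steps: SagaStep[]): { ok: boolean; compensated: string[] } {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
    } catch {
      // A step failed: compensate the completed steps, newest first.
      const compensated: string[] = [];
      for (const d of done.reverse()) {
        d.compensate();
        compensated.push(d.name);
      }
      return { ok: false, compensated };
    }
  }
  return { ok: true, compensated: [] };
}

let fiat = 100;
const result = runSaga([
  { name: "currency", action: () => { fiat -= 40; }, compensate: () => { fiat += 40; } },
  { name: "stock", action: () => { throw new Error("not enough stock"); }, compensate: () => {} },
]);
// result.ok === false, result.compensated === ["currency"], fiat back at 100
```

The currency step succeeds, the stock step fails, and the compensating transfer puts the currency balance back, which is exactly the shape of the example that follows.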
Now, our services are performing their actions within the saga pattern.
First, the order service pings the currency and stock
services to make transactions for both accounts.
The currency service passes all business logic checks
and creates a transfer record. Then it sends a message over
to the audit service to log this transaction. But the
stock service had an issue and rejected the transfer request.
After this, the rollback
sequence is initiated with a series of compensating transactions.
The audit service will mark the previous
logs as rejected and communicate an event to
the currency service. After that,
the currency service creates a compensating transfer between the accounts
and finally notifies the order service that the transaction
was fully rolled back. There are two
approaches to designing saga transactions: one is
choreography, and the second is orchestration.
So, choreography, here on the left, is a way to coordinate sagas
where participants exchange events without a centralized
point of control. With choreography,
each local transaction publishes events that
trigger local transactions in other services.
Orchestration is a way to coordinate sagas
where a centralized controller tells the saga participants
what local transactions to execute. The
saga orchestrator handles all the transactions and tells the participants
which operations to perform based on the events.
The orchestrator executes saga requests, stores and interprets the states
of each task, and handles failure recovery with compensating
transactions. Which approach to select
will depend on your use case, because both
of them have their pros and cons.
So yeah, let's first go over the benefits of the choreography
saga pattern. It
is good for simple workflows that require few participants
and don't need coordination logic,
it doesn't require additional service implementation and maintenance,
and it doesn't introduce a single point of failure,
since the responsibilities are distributed across all the saga
participants.
Yeah, there are drawbacks to the choreography saga pattern as well. The
workflow can become confusing when
adding new steps, as it's difficult to track which saga
participant listens to which commands.
There is a risk of cyclic dependencies between saga participants,
because they have to consume each other's commands.
And integration testing can be difficult, because
all services must be running to simulate a transaction.
Now, let's review the orchestration pattern and
its benefits. It's really good for complex
workflows involving many participants, or new participants
added over time. It doesn't introduce cyclic
dependencies, because the orchestrator depends
on the saga participants, but not vice versa.
Saga participants don't need to know about commands for
other participants, so there is a clear separation of
concerns, and it simplifies business logic.
And yeah, it's suitable when
there must be control over every participant in the process and
over the flow of activities.
And the drawbacks are additional
design complexity, which requires implementing the coordination
logic, and an additional point of
failure, because the orchestrator manages the complete
workflow.
So with this new knowledge, let's refactor our scenario
to use the saga pattern, with the choreography approach first.
I want to start with how it looks for the happy path
scenario. Our customer requests a new trade
with the order service. In return,
it creates an event called 'order created' and pushes
it to a shared message broker. We have two listeners
assigned to the 'order created' event, and they start processing it.
When everything is good, all microservices
process the operation: they perform their corresponding
transactions, and each of the services creates an event of its own,
for example the 'currency transferred' and 'stocks transferred' events,
and sends them to the message broker.
Next, we have the audit service listening to all these events,
and it creates the audit records.
Once everything is done, the audit service creates an 'order complete' event,
and finally this goes to the order service, telling it that everything
went okay.
Let's review a possible failure scenario and how our saga pattern
can roll back across this distributed application.
The order service receives a request for a trade
from a customer and forwards it as usual to the currency and
stock services.
The currency service responds fine:
it records two transactions and emits a 'currency transferred'
event, which in turn is picked up by
the audit service, and the audit logs are created.
Meanwhile, the stock service has experienced a delay
in processing, and only after some time does it respond
that there is not enough stock to trade on one of the accounts.
So the stock service will emit an 'order cancel' event.
But we have already partially completed our transaction, so we
will have to react to this event and roll back all the changes by
running the compensating transactions,
in this case in the currency service and in our order service.
After that, the order service is notified as well
and can respond promptly to the customer that the trade did
not happen. So this
is how the choreography pattern's event map
will look for our simple implementation.
As you can see, it's only four services, but it's already a complex enough
task to document all of this.
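The event flow above can be sketched with a tiny in-process "message broker" where each service subscribes only to the events it cares about. The event names mirror the talk's example, but the Broker class and handlers are illustrative assumptions, not a real broker client.

```typescript
// Choreography sketch: services react to each other's events,
// with no central coordinator making decisions.
type Handler = (payload: Record<string, unknown>) => void;

class Broker {
  private handlers = new Map<string, Handler[]>();
  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }
  emit(event: string, payload: Record<string, unknown> = {}): void {
    for (const h of this.handlers.get(event) ?? []) h(payload);
  }
}

const broker = new Broker();
const log: string[] = [];

// Currency and stock services both listen for "order.created" ...
broker.on("order.created", () => { log.push("currency.transferred"); broker.emit("currency.transferred"); });
broker.on("order.created", () => { log.push("stocks.transferred"); broker.emit("stocks.transferred"); });
// ... and the audit service listens for their events in turn.
broker.on("currency.transferred", () => log.push("audit: currency"));
broker.on("stocks.transferred", () => log.push("audit: stocks"));

broker.emit("order.created");
// log: ["currency.transferred", "audit: currency", "stocks.transferred", "audit: stocks"]
```

Notice that nothing in this wiring shows the overall workflow in one place; that is exactly why the event map gets hard to document as services are added.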
So in the next step, let's review how we can apply the orchestration
pattern to the same application.
Okay, now let's review the orchestration saga pattern applied, and
we will start with the happy path.
Communication between services will most likely be with the
help of a message broker again, but all the decisions will
be made in our orchestrator service.
The order service first emits an 'order created' event.
The orchestrator then notifies the currency and stock
services that they are required to perform the transfer operations.
Once complete, both services report back to the orchestrator
that everything is okay, and the orchestrator sends
an event to the audit service for the stock
audit records, and then for the currency audit records. Once
everything is reported back to the orchestrator, it will notify the order service
that the transaction is complete.
Now, for an error scenario, let's review this example again.
The order service emits an 'order created' event.
The orchestrator then notifies both our currency
and stock services to perform the transfer operations,
and once complete, both services report back to the
orchestrator that everything is okay. Again, the
orchestrator sends this to the audit service,
but this time, for example, both transactions do not
pass an audit control, and an error response is sent to the orchestrator.
The orchestrator will initiate an 'order cancel' event for
all the operations that already happened and will
wait for confirmation. Finally, it will notify the
order service that the transaction is cancelled.
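The same error scenario can be sketched with a central orchestrator that tells each participant what to do and, on a failure report, sends cancel commands for the steps that already succeeded. The participant names follow the talk's example; the interfaces are illustrative assumptions, not a real orchestration framework.

```typescript
// Orchestration sketch: one component owns the workflow and the rollback.
interface Participant {
  name: string;
  execute: () => boolean;   // participant reports back success or failure
  cancel: () => void;       // participant's compensating transaction
}

class Orchestrator {
  readonly trace: string[] = [];
  run(participants: Participant[]): boolean {
    const completed: Participant[] = [];
    for (const p of participants) {
      if (p.execute()) {
        this.trace.push(`${p.name}: ok`);
        completed.push(p);
      } else {
        this.trace.push(`${p.name}: failed`);
        // Send cancel commands to everything that already succeeded.
        for (const c of completed.reverse()) {
          c.cancel();
          this.trace.push(`${c.name}: cancelled`);
        }
        return false;
      }
    }
    return true;
  }
}

const orchestrator = new Orchestrator();
const completedOk = orchestrator.run([
  { name: "currency", execute: () => true, cancel: () => {} },
  { name: "stock", execute: () => true, cancel: () => {} },
  { name: "audit", execute: () => false, cancel: () => {} }, // audit control rejects
]);
// trace: currency ok, stock ok, audit failed, stock cancelled, currency cancelled
```

Unlike the choreography sketch, the whole flow, including the rollback order, is readable in one place, at the cost of the orchestrator being an extra component to build and keep available.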
So yeah, some things to take care of when implementing the
saga pattern. It may initially be a
challenging task, as it requires a new way of thinking about
how to coordinate a transaction and maintain data consistency.
The saga pattern is particularly hard to debug, and the complexity
grows as participants increase.
Your implementation must handle a set of potential
transient failures and provide idempotence
to reduce side effects and ensure data consistency.
And it's best to implement observability to monitor and track
the saga workflow right away.
Thank you for watching and good luck with software engineering.