Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, and welcome to my talk, where I want to discuss
options for achieving data consistency in a distributed
system. This is an introduction to the saga pattern
for distributed transactions. First,
let's recall what a database transaction is and why it is
important. It is a series of operations
performed as a single unit of work, and
it can include business logic checks.
On success, it commits all operations to the database,
and on failure, it rolls back any completed operations,
keeping data consistent.
These requirements are usually summarized by the ACID acronym.
Transactions within a single service are usually ACID, but cross-service
data consistency requires a cross-service transaction management
strategy. So what is ACID, exactly?
Transactions must be atomic: all
changes to data are performed as if they are a single operation.
That is, all of the changes are performed, or none of them are.
They are consistent: data is in a consistent state
when a transaction starts and when it ends.
They are isolated: the intermediate state of a transaction is invisible to
other transactions. As a result,
transactions that run concurrently appear to be serialized.
And they are durable: after a transaction successfully completes,
changes to data persist and are not undone,
even in the event of a system failure.
So yeah, some information about myself. I'm a software
engineer with eight-plus years of experience. I write articles
about exciting back-end technologies on Medium.
I'm a certified Node.js engineer, and I'm also very well
versed in PHP development and DevOps.
Okay, so let's start with a monolithic application
example. When we have a single database, we can write our transactions
with the help of the tooling that the database provides,
and that is all: modern databases have
mechanisms to commit and roll back transactions automatically.
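As a rough illustration of that commit-or-roll-back behavior, here is a minimal in-memory sketch in TypeScript. This is not a real database driver; in practice the database itself handles this (for example with BEGIN / COMMIT / ROLLBACK in SQL), and all the names here are made up for the example.

```typescript
// Hypothetical in-memory "database" illustrating commit/rollback semantics.
type Balances = Record<string, number>;

function runTransaction(db: Balances, ops: Array<(db: Balances) => void>): boolean {
  const snapshot = { ...db };          // remember the starting state
  try {
    for (const op of ops) op(db);      // apply each operation in order
    return true;                       // all succeeded: changes stay (commit)
  } catch {
    Object.assign(db, snapshot);       // any failure: restore the snapshot (rollback)
    return false;
  }
}

const db: Balances = { alice: 100, bob: 50 };
const ok = runTransaction(db, [
  (d) => { d.alice -= 200; },
  (d) => { if (d.alice < 0) throw new Error("insufficient funds"); },
  (d) => { d.bob += 200; },
]);
// ok === false, and db is back to { alice: 100, bob: 50 }
```

The check throws midway through, so none of the changes survive: that is the atomicity property from the ACID list above.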
So let's review a stock trading application example. Here,
the front-end application on the left routes user requests
to our backend service, which is a monolith. The
order service processes the request and coordinates
calls to two other services, the fiat
currency service and the stock service.
The fiat service checks whether the user who is purchasing
a stock has enough balance, and the stock service checks
whether the user who is selling their stock has enough stocks.
And if everything is good, our application,
or rather, the services within one single application, will write four
records to the database as a single transaction.
The order service will be notified of the success and
will call a dependent audit service that will log four audit
records to the database, one per performed operation.
Now let's imagine we had a race condition, and one of our
customers did not have enough funds at the time
the transaction started. This case
is very simple to handle in a monolithic application, because rollback is
handled for us by the database.
In case of an error, our application
can usually just throw an exception, and the transaction will be rolled back.
The funds, or the stock in this example,
were not transferred, and the audit
logs were not created. So it's all good, and
our database is consistent.
Let's now review this application, but
split into microservices, one for each of the steps.
First, the order service receives a request from
the customer to purchase some stocks.
The order service calls both the fiat currency and stock services
to request operations on the users' balances.
After everything has gone well, the order service asks the audit service
to create the corresponding audit logs. The happy path
here looks good, but this design is flawed,
because if any of the steps fails,
we don't have a way to bring our system back to a consistent
state. Let's review an
example. In this case, our order service was
not able to connect to the currency
service and confirm the operation,
but the stock service operation executed as usual.
We have written stock logs to the audit service,
and we actually did transfer stocks from user to user,
but the currency balance never changed for either user.
So, yeah, we have a problem here.
So, the saga design pattern: it is a way to manage data
consistency across microservices in distributed transaction
scenarios. A saga is a sequence of
transactions that updates each service
and publishes a message or event to trigger the next
transaction step. If a step fails, the saga
executes compensating transactions that counteract
the preceding transactions. Let's review an example.
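Before the diagram walkthrough, the core idea can be sketched in a few lines of TypeScript. This is an illustrative sketch, not a specific saga framework: a saga is modeled as a list of steps, each paired with a compensating transaction, and on failure the completed steps are undone in reverse order.

```typescript
// Minimal saga runner sketch; all names are illustrative.
interface SagaStep {
  name: string;
  action: () => void;       // the local transaction
  compensate: () => void;   // the compensating transaction
}

function runSaga(steps: SagaStep[]): { ok: boolean; compensated: string[] } {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.action();
      done.push(step);
    } catch {
      // A step failed: compensate the completed steps, newest first.
      const compensated: string[] = [];
      for (const d of done.reverse()) {
        d.compensate();
        compensated.push(d.name);
      }
      return { ok: false, compensated };
    }
  }
  return { ok: true, compensated: [] };
}

let fiat = 100;
const result = runSaga([
  { name: "currency", action: () => { fiat -= 40; }, compensate: () => { fiat += 40; } },
  { name: "stock", action: () => { throw new Error("not enough stock"); }, compensate: () => {} },
]);
// result.ok === false, result.compensated === ["currency"], fiat back at 100
```

The currency step succeeds, the stock step fails, and the compensating transfer puts the currency balance back, which is exactly the shape of the example that follows.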
Now, our services are performing their actions within the saga pattern.
First, the order service pings the currency and stock
services to make transactions for both accounts.
The currency service passes all business logic checks
and creates a transfer record. Then it sends a message over
to the audit service to log this transaction. But the
stock service had an issue and rejected the transfer request.
After this, the rollback
sequence is initiated with a series of compensating transactions.
The audit service will mark the previous
logs as rejected and communicate an event to
the currency service. After that,
the currency service creates a compensating transfer between the accounts
and finally notifies the order service that the transaction
was fully rolled back. There are two
approaches to designing saga transactions: one is
choreography, and the second is orchestration.
So, choreography, here on the left, is a way to coordinate sagas
where participants exchange events without a centralized
point of control. With choreography,
each local transaction publishes events that
trigger local transactions in other services.
Orchestration is a way to coordinate sagas
where a centralized controller tells the saga participants
what local transactions to execute. The
saga orchestrator handles all the transactions and tells the participants
which operations to perform based on the events.
The orchestrator executes saga requests, stores and interprets the states
of each task, and handles failure recovery with compensating
transactions. Which approach to select
will depend on your use case, because both
of them have their pros and cons.
So yeah, let's first go over the benefits of the choreography
saga pattern. It
is good for simple workflows that require few participants
and don't need coordination logic,
it doesn't require additional service implementation and maintenance,
and it doesn't introduce a single point of failure,
since the responsibilities are distributed across all the saga
participants.
Yeah, there are drawbacks to the choreography saga pattern as well. The
workflow can become confusing when
adding new steps, as it's difficult to track which saga
participant listens to which commands.
There is a risk of cyclic dependencies between saga participants,
because they have to consume each other's commands.
And integration testing can be difficult, because
all services must be running to simulate a transaction.
Now, let's review the orchestration pattern and
its benefits. It's really good for complex
workflows involving many participants, or new participants
added over time. It doesn't introduce cyclic
dependencies, because the orchestrator depends
on the saga participants, but not vice versa.
Saga participants don't need to know about commands for
other participants, so there is a clear separation of
concerns, and it simplifies business logic.
And yeah, it's suitable when
there must be control over every participant in the process and
over the flow of activities.
And the drawbacks are additional
design complexity, which requires implementing the coordination
logic, and an additional point of
failure, because the orchestrator manages the complete
workflow.
So with this new knowledge, let's refactor our scenario
to use the saga pattern, with the choreography approach first.
I want to start with how it looks for the happy path
scenario. Our customer requests a new trade
with the order service. In return,
it creates an event called 'order created' and pushes
it to a shared message broker. We have two listeners
assigned to the 'order created' event, and they start processing it.
When everything is good, all microservices
process the operation: they perform their corresponding
transactions, and each of the services creates an event of its own,
for example the 'currency transferred' and 'stocks transferred' events,
and sends them to the message broker.
Next, we have the audit service listening to all these events,
and it creates the audit records.
Once everything is done, the audit service creates an 'order complete' event,
and finally this goes to the order service, telling it that everything
went okay.
Let's review a possible failure scenario and how our saga pattern
can roll back across this distributed application.
The order service receives a request for a trade
from a customer and forwards it as usual to the currency and
stock services.
The currency service responds fine:
it records two transactions and emits a 'currency transferred'
event, which in turn is picked up by
the audit service, and the audit logs are created.
Meanwhile, the stock service has experienced a delay
in processing, and only after some time does it respond
that there is not enough stock to trade on one of the accounts.
So the stock service will emit an 'order cancel' event.
But we have already partially completed our transaction, so we
will have to react to this event and roll back all the changes by
running the compensating transactions,
in this case in the currency service and in our order service.
After that, the order service is notified as well
and can respond promptly to the customer that the trade did
not happen. So this
is how the choreography pattern's event map
will look for our simple implementation.
As you can see, it's only four services, but it's already a complex enough
task to document all of this.
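The event flow above can be sketched with a tiny in-process "message broker" where each service subscribes only to the events it cares about. The event names mirror the talk's example, but the Broker class and handlers are illustrative assumptions, not a real broker client.

```typescript
// Choreography sketch: services react to each other's events,
// with no central coordinator making decisions.
type Handler = (payload: Record<string, unknown>) => void;

class Broker {
  private handlers = new Map<string, Handler[]>();
  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }
  emit(event: string, payload: Record<string, unknown> = {}): void {
    for (const h of this.handlers.get(event) ?? []) h(payload);
  }
}

const broker = new Broker();
const log: string[] = [];

// Currency and stock services both listen for "order.created" ...
broker.on("order.created", () => { log.push("currency.transferred"); broker.emit("currency.transferred"); });
broker.on("order.created", () => { log.push("stocks.transferred"); broker.emit("stocks.transferred"); });
// ... and the audit service listens for their events in turn.
broker.on("currency.transferred", () => log.push("audit: currency"));
broker.on("stocks.transferred", () => log.push("audit: stocks"));

broker.emit("order.created");
// log: ["currency.transferred", "audit: currency", "stocks.transferred", "audit: stocks"]
```

Notice that nothing in this wiring shows the overall workflow in one place; that is exactly why the event map gets hard to document as services are added.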
So in the next step, let's review how we can apply the orchestration
pattern to the same application.
Okay, now let's review the orchestration saga pattern applied, and
we will start with the happy path.
Communication between services will most likely be with the
help of a message broker again, but all the decisions will
be made in our orchestrator service.
The order service first emits an 'order created' event.
The orchestrator then notifies the currency and stock
services that they are required to perform the transfer operations.
Once complete, both services report back to the orchestrator
that everything is okay, and the orchestrator sends
an event to the audit service for the stock
audit records, and then for the currency audit records. Once
everything is reported back to the orchestrator, it will notify the order service
that the transaction is complete.
Now, for an error scenario, let's review this example again.
The order service emits an 'order created' event.
The orchestrator then notifies both our currency
and stock services to perform the transfer operations,
and once complete, both services report back to the
orchestrator that everything is okay. Again, the
orchestrator sends this to the audit service,
but this time, for example, both transactions do not
pass an audit control, and an error response is sent to the orchestrator.
The orchestrator will initiate an 'order cancel' event for
all the operations that already happened and will
wait for confirmation. Finally, it will notify the
order service that the transaction is cancelled.
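The same error scenario can be sketched with a central orchestrator that tells each participant what to do and, on a failure report, sends cancel commands for the steps that already succeeded. The participant names follow the talk's example; the interfaces are illustrative assumptions, not a real orchestration framework.

```typescript
// Orchestration sketch: one component owns the workflow and the rollback.
interface Participant {
  name: string;
  execute: () => boolean;   // participant reports back success or failure
  cancel: () => void;       // participant's compensating transaction
}

class Orchestrator {
  readonly trace: string[] = [];
  run(participants: Participant[]): boolean {
    const completed: Participant[] = [];
    for (const p of participants) {
      if (p.execute()) {
        this.trace.push(`${p.name}: ok`);
        completed.push(p);
      } else {
        this.trace.push(`${p.name}: failed`);
        // Send cancel commands to everything that already succeeded.
        for (const c of completed.reverse()) {
          c.cancel();
          this.trace.push(`${c.name}: cancelled`);
        }
        return false;
      }
    }
    return true;
  }
}

const orchestrator = new Orchestrator();
const completedOk = orchestrator.run([
  { name: "currency", execute: () => true, cancel: () => {} },
  { name: "stock", execute: () => true, cancel: () => {} },
  { name: "audit", execute: () => false, cancel: () => {} }, // audit control rejects
]);
// trace: currency ok, stock ok, audit failed, stock cancelled, currency cancelled
```

Unlike the choreography sketch, the whole flow, including the rollback order, is readable in one place, at the cost of the orchestrator being an extra component to build and keep available.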
So yeah, some things to take care of when implementing the
saga pattern. It may initially be a
challenging task, as it requires a new way of thinking about
how to coordinate a transaction and maintain data consistency.
The saga pattern is particularly hard to debug, and the complexity
grows as participants increase.
Your implementation must handle a set of potential
transient failures and provide idempotence
to reduce side effects and ensure data consistency.
And it's best to implement observability to monitor and track
the saga workflow right away.
Thank you for watching and good luck with software engineering.