Conf42 Python 2024 - Online

Advanced API Design for Scalable and Fault-Tolerant Data-Intensive Distributed Systems

Video size:

Abstract

As data-intensive distributed systems become increasingly prevalent in modern computing landscapes, designing robust and efficient APIs becomes paramount. The talk addresses the challenges of scalability, fault tolerance, and performance in complex, distributed environments.

Summary

  • What is an API? The term itself answers this question. It is used for interaction between microservices in a system. API is like an abstraction to the client, where the client uses it to fetch the resources without having to know the details of the underlying implementation.
  • RPC and HTTP are both communication protocols used in distributed systems. They have different design principles, communication patterns and typical use cases. RPC is focused on direct invocation of remote procedures or functions. HTTP is a more general purpose protocol for transferring data on the web.
  • Rest transfer representational state transfer is an architectural style for designing networked applications. Restful APIs are stateless, meaning that each request from a client or an end user coming to the server must contain all the information necessary for the server to fulfill that request. Other important feature provided by the rest architecture is in these API calls which are made through HTTP.
  • Apache thrift is a software framework developed by Facebook for scalable cross language services development. Thrift is primarily used for defining and creating efficient and interoperable RPC services. With thrift you can do the development for asynchronous communication between different microservices.
  • The good API design choice is using callbacks with these API calls. What is asynchronous API calls with callback? Let's take a use case. It doesn't give a good user experience if you have to wait for such a long time. There has to be a way to be resilient to recover from failures.
  • What are these events? So imagine there is slack application, like a chat application. All of this system is slack, and external partner is GitHub. Whenever there is an event on the external partner like GitHub, when that event is triggered, it sends web books. Webhooks are always asynchronous, but the callbacks can sometimes be synchronous.
  • Another aspect of APIs is rate limiting. It's like limiting the total number of requests coming from a client in a specific time window. Why is it needed? Obviously you want to prevent abuse. This also promotes the stability for the system and of the distributed systems.
  • The last aspect of the API design is idempotency. Every request should be associated with an item potency key, a unique key associated with a specific client. These are the four important design aspects that needs to be considered while designing APIs for data intensive applications.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Firstly, let's get some introduction, right? What are these data intensive distributed systems? So all our end users of this in our day to day life example, Netflix streaming, e commerce systems like Amazon.com, TikTok, Instagram shop photo upload systems like Instagram, and Tinder proximity system like Yelp, Google Maps, all of them are data intensive distributed systems in the backend. And what is an API? The term itself answers this question. Right, an interface for application programming. So there are two perspectives to this. One is the system or the server perspective. It is used for interaction between microservices in a system, or for interaction across two or more servers or software applications, and the communication between systems in simple terms. And the other perspective is the client perspective, which is API is seen as a way for a client to communicate to the server with all the inputs and requesting for some resources. It is like an abstraction. API is like an abstraction to the client, where the client uses it to fetch the resources without having to know the details of the underlying implementation. It's like API is like a steering in a car where the driver doesn't need to know how the engine works. So the details of the API implementation need not be known. So with this, let's get right into the aspects of what is needed. First and foremost, the design aspect we need to consider is rest for HTTP API calls and thrift for RPC API calls. Now let's understand a bit about HTTP, why and where. So now let me introduce an architectural diagram for any modern data intensive distributed system, and it's a very most simplistic way, like this diagram. Firstly, it shows a user who is using any system, like a payment system or Netflix or any other example I gave earlier, any real world example. So obviously the request goes through the load balancer and load balancer redirects to the right backend system. There will be multiple systems, and then again from the load balancer it goes out. If there is an integration needed with an external partner, say for example in payments, you might have to integrate with visa or stripe or paypal, any of those things as an example. Now as you can see, some of these calls are HTTP and some of these calls are RPC. And why? So firstly, HTTP is like a networking protocol used by the client most of the times. Or the end user to call or invoke a server. Or HTTP can also be used by one software system or one microservice to call another microservice. It's possible it's used in both ways, but the RPC is a protocol which is used only within the back end system, like within the services. As you can see here, microservice one calls microservice two with an RPC, and gateway calls microservice one with an RPC because they are within a particular back end server. And why? Because RPC is highly secure and RPC is remote. Procedural call, obviously. And it is more about invoking a function from one service to another service. So when you're invoking a function, you need to know the exact request and response to the point, right? And HTTP is more generic, or it is more on the web exposed to the external world with those put, get, post and all those port types get put, post and delete. So in summary, RPC and HTTP are both communication protocols used in distributed systems, but they have different design principles, communication patterns and typical use cases. RPC is focused on direct invocation of remote procedures or functions like a process running, say microservice two runs a process and microservice one needs to call that process. It is done using RPC, while HTTP is a more general purpose protocol for transferring data on the web or requesting data on the web. And that's the reason why the external gateways and one service to another service are usually called through HTTP. So now let's talk about the rest for HTTP and thrift for now. What is rest in the API world? Rest transfer representational state transfer. It is an architectural style for designing networked applications, particularly these web services like oh, web services. Just that the distributed systems which are published and used by the outside world. Now, restful APIs added to the principles of REST and API design to provide a standardized way for systems to communicate over HTTP. And now, as I mentioned, the restful APIs use the HTTP methods, which I just mentioned. And another important feature of the rest and why it should be used for HTTP is stateless communication. So what is it? These restful APIs are stateless, meaning that each request from a client or an end user coming to the server must contain all the information necessary for the server to fulfill that request. The server does not store any client state between the requests, like request one to request two doesn't store any states. The reason for that is it improves the scalability and simplifies the communication. Imagine the server has to store any information from the previous call and not keeping them independent. It adds a lot of overhead and unnecessary information needs to be saved and added, and you don't want any strings attached between the client and the server or client and the web service, right? So that's the reason why the stateless communication is very important in this HTTP, which is provided by the rest architecture. And the other one is resource oriented design, meaning say, when you are requesting, say you are using Instagram, and what do you do when you want to look at some comments or post? You just go to the user and click on, say, comments, right? What happens internally is it makes an API call through HTTP and it makes that the uri like a specific domain, www.instagram.com users user id post. And similarly, if you're looking at some comments of a post, it goes through instagram.com posts post id and comments. And so you see that the hierarchy of say, you have users user id and post or post post id and comments, that is called hierarchical structure, and that is also provided by this architecture of ReSt. And it helps you identify the resource in the most simplistic and smooth form so that the back end system can retrieve it and uniquely return that particular data. And returning or all of this data is represented in JSON or XML or any other format in rest architecture. And the other important feature provided by the rest architecture is in these API calls which are made through HTTP by the end user or anyone who is making HTTP calls. You can add authentication, authorization, rate limiting. All these things are not core logic. These are exterior things which just needed to be added. Like say for example, if you want to look at the number of likes on a particular comment, or number of likes or number of comments on a particular post, they require some computation, although it is as simple as adding things. But you don't want to include all that computation. You want to keep all that computation separate from things like external things, like additional things like authorization, authentication, rate limiting, and all of these things. So all these things are provided, especially when you're making calls over the web or over the mobile or whatever it is from external outside of the actual server. So these additional features are always provided by HTTP, which ensures that the RPC calls can have only the logic being computed by invoking the APS within the microservices. Now, talking about the thrift, why thrift for RPC? Thrift is what is thrift, first of all? So thrift refers to Apache thrift. It's a software framework and also protocol. It is developed by Facebook for scalable cross language services development. So scalable cross language services development, that's the key here. So I'll tell about it. Thrift is primarily used for defining and creating efficient and interoperable RPC services. Now, how do both these things happen? Interoperable RPC communication and cross language services development. How do these things happen? There is one important feature provided by thrift which is called code generation. So it's like you can create a thrift file, say you want one microservice to talk to another microservice through some API, through some RPC call, and you can create a thrift file containing the request response and the API with these request response parameters. You can define it in thrift and you can run commands depending on whatever you use, like Golang or Java or whatever you use, you have the specific commands generate code like maven or any kite or gin frameworks. If you use for go, it generates all the boilerplate code which is needed, like request objects, response objects, and all the API skeleton and the things needed for concurrent request handling. All these things are provided by thrift auto generated code. So this auto generated code can be generated for different languages. Say your one microservice runs on Java, another microservice runs on Python, and you can use thrift to generate auto generate code for these APIs. Say in microservice two, you auto generated the code as a server code, and in microservice one, you auto generated code using this thrift file as a client code. You can generate both. And now the client can call the server by whatever through doing that RPC call. And it is cross platform or cross language. It can work because if you're manually doing this right, if you're writing the implementation, everything in Java and in Python and in go three services now have to interact. There will be lot of incompatibility issues and there are a lot of manual effort which needs to be done. But with this thrift you don't have to worry about it. Cross language support is in itself provided by this code generation and all the features provided by the thrift, it is absolutely amazing. And another thing is for RPC, right, you want the communication to be as fast as possible, and that is supported by binary protocol, the data. So thrift uses this binary protocol for efficient communication between services. This binary protocol is optimized for performance and reduces both network overhead and serialization deserialization costs. It's really quick and really efficient. So it is a lot better compared to the text based protocols like JsON or XML. So for between microservices communication or between systems communication, it's always better to use thrift, which provides all these advantages. Oh, and another big advantage is scalability thrift. Like I mentioned a bit a few minutes earlier about the concurrent requests, the thrift software framework provides by generating the code like the APIs can handle concurrent requests, right? It has support for it. And with thrift you can do the development for asynchronous communication between different microservices and how that can be done. Or what is asynchronous communication I'll be covering in my next chapter. And now moving on to chapter two, as we are talking about asynchronous API calls. So now the second aspect of the design, the good API design choice is using callbacks with these API calls. So what is asynchronous API calls with callback? Let's take the same example, but zoom in a bit more into the imagine. Let's take a use case. Right? So if you are a user making a payment on an ecommerce platform, do you want to wait until you know your payment? Do you want to wait until all the processing in the back end to be done, like microservice one to two to three and then external channel, your visa or Mastercard is processing. It doesn't give a good user experience if you have to wait for such a long time. So what you have to, and also with the amount of requests which are computing in, it doesn't scale, right? You can't just have all the calls synchronously waiting with each other. So for that, what we need is a synchronous communication. And how that is achieved is say you as a user make a pay request at step number before step number one here on the slide, and that pay request is sent to a distributed queue like a Kafka or RocketMQ. And that's it. That's it for the microservice one. And microservice one sends a response back to the user saying that it's in processing. So the user knows that it is in processing. And then microservice two has like a consumer which reads from this message queue and takes it and does whatever processing and then again puts in a different distributed queue at step number three. Now when you have to make a call, external call again, there will be a consumer which will be reading from that kafka queue, from Kafka queue two and makes a HTTP call. Now how does the external partner, or all these microservices know that the processing is complete? That is called callback. Now when the external partner processes the request or the API call which it received, it is immediately going to send a response. After it processes, it is going to use that callback URL to send a response back to the microservice two. And how this actually happens is we can see in the code. Here is an example, as you can see on my screen in this, there is this function, you can see in the middle process async. And that is the API, say that is the API which is being called from microservice three to external partner from steps three, four and five, right? Say consumer picks up the distributed queue message which is about calling process async and it calls step number five at step number five to the external channel. And once it calls a post, you can see I just added a simple sleep showing that it is doing some asynchronous processing. It is doing some whatever, all the logic and algorithm implementation or whatever. Once that is done in the request, if you see at the bottom request data has something called callback URL. That means the request received at this endpoint already has a URL at which the processing of the API can send the response back. Now that's what invoke callback does. So inside the processor sync, if you see the invoke callback actually gets the callback URL, like requestdata get callback URL and it invokes. Now when this invoke callback is done, it actually calls the microservice two here. So that's what I have mentioned. In the third point, after the processing is complete, the server invokes a callback URL at step number six in the previous slide. And so once it receives, when microservice two receives that callback, it knows that, okay, the previous request I sent, I received the callback and this is complete. Now, microservice two also needs to inform microservice one about whatever the pay is complete or not. Now I added another queue here which is distributed message queue three. So imagine as part of once you receive the callback in a microservice, the microservice might have to do certain things like storing the state in a database or make another RPC call to a different service to update something. And what if during that process of callback handling there is a failure? There has to be a way to recover, right? There has to be a way to, the system has to be resilient enough or fault tolerant enough to recover from such states. So that's why we have this q three. So now at step number seven, when the callback is sent or put in the queue, microservice one picks up that callback from the queue and it tries to do processing like saving into the database and all those things. And if it fails, if that process fails, that callback handler in microservice one won't acknowledge that it received the callback. It received the callback from the distributed message queue three. So when it doesn't acknowledge, the message still remains in the queue. So the microservice one will again fetch or read that callback message from the queue to reprocess it again. So that way it retries and retries until the message is totally consumed, which means the entire callback handler is done. So that's how this whole concept works. And the key point to note here is the design aspect of the request containing the callback URL. So that's the whole point here. So in the processing callback also two steps here. One is you need to have that callback URL in the request. And the second part is making a call to that callback URL once all the processing is done by the callback handler. In this case, the callback handler is in the external partner and it sends a callback to the microservice too. So that's the example I have demonstrated. Now, apart from the callbacks, there is another way you can do this. Asynchronous processing with APIs is something called webhook. I think webhook is again a very generic term. Callbacks or webhooks are almost similar, but just that webhook is something, callback is something where you send a URL in the request, the caller sends a URL in the request to the collie and the collie calls back, right? Webhook is like you keep a process or a URL open and then the collie puts the data onto that URL, onto that placeholder. You can think of Webhook as a placeholder in the caller, where the collie puts the data. So now let's go into some details or differences here. So the callback is initiated by requests. As I mentioned here, like one system, the caller calls another system, the collie, and the caller includes a callback function or URL as part of the request, indicating where the callie should send the response or notification. Right? Now, webhook are typically initiated by events or triggers that occur in one system, especially the sender. And when the event occurs, the sender makes a HTTP request to a URL to the receiver. So receiver is microservice two in this example, and the sender is external partner. And now the receiver does not actively request for data. Instead it waits for the incoming request from the sender. So you can think of it as outbound communication. So as a receiver of the webhook, you are actually pulling the data, you have something open, you are actually pulling the data into it. That's Webhook. And in callback that's not the case. Callback is like you have sent a request to the external partner and external partner uses the URL to push the data. So it's a push model. And callback is obviously if it is push model, from the receiver side, it is all about the data coming to it. So it's inbound communication right? Now, I know these terms are a bit computing to understand, but let me give an example. It's simpler. So the example I just gave for callback, as I demonstrated, is for a payment system, the external partner can be visa and it got the URL, it processed and sent back. Now let's take an example for webhook for clarity. What are these events? So imagine there is slack application, like a chat application, which is you often see at work as a developer, right? When there is a GitHub activity, pull request has been created, pull request has been merged, comments have been made, you get this notification onto the slack. How is that happening? That's because say creating a pull request is an event and that event is happening. And the slack is the microservice two here, or you can think of microservice two and one. All of this system is slack, and external partner is GitHub. And whenever there is an event on the external partner like GitHub, when that event is triggered, it actually sends web books are initiated by those events. And then the sender makes a HTTP call, which is the sender here is GitHub. It makes a HTTP call to the defined URL onto the receiver, which is slack, passing all the data onto it. And that's how the slack gets to know that, okay, there is a pull request update or whatever has happened. And as you can see, webhooks are always asynchronous. You're not waiting on anything, but the callbacks can sometimes be synchronous. Like once you send the request to a third party, it is possible that you can wait and you can get back the response. So moving on to the next chapter. Now is another aspect of APIs is rate limiting. Now designing rate limiting for APIs. So what is rate limiting and why? And the term itself says you have to limit the rate. So say if you are a user using, doing some payments on the payment system, or you're using again uploading lot of pictures or accessing lot of comments. You can click on, say, comments multiple times as a user, or you can actually click on too many pay button or payments on a payment system. And there has to be a way to limit, or you're trying to do a lot of purchases, right? So there has to be a way in the system to limit it. And how can you do it at the API level and why should you limit it? So it's like limiting the total number of requests coming from a client in a specific time window. Say you're making a request, API request in say ten requests in say 30 minutes, 40 minutes, that's fine. But you need to know it's a two dimensional thing. Ten requests in 1 second. So those are the things that need to be defined as a system capacity and things like that. Now why is it needed? I wrote it here. Obviously you want to prevent abuse and there might be a lot of other users who are using the same endpoint. So you want to have fair access to these resources and you don't want API to be overwhelmed by these excessive requests. And this also promotes the stability for the system and of the distributed systems. And also the reliability is guaranteed with you saying that, okay, you're not abusing the system. And you can see in this diagram, when there are a lot of requests which are coming onto the system, the rate limiter sends immediately 429 saying that, okay, you have exceeded certain limit on the number of requests that can go. And this logic can either be written at the gateway or it can be like at the HTTP level, or it can be written in the back end system itself. But I personally always designed where the systems do the rate limiting at the gateway layer. And again, this is a vast topic. It's a trade off. Again, we can discuss pros and cons in any different session. Now let's go to the implementation of this. It's quite simple. I took the same similar language example I'm using. Like the Java spring boot application. You can see a rate limit API here. The base URI is API and base path is API. And then for any specific resource, you have resource as the extended URI. And whenever someone calls at this endpoint, you can actually just annotate it. Say you see here, the annotation here is limit five duration 60. That means one client can't request or call this API at this endpoint for more than five times in 60 seconds. So if the client is making a request more than five times in 60 seconds, that means he will be rate limited. He or she won't be able to get a response saying 429 or saying that yeah, you can't access more than this or rate limited whatever, be the message. That is debatable. So it's just about annotating. So ensure when you're designing APIs and we're implementing, you need to have that rate limiting aspect in your mind as a very important one in these large data intensive applications when there might be lots and lots of requests you can't even imagine. Now the last aspect of the API design is idempotency. Now what is idempotency? Now I'll take the payment example again. Payment system, say you are an end user and you did a payment, right? So what happens when you make a payment? The request goes to the HTTP request goes to like a post pay request goes to a payment system. And what if you immediately attempted a pay the second time? Immediately, instantly, because the button is still enabled due to some UI issue or whatever, be the reason, right? Do you want your money to be deducted twice or what? No matter how many times you perform the activity of, say, pay or any other activity which are supposed to be idempotent, not all aps are idempotent. So that's an important one. So pay is definitely idempotent. The response should be same. Here in this example you see the first attempt payment and second payment retry both have the same response. Payment succeeded and there won't be any additional operation. In the first operation the money gets deducted from your account in the payment system and all the processing happens. And the second retry. In the retry, all that doesn't happen, the money doesn't get deducted. And how this can be prevented, how can this be done? It's simple. It's as simple as you can have a database. I mentioned it in purple box here. Every request should be associated with an item potency key, a unique key associated with a specific client, and this item potency key need not be added by the client. And for simplicity, I just didn't put any boxes in between client and payment system, but this can be handled by a separate service within the back end system itself or at the gateway. Again, that's debatable. It's a design choice. So let's assume that is sorted out and for every request, unique request which is coming from a specific client will have an item put in c key. And it's like a UUId. I gave that example. One, two, three, ABC. When the first payment is made that gets entered into the database and you can see when the second attempt, the payment retry happens. The backend system checks. Is there any entry in the database? It's like a simple key value. It can be a map or any key value like a redis cache or whatever it is, right? Again, that's a design choice. The system will check whether there is an existing key in the db or not. So it doesn't process the same request again the moment it finds that key in the DB. And once it checks and oh, there is already this existing one and then boom, sends the same response again without any additional processing. Let's take an example again like spring boot application I was showing you earlier. And the same example, you can see this is another API API handler like user controller. Say it's like update user. I took an example, update user. Someone is making a call to update some details of a user and when in the request, the user id, when it is obtained you can see the implementation. You are actually accessing the db. Here I use DB as a user map, as the dB dal get dB table data access layer gets the table and in user map you check whether that key is already contained or not. And this user id is the idempotent key and whatever the UUID I gave in the previous example. And if it is already contained then you just send that response saying user information is already contained or whatever, the payment is already succeeded and you don't have to do anything. Only if it is not contained in the DB, then you go ahead with whatever the database update or payment calculation by deducting from the account and things like that. So yes, that's it. So these are the four, some of the important design aspects that needs to be considered while designing APIs for data intensive applications. And there are many more security aspects of the system and pagination and filtering that can be done when you have lot of data coming from the server. And how do you do pagination and in the APIs and how do you implement that and how do you incorporate the security and other aspects of the system and error handling, because you have lots and lots of requests coming. You obviously have lots and lots of errors also coming. And how do you write the APIs in such a way that the response also contains the proper error message and error handling is in the right way in the API contract. So these are some of the other things which I couldn't cover in this session because I wanted to keep it short and maybe I can have some other opportunity to talk more in details about this. But yeah, thanks a lot for this opportunity. I really thank the Conf 42 Python team for providing me this opportunity to speak. And any questions anyone have, has, can reach out to me. I'm Santosh. Nikhil Kumar, available on LinkedIn. And, yeah, thanks a lot for watching the video.
...

Santosh Nikhil Kumar

Senior Software Engineer @ ByteDance

Santosh Nikhil Kumar's LinkedIn account



Join the community!

Learn for free, join the best tech learning community for a price of a pumpkin latte.

Annual
Monthly
Newsletter
$ 0 /mo

Event notifications, weekly newsletter

Delayed access to all content

Immediate access to Keynotes & Panels

Community
$ 8.34 /mo

Immediate access to all content

Courses, quizes & certificates

Community chats

Join the community (7 day free trial)