This throttling (rate-limiting) policy rejects the next request with a 429 response if the client already has X_number_of_requests in progress for a given company.
The response will contain the header "x-rate-limit-policy: ConcurrentRequestsLimitPolicy" to tell the client exactly which policy was violated. The server will not send a Retry-After or similar header, since it cannot know how long the in-flight requests will take to finish executing.
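For illustration, a client might detect this specific policy violation as in the sketch below; the endpoint URL and error handling are placeholders, and only the 429 status code and the policy header come from the policy described above.

```typescript
// Minimal sketch: detect the concurrent-requests policy violation.
// The endpoint and error handling are placeholders; the 429 status and
// the x-rate-limit-policy header are the parts defined by the policy.
async function callApi(url: string, init?: RequestInit): Promise<Response> {
  const response = await fetch(url, init);
  if (
    response.status === 429 &&
    response.headers.get("x-rate-limit-policy") === "ConcurrentRequestsLimitPolicy"
  ) {
    // No Retry-After header is sent for this policy, so the only sensible
    // strategy is to retry after one of our own in-flight requests completes.
    throw new Error("Concurrent request limit hit; retry when an in-flight request finishes.");
  }
  return response;
}
```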
This is why the client itself needs to keep track of its requests. The client should maintain a counter of concurrent requests, incremented on every new request sent and decremented on every response received.
If the client runs as several instances, the counter should live in shared storage so that all instances see the same count. This way, the client knows whether it is allowed to issue the next request: it may do so while counter < X_number_of_requests.
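A minimal sketch of such a counter, assuming a single client instance, the global fetch API, and a hypothetical limit of 10 (the real X_number_of_requests will be communicated separately); a multi-instance client would replace the local variable with shared storage such as a Redis counter:

```typescript
// Hypothetical limit; the real value of X_number_of_requests will be
// communicated by the API provider beforehand.
const X_NUMBER_OF_REQUESTS = 10;

// In-process counter of concurrent requests. A client running as several
// instances would keep this in shared storage (e.g. Redis INCR/DECR) instead.
let inFlight = 0;

async function send(url: string, init?: RequestInit): Promise<Response> {
  if (inFlight >= X_NUMBER_OF_REQUESTS) {
    throw new Error("Local concurrency limit reached; wait for a response first.");
  }
  inFlight++; // increased on every new request sent
  try {
    return await fetch(url, init);
  } finally {
    inFlight--; // decreased on every response received (or on failure)
  }
}
```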
We are still gathering data to determine how many concurrent calls should be allowed. The value of X_number_of_requests will be clearly communicated beforehand.
We do not support queueing requests beyond what is standard for web servers, nor will we implement it for the normal flow, where the purpose is a request/response pattern. Queueing requests there would soon result in timeout errors on the clients, since they would be left waiting for the responses.
However, we are working on a way to call the APIs with a special header indicating that the entire request should be persisted in storage or a queue. The API then responds immediately with 202 Accepted, while the actual invocation runs as a background task, and a webhook notification is sent when a response is available.
This approach is suitable for clients that do not need to wait for a response in order to continue their flow, or for clients that need to carry out a long-running operation via the API. This work is in progress and is expected to be delivered in Q1/22.
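Once available, the asynchronous flow could look roughly like the sketch below. Note that the header name X-Async-Invocation is purely a hypothetical placeholder (the actual header has not been announced), and the webhook side is only described in a comment:

```typescript
// Speculative sketch of the planned asynchronous invocation.
// "X-Async-Invocation" is an invented placeholder header name; the real
// header is not yet published.
async function invokeAsync(url: string, body: unknown): Promise<void> {
  const response = await fetch(url, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "X-Async-Invocation": "true", // hypothetical header
    },
    body: JSON.stringify(body),
  });
  if (response.status === 202) {
    // The request is persisted and executed as a background task; the actual
    // response arrives later via a webhook notification.
    console.log("Accepted; awaiting webhook notification with the result.");
  }
}
```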