Dealing with unexpected failures is one of the hardest problems to solve, especially in a distributed system.
A microservice needs to be resilient to failures, which means that it must be able to restart when a failure happens.
In a microservice environment there is a lot of communication between services, and sometimes a service stops responding, which is a problem for the services that are trying to communicate with it.
Let's see how to implement two design patterns that can help us make a microservice resilient.
Exponential Back off Pattern
Sometimes a non-responsive microservice is a transient problem that gets fixed after a short delay. Several things can cause a microservice to stop responding, for example network issues, timeouts, etc.
When that happens, all we have to do is wait a little and call it again.
The Exponential Back off Pattern helps us achieve that: every time an API call fails, the wait before the next attempt grows exponentially. Let's suppose that a call to an API failed for the first time:
- The system will wait 2 seconds (2 pow 1) and try again;
- If it fails again, it will wait 4 seconds (2 pow 2);
- If it fails again, it will wait 8 seconds (2 pow 3).
The formula is pretty simple: Wait Time (in seconds) = Base pow Attempt Number, with a base of 2 in this example.
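The formula can be sketched in a few lines of C#. This is only an illustration of the arithmetic; the `Backoff` class and `DelayFor` method are hypothetical names, not part of any library.

```csharp
using System;

public static class Backoff
{
    // Delay before retry number `attempt` (1-based): baseSeconds raised to the attempt number.
    public static TimeSpan DelayFor(int attempt, double baseSeconds = 2)
        => TimeSpan.FromSeconds(Math.Pow(baseSeconds, attempt));

    public static void Main()
    {
        // Prints the wait time for the first three attempts: 2, 4 and 8 seconds.
        for (var attempt = 1; attempt <= 3; attempt++)
        {
            Console.WriteLine($"Attempt {attempt}: wait {DelayFor(attempt).TotalSeconds}s");
        }
    }
}
```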
Why should we wait exponentially?
If a microservice is too busy handling too many requests, sending more requests won't help at all.
Waiting exponentially also helps prevent socket exhaustion.
Should we keep trying to call the microservice forever?
No. We try a couple of times, and if it still doesn't work, the application should handle the error.
Circuit Breaker Pattern
When the error is not transient and is not going to be fixed anytime soon, we need to fail all API calls to that service immediately and return an error indicating that it is not available.
This pattern prevents the system from making more requests to a non-responsive microservice, which avoids socket exhaustion and gives the non-responsive microservice a break.
The pattern opens a circuit for a certain amount of time, and any call to the non-responsive microservice while the circuit is open automatically returns an error.
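To make the idea concrete, here is a deliberately simplified sketch of a circuit breaker. `SimpleCircuitBreaker` is a hypothetical class written for illustration only: it is not how Polly implements the pattern, it is not thread-safe, and it has no half-open state.

```csharp
using System;

// Hypothetical, simplified breaker for illustration; not Polly's implementation.
public class SimpleCircuitBreaker
{
    private readonly int _failureThreshold;
    private readonly TimeSpan _breakDuration;
    private int _consecutiveFailures;
    private DateTime _openUntilUtc = DateTime.MinValue;

    public SimpleCircuitBreaker(int failureThreshold, TimeSpan breakDuration)
    {
        _failureThreshold = failureThreshold;
        _breakDuration = breakDuration;
    }

    // The circuit is open until the break duration has elapsed.
    public bool IsOpen => DateTime.UtcNow < _openUntilUtc;

    public T Execute<T>(Func<T> call)
    {
        // While the circuit is open, fail immediately without calling the service.
        if (IsOpen)
        {
            throw new InvalidOperationException("Circuit is open; service unavailable.");
        }

        try
        {
            var result = call();
            _consecutiveFailures = 0; // a success resets the failure count
            return result;
        }
        catch
        {
            // After enough consecutive failures, open the circuit for the break duration.
            if (++_consecutiveFailures >= _failureThreshold)
            {
                _openUntilUtc = DateTime.UtcNow + _breakDuration;
                _consecutiveFailures = 0;
            }
            throw;
        }
    }
}
```

In the rest of the article the pattern is provided by Polly, so a class like this would not be needed in practice.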
Implementing the Patterns
Create an ASP.NET Core Web Application
Change the target framework version to .NET Core 3.0
Install the package Microsoft.Extensions.Http.Polly
Create an interface that represents the HttpClient class that will make calls to the other service.
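A minimal sketch of such an interface might look like this. The name `IExampleClient` and the `GetAsync` method are assumptions made for illustration; only the `ExampleClient` class name appears later in this article.

```csharp
using System.Threading.Tasks;

// Hypothetical contract for the typed client; the member name is an assumption.
public interface IExampleClient
{
    // Calls the other service and returns the response body as a string.
    Task<string> GetAsync(string relativeUrl);
}
```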
Create a class that will implement the interface
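A possible implementation is sketched below. It assumes a hypothetical `IExampleClient` interface declaring `Task<string> GetAsync(string relativeUrl)`; the method body is illustrative, not the only way to write it.

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public class ExampleClient : IExampleClient
{
    private readonly HttpClient _httpClient;

    // The HttpClient instance is supplied through dependency injection,
    // so the class never calls `new HttpClient()` itself.
    public ExampleClient(HttpClient httpClient)
    {
        _httpClient = httpClient;
    }

    public async Task<string> GetAsync(string relativeUrl)
    {
        var response = await _httpClient.GetAsync(relativeUrl);
        response.EnsureSuccessStatusCode(); // throws on non-success status codes
        return await response.Content.ReadAsStringAsync();
    }
}
```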
Note that we are not creating new instances of the HttpClient class; it is created by the HttpClientFactory and injected into our class.
Let's set up the HttpClientFactory in the Startup class.
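The registration could look roughly like the sketch below. `IExampleClient` is a hypothetical interface name, the base address is a placeholder, and the other `Startup` members are omitted.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        // Registers ExampleClient as a typed client: the factory creates the
        // HttpClient and injects it into ExampleClient's constructor.
        services.AddHttpClient<IExampleClient, ExampleClient>(client =>
        {
            // Placeholder address (assumption); point this at the real service.
            client.BaseAddress = new Uri("https://example.invalid/");
        });
    }

    // Other Startup members omitted.
}
```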
This ensures that the HttpClient is created by the .NET Core framework and injected into our class.
For the patterns to work, we need to let the framework manage the instances for us; never create an instance of the HttpClient class manually.
Let's create the implementation of the Exponential Back off pattern.
In the Startup class, create the method:
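A sketch of such a method using Polly is shown here. The method name `GetRetryPolicy` is an assumption, and so is the `HandleTransientHttpError()` call, which additionally covers 5xx responses, 408 responses and `HttpRequestException`. The NotFound filter, the three retries and the 2 pow attempt wait follow the description in this article.

```csharp
using System;
using System.Net;
using System.Net.Http;
using Polly;
using Polly.Extensions.Http;

public class Startup
{
    // Hypothetical method name; builds the retry policy with exponential back off.
    private static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
    {
        return HttpPolicyExtensions
            // Assumption: also retry on 5xx, 408 and HttpRequestException.
            .HandleTransientHttpError()
            // Retry when the call returns 404 NotFound.
            .OrResult(msg => msg.StatusCode == HttpStatusCode.NotFound)
            // Retry 3 times, waiting 2, 4 and 8 seconds (2 pow attempt).
            .WaitAndRetryAsync(3, retryAttempt =>
                TimeSpan.FromSeconds(Math.Pow(2, retryAttempt)));
    }

    // Other Startup members omitted.
}
```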
When we implement the back off pattern using Polly, we need to specify which status codes the policy applies to.
In this policy, every time the application gets a NotFound status code as the result of a call, a new attempt is made.
The maximum number of retries is set to 3, and the wait before each one grows exponentially: 2 raised to the attempt number, in seconds.
Let's create the implementation of the Circuit Breaker pattern.
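With Polly, the policy could be sketched as follows. The method name `GetCircuitBreakerPolicy` and the use of `HandleTransientHttpError()` to decide what counts as a failure are assumptions; the threshold of 3 failures and the 30-second break come from this article.

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.Extensions.Http;

public class Startup
{
    // Hypothetical method name; builds the circuit breaker policy.
    private static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
    {
        return HttpPolicyExtensions
            // Assumption: 5xx, 408 and HttpRequestException count as failures.
            .HandleTransientHttpError()
            // Open the circuit for 30 seconds after 3 consecutive failures.
            .CircuitBreakerAsync(3, TimeSpan.FromSeconds(30));
    }

    // Other Startup members omitted.
}
```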
In this policy, the circuit opens after 3 consecutive failed attempts and automatically fails all requests for the next 30 seconds.
Now let's add the policies we created to our client factory.
We have already registered the factory for the ExampleClient class, so all we have to do is add the policies to that registration.
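One way to attach the policies is sketched here. It assumes two hypothetical methods on `Startup`, `GetRetryPolicy()` and `GetCircuitBreakerPolicy()`, each returning an `IAsyncPolicy<HttpResponseMessage>`, plus a hypothetical `IExampleClient` interface.

```csharp
using Microsoft.Extensions.DependencyInjection;

public class Startup
{
    public void ConfigureServices(IServiceCollection services)
    {
        services.AddHttpClient<IExampleClient, ExampleClient>()
            // The retry policy wraps the circuit breaker, so each retry
            // passes through the breaker before reaching the service.
            .AddPolicyHandler(GetRetryPolicy())
            .AddPolicyHandler(GetCircuitBreakerPolicy());
    }

    // GetRetryPolicy, GetCircuitBreakerPolicy and other members omitted.
}
```

The order matters: handlers added first sit outermost, so registering the retry policy before the circuit breaker lets the breaker count every individual attempt.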
The Circuit Breaker and Back off patterns are excellent for improving resilience, and they work together: while Back off retries a service call a couple of times, the Circuit Breaker prevents further calls for a certain period after those retries fail, which avoids problems like socket exhaustion.