
Commit f2753b9

docs: update circuit breaker
1 parent ccad122 commit f2753b9

File tree

3 files changed (+66, -248 lines)


circuit-breaker/README.md

Lines changed: 66 additions & 248 deletions
@@ -6,21 +6,22 @@ tag:
 - Cloud distributed
 - Fault tolerance
 - Microservices
+- Retry
 ---

 ## Also known as

-* Fault tolerance switch
+* Fault Tolerance Switch

 ## Intent

-The Circuit Breaker pattern aims to prevent a software system from making calls to a part of the system that is either failing or showing signs of distress. It is a way to gracefully degrade functionality when a dependent service is not responding, rather than failing completely.
+To prevent a system from repeatedly trying to execute an operation that is likely to fail, allowing it to recover from faults and prevent cascading failures.

 ## Explanation

-Real world example
+Real-world example

-> Imagine a web application that has both local files/images and remote services that are used for fetching data. These remote services may be healthy and responsive at times, or may become slow and unresponsive at some point due to a variety of reasons. If one of the remote services is slow or not responding successfully, our application will try to fetch a response from the remote service using multiple threads/processes; soon all of them will hang (also called [thread starvation](https://en.wikipedia.org/wiki/Starvation_(computer_science))), causing our entire web application to crash. We should be able to detect this situation and show the user an appropriate message so that they can explore other parts of the app unaffected by the remote service failure. Meanwhile, the other services that are working normally should keep functioning, unaffected by this failure.
+> Consider an e-commerce website that depends on multiple external payment gateways to process transactions. If one of the payment gateways becomes unresponsive or slow, the Circuit Breaker pattern can detect the failure and prevent the system from repeatedly attempting to use the problematic gateway. Instead, the system can quickly switch to alternative payment gateways or display an error message to the user, so the rest of the website remains functional and responsive. This avoids resource exhaustion and provides a better user experience by allowing transactions to be processed through other available services.

 In plain words

@@ -32,265 +33,82 @@ Wikipedia says

 ## Programmatic Example

-So, how does this all come together? With the above example in mind, we will imitate the functionality in a simple example. A monitoring service mimics the web app and makes both local and remote calls.
+Imagine a web application that uses both local files/images and remote services to fetch data. Remote services can become slow or unresponsive, which may cause the application to hang due to thread starvation. The Circuit Breaker pattern can help detect such failures and allow the application to degrade gracefully.

-The service architecture is as follows:
+1. **Simulating a Delayed Remote Service**

-![alt text](./etc/ServiceDiagram.png "Service Diagram")
+```java
+// The DelayedRemoteService simulates a remote service that responds after a certain delay.
+var delayedService = new DelayedRemoteService(serverStartTime, 5);
+```

-In terms of code, the end user application is:
+2. **Setting Up the Circuit Breaker**

 ```java
+// The DefaultCircuitBreaker wraps the remote service and monitors for failures.
+var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2, 2000 * 1000 * 1000);
+```

-@Slf4j
-public class App {
-
-  private static final Logger LOGGER = LoggerFactory.getLogger(App.class);
-
-  /**
-   * Program entry point.
-   *
-   * @param args command line args
-   */
-  public static void main(String[] args) {
-
-    var serverStartTime = System.nanoTime();
-
-    var delayedService = new DelayedRemoteService(serverStartTime, 5);
-    var delayedServiceCircuitBreaker = new DefaultCircuitBreaker(delayedService, 3000, 2,
-        2000 * 1000 * 1000);
-
-    var quickService = new QuickRemoteService();
-    var quickServiceCircuitBreaker = new DefaultCircuitBreaker(quickService, 3000, 2,
-        2000 * 1000 * 1000);
-
-    //Create an object of monitoring service which makes both local and remote calls
-    var monitoringService = new MonitoringService(delayedServiceCircuitBreaker,
-        quickServiceCircuitBreaker);
-
-    //Fetch response from local resource
-    LOGGER.info(monitoringService.localResourceResponse());
-
-    //Fetch response from delayed service 2 times, to meet the failure threshold
-    LOGGER.info(monitoringService.delayedServiceResponse());
-    LOGGER.info(monitoringService.delayedServiceResponse());
-
-    //Fetch current state of delayed service circuit breaker after crossing failure threshold limit,
-    //which is OPEN now
-    LOGGER.info(delayedServiceCircuitBreaker.getState());
-
-    //Meanwhile, the delayed service is down, fetch response from the healthy quick service
-    LOGGER.info(monitoringService.quickServiceResponse());
-    LOGGER.info(quickServiceCircuitBreaker.getState());
-
-    //Wait for the delayed service to become responsive
-    try {
-      LOGGER.info("Waiting for delayed service to become responsive");
-      Thread.sleep(5000);
-    } catch (InterruptedException e) {
-      LOGGER.error("An error occurred: ", e);
-    }
-
-    //Check the state of delayed circuit breaker, should be HALF_OPEN
-    LOGGER.info(delayedServiceCircuitBreaker.getState());
-
-    //Fetch response from delayed service, which should be healthy by now
-    LOGGER.info(monitoringService.delayedServiceResponse());
-    //As successful response is fetched, it should be CLOSED again.
-    LOGGER.info(delayedServiceCircuitBreaker.getState());
-  }
-}
+3. **Monitoring Service to Handle Requests**
+
+```java
+// The MonitoringService is responsible for calling the remote services.
+var monitoringService = new MonitoringService(delayedServiceCircuitBreaker, quickServiceCircuitBreaker);
+
+// Fetch response from local resource
+LOGGER.info(monitoringService.localResourceResponse());
+
+// Fetch response from delayed service 2 times to meet the failure threshold
+LOGGER.info(monitoringService.delayedServiceResponse());
+LOGGER.info(monitoringService.delayedServiceResponse());
 ```

-The monitoring service:
+4. **Handling Circuit Breaker States**

 ```java
-public class MonitoringService {
-
-  private final CircuitBreaker delayedService;
-
-  private final CircuitBreaker quickService;
-
-  public MonitoringService(CircuitBreaker delayedService, CircuitBreaker quickService) {
-    this.delayedService = delayedService;
-    this.quickService = quickService;
-  }
-
-  //Assumption: Local service won't fail, no need to wrap it in a circuit breaker logic
-  public String localResourceResponse() {
-    return "Local Service is working";
-  }
-
-  /**
-   * Fetch response from the delayed service (with some simulated startup time).
-   *
-   * @return response string
-   */
-  public String delayedServiceResponse() {
-    try {
-      return this.delayedService.attemptRequest();
-    } catch (RemoteServiceException e) {
-      return e.getMessage();
-    }
-  }
-
-  /**
-   * Fetches response from a healthy service without any failure.
-   *
-   * @return response string
-   */
-  public String quickServiceResponse() {
-    try {
-      return this.quickService.attemptRequest();
-    } catch (RemoteServiceException e) {
-      return e.getMessage();
-    }
-  }
-}
+// Fetch current state of delayed service circuit breaker after crossing failure threshold limit
+LOGGER.info(delayedServiceCircuitBreaker.getState()); // Should be OPEN
+
+// Meanwhile, the delayed service is down, fetch response from the healthy quick service
+LOGGER.info(monitoringService.quickServiceResponse());
+LOGGER.info(quickServiceCircuitBreaker.getState());
 ```

-As can be seen, it makes the call to get local resources directly, but it wraps the call to the remote (costly) service in a circuit breaker object, which prevents faults as follows:
+5. **Recovering from Failure**

 ```java
-public class DefaultCircuitBreaker implements CircuitBreaker {
-
-  private final long timeout;
-  private final long retryTimePeriod;
-  private final RemoteService service;
-  long lastFailureTime;
-  private String lastFailureResponse;
-  int failureCount;
-  private final int failureThreshold;
-  private State state;
-  // Future time offset, in nanoseconds
-  private final long futureTime = 1_000_000_000_000L;
-
-  /**
-   * Constructor to create an instance of Circuit Breaker.
-   *
-   * @param timeout          Timeout for the API request. Not necessary for this simple example
-   * @param failureThreshold Number of failures we receive from the depended service before changing
-   *                         state to 'OPEN'
-   * @param retryTimePeriod  Time period after which a new request is made to remote service for
-   *                         status check.
-   */
-  DefaultCircuitBreaker(RemoteService serviceToCall, long timeout, int failureThreshold,
-      long retryTimePeriod) {
-    this.service = serviceToCall;
-    // We start in a closed state hoping that everything is fine
-    this.state = State.CLOSED;
-    this.failureThreshold = failureThreshold;
-    // Timeout for the API request.
-    // Used to break the calls made to remote resource if it exceeds the limit
-    this.timeout = timeout;
-    this.retryTimePeriod = retryTimePeriod;
-    //An absurd amount of time in future which basically indicates the last failure never happened
-    this.lastFailureTime = System.nanoTime() + futureTime;
-    this.failureCount = 0;
-  }
-
-  // Reset everything to defaults
-  @Override
-  public void recordSuccess() {
-    this.failureCount = 0;
-    this.lastFailureTime = System.nanoTime() + futureTime;
-    this.state = State.CLOSED;
-  }
-
-  @Override
-  public void recordFailure(String response) {
-    failureCount = failureCount + 1;
-    this.lastFailureTime = System.nanoTime();
-    // Cache the failure response for returning on open state
-    this.lastFailureResponse = response;
-  }
-
-  // Evaluate the current state based on failureThreshold, failureCount and lastFailureTime.
-  protected void evaluateState() {
-    if (failureCount >= failureThreshold) { //Then something is wrong with remote service
-      if ((System.nanoTime() - lastFailureTime) > retryTimePeriod) {
-        //We have waited long enough and should try checking if service is up
-        state = State.HALF_OPEN;
-      } else {
-        //Service would still probably be down
-        state = State.OPEN;
-      }
-    } else {
-      //Everything is working fine
-      state = State.CLOSED;
-    }
-  }
-
-  @Override
-  public String getState() {
-    evaluateState();
-    return state.name();
-  }
-
-  /**
-   * Break the circuit beforehand if it is known that the service is down, or connect the circuit
-   * manually if the service comes online before expected.
-   *
-   * @param state State at which circuit is in
-   */
-  @Override
-  public void setState(State state) {
-    this.state = state;
-    switch (state) {
-      case OPEN -> {
-        this.failureCount = failureThreshold;
-        this.lastFailureTime = System.nanoTime();
-      }
-      case HALF_OPEN -> {
-        this.failureCount = failureThreshold;
-        this.lastFailureTime = System.nanoTime() - retryTimePeriod;
-      }
-      default -> this.failureCount = 0;
-    }
-  }
-
-  /**
-   * Executes service call.
-   *
-   * @return Value from the remote resource, stale response or a custom exception
-   */
-  @Override
-  public String attemptRequest() throws RemoteServiceException {
-    evaluateState();
-    if (state == State.OPEN) {
-      // return cached response if the circuit is in OPEN state
-      return this.lastFailureResponse;
-    } else {
-      // Make the API request if the circuit is not OPEN
-      try {
-        //In a real application, this would be run in a thread and the timeout
-        //parameter of the circuit breaker would be utilized to know if service
-        //is working. Here, we simulate that based on server response itself
-        var response = service.call();
-        // Yay!! the API responded fine. Let's reset everything.
-        recordSuccess();
-        return response;
-      } catch (RemoteServiceException ex) {
-        recordFailure(ex.getMessage());
-        throw ex;
-      }
-    }
-  }
+// Wait for the delayed service to become responsive
+try {
+  LOGGER.info("Waiting for delayed service to become responsive");
+  Thread.sleep(5000);
+} catch (InterruptedException e) {
+  LOGGER.error("An error occurred: ", e);
 }
+
+// Check the state of delayed circuit breaker, should be HALF_OPEN
+LOGGER.info(delayedServiceCircuitBreaker.getState());
+
+// Fetch response from delayed service, which should be healthy by now
+LOGGER.info(monitoringService.delayedServiceResponse());
+
+// As successful response is fetched, it should be CLOSED again.
+LOGGER.info(delayedServiceCircuitBreaker.getState());
 ```

-How does the above pattern prevent failures? Let's understand via the finite state machine implemented by it.
+Summary of the example

-![alt text](./etc/StateDiagram.png "State Diagram")
+- Initialize the Circuit Breaker with parameters: `timeout`, `failureThreshold`, and `retryTimePeriod`.
+- Start in the `closed` state.
+- On successful calls, reset the state.
+- On failures exceeding the threshold, transition to the `open` state to prevent further calls.
+- After the retry timeout, transition to the `half-open` state to test the service.
+- On success in the `half-open` state, transition back to `closed`. On failure, return to `open`.

-- We initialize the Circuit Breaker object with certain parameters: `timeout`, `failureThreshold` and `retryTimePeriod`, which help determine how resilient the API is.
-- Initially, we are in the `closed` state and no remote calls to the API have occurred.
-- Every time the call succeeds, we reset the state to as it was in the beginning.
-- If the number of failures crosses a certain threshold, we move to the `open` state, which acts just like an open circuit and prevents remote service calls from being made, thus saving resources. (Here, we return the response called `stale response from API`.)
-- Once we exceed the retry timeout period, we move to the `half-open` state and make another call to the remote service to check if the service is working, so that we can serve fresh content. A failure sets it back to the `open` state and another attempt is made after the retry timeout period, while a success sets it to the `closed` state so that everything starts working normally again.
+This example demonstrates how the Circuit Breaker pattern can help maintain application stability and resilience by managing remote service failures.

 ## Class diagram

-![alt text](./etc/circuit-breaker.urm.png "Circuit Breaker class diagram")
+![Circuit Breaker](./etc/circuit-breaker.urm.png "Circuit Breaker class diagram")

 ## Applicability

@@ -322,15 +140,15 @@ Trade-Offs:

 ## Related Patterns

+- Bulkhead: Can be used to isolate different parts of the system to prevent failures from spreading across the system
 - [Retry Pattern](https://github.com/iluwatar/java-design-patterns/tree/master/retry): Can be used in conjunction with the Circuit Breaker pattern to retry failed operations before opening the circuit
-- [Bulkhead Pattern](https://learn.microsoft.com/en-us/azure/architecture/patterns/bulkhead): Can be used to isolate different parts of the system to prevent failures from spreading across the system

 ## Credits

-* [Understanding Circuit Breaker Pattern](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
-* [Martin Fowler on Circuit Breaker](https://martinfowler.com/bliki/CircuitBreaker.html)
-* [Fault tolerance in a high volume, distributed system](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
-* [Circuit Breaker pattern](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)
-* [Release It! Design and Deploy Production-Ready Software](https://amzn.to/4aqTNEP)
-* [Microservices Patterns: With examples in Java](https://amzn.to/3xaZwk0)
 * [Building Microservices: Designing Fine-Grained Systems](https://amzn.to/43Dx86g)
+* [Microservices Patterns: With examples in Java](https://amzn.to/3xaZwk0)
+* [Release It! Design and Deploy Production-Ready Software](https://amzn.to/4aqTNEP)
+* [Understand CircuitBreaker Design Pattern with Simple Practical Example (ITNEXT)](https://itnext.io/understand-circuitbreaker-design-pattern-with-simple-practical-example-92a752615b42)
+* [Circuit Breaker (Martin Fowler)](https://martinfowler.com/bliki/CircuitBreaker.html)
+* [Fault tolerance in a high volume, distributed system (Netflix)](https://medium.com/netflix-techblog/fault-tolerance-in-a-high-volume-distributed-system-91ab4faae74a)
+* [Circuit Breaker pattern (Microsoft)](https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker)
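
To make the state transitions in the "Summary of the example" concrete, here is a minimal, self-contained sketch of the evaluation logic the summary describes. It mirrors the `DefaultCircuitBreaker` shown in the removed listing above; the class name and the "no failure yet" sentinel are illustrative choices, not the project's API.

```java
// Minimal sketch of the three-state evaluation described in the summary above.
// Field names follow the DefaultCircuitBreaker from the removed listing; the class itself is illustrative.
public class CircuitBreakerSketch {

  enum State { CLOSED, OPEN, HALF_OPEN }

  private final int failureThreshold;       // failures tolerated before the circuit opens
  private final long retryTimePeriodNanos;  // how long to stay OPEN before probing again
  private int failureCount = 0;
  private long lastFailureTime = Long.MAX_VALUE; // sentinel meaning "no failure recorded yet"
  private State state = State.CLOSED;

  CircuitBreakerSketch(int failureThreshold, long retryTimePeriodNanos) {
    this.failureThreshold = failureThreshold;
    this.retryTimePeriodNanos = retryTimePeriodNanos;
  }

  // Re-derive the state from the failure count and the time of the last failure.
  State evaluateState() {
    if (failureCount >= failureThreshold) {
      // Too many failures: stay OPEN until the retry period elapses, then probe via HALF_OPEN.
      state = (System.nanoTime() - lastFailureTime) > retryTimePeriodNanos
          ? State.HALF_OPEN
          : State.OPEN;
    } else {
      state = State.CLOSED;
    }
    return state;
  }

  // A successful call (or probe) closes the circuit again.
  void recordSuccess() {
    failureCount = 0;
    lastFailureTime = Long.MAX_VALUE;
    state = State.CLOSED;
  }

  // Each failure pushes the breaker toward OPEN.
  void recordFailure() {
    failureCount++;
    lastFailureTime = System.nanoTime();
  }
}
```

The state is recomputed lazily whenever the breaker is queried, so no background timer is needed; the example's `getState()` and `attemptRequest()` take the same approach by calling `evaluateState()` first.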

circuit-breaker/etc/ServiceDiagram.png

-23 KB
Binary file not shown.

circuit-breaker/etc/StateDiagram.png

-18.9 KB
Binary file not shown.
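
The Related Patterns section above notes that Retry can be combined with the Circuit Breaker to retry failed operations before the circuit opens. Below is a minimal sketch of one way to layer the two, assuming the `CircuitBreaker` interface and the checked `RemoteServiceException` used in the example; `RetryingCall` and `attemptWithRetry` are hypothetical names introduced only for this illustration.

```java
// Hypothetical helper that layers a simple retry loop over the example's CircuitBreaker.
// Each failed attempt is still recorded by the breaker, so persistent failures will
// eventually trip it OPEN as usual.
public final class RetryingCall {

  private RetryingCall() {
  }

  static String attemptWithRetry(CircuitBreaker breaker, int maxAttempts)
      throws RemoteServiceException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RemoteServiceException lastFailure = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return breaker.attemptRequest(); // delegate to the guarded call
      } catch (RemoteServiceException e) {
        lastFailure = e;                 // treat as transient and try again
      }
    }
    throw lastFailure;                   // out of attempts; the breaker may have opened by now
  }
}
```

Note that once the breaker is OPEN, `attemptRequest()` in the example returns the cached failure response instead of throwing, so the loop exits immediately rather than hammering the remote service.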
