In my last blog post I discussed some of the stability anti-patterns from Michael Nygard’s excellent but tragically misnamed book “Release It!” I have now finished reading the patterns and best practices portion, discussed below:
- Timeouts. Traditionally a thread blocks until the resource is ready, but this can be destabilizing if the resource is never ready. Thus, a thread needs to be able to time out. A mechanism for delayed retries should also be set up.
- Circuit Breakers. The author gives the example of real-world electrical circuit breakers: when too many items are plugged in (drawing too much current and generating too much heat), the circuit breaker breaks the circuit instead of letting the house burn down. He introduces a logical circuit breaker where too many failures break the circuit, rejecting subsequent requests until the system is ready to try again (see the book for the actual algorithm).
- Bulkheads. A ship’s hull is divided by bulkheads into watertight compartments so that if the hull is breached, the flooding is limited to one compartment instead of sinking the entire ship. By partitioning your system, you can confine errors to one area instead of taking the entire system down. These partitions can be hardware redundancy, binding certain processes to certain CPUs, segmenting different areas of business functionality to different server farms, or partitioning threads into different thread groups for different functionality. I remember how a previous company’s server would run out of threads, and because no threads were reserved for admin functionality, an admin had no way to interact with it.
- Handshaking. By asking a component if it can take on more before asking it to perform that work, the system has a way to introduce throttling. If the component is too busy, it can tell the clients to back off until it is able to handle more requests.
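To make the circuit breaker idea a bit more concrete, here is a minimal Python sketch. It is not the algorithm from the book; it is just one common way to express the closed / open / half-open behavior, and every name in it (`CircuitBreaker`, `CircuitOpenError`, `failure_threshold`, `reset_timeout`) is my own invention for illustration. It also assumes a single thread; real code would need locking.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Illustrative sketch only; names and defaults are assumptions."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to stay open
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means "closed"

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Still open: fail fast without touching the resource.
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: "half-open", let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if (self.opened_at is not None
                    or self.failures >= self.failure_threshold):
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

You would wrap each call to the flaky resource in `breaker.call(...)`. Once the threshold is hit, callers get an immediate `CircuitOpenError` instead of piling up on a resource that is already struggling, which is the same fail-fast behavior the electrical analogy describes.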
I am now moving on to reading the Capacity anti-patterns and patterns and will blog on that later. Once again, I STRONGLY recommend purchasing and reading this book!