I am 110 pages into Michael Nygard’s “Release It!”. I ignored this book earlier because I pictured some namby-pamby book on release best practices, but this book is absolutely fantastic and fascinating to read!
The book covers anti-patterns and then best practices, first for stability and then for capacity, in production systems (typically web commerce). After that, it delves into system-level design best practices. I am currently up to the stability anti-patterns.
Once again, I am tempted to list all of them, but that seems unfair to the author, so I will cover only a few. I will follow up on this post as I get further into the book (352 pages). Here are some:
- Integration Points. This is the number one point of failure as engineers code the interactions expecting the systems to always be up, and when they are not, the resulting cascading failures bring the entire system down. Making an HTTP request without a specified timeout? Not a good idea when all your threads block waiting for a request that will never return and new requests are starved. A properly structured system will expect these types of failures and handle them appropriately to protect the other layers.
- Third Party Libraries. These little black boxes of functionality are not as robust as you need them to be, and often not as configurable as they need to be to handle failure scenarios. This can be a huge weakness in your production system.
- Scaling Effects. Your system works well, but as you begin to scale certain aspects of it out horizontally, the other elements are not scaled to match and then crash. This is particularly true with shared resources.
- Unbalanced Capacities. Only a small percentage of your overall system's resources is dedicated, appropriately, to handling home installation requests for your eCommerce web-site (e.g., Best Buy). But your marketing department commits an Attack of Self-Denial by offering a free installation promotion without coordinating with production. Suddenly, the home installation component cannot handle all the requests, and the cascading failures bring down a site that can normally handle its load under the expected models (even including Black Friday).
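To make the Integration Points item above a little more concrete, here is a minimal Python sketch of the "always set a timeout" idea. This is my own illustration, not code from the book: the names `start_silent_server` and `fetch_with_timeout`, the `/inventory` path, and the fallback value are all made up for the example. The silent server stands in for a remote system that accepts the connection and then never answers, which is exactly the case that leaves threads blocked.

```python
import http.client
import socket
import threading
import time


def start_silent_server():
    """Hypothetical stand-in for a hung remote system: accepts, never replies."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(5)
    held = []  # keep accepted sockets open so callers are left waiting

    def accept_forever():
        while True:
            try:
                conn, _ = server.accept()
            except OSError:
                return  # listening socket was closed
            held.append(conn)

    threading.Thread(target=accept_forever, daemon=True).start()
    return server, server.getsockname()[1]


def fetch_with_timeout(host, port, path, timeout_seconds, fallback):
    """Return the response body, or `fallback` if the call fails or times out."""
    # The timeout applies to connecting and to each blocking socket operation.
    conn = http.client.HTTPConnection(host, port, timeout=timeout_seconds)
    try:
        conn.request("GET", path)
        return conn.getresponse().read()
    except (OSError, http.client.HTTPException):
        # Socket timeouts and connection errors are OSError subclasses.
        return fallback
    finally:
        conn.close()


if __name__ == "__main__":
    server, port = start_silent_server()
    started = time.monotonic()
    body = fetch_with_timeout("127.0.0.1", port, "/inventory", 0.5, b"unavailable")
    print(body, "after", round(time.monotonic() - started, 2), "seconds")
    server.close()
```

Without the `timeout` argument, the same call would block for as long as the remote side stays silent, and every thread making it would be stuck with it. Returning a fallback is only one possible reaction; the point is that the caller gets its thread back and can decide what to do.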
All in all, this book is AWESOME! I hope to play with some of the patterns and anti-patterns in Java and Ruby and report back on that as well. I will follow up by discussing Stability Best Practices in my next blog post.