06 February 2014 by Jack Vamvas
"Everything fails, all the time" is a mantra attributed to Werner Vogels, CTO of Amazon.com.
Considering the ideas behind this mantra raises some interesting practical issues:
1) An infrastructure will fail
As infrastructures grow, scale becomes non-trivial. Add environments that support multiple technologies and change constantly, and the number of failures and errors increases. Accept that failure will occur and manage systems based on this assumption.
Compare this with the old-style thinking that underpinned “build-once” systems, where engineers threw a system over the wall and only reviewed it during an outage.
2) Testing for these failures in the Production environment, with engineers available to fix the problems
This point will raise debate. My colleagues argue that purposefully disabling infrastructure services should never occur in a Production environment and should only ever be done in a Non-Production environment. But how many organisations have a Non-Production setup that exactly matches Production, with the same resiliency, server configurations, database loads etc.?
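To make the idea of deliberate failure testing concrete, here is a minimal Python sketch, not a production tool. It wraps a service call so that a controlled fraction of calls fail on purpose. The names `with_fault_injection`, `InjectedFailure` and `query_database` are hypothetical and exist only for this illustration; a real test would target actual infrastructure services, with engineers standing by.

```python
import random


class InjectedFailure(Exception):
    """Raised in place of a real outage during a planned failure test."""


def with_fault_injection(call, failure_rate, rng=None):
    """Wrap `call` so that roughly `failure_rate` of invocations fail on purpose.

    failure_rate is a fraction between 0.0 (never fail) and 1.0 (always fail).
    """
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        # rng.random() returns a value in [0.0, 1.0)
        if rng.random() < failure_rate:
            raise InjectedFailure("simulated service failure")
        return call(*args, **kwargs)

    return wrapped


def query_database():
    # Hypothetical stand-in for a real service call.
    return "ok"


# Fail roughly 1 call in 10 while the test window is open.
flaky_query = with_fault_injection(query_database, failure_rate=0.1)
```

Keeping the failure rate as a parameter means the same wrapper can run gently in Production and aggressively in Non-Production.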
3) If the failure occurs again, the infrastructure must recover automatically with no disruption to the user experience
Once repeated failures are identified, apply fixes or steps that allow the services to recover automatically with no or minimal impact on the users. It’s a challenge, but worth it. It’s all about continuously testing and improving, and ultimately gaining back time to focus on core skills.
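One common way to automate recovery from a known, repeated failure is to retry the operation after running a recovery step, waiting a little longer between attempts. The Python sketch below is one possible shape for this, under stated assumptions: `call_with_recovery`, `operation` and `recover` are hypothetical names, and the recovery step (restarting a service, reconnecting, failing over) depends entirely on the failure being handled.

```python
import time


def call_with_recovery(operation, recover, max_attempts=3,
                       base_delay=0.5, sleep=time.sleep):
    """Run `operation`; on failure run `recover`, wait, and try again.

    The wait doubles after each failed attempt (exponential backoff).
    The last failure is re-raised so that a human is still alerted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # automatic recovery did not work; escalate
            recover()
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is there so the wait can be replaced in tests. Re-raising after the final attempt keeps the automation honest: a failure that cannot be recovered automatically still surfaces to an engineer.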
Powershell and Disaster Recovery preparation
SQL Performance tuning - Asking the right question
SQL Server DBA Top 10 automation tasks