I read through the transcript of the second of Atul Gawande's 2014 Reith Lectures, titled “The Century of the System”. Atul Gawande is a surgeon, public health researcher and writer.
In the lecture, he discusses public health and how to manage large-scale public services, but many of his points are relevant to managing database servers and IT platforms.
The first point he makes: “I think is that as we embark on the 21st century we have found that the 20th century has given us a volume of knowledge and skill that is beyond what any individual can simply hold in their head, can know how to deliver on, and simply do it on their own. The volume of knowledge and skill has exceeded our individual capabilities.”
This statement raises all sorts of ideas about expertise and the notion of an expert. Is it still possible to be an authoritative expert on one subject? How do you collaborate with other professionals to analyse and apply solutions to complex IT systems? How do multiple experts work together to provide solutions and troubleshoot complicated problems?
The second point: “And so we worked with a team from … from the airline industry to design what emerged as just a checklist – a checklist though that was made specifically to catch the kinds of problems that even experts will make mistakes at doing. Most often basically failures of communications.”
Anyone who’s worked on large IT projects is familiar with the difficulties of communication. I can’t count how many times seemingly straightforward implementations have failed due to inadequate communication. The cause could be something as straightforward as a lack of QA, weak implementation of architectural standards or ineffective collaboration tools.
Repeatable processes should be documented and scripted as much as possible. If DBAs are delegating the task to operational teams, then this becomes doubly important.
This breakdown in communication also happens with incident management. Who should hold the checklist during an incident? Quite often the resolution of an incident can be hastened by basic questions such as: What is the server name? What are the contact details of the SME? What are the error messages? What is the method of communication between the team members? See also: SQL Performance tuning - Asking the right question
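The basic incident questions above could be captured in a small script, so that whoever holds the checklist can see at a glance what is still unanswered. This is only a minimal sketch in Python: the names `INTAKE_QUESTIONS` and `IncidentChecklist`, the item keys and the sample server name are all hypothetical and would need adapting to your own incident process.

```python
from dataclasses import dataclass, field

# Hypothetical intake items, based on the basic questions listed above.
INTAKE_QUESTIONS = {
    "server_name": "What is the server name?",
    "sme_contact": "What are the contact details of the SME?",
    "error_messages": "What are the error messages?",
    "comms_method": "How is the team communicating during the incident?",
}


@dataclass
class IncidentChecklist:
    """Tracks which intake questions have been answered for one incident."""

    answers: dict = field(default_factory=dict)

    def record(self, key: str, value: str) -> None:
        # Reject items that are not on the checklist, so typos are caught early.
        if key not in INTAKE_QUESTIONS:
            raise KeyError(f"Unknown checklist item: {key}")
        self.answers[key] = value.strip()

    def outstanding(self) -> list:
        # A blank answer counts as unanswered.
        return [
            question
            for key, question in INTAKE_QUESTIONS.items()
            if not self.answers.get(key)
        ]

    def is_complete(self) -> bool:
        return not self.outstanding()


if __name__ == "__main__":
    checklist = IncidentChecklist()
    checklist.record("server_name", "SQLPROD01")  # made-up server name
    for question in checklist.outstanding():
        print("Still to ask:", question)
```

Keeping the questions in one dictionary means the list can be reviewed and extended by the wider team without touching the logic.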
Quite often, simply recognising that experts can make mistakes and that things can slip through the cracks is the first step to fixing issues more quickly.
Most “experts” don’t like to deal with checklists, but we should begin recognising that checklists (or similar methods) are an effective way to socialise processes with a wider group of team members. This matters in large international IT teams, where experts move fluidly through different projects and production environments.