Requesting external help for a SQL Server issue can be tricky. Let’s look at a scenario.
At 11:35 am on a business day, users start to report connectivity issues on a Primary Source application supporting a very large financial system. Very soon afterwards the DBA gets a call from the application owner asking them to investigate. It’s immediately obvious that there is a SQL Server database issue. The DBA has a quick discussion with the Incident Management team, and a decision is taken not to fail over the server until the problem is better understood. The management team agrees to make the system unavailable for one hour to allow some investigation.
After an hour the DBA has a good idea of the problem and proposes a RESTORE of last night’s FULL BACKUP, then rolling forward the logs to just a few minutes before the problem started. The RESTORE starts, but when it comes to rolling forward the logs a problem arises on the restore client: it cannot skip over log backups that occurred during the FULL BACKUP. It’s a long story, but the backup server was unavailable during the night and the Operations team decided to run the Production server backups in the morning. Essentially, it’s a bug in the restore client software.
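To make the restore problem concrete, here is a minimal Python sketch of the selection the restore client should have made: skip any log backup that ends before the full backup ends, then apply logs in order up to the one that covers the stop time. It is a simplified model, not the vendor's logic. The `LogBackup` class, the `logs_to_restore` function, the backup names and the small integer LSNs are all hypothetical. In a real system the equivalent values would come from the `first_lsn`, `last_lsn` and `backup_finish_date` columns of `msdb.dbo.backupset`.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogBackup:
    name: str           # hypothetical backup name
    first_lsn: int      # simplified: real LSNs are much larger numbers
    last_lsn: int
    finished: datetime  # when the log backup completed


def logs_to_restore(full_last_lsn, logs, stop_at):
    """Pick the log backups to apply after a full backup, up to stop_at."""
    chosen = []
    for log in sorted(logs, key=lambda b: b.first_lsn):
        # A log that ends at or before the end of the full backup adds
        # nothing the full backup does not already contain, so skip it.
        if log.last_lsn <= full_last_lsn:
            continue
        chosen.append(log)
        # The first log that finished at or after the stop time contains
        # the stop point, so nothing later is needed.
        if log.finished >= stop_at:
            break
    return chosen


# Hypothetical day: the full backup ran in the morning and ended at LSN 130,
# so the 09:00 log backup falls entirely inside it.
day = datetime(2024, 1, 15)
logs = [
    LogBackup("log_0200", 10, 20, day.replace(hour=2)),
    LogBackup("log_0900", 100, 120, day.replace(hour=9)),
    LogBackup("log_1000", 120, 160, day.replace(hour=10)),
    LogBackup("log_1100", 160, 200, day.replace(hour=11)),
    LogBackup("log_1200", 200, 240, day.replace(hour=12)),
]
stop_at = day.replace(hour=11, minute=30)  # a few minutes before 11:35
selected = [b.name for b in logs_to_restore(130, logs, stop_at)]
print(selected)
```

In this model the 02:00 and 09:00 log backups are skipped, and the 12:00 log is the last one applied because it is the first to finish after the stop time.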
At this point the DBA is stuck. They can either try to quickly develop some knowledge of querying the TSM server directly, or contact external help. Keep in mind that by now the outage has lasted about 3 hours and pressure is mounting. I also forgot to mention that it’s the height of the holiday season, and most of the DBA team is on holiday, leaving just a skeleton staff.
The restore client vendor is contacted with details of the bug. They have a patch, available as a download. In the meantime, management needs to get Production services going. External help is requested. The consultant steps in, grasps the problem and, fortunately, has worked on a similar problem recently. Systems are back up. This takes the pressure off the DBA, who can now progress with some root cause analysis.
Lessons learnt
1) No DBA is an expert on every topic.
2) External help can be expensive, but compare it to the expense of the server outage.
3) Weigh up the cost of an outage, plus the subsequent cost of an inappropriate solution, against the potential recurrence of future outages.
4) A consultant may be much cheaper than 100 employees not being able to process orders. It’s important for management to have a perspective on business impact. Typical scenarios may be database corruption, unfamiliar error messages or complex performance issues.
5) The DBA ego. In many situations the DBA has in-depth knowledge and the skills to solve a wide range of problems. They develop an aura of expertise. This can be a barrier to awareness of their own limitations. Calling in external help is an opportunity for a DBA to witness alternative approaches.
6) Document the problem and the steps taken, and generate traces. This makes the consultant’s life easier and can speed up the solution.
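The cost comparison in lessons 2 to 4 can be sketched as a few lines of arithmetic. This is only an illustration. The 100 employees and the 3 hour outage come from the scenario above, while the hourly staff cost, the value of lost orders and the consultant day rate are made-up figures, and `outage_cost` is a hypothetical helper.

```python
def outage_cost(idle_staff, staff_cost_per_hour, lost_orders_per_hour, hours):
    """Rough cost of an outage: idle staff time plus orders not processed."""
    return hours * (idle_staff * staff_cost_per_hour + lost_orders_per_hour)


# 100 staff and 3 hours are from the scenario; the money figures are assumptions.
outage = outage_cost(
    idle_staff=100,
    staff_cost_per_hour=40,
    lost_orders_per_hour=5000,
    hours=3,
)

# Assumed: two consultant days at a hypothetical day rate of 2000.
consultant = 2 * 2000

print(f"Outage so far: {outage}, consultant: {consultant}")
print("Consultant is cheaper" if consultant < outage else "Outage is cheaper")
```

With these assumed figures the outage has already cost several times the consultant’s fee, and the gap widens with every extra hour of downtime.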