19 August, 2012 by Jack Vamvas
The week started with one of the main backup servers failing. The Operations Team manages the monitoring of SQL Server database backups. Normally, if a SQL Server backup fails, Operations follow a documented procedure to rerun the backup. If it fails again, they pass it on to an engineer for diagnosis. Unfortunately, they didn't communicate the backup server failure, which left the DBAs wondering why backups had failed on 25 servers. Eventually, we discovered the source of the failures. From a SQL Server DBA perspective, a backup failure, particularly on an OLTP production server, is critical. It is difficult to fulfil Service Level Agreements (SLAs) if files aren't backed up.
Incident management systems come in different flavours, depending on the size of the environment and the required response times. A properly constructed incident management system would have allowed us to remediate some of the issues: we could have redirected backups to other backup servers over the weekend, before the business week commenced. Incident management systems can also become political hotbeds, as individuals become concerned about what gets reported to management. My perspective is to maintain Production system uptime. In other words, when we're confronted with a problem, get the right experts together, fix the problem, then do some root cause analysis. Spoken like an engineer. But management tend to be sensitive: millions of dollars are spent, and an expectation (not always accurate) develops that all problems will disappear. In an environment with 600 applications, some custom built and others purchased from third parties, maintaining uptime is a big challenge. Consider software bugs, user mistakes, power outages, malicious attacks and performance degradation. Focus is required to deal with problems as they arise.
In a large IT environment, adherence to database standards can slip. For example, if the standard is to keep data and log files on separate drives, it's easy for a DBA to restore a database and forget to separate the files. It's a typical example of how, over time, compliance with system standards can slip. One method I use to keep these slippages to a minimum is to run daily reports. The daily reports are part of a set of DIY PowerShell DBA scripts. One report produces a list of every database in the system, focusing on configurations such as AUTOSHRINK and the location of data and log files. I create rules in the script; if a rule is not met, the discrepancy is highlighted, investigated and fixed. I'll write a post this week with the report, using PowerShell.
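The PowerShell report itself isn't shown here. As a rough illustration of the "rules" idea only, here is a minimal Python sketch that applies the two checks mentioned above: AUTOSHRINK switched on, and data and log files sharing a drive. It assumes the per-database settings have already been collected, for example from the is_auto_shrink_on column of sys.databases and the physical_name and type_desc columns of sys.master_files. The DatabaseConfig class, the function names, and the sample servers and paths are all hypothetical.

```python
from dataclasses import dataclass, field
from pathlib import PureWindowsPath


@dataclass
class DatabaseConfig:
    """Hypothetical row of the daily report: one database and its settings."""
    server: str
    name: str
    auto_shrink: bool
    data_files: list[str] = field(default_factory=list)
    log_files: list[str] = field(default_factory=list)


def drive_of(path: str) -> str:
    """Return the drive portion of a Windows path, upper-cased (e.g. 'D:')."""
    return PureWindowsPath(path).drive.upper()


def check_database(db: DatabaseConfig) -> list[str]:
    """Apply each standards rule and describe every rule that is not met."""
    issues = []
    # Rule 1: AUTOSHRINK should be off.
    if db.auto_shrink:
        issues.append("AUTOSHRINK is ON")
    # Rule 2: data and log files should live on separate drives.
    data_drives = {drive_of(p) for p in db.data_files}
    log_drives = {drive_of(p) for p in db.log_files}
    shared = sorted(data_drives & log_drives)
    if shared:
        issues.append("data and log files share drive(s): " + ", ".join(shared))
    return issues


def daily_report(databases: list[DatabaseConfig]) -> list[str]:
    """Build one report line per discrepancy, prefixed with server\\database."""
    lines = []
    for db in databases:
        for issue in check_database(db):
            lines.append(f"{db.server}\\{db.name}: {issue}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample: a compliant database and a carelessly restored one.
    sample = [
        DatabaseConfig("SQL01", "Sales", auto_shrink=False,
                       data_files=[r"D:\SQLData\Sales.mdf"],
                       log_files=[r"L:\SQLLogs\Sales_log.ldf"]),
        DatabaseConfig("SQL01", "Restored_HR", auto_shrink=True,
                       data_files=[r"D:\SQLData\HR.mdf"],
                       log_files=[r"D:\SQLData\HR_log.ldf"]),
    ]
    for line in daily_report(sample):
        print(line)
```

Running it flags only Restored_HR, once for AUTOSHRINK and once for the shared D: drive. Comparing drive letters is a simplification: a server that uses mount points would need a check on the full volume path instead.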