Let’s talk about priorities. When I take on the role of the database administrator, I want to make sure both the company’s resources and my time are used efficiently. The three word title to this post addresses those priorities.
The first thing any new database administrator needs to address is to protect the data environment. There have been countless times when I’ve talked with a client, who insists that they’re taking backups regularly (though many times they’re not). I ask them when they’ve last tested those backups, and I get either the blank stare (in person), or deathly silence (on the phone.)
The other problem that I’ve seen is when the backups are done, but are handled by a third-party solution that is primarily focused on minimizing the time and space it takes to make those backups. (The solution does the data compression, so it’s “more efficient”.) So, when something goes wrong, and a database needs to be restored, it can take six to eight hours to get the database restored. What’s worse is when that solution doesn’t properly handle transaction log backups, so when the database is restored, it ends up causing many hours of data loss. That data is critical to the operation of the business, but that “more efficient” solution had a higher priority.
Regardless of the solution (and my preference for SQL Server’s native tools are well known), it’s important that the backup solution deployed be regularly tested to ensure both full recovery up to the point when the systems went down (RPO – Recovery Point Objective), and that that solution be completed within a reasonable time (RTO – Recovery Time Objective). The RPO and RTO parameters need to be discussed up front with the company executives, and reasonable objectives need to be set. (You’re not going to be able to restore a 4TB database in 10 minutes, ever.) What’s more, these processes need to be tested regularly, and adjusted as the database sizes grow, because they will.
It was about 3 years ago when I first had a customer get hit with ransomware, and the customer lost more than a dozen servers. Sadly, the hosting company that managed the servers for this customer didn’t have accurate records of the server configurations for the customer, so it was a few weeks before everything was fully restored and available. (Note that the critical servers were brought back online within a few days, but even that had significant impact on the business.)
Since then I’ve heard of dozens of companies who’ve been hit with similar attacks, and sometimes the data is recoverable, and sometimes it’s not.
My first responsibility is to protect the data. That includes not only the database backups, but also having a detailed inventory of the servers and their configuration, so that my customer’s environment can be rebuilt and restored in the minimum amount of time possible.
This should be self-explanitory, but too many times I’ve walked into an IT shop and it seems like Rube Goldberg set up their systems. Now, I know that it didn’t start out that way, it was usually the result of a business need that required something be set up quickly, and so shortcuts are made. I get it.
The important thing is, the systems that perform regular maintenance against a company’s systems need to be set up in a uniform manner, so it’s easy for someone new to come in and understand what needs to be done, and the work done against one server is largely identical to the work needed against all servers.
Simplification can come in many flavors, from a consistent naming convention, to using the same set of scripts against all the servers, to naming drive letters consistently. It can come in the documentation of the server configuration, so it’s easy to understand what needs to be done if problems occur. It can come in the form of consistent IP addresses for which type of servers get which set of addresses.
Once the systems are protected, and you’ve simplified the environment as much as possible, you can focus on automation.
The general guideline is that if you have to do something more than once, script it. My guideline is that if I have to do it once I’ll script it, because I’ll probably have to do it again.
That means scripting the maintenance tasks necessary to meet the RTO and RPO mentioned earlier. It means scripting the configuration of the servers so they’re set up uniformly, as mentioned earlier. It pretty much means scripting everything.
Back when SQL Server 2005 was in beta release, I started digging through the new Server Management Objects API for SQL Server. I built my automation using VB.NET at the time, until PowerShell was introduced in 2006. I still script most everything in PowerShell, but Transact-SQL still does much that isn’t exposed in the SMO library.
I’ve been very impressed with the work that the team has built on the dbatools project, led by Chrissy LeMaire (b/t) and Rob Sewell (b/t). They and the team that has swelled up around them have put together cmdlets and modules that make it easy to automate most everything you need to do in a SQL Server environment.
What’s important is that it’s easier than ever before to automate your processes so that you don’t have to think about the next step, just execute the script that does it.
So, to sum things up, three words will help you be successful as a database administer. Protect, Simplify, and Automate.