Your Business Has a Disaster Recovery Plan—But Have You Actually Tested It?

If the qualifications for playing in the Big Game were based entirely on regular season records, then the championship should’ve been between Tennessee and Green Bay. And yet, neither team made it to the game. In fact, both of the league’s best teams were gone by the end of the second round.

The lesson is clear: Things that look good on paper don’t always play out well in practice. 

This lesson extends far beyond sports. In fact, it’s a crucial one for businesses that think they’re prepared for disaster but then, when problems arise, discover to their horror that their carefully drawn-up recovery plans are useless.

I’ve seen disaster recovery scenarios fail miserably because someone overlooked something important. It’s often something incredibly simple that would’ve been obvious if the company had done some testing instead of placing all their hopes on a disaster recovery plan that took a lot of time and money to produce, but then got shoved in a drawer and ignored until it was needed.

The classic example is the homeowner who anticipates a possible power outage by purchasing a generator and storing it in his garage, where it will be available when needed. Confident that he has devised a plan for keeping his refrigerator running and his food fresh even if he has no electricity for several days, he shifts his focus back to more important matters.

That is, until the power actually goes out and he quickly discovers that his electrically powered garage door opener doesn’t work, making it impossible for him to get to his generator.

A bit contrived? Maybe. But had the homeowner actually shut off his own power in order to test out his plan, he would’ve spotted the weakness immediately.

The best way to see if your disaster recovery plan works is to test it

By far the most effective way to determine whether your company’s disaster recovery plan is actually worth the paper it’s printed on is to test it in a production-level environment. One way to do that is a Game Day: an exercise in which you deliberately inject a specific failure mode into your system and watch how your operators and engineers respond, including how they carry out any disaster recovery plans.
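To make the idea concrete, here is a minimal sketch of a tabletop version of that exercise, using the generator story as the system under test. Everything in it is a hypothetical illustration rather than a real tool: the `RecoveryStep` class, the `run_game_day` function, and the resource names are all made up for this example. It assumes each recovery step lists the resources it needs, and that you know which resources depend on which others.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryStep:
    # Hypothetical model: one step of a recovery plan and what it relies on.
    name: str
    requires: set = field(default_factory=set)


def run_game_day(steps, depends_on, failed):
    """Return the names of recovery steps blocked when `failed` resources go down."""
    # Propagate the injected failure to everything that depends on it.
    down = set(failed)
    changed = True
    while changed:
        changed = False
        for resource, deps in depends_on.items():
            if resource not in down and deps & down:
                down.add(resource)
                changed = True
    # A step is blocked if any resource it needs is down.
    return [step.name for step in steps if step.requires & down]


# Assumed dependencies for the homeowner's setup.
depends_on = {
    "garage_door_opener": {"grid_power"},
    "refrigerator": {"grid_power"},
}

# The plan as written on paper.
plan = [
    RecoveryStep("open garage door", {"garage_door_opener"}),
    RecoveryStep("start generator", {"fuel"}),
    RecoveryStep("plug refrigerator into generator"),
]

# Inject the failure the plan is supposed to handle.
blocked = run_game_day(plan, depends_on, failed={"grid_power"})
print(blocked)  # ['open garage door']
```

A model like this only catches the dependencies you already thought to write down, which is exactly why it is no substitute for injecting the failure into the real system and watching what people actually do.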

If that sounds like a lot of unnecessary chaos and stress, and maybe even a little sadistic, consider the alternative. Frankly, I would much rather see what happens when I take down a data center on a day when I’m fully staffed and traffic is relatively light than have it go down unexpectedly in the middle of the night, when my on-call engineer is just waking up and my customers are in an uproar over a sudden loss of service.

Make no mistake: statistics do matter, and having a solid disaster recovery plan is vital. But there will always be a potential gap of chaos and uncertainty between theory and practice. Carefully drawn-up plans can still fail, and the best teams don’t always win.

In other words, in business as in sports, the only way to truly know the outcome is to play the game.

More articles from Lee Atchison:


Photo by Tim Gouw on Unsplash.