Disaster Recovery Metrics: What They Are and How to Use Them

OCTOBER 14TH, 2013

Metrics are a big deal in the marketing arena. Used well, they help you measure the performance of things like clicks, unique visits, and engagement. Similar measurements apply in many other areas, including disaster recovery, where they probably aren't receiving the attention they deserve. Whereas marketing metrics usually measure performance after the fact, disaster recovery metrics measure your preparedness and set expectations for the recovery itself. When disaster strikes and you put your plan into effect, these measurements act as the blueprint for IT and management to follow. In essence, they tell you how much you can afford to lose, how long you can afford to be down, and how fast you need to move to get back up and running.

Common DR Metrics

Disaster recovery metrics range from simple and self-explanatory to complex and multidimensional, and their exact definitions can vary depending on how they are applied. However, two standard metrics can benefit any business continuity strategy.

1. Recovery Time Objective (RTO): Typically measured in hours, RTO is the maximum amount of time a given system is allowed to be down after a disruption. So if you assign a 24-hour RTO to your content management system (CMS) and it crashes at noon on Wednesday, your IT team needs to have it back online by noon on Thursday. This metric is quite flexible, as it can apply to storage operations, operating systems, and individual applications.

2. Recovery Point Objective (RPO): Also typically measured in hours, RPO defines the maximum amount of data a given system can afford to lose, expressed as a window of time before the failure. Suppose you assign a one-hour RPO to your customer relationship management (CRM) system. That means you need to take backups at least every hour, and that you're willing to lose only the data created between the last backup and the failure event, which would be at most one hour's worth. This metric can vary greatly from application to application and may play a big role in determining the order in which you restore your systems.
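The arithmetic behind both examples above is simple enough to sketch in a few lines of Python. This is a minimal illustration, not a standard tool: the function names are hypothetical, the Wednesday date (16 October 2013) is arbitrary, and it assumes a single fixed backup schedule with no replication, so the worst case is losing one full backup interval.

```python
from datetime import datetime, timedelta


def recovery_deadline(failure_time: datetime, rto: timedelta) -> datetime:
    """Latest moment the system must be back online to meet its RTO."""
    return failure_time + rto


def data_lost(last_backup: datetime, failure_time: datetime) -> timedelta:
    """Window of data lost: everything written after the last good backup."""
    return failure_time - last_backup


def schedule_meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure hits just before the next backup runs, losing
    a full interval of data, so the interval must not exceed the RPO."""
    return backup_interval <= rpo


# RTO: a 24-hour RTO on the CMS, which crashes at noon on a Wednesday.
crash = datetime(2013, 10, 16, 12, 0)  # arbitrary Wednesday, for illustration
deadline = recovery_deadline(crash, timedelta(hours=24))
print(deadline.strftime("%A %H:%M"))  # Thursday 12:00

# RPO: a one-hour RPO on the CRM system.
rpo = timedelta(hours=1)
print(schedule_meets_rpo(timedelta(hours=1), rpo))  # True: hourly backups suffice
print(schedule_meets_rpo(timedelta(hours=4), rpo))  # False: up to 4 hours could be lost
print(data_lost(datetime(2013, 10, 16, 11, 0), datetime(2013, 10, 16, 11, 45)))  # 0:45:00
```

Note that the RPO check is about the backup schedule, not any single incident: a failure 45 minutes after the last backup loses 45 minutes of data, but the schedule has to cover the worst case.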

Implementing Metrics in Your Strategy

Making the most of disaster recovery metrics starts with giving them values, which requires you to first define acceptable service levels across your infrastructure. In the case of RTO, 24 hours may be ample time for one company to restore its email system after a disruption. For an organization that depends on the metrics and other data stored in its email system, however, losing up to 24 hours of that data (a 24-hour RPO) could be unacceptable. Whether it's RTO or RPO, the values you assign must be defined case by case: by organization and by application. You can set these targets as low as you like, but you had better be realistic in your expectations and diligent in your business continuity efforts. If you assign your OS or VM an RTO of 0 to 4 hours, you're effectively committing to having the IT expertise, recovery tools, and infrastructure needed to get back up and running that quickly, even if that infrastructure is a cloud you're tapping into from a remote location.
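One way to keep targets realistic is to record them per application and compare them against what recovery tests actually achieve. The sketch below is a hypothetical illustration: the system names, target values, and drill measurements are made up, and it simply flags any system whose tested recovery time exceeds its RTO, whose backup interval exceeds its RPO, or that has never been tested at all.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Objective:
    system: str       # hypothetical system name
    rto: timedelta    # maximum acceptable downtime
    rpo: timedelta    # maximum acceptable window of lost data


@dataclass
class DrillResult:
    system: str
    recovery_time: timedelta    # measured time to restore during a DR test
    backup_interval: timedelta  # how often backups actually run


def unmet_objectives(objectives: list[Objective],
                     results: list[DrillResult]) -> list[tuple[str, str]]:
    """Return (system, reason) pairs where test evidence doesn't support the target."""
    by_system = {r.system: r for r in results}
    problems = []
    for obj in objectives:
        result = by_system.get(obj.system)
        if result is None:
            problems.append((obj.system, "never tested"))
            continue
        if result.recovery_time > obj.rto:
            problems.append((obj.system, "RTO missed"))
        if result.backup_interval > obj.rpo:
            problems.append((obj.system, "RPO missed"))
    return problems


# Hypothetical targets and drill results, for illustration only.
objectives = [
    Objective("email", rto=timedelta(hours=24), rpo=timedelta(hours=24)),
    Objective("crm", rto=timedelta(hours=8), rpo=timedelta(hours=1)),
    Objective("vm-cluster", rto=timedelta(hours=4), rpo=timedelta(hours=4)),
]
results = [
    DrillResult("email", recovery_time=timedelta(hours=6), backup_interval=timedelta(hours=24)),
    DrillResult("crm", recovery_time=timedelta(hours=5), backup_interval=timedelta(hours=4)),
]
print(unmet_objectives(objectives, results))
# [('crm', 'RPO missed'), ('vm-cluster', 'never tested')]
```

Flagging untested systems is a deliberate choice here: a low RTO on paper means little until a drill shows the team can actually meet it.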

DR Discipline 

This article detailing 10 of the most catastrophic IT disasters is full of examples of how unpredictable things can happen at any time. It's a reminder of how lost you would be without your own set of metrics, and of why it's a good idea to recalibrate them once or twice a year. Your requirements may change as your infrastructure evolves, so by keeping your metrics fine-tuned, you can make sure your disaster recovery efforts continue to meet the objectives of your business continuity strategy.
