By Robin Remines
It’s been said that time is money, and that certainly holds true when you’re talking about down systems. Today’s consumers expect 24x7x365 access to their accounts while having little to no understanding of the complexity and costs involved in meeting that goal. But you’re committed to your members, so you’ve performed your Business Impact Analysis, developed your RTOs, and built an infrastructure that supports quick recovery of critical systems. So why are you still unable to achieve your uptime goals when systems or services are disrupted? To answer the question, we’ll have to look at the definition.
Recovery Time Objective (RTO) is defined as how much time can pass from when a disaster occurs at your organization until you need to be back up and running again. Sounds pretty straightforward, right? Not so fast: two very important time periods are often overlooked when calculating the RTO. Notice that the definition says “from when a disaster occurs”. Let’s use an example scenario to see how calculations can come up short.
Assumptions & Discovery
Assume an RTO of under 4 hours for critical systems (fairly normal for most organizations). Fire strikes your corporate headquarters at 10:00 PM, taking out all communications and systems. It is reported at 10:45 PM by a passing motorist. Your leadership team is notified and gathers at a nearby branch at midnight to decide next steps. A disaster is declared at 12:30 AM. Your DR provider is notified and begins recovery of critical systems; contractually, they have 4 hours from notification. So far your systems have been down 2.5 hours. Add to that the 4 hours your DR provider has and guess what? Minimum recovery time is creeping up to 6.5 hours, well over your desired 4-hour threshold! When calculating RTO, it is imperative to consider the time gaps associated with discovering, reporting and responding to an event.
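For readers who like to see the arithmetic spelled out, here is a minimal sketch of the scenario's timeline in Python. The calendar dates are arbitrary placeholders (the scenario only gives clock times), and the variable names are purely illustrative.

```python
from datetime import datetime, timedelta

# Timeline from the scenario; the dates are placeholders, only the clock times matter.
fire_starts = datetime(2024, 1, 1, 22, 0)        # 10:00 PM: systems go down
disaster_declared = datetime(2024, 1, 2, 0, 30)  # 12:30 AM: DR provider is notified
provider_window = timedelta(hours=4)             # contractual recovery time from notification
rto_target = timedelta(hours=4)                  # desired RTO for critical systems


def hours(delta: timedelta) -> float:
    """Convert a timedelta to fractional hours."""
    return delta.total_seconds() / 3600


# Downtime accrued before the provider's clock even starts.
downtime_before_notification = disaster_declared - fire_starts
# Earliest possible recovery, measured from when the disaster occurred.
minimum_recovery_time = downtime_before_notification + provider_window
overrun = minimum_recovery_time - rto_target

print(f"Down before provider is notified: {hours(downtime_before_notification):.1f} h")
print(f"Minimum recovery time: {hours(minimum_recovery_time):.1f} h")
print(f"Over the RTO target by: {hours(overrun):.1f} h")
```

The point of the sketch is simply that the provider's 4-hour clock starts at notification, while the RTO clock starts at the fire.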
Balancing and System Checks
Another area often overlooked when determining a true RTO (or at least one that the IT and business sides will agree on) is the time gap between when a system is recovered and when it actually becomes fully operational again. IT may be able to restore hardware, software and communications in a certain time, but after an unexpected disruption, systems must also be tested to ensure the integrity of the data. This process often includes manual updates and data entry to recover lost transactions. Depending on the length of the initial downtime, this effort could take several more hours. So it’s easy to see how RTOs are often miscalculated, misleading your organization into believing recoveries can occur in the expected timeframe.
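One way to make both overlooked gaps visible is to add them up explicitly. The sketch below is illustrative only: the function names are hypothetical, and the 2-hour figure for balancing and system checks is an assumed placeholder, since the actual effort depends on how long systems were down.

```python
def effective_recovery_hours(discovery_and_declaration: float,
                             technical_restore: float,
                             validation_and_reentry: float) -> float:
    """Total downtime from the moment of the disaster until systems are fully operational."""
    return discovery_and_declaration + technical_restore + validation_and_reentry


def restore_budget_hours(rto_target: float,
                         discovery_and_declaration: float,
                         validation_and_reentry: float) -> float:
    """Hours left for the technical restore if the RTO target is to be met.

    A negative result means the target cannot be met even with an instant restore.
    """
    return rto_target - discovery_and_declaration - validation_and_reentry


# 2.5 h of discovery and declaration and a 4 h provider window come from the
# example scenario; the 2 h validation figure is an assumption for illustration.
total = effective_recovery_hours(2.5, 4.0, 2.0)
budget = restore_budget_hours(4.0, 2.5, 2.0)

print(f"Effective recovery time: {total:.1f} h")
print(f"Time left for technical restore within a 4 h RTO: {budget:.1f} h")
```

Working backward from the target this way shows how little of a 4-hour RTO is actually available for the technical restore once discovery and validation are counted.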