Top Five Disaster Recovery Testing Issues

By Kirk Drake

If you are responsible for performing Disaster Recovery Testing for your organization or Credit Union, this post is for you. Disaster Recovery Testing is the process of simulating a recovery of your critical systems, business processes and data to validate that you can meet your Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO). In more basic terms, it is the process of making sure that if something bad happened to your company, you could stay operational. It is important to note that there is a big difference between conducting Disaster Recovery Testing and conducting Disaster Recovery Exercises. With that said, there are five main issues that we regularly see when assisting our clients with Disaster Recovery Tests.

Local Network Issues

This is a big issue for almost all of our clients. Everyone wants to make sure that all of their locations could be functional in an actual event. To accomplish this, you must create a separate network that isolates the test so you can validate it securely. The process of doing this is generally overlooked and is much more challenging than you would expect. Plus, all of this extra work means the test no longer reflects how a recovery would work in a real disaster. Even so, if you have the right network skills on staff and the right switches and routers to separate the traffic, this is a great way to involve staff at other locations in the testing process while making sure you don’t accidentally use the test systems for real work and vice versa. One example here is when you need to change the IP address of a server in your Test Environment. Often this triggers the need to change the IP addresses of countless other servers, routes, and configurations. This can take precious time and troubleshooting, and it seems to always lead to a duplicate IP address problem!
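To make the duplicate IP problem concrete, here is a minimal Python sketch of the kind of check you could run against your planned test addressing before the test starts. It assumes you keep a simple mapping of server names to IP addresses and that your test subnet is the same size as your production subnet. The function names, server names, and subnets are all hypothetical, made up for illustration.

```python
import ipaddress
from collections import defaultdict


def remap_to_test_subnet(prod_ip: str, prod_net: str, test_net: str) -> str:
    """Move an address into the test subnet, keeping its host offset.

    Assumes both subnets have the same prefix length (hypothetical setup).
    """
    prod = ipaddress.ip_network(prod_net)
    test = ipaddress.ip_network(test_net)
    addr = ipaddress.ip_address(prod_ip)
    if addr not in prod:
        raise ValueError(f"{prod_ip} is not inside {prod_net}")
    if prod.prefixlen != test.prefixlen:
        raise ValueError("production and test subnets must be the same size")
    offset = int(addr) - int(prod.network_address)
    return str(test.network_address + offset)


def find_duplicate_ips(assignments: dict[str, str]) -> dict[str, list[str]]:
    """Return every IP that is assigned to more than one server."""
    servers_by_ip = defaultdict(list)
    for server, ip in assignments.items():
        # Normalize the address so formatting differences do not hide a clash.
        servers_by_ip[str(ipaddress.ip_address(ip))].append(server)
    return {
        ip: sorted(servers)
        for ip, servers in servers_by_ip.items()
        if len(servers) > 1
    }


if __name__ == "__main__":
    # Hypothetical plan: "file" was hand-edited and now collides with "core".
    plan = {
        "core": remap_to_test_subnet("10.1.0.10", "10.1.0.0/24", "192.168.50.0/24"),
        "dns": remap_to_test_subnet("10.1.0.11", "10.1.0.0/24", "192.168.50.0/24"),
        "file": "192.168.50.10",
    }
    for ip, servers in find_duplicate_ips(plan).items():
        print(f"Duplicate IP {ip}: {', '.join(servers)}")
```

A check like this only catches clashes in your plan. It does not replace verifying the routes and configurations that depend on those addresses.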

Server Recoveries

If you are still using an older data vaulting product or tapes to do your recoveries, this can be really painful. You want to avoid products that require hardware matching your production hardware, and instead use software that incorporates Physical to Virtual (P2V) conversion. Most modern evault products do this. Ideally you want something that can do a Bare Metal Restore so you can simply point and click the recovery. Even so, we find there is always a server or two that needs extra care to recover, or that folks don’t understand that most recoveries and tests are linear. You can’t just fire off 57 server recoveries at the same time, as modern Disk I/O configurations just can’t move or recover that much data at once. Instead, you want to be deliberate and plan out the recovery order based on your Business Impact Analysis. We find we can usually recover most servers in a few hours if it is planned correctly. Don’t forget to also plan for network changes such as IP changes or routing configurations for your test.
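As a rough illustration of planning the recovery order, the Python sketch below sorts servers by a Business Impact Analysis priority and splits them into small waves instead of starting everything at once. The priority numbers, server names, data sizes, and the throughput figure are all hypothetical, and the time estimate assumes every recovery in a wave shares one fixed pool of disk throughput, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    bia_priority: int  # 1 = most critical, per your Business Impact Analysis
    data_gb: float     # amount of data to restore


def plan_recovery_waves(servers: list[Server], max_concurrent: int) -> list[list[Server]]:
    """Order servers by BIA priority and group them into limited-size waves."""
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    ordered = sorted(servers, key=lambda s: (s.bia_priority, s.name))
    return [
        ordered[i:i + max_concurrent]
        for i in range(0, len(ordered), max_concurrent)
    ]


def estimate_hours(wave: list[Server], throughput_gb_per_hour: float) -> float:
    """Rough wave duration, assuming all restores share the same disk throughput."""
    return sum(s.data_gb for s in wave) / throughput_gb_per_hour


if __name__ == "__main__":
    # Hypothetical inventory for illustration only.
    inventory = [
        Server("web", 2, 100),
        Server("core", 1, 400),
        Server("dns", 1, 10),
        Server("file", 3, 800),
    ]
    for number, wave in enumerate(plan_recovery_waves(inventory, 2), start=1):
        names = ", ".join(s.name for s in wave)
        print(f"Wave {number}: {names} (~{estimate_hours(wave, 205):.1f} h)")
```

Under this shared-throughput assumption, starting more recoveries at once does not shorten the total time. It only delays when your most critical servers finish, which is why the order matters.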
