Disaster-Resource.com

Taking the Pulse: A Business Continuity Program "Health Check"

By Howard Mannella
Senior Manager - BCP, Eagle Rock Alliance

Introduction

Disaster Recovery and Business Continuity Planning ("DR/BCP") are very much on the minds of business executives in our post-9/11 world. Board members, executives and regulators are asking themselves, "Could our company survive a disaster?"

Disaster events can, and recently have, come in many different sizes, levels of complexity and degrees of drama, and all of them are potentially devastating. In this age of "dirty" bombs and bio-terror, it is also important to maintain focus on the less glamorous but more probable threats to the viability of the enterprise. Examples include power outages (like the Blackout of 2003), fire, water and weather-related incidents. Research from Contingency Planning Research, a division of Eagle Rock Alliance, reveals some startling facts:

  • The average length of a business interruption is no more than several days, but the cost of downtime for over two-thirds of American businesses can be up to $250,000 per hour
  • Over half of businesses perceive their very survival to be at risk upon suffering an interruption of up to 48 hours
  • Fewer than 6% of businesses that suffer a total business interruption were found to be in business five years later

Many organizations have a DR/BCP Program in place. Since the events of September 11, 2001, many more are rushing to establish one. Having a program in place, however, may not be enough to ensure your organization's ability to withstand a serious business interruption. How can you tell whether every component of your program is robust? The questions below are designed to help you assess your organization's program and perform a quick "health check". Is your program healthy, able to thrive and serve the enterprise over time, or has it atrophied from a once-glorious start into a program in name only?

Methodology

Sound business practice suggests that organizations build a Disaster Recovery or Business Continuity Program by adapting the methodology depicted in Figure 1. Most programs begin with a Risk Analysis, a study of the threats, vulnerabilities and points of failure specific to the organization based on its line of business, its location and other factors. This is followed (or accompanied) by a Business Impact Analysis, a breakdown of the tangible and intangible losses that a disaster event would cause over time, together with the maximum acceptable recovery objectives for each business area. The next step is a Recovery Requirements Analysis, an examination of the technology and other assets used by the organization's business areas and business processes that identifies the critical resources needed for recovery. The last step in the initial analysis is a Technology Assessment, a review of the IT environment and infrastructure to understand data synchronization and application dependency issues and to identify current recovery capabilities.

Once management understands this information, it can develop a DR Strategy, choosing from a short list of possible approaches the best way to ensure the recovery of critical business functions with minimal impact. Once a Decision has been reached, the organization is ready to Implement the strategy and document the Plan(s) for responding to an unplanned business interruption. A properly choreographed and documented set of Tests, Exercises and Drills demonstrates that the implementation was successful and that the plans are executable. The final component of a DR/BCP Program is Continuous Improvement, which ensures that the program keeps up with changes in the organization, the business situation and the state of technology.

The questions in the "health check" are organized by component in the methodology. By answering each question with a "Yes", "Possibly" or "No", you will be able to assess which area of your DR/BCP program requires the most immediate attention.

This is an outline of a complete methodology; each component can be as broad and detailed as an organization requires. Business executives should tailor the approach to their circumstances, the state of their program and their tolerance for risk.
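The article leaves the "Yes / Possibly / No" answers unscored. As a minimal sketch, assuming a hypothetical point scale (Yes = 2, Possibly = 1, No = 0) that the article does not prescribe, the answers can be rolled up into a coverage fraction per component, using the question numbering of the checklist below. Sorting components by coverage gives one possible reading of "which area requires the most immediate attention", and the per-component fractions are the kind of values that could be plotted as a program "footprint".

```python
from typing import Dict, List

# Question numbers belonging to each component, following this checklist.
COMPONENTS: Dict[str, range] = {
    "Risk Analysis": range(1, 7),
    "Business Impact Analysis": range(7, 12),
    "Recovery Requirements Analysis": range(12, 16),
    "Technology Assessment": range(16, 21),
    "DR Strategy": range(21, 24),
    "Implementation": range(24, 29),
    "Plan Development": range(29, 33),
    "Tests, Exercises, Drills": range(33, 39),
    "Continuous Improvement": range(39, 43),
}

# Hypothetical point values; the article does not define a numeric scale.
POINTS = {"Yes": 2, "Possibly": 1, "No": 0}


def component_coverage(answers: Dict[int, str]) -> Dict[str, float]:
    """Fraction of the maximum score (0.0 to 1.0) earned by each component.

    Unanswered questions are treated as "No".
    """
    coverage = {}
    for name, questions in COMPONENTS.items():
        earned = sum(POINTS[answers.get(q, "No")] for q in questions)
        coverage[name] = earned / (2 * len(questions))
    return coverage


def needs_attention(answers: Dict[int, str]) -> List[str]:
    """Components ordered from lowest to highest coverage."""
    coverage = component_coverage(answers)
    return sorted(coverage, key=coverage.get)
```

For example, answering "Yes" everywhere except "No" on question 7 and "Possibly" on question 8 gives the Business Impact Analysis a coverage of 0.7 and places it first in the `needs_attention` list.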

Risk Analysis

1. Have you conducted a walkthrough of your facilities within the last twelve months to assess risk factors?

2. Have you documented the risks and grouped them by risk type (human vs. natural vs. technical)?

3. Have you assessed the probability of each risk? Their relative impacts?

4. Do you have documented mitigations for each risk, where applicable?

5. Have you considered external risks (e.g., proximity to hazardous or high-profile neighbors)?

6. Have you identified Single Points of Failure ("SPOFs") that could compromise your infrastructure?
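Questions 2 and 3 ask for risks grouped by type and rated for probability and impact. A common convention, not one the article mandates, is to rate both on a small ordinal scale and rank risks by their product. The sketch below assumes hypothetical 1-to-5 scales; the sample risks and ratings are purely illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Risk:
    name: str
    category: str      # "human", "natural" or "technical"
    probability: int   # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int        # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Probability x impact, the usual risk-matrix rating.
        return self.probability * self.impact


def rank_risks(risks: List[Risk]) -> List[Risk]:
    """Risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def by_category(risks: List[Risk]) -> Dict[str, List[str]]:
    """Risk names grouped by category (question 2)."""
    groups: Dict[str, List[str]] = {}
    for r in risks:
        groups.setdefault(r.category, []).append(r.name)
    return groups
```

With illustrative ratings such as a regional power outage at (3, 4), a severe storm at (3, 3) and a dirty bomb at (1, 5), the ranking puts the power outage first and the dirty bomb last, which echoes the introduction's point that less glamorous threats can matter more.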

Business Impact Analysis

7. Have you identified every functional unit in your organization? Have you documented the business processes that each performs?

8. Have you documented the tangible losses that a disaster event would cause? Have they been quantified over time?

9. Have you documented the intangible losses? Their impact over time?

10. Have you established and documented, for each business area, Recovery Time Objectives ("RTOs")? Recovery Point Objectives ("RPOs")?

11. Have the business areas reviewed and accepted these findings?
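Questions 8 to 10 connect losses over time to recovery objectives. One way to make that link concrete, sketched here under stated assumptions: take the cumulative tangible loss at several points after onset (hypothetical BIA output), pick the latest point still within a loss tolerance set by management (also an assumption), and treat it as a candidate RTO. A second check compares an existing capability against the RTO and RPO, assuming that worst-case data loss is one full backup interval.

```python
from typing import List, Tuple


def rto_from_loss_curve(curve: List[Tuple[float, float]], tolerance: float) -> float:
    """Latest time (hours after onset) at which cumulative loss is within tolerance.

    `curve` holds (hours_since_onset, cumulative_loss) pairs sorted by time.
    Returns 0.0 if even the first point exceeds the tolerance.
    """
    rto = 0.0
    for hours, loss in curve:
        if loss > tolerance:
            break
        rto = hours
    return rto


def meets_objectives(restore_hours: float, backup_interval_hours: float,
                     rto_hours: float, rpo_hours: float) -> bool:
    # Assumption: the most recent backup is available at the recovery site,
    # so worst-case data loss is roughly one backup interval.
    return restore_hours <= rto_hours and backup_interval_hours <= rpo_hours
```

At the introduction's upper figure of $250,000 per hour, a curve of (4 h, $1M), (8 h, $2M), (24 h, $6M), (48 h, $12M) and a hypothetical $5M tolerance yield a candidate RTO of 8 hours. A nightly backup (24-hour interval) would then fail a 4-hour RPO even if systems come back within 6 hours.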

Recovery Requirements Analysis

12. Have you identified the computer systems and applications that each area uses?

13. Have you documented any specialized equipment that each area uses (e.g., scanners, PDAs, time clocks, recording devices, mail-handling equipment)? Special forms or other non-computer assets?

14. Have these assets been mapped to business processes and expected recovery times and points?

15. Have these areas identified their staffing requirements for initial recovery? Ongoing 'disaster-mode' business continuation? Have they identified, by name, their critical staff? Do these people understand their designation and role?
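Question 14 asks for assets to be mapped to business processes and their expected recovery times. When several processes share an asset, the asset has to be recovered in time for the most demanding of them. A minimal sketch of that rule, with hypothetical process names, assets and RTOs in hours:

```python
from typing import Dict, List


def asset_requirements(process_rto: Dict[str, float],
                       uses: Dict[str, List[str]]) -> Dict[str, float]:
    """For each asset, the tightest (smallest) RTO among processes that use it.

    process_rto: business process -> RTO in hours (from the BIA)
    uses: business process -> assets it depends on (from question 14's mapping)
    """
    required: Dict[str, float] = {}
    for process, assets in uses.items():
        rto = process_rto[process]
        for asset in assets:
            # Keep the strictest recovery time seen so far for this asset.
            required[asset] = min(rto, required.get(asset, rto))
    return required
```

If order entry (4-hour RTO) uses an ERP system and a document scanner, and payroll (72-hour RTO) uses the same ERP system plus time clocks, the ERP system and scanner inherit the 4-hour requirement while the time clocks can wait 72 hours. The same approach applies to RPOs.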

Technology Assessment

16. Have you reviewed the technology environment and infrastructure within the last twelve months? Have you documented server configurations, telecommunications capacities and other infrastructural components?

17. Does each major application or system have both a technical and business owner identified?

18. Has your IT staff documented how the various databases in the environment must be synchronized when recovering from a disaster event in order to be of use to the business? When the synchronization points occur? What the points of consistency are?

19. Has your IT staff documented the application dependencies in the environment and the order in which applications must be restored in order to be usable?

20. Is the current capability for recovery clearly understood?
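Questions 18 and 19 describe two classic recovery-sequencing problems. The sketch below illustrates one possible treatment of each under stated assumptions. Restore order is a topological sort of the application dependency graph, here using Python's standard `graphlib` module (3.9+). The consistency point is taken as the latest synchronization point that every database has captured, assuming each database records such points. Application and database names would be supplied by the organization; none come from the article.

```python
from datetime import datetime
from graphlib import CycleError, TopologicalSorter  # graphlib: Python 3.9+
from typing import Dict, Iterable, List, Optional, Set


def restore_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Question 19: an order in which each application follows everything it
    depends on. Raises CycleError if dependencies are circular, which must be
    resolved before a recovery sequence can be defined."""
    return list(TopologicalSorter(depends_on).static_order())


def latest_common_consistency_point(
    checkpoints: Dict[str, Iterable[datetime]]
) -> Optional[datetime]:
    """Question 18: the latest synchronization point captured by every database.

    Assumes each database lists the points at which it was captured in a state
    consistent with the others; restoring all of them to one shared point keeps
    cross-database references intact. Returns None if no such point exists.
    """
    point_sets = [set(points) for points in checkpoints.values()]
    if not point_sets:
        return None
    common = set.intersection(*point_sets)
    return max(common) if common else None
```

For instance, `restore_order({"Web storefront": {"Order system"}, "Order system": {"Database", "Directory"}})` places the database and directory before the order system, and the order system before the storefront; the relative order of independent items may vary.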

DR Strategy

21. Do the decision-makers in your organization understand the factors and decision criteria for formulating a DR strategy, such as testability, complexity, risk, etc.?

22. Do they understand the tradeoffs, pros and cons of each type of DR strategy (internal solution vs. vendor, dedicated vs. shared solution, reciprocal agreement or hybrid)?

23. Have you developed a 'short list' of recovery options, with high-level estimates of the implementation, recurring, declaration and 'disaster-mode' usage costs of each option?

Implementation

24. Do you have assurance that you have obtained the best possible contract from your Disaster Recovery vendor? Do you know how to negotiate additional test time, early 'outs', and other terms favorable to your organization's needs?

25. Do you know how to calculate the coverage your organization needs, so that the implementation is adequate for business needs without being overbuilt or excessively costly?

26. Have all aspects of a recovery been accounted for? Computer equipment? Telecommunications? Voice/call center? Specialized equipment? Forms and other assets? Work area? Supplies? Conference facilities? Amenities such as catering, hotels, etc.?

27. Have you put the various vendors involved in the implementation in contact with each other, and have you laid out the 'rules of engagement' with them to avoid finger-pointing and 'passing the buck' on problems or issues?

28. Is the implementation of the DR/BCP solution based upon scenario planning for the types of interruptions your business can suffer?

Plan Development

29. Does the organization understand the lifecycle of a disaster event, from onset of the event through "return home", and the role that each part of a plan plays in managing the event?

30. Does the organization have a separate Emergency Response Plan? Technology Restoration Plan? Business Continuity Plan? Are the Business Continuity Plans customized for each business area?

31. Are the plans housed where the people who need them can get to them (either in physical form or online)?

32. Are the plans "ergonomic", that is, in usable form? Many DR/BCP plans are merely thick tomes of "binder-ware", with a wealth of information but very difficult to follow as actionable documents, especially in the chaos and confusion that accompanies a disaster event.

Tests, Exercises, Drills

33. Does your organization understand the difference between Tests, Drills and Exercises? Does your organization perform all three?

34. Are the business areas involved in the tests?

35. Do the tests incorporate all of the necessary ingredients and artifacts needed for success: Pre-test checklist? Directions to the alternate site? Test scripts?

36. Is the support you get from your Disaster Recovery vendor adequate?

37. Does the DR/BCP group publish an "after-action report" of test results?

38. Do the business areas sign off on and accept the test results? Is there a formal process for follow-up of issues and actions arising from the tests?

Continuous Improvement

39. Is there a process in place for periodic update of plan documentation to account for changes in the business? Changes in technology? Changes in staffing/organization?

40. Is there a process for plan documentation inventory control (serially numbered plans, produced in copy-resistant format, collected when new editions are published or employees leave the organization)?

41. Does the Audit group within the organization scrutinize the DR/BCP Program periodically?

42. Is there a process in place to ensure that the DR/BCP Program periodically passes scrutiny from the applicable regulatory agencies?
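Question 39 calls for periodic plan updates, and questions 1 and 16 use a twelve-month window. As a minimal sketch, assuming an annual review cycle and a simple record of when each plan was last reviewed (both hypothetical), overdue plans can be flagged automatically:

```python
from datetime import date, timedelta
from typing import Dict, List

# Assumption: plans are reviewed at least once every twelve months.
REVIEW_INTERVAL = timedelta(days=365)


def overdue_plans(last_reviewed: Dict[str, date], today: date) -> List[str]:
    """Names of plans whose last review is older than REVIEW_INTERVAL, sorted."""
    return sorted(
        name for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )
```

A date-based check like this covers only the "periodic" part of the question; changes in the business, technology or staffing would still need to trigger an update whenever they occur.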

Representative Assessments

The responses to these questions can provide an assessment of coverage for an organization's DR/BCP Program. Some areas will show a high degree of coverage, and some areas will demonstrate the need for additional attention. This coverage can be displayed graphically to depict the "footprint" of an organization's program. It is important to note that 'more' is not necessarily 'better'; effective DR/BCP planning means paying the proper amount of attention to each area as the specific business situation dictates. Two representative "footprints" are depicted below:

The first 'footprint' (Figure 3 - Typical early-stage business 'footprint') depicts a hypothetical example of a typical organization in the early stages of DR/BCP maturity. It does not have a fully developed Risk Analysis or Business Impact Analysis, although it does have some sense of its recovery requirements and some understanding of its technology environment. The low assessment for DR Strategy may indicate that DR/BCP responsibility has been delegated to the IT area with little business-area participation. The organization does have a program implemented and documented, although it may not be very robust or institutionalized throughout the business areas. Basic testing (Technology only?) and a shallow Continuous Improvement program round out this 'footprint'.

The second 'footprint' (Figure 4 - Mature E-business 'footprint') depicts a hypothetical example of an 'E-business' in a mature state of DR/BCP. This organization has paid more attention to Risk Analysis than the one in the first example. It does not show a deep Business Impact Analysis, but for an E-business this may not be necessary. If the business imperative is 'engines have to be up, all of the time', then an extensive BIA may not be warranted; the Recovery Requirements may reasonably be understood to be the technology infrastructure. As is appropriate for a "lights-out" enterprise where technology is the core of the business, the Technology Assessment, DR Strategy and Implementation are fairly robust. Plan Development appears light, but this may be appropriate if Technology Disaster Recovery Procedures alone meet most of the need for documentation. The light assessment for Testing may indicate testing of the technology only. This example shows a robust Continuous Improvement program in place.

Conclusion

This checklist should enable you to quickly take a "health check" of your organization's Disaster Recovery/Business Continuity Program. There is no "passing grade" for this test. Each "No" answer identifies a potential weakness, or opportunity to strengthen the enterprise DR/BCP program. The goal of those in charge of DR/BCP is to have a plan in place to address the weaknesses in the program, or to ensure that the weaknesses and risks are understood and acceptable to senior management. In any case, the results of the "health check" should be disclosed to senior management so that they can fully understand the degree of resiliency currently built into the enterprise and take action where appropriate.

As with a medical checkup, the time to take one is before problems set in. So perform this "health check" and "take the pulse" of your Disaster Recovery and Business Continuity program. Your business's viability may depend on it.

About the author

Mr. Mannella is a Senior Manager for Eagle Rock Alliance, the largest independent Disaster Recovery and Business Continuity management consulting firm in the United States. He can be contacted at info@eaglerockalliance.com or by calling 1-800-277-5511.