Disaster-Resource.com

Operational Resiliency
The Foundation for Business Continuity and Emergency Management

By Bob Burns


Overview
Operational resiliency planning is the foundation for effective business, operational and emergency management programs required for industry, schools, and federal and local governments to ensure continuous operations regardless of the event.

Introduction
Operational resiliency planning is the mainstay of continuity and emergency management programs in private industry, school districts, and federal and local government. Success is unlikely without a continuity of operations plan and a methodology to respond to each incident. This has prompted many federal regulations, such as FEMA’s National Incident Management System (NIMS), FPC-65, HIPAA, the Gramm-Leach-Bliley Act, the Sarbanes-Oxley Act and others, to require that plans and procedures be put in place to ensure uninterrupted operations and to respond to any event regardless of size and complexity. These plans are referred to as business continuity plans (BCP) by the private sector and continuity of operations plans (COOP) by government agencies. Their content is very similar, and the two terms are used interchangeably in this document while defining the importance of operational resiliency within their context.

The criteria for operational resiliency provide the foundation for building and maintaining an organization’s business continuity program. Traditionally, business continuity planning covered the recovery of business functions and their supporting IT infrastructure. The demand for 24x7 high-availability services requires a change in the way we define business continuity. The new paradigm uses an operational resiliency model to define the business continuity framework supporting day-to-day operations. The traditional disaster recovery plans become a subset of the new business continuity process. This new process also leverages the internet by offering collaborative emergency management portals to share and communicate information.

Millions of dollars are spent each year adding technology to ensure that organizations can maintain continuous operations. The belief is that you can “fix it with technology”. At least that is what technology providers would like you to believe. While technology is important, and often required, there are many other key factors that are needed before organizations are safe from unanticipated operational disruptions.

Most operational disruptions can be prevented, and the rest mitigated to reduce their impact. The key is how an organization prepares itself and its partners to prevent unknown interruptions. The solution appears complex because too many try to solve the problem at the wrong end of the process. Much like Total Quality Management and the Six Sigma Quality Process, operational resiliency has to be designed into every critical step and threaded throughout the organization at each critical hand-off point. The final step is how that information is communicated and shared to mitigate the overall situation.

Management can play an important role at the outset of every new project by requiring that an operational resiliency strategy is defined and that funding is sufficient to ensure that the procedures, infrastructure, and planning are in place to meet the recovery time objectives for each critical process. Collaboration between organizations is also essential to define the interdependencies between people, processes, and technology. These relationships can be mapped by “following the data” through each process needed to deliver internal and external customer products and services.
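As a minimal sketch of what “following the data” might look like in practice, the interdependencies can be recorded as a simple map and walked from a customer-facing service back to everything it relies on. The process and resource names below are hypothetical and chosen only for illustration; a real map would come from the organization’s own process documentation.

```python
from collections import deque

# Hypothetical map: each process or resource -> the things it depends on for data.
DEPENDS_ON = {
    "customer_billing": ["order_entry", "billing_database"],
    "order_entry": ["web_portal", "crm_system"],
    "billing_database": ["storage_array"],
    "web_portal": ["network"],
    "crm_system": ["network", "storage_array"],
}


def follow_the_data(service, depends_on):
    """Return every upstream dependency of a service, nearest first (breadth-first)."""
    seen = []
    queue = deque(depends_on.get(service, []))
    while queue:
        item = queue.popleft()
        if item in seen:
            continue  # already recorded via another path
        seen.append(item)
        queue.extend(depends_on.get(item, []))
    return seen


upstream = follow_the_data("customer_billing", DEPENDS_ON)
print(upstream)
```

Every item in the resulting list is something whose interruption could affect the service, so each one is a candidate for its own recovery time objective and owner.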

Not all organizations have the resources or knowledge to assess risks, mitigate their impact, and develop business continuity plans for operational resiliency and disaster recovery. The scope of managing this initiative is substantial, especially as there are many interdependent processes that can affect the success of each organization.

The Operational Resiliency Process
The Operational Resiliency Process consists of six phases: Resiliency Management; Operational Resiliency Assessments; Resiliency Strategies; Business Continuity Planning; Validation; and Collaboration. Each phase is designed to leverage the knowledge realized from the previous phase to support the overall Operational Resiliency Process.

Resiliency Management – This is the most important aspect of the Operational Resiliency Process. It provides the glue which allows organizations to meet their operational resiliency objectives while maintaining continuous uninterrupted operations. If not managed, each organization will develop plans that do not support the universal needs of the organization. These plans are often in different formats, and do not interface with one another.

The success of operational resiliency starts with the Board of Directors, senior management, and public officials. Only they can ensure that the proper funding, direction, standards, and resources are available to implement a realistic continuity of operations program. Once the objectives are established, specialized expertise is needed to lay the framework for the entire company. This expertise is rare within most organizations, and often requires unbiased third-party expertise. Caution should be exercised when using third parties who are selling hardware or recovery services. Their recommendations will include their “must have” products or services and ultimately will cost more than if business continuity and disaster recovery were part of the overall operational resiliency plan supporting day-to-day operations.

The management of resiliency information is a formidable task even when all other aspects of the program are working effectively. It requires expert continuity management software to assist with mitigation and with planning an overall integrated solution that can be securely accessed by an unlimited number of users and is protected against unforeseen interruptions. The software should be the repository for, or have direct access to, all critical information needed for rapid response, notification and recovery of all key processes for each organization. The protection of the continuity management software and its data should be given the highest priority, and the software should be operated from an off-site facility with replication to guarantee 100% availability.

Operational Resiliency Assessment – One approach to understanding an organization’s operational resiliency is to perform a threats and vulnerabilities assessment followed by an operational impact analysis. The analysis should determine whether the business recovery time requirements for each critical application can be achieved by the technology infrastructure and operational procedures supporting each business function. The analysis should compare industry best practices from ITIL, COBIT, NSA, NIMS and others against each operational, business and IT function. The result should provide a gap analysis for critical risk areas including on- and off-site facilities, business processes, security, networking, communications, applications, data center operations, storage management, disaster preparedness, process and documentation management, and incident response. The analysis should provide strategic and tactical improvement recommendations for all risk areas that jeopardize operational resiliency.
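The recovery time comparison described above can be illustrated with a short sketch. It assumes each critical application has two figures on record: the recovery time the business requires and the recovery time the current infrastructure can deliver. The class name, field names and numbers are hypothetical and are not taken from any particular assessment tool.

```python
from dataclasses import dataclass


@dataclass
class CriticalApplication:
    name: str
    required_rto_hours: float    # recovery time the business requires
    achievable_rto_hours: float  # recovery time the infrastructure can deliver today


def rto_gaps(applications):
    """Return (name, shortfall_hours) for each application that misses its objective, worst first."""
    gaps = [
        (app.name, app.achievable_rto_hours - app.required_rto_hours)
        for app in applications
        if app.achievable_rto_hours > app.required_rto_hours
    ]
    return sorted(gaps, key=lambda gap: gap[1], reverse=True)


# Hypothetical figures for illustration only.
apps = [
    CriticalApplication("payroll", 24, 48),
    CriticalApplication("order_entry", 4, 2),
    CriticalApplication("email", 8, 12),
]
gaps = rto_gaps(apps)
for name, shortfall in gaps:
    print(f"{name}: misses its recovery time objective by {shortfall} hours")
```

Sorting by shortfall is only one possible ordering; an organization might instead rank the gaps by business impact before deciding where to invest.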

Resiliency Strategies – Each business function should have resiliency strategies that define each critical process, its criteria for operational continuity, and the dependent staff and IT resources needed to maintain their required continuity level. This also includes the resiliency preparedness of internal and external organizations which provide resources and data needed to deliver customer services.

Resiliency Strategies should typically define why, who, what, when, and where so previously identified risks and events can be mitigated. This translates to: why there is an interruption; who is responsible for responding; what they have to do; when they have to respond; and where they have to respond. This may appear overly simplified, and it is, but it is a great place to start because most organizations have not defined their critical processes from an operational resiliency perspective.
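One way to make the five questions concrete is to record them as a structured entry per critical process and check which answers are still missing. This is only a sketch; the field layout and the sample values are hypothetical and do not represent a prescribed format.

```python
from dataclasses import dataclass, fields


@dataclass
class ResiliencyStrategy:
    process: str
    why: str = ""    # the interruption being addressed
    who: str = ""    # who is responsible for responding
    what: str = ""   # what they have to do
    when: str = ""   # when they have to respond
    where: str = ""  # where they have to respond


def missing_answers(strategy):
    """List the fields of a strategy that are still blank."""
    return [f.name for f in fields(strategy) if not getattr(strategy, f.name).strip()]


# Hypothetical draft with two questions still unanswered.
draft = ResiliencyStrategy(
    process="order_entry",
    why="loss of the primary data center",
    who="IT operations on-call lead",
    what="fail over to the alternate site",
)
print(missing_answers(draft))
```

A check like this gives each business function a simple completeness test before its strategies feed into the Business Continuity Plan.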

Business Continuity Planning – The Operational Resiliency Assessment and the Resiliency Strategies provide the basis for the Business Continuity Plan. Each business organization should define its critical resources, including critical applications and their recovery time objectives, as well as the need for data protection supporting each critical process. These criteria establish the priorities for IT to ensure that the supporting infrastructures are designed and maintained to comply with each business requirement.

The Business Continuity Planning performed by IT, commonly referred to as Disaster Recovery Planning, is often performed separately and not integrated with business resiliency planning initiatives. The separation of the business and IT initiatives for business continuity and disaster recovery planning is frequently the source of unsuccessful recovery initiatives. The lack of joint planning and collaboration often results in separate agendas that don’t coincide until annual testing uncovers the need for change. Operational resiliency fixes are then glued in at the back end of the process with limited success.

An effective business or operational continuity plan includes all aspects of operational resiliency for each critical process under every operational condition, including catastrophic interruptions. The planning should include business and IT operations and cover a broad scope of topics including emergency management, disaster assessment, recovery, testing, and business resumption. Each functional area needs to define its resources, recovery teams and tasks supporting each phase of the recovery process. Operational process documentation needs to be available and priorities established to minimize the impact of any unplanned event.

Validation – Once the business continuity plan is formalized it is then used for training and testing, and hopefully not needed to support an actual event. During the training process, all members of the organization should be trained on its use and have an opportunity to review content from an operational resiliency perspective. After the feedback is received, testing scenarios need to be developed to ensure their recovery time objectives can be achieved.

During the Validation Phase every recovery process should be tested, and exceptions noted. There are many variations during the Validation Phase, but it is important that each phase is successfully completed, each business function is tested, and the data is recovered without loss of critical information. If any portion of the testing fails, it needs to be rerun until success is achieved.

Emergency Management Collaboration – The operational resiliency process is just that: a framework of plans and procedures. The final phase, and perhaps the most important step, is to implement a proactive capability to communicate and share information through an emergency management portal. The portal provides secure internet access to current information that is stored off-site in a disaster-hardened facility. All current continuity and emergency plans are indexed to each organization’s needs for rapid access. Additionally, the portal provides discussion boards, shared communications, emergency alerts, a calendar of events, contact rosters, checklists, on-line training, test exercises, and other functions to improve information collaboration.

The Challenge for Success
Operational resiliency assurance is complicated. It requires a systematic approach and discipline similar to Total Quality Management, where the product is the proactive delivery of uninterruptible operations. Too many organizations try to solve the problem with reactive fixes as opposed to integrating operational resiliency attributes into every aspect of each critical process. The interdependencies between key functions need to be well understood, and the supporting IT infrastructure has to be capable of delivering to the stated business objectives. Success will come slowly, and it requires teamwork, expertise, process standardization, communications, and management’s involvement.


About the Author
Bob Burns, Chairman, CEO and Co-founder of EverGreen Data Continuity, is a strong advocate of the Operational Resiliency Process as the framework for Business Continuity Management and its proactive deployment through secure Emergency Management Portals. Bob is the founder of NetVault storage management software, and was Vice President of CommVault Systems after serving more than 30 years at AT&T Bell Labs managing IT Operations and serving as an AT&T National Baldrige Quality Award Auditor. EverGreen has been in business for more than 10 years providing expert continuity and emergency management plans. Their Mitigator business continuity management software is rated by Gartner as one of the top industry solutions, and their EverSafe Emergency Management Portal was the basis for EverGreen winning FEMA’s Outstanding Partner Award.