The pace and complexity of technological change are challenging traditional business continuity paradigms. What was once considered a ‘best practice’ in Business Continuity (BC) no longer serves the new digital world, and organizations can’t rely on these outdated processes to reach their future objectives. These ‘best practices’, and the standards and guidelines based on them, are unsuitable for the modern technologically dependent organization because they were intended to serve a different purpose within a vastly different business environment. Some may be puzzled by the notion that long-standing and widely accepted best practices could be unreliable, but that really shouldn’t be so disturbing. After all, ‘bloodletting’ was once a medical ‘best practice’, one that not so long ago contributed to the death of the first US President. It is time to modernize Business Continuity and align it with the genuine needs of today’s technologically dependent organization.
Today almost everything an organization does relies in some form on information technology. Businesses use IT to link to customers, suppliers and partners, to increase their operating efficiencies, to connect global supply chains and more. With advancements in IT, we now do more transactions online, of greater value, and faster than ever before. It could be said that the modern organization is entirely dependent on IT. In a world of thousands of servers processing petabytes of data across hundreds of miles of networks at near-instantaneous speeds, unforeseen ‘one-in-a-million’ glitches can happen in the blink of an eye. The complexity of today’s IT is very different from the uniform and homogeneous IT environment that was in place when many Business Continuity ‘best practices’ were designed. Relying on these old ‘best practices’ for your business continuity strategy creates blind spots that may lead to significant oversights, and those oversights profoundly affect the reliability of the overall strategy.
The prevailing ‘best practices’ in Business Continuity have favored a ‘Better Safe than Sorry’ approach to dealing with risk. In ordinary life ‘Better Safe than Sorry’ seems quite sensible, since intuitively it does seem better to be safe. However, this paradigm does not work when the cost of the safety is greater than the cost of the risk, that is, when ‘the cure is worse than the disease’. Safety, therefore, is not an ‘all or nothing’ condition. Risk is a matter of degree, and mitigation actions involve a variety of trade-offs. There are times when the perception of safety creates blind spots that lead us astray and cause us to overspend or waste valuable resources; feeling safe is not the same as being safe. As Robert Hahn pointed out to the US Congress, ‘This leads to a paradox that is becoming increasingly recognized. Large amounts of resources are often devoted to slight or speculative dangers while substantial and well-documented dangers remain unaddressed’1.
We can’t bet our organization’s valuable and scarce resources on intuition and rules of thumb. The harm is that when resources are disproportionately allocated to efforts based on precautionary heuristics, those resources are no longer available for less obvious but potentially more harmful risks. This is what must change to manage continuity in today’s complex business environment: Business Continuity can no longer rely on outdated heuristics and precautionary ‘best practices’.
Managing continuity in today’s complex IT-dependent organization requires replacing the ‘better safe than sorry’ heuristics with optimal risk-reduction actions. Managing risk depends on measuring the size of the investment against how speculative the harm is. The potential negative consequences of catastrophic events such as floods, fires, hurricanes, tornados, earthquakes, terrorism, pandemics, or a meteor strike are quite significant. The question is not whether these events are hazardous or whether they should be of interest to an organization. It is obvious that the loss of life and resources from catastrophic events can cripple a business, and the danger of being unprepared for such an event is equally obvious, but capitalism is not about doomsday prepping. Capitalism is about calculated risk-taking: no risk-taking, no innovation, no competitive advantage, and no shareholder value. Congressman Michael G. Oxley points out that “Capitalism is about taking risk, and that is what makes our system so productive.”2
The “Big Question”, the question that the precautionary principle does not and cannot answer for Business Continuity, is ‘when to stop spending resources on safety?’ The hard question for Business Continuity to answer is not ‘what to do’ but ‘how much to spend’. A business, after all, can’t spend all its money and resources on safety.
Many Business Continuity ‘best practices’ conceal the precautionary bias by using legitimate-sounding terms such as ‘risk appetite’, ‘risk tolerance’, and ‘risk aversion’, but these terms are never developed beyond heuristics and subjective judgment. They are ordinary perceptions about risk: they neither measure risk nor can they be used to calculate it. They simply describe how we feel about risk. Still other Business Continuity ‘best practices’ mask their subjectivity and bias through the use of elaborate ‘High-Medium-Low’ (HML) matrix models. These tools don’t calculate risk; they merely rank perceptions of risk, providing none of the credible information or statistical grounding needed to make a rational decision on how to optimally reduce risk. Like the terms above, these models only describe how we feel about risk, which does not help answer ‘what to do’ or ‘how much to spend’.
The precautionary bias is peppered throughout the many Business Continuity standards, guidelines, ‘best practices’, and certifications. Today a balanced approach to business continuity is more important than ever, and precautionary guidelines that consistently ignore minor cracks in continuity will not serve that purpose. Our organizations would be better served if business continuity first looked for ways to proactively fill those continuity cracks rather than solving for the next Apocalypse. All in all, “a stitch in time saves nine”. The real problem with traditional approaches is not that they are wrong, but that they offer no guidance to modern organizations on how to optimally reduce risk, that is, on how to fill the cracks. The unintended consequence of these outdated Business Continuity methods has been that the operational aspects of IT have been systematically neglected, and this might be the biggest blunder in business continuity today.
With all these best practices, these HML matrix models and this talk of risk aversion, there seems to be a growing and significant disconnect from what is actually happening in our new digital world. Business Continuity routinely dismisses IT risks in favor of the prevailing ‘risk-of-the-month’ because the best practices have a close affinity with the precautionary bias. While few would dispute that IT is becoming increasingly important to every organization, a Business Continuity certification consultant recently stated at an industry event that “the ultimate goal of BC activities was to get out of the data center”, an antiquated notion which implies that the IT infrastructure is unworthy of serious attention from business-oriented BCM practitioners. Nothing could be further from the truth.
The precautionary bias, coupled with people’s fear, makes worst-case scenarios appear increasingly plausible. In 2008/2009, the United States suffered a major financial meltdown, one with an impact that many economists have estimated at $1.8 trillion3. While we intuitively understand the consequences of a loss at that scale, most of us fail to recognize the extent of a “silent” IT disaster unfolding under our virtual noses. According to IT complexity expert and ObjectWatch founder Roger Sessions, organizations in the United States lose $1.2 trillion from IT failures every year. Worldwide, the total comes to $6.2 trillion. Although Sessions’ numbers have been challenged by other economists, the challengers’ own calculations remain sobering: they conclude that the worldwide loss is “only” $3.8 trillion!
The most notable aspect of Sessions’ math is this: the overwhelming majority of the annual $1.2 trillion loss is not caused by the low-probability/high-consequence catastrophes that capture attention, but by high-probability/low-consequence events that occur frequently, such as software bugs, hardware failures and security breaches. Worse, as applications become more complex, involving an ever-larger tangle of code, data nodes and system networks, these “smaller” events become more frequent and their impact more costly.
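One common way to make this comparison concrete is to multiply how often an event is expected to occur in a year by what each occurrence is expected to cost. The short Python sketch below illustrates that arithmetic only; the event names, frequencies and dollar amounts are hypothetical placeholders chosen for illustration, not figures from Sessions or any other source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskEvent:
    name: str
    annual_frequency: float  # expected occurrences per year (hypothetical)
    loss_per_event: float    # expected loss per occurrence, in dollars (hypothetical)

    @property
    def expected_annual_loss(self) -> float:
        # Expected annual loss = frequency x loss per occurrence.
        return self.annual_frequency * self.loss_per_event


# Illustrative inputs only: one rare catastrophe and two frequent IT events.
events = [
    RiskEvent("regional flood", annual_frequency=0.01, loss_per_event=5_000_000),
    RiskEvent("software defect outage", annual_frequency=12, loss_per_event=40_000),
    RiskEvent("hardware failure", annual_frequency=6, loss_per_event=25_000),
]

# Rank by expected annual loss, largest first.
ranked = sorted(events, key=lambda e: e.expected_annual_loss, reverse=True)

for event in ranked:
    print(f"{event.name}: ${event.expected_annual_loss:,.0f} per year")
```

With these made-up inputs, the frequent small events outrank the rare catastrophe even though a single flood costs far more than a single outage. Real inputs would come from an organization's own incident and loss records.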
The sheer size of these losses due to IT failure should serve as a wake-up call for anyone involved in Business Continuity. How could the very practices that were intended to provide continuity for our organizations allow interruptions that generate losses of this magnitude? Either Business Continuity’s target or its aim has been considerably off. While business continuity has been waiting and preparing for a catastrophic event, it has ignored the real risk to continuity: IT. Business Continuity ‘best practices’ absolutely must start to do things differently. We need to start thinking rationally about where to devote our efforts and where to place our emphasis. Genuine Business Continuity ‘best practices’ must make certain that real and serious risks receive the attention they deserve.
The “Big Question”, as we discussed earlier, is about optimizing scarce resources in the present to achieve the greatest benefit for our organization in the future. After all, continuity is not about turning the lights back on once they fail; it is about ensuring that the lights never go off in the first place. For Business Continuity the “Big Question” has two components: (1) which risks are the serious ones and (2) what are the optimal risk-reduction actions. Traditional methods currently used in Business Continuity offer little help in answering the Big Question. In fact, the current set of heuristics can be dysfunctional because it unwittingly diverts resources to slight or speculative dangers.
Many in the Business Continuity community share a mistaken belief that it is impossible to develop credible quantitative risk estimates. That belief is illusory: real-world experience shows that there is a wealth of data on which to base quantitative risk estimates with a degree of precision and confidence sufficient to support sound management decisions. We don’t have to be perfect. In fact we can’t be, because perfection is infinitely expensive. We do, however, need to increase the probability of success by reducing our losses. We need to apply a level of discipline appropriate to the complexity of the situation, and IT is too complex for heuristics, rules of thumb and intuitive judgment.
While precise information is extremely valuable, even a small amount of reliable data is far better than relying on subjective value judgments when it comes to making sound decisions about managing IT infrastructure risk. Risk-related data is increasingly available. There is a surprising amount of information from which to make realistic calculations and estimates about IT infrastructure and operational risks. As Immanuel Kant said “We Have a Duty - Especially Where the Stakes Are Large - to Inform Ourselves Adequately about the Facts of the Situation.” All in all, it is far better to use empirical data than to rely on intuitive, subjective judgments.
Business Continuity must make informed estimates about future losses and then take appropriate action based on those estimates. The underlying economic models must be constructed to accurately portray all of the pertinent risk parameters, as opposed to measuring risk perceptions. Cost-benefit balancing can then be applied to ensure a proportional response. To keep the odds in our favor we must economically quantify the operational risks of the IT infrastructure so that we can properly evaluate the many trade-offs and reach the optimal risk-reduction solution for our organizations.
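As a rough illustration of cost-benefit balancing, the Python sketch below compares what each candidate mitigation costs per year with the expected annual loss it removes. It rejects any action that costs more than it saves, then funds the rest in order of loss reduction per dollar until a budget runs out. Every name and figure here is a hypothetical assumption, and the greedy ordering is a simple heuristic for illustration rather than a true optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mitigation:
    name: str
    annual_cost: float   # yearly cost of the action (hypothetical, assumed > 0)
    loss_before: float   # expected annual loss without the action (hypothetical)
    loss_after: float    # expected annual loss with the action (hypothetical)

    @property
    def loss_reduction(self) -> float:
        return self.loss_before - self.loss_after

    @property
    def net_benefit(self) -> float:
        # Positive only when the action removes more loss than it costs.
        return self.loss_reduction - self.annual_cost


def select_mitigations(candidates, budget):
    """Greedy heuristic: drop actions whose cost exceeds their benefit,
    then fund the rest by loss reduction per dollar while budget remains."""
    worthwhile = [m for m in candidates if m.net_benefit > 0]
    worthwhile.sort(key=lambda m: m.loss_reduction / m.annual_cost, reverse=True)
    chosen, remaining = [], budget
    for m in worthwhile:
        if m.annual_cost <= remaining:
            chosen.append(m)
            remaining -= m.annual_cost
    return chosen


# Illustrative candidates only.
candidates = [
    Mitigation("second hot site for flood", 400_000, 50_000, 5_000),
    Mitigation("redundant power feed", 30_000, 150_000, 60_000),
    Mitigation("automated regression testing", 80_000, 480_000, 200_000),
]

chosen = select_mitigations(candidates, budget=150_000)
for m in chosen:
    print(f"fund {m.name}: net benefit ${m.net_benefit:,.0f} per year")
```

In this made-up case the hot site is rejected because it costs far more than the expected loss it removes, which is one way to decide ‘when to stop spending’: stop when the next dollar buys less than a dollar of expected loss reduction.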
With $3 to $6 trillion a year at stake, understanding how to prevent the continuing spiral of IT failures will have substantial benefits. In these difficult economic times, there is a tremendous amount of good that $3 to $6 trillion could do for our global economy. Making rational decisions about calculated risks, decisions that reduce the economic impact of IT failures, will be key to achieving a competitive advantage in today’s dog-eat-dog business world.
About the Author
Dennis is a Senior Manager in Competitive Strategy & Market Insights for Symantec covering Cloud/Virtualization/Big Data. He has consulted with large Fortune 500 companies in over 20 countries worldwide developing architectures and transition strategies for highly available, resilient and secure infrastructures in heterogeneous IT environments.
1 Making Sense of Risk: An Agenda for Congress, in Risks, Benefits, and Lives Saved 183, Robert Hahn, ed. (New York: Oxford University Press, 1996).
2 Rebuilding Investor Confidence, Protecting U.S. Capital Markets, House Committee on Financial Services, Michael G. Oxley, 2003.
3 These and subsequent figures are from The IT Complexity Crisis: Danger and Opportunity, Roger Sessions, November 8, 2009.