N-Tier Client/Server Applications: The Achilles Heel of Disaster Recovery

Since the late 1990s, it has become increasingly fashionable for DR planners and consultants to recast themselves as “Business Continuity Planners.” BCP practitioners decry the “limited focus” of traditional DR planning on the IT infrastructure, the mainframe and the data center. They urge that the focus of planning be expanded to encompass “business processes”: the totality of IT infrastructure supports and the employees who perform manual and automated tasks. The old guard had it wrong, they argue: the business process, and not the system, is the appropriate focus for contingency planning.

Acronyms and organizing principles aside, however, the differences between BCP and traditional DRP remain comparatively minor. The focus of BCP on holistic business processes may differ somewhat from older DRP views, but the techniques of traditional disaster recovery planning persist. For example, neither discipline has articulated any disaster recovery methods that are appropriate for distributed client/server applications (let alone applications distributed over a network of geographically dispersed supply chain partners). Leveraging older mainframe-centric recovery methods, both BCP advocates and DRP traditionalists have opted to approach the problem of client/server DR planning through the application of one-for-one redundancy in software, middleware and host hardware.

From the standpoint of both cost and efficiency, the “replacement-through-redundancy” approach to client/server system recovery is nothing short of a bust. Yet both practitioners and vendors continue to approach client/server recovery from this perspective, making it an Achilles Heel both for actual recovery and for contingency planning budgets.

SOME BACKGROUND

DR planning originally focused on the recovery of mainframe operations.
Beginning in the early 1960s, mainframes provided the predominant platform for mission-critical business information processing services and, by all accounts, continued to do so through the end of the millennium. Some analysts contend that the majority of critical applications continue to reside on mainframes. Thus, many planners and vendors are content to continue to use time-tested mainframe replacement strategies as the centerpiece of their contingency plans, and rightly so.

However, in a growing number of business environments, distributed computing platforms (sometimes called open systems platforms) are proliferating. Driven by a number of factors, companies are rolling out distributed Enterprise Resource Planning (ERP), Manufacturing Resource Planning (MRP) and Customer Relationship Management (CRM) applications (just to name three) on these platforms to meet mission-critical information needs. These and other client/server applications account for a large part of the growing percentage of critical apps that do not reside on mainframe hosts and yet require comprehensive backup and recovery strategies.

Some mainframe backup service providers (“hot site” vendors) argue that there is little difference between the requirements for backing up client/server apps and the requirements for backing up complex mainframe-hosted applications. Traditional DR planning methodology provides the steps involved:

1. Identify the application and its host requirements.
2. Size host platform resources (including communications requirements) to fit minimum processing requirements in recovery mode operations.
3. Subscribe to the necessary replacement resources at the hot site.

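The three planning steps above can be sketched in a few lines of code. This is a minimal illustration, not a sizing method used by any vendor: the application names, the resource figures, and the `Application` and `size_recovery_host` names are all hypothetical, and the sketch assumes that resource requirements simply add up across applications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Application:
    """One entry in a hypothetical application inventory (step 1)."""
    name: str
    critical: bool       # must this application run in recovery mode?
    cpu_cores: int       # assumed host requirement
    memory_gb: int       # assumed host requirement
    bandwidth_mbps: int  # assumed communications requirement


def size_recovery_host(apps):
    """Step 2: total the resources of the critical applications only.

    Assumes requirements are additive, which is a simplification.
    """
    critical = [a for a in apps if a.critical]
    return {
        "applications": [a.name for a in critical],
        "cpu_cores": sum(a.cpu_cores for a in critical),
        "memory_gb": sum(a.memory_gb for a in critical),
        "bandwidth_mbps": sum(a.bandwidth_mbps for a in critical),
    }


# Illustrative inventory; every name and figure here is made up.
inventory = [
    Application("order-entry", True, 8, 32, 100),
    Application("payroll", True, 4, 16, 20),
    Application("reporting", False, 16, 64, 50),
]

# Step 3 would be subscribing to this configuration at the hot site.
requirements = size_recovery_host(inventory)
print(requirements)
```

In this made-up inventory the non-critical reporting application is left out, so the recovery host comes out smaller than the production platform. That result depends entirely on the components of each critical application being separable from the rest, which is the assumption that multi-tier client/server applications tend to break.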
In many cases, vendors argue, the recovery platform required will have a “smaller footprint” in terms of resources and capabilities than the actual production platform. Since not every application a company uses in normal operations is equally critical, a smaller host system (sometimes called a minimum equipment configuration) can often be used to operate those few applications that are deemed critical.

Vendors qualify this with a simple caveat: if the critical applications involved are very complex or have complex multi-tier client/server hosting platforms, it may be necessary to provide a 1-for-1 replacement of production platform resources at the recovery site. While they admit that 1-for-1 platform replacement strategies are inherently more expensive than minimum equipment configuration approaches, vendors often advise planners that there are no alternatives. In the case of multi-tier client/server applications, the cost of a 1-for-1 platform replacement can be quite daunting.

DEVIL IN THE DETAILS

While not all client/server applications require 1-for-1 element replacement in order to be recovered at an alternate site following an interruption, this is frequently the case, especially for “homegrown” apps. One reason has to do with the manner in which application functionality is expanded over time. The rollout of client/server applications is often an iterative process. Usually, a basic set of functionality is provided initially, with additional functionality added at intervals over time. Such a protracted implementation cycle opens the door to a number of factors that can limit recovery options. For one, different middleware products may be used as new functionality is added to the application.
According to integrators who have been engaged to “web-enable” multi-tier client/server applications for customers interested in capitalizing on the Internet, it is commonplace to encounter “n-tier” client/server applications whose components are held together by a kludge of middleware products from different vendors. These different products approach the problem of intra-application messaging from very different viewpoints. Some middleware products identify the components of a client/server application by a filename on a particular server. Others may use addressing schemes based on MAC addresses, server IP addresses, or other mechanisms hard-coded within hardware platform components themselves. In the final analysis, it is a miracle that such platforms perform at all efficiently in a normal production environment. Replicating the requirements for proper performance in a recovery environment can be a time-consuming nightmare.

Vendors of commercial off-the-shelf client/server applications, including leading ERP software vendors, claim to have the solution: purchasing a leading ERP app and using vendor-recommended (or, in some cases, vendor-provided) middleware components can expedite rollouts and reduce the kludge factor. According to integrators (and a few inside sources at leading client/server application suite vendor shops), this is not true. In the competitive world of MRP/ERP/CRM software, vendors are constantly jockeying for position in the market by enhancing their products with new functionality. Often, as part of an effort to keep up with a competitor's product, a vendor will simply purchase technology from a third-party software company (or just buy the company). Rarely, however, is there sufficient time to thoroughly integrate the acquired technology with the existing product. Vendors commonly use “patches” in the form of quick middleware fixes to get the products to market quickly.

Thus, whether homegrown or commercial off-the-shelf, n-tier client/server applications are typically difficult, costly and time-consuming to deploy and just as difficult, costly and time-consuming to recover. The devil, as they say, is in the details.

ANGELS IN THE ARCHITECTURE

The criticisms that may be lodged against client/server from a DR perspective have not curbed the appetite of modern corporations to invest in this model for mission-critical application delivery. Client/server isn't going away, so DRP/BCP practitioners need to deal with it. To borrow the IBM mantra of yesteryear, modern contingency planners need to view the challenge of client/server as “an opportunity to excel.”

The key to expanding the options for recovering client/server applications in the wake of disaster is to become more proactive. There are numerous documented cases of distributed systems providing speedier recovery from disaster than did centralized systems confronting the same disaster event. My personal favorite is Tokio Marine, a property and casualty insurance company that deployed its key applications on a distributed platform and recovered operations within four hours of the 1995 Kobe earthquake. Meanwhile, a competitor's mainframe shop down the street required over a week to accomplish a recovery.

The reason for the different outcomes: Tokio Marine had designed its systems for resource replication within the system itself. When servers in one part of its network were taken out by the earthquake, replicated elements on other servers could be brought on-line rapidly. The Tokio Marine case and many others demonstrate that client/server systems can be made resilient and recoverable if disaster recovery considerations are kept clearly in mind when architects first set designs to paper.

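The general idea of building replication into the system itself can be sketched as follows. This is an illustration only and does not describe Tokio Marine's actual architecture: the `ReplicaRegistry` class, the service name, and the host names are hypothetical. The sketch assumes each application component is addressed by a logical name that maps to several replicas, rather than by a hard-coded server address, so that losing one server means falling through to the next surviving replica.

```python
class ReplicaRegistry:
    """Hypothetical registry mapping logical service names to replica hosts."""

    def __init__(self):
        self._replicas = {}  # logical name -> ordered list of host names
        self._down = set()   # hosts currently known to be unavailable

    def register(self, service, hosts):
        """Record the replicas for a service, in order of preference."""
        self._replicas[service] = list(hosts)

    def mark_down(self, host):
        """Note that a host has been lost (for example, in a disaster)."""
        self._down.add(host)

    def mark_up(self, host):
        """Note that a host has been restored."""
        self._down.discard(host)

    def resolve(self, service):
        """Return the first surviving replica for a service."""
        for host in self._replicas.get(service, []):
            if host not in self._down:
                return host
        raise LookupError(f"no surviving replica for {service!r}")


# Illustrative names only.
registry = ReplicaRegistry()
registry.register("claims-db", ["site-a-01", "site-b-01", "site-c-01"])

before = registry.resolve("claims-db")  # preferred replica at site A
registry.mark_down("site-a-01")         # site A is lost
after = registry.resolve("claims-db")   # clients are redirected to site B
print(before, "->", after)
```

Because clients ask for `claims-db` rather than a specific machine, nothing in the application has to be re-addressed when a server is lost. A real design would also have to keep the replicas' data synchronized and detect failures automatically, both of which this sketch leaves out.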
Too often, however, DRP/BCP practitioners lack even a rudimentary understanding of modern client/server technology and tend to approach planning with a mindset that they must not interfere with application development processes, but rather “play whatever hand they are dealt.” Given an understanding of client/server technology, a good relationship with application architects, and a bit of management backing, DRP/BCP practitioners can influence the development of applications so that their availability and recoverability are enhanced. A few approaches may include:

CONCLUSION

While important, the Business Continuity Planning “extensions” to traditional Disaster Recovery Planning have not contributed significantly to the methodology for systems recovery. With the advent of the Internet and the appearance of the ubiquitous web browser, nearly every application within an organization is being converted (or will be shortly) into a client/server application. This trend will facilitate recovery in other areas (for example, web-enabled applications open new doors for end user recovery strategies in which end users work from home during an emergency), but it challenges older methods for systems recovery.

Instead of continuing debate over what should be “the appropriate focus” of contingency planning (business processes versus systems) and whether a certification in one discipline is preferable to certification in the other, perhaps contingency planners need to get back to solving a fundamental problem: How will we recover mission-critical applications within the shortest possible timeframe following an unplanned interruption?