IBM Mainframes High Availability Concepts and Sysplex


Parallel Sysplex

High availability can be achieved in an IBM mainframe environment by using Parallel Sysplex. Parallel Sysplex is a clustering technology that enables the total capacity of multiple processors to be applied against common workloads as if they were part of a single computer (see IBM Z Connectivity Handbook).

Although Parallel Sysplex implies additional hardware costs, adds another level of complexity to the software environment, and does not deliver additional end-user facilities, more and more enterprises adopt it. A well-managed, properly configured Parallel Sysplex can reduce planned outages by 90% and unplanned outages by 10%, and provide near-continuous availability (see SHARE San Jose 2017, Parallel Sysplex: Achieving the Promise of High Availability, Session 20697).

Continuous operations and high availability are the components of continuous availability. High availability masks unplanned outages from end users; continuous operations mask planned outages from end users. Continuous operations are achieved through non-disruptive hardware and software configuration changes, together with coexistence. High availability is achieved through fault tolerance and automated failure detection, recovery, and reconfiguration. As a result, continuous availability masks both types of outage from end users.

Serial vs Parallel Components

Even installations with highly redundant sysplex configurations may fail to meet their availability objectives. To meet those objectives, it is necessary to investigate all components in detail. All of the steps required for a system to operate form serial components: if any one of them fails, the system fails. If there are too many serial components, the system will be unreliable. It is therefore necessary to replicate serial components so that they can be processed in parallel.
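The effect of serial components can be illustrated with a small calculation. This is a minimal sketch that assumes component failures are independent; the 99% availability figure and the five-step chain are illustrative values, not measurements from any installation, and the function names are hypothetical.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel_availability(availabilities):
    """Availability of a redundant group in which one working component is enough."""
    return 1.0 - prod(1.0 - a for a in availabilities)


# Illustrative assumption: five serial steps, each 99% available.
chain = [0.99] * 5
serial = serial_availability(chain)

# The same chain with every step duplicated (two replicas per step).
duplicated = serial_availability(
    [parallel_availability([a, a]) for a in chain]
)

print(f"serial chain:     {serial:.4%}")
print(f"duplicated chain: {duplicated:.4%}")
```

Under these assumptions, the chain as a whole is less available than any single step, and replicating each step recovers most of the loss.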

To improve the reliability of a system with parallel components, the number of components can be increased, but only up to a point. Beyond that point, adding more components no longer increases reliability in any meaningful way. This is known as the point of diminishing returns (see Achieving the Highest Levels of Parallel Sysplex Availability).
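The diminishing returns can be seen by computing how much availability each extra replica adds. Again, this is only a sketch that assumes independent failures and an illustrative 95% availability per replica; `marginal_gains` is a hypothetical helper, not part of any IBM tooling.

```python
def parallel_availability(single, n):
    """Availability of n identical replicas when one working replica is enough."""
    return 1.0 - (1.0 - single) ** n


def marginal_gains(single, max_n):
    """Availability added by each successive replica, from the 1st to the max_n-th."""
    gains = []
    previous = 0.0
    for n in range(1, max_n + 1):
        current = parallel_availability(single, n)
        gains.append(current - previous)
        previous = current
    return gains


# Illustrative assumption: each replica is 95% available.
for n, gain in enumerate(marginal_gains(0.95, 5), start=1):
    total = parallel_availability(0.95, n)
    print(f"{n} replica(s): availability {total:.6%}, gain {gain:.6%}")
```

Each additional replica contributes far less than the one before it, while its cost and the complexity it adds stay roughly constant.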

People, Processes and Procedures

Because outages are caused by component failures, we tend to believe that redundant components will eliminate outages. This may not be true. Redundancy is necessary but not sufficient. Only 20% of outages, those caused by environmental factors or by software or hardware failures, can be avoided by redundancy. Another 40% of outages are caused by operational failures, and the remaining 40% by application failures. Operational errors can be avoided through process management, training, and automation; application errors can be avoided with similar investment.

High Availability vs Disaster Recovery

High availability and disaster recovery address different aspects of business objectives. The obstacles to high availability are component failures within the datacenter, so the objective is to restore full service within the datacenter. Disaster recovery is triggered when the entire datacenter is out of service; critical services are then restored at an alternate site.

An organization should be clear about the objectives and roles of high availability and disaster recovery in providing business continuity, and realistic about what can be achieved. After determining its objectives, it should invest appropriate amounts in infrastructure, people, and processes.
