As senior staff in a high-tech company’s data center in the early 1980s, I was assigned a nonsensical task: to write a program to verify that a new, to-be-installed mainframe generation (4341) was fully compatible with its predecessor (370/148).
Why was this nonsensical? Because IBM has always had squads of people, armed with detailed internal specifications, dedicated to ensuring familial similarity, while I had only user-level documentation: the definitive IBM manual, Principles of Operation, and the pocket reference affectionately known as the “Green Card.” It was an interesting project, and, unsurprisingly, it confirmed that the replacement would be uneventful, because for decades IBM had used the formal and elaborate System Assurance Kernel (SAK) proprietary suite, which the company continues to employ today. (SAK was described in depth in the 2002 IBM Journal of Research and Development paper “IBM eServer z900 System Microcode Verification by Simulation: The Virtual Power-on Process.”) Far too complex for manual checking, modern processors, with billions of transistors on board, require sophisticated tools for validation and testing.
In the ’80s, the timeworn joke among data center staff was that users existed to test system programs. It was a time when the prevailing attitude was that applications people think about how it will work; systems people think about how it will fail. Now that most everything mainframe-based—transaction processing, databases, analytics, big data, private clouds, worldwide services—must run nonstop, these attitudes have evolved, if not totally disappeared. After all, everyone has a stake in systems and applications working, and an equal interest in anticipating and avoiding failure.
While (mostly) invisible externally, mainframes internally are mission-critical system-of-record data servers. Gone are the days when it was sufficient to sysgen, IPL, maybe run a few test batch jobs or interactive scripts, then wait for user feedback. For instance, users on mobile devices use mainframe-hosted applications to manipulate and present data. This makes it essential to validate and measure end user experience—not just a mainframe’s internal function or even performance.
Just as punch cards have vanished from IT, so has casual inspection to verify system security, replaced by rigorous—often proprietary—technology-based tools.
Mainframe of mind
The term “IBM mainframe” includes many hardware generations dating back to the 1960s System/360, plus the several operating systems supporting it (currently z/VSE, z/VM, z/OS, Linux on IBM Z, and z/TPF, which all evolved from earlier versions). Of these, z/OS is often considered the flagship operating system, but it isn’t synonymous with “mainframe.”
From concept through design, coding, testing, and rollout, application implementation requires systems thinking—sometimes called the mainframe mindset—which is all about anticipating what can go wrong. Testing for and handling glitches such as bad, no, or duplicated inputs; user mistakes; I/O errors; network problems; and more should be second nature to mainframe application developers. They needn’t master system internals, but they should appreciate the environment’s facilities for problem detection and resolution, and should fully apply them.
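To make the idea of anticipating bad, missing, or duplicated input concrete, here is a minimal sketch in Python. It is an illustration only, not drawn from any mainframe product: the record layout (an "id" and an "amount" field), the function name validate_batch, and the rejection reasons are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class BatchResult:
    """Outcome of screening one batch (hypothetical structure)."""
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs


def validate_batch(records):
    """Screen records for missing, malformed, and duplicated input.

    Assumes each record is a dict with a string "id" and a
    non-negative numeric "amount"; both fields are illustrative.
    """
    result = BatchResult()
    seen_ids = set()
    for record in records:
        # "No input": the record is absent, empty, or not a dict at all.
        if not isinstance(record, dict) or not record:
            result.rejected.append((record, "empty record"))
            continue
        txn_id = record.get("id")
        amount = record.get("amount")
        # "Bad input": a required field is missing or has the wrong type.
        if not isinstance(txn_id, str) or not txn_id.strip():
            result.rejected.append((record, "missing id"))
            continue
        if (isinstance(amount, bool)
                or not isinstance(amount, (int, float))
                or amount < 0):
            result.rejected.append((record, "bad amount"))
            continue
        # "Duplicated input": the same id was already accepted in this batch.
        if txn_id in seen_ids:
            result.rejected.append((record, "duplicate id"))
            continue
        seen_ids.add(txn_id)
        result.accepted.append(record)
    return result
```

Rejected records are kept alongside the reason for rejection rather than silently dropped, so the diagnostic information is available the first time a problem occurs.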
Another key part of this mainframe mindset is to never suffer the same problem twice. (Think how many times you’re told to fix PC problems by rebooting!) Of course, that requires initially gathering adequate diagnostic information. (Dan Skwire wrote the book on this: First Fault Software Problem Solving: A Guide for Engineers, Managers, and Users.) Sadly, an obstacle to customers helping to diagnose and resolve mainframe problems is IBM’s OCO (Object Code Only) policy, begun in the early 1980s to counter theft of proprietary intellectual property and technologies and to thwart competitors. Prior to OCO, source code for most IBM products was available, satisfying the desire to fix it in the language in which it broke. Without source code, installations are less able to resolve problems with testing and debugging, and must therefore rely more on vendor support.
Though developers may lack source code and internal logic information for operating systems and many commercial products, collaboration and teamwork are essential for locally developed applications. For example, practices such as design presentations, code reviews and walk-throughs, prototype demonstrations, and shared documentation (best written before implementation) can lead to solutions that are easier to apply sooner rather than later.
The trouble with mainframe testing
The mainframe mindset dictates caution, so mainframe software testing is best practiced in an isolated environment—whether a separate central processor complex (CPC), logical partition (LPAR), or virtual machine—and never mixes test activities with production. But skills scarcity has led to a devaluation of testing and documentation, says David Boyes, president and CTO of Sine Nomine Associates, a Washington, DC-area engineering firm. While development should include disciplined analysis and testing of general and edge cases, time is often too short to do it. Automation and structured testing, however, can reduce the amount of expertise needed to detect problems.
Meanwhile, the days of all-IBM (true blue!) installations have ended. Today, many mainframe sites blend vendor products to support silos—each with unique test procedures. It’s hardly unusual to encounter an operating system from one vendor, another supplier’s database, an industry-specific niche application, and someone else’s performance monitor and dashboard. Testing must include both single-product exercises and full configuration validation. Many companies also begin a system or component rollout or upgrade with the vendor-supplied installation verification procedure (IVP).
IBM business partner vendors of hardware components used with mainframes (such as Vicom Infinity) face special challenges, since their products can affect overall system stability. They use proprietary tools to validate their custom functions, plus the IBM SAK internal-use-only tool to fully exercise system architecture, including esoteric boundary conditions related to storage keys and address modes.
Independent software vendor testing should take place before, during, and after code is written, says Ken Meyer, a former software architect who has spent decades working on z/VSE software products. Start by minimizing changes to external characteristics, then load new code on a simple test machine to check new and changed features as well as their impact on the whole. Complications are introduced iteratively until testing culminates with other company products to ensure successful integration. A separate QA procedure can look for variances from expected outcomes.
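A QA check for variances from expected outcomes can be as simple as comparing a test run against a recorded baseline. The sketch below is one hypothetical way to do that in Python. The function name find_variances and the idea of keying results by test-case name are assumptions for illustration, not a description of any vendor's procedure.

```python
def find_variances(expected, actual):
    """Compare a run's results against a recorded baseline.

    Both arguments are dicts mapping a test-case name to its output.
    Returns a list of (name, description) pairs, one per variance.
    """
    variances = []
    for name in sorted(set(expected) | set(actual)):
        if name not in actual:
            # The baseline has this case, but the run produced nothing.
            variances.append((name, "missing from actual run"))
        elif name not in expected:
            # The run produced output that the baseline does not know about.
            variances.append((name, "unexpected output"))
        elif expected[name] != actual[name]:
            variances.append(
                (name, f"expected {expected[name]!r}, got {actual[name]!r}")
            )
    return variances
```

An empty list means the run matched the baseline. Anything else is a candidate for investigation before the code moves to the next, more complicated, test stage.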
Alan Altmark, senior managing z/VM consultant at IBM, suggests testing applications with all available good program temporary fixes (PTFs). Then spin off new production levels when requirements demand it, as governance requires and your risk tolerance permits.
The next best thing to magically creating problem-free applications is to simplify and automate testing, emphasizing meaningful test cases over test implementations. But, as mainframers know, automating interactions requires specific tools, since mouse and keyboard alone can’t access individual application elements and objects, such as those found on web pages. Automating these processes therefore requires image and text recognition, as well as the ability to drive mouse and keyboard actions.
Boyes recommends Robot Framework, a sophisticated test automation framework that can build tests with keywords. The recent addition of user-contributed IBM 3270 terminal support to interaction models allows the testing focus to change from code- to interaction-based. “The move to UX-oriented testing has increased the need for something that can simulate a user,” he adds, which also makes it “more interesting to develop tests.”
Leapwork, a test automation vendor, supports functional testing with design flows, executing processes from a mainframe terminal as if they were user interactions. One company using the Leapwork Automation Platform, a Danish insurance provider, notes that 24/7 testing provides insights on the stability of web solutions and facilitates maintenance of automated test cases. Leapwork’s product was designed primarily for functional testing, says Senior Product Evangelist Kasper Fehrend, but “creative minds use it for performance testing” and process monitoring.
Teleprocessing Network Simulator (TPNS) is another test automation tool. First released in 1976, it’s used for functional, regression, and system testing; capacity management; benchmarking; and stress testing. A similar, more current tool is IBM Workload Simulator.
Timothy Sipples, IT architect executive at IBM, also reports soaring use of what might be called disposable z/OS instances. He notes that “in cloud environments, wherever they run (including on mainframes), automated testing frequently involves provisioning and deprovisioning large, complex, temporary deployment landscapes.”
These landscapes should include z/OS, with new, ephemeral z/OS instances. If something breaks, it’s discarded for a fresh start. Indeed, z/OS system programmers needn’t be involved any more than, for example, Linux system administrators are involved when a test team fires up a score of Linux virtual machines with hundreds of containers and then throws them away.
Go slow and break nothing
Few mainframers tolerate the culture of “move fast and break things”—the consequences are too severe and too visible! Mainframe systems are complex, usually combining multiple applications, databases, and technologies, as well as poorly understood legacy code.
It’s a perpetual cliché that the teams working in data centers and central IT are themselves obstacles to quick development turnaround. To some extent, that’s true: These teams must ensure that changes and new developments are sound and don’t break anything else. As internet connectivity and scalability have led to increasingly hybrid systems and interoperating platforms, there’s more awareness of the need for end-to-end testing that reflects real-world—that is, unpredictable—use cases.
My longtime colleague and friend Stan King, CEO and CTO of Information Technology Company, praises classic methodologies once commonly used during product engineering such as Yourdon, DeMarco, and SDM. King, who has designed, developed, and supported mainframe products for decades, echoes Boyes’s dislike of the mantra “get it done fast.” Fast is rarely good, he says, and often overlooks problems.
Experienced mainframers react with mixed emotions when faced with (even pleasantly) surprised reactions to learning that these machines still exist, or that they are heavily relied upon by enterprise-scale industries such as banking and high-volume transaction processing. But, however closely one does (or doesn’t) work with mainframes, it’s useful to understand the mainframe mindset. It’s how big iron has earned, and kept, its quality reputation.