Constructing Scientific Meta-Computations

Patrick T. Homer
Richard D. Schlichting

Department of Computer Science
The University of Arizona
Tucson, Arizona 85721, USA
{patrick, rick}@cs.arizona.edu

Abstract

The increasing complexity of High Performance Computing (HPC) applications, especially scientific simulations, has led to the development of the meta-computation model of scientific applications. In this model, an application is constructed from a collection of independent software components potentially written in different programming languages and targeted for different machine architectures. A software interconnection system is then used to connect these heterogeneous components spanning the Internet into a single logical program, and to provide configuration and control capabilities over the resulting computation. This paper describes the meta-computation model, the current state of an interconnection system called Schooner that provides the software infrastructure needed to realize this model, and the use of the model in an on-going NASA jet engine simulation and monitoring project.

1. Introduction
2. Interconnection Systems
3. The Schooner Interconnection System
3.1 UTS
3.2 Stub Compilers
3.3 Runtime System
4. Engine Simulation and Monitoring
5. Other Approaches to Heterogeneity
6. Conclusion
Acknowledgments
References

1. Introduction

The simulation of scientific processes on high-performance computing (HPC) systems is playing an increasingly important role in scientific research because of its potential for performing controlled experiments where cost, danger, or other factors limit real-world experiments. For example, the development of adequate flight simulation environments and numeric wind tunnels can reduce the danger and expense involved in prototyping new aircraft designs. Climate modeling provides another example, where the ability to predict long-term climatic effects can help in determining appropriate measures to avoid future problems.

The current state-of-the-art in high-performance computing, however, presents a dilemma to software developers. On one hand, the complexity of a real-world system requires a corresponding complexity in any attempt to model it computationally. For a given application, this complexity often mandates not only a substantial number of compute cycles, but also the use of heterogeneous hardware and software resources to implement the various parts of the solution. Unfortunately, the other side of the dilemma is that little software support is available for incorporating such heterogeneous resources into a single logical program. The result is a model in which an application consists of multiple individual programs or components, where each is executed separately and files or other manual methods are used to transfer data from one component to the next.

This paper describes an alternative model in which HPC applications are constructed as heterogeneous distributed programs or meta-computations [23]. In this model, the application is composed of multiple components potentially executing on diverse hardware as before, but with a software interconnection system being used to combine the components into a single logical program. In addition to providing the programmer with facilities for making procedure calls from one component to another without having to worry about heterogeneity, the software interconnection system also gives the user enhanced configuration and control capabilities over the resulting computation. This model differs from traditional RPC and object-oriented systems in its focus on heterogeneity, its support for application-level programming, and its emphasis on scientific applications.

In the next section, we give an overview of the problems that must be addressed by the meta-computation model, and some additional requirements imposed by our focus on scientific applications. Section three describes the major features of Schooner, an interconnection system that provides the software infrastructure needed to realize this model. Section four illustrates the use of the system with a jet engine simulation being developed as part of NASA's Numerical Propulsion System Simulation (NPSS) project. Section five surveys other approaches to heterogeneity, and Section six offers concluding remarks. Other papers have described earlier work with the Schooner system [18, 19], while a dissertation [20] describes the development of the system in some detail.


2. Interconnection Systems

Two specific problems can be identified when considering HPC applications that require access to heterogeneous hardware and software resources. The first is the interconnection problem, i.e., how to transfer control and communicate data among the heterogeneous components that comprise the application. Without adequate software support for communicating data among components, the developer is forced into one of two inadequate methods: data files or approximation. The data file approach has each component write its results to one or more data files, with these becoming the inputs to other components. Beyond the burden this places on the user to manage the files, this method does not provide the frequency of data exchange needed in many situations, and implies an essentially disconnected execution of the components.

The second method avoids data exchange altogether and uses approximation instead. For example, in computational fluid dynamics, a common practice is to represent the upstream fluid data with a boundary that approximates the data using either a constant set of values or time-varying functions that mimic patterns present in the real system. Yet neither approach can provide the realistic data needed in many simulations, nor the bidirectional data exchange that is critical to understanding how the physical processes on either side of the boundary interact.

Solving the interconnection problem yields a meta-computation, but leaves the user with the second problem, which we call the configuration problem. Where the user had been executing each component separately, the situation now requires all components to execute at the same time. As a result, the user is presented with the difficulties of manually starting components on a varied collection of machines, establishing the communication links needed among the components, and controlling the thread of execution as it passes from component to component. Currently, little support is available for configuring such a collection of components into a single HPC application, or for controlling the subsequent execution flow.

A software interconnection system solves these two problems by providing connections among components and implementing a configuration and control system. Each component in the interconnection system contains one or more computation codes or data manipulation tools that accomplish a specific set of tasks. Since a component can be developed using the combination of programming language, model, or architecture that is most suitable, the component becomes the unit of heterogeneity in the system. At runtime, components export services for use by other components in the application. This is accomplished by attaching an automatically generated interface to each component. The interface advertises those services the component makes available to other components, and handles the job of locating and accessing the external services needed by the component. The figure below illustrates this concept.

Our focus on HPC applications imposes two additional requirements on an interconnection system beyond those that might be imposed on systems oriented to a different application domain. First, the system must be easy to use, so that the computational scientist who actually constructs the application will be willing to make the transition from more traditional methods. Second, the impact on the source code for each component in the system must be minimized, since these codes may have been independently developed, may be maintained by a different group, or may simply be unavailable.

Use of an interconnection system results in a meta-computation distributed over a variety of machines, requiring configuration tools to assist the user in controlling the component applications. The configuration management features of the model give the user both static and dynamic configuration control. Static control allows the user to select the components and hosts that will be needed for the execution, and to start and execute the meta-computation. Once execution has begun, dynamic control allows components to be added or removed as needed by the user or through system calls issued by the components themselves.

One key benefit of this approach is the potential for improved user interaction with HPC applications. For example, the meta-computation can be configured to include a visualization tool that allows the user to monitor the results of the computation in real-time. By using this tool to control the parameters for some or all of the constituent components, the user can steer the simulation based on intermediate results.


3. The Schooner Interconnection System

The Schooner interconnection system provides the software underpinning needed to realize the meta-computation model. It does this by providing facilities that allow one component to invoke procedures exported by another component, and by supplying a configuration and control mechanism for executing heterogeneous distributed computations. This functionality is implemented by four mostly orthogonal parts that comprise the Schooner system: a specification language, an intermediate data representation and accompanying data exchange library, a set of stub compilers, and a runtime support system.

3.1 UTS

The Universal Type System (UTS) combines two of the four Schooner parts: an intermediate data representation and a specification language [16, 17]. The intermediate data representation allows data to be represented in a machine- and language-independent manner. It includes most simple data types, plus full support for array and record types and procedure parameters. Library routines are provided to convert data to and from the UTS representation. In most cases, these routines are only invoked within automatically generated stub procedures. The representation used by UTS includes type tags on each data element. These tags allow the component on the receiving end of a communication to validate the type of the data being received, and also facilitate the streaming of data between components.
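To make the type-tag idea concrete, the following is a minimal C sketch of a tagged encode/decode pair. The tag values, buffer layout, and function names are illustrative assumptions; they do not reproduce the actual UTS wire format or library interface.

    /* Illustrative sketch only: a tagged intermediate representation for a
     * 32-bit integer. The tag lets the receiver validate the expected type. */
    #include <stdint.h>
    #include <stdio.h>

    enum tag { TAG_INT32 = 1, TAG_FLOAT64 = 2 };   /* hypothetical tag values */

    /* Encode as [tag byte][big-endian value]; returns bytes written. */
    static size_t encode_int32(unsigned char *buf, int32_t v)
    {
        uint32_t u = (uint32_t)v;
        buf[0] = TAG_INT32;
        buf[1] = (unsigned char)(u >> 24);
        buf[2] = (unsigned char)(u >> 16);
        buf[3] = (unsigned char)(u >> 8);
        buf[4] = (unsigned char)u;
        return 5;
    }

    /* Decode, first checking the tag so a type mismatch is caught at runtime. */
    static int decode_int32(const unsigned char *buf, int32_t *out)
    {
        uint32_t u;
        if (buf[0] != TAG_INT32)
            return -1;                             /* runtime type mismatch */
        u = ((uint32_t)buf[1] << 24) | ((uint32_t)buf[2] << 16) |
            ((uint32_t)buf[3] << 8)  |  (uint32_t)buf[4];
        *out = (int32_t)u;
        return 0;
    }

    int main(void)
    {
        unsigned char buf[8];
        int32_t v;
        encode_int32(buf, 42);
        if (decode_int32(buf, &v) == 0)
            printf("decoded %d\n", v);
        return 0;
    }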

The specification language portion of UTS is used to specify the interface, essentially the number and type of the arguments for each procedure that can be called remotely in the application. For each procedure, one export specification is associated with the component containing the code for the procedure, and an import specification is associated with each component that might call the procedure. For example, the interface specification for a component of the jet engine simulation described in Section 4 gives, for each exported or imported procedure, its name and argument types.

Since components are compiled separately, often on different machines, a major use of the specifications is runtime type-checking, in which each import specification is checked against the corresponding export specification when components are bound together.

3.2 Stub Compilers

In addition to being used for type checking, the specification file is also used as the basis for creating the interface code that is a part of each component. This code is generated automatically by a stub compiler and consists of stub procedures that primarily contain calls to the UTS library for converting the procedure's parameters between native and UTS representations. There is one stub compiler for each supported programming language, currently C, C++ and FORTRAN.

After stubs for a component are generated from the specification, they are compiled using the appropriate language processor. The resulting object module is then linked with the user's code, the UTS libraries, and the Schooner runtime support libraries to produce the component.
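The following self-contained C sketch illustrates the role a generated import stub plays: marshal the caller's native parameters into the intermediate representation, hand the request to the runtime, and unmarshal the reply. The helper names (put_double, fake_remote_call) and the loopback "transport" are placeholders invented for illustration; this is not the code the Schooner stub compilers actually emit.

    /* Sketch of an import stub; placeholder helpers stand in for the UTS
     * library and the Schooner runtime. */
    #include <stdio.h>
    #include <string.h>

    /* Marshal/unmarshal a double (placeholder for the UTS conversion routines). */
    static size_t put_double(unsigned char *buf, double v)
    {
        memcpy(buf, &v, sizeof v);
        return sizeof v;
    }

    static double get_double(const unsigned char *buf)
    {
        double v;
        memcpy(&v, buf, sizeof v);
        return v;
    }

    /* Stand-in for the runtime's remote call: a real stub would forward the
     * request to the exporting component and wait for its reply. */
    static size_t fake_remote_call(const char *proc, const unsigned char *req,
                                   size_t len, unsigned char *reply)
    {
        (void)proc; (void)len;
        double a = get_double(req);
        double b = get_double(req + sizeof(double));
        return put_double(reply, a / b);     /* pretend the remote side computed this */
    }

    /* The import stub seen by the caller: same signature as the remote procedure. */
    double pressure_ratio(double exit_p, double inlet_p)
    {
        unsigned char req[32], rep[16];
        size_t n = 0;
        n += put_double(req + n, exit_p);    /* marshal native parameters */
        n += put_double(req + n, inlet_p);
        fake_remote_call("pressure_ratio", req, n, rep);
        return get_double(rep);              /* unmarshal the result */
    }

    int main(void)
    {
        printf("ratio = %.3f\n", pressure_ratio(3.0, 1.5));
        return 0;
    }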

3.3 Runtime System

A component in Schooner is implemented by a process at runtime. In addition to the components that make up the meta-computation, there is one Manager process for the meta-computation and one Server process for each host machine. The Server process cooperates with the Manager in starting components on the various hosts. The Manager is the central coordinator of the meta-computation, handling the twin configuration tasks of mapping components onto hosts and binding components into the meta-computation. The mapping task is carried out with the cooperation of the Server. The binding task is implemented by a component registration procedure for exported procedures and a table lookup function. On the first call to an imported procedure, a component sends a lookup request to the Manager.
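The lazy binding described above can be sketched as a lookup-and-cache pattern: on the first call to an imported procedure, the component asks the Manager where the matching export was registered and remembers the answer. The struct binding type, manager_lookup function, and host name below are hypothetical placeholders, not the actual Schooner runtime interface.

    /* Sketch of first-call binding; everything here is illustrative. */
    #include <stdio.h>
    #include <string.h>

    struct binding { char host[64]; int port; int valid; };

    /* Stand-in for the Manager's table of registered exports. */
    static int manager_lookup(const char *proc, struct binding *b)
    {
        if (strcmp(proc, "fan_flow") == 0) {
            strcpy(b->host, "hpc-node-1");      /* hypothetical host */
            b->port = 4000;
            b->valid = 1;
            return 0;
        }
        return -1;                              /* no component exports this procedure */
    }

    /* Per-import binding, cached after the first successful lookup. */
    static struct binding fan_flow_binding;

    static int call_fan_flow(void)
    {
        if (!fan_flow_binding.valid &&
            manager_lookup("fan_flow", &fan_flow_binding) != 0)
            return -1;                          /* binding failed */
        printf("calling fan_flow at %s:%d\n",
               fan_flow_binding.host, fan_flow_binding.port);
        return 0;
    }

    int main(void) { return call_fan_flow(); }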

Beyond the basic interconnection tasks, the runtime implements two advanced features, both added as a direct result of the needs of scientific collaborations. The first of these is a limited form of concurrency. Concurrency occurs in HPC applications at four levels: fine-, medium-, and coarse-grain parallelism, and the concurrent execution of multiple experiments. From the perspective of an interconnection system, computations employing fine- and medium-grain parallelism are encapsulated within a component, as illustrated in Figure 1. Coarse-grain parallelism and the execution of multiple experiments are supported in Schooner through the line abstraction. Each line is a sequential thread of execution through a group of components; lines execute independently of one another, with no implicit cross-line synchronization. New lines can be started and existing lines terminated without affecting other lines.

Dynamic configuration is the second advanced feature of Schooner. This allows the modification of the binding of components to lines and the mapping of components to hosts after a meta-computation begins execution. Thus, the meta-computation need not be fully defined at the beginning of execution, but rather can be modified during execution to adapt to the needs of the calculations or the availability of hosts. Mechanisms are included in Schooner that allow dynamic creation and deletion of lines and the components that comprise them, and a limited ability to move an executing component from one host to another.


4. Engine Simulation and Monitoring

The Numerical Propulsion System Simulation (NPSS) project is a multi-year effort sponsored by NASA to improve the state-of-the-art in jet engine simulation [10, 11]. Two goals of this project are to experiment with steering engine simulations and to combine low- and high-fidelity engine component simulations. Schooner is currently employed in a demonstration project addressing these goals in collaboration with researchers at NASA Lewis Research Center, the University of Toledo, and Cleveland State University. A one-dimensional engine model, called TESS, is being extended to include a high-fidelity simulation of the fan component. In a parallel development, a graphical interface and expert system are under construction that will allow the user to monitor the execution progress of engine components, and provide assistance in steering the simulation [2].

The Turbofan Engine System Simulator (TESS) provides a complete one-dimensional transient thermodynamic aircraft engine simulation [25]. TESS represents each of the principal components of an engine as a module in the AVS scientific visualization system [1]. An engine is constructed in AVS by connecting the TESS modules to represent the airflow through the engine. The particular engine to be modeled is determined by the modules chosen to represent the engine, and the inputs to each module.

The fan component chosen for integration into TESS is the Advanced Ducted Propfan Analysis Code (ADPAC) [15]. Originally developed for the study of high-speed ducted propfan aircraft propulsion systems, ADPAC has become a general solver for turbomachinery components. It predicts the flow field in and around the fan through an Euler/Navier-Stokes numerical analysis that uses a three-dimensional, time-marching procedure along with a flexible coupled 2-D/3-D multi-block geometric grid representation.

Including a high-fidelity component in a low-fidelity simulation is difficult, mainly because high-fidelity boundary data fields cannot be accurately reconstructed from the single values supplied by the low-fidelity simulation. Typically, the averaged boundary values will not initially match the low-fidelity simulator's value. To solve this problem, multiple runs of the high-fidelity component are executed in parallel, varying the method used to expand the single-value inputs into the needed two-dimensional boundaries. TESS then constructs a single-curve performance map from the results of these runs and interpolates it to obtain the needed one-dimensional result.
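As a minimal sketch of the interpolation step, the following C fragment linearly interpolates a performance map assembled from several runs. The variable names and sample values are invented for illustration and do not come from the actual TESS or ADPAC data.

    /* Illustrative only: interpolate a single-curve performance map built
     * from the results of the parallel high-fidelity runs. */
    #include <stdio.h>

    static double interp_map(const double *x, const double *y, int n, double xq)
    {
        int i;
        if (xq <= x[0])     return y[0];
        if (xq >= x[n - 1]) return y[n - 1];
        for (i = 1; i < n; i++)
            if (xq <= x[i]) {
                double t = (xq - x[i - 1]) / (x[i] - x[i - 1]);
                return y[i - 1] + t * (y[i] - y[i - 1]);
            }
        return y[n - 1];
    }

    int main(void)
    {
        /* Hypothetical map: mass flow vs. pressure ratio from four runs. */
        double flow[]  = { 80.0, 90.0, 100.0, 110.0 };
        double ratio[] = { 1.45, 1.52, 1.58, 1.61 };
        printf("interpolated pressure ratio = %.3f\n",
               interp_map(flow, ratio, 4, 95.0));
        return 0;
    }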

A tool has been developed to allow the user to monitor the intermediate and final results of the ADPAC runs. This monitoring tool, constructed using the TAE+ GUI toolkit [30], reports the residual (a measure of how well ADPAC is converging), detects warnings, and reports final results for each ADPAC instance. A watch-dog process is created on each machine executing ADPAC to monitor ADPAC's output and send results to the monitoring tool. An expert system is under development using CLIPS [12] and will assist the user in monitoring and steering the results of the ADPAC runs. Currently, the prototype expert system detects and reports several types of warnings.
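The watch-dog's job can be sketched as a small filter that scans the solver's output for residual values and forwards them to the monitor. The output line format and the report_residual hook below are assumptions made for illustration; they are not the actual ADPAC output format or the TAE+ interface.

    /* Illustrative watch-dog sketch: read solver output, extract residuals. */
    #include <stdio.h>

    /* In the real system this would send the value to the monitoring tool. */
    static void report_residual(int iter, double res)
    {
        printf("iteration %d: residual %.3e\n", iter, res);
    }

    int main(void)
    {
        char line[256];
        int iter;
        double res;
        while (fgets(line, sizeof line, stdin))
            if (sscanf(line, " iteration %d residual %lf", &iter, &res) == 2)
                report_residual(iter, res);
        return 0;
    }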



The figure above sketches the structure of the TESS-ADPAC tool. The TESS simulation executes on an AVS-equipped workstation. The monitoring tool and expert system execute on another workstation. Up to eight ADPAC and watch-dog instances execute in parallel on multiple platforms, with Schooner being used to provide the interconnection and configuration facilities. For each round of ADPAC runs, the user can vary both the number of ADPAC instances and the machines on which they execute. This system has been tested in a number of configurations, including a demonstration at Supercomputing '94 with TESS on an SGI Onyx and the monitoring tool and expert system on a Sun Sparc 10, both machines on the display floor. The ADPAC instances were executed on nodes of an IBM RS6000 cluster located at NASA Lewis Research Center [3].


5. Other Approaches to Heterogeneity

Other researchers have recognized that heterogeneity is important in some contexts, and have developed systems that may duplicate certain aspects of Schooner's functionality. These systems can be divided into three categories based on the type of problem each is designed to solve.

The first category contains message-passing libraries that are designed to facilitate the implementation of parallel algorithms. These include PVM [5, 29], p4 [8], and Zipcode [27], as well as the recent Message Passing Interface (MPI) [24] standard that has grown from the success of these earlier systems. Despite common themes of heterogeneous processing and HPC applications, the most significant difference between these systems and Schooner is that the systems have different goals: message-passing libraries seek to facilitate parallel solutions to specific problems, whereas Schooner is designed to interconnect multiple such computations into a single, higher-level application.

The second category seeks to exploit inherent heterogeneity to achieve parallel speedup within a given problem domain through a combination of approaches that includes both hardware- and software-based solutions [9, 13, 23, 26, 31, 32]. Hardware research includes the design of multi-processor, heterogeneous machines that combine processors in unique ways to exploit both the parallelism and heterogeneity present in the target applications. Software research includes communication improvements among the processors, compiler and language design work to detect heterogeneity, and scheduling and profiling techniques to provide the best match between task and processor. Schooner differs from this work in two ways. First, Schooner seeks to construct HPC applications from new and existing codes through interconnections, whereas these other efforts typically seek to design new codes that can exploit heterogeneity internally to achieve parallel speedup. Second, as a result of this difference in orientation, these efforts have a focus on detecting and exploiting fine-grain and medium-grain heterogeneity, and on improving communication among closely-coupled processors, while Schooner deals primarily with coarse-grained computations and exploiting long-distance networks. Thus, as was the case with message-passing libraries, these factors make this work and our work with Schooner complementary rather than competitive.

The third category includes systems that are related to Schooner's RPC mechanism. These other RPC schemes have features such as external data representations, specification languages, and stub compilers [4, 7, 28, 33]. Several of these systems also emphasize heterogeneity, including Matchmaker [22], Horus [14], HRPC [6], and Cicero/Nestor [21]. The primary distinction between this work and Schooner is again one of orientation: the main aim of these systems is to support interprocess communication for client/server style operating system services rather than application-level programs.


6. Conclusion

The use of the meta-computation model in general and Schooner in particular has been explored in a number of projects in addition to the jet engine simulation described above. These include an earlier NASA collaboration to create a prototype engine simulation executive, as well as molecular dynamics and neural net applications. In current projects, the meta-computation model and Schooner are being used in another NASA project to monitor a jet engine test cell in real-time, as well as an NSF-funded HPCC Grand Challenge project on ecosystem modeling at The University of Arizona. Current research on Schooner internals and other interconnection system issues is exploring the role of fault-tolerance, improving the communications support to take better advantage of high bandwidth networks and light-weight protocols, and examining possible uses and implementations of heterogeneous distributed shared memory.


Acknowledgments

The NPSS project is managed by the Interdisciplinary Technology Office at NASA Lewis Research Center (LeRC). Thanks are due to Greg Follen of LeRC, Abdollah Afjeh and John Reed of the Mechanical Engineering Department at the University of Toledo, and Henry Lewandowski of the Industrial Engineering Department at Cleveland State University. This work was supported in part by the National Science Foundation under grant ASC-9204021 and by the National Aeronautics and Space Administration under grant NCC3-374.


References

[1] Advanced Visual Systems Inc. AVS Developer's Guide (Release 4.0), Part number: 320-0013-02, Rev B, Advanced Visual Systems Inc., Waltham, MA, May 1992.

[2] A. A. Afjeh, P. T. Homer, H. Lewandowski, J. A. Reed, and R. D. Schlichting. Development of an intelligent monitoring and control system for a heterogeneous numerical propulsion system simulation. Proceedings 28th Annual Simulation Symposium, Phoenix, AZ (April 1995), 278-287.

[3] A. A. Afjeh, G. R. Follen, P. T. Homer, H. Lewandowski, J. A. Reed and R. D. Schlichting. Numerical Propulsion System Simulation -- 1D/3D Zooming and Monitoring in Jet Engine Simulation. Entry in the Heterogeneous Computing Challenge, Supercomputing '94, Washington, D.C. (November 1994).

[4] G. T. Almes, A. P. Black, E. D. Lazowska, and J. D. Noe. The Eden system: A technical review. IEEE Transactions on Software Engineering SE-11, 1 (January 1985), 43-59.

[5] A. Beguelin, J. J. Dongarra, G. A. Geist, R. Manchek, and V. S. Sunderam. Graphical development tools for network-based concurrent supercomputing. Supercomputing '91, Albuquerque, NM (November 1991), 435-444.

[6] B. N. Bershad, D. T. Ching, E. D. Lazowska, J. Sanislo, and M. Schwartz. A remote procedure call facility for interconnecting heterogeneous computer systems. IEEE Transactions on Software Engineering SE-13, 8 (August 1987), 880-894.

[7] A. D. Birrell and B. J. Nelson. Implementing remote procedure calls. ACM Transactions on Computer Systems 2, 1 (Feb. 1984), 39-59.

[8] R. Butler and E. Lusk. User's Guide to the p4 Parallel Programming System. Technical Report ANL-92/17. Mathematics and Computer Science Division, Argonne National Laboratory. October 1992.

[9] S. Chen, M. M. Eshaghian, A. Khokhar, and M. E. Shaaban. A selection theory and methodology for heterogeneous supercomputing. Proceedings: Workshop on Heterogeneous Processing, Newport Beach, CA (April 1993), 15-22.

[10] R. W. Claus, A. L. Evans, and G. J. Follen. Multidisciplinary propulsion simulation using NPSS. 4th AIAA/USAF/NASA/OAI Symposium on Multi-disciplinary Analysis and Optimization, Cleveland, OH (September 1992).

[11] R. W. Claus, A. L. Evans, J. K. Lytle, and L. D. Nichols. Numerical propulsion system simulation. Computing Systems in Engineering 2, 4 (April 1991), 357-364.

[12] CLIPS Reference Manual, Basic Programming Guide. Software Technology Branch, Lyndon B. Johnson Space Center. CLIPS Version 5.1, September 10, 1991.

[13] R. F. Freund and H. J. Siegel. Guest editors' introduction: Heterogeneous processing. IEEE Computer 26, 6 (June 1993), 13-17.

[14] P. B. Gibbons. A stub generator for multilanguage RPC in heterogeneous environments. IEEE Transactions on Software Engineering SE-13, 1 (January 1987), 77-87.

[15] E. J. Hall, R. A. Delaney, and J. L. Bettner. Investigation of Advanced Counterrotation Blade Configuration Concepts for High Speed Turboprop Systems, Task 5 -- Unsteady Counterrotation Ducted Propfan Analysis Computer Program User's Manual, NASA CR-187125, January 1993.

[16] R. Hayes. UTS: A Type System for Facilitating Data Communication, Ph.D. Dissertation, Department of Computer Science, University of Arizona, August 1989.

[17] R. Hayes and R. D. Schlichting. Facilitating mixed language programming in distributed systems. IEEE Transactions on Software Engineering SE-13, 12 (December 1987), 1254-1264.

[18] P. T. Homer and R. D. Schlichting. A software platform for constructing scientific applications from heterogeneous resources. Journal of Parallel and Distributed Computing 21, (June 1994), 301-315.

[19] P. T. Homer and R. D. Schlichting. Using Schooner to support distribution and heterogeneity in the Numerical Propulsion System Simulation project. Concurrency: Practice and Experience 6, 4 (June 1994) 271-287.

[20] P. T. Homer. Constructing Scientific Applications from Heterogeneous Resources. Ph.D. Dissertation, Technical Report 94-33, Department of Computer Science, University of Arizona, December 1994.

[21] Y. Huang and C. V. Ravishankar. Designing an agent synthesis system for cross-RPC communication. IEEE Transactions on Software Engineering 20, 3 (March 1994), 188-198.

[22] M. B. Jones, R. F. Rashid and M. R. Thompson. Matchmaker: An interface specification language for distributed processing. Proceedings of the 12th Symposium on Principles of Programming Languages, New Orleans, LA (January 1985), 225-235.

[23] A. A. Khokhar, V. K. Prasanna, M. E. Shaaban and C. Wang. Heterogeneous computing: Challenges and opportunities. IEEE Computer 26, 6 (June 1993), 18-27.

[24] Message Passing Interface Forum. Document for a Standard Message-Passing Interface. March 22, 1994.

[25] J. A. Reed. Development of an interactive graphical aircraft propulsion system simulator. Master of Science Thesis, University of Toledo, August 1993.

[26] S. L. Scott and J. Potter. A framework for the virtual heterogeneous associative machine. Proceedings: Workshop on Heterogeneous Processing '94, Cancun, Mexico (April 1994).

[27] A. Skjellum. Scalable libraries in a heterogeneous environment. Proceedings of the 2nd International Symposium on High-Performance Distributed Computing, Spokane, WA (July 1993), 13-20.

[28] Sun Microsystems, Inc. Network Programming Guide (Revision A). Part number 800-3850-10. Sun Microsystems, Inc., Mountain View, CA, March 1990.

[29] V. S. Sunderam. PVM: A framework for parallel distributed computing. Concurrency: Practice and Experience 2, 4 (December 1990), 315-339.

[30] Transportable Applications Environment Plus. Programmer's Manual, Version 5.2. Goddard Space Flight Center, National Aeronautics and Space Administration. December 1992.

[31] M. Wang, S. Kim, M. A. Nichols, R. F. Freund, H. J. Siegel, and W. G. Nation. Augmenting the optimal selection theory for superconcurrency. Proceedings of the 1st Workshop on Heterogeneous Processing, Beverly Hills, CA (March 1992), 13-21.

[32] C. C. Weems, Jr. Image understanding: A driving application for research in heterogeneous parallel processing. Proceedings of the 2nd Workshop on Heterogeneous Processing, Newport Beach, CA (April 1993), 119-126.

[33] Xerox Corp. Courier: The Remote Procedure Call Protocol. Xerox System Integration Standard XSIS 038112, Xerox Corp., Stamford, CT, December 1981.
