CSAR Logo

Computer Science

Overview

Research Goals

Accomplishments FY00

Accomplishments FY99

Accomplishments FY98

Publications

IInterpolation.gif (6076 bytes)

Interpolation between disparate meshes

Overview

Research in Computer Science is addressed by two teams, Computational Mathematics and Geometry, and Computational Environments. Work in computational mathematics and geometry focuses on the areas of linear solvers, mesh generation and refinement, and interpolation between diverse discretizations. The target of our computational environments program is to provide the broad computational infrastructure necessary to support complex, large-scale simulations in general, and for rocket simulation in particular. Areas of research include parallel programming environments, compiler optimization and parallelization, parallel performance analysis and tools, and parallel input/output.

Research Goals

Programming Environments

Work in programming environments will support the integration of existing component simulation codes and will facilitate new implementations. In the final environment, it will be possible to program different modules of the simulation system in different languages and paradigms. This is necessary to accommodate existing codes, which are written in various languages, and to allow each team to choose the most appropriate programming style for new developments. Other major concerns include portability, very high performance, and ease of debugging and maintenance. The environment will include several new tools and will also draw on a long tradition of research and development on languages and compilers for parallel machines at UIUC, at the DOE labs, and elsewhere.

Program Level

Prog_Model.gif (69667 bytes)

The programming environment, depicted above, will be built on top of a runtime infrastructure library that contains routines for thread management, scheduling, memory management at different levels of the memory hierarchy, synchronization, and message passing. This framework will support concurrent coexistence and runtime support for different languages and for interoperability among them. It will provide a machine model that consists of a collection of nodes, each with a variable number of Shared-Memory Processors (SMPs). This will subsume the ASCI as well as other current and future machines, including clusters of (SMP) workstations.

Although this infrastructure library will expose all the necessary features for effective parallel programming, it is expected that most programmers will prefer higher level interfaces in the form of libraries, languages, or a combination of these. The libraries, which will be built on top of the basic runtime infrastructure, will implement shared virtual memory systems, active messages, and message standards such as MPI-2, PVM, and their multi-threaded variants. Programmers will have direct access to this library from any conventional language such as Fortran77, Fortran 90, C, or C++. There will also be parallelizing compilers that will generate parallel codes from sequential Fortran 77 and MATLAB programs. Charm++ [50], a C++ based object-parallel language that supports task parallelism, and pC++ (supporting data parallelism) have already been implemented within this framework. Versions of the basic runtime infrastructure and of the parallelizing compilers will be developed for each target ASCI machine.

Some of the work will be based on the Converse runtime system [51], which has been used extensively at UIUC to implement several languages and applications such as large molecular dynamics applications [52]. Two parallelizing compilers, Polaris [53] and Falcon [54], will serve as the basis for the Fortran77 and MATLAB translators. Both are among the most effective such systems available today.

We also will make use of object oriented libraries developed elsewhere, such as those for automatic mesh refinement, including AMR++, RIEMANN++, and Overture [25,26]. These libraries are based on P++ [55], a forerunner of the emerging HPC++ standard, so that we will naturally arrive at HPC++ code. Using these libraries would strengthen our collaborations with LANL, which is working on similar software approaches for combustion problems in other contexts.

System Level

Program components (existing or new) representing models of subsystems developed by different teams must be integrated into a parallel system. These components may overlap in the same set of processors, or may be assigned to different processor partitions and may overlap in time. To facilitate such an integration, an orchestration language, a generalization of scripting languages, will be developed (see figure above). Because an important part of this interaction will consist of transferring geometric objects between components, the scripting language will include powerful mechanisms to specify and manipulate such objects. The geometric objects will be implemented as distributed data-structure libraries that will be accessible from across languages and components, and will be capable of transformations of representations demanded by different components. For example, a spatial decomposition data-structure may organize entities with coordinates into cubes, and will be capable of changing the representation to cubes of different sizes with different parallel distributions. The orchestration language will support global synchronization and global data exchange across parallel components. Execution ordering, concurrency, and processor partitioning between components will be specified at this level. Future versions of this language will include features to control execution of the system on heterogeneous computer networks.

Debugging

To facilitate debugging and performance tuning, the programming environment will include mechanisms to display, compare, and modify information at various levels of abstraction. A facility to capture and replay a parallel execution will be developed to allow debugging of problems that occur nondeterministically only in certain execution sequences. The runtime library will also enable language-dependent trace data capture to support debugging and performance visualization tools described below. Furthermore, tools to identify nondeterminacy in parallel codes [56] will be developed. Nondeterminacy is usually caused by missing synchronization operations, and its source can be quite difficult to identify without such tools.

Scalable Performance Analysis

To identify and eliminate performance bottlenecks in complex applications, we have developed portable performance analysis tools for scalable parallel systems. The resulting Pablo environment [57,58] supports instrumentation, analysis, and performance display for explicit message passing codes using both MPI and vendor message passing libraries. Additional tools support analysis of input/output bottlenecks and data parallel HPF codes. Building on Pablo’s support for all three ASCI computation platforms, we describe three ASCI-centered efforts to extend and apply the Pablo toolkit to multidisciplinary computational simulation.

High-Performance Fortran (HPF)

The Pablo performance toolkit supports analysis of data parallel code via integration with the Portland Group’s HPF compiler [59]. This integrated tool set can relate dynamic performance data from the compiler-generated code to the original HPF data parallel code. We intend to extend the Pablo software’s performance analysis capabilities by targeting (1) performance scalability prediction and (2) mixed mode programming that involves use of HPF, MPI message passing, and functional decompositions. The first of these will rely on a combination of compile-time data on program transformations and dynamic performance data to predict the location and movement of bottlenecks as the problem size and number of processors change. The second effort reflects the reality of complex computational simulations like rocket design and analysis, which will require a diverse set of application codes and programming models.

Real-time Monitoring and Adaptive Control

Multidisciplinary simulations like rocket modeling are irregular, with complex, data dependent execution behavior, and dynamic, time-varying resource demands. Leveraging our earlier experiences with adaptive resource control [60,61], we intend to create an adaptive application performance monitoring and control infrastructure that will couple low-overhead performance sensors, fuzzy logic decision mechanisms for robust resource policy management, and resource actuators for policy control. This infrastructure will support both adaptive I/O policies that respond to changing I/O access patterns and dynamic redistribution of work across processors.

I/O Characterization

For the past four years, we have led the Scalable I/O (SIO) Initiative’s effort to analyze the application and physical I/O patterns in a suite of I/O intensive applications [62,63,58]. Based on the insights gained from these and other I/O studies, the SIO Initiative, the MPI Forum, and our UIUC collaborators have defined new, high-level application programming interfaces (APIs) for parallel I/O. Finally, the ASCI effort is internally developing high-level libraries (e.g., Silo and Exodus) for mesh I/O and visualization. We will build on our ongoing collaborations with the SIO and ASCI communities to develop I/O instrumentation for the MPI-IO, SIO, and ASCI APIs and to analyze the performance implications of these I/O APIs for our suite of rocket simulation codes.

High-Performance Data Management

The unprecedented scale of integrated rocket simulation on multiple teraflop platforms raises new research issues in I/O and data management. First, no parallel I/O library or parallel file system has been deployed at such a large scale or with such diverse application I/O requirements. Second, our I/O characterization experiences [67,68] highlighted the sensitivity of current parallel file systems to access patterns and hardware configurations. This suggests that high I/O performance will be possible only by dynamically choosing and tuning I/O policies based on access patterns and resource availability. We describe two such efforts that will build on our successful Panda [64,65] and PPFS [66] I/O libraries for high-performance parallel I/O.

Array Access and Data Migration

To support high-performance parallel I/O for large, multidimensional arrays, the Panda project has developed a parallel I/O library for SPMD applications using HPF-style or AMR-style data distributions [67-69]. Via the Panda I/O interface one can declare groups of arrays, describe their data distribution in memory and on disk, and then request collective I/O of the group in a single command. Panda automatically selects an execution strategy for a high-level I/O request using knowledge of the available system resources and an analytic performance model of execution strategies.

Previous studies of Panda’s I/O data migration strategies showed that the migration strategy must be tightly integrated with the hardware platform’s I/O capabilities. We will tune the Panda software for ASCI platforms using measurements of I/O behavior in our rocket simulation code suite. We will also develop and evaluate internal Panda strategies for coordinating high-performance I/O and migration across multiple, simultaneously executing applications. In a second effort, we will explore support for irregular data distributions and I/O load balance by generalizing Panda’s current support for adaptive mesh refinement (AMR). This work will create interfaces for multiple AMR representations and both static and dynamic I/O load balancing techniques within the Panda library.

Adaptive File Policies

Earlier we argued that adaptive control is key to maximizing parallel I/O performance for complex applications. We plan to explore two approaches to adaptive I/O optimization: (1) an extension of the PPFS [66,70] user-level library that supports the MPI-IO and and SIO parallel I/O APIs, and (2) an extension of Panda to support automatic I/O strategy selection on the ASCI platforms and under the complex I/O demands generated by rocket simulation codes. The first of these libraries will include automatic access pattern classification software based on trained neural nets and hidden Markov models [71] and fuzzy logic controllers that employ real-time I/O performance measurements and access pattern classifications to choose and configure I/O policies [61]. The second library will expand Panda’s automatic strategy selection by including analytic performance models for the new ASCI platforms and adding support for strategy selection under a more complex workload. In addition, we will extend Panda’s strategy selection and performance model to include support for tertiary storage management.

Visualization

Visualization capabilities will be incorporated to provide qualitative and quantitative data analysis for the whole-system simulation as well as subscale components. Display techniques for effective presentation of data with widely varying spatial and temporal scales will be developed. Where feasible, visualization will be coupled with the simulation, providing real-time monitoring and potential steering of the evolving simulation. Visualization will be integrated with the data management system, allowing for visual comparison of results among various parameter runs, between normal and abnormal burns, and between simulation data and experimental data.

Computational Mathematics

Our 3-D simulations will lead to systems of equations with millions of unknowns. There is substantial research expertise at UIUC in all major approaches to solving large linear systems—including direct, iterative, and multigrid methods—which will be customized and applied to the specific systems arising in the various facets of SRB simulation.

Iterative Methods

Efficient iterative solution of linear systems generally requires an effective preconditioner. Incomplete factorization preconditioners are often effective but are difficult to implement efficiently in parallel. More recently developed approximate inverse preconditioners are more promising for parallel implementation because they require only matrix-vector multiplication rather than solution of triangular systems. We have developed an approximate inverse preconditioner in which the approximate inverse is obtained by a novel method that exploits low-rank structure in block submatrices. We intend to implement such approximate inverse preconditioners for the systems arising in SRB simulation and investigate potential instabilities (see, e.g.,[72]).

Most iterative methods developed in recent years, such as BiCG, GMRES, and QMR, are variants of the conjugate gradient/Lanczos algorithm. Disadvantages of this approach for nonsymmetric systems include the possibility of breakdown under some conditions and the need for the transpose of both the system matrix and preconditioning matrix, which is usually inconvenient on parallel machines. The older Chebyshev method requires fewer inner products than other more recent methods, is not subject to breakdown, and does not require matrix transposes. This method has been implemented in Chebycode [73], for which experimental results on a large application problem are given in [74]. As part of an active collaboration with Steve Ashby of LLNL on preconditioning, we plan to extend this work to systems arising in SRB simulation.

Multigrid Methods

Multigrid methods are among the most efficient solvers available for discretizations of linear or nonlinear partial differential equations. Multigrid methods are inherently multiscale, employing grids of varying granularity, and thus are a natural approach to multiscale problems such as SRB simulation. Multigrid methods are built out of components such as smoothers, grid transfer operators (restriction, interpolation), and coarse grid solvers. We plan to tune each of these components for maximum efficiency on the specific systems arising in our simulations. Multigrid has often been viewed as complicated to implement, particularly in the data structures involved, but the software architecture developed as part of this project will enable convenient handling of grids of varying scale. An example of our experience with multigrid in large scale applications is a Newton-multigrid code developed for simulating protein electrostatics [75].

Direct Methods

Direct methods may require substantially more storage and work than iterative methods for very large linear systems, but direct methods have a decided edge in robustness and reliability, for example in solving highly ill-conditioned systems for which convergence can be difficult to attain with iterative methods. We have developed a fully parallel sparse direct solver [76], which we can adapt specifically to the architectures and problems arising in this project. We plan to extend this work to the computation of appropriate preconditioners for iterative methods via incomplete factorization, and to improve the performance of such preconditioners by exploiting new approaches to parallel sparse triangular solutions [77].

Multivariate Interpolation

Integrated simulation of an SRB will require transmission of data across interfaces between models that may have different discretizations, resolutions, and geometric representations. Thus, multivariate interpolation will be required to enable compatible communication across such interfaces. We will adapt standard algorithms for interpolation of both regular and irregular data to the specific needs of such model interfaces.

Computational Geometry

We plan to develop a uniform geometric data representation that will serve all or most functions using geometric information. The data structure is similar to but differs from traditional engineering solutions to similar problems. The data structure combines the following four important features: (1) it can represent arbitrary and occasionally very complicated shapes; (2) it meshes spaces and shapes with good local quality needed for finite element methods; (3) it adapts to existing and evolving density distributions; and (4) it represents the geometry in a hierarchical fashion.

We believe that a clean and satisfying solution to representing geometry can be based on tetrahedral and hexahedral grids. While satisfactory theoretical results and a suite of experimental software are available, it will take some time to adapt other projects to this technology. We plan a two-phase project management. In the first phase continuous simulations will be based on more conventional data structures, such as regular hexahedral grids and hierarchies of such. At the same time, the implementation of our new grid technology will proceed with full speed. In the second phase the simulations will be converted one by one to the new technology. The simulations involving complicated geometry are expected to benefit most from the new grid data structure and these projects will be converted first.

The tetrahedral grid technology is based on Delaunay triangulations [78]. These grids have become increasingly popular, mostly because they allow fast and general algorithms for any type of geometric shape. The following is a list of key questions that have partially been solved and partially need to be addressed within this project.

Shape representation: With the work documented in [79], we can now reconstruct and mesh any geometric shape as a subcomplex of the Delaunay complex of a point sample.

Adaptivity and Quality: Good quality tetrahedral elements can be guaranteed through a dynamic process that adds and adjusts grid nodes [80]. This process solves a difficult problem in tetrahedral grid generation mentioned, for example, in [81].

Hexahedral Meshes: Finite element methods often prefer hexahedral over tetrahedral elements. We plan research on methods that construct hexahedral elements by combining and subdividing tetrahedra in Delaunay triangulations.

Mesh Interfaces: This project combines simulations on neighboring grids with widely varying temporal and spatial resolutions. The grid data structure will be designed to allow clean interface solutions.

Hierarchy: The size of most grids in this project will be extremely large, so that hierarchical access is needed to achieve reasonable response times. We will work with and extend ideas developed in [82] to achieve these goals.

References

[50] L. V. Kale and S. Krishnan. CHARM ++: A portable concurrent object oriented system based on C++. In A. Paepcke, editor, Proc. OOPSLA ‘93, pages 91-108. ACM Press, 1993.

[51] L. V. Kale, M. Bhandarkar, N. Jagathesan, S. Krishnan, and J. Yelon. Converse: An interoperable framework for parallel programming. In Proc. 10th Internat. Parallel Processing Symp., pages 212-217, Apr 1996.

[52] M. Nelson, W. Humphrey, A. Gursoy, A. Dalke, L. Kale, R. Skeel, and K. Schulten. NAMD: A parallel, object-oriented molecular dynamics program. Internat. J. Supercomput. Appl. High Perf. Comput., 10(4), 1996.

[53] W. Blume, R. Doallo, R. Eigenmann, J. Grout, J. Hoeflinger, T. Lawrence, J. Lee, D. Padua, Y. Paek, W. Pottenger, L. Rauchwerger, and P. Tu. Parallel programming with Polaris. IEEE Computer, 29(12):7882, Dec 1996.

[54] L. DeRose and D. Padua. Inference mechanisms for MATLAB programs. In Proc. 10th Internat. Conf. Supercomput., Philadelphia, May 1996.

[55] M. Lemke and D. Quinlan. P++, a C++ virtual shared grids based programming environment for architecture-independent development of structured grid applications. In CONPAR/VAPP V, Lyon, France, Sep 1992. Springer-Verlag.

[56] P. Emrath, S. Ghosh, and D. Padua. Detecting nondeterminacy in parallel programs. IEEE Software, Jan 1992.

[57] D. A. Reed. Experimental performance analysis of parallel systems: Techniques and open problems. In Proceedings of the 7th International Conference on Modeling Techniques and Tools for Computer Performance Evaluation, pages 25-51, May 1994.

[58] D. A. Reed, C. L. Elford, T. Madhyastha, W. H. Scullin, R. A. Aydt, and E. Smirni. I/O, performance analysis, and performance data immersion. In Proceedings of MASCOTS ‘96, pages 1-12, February 1996.

[59] V. S. Adve, J. Mellor-Crummey, M. Anderson, K. Kennedy, J.-C. Wang, and D. A. Reed. An integrated compilation and performance analysis environment for data parallel programs. In Proceedings of Supercomputing ‘95, December 1995.

[60] T. M. Madhyastha, C. L. Elford, and D. A. Reed. Optimizing input/output using adaptive file system policies. In Proceedings of the Fifth Goddard Conference on Mass Storage Systems and Technologies, September 1996.

[61] D. A. Reed, C. L. Elford, T. Madhyastha, E. Smirni, and S. L. Lamm. The next frontier: Interactive and closed loop performance steering. In Proceedings of the 1996 International Conference on Parallel Processing Workshop, pages 20-31, August 1996.

[62] H. Korab and M. D. Brown. Virtual environments and distributed computing at SC ‘95: GII testbed and HPC challenge applications on the I-WAY, December 1995.

[63] E. Smirni and D. A. Reed. I/O requirements of scientific applications: An evolutionary view. In Proceedings of the Fifth IEEE International Symposium on High-Performance Distributed Computing, pages 49-59, August 1996.

[64] K. E. Seamons and M. Winslett. Multidimensional array I/O in Panda 1.0. The Journal of Supercomputing, 10(2):191-211, 1996.

[65] Y. Chen, M. Winslett, S. Kuo, Y. Cho, M. Subramaniam, and K. Seamons. Performance modeling for the Panda array I/O library. In Proceedings of Supercomputing ’96, November 1996.

[66] J. V. Huber, C. L. Elford, D. A. Reed, A. A. Chien, and D. S. Blumenthal. PPFS: A high performance portable parallel file system. In Proceedings of the 9th ACM International Conference on Supercomputing, July 1995.

[67] K. E. Seamons, Y. Chen, M. Winslett, Y. Cho, S. Kuo, P. Jones, J. Jozwiak, and M. Subramaniam. Fast and easy I/O for arrays in large-scale applications. In Seventh IEEE Symposium on Parallel and Distributed Systems, Workshop on Modeling and Specification of I/O, October 1995.

[68] K. Seamons and M. Winslett. Physical schemas for large multidimensional arrays in scientific computing applications. In Supercomputing ‘94, pages 650-659, December 1994.

[69] K. Seamons and M. Winslett. An efficient abstract interface for multidimensional array I/O. In Proceedings of the 7th IEEE International Conference on Scientific and Statistical Database Management, pages 218-227, September 1994.

[70] T. M. Madhyastha and D. A. Reed. Intelligent, adaptive file system policy selection. In Proceedings of Frontiers ‘96, October 1996.

[71] T. M. Madhyastha and D. A. Reed. Input/output access pattern classification using hidden Markov models. In preparation, 1997.

[72] P. Concus and P. Saylor. A modified direct preconditioner for indefinite symmetric Toeplitz systems. Numer. Linear Algebra Appl., 2(5):415-430, 1995.

[73] S. F. Ashby. Chebycode: A Fortran implementation of Manteuffel’s adaptive Chebyshev algorithm. Technical Report 1203, Dept. of Computer Science, University of Illinois, Urbana, IL, May 1985.

[74] S. F. Ashby, S. L. Lee, L. R. Petzold, P. E. Saylor, and E. Seidel. Computing space-time curvature via differential-algebraic equations. Appl. Num. Math, 20:221-234, 1996.

[75] M. Holst, R. Kozack, F. Saied, and S. Subramaniam. Treatment of electrostatic effects in proteins: Multigrid-based Newton iterative method for solution of the full nonlinear Poisson-Boltzmann equation. Proteins: Structure, Function, and Genetics, 18(3):231-245, 1994.

[76] M. T. Heath and P. Raghavan. Performance of a fully parallel sparse solver. Internat. J. Supercomput. Appl. High Perf. Comput., 11(1):46-61, 1997.

[77] M. T. Heath and P. Raghavan. Performance of parallel sparse triangular solution. In Proc. IMA Workshop on Algorithms for Parallel Processing. Springer-Verlag, 1997.

[78] B. Delaunay. Sur la sphere vide. Izv. Akad. Nauk SSSR, Otdelenie Mat. Estest. Nauk, 7:793-800, 1934.

[79] H. Edelsbrunner. Surface reconstruction by wrapping finite sets in space. Technical Report rgi-math-96-001, Raindrop Geomagic, Champaign, IL, 1996.

[80] H. Edelsbrunner and M. Facello. Three-dimensional mesh improvement. Technical Report rgi-math-97-005, Raindrop Geomagic, Champaign, IL, 1997.

[81] D. A. Field J. C. Cavendish and W. H. Frey. An approach to automatic three-dimensional finite element mesh generation. Internat. J. Numer. Methods Engrg., 21:329-347, 1985.

[82] R. Waupotitsch. Simplifying and deforming through hierarchies of simplicial grids. Technical Report UIUCDCS-R-96-1959, Dept. Comput. Sci., Univ. Illinois, Urbana, IL, 1996.