Many applications inherently generate numerous sub-vector, sub-multivector, or sub-matrix contributions to global linear algebra objects. Examples include finite element methods and boundary element methods [, ]. These contributions can be viewed as ``sub-objects'' of the global linear algebra objects. Frequently, however, they are also partial sums of the global objects, in which case each contribution must be added to existing entries rather than simply stored. The most natural approach to parallelizing these applications is to partition the numerous sub-object computations among processors. Such a natural parallelization of the applications' linear algebra object generation phase produces sub-vector, sub-multivector, and sub-matrix partial sums that may or may not be entirely local with respect to the data distribution of the global linear algebra objects.
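The partial-sum nature of these contributions can be illustrated with a small sketch. This is not PLAPACK code; it is a toy serial assembly, in the style of a finite element method, in which each ``element'' supplies a dense sub-matrix together with the global indices it touches, and overlapping entries accumulate by addition. All names here are invented for illustration.

```python
import numpy as np

def assemble(n, contributions):
    """Scatter-add each (rows, cols, sub-matrix) contribution into an
    n-by-n global matrix; overlapping entries are summed, not overwritten."""
    A = np.zeros((n, n))
    for rows, cols, sub in contributions:
        # np.ix_ selects the named global rows/cols so += accumulates there
        A[np.ix_(rows, cols)] += sub
    return A

# Two overlapping 2x2 "element" contributions to a 3x3 global matrix
contributions = [
    ([0, 1], [0, 1], np.array([[2.0, -1.0], [-1.0, 2.0]])),
    ([1, 2], [1, 2], np.array([[2.0, -1.0], [-1.0, 2.0]])),
]
A = assemble(3, contributions)
# Entry (1,1) receives a contribution from both elements: 2.0 + 2.0 = 4.0
```

In a parallel setting, the rows and columns a contribution touches need not be owned by the processor that generated it, which is exactly the situation the PLAPACK application interface addresses.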
In order to fill or retrieve entries in a linear algebra object, an application enters the API-active state of PLAPACK. In this state, objects can be opened in a shared-memory-like mode, which allows an application to fill or retrieve individual elements, sub-vectors, or sub-blocks of linear algebra objects. The PLAPACK application interface supports filling global linear algebra objects with sub-object partial sums. If the application generates its sub-vectors, sub-multivectors, or sub-matrices to match the global data distribution (i.e., it only generates contributions that reside locally in the global object), then no communication is performed. However, if the sub-object generated on a particular processor does not match the global data distribution, then the PLAPACK application interface transparently communicates all or part of the sub-object between processors as required.
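The fill path can be modeled with a toy simulation. The sketch below is not PLAPACK's API: two simulated ``processes'' each own a block of rows of a global vector, and a hypothetical `axpy_to_global` routine adds a contribution in place when the entry is owned locally, and otherwise routes it to the owner (standing in for the transparent communication PLAPACK performs).

```python
NPROCS, N = 2, 6                 # 2 processes; rank 0 owns rows 0-2, rank 1 owns rows 3-5
ROWS_PER = N // NPROCS
local = [[0.0] * ROWS_PER for _ in range(NPROCS)]  # each rank's local storage

def owner(i):
    """Block-row distribution: which rank owns global row i."""
    return i // ROWS_PER

def axpy_to_global(src_rank, i, value):
    """Add a contribution to global entry i, generated on src_rank.
    Returns True iff the entry was nonlocal, i.e. 'communication' occurred."""
    dest = owner(i)
    local[dest][i % ROWS_PER] += value
    return dest != src_rank

assert axpy_to_global(0, 1, 3.0) is False   # local entry: no communication
assert axpy_to_global(0, 4, 5.0) is True    # nonlocal entry: routed to rank 1
```

The key point mirrored here is that the caller's code is identical in both cases; whether communication happens is determined entirely by the data distribution.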
Similarly, the PLAPACK application interface supports querying global linear algebra objects. An application may request, from any processor, any sub-vector, sub-multivector, or sub-matrix of a global linear algebra object. If the ``sub-object'' can be obtained locally, then no communication occurs. However, if portions of the requested ``sub-object'' are nonlocal, then those portions are retrieved as required.
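The query path admits the same kind of toy model. Again, this is not PLAPACK code: a global matrix is distributed by block columns over two simulated ``processes'', and a hypothetical `query` routine copies local columns directly while recording which other owners it would have to contact for the nonlocal ones.

```python
import numpy as np

N, NPROCS = 4, 2
COLS_PER = N // NPROCS
global_ref = np.arange(N * N, dtype=float).reshape(N, N)
# Each rank stores its own block of columns of the global matrix
blocks = [global_ref[:, r * COLS_PER:(r + 1) * COLS_PER].copy()
          for r in range(NPROCS)]

def query(rank, rows, cols):
    """Return the requested sub-block and the set of other ranks contacted."""
    out = np.empty((len(rows), len(cols)))
    contacted = set()
    for jj, j in enumerate(cols):
        src = j // COLS_PER
        if src != rank:
            contacted.add(src)           # nonlocal column: needs communication
        out[:, jj] = blocks[src][rows, j % COLS_PER]
    return out, contacted

# A 2x2 sub-block spanning both owners: only the nonlocal column is "retrieved"
sub, contacted = query(0, [1, 2], [1, 2])
```

A request that falls entirely within the caller's own columns contacts no one, matching the no-communication case described above.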
Managing consistency in a shared-memory parallel programming environment is considerably more difficult than in a message-passing environment. Since PLAPACK mixes the two paradigms, clear rules must be stated about which operations may be performed during the API-active state. While a program is in the API-active state, none of the normal PLAPACK routines that require communication may be called for any PLAPACK object. This restriction avoids possible deadlocks caused by mixing PLAPACK's normal synchronous communications with the asynchronous communications of the API-active state. PLAPACK calls that can safely be made in the API-active state are denoted API-safe; such calls are explicitly marked in the appendices.
Our implementation of the calls in this chapter requires only a standard-compliant MPI implementation. We use a package that manages asynchronous MPI calls and message handlers, the Managed Message-Passing Interface (MMPI), developed by Carter Edwards at The University of Texas at Austin.