Parallel implementation of most dense linear algebra operations is a relatively well understood process. Nonetheless, the availability of general-purpose, high-performance parallel dense linear algebra libraries is severely hampered by the fact that translating sequential algorithms to parallel code requires careful manipulation of indices and of the parameters describing the data, its distribution among processors, and the communication required. It is this manipulation of indices that is highly error-prone, leading to bugs in parallel code. This in turn stands in the way of the parallel implementation of more sophisticated algorithms, since the coding effort simply becomes overwhelming.

The Parallel Linear Algebra Package (PLAPACK) infrastructure attempts to overcome this complexity by providing a coding interface that mirrors the natural description of sequential dense linear algebra algorithms. These descriptions typically require no more than half a chalkboard to explain. To achieve this, we have adopted an ``object-based'' approach to programming. This object-based approach has already been popularized for high performance parallel computing by libraries like the Toolbox being developed at Mississippi State University [], the PETSc library at Argonne National Laboratory [], and the Message-Passing Interface [, ].

In order to parallelize a linear algebra algorithm, we must
start by partitioning and distributing the vectors and matrices
involved.
Traditionally, parallel dense linear algebra algorithms and libraries start by
partitioning and distributing the matrix, with the distribution
of vectors being an afterthought.
This seems to make sense: a dense matrix typically requires *O*(*n*²) memory,
and computation involving such matrices typically
requires *O*(*n*³) operations.
Vectors, on the other hand, require only *O*(*n*) memory,
and computation involving vectors tends
to require only *O*(*n*) or *O*(*n*²) operations.
It is this partitioning and distribution of matrices that then dictates the
interface between an application and a parallel linear algebra library.
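
To make these operation counts concrete, here is a minimal sequential sketch (plain C, not PLAPACK code) of the three kinds of operations; the routine names are ours, chosen for illustration only, and the nested loops make the growth rates visible.

```c
#include <stddef.h>

/* Dot product: O(n) work on O(n) data. */
double dot(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}

/* Matrix-vector product y = A x: O(n^2) work on the O(n^2) matrix. */
void matvec(size_t n, const double *A, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++) {
        y[i] = 0.0;
        for (size_t j = 0; j < n; j++)
            y[i] += A[i * n + j] * x[j];   /* A stored row-major */
    }
}

/* Matrix-matrix product C = A B: O(n^3) work. */
void matmat(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```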

While this approach appears convenient for the library,
it creates an inherent conflict between the needs
of the application and those of the library.
It is the *vectors* in linear systems that naturally
dictate the partitioning and distribution of work associated
with
(most) applications that lead to linear systems [].
Notice that in a typical application, the linear system is
created to compute values for degrees of freedom, which
have some spatial significance. In finite element or boundary
element methods, we solve for force, stress, or displacement
at points in space. For the application, it is thus more
natural to partition the domain of interest into subdomains,
as domain decomposition methods do, and assign those
subdomains to
nodes (processors). This is equivalent to partitioning
the *vectors* and assigning the sub-vectors to nodes.
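
As a simple illustration of such a partitioning, the following sketch (hypothetical code, not part of any library's interface) block-partitions the *n* degrees of freedom of a vector across *p* nodes, giving each node a contiguous sub-vector. PLAPACK's actual distribution is more general, so this is only a model of the idea.

```c
#include <stdio.h>

/* Block-partition n vector entries (degrees of freedom) over p nodes,
   as an application-driven domain decomposition would.  Node r receives
   a contiguous sub-vector x[*first..*last]. */
void subvector_range(int n, int p, int r, int *first, int *last)
{
    int base = n / p, extra = n % p;   /* spread the remainder over the first nodes */
    *first = r * base + (r < extra ? r : extra);
    *last  = *first + base + (r < extra ? 1 : 0) - 1;
}

int main(void)
{
    int n = 10, p = 4;                 /* 10 unknowns, 4 nodes */
    for (int r = 0; r < p; r++) {
        int first, last;
        subvector_range(n, p, r, &first, &last);
        printf("node %d owns x[%d..%d]\n", r, first, last);
    }
    return 0;
}
```

Running this with *n* = 10 and *p* = 4 assigns three entries each to the first two nodes and two entries each to the remaining two.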

The PLAPACK infrastructure uses a data distribution
that
starts by partitioning the vectors associated with
the linear algebra problem and assigning the sub-vectors to nodes.
The matrix distribution is then *induced* by the
distribution of these vectors.
This approach was chosen in
an attempt to create more reasonable
interfaces between applications and libraries.
However, the
surprising discovery has been that this approach greatly
*simplifies* the implementation of the infrastructure,
allowing much more generality (in future extensions
of the infrastructure) while simultaneously
reducing the amount of code required compared
to previous-generation parallel dense linear algebra libraries
[].
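
To give a flavor of what *inducing* a matrix distribution can mean, the following sketch implements one simplified model of the idea (our own illustration, not PLAPACK's actual scheme or API). Vector entries are dealt out block-cyclically, with an assumed block size `nb`, to the *p* = *pr* × *pc* nodes, numbered in column-major order over a *pr* × *pc* mesh; matrix element *A*(*i*, *j*) is then placed at the intersection of the process row owning *x*[*i*] and the process column owning *x*[*j*].

```c
#include <stdio.h>

/* Vector entry i is owned, block-cyclically with block size nb,
   by node (i/nb) % p, where nodes are numbered column-major over
   a pr x pc mesh: node = col * pr + row. */
static int vec_owner(int i, int nb, int p) { return (i / nb) % p; }
static int mesh_row(int node, int pr)      { return node % pr; }
static int mesh_col(int node, int pr)      { return node / pr; }

/* Induced owner of matrix element A(i,j): the mesh row comes from
   the owner of x[i], the mesh column from the owner of x[j]. */
static void matrix_owner(int i, int j, int nb, int pr, int pc,
                         int *row, int *col)
{
    int p = pr * pc;
    *row = mesh_row(vec_owner(i, nb, p), pr);
    *col = mesh_col(vec_owner(j, nb, p), pr);
}

int main(void)
{
    int pr = 2, pc = 3, nb = 4;        /* 2x3 mesh, block size 4 */
    int row, col;
    matrix_owner(10, 21, nb, pr, pc, &row, &col);
    printf("A(10,21) lives on mesh node (%d,%d)\n", row, col);
    return 0;
}
```

Because every placement decision above is derived from the single vector distribution, the rows and columns of the matrix automatically stay aligned with the vectors they multiply, which hints at why this approach simplifies the infrastructure rather than complicating it.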