Let us consider the simple example of the implementation of the
Cholesky factorization, which we will revisit in a later chapter.
Given a symmetric positive definite matrix *A*, the Cholesky
factorization of *A* is given by

$$
A = L L^T
$$

where *L* is lower triangular.
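For concreteness, a small worked instance (our example, not from the text): with

$$
A = \begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix},
\qquad
L = \begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix},
$$

one checks that

$$
L L^T =
\begin{pmatrix} 2 & 0 \\ 1 & 2 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}
=
\begin{pmatrix} 4 & 2 \\ 2 & 5 \end{pmatrix}
= A.
$$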

The algorithm for implementing this operation can be described by partitioning (blocking) the matrices

$$
A =
\begin{pmatrix}
\alpha_{11} & \star \\
a_{21} & A_{22}
\end{pmatrix},
\qquad
L =
\begin{pmatrix}
\lambda_{11} & 0 \\
l_{21} & L_{22}
\end{pmatrix}
$$

where $\alpha_{11}$ and $\lambda_{11}$ are scalars, and $a_{21}$
and $l_{21}$ are vectors of length *n* − 1.
The $\star$ indicates the symmetric part of *A*.
Now,

$$
\begin{pmatrix}
\alpha_{11} & \star \\
a_{21} & A_{22}
\end{pmatrix}
=
\begin{pmatrix}
\lambda_{11} & 0 \\
l_{21} & L_{22}
\end{pmatrix}
\begin{pmatrix}
\lambda_{11} & l_{21}^T \\
0 & L_{22}^T
\end{pmatrix}
=
\begin{pmatrix}
\lambda_{11}^2 & \star \\
\lambda_{11} l_{21} & l_{21} l_{21}^T + L_{22} L_{22}^T
\end{pmatrix}.
$$

This in turn yields the equations

$$
\alpha_{11} = \lambda_{11}^2,
\qquad
a_{21} = \lambda_{11} l_{21},
\qquad
A_{22} = l_{21} l_{21}^T + L_{22} L_{22}^T.
$$
We now conclude that the following steps will implement the
Cholesky factorization, overwriting the lower triangular
portion of *A* with *L*:

1. $\alpha_{11} \leftarrow \lambda_{11} = \sqrt{\alpha_{11}}$
2. $a_{21} \leftarrow l_{21} = a_{21} / \lambda_{11}$
3. $A_{22} \leftarrow A_{22} - l_{21} l_{21}^T$, updating only the lower triangular portion (a symmetric rank-1 update)
4. Recursively compute the Cholesky factorization of the updated $A_{22}$, yielding $L_{22}$

We make a few observations:

- The algorithm does not explicitly index individual elements of the original matrix. Notice that it references different parts of the *current* matrix, not the original matrix.
- The blocks are simply references (views) into the original matrix.
- The Cholesky factorization of the updated $A_{22}$ is itself computed recursively, yielding $L_{22}$; there is an inherent use of recursion.
- The primary opportunity for parallelism is in the symmetric rank-1 update $A_{22} \leftarrow A_{22} - l_{21} l_{21}^T$.

Using the PLAPACK infrastructure, parallel code for the above algorithm is given by

```c
PLA_Obj_view_all( a, &acur );
while ( TRUE ){
   PLA_Obj_global_length( acur, &size );
   if ( 0 == size ) break;
   PLA_Obj_split_4( acur, 1, 1, &a11, &a12,
                                &a21, &acur );
   Take_sqrt( a11 );
   PLA_Inv_scal( a11, a21 );
   PLA_Syr( PLA_LOW_TRIAN, min_one, a21, acur );
}
```

Previewing the rest of the book, all information that describes a matrix, including how it is distributed among processors, is encapsulated in a linear algebra object; here `acur` is a view into the object `a` that references the part of the matrix that remains to be factored.

To understand how parallelism is achieved by the above calls, we must first examine how PLAPACK distributes vectors and matrices, the topic of Section 1.3. After this, we show how operations like the symmetric rank-1 update can be naturally parallelized, in Sections 1.5 and 1.6.