In order to take advantage of the higher performance of sequential level-3 BLAS, we rewrite the above algorithm in terms of matrix-matrix operations. Partition

where and are sub-matrices. Now,

This in turn yields the equations

We thus conclude that the following steps will implement the
Cholesky factorization, overwriting the lower triangular
portion of *A* with *L* :

Notice that to compute , a level-2 implementation of the Cholesky factorization can be used. The bulk of the computation is in the operation , which now uses a matrix-matrix operation, thereby achieving better performance.

- , where Cholesky factor overwrites the lower triangular part of .
- compute the Cholesky factorization of updated recursively, yielding .

The PLAPACK code for this level-3 BLAS based implementation is given
in
Figure 8.2.
Notice that the block size *b* must be determined. This block size
should equal the width of that makes the
the symmetric rank-k update:
most efficient.
The call

PLA_Environ_nb_alg( PLA_OP_SYM_PAN_PAN, template, &nb_alg );

PLA_Temp_get_comm_dir( template, PLA_DIR_TEMP_ROW, &old_dir_row );

PLA_Temp_get_comm_dir( template, PLA_DIR_TEMP_COL, &old_dir_col );

PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_ROW, PLA_DIR_RIGHT );

PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_COL, PLA_DIR_DOWN );

PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_ROW, old_dir_row );

PLA_Temp_set_comm_dir( template, PLA_DIR_TEMP_COL, old_dir_col );

`
`