For this operation to be well-defined, we require matrices,
*A* , *B* , and *C* to have dimensions
, , and ,
respectively.

If we partition

then

for . Thus, a matrix-matrix multiply can be written as a sequence of matrix-vector multiplies

We can implement the above sequence of matrix-vector multiplies
by using views, splitting off one column and row at a time:
Let *C* and *B* be partitioned like

where and equal the first column of *C* and
first row of *B* .
Then

So the computation of *C* can be viewed
matrix-vector multiply of *A* times the first row of *B* ,
after which the operation can
be performed, using the same approach, recursively.

This leads to the following algorithm:

Again, the `` '' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning.

- Scale
- Let and
- Do until done

- If and are empty, the loop is done.
- Partition

- Compute
- Let and

In order to get better performance,
it is advantageous to set up the algorithm
to perform a sequence of matrix-panel-of-vector
operations:
Let *C* and *B* be partitioned like

where and now equal a panel of *s* columns.
Then

So the computation of *C* can be viewed
as a matrix-times-panel-of-vectors multiply
( *A* times the first panel of rows of *B* ),
followed by a recursive operation
using the same approach.

This leads to the following algorithm:

Again, the `` '' symbol is used to indicate assignment, while ``='' indicates a reference or partitioning. The resulting code is given in Fig. . We will often refer to this approach to parallel matrix-matrix multiplication as a

- Scale
- Let and
- Do until done

- Determine width
sof split. If and are empty, the loop is done.- Partition

and

- Compute
- Let and

PLACE BEGIN HR HERE

PLACE END HR HERE

For large *n* , the above algorithm can be improved by
introducing pipelining.
This can be accomplished by
setting the natural flow of computation to be to the
right or left, for communication within rows,
or to the top or bottom, for communication within columns.