Let us further consider the matrix-matrix multiply.
We claim that
for each of the
four cases given in the earlier equations (one for each choice of
transposing *A* and/or *B*), there are actually eight different cases to
be considered. These correspond to
the situations where each of the matrix dimensions *m*, *n*, and
*k* is either large or small. Since there are three
dimensions and two possibilities for each dimension, there
are $2^3 = 8$ cases. These cases are tabulated
in two tables, Part I and Part II, at the end of this section.
In these tables, the diagrams indicating matrix shapes illustrate
the layout of data in its *non-transposed* format.
Transposition is indicated by the superscript $T$.
We will discuss general approaches to each of the different
cases in the rest of this section.
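The count is simply $2^3$, since each dimension independently falls into one of two classes. The short Python sketch below is an illustration that is not part of the original text: it enumerates the eight shape classes and pairs them with the four transposition cases, giving $4 \times 8 = 32$ subcases in all. The string labels are arbitrary.

```python
from itertools import product

# Each of the three dimensions m, n, k is classified as either large or small.
SIZES = ("large", "small")
shape_cases = list(product(SIZES, repeat=3))        # (m, n, k) classifications

# The four multiplies differ in whether A and/or B is transposed.
transpose_cases = list(product((False, True), repeat=2))
all_cases = [(ta, tb) + shape for (ta, tb) in transpose_cases for shape in shape_cases]

for m, n, k in shape_cases:
    print(f"m {m:<5}  n {n:<5}  k {k:<5}")
print(len(shape_cases), "shape cases per multiply,", len(all_cases), "in total")
```

Running it prints the eight (m, n, k) combinations followed by `8 shape cases per multiply, 32 in total`.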

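Let us reconsider the formation of $C = AB$.
Notice that if we partition a given matrix $X$
by columns and by rows,

$$
X = \left( X_0 \mid X_1 \mid \cdots \right)
  = \begin{pmatrix} \hat{X}_0 \\ \hat{X}_1 \\ \vdots \end{pmatrix},
$$

where $X \in \{A, B, C\}$, each $X_j$ is a panel of columns, each $\hat{X}_i$ is a panel of rows, and the partitionings are conformal, then $C$ can be computed
by any of the following formulae:

$$
\begin{aligned}
C &= A_0 \hat{B}_0 + A_1 \hat{B}_1 + \cdots && \text{(panel-panel)}, \\
C_j &= A B_j \quad \text{for each } j && \text{(matrix-panel)}, \\
\hat{C}_i &= \hat{A}_i B \quad \text{for each } i && \text{(panel-matrix)},
\end{aligned}
$$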
which suggests that, besides the panel-panel variant for forming *C*,
there are also matrix-panel and panel-matrix variants.
Similarly, there are panel-panel, matrix-panel, and panel-matrix
variants for each of the three other matrix-matrix multiplies
given in the earlier equations.
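The three formulae for $C = AB$ can be checked against each other with a small serial sketch. The Python code below is a hypothetical illustration using plain lists of lists. The function names and the panel widths `kb`, `nb`, and `mb` are assumptions, and no parallelism is modeled.

```python
def matmul(A, B):
    """Reference product of an m x k and a k x n matrix (lists of lists)."""
    k, n = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(k)) for j in range(n)] for row in A]


def cols(X, j0, j1):
    """Column panel X[:, j0:j1] as a new list of lists."""
    return [row[j0:j1] for row in X]


def panel_panel(A, B, kb):
    """C = A_0 Bhat_0 + A_1 Bhat_1 + ...: column panels of A times row panels of B."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0] * n for _ in range(m)]
    for p in range(0, k, kb):
        update = matmul(cols(A, p, p + kb), B[p:p + kb])
        for i in range(m):
            for j in range(n):
                C[i][j] += update[i][j]
    return C


def matrix_panel(A, B, nb):
    """C_j = A B_j: each column panel of C uses all of A and one column panel of B."""
    m, n = len(A), len(B[0])
    C = [[0] * n for _ in range(m)]
    for j in range(0, n, nb):
        Cj = matmul(A, cols(B, j, j + nb))
        for i in range(m):
            C[i][j:j + nb] = Cj[i]
    return C


def panel_matrix(A, B, mb):
    """Chat_i = Ahat_i B: each row panel of C uses one row panel of A and all of B."""
    C = []
    for i in range(0, len(A), mb):
        C.extend(matmul(A[i:i + mb], B))
    return C


A = [[1, 2, 3], [4, 5, 6]]
B = [[1, 0], [0, 1], [2, 3]]
print(panel_panel(A, B, 2), matrix_panel(A, B, 1), panel_matrix(A, B, 1))
```

All three functions return the same product, `[[7, 11], [16, 23]]` here, for any panel width. Only the order in which the work is organized differs, and in the parallel setting that order is what determines the communication pattern.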

When *m*, *n*, and *k* are all large, the choice of variants presented in previous subsections is optimal. The primary reason is that good load balance can be obtained, since all nodes have a reasonable portion of the individual panel-panel, matrix-panel, or panel-matrix operation to perform. A secondary reason is that, except for one of the four cases, all communication can be performed using only broadcasts and reductions-to-one within one or the other dimension of the mesh of nodes. Thus, no packing needs to be performed in the copy and/or reduce operations; in other words, the individual panels need not be transposed as described in Section 1.4.2.
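A minimal serial simulation of the panel-panel variant on an $r \times c$ mesh illustrates why this case communicates only through broadcasts within mesh rows and within mesh columns. This is a SUMMA-style sketch under simplifying assumptions: $C$ has a contiguous block distribution, panels have width `kb` along the $k$ dimension, and broadcasts are modeled as dictionary lookups. The names `summa_sim` and `block_ranges` are hypothetical.

```python
def block_ranges(size, parts):
    """Split range(size) into `parts` contiguous, nearly equal index ranges."""
    return [range(q * size // parts, (q + 1) * size // parts) for q in range(parts)]


def summa_sim(A, B, r, c, kb):
    """Serially simulate a panel-panel (SUMMA-style) C = A B on an r x c mesh.

    Node (s, t) owns the block of C with rows I[s] and columns J[t].  At each
    step, the current column panel of A is broadcast within mesh rows and the
    current row panel of B within mesh columns; then every node performs a
    local update of its own block.
    """
    m, k, n = len(A), len(B), len(B[0])
    I, J = block_ranges(m, r), block_ranges(n, c)
    local = {(s, t): [[0] * len(J[t]) for _ in I[s]]
             for s in range(r) for t in range(c)}
    broadcasts = {"within_rows": 0, "within_cols": 0}
    for p0 in range(0, k, kb):
        P = range(p0, min(p0 + kb, k))
        # Rows I[s] of A's current column panel: one broadcast in mesh row s.
        a_piece = {s: [[A[i][p] for p in P] for i in I[s]] for s in range(r)}
        broadcasts["within_rows"] += r
        # Columns J[t] of B's current row panel: one broadcast in mesh column t.
        b_piece = {t: [[B[p][j] for j in J[t]] for p in P] for t in range(c)}
        broadcasts["within_cols"] += c
        # Local update on every node, so all nodes share the work.
        for (s, t), Cst in local.items():
            a, b = a_piece[s], b_piece[t]
            for ii in range(len(I[s])):
                for jj in range(len(J[t])):
                    Cst[ii][jj] += sum(a[ii][q] * b[q][jj] for q in range(len(P)))
    # Gather the distributed blocks into one matrix so the result can be checked.
    C = [[0] * n for _ in range(m)]
    for (s, t), Cst in local.items():
        for ii, i in enumerate(I[s]):
            for jj, j in enumerate(J[t]):
                C[i][j] = Cst[ii][jj]
    return C, broadcasts
```

Every node updates its own block of $C$ at every step. This is the load-balance argument made above: no node sits idle as long as *m*, *n*, and *k* are all large relative to the mesh.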

Consider next the case where *m* and *n* are large and *k* is small.
The tables indicate that this case
defaults to a rank-1 update when *k* = 1. From this,
we deduce that if *k* is small, a panel-panel
variant is the appropriate choice,
since in the limit it becomes a rank-1 update.
Notice that for the cases where *A* is transposed
(i.e., $C = A^T B$ and $C = A^T B^T$),
this means that the contributions from *A*
present themselves as row panels, which must be
spread within rows, requiring a transpose of each panel,
as described in Section 1.4.2.
Similarly,
for the cases where *B* is transposed
(i.e., $C = A B^T$ and $C = A^T B^T$),
the contributions from *B*
present themselves as column panels, which must be
spread within columns, also requiring a transpose of each panel,
as described in Section 1.4.2.
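To illustrate the transposed case, here is a hypothetical Python sketch of the panel-panel variant for $C = A^T B$, with $A$ stored in its non-transposed layout. Each step takes a row panel of $A$, transposes it into a column panel of $A^T$, and applies a rank-`kb` update. With `kb = 1`, or when $k = 1$, each step is exactly a rank-1 update. In a parallel code the transpose would involve communication; here it is only a local list operation, and the function names are assumptions.

```python
def transpose(X):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


def rank_k_update(C, U, V):
    """C += U V in place, with U of size m x b and V of size b x n."""
    for i in range(len(C)):
        for j in range(len(C[0])):
            C[i][j] += sum(U[i][q] * V[q][j] for q in range(len(V)))


def at_b_panel_panel(A, B, kb):
    """C = A^T B by the panel-panel variant, with A stored as k x m.

    The column panel of A^T needed at each step is a *row* panel of the stored
    A, so it is transposed before the update.  With kb = 1 every step is a
    rank-1 update.
    """
    k, m, n = len(A), len(A[0]), len(B[0])
    C = [[0] * n for _ in range(m)]
    for p in range(0, k, kb):
        U = transpose(A[p:p + kb])      # m x b column panel of A^T
        rank_k_update(C, U, B[p:p + kb])
    return C


# k = 1: C = A^T B is a single rank-1 (outer-product) update.
print(at_b_panel_panel([[1, 2, 3]], [[4, 5]], 1))   # [[4, 5], [8, 10], [12, 15]]
```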

When *m* and *k* are large and *n* is small, the tables indicate that the operation
defaults to a matrix-vector multiply when *n* = 1. From this,
we deduce that if *n* is small,
a matrix-panel variant is the appropriate choice,
since it reduces to exactly that matrix-vector multiply in the limit.
For reasons similar to those in the previous case,
depending on whether *A* and *B* are
to be transposed, the views of matrices *C* and *B*
change, and the reduction and spreading may require a
transpose operation for each panel of *C* and/or *B*.
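A hypothetical serial sketch of the matrix-panel variant for $C = AB^T$ shows how the view of $B$ changes. Column $j$ of $B^T$ is row $j$ of the stored $B$. For simplicity, each panel is processed one column at a time, so the work is a sequence of matrix-vector multiplies, and for $n = 1$ it is a single one. The function names are illustrative assumptions.

```python
def matvec(A, x):
    """y = A x for a list-of-lists A and a list x."""
    return [sum(a * xv for a, xv in zip(row, x)) for row in A]


def a_bt_matrix_panel(A, B, nb):
    """C = A B^T by the matrix-panel variant; A is m x k and B is stored n x k.

    Column j of B^T is row j of the stored B, so a column panel of B^T is a
    row panel of B viewed transposed.  Each column of the panel costs one
    matrix-vector multiply with all of A.
    """
    m, n = len(A), len(B)
    C = [[0] * n for _ in range(m)]
    for j0 in range(0, n, nb):            # one column panel of C at a time
        for j in range(j0, min(j0 + nb, n)):
            c_j = matvec(A, B[j])         # A times column j of B^T
            for i in range(m):
                C[i][j] = c_j[i]
    return C


A = [[1, 2], [3, 4], [5, 6]]
print(a_bt_matrix_panel(A, [[1, 1], [0, 2]], 1))   # [[3, 4], [7, 8], [11, 12]]
print(a_bt_matrix_panel(A, [[1, 1]], 1))           # n = 1: [[3], [7], [11]]
```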

When *m* is large and *n* and *k* are small, the tables indicate that the operation
defaults to an axpy-like operation when *n* = 1
and *k* = 1. From this,
we deduce that if *n* and *k* are small, this operation
will be at least as efficient as the appropriate
case of the axpy operation.
One can envision an efficient implementation that
redistributes panels of rows and/or columns as
multivectors and performs axpy-like operations
using all nodes.
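The axpy-like structure is easy to see serially. With $A$ of size $m \times k$ and $B$ of size $k \times n$, column $j$ of $C$ is accumulated as $c_j \leftarrow c_j + b_{pj} a_p$ over $p$, where $b_{pj}$ is the $(p, j)$ entry of $B$ and $a_p$ is column $p$ of $A$. That is $nk$ axpy operations of length $m$. The Python sketch below assumes plain lists; storing columns as separate lists is only a stand-in for the multivector redistribution mentioned above, and the names are hypothetical.

```python
def axpy(alpha, x, y):
    """y := alpha * x + y, updating y in place."""
    for i in range(len(y)):
        y[i] += alpha * x[i]


def small_nk_product(A, B):
    """C = A B for m x k A and k x n B with n and k small.

    Columns are held as separate lists (a simple stand-in for a multivector),
    so C is built from n * k axpy operations of length m, the large dimension.
    """
    m, k, n = len(A), len(B), len(B[0])
    a_cols = [[A[i][p] for i in range(m)] for p in range(k)]
    c_cols = [[0] * m for _ in range(n)]
    for j in range(n):
        for p in range(k):
            axpy(B[p][j], a_cols[p], c_cols[j])   # c_j += b_pj * a_p
    # Convert back to row-major lists of lists.
    return [[c_cols[j][i] for j in range(n)] for i in range(m)]


print(small_nk_product([[1], [2], [3]], [[5]]))   # n = k = 1: [[5], [10], [15]]
```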

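The case where *m* is small and *n* and *k* are large is similar to the one where *m* is large
and *n* is small, with the roles of rows and columns interchanged. This time, a panel-matrix variant
appears appropriate, since it reduces to a vector-matrix multiply when *m* = 1.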

The case where *m* and *k* are small and *n* is large is similar to the one where *m* is large
and *n* and *k* are small, suggesting an axpy-like approach.

When *m* and *n* are small and *k* is large, the operation
defaults to an inner product-like operation when *m* = 1
and *n* = 1. One approach could be to redistribute
panels of rows and/or columns as multivectors and to
perform inner product-like operations using all nodes.
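A serial simulation conveys the inner product-like approach. The long $k$ dimension is split across a hypothetical number of `nodes`, each node forms partial inner products for all $mn$ entries of $C$, and the partial results are summed, which stands in for a reduction across nodes. The contiguous split and the function names are assumptions for illustration.

```python
def block_ranges(size, parts):
    """Split range(size) into `parts` contiguous, nearly equal index ranges."""
    return [range(q * size // parts, (q + 1) * size // parts) for q in range(parts)]


def inner_product_like(A, B, nodes):
    """C = A B for small m and n and large k, simulated on `nodes` nodes.

    Each node owns a contiguous slice of the k dimension and forms partial
    inner products for every entry of C; summing the partial m x n results
    plays the role of the reduction across nodes.
    """
    m, k, n = len(A), len(B), len(B[0])
    partials = [
        [[sum(A[i][p] * B[p][j] for p in piece) for j in range(n)] for i in range(m)]
        for piece in block_ranges(k, nodes)
    ]
    # Reduction: element-wise sum of the per-node partial results.
    return [[sum(P[i][j] for P in partials) for j in range(n)] for i in range(m)]


# m = n = 1: a single inner product of length 5, split over 2 nodes.
print(inner_product_like([[1, 2, 3, 4, 5]], [[1]] * 5, 2))   # [[15]]
```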

This operation becomes the totally degenerate case of
a simple scalar multiplication when
*m* = *n* = *k* = 1. From this,
we deduce that if *m*, *n*, and *k* are all small,
one should consider performing it on a single processor,
i.e., redistributing the matrices as multiscalars.
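The eight cases discussed in this section can be summarized as a lookup. In this hypothetical Python sketch, `threshold` is an assumed cutoff for what counts as "small" (the section gives no value), and the returned strings merely paraphrase the recommendations above.

```python
def suggested_approach(m, n, k, threshold):
    """Map an m x n x k multiply to the approach suggested in this section.

    `threshold` is a hypothetical cutoff below which a dimension counts as
    small; it is an assumption, not a value from the text.
    """
    small_m, small_n, small_k = (d < threshold for d in (m, n, k))
    table = {
        (False, False, False): "variants from previous subsections",
        (False, False, True): "panel-panel (rank-1 update in the limit)",
        (False, True, False): "matrix-panel (matrix-vector multiply in the limit)",
        (False, True, True): "axpy-like operations on multivectors",
        (True, False, False): "panel-matrix (vector-matrix multiply in the limit)",
        (True, False, True): "axpy-like operations on multivectors",
        (True, True, False): "inner product-like operations on multivectors",
        (True, True, True): "single processor (multiscalars)",
    }
    return table[(small_m, small_n, small_k)]


print(suggested_approach(10000, 10000, 8, threshold=64))
```

With these assumed sizes the call prints `panel-panel (rank-1 update in the limit)`. In practice the boundary between small and large would depend on the mesh size and the machine, which is why it is left as a parameter here.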

**Table:** Different possible cases for matrix-matrix multiply, Part I.

**Table:** Different possible cases for matrix-matrix multiply, Part II.