Next: 1.6.2 Attaining better performance Up: 1.6 Implementation of Basic Previous: 1.6 Implementation of Basic

1.6.1 Matrix-matrix multiplication

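C = A B: Parallelizing C = A B becomes straightforward when one observes that

$$ C = A B = a_1 \hat{b}_1^T + a_2 \hat{b}_2^T + \cdots + a_k \hat{b}_k^T , $$

where $a_p$ denotes the $p$-th column of A, $\hat{b}_p^T$ the $p$-th row of B, and $k$ the number of columns of A.

Thus the parallelization of this operation can proceed as a sequence of rank-1 updates, with the vectors y and x equal to the appropriate column and row of matrices A and B, respectively.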
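The rank-1 formulation of C = A B can be illustrated with a small serial sketch. It is not the parallel implementation: it ignores how matrices and vectors are distributed over the mesh of nodes, and the function name `matmul_rank1` and the list-of-rows matrix layout are assumptions made only for this example.

```python
def matmul_rank1(A, B):
    """Form C = A B as a sum of rank-1 updates (serial sketch)."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        y = [A[i][p] for i in range(m)]  # y = column p of A
        x = B[p]                         # x = row p of B
        for i in range(m):
            for j in range(n):
                C[i][j] += y[i] * x[j]   # rank-1 update C += y x^T
    return C


A = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # 3 x 2
B = [[1.0, 0.0, 2.0], [0.0, 1.0, 3.0]]     # 2 x 3
C = matmul_rank1(A, B)                     # 3 x 3 product
```

Each pass through the outer loop corresponds to one rank-1 update in the sequence described above.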

C = A B^T: For this case, we note that

$$ c_j = A \hat{b}_j , $$

where $c_j$ denotes the $j$-th column of C and $\hat{b}_j^T$ the $j$-th row of B.

This time, the parallelization of the operation can proceed as a sequence of matrix-vector multiplications, with the vectors y and x equal to the appropriate column and row of matrices C and B, respectively.
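As a serial illustration of this column-by-column scheme for C = A B^T, the following sketch computes each column of C with one matrix-vector multiplication. The helper names `matvec` and `matmul_abt` are hypothetical, matrices are assumed to be lists of rows, and no data distribution is modeled.

```python
def matvec(A, x):
    """Return y = A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def matmul_abt(A, B):
    """Form C = A B^T, one column of C per matrix-vector multiplication."""
    m, n = len(A), len(B)
    C = [[0.0] * n for _ in range(m)]
    for j in range(n):
        y = matvec(A, B[j])      # x = row j of B, y = column j of C
        for i in range(m):
            C[i][j] = y[i]
    return C
```

Note that B^T is never formed explicitly: row j of B is used directly as the input vector.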

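C = A^T B: Notice that C = A^T B is equivalent to computing C^T = B^T A, and thus the computation can proceed by computing

$$ \hat{c}_i = B^T a_i , $$

where $\hat{c}_i^T$ denotes the $i$-th row of C and $a_i$ the $i$-th column of A.

The matrix-vector multiplication schemes described earlier can be easily adjusted to accommodate this special case.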
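One way to read this case is that each row of C results from a transposed matrix-vector multiplication with a column of A. The serial sketch below follows that reading; it is an assumption about the intended scheme rather than a statement of it, and the names `matvec_transposed` and `matmul_atb` are hypothetical.

```python
def matvec_transposed(B, x):
    """Return y = B^T x without forming B^T explicitly."""
    n = len(B[0])
    y = [0.0] * n
    for p, row in enumerate(B):
        for j in range(n):
            y[j] += row[j] * x[p]
    return y


def matmul_atb(A, B):
    """Form C = A^T B, one row of C per transposed matrix-vector multiply."""
    k, m = len(A), len(A[0])
    C = []
    for i in range(m):
        a_i = [A[p][i] for p in range(k)]     # column i of A
        C.append(matvec_transposed(B, a_i))   # row i of C
    return C
```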

C = A^T B^T: On the surface, this operation appears quite straightforward:

$$ C = A^T B^T = \hat{a}_1 b_1^T + \hat{a}_2 b_2^T + \cdots + \hat{a}_k b_k^T , $$

where $\hat{a}_p^T$ denotes the $p$-th row of A, $b_p$ the $p$-th column of B, and $k$ the number of rows of A.

Thus the parallelization of this operation can proceed as a sequence of rank-1 updates, with the vectors y and x equal to the appropriate row and column of matrices A and B, respectively. However, notice that the required spreading of vectors is quite different from that in the C = A B case: here a matrix row must be spread within rows of nodes and a matrix column within columns of nodes. Without the observations made in Section 1.4.2, namely that matrix rows and columns can be spread by first redistributing them like the inducing vector distribution, this operation is by no means trivial when the mesh of nodes is non-square.
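The arithmetic of this last case, though not the spreading of vectors that makes it hard in parallel, can be checked with a short serial sketch. The function name `matmul_atbt` and the list-of-rows layout are assumptions for illustration only.

```python
def matmul_atbt(A, B):
    """Form C = A^T B^T as a sum of rank-1 updates (serial sketch)."""
    k, m, n = len(A), len(A[0]), len(B)
    C = [[0.0] * n for _ in range(m)]
    for p in range(k):
        y = A[p]                          # y = row p of A
        x = [B[j][p] for j in range(n)]   # x = column p of B
        for i in range(m):
            for j in range(n):
                C[i][j] += y[i] * x[j]    # rank-1 update C += y x^T
    return C
```

Compared with the C = A B case, the roles of rows and columns are swapped: y is taken from a row of A and x from a column of B.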


rvdg@cs.utexas.edu