rocWMMA: rocWMMA (rocWMMA)
rocWMMA:
rocWMMA: rocWMMA is a C++ library for accelerating mixed-precision matrix
rocWMMA: multiply-accumulate (MMA) operations on AMD GPU hardware.
rocWMMA: rocWMMA makes it easier to break down MMA problems into fragments and
rocWMMA: distribute block-wise MMA operations in parallel across GPU
rocWMMA: wavefronts. The API is a header library that can be used to
rocWMMA: compile MMA acceleration directly into GPU kernel device code. This
rocWMMA: lets the compiler optimize the generated kernel assembly and avoids
rocWMMA: the overhead of linking to external runtime libraries or launching
rocWMMA: separate kernels.