Step-by-step tutorial on optimizing row-major GEMM on OpenCL GPU platforms (OpenCL >= 1.2). Tested on Khadas VIM4 (A311D2), Intel i7-12700K, Apple M1, and StarFive Vision2.