matrix-multiplication - algorithm for matrix matrix multiplication of size o(100)

Question

I realize this is a niche question, but does anyone know of an algorithm for matrix-matrix multiplication that performs really well (meaning it achieves a high FLOP rate on the CPU, or possibly the GPU) for matrices between 100x100 and 500x500?

I know xgemm and xgemm3m are nice, but unfortunately they only reach their high FLOP rates for matrices larger than 1000x1000.

thanks for the help :)

score 1 · Accepted Answer

Not an answer, but too long for a comment.

I think you are drawing the wrong conclusion from the Intel data. You seem to be thinking

Ah-ha, dgemm can pull 300 GFLOP/s for large matrices but only a miserable 100 GFLOP/s for small matrices; so where is the method for multiplying small matrices at 300 GFLOP/s?

whereas I think along these lines

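Ah-ha, dgemm is most efficient on large arrays; hmm, I wonder if there is a fixed cost to calling it, which shows up as relatively poorer performance on smaller job sizes. And I expect that if there were faster algorithms for those small matrices, the clever people at Intel would have implemented them and made dgemm smart enough to choose the right internal code path for any given problem size. After all, dense matrix multiplication is a key part of LINPACK which, for all its faults, is commonly used to benchmark high-performance computers, and Intel is very motivated to demonstrate the superiority of its machines by using those benchmarks.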
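To make the fixed-cost idea concrete, here is a minimal sketch of a toy model, not a measurement. It assumes each call pays a constant overhead and then computes at the large-matrix rate. The 300 GFLOP/s peak is the figure quoted above; the 10 microsecond overhead and the function names are hypothetical, chosen only for illustration.

```python
def gemm_flops(n: int) -> float:
    """Floating-point operations for an n x n by n x n multiply (2 * n^3)."""
    return 2.0 * n ** 3


def effective_gflops(n: int, peak_gflops: float, overhead_s: float) -> float:
    """Achieved rate when each call pays a fixed overhead plus compute time at peak."""
    flops = gemm_flops(n)
    compute_s = flops / (peak_gflops * 1e9)
    return flops / (compute_s + overhead_s) / 1e9


if __name__ == "__main__":
    PEAK_GFLOPS = 300.0  # large-matrix figure quoted in the answer
    OVERHEAD_S = 10e-6   # hypothetical fixed cost per call, not measured
    for n in (100, 200, 500, 1000, 2000):
        rate = effective_gflops(n, PEAK_GFLOPS, OVERHEAD_S)
        print(f"n={n:5d}  {rate:7.1f} GFLOP/s")
```

Under this model the work grows as n^3 while the overhead stays constant, so the effective rate climbs toward the peak as n grows. Whether a real dgemm behaves this way on a given machine is something only a benchmark can show.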

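Now, I'm not suggesting that you are not as clever as the folk at Intel, and my line of thinking may be flawed, but I tell you, you are going to have a very hard time writing or acquiring a code which outperforms dgemm on small matrices on Intel hardware. I look forward to seeing evidence that I am wrong about this.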
