performance - Why is MATLAB so fast in matrix multiplication? - Stack Overflow
Pro Tip: cuBLAS Strided Batched Matrix Multiply | NVIDIA Technical Blog
Multiplication Kernel - an overview | ScienceDirect Topics
CS-Tech-Era: TILED Matrix Multiplication Using Shared Memory in CUDA
GitHub - jim-rafferty/cuda-matrix-multiply-mex: A mex function to perform matrix multiplication on an nvidia gpu with a potentially huge improvement in performance depending on hardware available. Matlab's parallel computing toolbox is not required.