Figure: benchmark times relative to C (smaller is better, C performance = 1.0).
C and Fortran compiled with gcc 5.1.1. The C timing is the best across all optimization levels (-O0 through -O3). C, Fortran, and Julia use OpenBLAS v0.2.14. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.9.2) functions; the rest are pure Python implementations. Plot created with Gadfly and IJulia from this notebook.
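
The normalization described in the caption can be sketched as follows. This is a minimal illustration, not the code behind the plot: the function name `relative_to_c` and all timing values are hypothetical placeholders, not measured results. It takes the smallest C time across optimization levels as the baseline and divides every other time by it, so that C is 1.0.

```python
# Placeholder timings in milliseconds for one benchmark.
# These numbers are made up for illustration; they are NOT benchmark results.
c_times_by_opt_level = {"-O0": 4.0, "-O1": 2.5, "-O2": 2.0, "-O3": 2.0}
other_times = {"Fortran": 3.0, "Julia": 2.5, "Python": 50.0}


def relative_to_c(c_times, lang_times):
    """Return each language's time divided by the best (smallest) C time."""
    c_best = min(c_times.values())  # best C timing over all optimization levels
    relative = {"C": 1.0}           # C is the baseline by definition
    for lang, t in lang_times.items():
        relative[lang] = t / c_best  # smaller is better
    return relative


relative = relative_to_c(c_times_by_opt_level, other_times)
for lang, ratio in relative.items():
    print(f"{lang}: {ratio:.2f}")
```

A ratio below 1.0 would mean the language ran faster than the best C build on that benchmark; a ratio above 1.0 means it ran slower.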