Armadillo 3.0.0 has just been released on 09-04-2012. The release archive (armadillo-3.0.0.tar.gz) includes prebuilt versions of the BLAS and LAPACK libraries. These are necessary because Armadillo relies on BLAS and LAPACK for its core linear algebra operations. However, the prebuilt BLAS and LAPACK use only one core of the machine, so we can swap in multicore versions to improve Armadillo's performance. There are three BLAS implementations we can use: Intel's MKL (not free), AMD's ACML (better on AMD machines), and OpenBLAS, an updated version of GotoBLAS.

After downloading the zip file containing OpenBLAS, I used MinGW to build it on my laptop, which runs Windows 7 with Visual Studio 2010, and got the result files libopenblas_penrynp-r0.1.0.dll and libopenblas_penrynp-r0.1.0.lib. I downloaded the LAPACK files (lapack.dll and lapack.lib) from the LAPACK homepage, then replaced blas_win32_MT.dll, blas_win32_MT.lib, lapack_win32_MT.dll and lapack_win32_MT.lib with my files. Now Armadillo runs faster because it uses both cores of my machine (not for all operations, of course).
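Why does a multicore BLAS help? Operations such as matrix multiplication can be split into independent pieces that run on different cores. The sketch below only illustrates that idea; it is not how OpenBLAS is actually implemented (OpenBLAS uses blocked, architecture-tuned kernels). It partitions the rows of C = A * B into contiguous blocks and computes each block on its own std::thread. The Matrix type and the multiply_rows and parallel_multiply functions are hypothetical names introduced for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Dense row-major matrix: element (i, j) is stored at data[i * cols + j].
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Compute rows [row_begin, row_end) of C = A * B.
// Each thread writes a disjoint set of rows of C, so no locking is needed.
static void multiply_rows(const Matrix& A, const Matrix& B, Matrix& C,
                          std::size_t row_begin, std::size_t row_end) {
    for (std::size_t i = row_begin; i < row_end; ++i) {
        for (std::size_t k = 0; k < A.cols; ++k) {
            const double a_ik = A(i, k);
            for (std::size_t j = 0; j < B.cols; ++j) {
                C(i, j) += a_ik * B(k, j);
            }
        }
    }
}

// C = A * B, with the rows of C split into contiguous blocks, one block per thread.
Matrix parallel_multiply(const Matrix& A, const Matrix& B, unsigned num_threads) {
    assert(A.cols == B.rows && "inner dimensions must match");  // debug builds only
    Matrix C(A.rows, B.cols);
    if (num_threads == 0) num_threads = 1;  // hardware_concurrency() may report 0
    // Never start more threads than there are rows to compute.
    num_threads = static_cast<unsigned>(
        std::min<std::size_t>(num_threads, std::max<std::size_t>(A.rows, 1)));
    const std::size_t chunk = (A.rows + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(A.rows, begin + chunk);
        if (begin >= end) break;
        workers.emplace_back(multiply_rows, std::cref(A), std::cref(B), std::ref(C), begin, end);
    }
    for (auto& w : workers) w.join();  // wait until every block of C is finished
    return C;
}
```

For example, parallel_multiply(A, B, std::thread::hardware_concurrency()) uses one thread per reported core. Because every element of C is summed in the same order regardless of the thread count, the result is identical to the single-threaded one. With GCC, compile with -pthread.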
Important note: #define ARMA_USE_LAPACK and #define ARMA_USE_BLAS must be enabled in Armadillo's config.hpp file.
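As a minimal sketch, the two lines should look like this once enabled, that is, active and not commented out. The surrounding contents of config.hpp vary between Armadillo versions, so this shows only the relevant lines, not a verbatim copy of the file:

```cpp
// Armadillo config.hpp: both defines must be active so that Armadillo
// calls the external LAPACK and BLAS libraries (here, the multicore ones).
#define ARMA_USE_LAPACK
#define ARMA_USE_BLAS
```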
If you enable the TBB option, add the code below to Armadillo's include\armadillo file: