I recently submitted inline assembly versions of the Matrix * Matrix and Matrix * Vector functions for ARM NEON to the Oolong Engine project.

Here is the full source. Note that the functions assume column-major matrices.
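As a quick aside on the layout: column-major means the four floats of each column sit next to each other in memory, so element (row r, column c) of a 4x4 matrix lives at index `c * 4 + r`. The plain C sketch below illustrates that convention only. It is not the NEON code, and the function names `mat4_mul_vec4_ref` and `mat4_mul_mat4_ref` are hypothetical, made up for this example. Something like it could serve as a scalar reference to compare the assembly output against.

```c
#include <stddef.h>

/* Hypothetical scalar reference, NOT the NEON routines from this post.
   Column-major 4x4: element (row r, column c) is stored at m[c * 4 + r]. */
static void mat4_mul_vec4_ref(const float m[16], const float v[4], float out[4])
{
    for (size_t r = 0; r < 4; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < 4; ++c) {
            /* Walk along row r: one element from each column. */
            sum += m[c * 4 + r] * v[c];
        }
        out[r] = sum;
    }
}

/* out = a * b, all matrices column-major. out must not alias a or b. */
static void mat4_mul_mat4_ref(const float a[16], const float b[16], float out[16])
{
    for (size_t c = 0; c < 4; ++c) {
        /* Column c of the product is a times column c of b. */
        mat4_mul_vec4_ref(a, &b[c * 4], &out[c * 4]);
    }
}
```

With this layout a translation matrix keeps its offsets in elements 12, 13 and 14, and a matrix product reduces to four matrix * vector products, one per column of the right-hand matrix.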