To avoid having to change the API in an incompatible way,
a arm_mat_mult_opt_q31 was introduced and is providing a faster implementation
to use with Helium (but requiring more storage for intermediate results).
Some improvements to tests for matrix functions added.