In response to a question on this blog, here are a bunch of ARM/NEON/SIMD resources that I have accrued in my bookmarks over the last few months.

There are essentially three approaches in GCC, which trade off power and flexibility against ease of use.

  1. Assembly (standalone or inline),
  2. Compiler intrinsics, and
  3. “Automatic” compiler vectorization.
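
To make the trade-off a little more concrete, here is a rough sketch of approaches 2 and 3 applied to the same toy problem: adding two float arrays. The function names (`add_f32_auto`, `add_f32_intrinsics`) and the `HAVE_NEON` macro are made up for this illustration. The intrinsics path is only compiled when the compiler defines `__ARM_NEON`, so the file still builds (using the scalar loop) on other targets. Approach 1 is left out because hand-written assembly depends heavily on the exact target and calling convention.

```c
#include <stddef.h>

/* HAVE_NEON is a hypothetical helper macro for this sketch only. */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Approach 3: a plain C loop. Whether the compiler actually vectorizes
 * it depends on the target, flags, and compiler version. The restrict
 * qualifiers promise that the arrays do not overlap, which makes the
 * loop easier to vectorize. */
void add_f32_auto(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Approach 2: NEON intrinsics, four floats per iteration, with a
 * scalar loop for the leftover elements (and for non-NEON builds). */
void add_f32_intrinsics(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
#if HAVE_NEON
    for (; i + 4 <= n; i += 4) {
        float32x4_t va = vld1q_f32(a + i);      /* load 4 floats from a */
        float32x4_t vb = vld1q_f32(b + i);      /* load 4 floats from b */
        vst1q_f32(dst + i, vaddq_f32(va, vb));  /* add lanes and store  */
    }
#endif
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The intrinsics version spells out the vector width and the tail handling by hand, which is where the extra control (and the extra work) comes from. The plain loop leaves all of that to the compiler; with GCC, `-O3` turns on `-ftree-vectorize`, but it is worth checking the generated assembly to see whether vector instructions were actually emitted.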

The first link explains the differences:

Happy reading, and let me know if you find anything else useful.