The ARMv7 CPU (Cortex-A8) used in the iPhone 3GS is a very nice CPU. One of the things it can do is real SIMD, through its NEON instruction set, and GCC exposes those instructions to C code as intrinsics. Although Apple don’t document this, the fine folks who made GCC do.

Here’s how to demonstrate this in Xcode.

0. Create a project.

1. Set Target->Architecture to “Optimized (armv6 armv7)”

This builds a fat binary with two executables – one for the older ARMv6 architecture and one for the ARMv7 (Cortex-A8/NEON) architecture.

2. Set Other C Flags to “-mfloat-abi=softfp -mfpu=neon”.

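These are the flags that the “arm_neon.h” header asks for: if NEON is not enabled, the header stops the build with an error message that names them. I’m guessing that they are ignored for the armv6 binary.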
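A rough way to check which slice actually ends up with NEON enabled, rather than guessing: GCC defines the __ARM_NEON__ macro when NEON code generation is on (newer compilers spell it __ARM_NEON). The function name built_with_neon below is made up for this sketch; it is not part of any SDK.

```c
/* Hypothetical probe: reports whether this file was compiled with NEON on.
 * Assumes the compiler defines __ARM_NEON__ (older GCC) or __ARM_NEON
 * (newer toolchains) when -mfpu=neon or an equivalent is in effect. */
int built_with_neon(void)
{
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    return 1;   /* NEON intrinsics are available in this build */
#else
    return 0;   /* no NEON: the flags were ignored or not passed */
#endif
}
```

Logging the result once at startup from each slice of the fat binary would show whether the flags took effect where you expect.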

3. Include preprocessor guards in the source to make sure the intrinsics are only compiled in for armv7. See the following snippet:

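#ifdef _ARM_ARCH_7
/* Only the armv7 slice of the fat binary sees the NEON code. */
#include <arm_neon.h>
/* Multiply all four float lanes of v by the scalar f. */
float32x4_t scale( float32x4_t v, float f )
{
  return vmulq_n_f32( v, f );
}
#endif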

Note: At the moment I can only make this code compile as C. There seems to be an internal GCC issue when compiling it as C++.
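In practice the data usually lives in a plain float array rather than in a float32x4_t, so here is a minimal sketch of how the same intrinsic might be used on an array, with a scalar fallback so the one source file still builds for armv6. The helper name scale_array is invented for this example, and the guard uses __ARM_NEON__ / __ARM_NEON on the assumption that the compiler defines one of them when NEON is enabled. vld1q_f32 and vst1q_f32 are the standard load and store intrinsics from “arm_neon.h”.

```c
#include <stddef.h>

#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: multiply n floats in place by f. */
void scale_array(float *data, size_t n, float f)
{
    size_t i = 0;
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
    /* NEON path: process four floats per iteration. */
    for (; i + 4 <= n; i += 4) {
        float32x4_t v = vld1q_f32(data + i);   /* load 4 floats   */
        v = vmulq_n_f32(v, f);                 /* scale all lanes */
        vst1q_f32(data + i, v);                /* store 4 floats  */
    }
#endif
    /* Scalar path: handles the leftover tail, or the whole array
     * when NEON is not available. */
    for (; i < n; ++i)
        data[i] *= f;
}
```

Calling scale_array(samples, count, 0.5f) then behaves the same on both slices; only the armv7 build takes the four-at-a-time loop.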

4. Choose “Build->Show Assembly Code”. You should see a “vmul.f32    x,  y, z” assembly instruction buried amongst the stack maintenance code.

Good news for mobile gamers. NEON, together with proper shaders, should help the next wave of iPhone games leapfrog the quality of other handhelds.