The ARMv7 CPU (Cortex-A8) used in the iPhone 3GS is a very nice CPU. One of the things it can do is real SIMD, through its NEON unit, and the compiler exposes this as intrinsics. Although Apple don’t document this, the fine folks who made GCC do.
Here’s how to demonstrate this in Xcode.
0. Create a project.
1. Set Target->Architecture to “Optimized (armv6 armv7)”
This builds a fat binary with two executables: one for the older ARM architecture (armv6) and one for the Cortex/NEON architecture (armv7).
2. Set Other C Flags to “-mfloat-abi=softfp -mfpu=neon”.
These are the flags that the “arm_neon.h” header asks for. I’m guessing that they are ignored for the armv6 binary.
3. Include preprocessor guards in the source to make sure the intrinsics are only compiled in for armv7. See the following snippet:
#ifdef _ARM_ARCH_7
#include <arm_neon.h>
float32x4_t scale( float32x4_t v, float f )
{
    return vmulq_n_f32( v, f );
}
#endif
Note: At the moment I can only make this code compile as C; there seems to be an internal GCC issue when compiling it as C++.
4. Choose “Build->Show Assembly Code”. You should see a “vmul.f32 x, y, z” assembly instruction buried amongst the stack maintenance code.
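The guard from step 3 can be taken one step further so that the same function exists in both halves of the fat binary. The following is only a sketch under a few assumptions: `scale4` and `HAVE_NEON` are hypothetical names made up for illustration, and the guard tests `__ARM_NEON__` (predefined by GCC when NEON code generation is enabled, e.g. with -mfpu=neon) rather than the architecture macro used above. When NEON is not available, it falls back to a plain scalar loop.

```c
/* __ARM_NEON__ is predefined by GCC when NEON is enabled;
   newer compilers also define __ARM_NEON. */
#if defined(__ARM_NEON__) || defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical macro, for this example only */
#else
#define HAVE_NEON 0
#endif

/* scale4 (hypothetical helper): out[i] = in[i] * f for four floats. */
void scale4( const float *in, float f, float *out )
{
#if HAVE_NEON
    float32x4_t v = vld1q_f32( in );        /* load four floats into one register */
    vst1q_f32( out, vmulq_n_f32( v, f ) );  /* multiply every lane by f and store */
#else
    int i;
    for ( i = 0; i < 4; ++i )               /* scalar fallback, e.g. for armv6 */
        out[i] = in[i] * f;
#endif
}
```

Calling `scale4( in, 0.5f, out )` on an array holding 1, 2, 3, 4 should leave 0.5, 1, 1.5, 2 in `out` on either path, so callers don’t need their own guards.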
Good news for mobile gamers. NEON, together with proper shaders, should help the next wave of iPhone games leapfrog the quality of other handhelds.
Hi,
Great post about NEON! Thanks! Have you gotten it to work in C++ at all yet? I’m having the same issue.
-Jay
Nope, sorry. I logged a bug with Apple (#7022698).
A possible workaround is to try SDK 3.1 – I’m waiting for it to come out of beta, and in the meantime using asm directly.
I’ll be sure to post here when I find out!
Thanks for this Justin. You wouldn’t happen to have any pointers to using the intrinsics in arm_neon.h would you?
I’m interested in doing a bit of interactive mesh distortions on the iPhone and NEON looks like it can deliver the real-time speed I need. I’m a bit overwhelmed by the reams of intrinsic code in that header file at the moment, though.
Cheers,
Doug
@dugla
It looks to be due to C++’s stricter type safety. If you cast that float to a float32_t (which is defined in the arm_neon.h header), then it works in g++.
Jesse – changing float->float32_t didn’t work for me.
Doug – I’ll post a list of resources soon.