Ever spent a tonne of time vectorizing a function by hand, only to find that playing with the compiler settings does just as good a job? Me too. Here are some things to check when compiling for performance (note that this is mostly from my own experience on the iPhone):

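  • Always use -ffast-math if you can live without strict IEEE floating-point behaviour (without it, operations such as C99 complex multiplication and division are replaced by library function calls so that NaNs and infinities are handled correctly)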
  • "Finally, when vectorizing try -ftree-vectorizer-verbose=6, that prints out diagnostics why the compiler was unable to vectorize some loop. Often sprinkling some restrict over your pointers is enough." - Job on Arstechnica
  • Always use -ftree-vectorize (it's on with -O3, but not with -Os)
  • Try turning off Thumb mode with -marm (ARM processors)
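
To make the "sprinkle some restrict" advice concrete, here is a minimal sketch of the kind of loop it tends to help. The function name scale_add and its parameters are made up for illustration, and whether the loop actually gets vectorized depends on your compiler version and target, so treat it as something to experiment with rather than a guarantee.

```c
#include <stddef.h>

/* Hypothetical example: scale_add and its parameters are illustrative only.
 *
 * The restrict qualifiers promise the compiler that dst, a and b never
 * overlap. Without that promise it has to assume a store to dst[i] might
 * change a later a[j] or b[j], which can stop it from vectorizing the loop
 * (or force it to add runtime overlap checks).
 *
 * One way to try it, using the flags from the list above:
 *   gcc -std=c99 -O3 -ffast-math -ftree-vectorize \
 *       -ftree-vectorizer-verbose=6 -c scale_add.c
 * Newer GCC releases report the same kind of diagnostics through
 * -fopt-info-vec-missed instead.
 */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        dst[i] = a[i] * k + b[i];
    }
}
```

Remember that restrict is a promise you make, not something the compiler checks: if a caller passes overlapping buffers, the behaviour is undefined. Comparing the vectorizer diagnostics with and without the qualifiers is a quick way to see whether aliasing was the thing holding the loop back.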