At this point I am wondering whether there is any optimization left for me to do. What I want is for the compiler to tell me what it is doing, so that I can focus on the areas it is not optimizing. To do this I need to learn the build system, which in this case is CMake.
Learning CMake
After a bit of reading, I now understand the basics of how CMake generates a Makefile. I have included some helpful references at the end.
I found the CMAKE_CXX_FLAGS variable, which lets me set compiler flags when I run cmake:
cmake ../ -DCMAKE_CXX_FLAGS="-O2 -DNDEBUG -rdynamic -ftree-vectorizer-verbose=2"
Alternatively, I could add this line to CMakeLists.txt:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ftree-vectorizer-verbose=2 ")
To see the exact compiler commands and their output, I build with:
make VERBOSE=1
However, I do not see any output from the -ftree-vectorizer-verbose option.
At this point I am not sure how to get -ftree-vectorizer-verbose output. I also tried compiling by hand with the gcc commands I copied from the make VERBOSE=1 output, but that produced no vectorizer output either.
Testing O3
I compared a build using -O3 against a build using the default optimization flags.
The results are very similar; however, in all my runs of these tests -O3 was slightly slower than the default flags -O2 -DNDEBUG -rdynamic.
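Because the gap between the two builds is small, the comparison is sensitive to measurement noise. One way to reduce that noise is to repeat the workload several times and keep the fastest run. The helper below is only a sketch of that idea: time_best_of is a hypothetical name, and it is not part of the project's own test suite.

```cpp
#include <algorithm>
#include <chrono>
#include <limits>

// Runs `work` `repeats` times and returns the fastest wall-clock time in
// seconds. Returns +infinity when repeats is zero or negative.
template <typename Work>
double time_best_of(int repeats, Work work) {
    using clock = std::chrono::steady_clock;  // monotonic, suited to intervals
    double best = std::numeric_limits<double>::infinity();
    for (int i = 0; i < repeats; ++i) {
        const auto start = clock::now();
        work();
        const std::chrono::duration<double> elapsed = clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

Compiling the same harness once per flag set and comparing the returned values would show whether the -O3 slowdown is consistent or within run-to-run variation.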
I am going to try a few more approaches to optimize this; however, I am not sure I can make any improvements to this code.
CMake Resources:
https://cmake.org/cmake-tutorial/
https://learnxinyminutes.com/docs/cmake/
https://www.aosabook.org/en/cmake.html