Project Investigation Part 2: Can I even optimize this?

At this point I am wondering whether there is any optimization left for me to do. I want the compiler to tell me what it is already doing, so that I can focus on the areas it is not optimizing. To do that I need to learn the build system, which in this case is CMake.

Learning CMake

After a bit of reading, I now understand the basics of how CMake generates a Makefile. I have included some helpful references at the end.

I found the CMAKE_CXX_FLAGS variable, which lets me set compiler flags when configuring the build:


cmake ../ -DCMAKE_CXX_FLAGS="-O2 -DNDEBUG -rdynamic -ftree-vectorizer-verbose=2"

Alternatively, I could add this line to CMakeLists.txt:

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -ftree-vectorizer-verbose=2 ")

I can see the full compiler command lines, including these flags, with
make VERBOSE=1
However, I do not see any output from the -ftree-vectorizer-verbose option.

At this point I am not sure how to get output from -ftree-vectorizer-verbose. I also tried compiling directly with the gcc commands I copied from the make VERBOSE=1 output, and that produced no vectorizer output either.

Testing O3

Next I tested -O3 against the default optimization flags.

Default:

O3:

These results are very similar; however, in all my runs of these tests O3 was slightly slower than the default flags (-O2 -DNDEBUG -rdynamic).

I am going to try a few more approaches to optimize this; however, I am not sure I can make any improvements to this code.

CMake Resources:
https://cmake.org/cmake-tutorial/
https://learnxinyminutes.com/docs/cmake/
https://www.aosabook.org/en/cmake.html