In this lab we compare the runtime of different algorithms for adjusting the volume of sound samples.
Our code:
This header file defines the number of samples, so that it is consistent across every version and run.
Header
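A minimal sketch of what that header might contain (the file name vol.h and the macro name SAMPLES are my own guesses; the value was changed for the later tests):

// vol.h - one shared sample count so every version processes the same amount of data
#define SAMPLES 100000000   // 100 million 16-bit samples, roughly 200 MB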
The first implementation loops through all the samples and multiplies each one by 0.5 (a floating-point multiply).
vol1
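A sketch of this approach, assuming the same scaffolding for every version (the file and variable names and the random test data are mine, not necessarily the original code):

// vol1.c - scale every sample with a floating-point multiply
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "vol.h"

int main() {
    int16_t *samples = malloc(SAMPLES * sizeof(int16_t));

    // fill the buffer with pseudo-random "sound" data
    for (int i = 0; i < SAMPLES; i++)
        samples[i] = (int16_t)(rand() % 65536 - 32768);

    // scale the volume: multiply each sample by 0.5
    for (int i = 0; i < SAMPLES; i++)
        samples[i] = (int16_t)(samples[i] * 0.5);

    // sum and print a result so the optimizer can't discard the work
    long long sum = 0;
    for (int i = 0; i < SAMPLES; i++) sum += samples[i];
    printf("%lld\n", sum);

    free(samples);
    return 0;
}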
The second implementation creates an array with all possible reduced values, then loops through the samples assigning each one its precomputed value.
vol2
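A sketch of the lookup-table version; only the scaling part differs from vol1 (again, names are assumptions):

// vol2.c - precompute every possible scaled value, then replace
// each sample with a table lookup instead of a multiply
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "vol.h"

int main() {
    static int16_t lookup[65536];          // one entry per possible int16_t value (128 KB)
    int16_t *samples = malloc(SAMPLES * sizeof(int16_t));

    for (int i = 0; i < SAMPLES; i++)      // same test data as vol1
        samples[i] = (int16_t)(rand() % 65536 - 32768);

    // build the table, indexed by the sample's unsigned bit pattern
    for (int v = -32768; v <= 32767; v++)
        lookup[(uint16_t)v] = (int16_t)(v * 0.5);

    // scale the volume: one memory lookup per sample
    for (int i = 0; i < SAMPLES; i++)
        samples[i] = lookup[(uint16_t)samples[i]];

    long long sum = 0;                     // keep a result so the loops aren't optimized away
    for (int i = 0; i < SAMPLES; i++) sum += samples[i];
    printf("%lld\n", sum);

    free(samples);
    return 0;
}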
The third implementation loops through all the samples and then does an integer multiplication and a bit shift to get the same result.
vol3
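A sketch of the fixed-point version; the factor 16384 is my illustration of how scaling by 0.5 can be expressed as an integer multiply plus a shift:

// vol3.c - scale with an integer multiply and a bit shift (no floating point)
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "vol.h"

int main() {
    int16_t *samples = malloc(SAMPLES * sizeof(int16_t));

    for (int i = 0; i < SAMPLES; i++)      // same test data as vol1
        samples[i] = (int16_t)(rand() % 65536 - 32768);

    // 0.5 as a fixed-point factor: 16384 / 32768
    // (>> 15 on a negative value assumes an arithmetic shift, which gcc provides)
    int32_t factor = 16384;
    for (int i = 0; i < SAMPLES; i++)
        samples[i] = (int16_t)((samples[i] * factor) >> 15);

    long long sum = 0;                     // keep a result so the loops aren't optimized away
    for (int i = 0; i < SAMPLES; i++) sum += samples[i];
    printf("%lld\n", sum);

    free(samples);
    return 0;
}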
This last file is our control; it does everything except change the samples.
volEmpty
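A sketch of the control: identical setup, but the scaling loop is left out so its cost can be subtracted from the other runs:

// volEmpty.c - control: generates and sums the samples but never scales them
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include "vol.h"

int main() {
    int16_t *samples = malloc(SAMPLES * sizeof(int16_t));

    for (int i = 0; i < SAMPLES; i++)
        samples[i] = (int16_t)(rand() % 65536 - 32768);

    // no volume scaling here
    long long sum = 0;
    for (int i = 0; i < SAMPLES; i++) sum += samples[i];
    printf("%lld\n", sum);

    free(samples);
    return 0;
}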
Results
All results are the average of 5 test runs with 100,000,000 samples.
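The times reported are the "real" line from the shell's time builtin; each binary was presumably built and timed with something along these lines (the -O3 level is my assumption; the last section of this post shows what happens when optimizations are turned off):

gcc -O3 vol1.c -o vol1
time ./vol1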
And the results:
Control: 0m0.980s
Vol 1 : 0m1.130s
Vol 2 : 0m1.171s
Vol 3 : 0m1.108s
Subtracting the control time, Vol 1 spends 0.150s on the actual scaling.
Vol 2 spends 0.191s.
Vol 3 spends 0.128s.
This result is not surprising. Vol 1 is processor heavy, and this system has a powerful processor. Vol 2 is cache/memory heavy, and with this number of samples there is a lot of data to keep in cache: the samples alone take up around 200 MB. Vol 3 uses integer multiplication and a bit shift, which, while still processor intensive, evidently costs less here than Vol 1's floating-point multiply, which is why it comes out slightly ahead.
The next tests are on an octa-core Cortex-A57 server, which has 80 KiB of L1, 1 MiB of L2, and 8 MB of L3 cache.
Results:
Control: 0m4.925s
Vol 1 : 0m5.165s
Vol 2 : 0m5.391s
Vol 3 : 0m5.128s
Subtracting the control time, Vol 1 spends 0.24s on the scaling.
Vol 2 spends 0.46s.
Vol 3 spends 0.20s.
These results follow the same pattern as before, with Vol 3 slightly faster than Vol 1 and Vol 2 the slowest. On both of these systems there is enough CPU power that redoing the computation is cheaper than the extra memory accesses.
Changing Sample Size
Next I wanted to test with a smaller sample size to see if I could push things to the point where Vol 2 is faster than Vol 1.
For these next tests I am using a 24-core Cortex-A53 server. Since the test runs single-threaded on a lower-power core, there is a better chance that the Vol 2 approach will win. I ran the earlier test here as well and it gave the same relative results.
I reduced the number of samples to 10,000,000 and ran the tests again.
Control: 0m1.071s
Vol 1 : 0m1.080s
Vol 2 : 0m1.315s
Vol 3 : 0m1.071s
These results follow the same pattern as before: Vol 2 is still the slowest, and Vol 1 and Vol 3 are barely distinguishable from the control.
Then I tried increasing the sample count to 1,000,000,000. In this case the processing should take a long time, but hopefully the lookup array in Vol 2 can stay in cache and speed up the process.
Control:
Vol 1 : 1m49.952s
Vol 2 : 2m10.980s
Vol 3 : 1m48.201s
This also did not work, which leaves me with the conclusion that Vol 2 is the slowest algorithm on these systems. Vol 3 is faster than Vol 1 by a small margin, making Vol 3 the fastest, although there could be hardware where this ordering is different.
All of these algorithms can process samples fast enough for CD playback (44,100 samples per second per channel, i.e. 88,200 stereo samples per second, or 176,400 bytes per second) on every system tested.
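As a rough check: even the slowest run overall, Vol 2 at 2m10.980s for 1,000,000,000 samples, still works out to about 1,000,000,000 / 131 ≈ 7.6 million samples per second, far above the 88,200 stereo samples per second that CD playback needs.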
No Compiler Optimizations
Now my last test is to run this without compiler optimizations (run on the same system as the first tests).
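Dropping the optimizations presumably just means rebuilding without the optimization flag, something like:

gcc -O0 vol1.c -o vol1
time ./vol1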
Control: 0m1.659s
Vol 1 : 0m2.057s
Vol 2 : 0m1.936s
Vol 3 : 0m2.105s
These results are the opposite of what was seen in every other test. This tells me the compiler was doing some optimization that reduced the CPU cost of the arithmetic-heavy versions. Without optimization, Vol 2, which uses less CPU but a little more memory, comes out ahead. Different compilers might also make different optimization choices that could affect the results. This is why it is important to benchmark algorithms like these and to be aware of what the compiler will do to your code: what looks like the most efficient choice in source form might not be after compilation.