Project Investigation: Why this optimization failed

Looking into x86_64 ASM

In the x86_64 assembly, I looked for any particularly large instructions that might be sensitive to alignment. To start, I found the section of the binary that contains the decompress all tags function. The disassembly of that section also contains the byte pattern emitted by the inline assembler, where each .byte line encodes a single 8-byte NOP.


asm(".byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");
asm(".byte 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00");


14: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)
1b: 00
1c: 0f 1f 84 00 00 00 00 nopl 0x0(%rax,%rax,1)

The disassembly also contains some large move instructions, such as this movabs with a 64-bit immediate.

3f: 48 b8 ff ff 00 00 ff movabs $0xffffff0000ffff,%rax

However, I don't know what specifically might be alignment sensitive in this binary.
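One way to narrow this down is to check which instructions straddle an alignment boundary. The following is a minimal sketch, not part of the project's code: crosses_boundary is a hypothetical helper, and the choice of boundary (16, 32, or 64 bytes) is an assumption, since I do not know which one the original padding was targeting. The offsets in the listing are also relative to the start of the section, so the result only reflects the final binary if the section itself is loaded at an address aligned to that boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if an instruction that starts at
 * `offset` and is `len` bytes long spans a multiple of `boundary`.
 * `boundary` must be a power of two (for example 16, 32, or 64).
 */
bool crosses_boundary(size_t offset, size_t len, size_t boundary)
{
    assert(len > 0);
    assert(boundary != 0 && (boundary & (boundary - 1)) == 0);

    size_t first_block = offset / boundary;             /* block holding the first byte */
    size_t last_block  = (offset + len - 1) / boundary; /* block holding the last byte  */

    return first_block != last_block;
}
```

For the movabs above (offset 0x3f, 10 bytes in total: a REX.W prefix, the B8 opcode, and an 8-byte immediate), crosses_boundary(0x3f, 10, 64) returns true, while the 8-byte NOP at 0x14 stays inside a single 64-byte block. For comparison, a 4-byte instruction at a 4-byte-aligned offset can never straddle any of these boundaries.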

Here is the disassembly of the section.

Conclusion

Ultimately, this optimization does not seem to apply to aarch64, likely because its instructions are a fixed size. If the purpose of the padding is to move code around a page boundary, then on aarch64 that boundary likely lies in a place that is not a hot spot. It is possible that another function could be optimized this way, but I have not seen any performance change at all. To me, this shows one of the major advantages of aarch64: not having to worry about optimizing the placement of instructions results in better, cleaner, and faster code. While I cannot confirm the exact reason the padding was used on x86_64, I am quite confident that the regression is not occurring on aarch64, or that if it is occurring, it is not affecting this function.