Extending the Assembler Loop

This builds on the assembler code from the last post.
The target output is:

Loop: 00
Loop: 01
Loop: 02
Loop: 03
Loop: 04
Loop: 05
Loop: 06
Loop: 07
Loop: 08
Loop: 09
Loop: 10
Loop: 11
Loop: 12
Loop: 13
Loop: 14
Loop: 15
Loop: 16
Loop: 17
Loop: 18
Loop: 19
Loop: 20
Loop: 21
Loop: 22
Loop: 23
Loop: 24
Loop: 25
Loop: 26
Loop: 27
Loop: 28
Loop: 29

To extend this code to handle numbers with more than one digit, we need to split the index into its individual digits so that we can add the ASCII offset to each digit and then stuff each one back into the appropriate place in the message.
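The digit split itself is independent of the architecture, so it may help to see it in C first. This is only a sketch: the helper name stuff_digits is hypothetical, and it assumes the message has two placeholder bytes at positions 6 and 7 (for example "Loop: ##\n").

```c
#include <assert.h>

enum { ASCII_OFFSET = 48, DIVISOR = 10 };  /* 48 is the ASCII code for '0' */

/* Hypothetical helper: write the two decimal digits of i into msg[6] and msg[7].
   Assumes msg looks like "Loop: ##\n", with two placeholder bytes. */
void stuff_digits(char *msg, unsigned i)
{
    assert(i < 100);                       /* only two digits fit in the message */
    unsigned tens = i / DIVISOR;           /* quotient: the tens digit */
    unsigned ones = i % DIVISOR;           /* remainder: the ones digit */
    msg[6] = (char)(tens + ASCII_OFFSET);  /* digit value to ASCII character */
    msg[7] = (char)(ones + ASCII_OFFSET);
}
```

Both assembler versions below do the same thing: one division gives the quotient and remainder, and each is offset by 48 before being stored into the message.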

In x86_64 we have to use the div instruction to divide the counter by 10 in order to separate the digits. div takes its dividend from the rdx:rax register pair, so we first need to zero the rdx register. Then we move our index into rax and our divisor into a register (r10 here). Finally we can call div %r10, which replaces rax with the quotient and puts the remainder into rdx. This gives us what we need to add the offset and stuff the results into the message, like we did previously.
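To see why rdx has to be zeroed, here is a rough C model of the 32-bit form of div, where the dividend is the edx:eax pair. The 64-bit form used in this post works the same way with rdx:rax, but standard C has no 128-bit integer type to model it directly. The names model_div32 and div_result are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t quotient;   /* what div leaves in eax */
    uint32_t remainder;  /* what div leaves in edx */
} div_result;

/* Hypothetical model of the 32-bit x86 `div`: the dividend is the 64-bit
   value edx:eax, not eax alone. */
div_result model_div32(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    assert(divisor != 0);  /* real hardware raises a divide error here */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    div_result r;
    /* Real hardware also faults if the quotient does not fit in 32 bits;
       this model simply truncates it. */
    r.quotient = (uint32_t)(dividend / divisor);
    r.remainder = (uint32_t)(dividend % divisor);
    return r;
}
```

With edx set to 0, dividing 29 by 10 gives quotient 2 and remainder 9 as expected. With a stale 1 left in edx, the dividend becomes 4294967325 and the digits come out wrong.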

The aarch64 code is again easier to write than the x86_64 code. We can divide using any registers, so we put our divisor in a register and divide our index by it with udiv. The result can be stored in any register we want, which removes the setup we had to do for x86_64. However, this does not calculate the remainder for us the way x86_64 does. To get it we use msub (msub result, quotient, divisor, dividend), which computes dividend - quotient * divisor, and then we can do our message stuffing.
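The udiv and msub pair can be sketched in C as follows. The function names are made up for this illustration; the arithmetic is the multiply-subtract that msub performs, result = dividend - quotient * divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of `msub xd, xn, xm, xa`: xd = xa - (xn * xm). */
uint64_t msub_model(uint64_t xn, uint64_t xm, uint64_t xa)
{
    return xa - xn * xm;
}

/* Remainder computed the aarch64 way: udiv first, then msub. */
uint64_t remainder_udiv_msub(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);                           /* sketch only handles nonzero divisors */
    uint64_t quotient = dividend / divisor;         /* udiv */
    return msub_model(quotient, divisor, dividend); /* msub */
}
```

For an index of 29 and a divisor of 10, the quotient is 2 and the remainder is 29 - 2 * 10 = 9.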

The aarch64 approach is better than the x86_64 version because the remainder is only calculated when it is needed. It can also use any register, which makes it more flexible and less likely to accidentally destroy data.

The x86_64 code:

.globl _start

start = 0 /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 30 /* loop exits when the index hits this number (loop condition is i<max) */
offset = 48
divisor = 10
zero = 0

_start:
mov $start,%r15 /* loop index */

loop:
/* ... body of the loop ... do something useful here ... */

movq $zero, %rdx /* zero %rdx; div uses rdx:rax as the dividend */
movq %r15, %rax /* move index into %rax */
movq $divisor, %r10 /* move divisor into %r10 */
div %r10 /* divide rdx:rax by %r10; quotient in %rax, remainder in %rdx */

movb $offset, %bl /* move offset into %bl */
/* tens digit */
add %rax, %rbx /* add offset and division result */
mov %bl, msg + 6 /* move into position */

/* ones digit */
movb $offset, %bl
add %rdx,%rbx /* add offset and division remainder */
mov %bl, msg + 7 /* move into position */

movq $len,%rdx
movq $msg,%rsi
movq $1,%rdi /* file descriptor stdout */
movq $1,%rax /* syscall sys_write */
syscall /* invoke the write syscall */

inc %r15 /* increment index */
cmp $max,%r15 /* see if we're done */
jne loop /* loop if we're not */

mov $0,%rdi /* exit status */
mov $60,%rax /* syscall sys_exit */
syscall /* invoke the exit syscall */

.section .data

msg: .ascii "Loop: ##\n" /* ## are placeholders overwritten at msg+6 and msg+7 */
len = . - msg

The aarch64 code:

.globl _start

start = 0
max = 30
offset = 48
divisor = 10

_start:
mov x19, start /* loop index */
mov x23, divisor /* divisor stays in x23 for the whole loop */
adr x21, msg /* put the address of msg into x21 */

_loop:
udiv x20, x19, x23 /* divide index by divisor, store quotient in x20 */
msub x22, x20, x23, x19 /* calculate remainder and store into x22 */

/* tens */
add x24, x20, offset /* add offset to division result */
strb w24, [x21, 6] /* store the low byte into msg + 6 */

/* ones */
add x24, x22, offset /*add offset to remainder */
strb w24, [x21, 7] /* move into msg+7 */

mov x0, 1 /* file descriptor: 1 is stdout */
adr x1, msg /*message location (memory address) */
mov x2, len /* message length (bytes) */

mov x8, 64 /* write is syscall #64 */
svc 0 /*invoke syscall */

add x19, x19, 1 /* increment index */
cmp x19, max /* see if we're done */
b.ne _loop /* loop if we're not */

mov x0, 0 /* status -> 0*/
mov x8, 93 /* exit is syscall #93 */
svc 0 /* invoke syscall*/

.data

msg: .ascii "Loop: ##\n" /* ## are placeholders overwritten at msg+6 and msg+7 */
len= . - msg