Monday, 16 January 2012

Hello, World!

Well, here it is, the ubiquitous "Hello, World!" example.

Despite the title, there are a couple of interesting things to note in the code below. The first is how the length of the string hellotxt is calculated; the second is the two different uses of the $ prefix in the GNU assembler.


  1. The string length is calculated using the special dot symbol (.), which is interpreted as "the current address that as is assembling into" (see the gas documentation on the special dot symbol).
  2. I've entered three equivalent lines below, two of which I've commented out. Any one of them, when assembled, defines the symbol msg_len and sets its value to the length of the string hellotxt using the technique described above. This value is referenced later in the line "movq $msg_len, %rdx", which loads it into %rdx.

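    The reason I wanted to draw your attention to this is the seemingly incongruent meaning of the $ symbol. In the line before it (movq $hellotxt, %rsi), $ means "copy the address of hellotxt to %rsi", whereas with a symbol defined by =, .equ or .set, it means "replace this placeholder with the value of the symbol msg_len". Strictly speaking, $ marks an immediate operand in both cases: the value of a label such as hellotxt is the address it marks, while the value of msg_len is the number assigned to it. So to get this right, you need to know your symbols from your memory references.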
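If it helps to see the bookkeeping spelled out, here is a rough Python model of what the assembler and linker do with these two symbols. It is only a sketch: the names DATA_BASE, symbols and immediate are made up for illustration, and the base address 0x6000e0 is simply the one that appears in the objdump output further down.

```python
# Hypothetical model of the .data section: a byte buffer plus symbol table.
DATA_BASE = 0x6000e0   # address of .data as reported by objdump in this post

data = bytearray()
symbols = {}

# A label records the address of the next byte to be emitted.
symbols["hellotxt"] = DATA_BASE + len(data)

# .asciz emits the text followed by a NUL byte.
data += b"Hello, World!\n" + b"\0"

# "." is the current address once the string has been emitted.
dot = DATA_BASE + len(data)

# msg_len = . - hellotxt
symbols["msg_len"] = dot - symbols["hellotxt"]


def immediate(name):
    """Model a '$name' operand: substitute the symbol's value."""
    return symbols[name]


print(hex(immediate("hellotxt")))  # 0x6000e0, an address
print(immediate("msg_len"))        # 15, a plain number
```

In both cases the $ operand is replaced by the symbol's value; the only difference is whether that value happens to be an address or a length.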


Anyway, without further ado, here's the code and a Makefile which should build it on a 64-bit Linux distro. I'm currently using as from GNU Binutils 2.20.1-20100303. My kernel is 2.6.32-33 and I'm using Xubuntu.

hello.s
-------------------------------8<-------------------------------
.section .data
hellotxt:    .asciz    "Hello, World!\n"
        msg_len =    . - hellotxt       # define a *symbol* to represent the length of the hellotxt string
#.equ   msg_len ,    . - hellotxt       # defines the same symbol using an equate
#.set   msg_len ,    . - hellotxt       # defines the same symbol using the .set directive

.section .text
.globl _start

_start:

        movq    $1, %rax                # sys_write
        movq    $1, %rdi                # stdout
        movq    $hellotxt, %rsi         # use '$' to get address-of 'hellotxt'
        movq    $msg_len, %rdx          # use '$' to reference the symbol 'msg_len', defined above
        syscall

        movq    $60, %rax               # sys_exit
        movq    $0, %rdi                # exit code
        syscall

-------------------------------8<-------------------------------

Makefile
-------------------------------8<-------------------------------

hello: hello.o
	ld -o hello hello.o

hello.o: hello.s
	as -gstabs -o hello.o hello.s

clean:
	rm hello.o hello

-------------------------------8<-------------------------------


A quick examination of the output of objdump -D hello shows the following:
Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0: 48 c7 c0 01 00 00 00  mov    $0x1,%rax
  4000b7: 48 c7 c7 01 00 00 00  mov    $0x1,%rdi
  4000be: 48 c7 c6 e0 00 60 00  mov    $0x6000e0,%rsi
  4000c5: 48 c7 c2 0f 00 00 00  mov    $0xf,%rdx


You can see that the value 0xf has been substituted for $msg_len: the length of hellotxt plus its trailing null byte, which was added by the .asciz directive.
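To double-check that reading of the machine code, the instruction bytes can be picked apart by hand. The Python below is a small sketch that only understands the one encoding used here (REX.W prefix 48, opcode c7, a register ModRM byte, then a 32-bit little-endian immediate); decode_mov_imm and REGS are names I've invented for the example, not part of any tool.

```python
# Low three bits of the ModRM byte select the destination register.
REGS = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def decode_mov_imm(hex_bytes):
    """Decode a 7-byte 'mov $imm32, %reg64' (48 c7 c0+reg imm32)."""
    raw = bytes.fromhex(hex_bytes)
    if len(raw) != 7 or raw[0] != 0x48 or raw[1] != 0xC7:
        raise ValueError("not a 7-byte REX.W C7 mov")
    if (raw[2] & 0xF8) != 0xC0:
        raise ValueError("destination is not a plain register")
    reg = REGS[raw[2] & 0x07]
    # The immediate is stored little-endian and sign-extended to 64 bits.
    imm = int.from_bytes(raw[3:7], "little", signed=True)
    return reg, imm


print(decode_mov_imm("48 c7 c6 e0 00 60 00"))  # ('rsi', 6291680), i.e. 0x6000e0
print(decode_mov_imm("48 c7 c2 0f 00 00 00"))  # ('rdx', 15)
```

Both instructions have exactly the same shape; the assembler has simply dropped each symbol's value into the immediate field.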

If you were to assemble it with msg_len stored in memory as a .quad instead, however, the code would look like this:

hello.s
-------------------------------8<-------------------------------
.section .data
hellotxt:    .asciz    "Hello, World!\n"
msg_len:     .quad    . - hellotxt

.section .text
.globl _start

_start: 

        movq    $1, %rax                # sys_write
        movq    $1, %rdi                # stdout
        movq    $hellotxt, %rsi         # use '$' to get address-of 'hellotxt'
        movq    msg_len, %rdx           # value-at 'msg_len'
        syscall
        
        movq    $60, %rax               # sys_exit
        movq    $0, %rdi                # exit code 
        syscall 

-------------------------------8<-------------------------------


The output of objdump -D hello then looks like this:
Disassembly of section .text:

00000000004000b0 <_start>:
  4000b0: 48 c7 c0 01 00 00 00  mov    $0x1,%rax
  4000b7: 48 c7 c7 01 00 00 00  mov    $0x1,%rdi
  4000be: 48 c7 c6 e0 00 60 00  mov    $0x6000e0,%rsi
  4000c5: 48 8b 14 25 ef 00 60  mov    0x6000ef,%rdx
        ...

Disassembly of section .data:

00000000006000e0 <hellotxt>:
  6000e0: 48                    'H'
  6000e1: 65                    'e'
  6000e2: 6c                    'l'
  6000e3: 6c                    'l'
  6000e4: 6f                    'o'
        ...

00000000006000ef <msg_len>:
  6000ef: 0f 00 00              The value '15'
  6000f2: 00 00                 
  6000f4: 00 00                 
        ...


In this listing, you can see that instead of loading the literal 0xf into %rdx, the instruction now loads the eight-byte value stored at address 0x6000ef into the register. Helpfully, objdump shows that the value at 0x6000ef is... 15.
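The same by-hand check works for the memory form. This Python sketch pulls the 32-bit address out of the instruction and then reads the quad as a little-endian 64-bit integer. One assumption to flag: the listing cuts off the last byte of both the instruction and the quad, so the trailing 00 bytes below are inferred from the address and value that objdump reports rather than copied from the dump.

```python
import struct

# mov 0x6000ef,%rdx: REX.W (48), opcode 8b, ModRM 14, SIB 25, then disp32.
# The final 00 is assumed; it is hidden behind the "..." in the listing.
insn = bytes.fromhex("48 8b 14 25 ef 00 60 00")
(address,) = struct.unpack_from("<i", insn, 4)   # 32-bit absolute address

# The eight bytes that .quad placed at msg_len (last byte assumed, as above).
quad = bytes.fromhex("0f 00 00 00 00 00 00 00")
(value,) = struct.unpack("<Q", quad)

print(hex(address))  # 0x6000ef
print(value)         # 15
```

So the instruction now carries an address rather than the number itself, and the number is fetched from .data at run time.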
