Monday, 16 January 2012

Calling ASM functions from Java

If you've been inquisitive enough to read the "About" pages you'll see that my day job involves writing software in Java. To that end, I've put together some code which demonstrates calling a function in a shared library (written in assembler). Hopefully the following will illustrate the steps involved fairly clearly.

Let's start, then, with that ubiquitous "Hello, World!" program again, except that this time we'll call a static, native method to do the printing.

HelloWorld.java
public class HelloWorld {

 static {
  System.loadLibrary("hello");
 }

 public static void main(String[] args) throws Exception {
  sayHello();
 }

 static native void sayHello();

}

You'll note that it's in the default package (for the non-Java types out there, this means that we omit the package declaration), but otherwise it's a vanilla implementation. We can also start by creating a Makefile to build our code:

Makefile - part 1
-------------------------------8<-------------------------------

all: HelloWorld.class

HelloWorld.class: HelloWorld.java
        javac -cp . HelloWorld.java

clean:
        rm *.class
-------------------------------8<-------------------------------


With a bit of luck (and a JDK installed on your system) this should compile into a new file called HelloWorld.class. To avoid bugs and minimise typing, we'll use javah to generate a C header file, which serves as a cheat-sheet for the name of the function we need to implement in assembler. (javah shipped with the JDKs of the time; it was removed in JDK 10, where javac -h . HelloWorld.java generates the same header.) We only need to build the header file once, so there's no point adding it to the Makefile. From the working directory containing HelloWorld.class, execute:

javah -classpath . HelloWorld

Its contents look like this:

HelloWorld.h
/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class HelloWorld */

#ifndef _Included_HelloWorld
#define _Included_HelloWorld
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     HelloWorld
 * Method:    sayHello
 * Signature: ()V
 */
JNIEXPORT void JNICALL Java_HelloWorld_sayHello
  (JNIEnv *, jclass);

#ifdef __cplusplus
}
#endif
#endif


So, what's there to say about this? Well, it references the JDK's jni.h header and accommodates a C or C++ compiler. However, as I alluded to earlier, we're only going to use this as a template from which to steal the symbol Java_HelloWorld_sayHello. The as assembler has no use for a C header like this one, so there is nothing to include in our source file. Other assemblers such as nasm require external symbols to be declared and warn you at assembly time if you refer to a symbol which is undefined; as simply assumes the symbol will be satisfied at link-time, something which folk more able than I suggest leads to extremely hard-to-find bugs later on.
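The symbol name in the header is not arbitrary: it follows the JNI naming convention of Java_, then the fully-qualified class name, then the method name, with a few characters escaped along the way. Here is a minimal sketch of that rule, assuming a non-overloaded method (overloaded natives also get a signature suffix, which is not covered here). The JniNames class and the com.example package are made up for illustration.

```java
// Sketch of the JNI "short name" rule; JniNames and com.example are illustrative only.
class JniNames {

    // Escape one name component: '.' or '/' becomes '_', '_' becomes "_1",
    // ASCII letters and digits pass through, anything else becomes _0xxxx (hex code unit).
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Symbol the JVM looks up for a non-overloaded native method.
    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("HelloWorld", "sayHello"));
        System.out.println(shortName("com.example.HelloWorld", "say_hello"));
    }
}
```

The first line printed is the symbol from the header, Java_HelloWorld_sayHello. The second shows why javah is worth using once packages and underscores get involved.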

So, to the assembly code, now we know what the function should be called:

HelloWorld.s
-------------------------------8<-------------------------------
.section .data
        hellotxt:  .asciz "Hello, World!\n"
        msg_len =  . - hellotxt

.section .text

.type   Java_HelloWorld_sayHello, @function
.globl  Java_HelloWorld_sayHello

Java_HelloWorld_sayHello:

        pushq       %rbp                   # store the parent stack frame's base-pointer
        movq        %rsp, %rbp             # store the updated stack-pointer as our base-pointer

        movq        $1, %rax               # sys_write
        movq        $1, %rdi               # stdout
        leaq        hellotxt(%rip), %rsi   # address-of 'hellotxt' using RIP-relative addressing
        movq        $msg_len, %rdx         # value-of symbol 'msg_len', will insert literal 0xF/15

        syscall                            # make the sys_write call

        movq        %rbp, %rsp             # restore the previous stack-pointer from %rbp
        popq        %rbp                   # restore the previous base-pointer from the stack
        ret                                # return to the instruction after the 'call'

-------------------------------8<-------------------------------

All fairly straightforward, really. There's no _start label since we don't intend this to become an application, and we've used the assembler's .type directive with @function to mark the symbol Java_HelloWorld_sayHello as a function in the object file's symbol table. Omitting this directive didn't affect the behaviour of the function, strangely enough; I suspect the declaration's importance lies elsewhere.
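As a quick sanity check on the 0xF/15 mentioned in the listing's comments: the visible text is 14 bytes long, and .asciz appends a terminating NUL byte, so msg_len (the distance from hellotxt to the current location) comes out at 15. The sketch below just redoes that arithmetic; the MsgLen class is illustrative only.

```java
import java.nio.charset.StandardCharsets;

// Redoes the msg_len arithmetic from the assembly listing; MsgLen is an illustrative name.
class MsgLen {

    // Number of bytes the string occupies when encoded as ASCII.
    static int asciiLength(String s) {
        return s.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        int textBytes = asciiLength("Hello, World!\n"); // the characters we actually want printed
        int withNul = textBytes + 1;                     // .asciz adds one trailing NUL byte
        System.out.println(textBytes + " " + withNul);
    }
}
```

In other words, the sys_write call also sends the NUL byte to stdout. A terminal will not show it, but it would turn up if the output were redirected to a file.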

You will also note that the way we load the address of hellotxt is different from our original "Hello, World!" example. That code was built as a static binary, so the linker had absolute control over the address at which the bytes of the output string would live. When building a shared library, the toolchain has no such knowledge, because the library's load address is only decided at run time, and some runtime indirection has to take place in order to reference the string's address. To this end, we benefit enormously from the fact that we're writing 64-bit assembly, as we can use the %rip register to calculate the offset to the hellotxt string. 32-bit relative-addressing is horrendous by comparison, and relies on knowing your relative offset from the Global Offset Table (GOT). You then see such code as this:

 call __i686.get_pc_thunk.bx
 addl $_GLOBAL_OFFSET_TABLE_, %ebx

In this case the call instruction pushes the address of the next instruction onto the stack so that the ret instruction can operate; I'd hazard a guess that the function __i686.get_pc_thunk.bx reads that stack value and returns it (i.e. the instruction pointer's value) in the EBX register. The addl then adds the GOT's offset from that instruction, which leaves the address of the GOT in EBX.

Back in our 64-bit example, the leaq instruction writes into the %rsi register the value offsetOf(hellotxt) + valueIn(%rip). To be a bit more precise, the pseudo-code value offsetOf(hellotxt) is the address of a relocation. Try this resource for much more detail than I want to go into here. All you need to know is that if you intend to use your code in a shared library, you need to use position-independent code.
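To see why this makes the code position-independent, it helps to run the numbers. The sketch below uses made-up offsets and load addresses (none of them come from a real build). The one fixed fact it relies on is that leaq sym(%rip), %rsi assembles to 7 bytes, and that %rip holds the address of the following instruction when the displacement is applied.

```java
// Worked arithmetic for RIP-relative addressing; every address here is hypothetical.
class RipRelative {

    static final long LEAQ_LENGTH = 7; // size in bytes of "leaq sym(%rip), %rsi"

    // Displacement stored in the instruction: target minus address of the next instruction.
    static long displacement(long leaqOffset, long dataOffset) {
        return dataOffset - (leaqOffset + LEAQ_LENGTH);
    }

    // Address the CPU computes at run time once the library sits at loadBase.
    static long effectiveAddress(long loadBase, long leaqOffset, long disp) {
        long rip = loadBase + leaqOffset + LEAQ_LENGTH; // %rip points past the leaq
        return rip + disp;
    }

    public static void main(String[] args) {
        long leaqOffset = 0x1000; // hypothetical offset of the leaq within the library
        long dataOffset = 0x2000; // hypothetical offset of hellotxt within the library
        long disp = displacement(leaqOffset, dataOffset);

        // Two hypothetical load addresses: the same displacement works for both.
        for (long base : new long[] {0x7f0000000000L, 0x7f1234560000L}) {
            long addr = effectiveAddress(base, leaqOffset, disp);
            System.out.printf("base=%x disp=%x hellotxt=%x%n", base, disp, addr);
        }
    }
}
```

Wherever the loader puts the library, the code and the data move together, so the displacement baked into the instruction stays valid and nothing needs patching at load time.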

Right, the next incarnation of the Makefile gets a bit more interesting:

Makefile - part 2
-------------------------------8<-------------------------------

all: HelloWorld.class libhello.so

HelloWorld.class: HelloWorld.java
        javac -cp . HelloWorld.java

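libhello.so: HelloWorld.o
        ld -shared -o libhello.so HelloWorld.o

HelloWorld.o: HelloWorld.s
        as --64 -g -o HelloWorld.o HelloWorld.s

clean:
        rm *.o *.class *.so

-------------------------------8<-------------------------------

This Makefile should build and link a shared library by virtue of the -shared argument to ld. You will often see -fPIC alongside it (PIC stands for position-independent code, of course), but that is a compiler flag which asks gcc to generate position-independent code. Here we wrote the position-independent code ourselves, using RIP-relative addressing, so ld needs only -shared.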
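The library has to be called libhello.so because System.loadLibrary("hello") takes a library name rather than a file name; the JVM maps it to a platform-specific file name and then searches the directories listed in java.library.path. A small sketch to see both values on your own system (the LibraryNameDemo class is illustrative only):

```java
// Prints how the JVM maps a library name to a file name; LibraryNameDemo is illustrative.
class LibraryNameDemo {
    public static void main(String[] args) {
        // On Linux this is expected to print libhello.so; other platforms use other patterns.
        System.out.println(System.mapLibraryName("hello"));
        // Directories searched by System.loadLibrary.
        System.out.println(System.getProperty("java.library.path"));
    }
}
```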

You should now be in a position to execute the following command line:
java -Djava.library.path=$(pwd) -cp . HelloWorld

Which, of course, should result in the expected output. ;)
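If the output is an UnsatisfiedLinkError instead, the usual suspects are a java.library.path that doesn't point at the directory containing libhello.so, or a symbol name that doesn't match what the JVM is looking for. The sketch below (LoadCheck and tryLoad are illustrative names) shows one way to report a failed load rather than letting the static initialiser blow up:

```java
// Reports whether a native library can be loaded; LoadCheck is an illustrative helper.
class LoadCheck {

    // Returns null on success, otherwise the loader's error message.
    static String tryLoad(String name) {
        try {
            System.loadLibrary(name);
            return null;
        } catch (UnsatisfiedLinkError e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String error = tryLoad("hello");
        System.out.println(error == null ? "libhello loaded" : "could not load: " + error);
    }
}
```

Run it with the same -Djava.library.path argument as above to check that the library itself can be found. A missing or misnamed symbol only shows up later, when the native method is first called.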
