Syntax Highlighting for Assembly Code

Published

January 17, 2012

This post is now irrelevant, as I run this website using Quarto, which is completely brilliant. I thought I’d leave it in the catalogue as it was my first foray into really lengthy sed scripts (which probably shows).

Having put a fair amount of time into writing the posts I’ve published so far, I’ve become disappointed with publishing code snippets in <pre> tags. There is Alex Gorbatchev’s shiny JavaScript solution, but back when this blog was on WordPress, I couldn’t use it, as I couldn’t supply my own ‘brush’ to format GNU assembler/gas code.

I solved the problem by writing a sed script capable of producing the required HTML. I’ve posted it here so that anyone else wanting to post assembly can do so (although I’ve only posted a few instructions’ worth of highlighting here, so you’ll probably want to adapt this for your needs).

Executing it is simple. When I’ve got a source file ready for publishing (and have ensured that tabs other than leading tabs have been converted to spaces), I just have to execute the following at my bash prompt:

asmstyle.sed mysource.s > mysource.s.html

All that remains is to paste the resulting HTML into the HTML text-area. I’ll post the sed file I use for marking up Makefiles at some point soon.
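The tab clean-up mentioned above can be done in several ways. Here is a minimal sketch, assuming 8-column tab stops and the standard expand and unexpand utilities; the function name detab_inner is made up for illustration and is not part of asmstyle.sed:

```shell
#!/bin/sh
# Hypothetical helper: keep leading tabs, turn every other tab into spaces.
# Assumes 8-column tab stops (the default for both utilities).
detab_inner() {
    # expand turns every tab into the spaces needed to reach the next tab stop;
    # unexpand (without -a) turns only the leading run of blanks back into tabs.
    expand | unexpand
}

# Usage: clean up a source file before running it through asmstyle.sed
# detab_inner < mysource.s > mysource.clean.s
```

Because expand pads to tab stops rather than inserting a single space, operand columns should stay aligned in the published listing.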

The asmstyle.sed script has worked for my posts to date, but I’m sure that there are still some bugs in it. So feel free to use it, but YMMV. Enjoy ;)

asmstyle.sed
#!/bin/sed -nf

# copy the line into the hold-space
h
# no comment on this line: empty the pattern-space
/#/!s/.*//
/#/ {
    # isolate the comment section
    s/^[^#]*#\(.*\)$/#\1/
    # add <span> tags
    s/^/<span style="color:#080">/
    s/$/<\/span>/
}
# exchange the hold and the pattern space
x

# delete the comment symbol and everything after it
s/^\([^#]*\)#.*$/\1/

#
# Gas Directives
#
s/\(\.[a-z]\+\)/<span style="color:#4D0000;font-weight:bold">\1<\/span>/g

# 't' also clears the substitution flag, so this no-op branch resets it
# before the tests below
t numbers
:numbers
#
# Numbers
#
# Any hex numbers 
s/\(\$\?~\?-\?0x[0-9A-Fa-f]\+\)/<span style="color:red">\1<\/span>/g
t instructions
# Any decimal numbers
s/\(\$\?~\?-\?[0-9]\+\)/<span style="color:red">\1<\/span>/g
t instructions

# symbols and indexed addressing
s/\(\$\?[a-zA-Z_]\+\)(/<span style="color:red">\1<\/span>(/g
t instructions

# immediate operands ($symbol)
s/\(\$[-~a-zA-Z_0-9]\+\)/<span style="color:red">\1<\/span>/g


:instructions
#
# Instructions
#
s/\(add[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(and[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(call\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(cmpxchg[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(jne\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(jz\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(jnz\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(imul[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(lea[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(leave\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(lock\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(loop\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(mov[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(mul[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(pop[lq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(push[lq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(ret[q]\{0,1\}\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(sub[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(syscall\)/<span style="color:#009;font-weight:bold">\1<\/span>/
s/\(test[bwlq]\)/<span style="color:#009;font-weight:bold">\1<\/span>/

#
# Registers
#
s/\(\*\?%[re][abcdis][xip]\)/<span style="font-weight:bold">\1<\/span>/g
s/\(\*\?%[abcd][xlh]\)/<span style="font-weight:bold">\1<\/span>/g
s/\(\*\?%r\(1[0-5]\|[89]\)\)/<span style="font-weight:bold">\1<\/span>/g
s/\(\*\?%xmm[0-7]\)/<span style="font-weight:bold">\1<\/span>/g
s/\(\*\?%mm[0-7]\)/<span style="font-weight:bold">\1<\/span>/g

:epilogue

#
# Append the contents of the hold-space to the pattern-space (which comes with
# a newline, unfortunately), then strip that newline
#
G
s/\n//

1 i\
<pre>\n<code style="color:silver">-------------------------------8&lt;-------------------------------</code>
$ a\
<code style="color:silver">-------------------------------8&lt;-------------------------------</code>\n</pre>

p
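
The core trick in the script is parking the comment in the hold-space while the code half of the line gets marked up, then gluing the two back together. Here is a cut-down sketch of just that part, with placeholder <b> and <em> tags instead of the styled spans and a single instruction; the function name split_comment_demo is hypothetical:

```shell
#!/bin/sh
# Hypothetical demo of the hold-space technique used by asmstyle.sed.
split_comment_demo() {
    sed -n '
# save the whole line in the hold-space
h
# no comment on this line: empty the pattern-space
/#/!s/.*//
# otherwise keep only the comment and wrap it
s/^[^#]*\(#.*\)$/<em>\1<\/em>/
# swap: pattern-space = original line, hold-space = marked-up comment
x
# drop the comment from the code half
s/^\([^#]*\)#.*$/\1/
# mark up the code half (one instruction only, for brevity)
s/movl/<b>&<\/b>/
# append the comment (G adds a newline first) and remove that newline
G
s/\n//
p'
}

printf 'movl $1, %%eax # set eax\n' | split_comment_demo
# prints: <b>movl</b> $1, %eax <em># set eax</em>
```

Working on the two halves separately means the instruction and number patterns never fire on words inside a comment.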