Friday, 17 February 2012

AVX matrix-multiplication, or something like it

It's been a while since the last post, and I can confidently say that I understand one or two percent of the new (well, new to me) world of AVX instructions. There was the not-so-brief incident involving lots of head scratching about why my test implementation using vgatherqpd would cause a SIGILL exception on my Sandy Bridge laptop. I guess cpuid does have another use outside timing loops ;)
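For the record, vgatherqpd is an AVX2 instruction, and Sandy Bridge only implements the original AVX, hence the SIGILL. Here is a minimal sketch of the check I should have done first, assuming GCC or Clang on x86-64 with the <cpuid.h> header; the function name cpu_has_avx2 is my own invention. It only looks at the CPU feature bit (CPUID leaf 7, sub-leaf 0, EBX bit 5); a thorough check would also confirm that the OS saves the YMM state.

```c
#include <cpuid.h>

/* Hypothetical helper: returns 1 if the CPU reports AVX2, 0 otherwise.
 * AVX2 is advertised in CPUID leaf 7, sub-leaf 0, EBX bit 5.
 * Note: this does not verify OS support for YMM state (OSXSAVE/XGETBV). */
static int cpu_has_avx2(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* __get_cpuid_count returns 0 if leaf 7 is not supported at all. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;

    return (int)((ebx >> 5) & 1u);
}
```

Calling cpu_has_avx2() once at startup and falling back to a non-gather code path when it returns 0 would have avoided the crash.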

Thursday, 16 February 2012

Stack Alignment

The System V ABI "AMD64 Architecture Processor Supplement" stipulates that the stack must be aligned to a 16-byte boundary at the point where a function is called. It states:

"The end of the input argument area shall be aligned on a 16 (32, if __m256 is passed on stack) byte boundary. In other words, the value (%rsp + 8) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer, %rsp, always points to the end of the latest allocated stack frame."
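The arithmetic in that passage can be sketched in plain C. This is only an illustration on integer addresses, not real stack manipulation; the helper names align_down and entry_rsp_ok are hypothetical, and the alignment is assumed to be a power of two (16 or 32). The key point is that the call instruction pushes an 8-byte return address, so a stack pointer that is 16-byte aligned just before the call ends up 8 bytes below a 16-byte boundary at the function entry point.

```c
#include <stdint.h>

/* Round addr down to a multiple of align (align must be a power of two).
 * Rounding down is the right direction because the stack grows downwards. */
static uint64_t align_down(uint64_t addr, uint64_t align)
{
    return addr & ~(align - 1);
}

/* The ABI condition at function entry: (%rsp + 8) is a multiple of align. */
static int entry_rsp_ok(uint64_t rsp, uint64_t align)
{
    return ((rsp + 8) & (align - 1)) == 0;
}
```

For example, align_down(0x7ffc1234, 16) gives 0x7ffc1230; after the call pushes the return address, %rsp is 0x7ffc1228, and entry_rsp_ok(0x7ffc1228, 16) holds.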

So... aligning the stack to a 16-byte boundary is easy, right?