The System V ABI "AMD64 Architecture Processor Supplement" stipulates that the stack should be aligned to a 16-byte boundary before calling a function. It provides the following:
"The end of the input argument area shall be aligned on a 16 (32, if
__m256
is passed on stack) byte boundary. In other words, the value (
%rsp + 8
) is always a multiple of 16 (32) when control is transferred to the function entry point. The stack pointer,
%rsp
, always points to the end of the latest allocated stack frame."
So... aligning the stack to a 16-byte boundary is easy, right?