In my last post on a vectorised bitwise-OR function for kdb+, I wondered towards the end what the next interesting step would be: should it be an AVX variant of the same code, or should the code be modified to handle short as well as byte values? Well, I opted for the former, and found that it really wasn't worth the trouble; the biggest benefit was discovering some of the limitations of the AVX instruction set. It's worth sharing, since it may help others in the future and will help produce a faster version when AVX2 instructions come out... which is a hint.