Description
Split off from #3114 ...
It doesn't seem (from my testing) that numpy guarantees 128-bit alignment at the start of an array (which is odd, since malloc
usually does that no matter what, from my testing), and it certainly doesn't 128-bit align each row/column, since they're always contiguous.
Apparently it can be done manually via a method like described below:
http://mail.scipy.org/pipermail/scipy-user/2009-March/020283.html
(This just aligns the start of the array, but if you overallocate further and adjust the strides, you can do the same thing for every C-contiguous row or F-contiguous column.)
The downside is that this has to be done everywhere allocation currently happens...even with a helper function to do it, it's going to be somewhat messy.
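For illustration, a helper along these lines might look roughly like the sketch below. It is not existing pandas or numpy API: the name `aligned_rows_empty` and its parameters are hypothetical, and it only covers the C-contiguous 2-D case. It overallocates a byte buffer, skips ahead to the first 16-byte boundary, and pads each row's stride up to a multiple of the alignment.

```python
import numpy as np


def aligned_rows_empty(nrows, ncols, dtype=np.float64, align=16):
    """Hypothetical helper: uninitialized 2-D array whose rows each start
    on an `align`-byte boundary (16 bytes = 128 bits by default)."""
    dtype = np.dtype(dtype)
    row_bytes = ncols * dtype.itemsize
    # Round the row stride up to the next multiple of `align`.
    stride = -(-row_bytes // align) * align
    # Overallocate by `align` bytes so an aligned start always fits.
    buf = np.empty(nrows * stride + align, dtype=np.uint8)
    # Bytes to skip to reach the first aligned address in the buffer.
    offset = (-buf.ctypes.data) % align
    flat = buf[offset:offset + nrows * stride]
    # Build the array on top of the padded buffer with custom strides;
    # `flat` is kept alive through the new array's `base`.
    return np.ndarray((nrows, ncols), dtype=dtype, buffer=flat,
                      strides=(stride, dtype.itemsize))


a = aligned_rows_empty(5, 3)   # 3 float64 columns: 24 bytes padded to 32
a[:] = 1.0
```

The result is no longer contiguous whenever padding is added, so anything that assumes contiguity would need to be checked against it.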
Also, to avoid too much overhead, I think this should only be done when each row/column is at least 256 bits (32 bytes) naturally, so the absolute worst case space penalty is 33% + 96 bits constant (on a 32-bit machine) or 20% + 64 bits constant (on a 64-bit machine)...
We can bump that up a bit higher to 512 bits if that still seems like too much, but it means that DataFrames with 5-7 float64/int64 columns could be pathologically misaligned. (So basically, this won't help at all until at least 9 columns are present...)
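The worst-case figures above can be checked with a quick calculation. This is only a sketch under my reading of the numbers: row sizes are assumed to be multiples of the machine word (4 or 8 bytes), the allocator is assumed to return at least word-aligned memory, and `worst_case_overhead` is a made-up name.

```python
def worst_case_overhead(word_bytes, min_row_bytes=32, align=16,
                        max_row_bytes=4096):
    """Return (worst per-row padding fraction, worst start padding in bits)
    for rows of at least `min_row_bytes` padded to `align`-byte multiples."""
    worst = 0.0
    for row_bytes in range(min_row_bytes, max_row_bytes + 1, word_bytes):
        padding = (-row_bytes) % align      # bytes added to reach alignment
        worst = max(worst, padding / row_bytes)
    # A word-aligned start can be at most (align - word_bytes) bytes off.
    start_penalty_bits = (align - word_bytes) * 8
    return worst, start_penalty_bits


frac32, bits32 = worst_case_overhead(word_bytes=4)   # 36-byte row -> 48
frac64, bits64 = worst_case_overhead(word_bytes=8)   # 40-byte row -> 48
```

Passing `min_row_bytes=64` gives the corresponding bound for the 512-bit threshold.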