TST/PERF: make a test input creator instead of np.random.randn, mainly for perf testing #3114
Comments
I expected `np.array(order='C')` / `np.array(order='F')` to work, but it doesn't. I think you have to do the conversion separately (see the sketch below).
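For reference, a minimal sketch of one way to get both layouts (assuming the goal is simply C- vs. F-ordered random data; `np.random.randn` itself has no `order` argument, so the conversion has to happen afterwards):

```python
import numpy as np

n, k = 100000, 10

c_arr = np.random.randn(n, k)        # randn always returns a C-contiguous array
f_arr = np.asfortranarray(c_arr)     # copy the same data into Fortran (column-major) order
# np.array(c_arr, order='F') or np.random.randn(k, n).T would also give F-ordered data

assert c_arr.flags['C_CONTIGUOUS']
assert f_arr.flags['F_CONTIGUOUS']
```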
We should also exercise non-contiguous cases, because those result from slicing larger contiguous arrays.
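A quick illustration of how slicing produces such cases (the sizes here are arbitrary, just to show the flags):

```python
import numpy as np

base = np.random.randn(10000, 20)      # contiguous parent array

strided_rows = base[::2]               # every other row: no longer C-contiguous
some_cols = base[:, ::3]               # strided column selection: not contiguous either

for view in (strided_rows, some_cols):
    assert not view.flags['C_CONTIGUOUS']
    assert not view.flags['F_CONTIGUOUS']
```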
Absolutely. If you have time, maybe put together a list of things we should test (on a perf basis) where these things matter?
Well, that's all I can think of right now (obviously I didn't think of this before it came up as a perf issue, otherwise I would have tested it already), but I'll make a note if I think of any more.
It looks like the perf regression was 50%, but on a 1ms test.
@y-p take is a primitive operation for a lot of other operations, though... I think the test suite is not representative of real-world usage in general (and performance is definitely noticeable for large data sets... like >1GB, which I've actually had to deal with for personal usage already), so it's very hard to tell what matters and what doesn't; it's like reading tea leaves.
Of course. I guess I'm trying to understand what regression we're talking about, since ... Basically, you've landed a big optimization by taking array order into account, ...
Do you mean the latest PR, #3130? That does address the regression in ...
I'm trying to test a further optimization refinement (described in the FINAL EDIT portion of this SO question) and I'm getting some extreme variability in my results (with or without the new BUILD_CACHE_DIR, although it seems to be worse with it... still not sure if there's something going on with that or if it's just a fluke). I've talked to some performance experts in the C++ SO chatroom about this, and the consensus seems to be that randomly generating a set of indices once but reusing them over and over for multiple runs of the same test (as is the case for some, but not all, of the suite tests... I believe this is the case at least; can anyone confirm that the setup is only run once?) is problematic. To get more consistent results, I think we need to do one of the following (sketched below):

- regenerate the random indices on every iteration of the test, or
- use a fixed, hardcoded set of indices.

The first is probably better, but it means that the time of the random index generation is part of the test's results, which is non-ideal. The second might lead to less coverage in case hardcoded indices don't adequately cover real use cases. Not 100% sure how much of an issue this is, but from the results I've been seeing it seems like the most likely culprit... still investigating (a little bit hard because of test turnaround time).
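A sketch of the two options using plain `timeit` as a stand-in for the benchmark runner (the array size, iteration count, and the assumption that the runner separates a one-time setup from the timed statement are all mine):

```python
import timeit

n = 1000000

# Option 1: regenerate the indexer inside the timed statement, so quirks of any
# single permutation average out (but the generation cost is timed as well).
fresh = timeit.timeit(
    "arr.take(np.random.permutation(n))",
    setup="import numpy as np; n = %d; arr = np.random.randn(n)" % n,
    number=20,
)

# Option 2: build one fixed indexer in the setup (here via a fixed seed, as a
# stand-in for genuinely hardcoded indices): comparable runs, but possibly
# unrepresentative data.
fixed = timeit.timeit(
    "arr.take(indexer)",
    setup="import numpy as np; np.random.seed(1234); n = %d; "
          "arr = np.random.randn(n); indexer = np.random.permutation(n)" % n,
    number=20,
)

print(fresh, fixed)
```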
Does vb actually do multiple iterations and averaging? I didn't know. So you're suggesting that adding the default random seed to test_perf will actually ...
If you can post a script example for me to verify that what you suggest eliminates variation ...
@y-p I don't know either about vb_suite... I'm guessing the setup code is run only once, i.e. the timing loop just calls the timed statement repeatedly. In this case the iterations do get averaged, but the same labels are (presumably) used repeatedly, so any performance effect of structural features of the permutation will be present in every iteration rather than averaged out. As per your statement ...
I think I'm suggesting the opposite, actually (unless I'm misunderstanding you)... if we fix the seed (it's not fixed right now, is it?), which I hadn't actually considered, then every run of vb_suite will be using the same random numbers, so in theory we'll eliminate the effect of variability entirely (but possibly the test will be non-representative by accident). Or we could go to the opposite extreme and make every single iteration of each test variable, so the variability hopefully averages out. What doesn't work is the middle ground where we non-deterministically generate a random set of numbers once and reuse them over and over (which I presume is what's happening right now).
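A trivial check of the fixed-seed behaviour (plain numpy, nothing vb_suite-specific):

```python
import numpy as np

np.random.seed(1234)
a = np.random.permutation(1000000)

np.random.seed(1234)
b = np.random.permutation(1000000)

# Same seed -> identical "random" indexer on every run, so data-driven
# variability between runs disappears (at the risk of an unlucky, and
# therefore unrepresentative, permutation being baked in).
assert (a == b).all()
```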
@y-p I'm replying here instead of on the closed PR.
You mean fixing the seed so that every run of vb_suite generates the same values, right? Are you sure it's working so that the same values are generated every time (i.e. it's picking up the seed correctly and not being reseeded non-deterministically elsewhere)? It's possibly only relevant for this low-level thing I'm doing right now, since data alignment makes a big difference to performance in this case. The approach is non-ideal regardless, anyway (assuming I'm correct about how ...).
(Rather than just conjecturing more, I'm going to try running vb_suite overnight repeatedly to test this systematically, with and without various changes to the tests; it might take some time for me to get enough data to draw firm conclusions... I'll report back if/when I do.)
Then I will too. Re: a repeatable seed making variability better, not worse: good, I thought so too. Re: averaging: you've actually tested this on a specific test and got what you ...
Be aware of the changes in test_perf with the seeding; make sure you're running the commit ...
Did you push a change to ...?
Re: setting up the seed. As I wrote, I found no other calls to seed within vbench. Yes I did; that's what I noted in my comment you quoted above.
If random data produces that much variation in results, you need to keep the seed fixed: `test_perf.sh -s myseed` (default 1234).
OK, great, I'll try overnighting a bunch of runs with a fixed seed and a bunch of runs with different seeds and see how much variation I get. It'll help me narrow this down. (Not sure why I didn't think of using a fixed seed originally; that's clearly easier than rewriting the tests :D)
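For the record, the kind of thing the seeding change amounts to. This is a hypothetical sketch of a harness entry point, not the actual test_perf code; only the `-s`/seed flag and its 1234 default come from the discussion above:

```python
import argparse
import numpy as np

parser = argparse.ArgumentParser(description="run the vbench suite")
parser.add_argument("-s", "--seed", type=int, default=1234,
                    help="seed numpy's RNG before running any benchmark")
args = parser.parse_args()

# Seeding once up front means every benchmark's setup draws the same
# "random" data on every invocation with the same seed.
np.random.seed(args.seed)
```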
That's OK, you can do cache stripe calculations in your head; that's good enough 8)
FYI, I did 3 runs on a small number of vbs; each time, the value of the random seed after the run came out the same. I think that clearly proves the random data is now deterministically random. If that's not an oxymoron.
OK, I definitely verified that my latest refinement leads to variable results, but the variability is not reduced by setting the random seed deterministically, which means the Python interpreter itself has non-deterministic behavior that is relevant for performance (probably in allocation patterns). Here are the results with fixed seeds: ...
and with varying seeds: ...
This non-determinism is probably already present, just magnified by this particular change (which is weird, because the change doesn't seem to really do much...)
I've seen Python variability with %timeit many times. I'd suggest that the fact that these are sub-ms tests is playing a role. I take it you withdraw your FUD comments about the cython build cache? ;)
Well, to be honest, I can't be sure, since all of this is with the cache on... I doubt that's an issue, but I just mean that this doesn't actually provide any info about that. I can try repeating everything with the cache off if we want to be sure.
My change is literally what I describe in the last section of the SO question, though... so it seems like it'd be benign: all I'm doing is taking advantage of the known strides to write a loop that uses those strides at compile time rather than a generic one that evaluates the strides at run time... possibly ...
You could measure variability for each vbench separately, and see if large variance ...
I looked through the numpy list's prior discussions about this, and it turns out there's no guarantee of alignment beyond the minimum required (which for me appears to always be 32-bit, even for 64-bit doubles in my case... that's probably what's causing the problem). Apparently you can overallocate a bit (with a 1-d array), manually get a view of an aligned subarray and set the strides yourself if you need it... memory is cheap enough that an extra 96 bits per row + 96 bits total at the beginning (worst possible case) can't make much of a difference... the problem is that you'd have to change the allocation everywhere it's relevant... will split this off to a new issue (sketch of the trick below).
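A sketch of the overallocation trick (the helper name `aligned_empty` and the 64-byte alignment are my own choices, not anything numpy provides):

```python
import numpy as np

def aligned_empty(shape, dtype=np.float64, align=64):
    """Allocate an uninitialized array whose data pointer is aligned to
    `align` bytes by over-allocating a byte buffer and slicing into it."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    buf = np.empty(nbytes + align, dtype=np.uint8)
    offset = (-buf.ctypes.data) % align          # bytes to skip to reach alignment
    aligned = buf[offset:offset + nbytes].view(dtype).reshape(shape)
    assert aligned.ctypes.data % align == 0
    return aligned

a = aligned_empty((1000, 3))
print(a.ctypes.data % 64)   # 0
```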
Alignment can yield big wins, most def. Need to validate that this is what's actually happening, though.
I believe this is no longer relevant with ASV benchmarks, but feel free to reopen if you disagree.
To allow easy testing of C-contiguous / F-contiguous arrays, long frames, etc.
see #3089
http://mail.python.org/pipermail/pandas-dev/2013-March/000008.html
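A rough sketch of what such a test-input creator could look like (the name `make_test_values` and its exact signature are hypothetical, not the helper that eventually landed in pandas; the point is just to centralize layout, contiguity, and seeding in one place):

```python
import numpy as np
import pandas as pd

def make_test_values(nrows=100000, ncols=10, order='C', contiguous=True, seed=1234):
    """Hypothetical perf-test input creator: random float64 values with a
    requested memory layout, seeded so every run sees identical data."""
    rng = np.random.RandomState(seed)
    if contiguous:
        values = rng.standard_normal((nrows, ncols))
        if order == 'F':
            values = np.asfortranarray(values)
    else:
        # carve a non-contiguous view out of a larger contiguous parent,
        # the way slicing does in real code
        parent = rng.standard_normal((nrows * 2, ncols * 2))
        values = parent[::2, ::2]
    return values

# e.g. a long, Fortran-ordered input for a take/reindex benchmark
df = pd.DataFrame(make_test_values(nrows=1000000, ncols=3, order='F'))
```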