Fix randomly failing test in test_frame.py #9225
I concur. cc @behzadnouri, does this look right to you?
The whole resulting index (the combination of the first 3 columns) should be unique, not a single level; see the test example below (the probability of a non-unique index would then be lower). I would recommend changing the PR to this: (git diff here)
This way it would be a more general test than forcing a single level to be all unique. Example that the tests pass as long as …
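To illustrate the difference between a unique level and a unique combined index, here is a minimal sketch with made-up data. The column names `a`, `b`, `c` and all values are hypothetical, not taken from `test_frame.py`. No single level is unique, yet the three-level `MultiIndex` is, so reindexing works; only a frame whose full index tuples repeat fails to reindex. The exact error message varies across pandas versions, so the sketch only checks for `ValueError`.

```python
import pandas as pd

# Hypothetical frame: no single level is unique on its own,
# but every (a, b, c) combination is distinct.
unique_combo = pd.DataFrame({
    "a": [0, 0, 1, 1],
    "b": [0, 1, 0, 1],
    "c": [5, 5, 5, 5],
    "val": [1.0, 2.0, 3.0, 4.0],
}).set_index(["a", "b", "c"])

# Every individual level contains duplicates...
levels_unique = [
    unique_combo.index.get_level_values(i).is_unique for i in range(3)
]
# ...yet the combined MultiIndex is unique, so reindexing is well defined.
combo_unique = unique_combo.index.is_unique
reversed_frame = unique_combo.reindex(unique_combo.index[::-1])

# A frame whose full index tuples repeat cannot be reindexed.
dup_combo = pd.DataFrame({
    "a": [0, 0],
    "b": [1, 1],
    "c": [2, 2],
    "val": [1.0, 2.0],
}).set_index(["a", "b", "c"])
target = pd.MultiIndex.from_tuples([(0, 1, 2), (9, 9, 9)], names=["a", "b", "c"])
try:
    dup_combo.reindex(target)
    dup_raised = False
except ValueError:
    dup_raised = True
```

Checking `index.is_unique` on the full `MultiIndex` is the weaker, more general condition: it accepts any data where the combination of levels is distinct, rather than forcing one level to be all unique.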
@behzadnouri I'd prefer to fix this more explicitly rather than just add a … My new version of the patch enforces the uniqueness constraint by just doing multiple …
I agree with @qwhelan. Using … A simpler fix (probably worth doing in any case) would be to set the random number seed for the test. I usually recommend using …
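A minimal sketch of the seeding approach, assuming NumPy's `RandomState` API. The helper `draw_level`, the seed value `42`, and the range `[0, 1000)` are hypothetical choices for illustration. A dedicated, seeded generator makes every run draw the same integers, so a failure is reproducible, at the cost of no longer exploring new inputs on each run.

```python
import numpy as np


def draw_level(seed, n=20, high=1000):
    """Hypothetical helper: draw n integers in [0, high) from a seeded generator."""
    # A dedicated RandomState keeps the test independent of global RNG state.
    rng = np.random.RandomState(seed)
    return rng.randint(0, high, size=n)


first = draw_level(seed=42)
second = draw_level(seed=42)
```

Both calls return identical arrays. Note that seeding only makes the outcome deterministic; whether that particular draw contains duplicates still depends on the seed chosen.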
@shoyer The point of randomized tests is to discover corner cases that one may not think of; otherwise one would just provide a hand-written test case. By fixing the random seed in advance, you lose that advantage.
@shoyer Normally I'd agree, but we're only sampling 20 integers here, so we might as well hardcode them rather than set a seed. @behzadnouri The issue is not the notation but the lack of clarity about what the acceptable data state is when the block is exited (and the lack of an upper bound on runtime). My current patch directly produces your desired data.
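Sampling without replacement guarantees distinct values by construction and finishes in a single call. The sketch below contrasts it with a hypothetical retry loop that redraws until the values happen to be distinct: that loop produces valid data when it exits, but has no upper bound on its number of iterations. Both helper names are hypothetical and this is not the actual patch; the sketch assumes NumPy's `RandomState.choice(..., replace=False)`.

```python
import numpy as np


def unique_draw(n, high, rng):
    """Hypothetical helper: sample n distinct integers from [0, high) in one call.

    Sampling without replacement guarantees uniqueness by construction,
    so there is no retry loop and runtime is bounded.
    """
    if n > high:
        raise ValueError("cannot draw more distinct values than the range holds")
    return rng.choice(high, size=n, replace=False)


def retry_draw(n, high, rng):
    """Hypothetical rejection-style alternative: redraw until all values are distinct.

    The result is valid once the loop exits, but the iteration count is unbounded.
    """
    while True:
        values = rng.randint(0, high, size=n)
        if len(np.unique(values)) == n:
            return values


# Usage: draw 20 distinct labels for one index level from an unseeded generator.
level_values = unique_draw(20, 1000, np.random.RandomState())
```

With `n=20` and `high=1000` the retry loop usually succeeds within a few attempts, but only the without-replacement version makes the guarantee explicit in the code.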
@qwhelan This looks good. Ready to go?
@jreback Yep. |
Thanks!
@jreback I noticed a randomly failing test while doing a full suite run a week or two ago. The test is using random integers as part of a `MultiIndex`, which randomly leads to a non-unique index. Attempting to re-index the `DataFrame` therefore randomly fails. The solution is to just sample without replacement. The following code gives a failure rate of ~1.5%, which is in line with birthday problem estimates (`d=1000` and four iterations each of `n=3` and `n=2`):
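Independently of that reproduction, the birthday-problem estimate itself can be checked in closed form. This sketch assumes each iteration draws `n` integers independently and uniformly, with replacement, from `d = 1000` values, and that a run fails if any of the eight iterations sees a repeat; the helper names are hypothetical.

```python
def collision_prob(n, d):
    """Probability that n uniform draws (with replacement) from d values
    contain at least one repeat: 1 - prod_{i<n} (1 - i/d)."""
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= 1.0 - i / d
    return 1.0 - p_distinct


def failure_prob(d=1000, draws=(3, 3, 3, 3, 2, 2, 2, 2)):
    """Chance that at least one of several independent draws contains a repeat."""
    p_all_ok = 1.0
    for n in draws:
        p_all_ok *= 1.0 - collision_prob(n, d)
    return 1.0 - p_all_ok


estimate = failure_prob()
```

Under these assumptions, a single `n=3` draw collides with probability 0.002998 and an `n=2` draw with probability 0.001, giving an overall estimate of roughly 1.6%, consistent with the ~1.5% observed failure rate.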