CI: Fix flaky test_value_counts_null #32449
Merged: simonjayhawkins merged 5 commits into pandas-dev:master from SaturnFromTitan:fix-flaky-tests-test_value_counts_null on Mar 7, 2020
Commits
- 2113521 sorting by index before comparing results in test_value_counts_null (SaturnFromTitan)
- cde9df6 using sort_index for value_counts tests only for float16 (SaturnFromTitan)
- f04d255 Merge branch 'master' into fix-flaky-tests-test_value_counts_null (SaturnFromTitan)
- ab04a73 using sort_index for all fixture values to handle all inconsistencies… (SaturnFromTitan)
- 72b4da1 only using sort_index if there are duplicated values (SaturnFromTitan)
I think the result should just be `obj.value_counts()` for the test to be considered a valid test.
Is it feasible to construct `expected` so that it has the expected sort order, or is the sort order tested elsewhere?
I generally agree, but this is the only solution I found to work consistently on CI.
What I've found so far is the following:
- `value_counts` preserves the order from `obj` if multiple values share the same count
- […] `float16` on `Windows py36_np15`, but there might be more)

So this might actually be a bug. I have never worked with Cython, so it's hard for me to trace this deeper. It might be an issue in `value_count_float64` in `pandas/_libs/hashtable_func_helper.pxi`.
My suggestion would be to `sort_index` for `float16` for now. By seeing whether the test is still flaky afterwards, we can gather more data about its relation to `float16` etc.
Wdyt @simonjayhawkins?
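For illustration, the `sort_index` workaround discussed above can be sketched as follows. This is a minimal example under stated assumptions, not the test code from this PR: the data and the hand-built `expected` are hypothetical, and `check_names=False` is only there because the name of the `value_counts` result differs between pandas versions.

```python
import pandas as pd
import pandas.testing as tm

# Hypothetical data: 1.0 and 2.0 both occur twice, so their counts tie.
obj = pd.Series([2.0, 1.0, 2.0, 1.0, 3.0])

result = obj.value_counts()

# Hand-built expectation (an assumption for this sketch). The order of the
# two tied rows here may or may not match what value_counts returns.
expected = pd.Series([2, 2, 1], index=[1.0, 2.0, 3.0])

# Sorting both sides by index removes any dependence on tie order,
# while still checking that every value has the right count.
tm.assert_series_equal(
    result.sort_index(), expected.sort_index(), check_names=False
)
```

The trade-off is the one raised in the review: after sorting, the comparison no longer says anything about the order in which `value_counts` returns its rows.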
Could it be unrelated to float16, with duplicate values simply being more likely due to the decreased resolution?
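The resolution point can be checked with a small sketch. The input range is a hypothetical choice, picked because `float16` has only 11 significant bits and therefore cannot represent odd integers above 2048; it is not taken from the test fixtures.

```python
import numpy as np

# Ten distinct integers, exactly representable as float64.
values = np.arange(2048, 2058, dtype=np.float64)

# Casting to float16 rounds each value to the nearest representable number,
# so several distinct inputs collapse onto the same float16 value.
values_f16 = values.astype(np.float16)

n_unique_f64 = np.unique(values).size
n_unique_f16 = np.unique(values_f16).size

print(n_unique_f64, n_unique_f16)
```

Fewer unique values after the cast means that more of them are repeated, which in turn makes tied counts more likely.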
I don't think so. If it's about duplicated values alone then it should always fail for `repeats`, see here
This time it failed with `int32` as well: https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=30037&view=logs&j=3a03f79d-0b41-5610-1aa4-b4a014d0bc70&t=4d05ed0e-1ed3-5bff-dd63-1e957f2766a9&l=66
How should we go about this?
Could we just revert #32281 for now instead, @jbrockmendel?
I have come to wonder whether the order of values with the same count is actually deterministic. If not, is there an alternative to using `sort_index` on `result` and `expected`?
I think merging this PR is better than reverting #32281. In the previous version we just skipped all tests with duplicated values. Here we at least test that the values are correct, even though we don't validate that they are ordered consistently.
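The final commit message ("only using sort_index if there are duplicated values") suggests a conditional comparison. A rough sketch of that idea is below; `assert_value_counts_equal` is a hypothetical helper written for illustration, and the actual change to the test is not shown in this thread.

```python
import pandas as pd
import pandas.testing as tm


def assert_value_counts_equal(result, expected):
    """Hypothetical helper: ignore row order only when some counts tie."""
    # Series.duplicated() flags repeated *values*, i.e. repeated counts here.
    if result.duplicated().any():
        result = result.sort_index()
        expected = expected.sort_index()
    # check_names=False: the result's name differs between pandas versions.
    tm.assert_series_equal(result, expected, check_names=False)


# Tied counts for 10 and 20: the order of those two rows is not checked.
obj = pd.Series([10, 20, 10, 20, 30])
expected = pd.Series([2, 2, 1], index=[20, 10, 30])
assert_value_counts_equal(obj.value_counts(), expected)
```

When no counts tie, the helper compares the two Series as they are, so the descending-count order is still validated in that case.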