Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
API: membership checks on ExtensionArray containing NA values #37867
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: membership checks on ExtensionArray containing NA values #37867
Changes from 25 commits
7ab6443
7986bc4
466f7cc
a9b75dd
9d0eca1
b0b32ab
c6e42d2
75c45bc
83c9fe4
08c4c98
5a23b1d
4b0c200
92604e9
8a24f0d
fdb9deb
52e2b43
f21890e
6f633c7
d8bdb2e
4e4dbc4
a1583e7
3c2c2b0
237fe45
37219c3
c4a6c36
245c99a
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Arrow arrays have Arrow Nulls as the dtype.na_value, while data_missing[0] gives ´None´.
Maybe @TomAugspurger could look into that as part of his Arrow work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine to simply skip it as you did. This are only test EAs, that were needed for certain specific tests, so no problem this don't fully work otherwise.
The actual public arrays using Arrow under the hood (eg the new ArrowStringArray) has a different implementation and is fully tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also base these test arrays on
pd.NA
? Otherwise we can use the same code as in the string array to replace the scalar return value ofNone
withna_value
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might consider actually using the nulls_fixture as this is more comprehensive (sure it will run the other checks multiple times but no big deal).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am personally -1 on using a fixture for this (we could move the list of nulls that is currently used for defining the fixture to constant in
_testing.py
and use that constant in both places, though)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in this case?
we already have a nulls fixture and use it in a great many places
we need to be comprehensive and general in testing - specific cases are ok sometimes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok here's the fixture contents
so if you add
float('nan')
then this should coverThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the generality of using the nulls_fixture too. I've added it in the newest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This duplicates what you have in
tests/arrays/categorical/test_operators.py
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately not:
Categorical
already had a__contains__
method and it's more permissive than the new one. So, in this file we have (below)assert na_value_type in data_missing
, while the base tests method isassert na_value_type not in data_missing
(notice thenot
).na values is also more complicated in categoricals, because in some cases we want to accept
pd.NaT
and in other cases not. I'd like to take it in another round (or let it slide)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, I was not referring to the base tests that this is overriding, but the original test you added to
tests/arrays/categorical/test_operators.py
which also tests this more permissive behaviour?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah. I'll delete the ones in
tests/arrays/categorical/test_operators.py
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here.