-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: value_counts not working correctly on ExtensionArrays #33674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: value_counts not working correctly on ExtensionArrays #33674
Conversation
949ef4f
to
ea5f7c4
Compare
ea5f7c4
to
b025232
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a whatsnew note as well, you can put it in numeric bug fix section
This could be an opportunity to use the more performant sum function from core\array_algos\masked_reductions.py and add a sum method to IntegerArray (or maybe BaseMaskedArray) There is a precedent for adding sum methods to extension arrays (timedeltas and sparse) but not sure how this fits in with the EA interface. @jorisvandenbossche? |
Hello @kotamatsuoka! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-04-30 23:45:49 UTC |
c6c4751
to
67c83e9
Compare
957d105
to
4235020
Compare
4235020
to
7007205
Compare
@simonjayhawkins That's the approach I was taking here although am not sure at the moment how best to implement this: #33538 |
hmm, didn't realise that this PR overlaps with #33538. |
2d857d1
to
dac1af5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kotamatsuoka for working on this.
the bug is fixed on master, see #33172 (comment)
just needs tests.
pandas/core/arrays/string_.py
Outdated
@@ -290,7 +290,7 @@ def _reduce(self, name, skipna=True, **kwargs): | |||
def value_counts(self, dropna=False): | |||
from pandas import value_counts | |||
|
|||
return value_counts(self._ndarray, dropna=dropna).astype("Int64") | |||
return value_counts(self._ndarray, dropna=dropna) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you revert this back to what it was on master (will need to change the string tests back as well)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If revert, value_counts method return different dtype and need to change many tests. (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsaxton why/where were these changed to Int64 in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it was added here #30824
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback this is converted to Int64 on purpose, in the idea that operations on nullable columns should, to the extent possible, also result in new nullable dtypes (similarly as doing a comparison operation with string dtype will give a nullable boolean dtype)
pandas/core/arrays/string_.py
Outdated
@@ -290,7 +290,7 @@ def _reduce(self, name, skipna=True, **kwargs): | |||
def value_counts(self, dropna=False): | |||
from pandas import value_counts | |||
|
|||
return value_counts(self._ndarray, dropna=dropna).astype("Int64") | |||
return value_counts(self._ndarray, dropna=dropna) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dsaxton why/where were these changed to Int64 in the first place?
this also needs a whatsnew note, add in bug fixes, extension array section in 1.1 |
4e283ff
to
badbaa4
Compare
pandas/tests/extension/test_numpy.py
Outdated
@@ -199,6 +199,10 @@ class TestMethods(BaseNumPyTests, base.BaseMethodsTests): | |||
def test_value_counts(self, all_data, dropna): | |||
pass | |||
|
|||
@pytest.mark.skip(reason="value_counts not implemented") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be skip or xfail?
badbaa4
to
bf91882
Compare
bf91882
to
67ee980
Compare
@kotamatsuoka the base test that you added is now being skipped for several ExtensionArray classes (which is fine, to be clear), but I think that we should then test it specifically for at least some of those classes with a custom test (I would at least test it for IntegerArray and BooleanArray). So instead of xfailing or skipping the test in (or is this already tested elsewhere?) |
Thanks @jorisvandenbossche . |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. ping on green.
cdc2f09
to
3473a42
Compare
3473a42
to
5d54a05
Compare
thanks @kotamatsuoka |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff