-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Default to stat axis for SparseDataFrame when axis=None #13066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Surprisingly, there aren't a lot of places where |
@@ -74,3 +74,4 @@ Performance Improvements | |||
|
|||
Bug Fixes | |||
~~~~~~~~~ | |||
- Bug in ``SparseDataFrame`` in which ``axis=None`` did not default to ``self._stat_axis_number`` (:issue:`13048`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
say axis=0
, users have no idea what self._stat_axis_number
is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
a671f55
to
ad81154
Compare
# no exception should be raised even though | ||
# numpy passes in 'axis=None' | ||
funcs = ['sum', 'cumsum', 'var', 'mean', 'prod', | ||
'cumprod', 'std'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add kurt, skew, sem, max, min, idxmax, idxmin
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.kurt
, np.skew
, np.sem
, np.idxmin
, and np.idxmax
do not exist. Will add for np.max
and np.min
though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, so do for np.argsort/argmin
as well then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.argmin
or np.argmax
breaks because __array_wrap__
for SparseDataFrame
expects a 2-D input, which is not what you get when you call either of those functions. I think this is a bug (or lacking feature) on the pandas
side because argmin
and argmax
should return scalars, meaning __array_wrap__
should have to accept them. @jreback thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arg mom and max are aliases for idxmin/max which are correct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really? I think that's true only for Series
. Doesn't exist for DataFrame
. Unless you want to add them for compat purposes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh u r testing frames
need same tests for series
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, okay. But what about the aliasing for DataFrame
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is aliasing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For Series
, you alias idxmin
to argmin
and likewise idxmax
to argmax
for ndarray
compat. However, that aliasing doesn't exist for DataFrame
.
ad81154
to
678beab
Compare
@jreback : would be great to get some feedback regarding the |
678beab
to
4dddda2
Compare
@gfyoung can you a copy-pastable example here of issue with |
>>> from pandas import SparseDataFrame
>>> import numpy as np
>>> sdf = SparseDataFrame([[1, 2, 3], [4, 5, 6]])
>>> np.argmin(sdf)
...
ValueError: Must pass 2-d input I am not advocating that it be added to the list of supported methods, but I see no reason to test it per se in the list of functions for |
4dddda2
to
7cd9244
Compare
doesn't work on dense either. I suppose you could make a separate issue about this. Its not high on priority list though.
|
# numpy passes in 'axis=None' or `axis=-1' | ||
funcs = ['sum', 'cumsum', 'var', 'mean', | ||
'prod', 'cumprod', 'std', 'argsort', | ||
'argmin', 'argmax', 'sort', 'min', 'max'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so let's take out argmin/max
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@jreback : I don't consider to be an issue, as the documentation makes it clear that you should call the |
@gfyoung yep, take em out. ping when green. |
7cd9244
to
da30e21
Compare
@jreback : Travis is all green. Ready to merge if there is nothing else. |
thanks! |
Title is self-explanatory. Closes #13048.