-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Allow non-callable attributes in aggregate function. Fixes GH16405 #16458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add the test from the original issue here? It can go in Also could use a release note in |
Codecov Report
@@ Coverage Diff @@
## master #16458 +/- ##
==========================================
+ Coverage 90.43% 90.43% +<.01%
==========================================
Files 161 161
Lines 51045 51049 +4
==========================================
+ Hits 46161 46165 +4
Misses 4884 4884
Continue to review full report at Codecov.
|
I may have gone a bit overboard with the tests... although somehow I made coverage worse overall? |
pandas/core/base.py
Outdated
@@ -378,7 +378,7 @@ def aggregate(self, func, *args, **kwargs): | |||
def _try_aggregate_string_function(self, arg, *args, **kwargs): | |||
""" | |||
if arg is a string, then try to operate on it: | |||
- try to find a function on ourselves | |||
- try to find an attribute on ourselves |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function (or attribute)
doc/source/whatsnew/v0.20.2.txt
Outdated
@@ -83,6 +83,7 @@ Reshaping | |||
- Bug in ``DataFrame.stack`` with unsorted levels in MultiIndex columns (:issue:`16323`) | |||
- Bug in ``pd.wide_to_long()`` where no error was raised when ``i`` was not a unique identifier (:issue:`16382`) | |||
- Bug in ``Series.isin(..)`` with a list of tuples (:issue:`16394`) | |||
- Bug in ``DataFrame.agg`` and ``Series.agg`` with aggregating on non-callable attributes (:issue:`16404`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this only affects .size
, so say that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it works right now, you really can use any attribute in the aggregation, so something like:
In [20]: df
Out[20]:
A B C
0 NaN 1.0 foo
1 2.0 NaN None
2 3.0 3.0 bar
In [21]: df.agg({'A': ['count', 'size', 'name'],
...: 'B': ['count', 'size', 'name', 'data'],
...: 'C': 'size'})
...:
Out[21]:
A B C
count 2 2 NaN
data NaN [1.0, nan, 3.0] NaN
name A B NaN
size 3 3 3.0
works.
If it should only affect size, we could change the check in base.py to just see if the arg passed is 'size'
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one's not actually outdated. I just messed up the bug # and had to fix it. The question of this only affecting size is still open
pandas/core/base.py
Outdated
return f(*args, **kwargs) | ||
if callable(f): | ||
return f(*args, **kwargs) | ||
# people may try to aggregate on a non-callable attribute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line
pandas/tests/frame/test_apply.py
Outdated
'B': {'count': 2, 'size': 3}, | ||
'C': {'count': 2, 'size': 3}}) | ||
|
||
assert_frame_equal(result1.reindex_like(expected), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can pass check_like=True
to do this reindex comparisons.
pandas/tests/frame/test_apply.py
Outdated
result2.reindex_like(expected)) | ||
assert_frame_equal(result2.reindex_like(expected), expected) | ||
|
||
# Just functional string arg is same as calling df.arg() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a minimal tests to series/test_apply.py
pandas/tests/frame/test_apply.py
Outdated
assert_series_equal(result, expected) | ||
|
||
# Trying to to the same w/ non-function tries to pass args | ||
# which we explicitly forbid |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this actually should work, what are the args/kwargs that are actually there? (ithere maybe internal ones that need to be remove FYI, e.g. _level
or something)
@jreback are you OK with this? I believe the discussion was whether we could have a simpler implementation by special-casing |
can't change size as it's already defined as a property (for numpy compat) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very small doc change. ping on green.
doc/source/whatsnew/v0.20.2.txt
Outdated
@@ -89,6 +89,7 @@ Reshaping | |||
- Bug in ``pd.wide_to_long()`` where no error was raised when ``i`` was not a unique identifier (:issue:`16382`) | |||
- Bug in ``Series.isin(..)`` with a list of tuples (:issue:`16394`) | |||
- Bug in construction of a ``DataFrame`` with mixed dtypes including an all-NaT column. (:issue:`16395`) | |||
- Bug in ``DataFrame.agg`` and ``Series.agg`` with aggregating on non-callable attributes (:issue:`16405`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DataFrame.agg()
and Series.agg()
@@ -635,3 +635,46 @@ def test_nuiscance_columns(self): | |||
expected = DataFrame([[6, 6., 'foobarbaz']], | |||
index=['sum'], columns=['A', 'B', 'C']) | |||
assert_frame_equal(result, expected) | |||
|
|||
def test_non_callable_aggregates(self): | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment of .size
here (and how its a property of Series/DataFrame and hence we allow this)
thanks! |
…05 (pandas-dev#16458) (cherry picked from commit a67c7aa)
git diff upstream/master --name-only -- '*.py' | flake8 --diff