Skip to content

BUG: Allow non-callable attributes in aggregate function. Fixes GH16405 #16458

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 1, 2017
Merged

BUG: Allow non-callable attributes in aggregate function. Fixes GH16405 #16458

merged 3 commits into from
Jun 1, 2017

Conversation

pvomelveny
Copy link
Contributor

@pvomelveny pvomelveny commented May 23, 2017

@TomAugspurger
Copy link
Contributor

Can you add the test from the original issue here? It can go in pandas/tests/frame/test_apply.py

Also could use a release note in doc/source/whatsnew/v0.20.2.txt, under the bugfix section.

@TomAugspurger TomAugspurger added this to the 0.20.2 milestone May 23, 2017
@TomAugspurger TomAugspurger added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 23, 2017
@codecov
Copy link

codecov bot commented May 23, 2017

Codecov Report

Merging #16458 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16458      +/-   ##
==========================================
+ Coverage   90.43%   90.43%   +<.01%     
==========================================
  Files         161      161              
  Lines       51045    51049       +4     
==========================================
+ Hits        46161    46165       +4     
  Misses       4884     4884
Flag Coverage Δ
#multiple 88.27% <100%> (ø) ⬆️
#single 40.16% <0%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 96.21% <100%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e81f3cc...0a367c6. Read the comment docs.

@pvomelveny
Copy link
Contributor Author

pvomelveny commented May 23, 2017

I may have gone a bit overboard with the tests... although somehow I made coverage worse overall?

@@ -378,7 +378,7 @@ def aggregate(self, func, *args, **kwargs):
def _try_aggregate_string_function(self, arg, *args, **kwargs):
"""
if arg is a string, then try to operate on it:
- try to find a function on ourselves
- try to find an attribute on ourselves
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function (or attribute)

@@ -83,6 +83,7 @@ Reshaping
- Bug in ``DataFrame.stack`` with unsorted levels in MultiIndex columns (:issue:`16323`)
- Bug in ``pd.wide_to_long()`` where no error was raised when ``i`` was not a unique identifier (:issue:`16382`)
- Bug in ``Series.isin(..)`` with a list of tuples (:issue:`16394`)
- Bug in ``DataFrame.agg`` and ``Series.agg`` with aggregating on non-callable attributes (:issue:`16404`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this only affects .size, so say that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it works right now, you really can use any attribute in the aggregation, so something like:

In [20]: df
Out[20]:
     A    B     C
0  NaN  1.0   foo
1  2.0  NaN  None
2  3.0  3.0   bar

In [21]: df.agg({'A': ['count', 'size', 'name'],
    ...:         'B': ['count', 'size', 'name', 'data'],
    ...:         'C': 'size'})
    ...:
Out[21]:
         A                B    C
count    2                2  NaN
data   NaN  [1.0, nan, 3.0]  NaN
name     A                B  NaN
size     3                3  3.0

works.

If it should only affect size, we could change the check in base.py to just see if the arg passed is 'size'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one's not actually outdated. I just messed up the bug # and had to fix it. The question of this only affecting size is still open

return f(*args, **kwargs)
if callable(f):
return f(*args, **kwargs)
# people may try to aggregate on a non-callable attribute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line

'B': {'count': 2, 'size': 3},
'C': {'count': 2, 'size': 3}})

assert_frame_equal(result1.reindex_like(expected),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass check_like=True to do this reindex comparisons.

result2.reindex_like(expected))
assert_frame_equal(result2.reindex_like(expected), expected)

# Just functional string arg is same as calling df.arg()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a minimal tests to series/test_apply.py

assert_series_equal(result, expected)

# Trying to to the same w/ non-function tries to pass args
# which we explicitly forbid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this actually should work, what are the args/kwargs that are actually there? (ithere maybe internal ones that need to be remove FYI, e.g. _level or something)

@TomAugspurger
Copy link
Contributor

@jreback are you OK with this? I believe the discussion was whether we could have a simpler implementation by special-casing 'size'?

@jreback
Copy link
Contributor

jreback commented May 30, 2017

can't change size as it's already defined as a property (for numpy compat)

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very small doc change. ping on green.

@@ -89,6 +89,7 @@ Reshaping
- Bug in ``pd.wide_to_long()`` where no error was raised when ``i`` was not a unique identifier (:issue:`16382`)
- Bug in ``Series.isin(..)`` with a list of tuples (:issue:`16394`)
- Bug in construction of a ``DataFrame`` with mixed dtypes including an all-NaT column. (:issue:`16395`)
- Bug in ``DataFrame.agg`` and ``Series.agg`` with aggregating on non-callable attributes (:issue:`16405`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DataFrame.agg() and Series.agg()

@@ -635,3 +635,46 @@ def test_nuiscance_columns(self):
expected = DataFrame([[6, 6., 'foobarbaz']],
index=['sum'], columns=['A', 'B', 'C'])
assert_frame_equal(result, expected)

def test_non_callable_aggregates(self):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment of .size here (and how its a property of Series/DataFrame and hence we allow this)

@jreback jreback merged commit a67c7aa into pandas-dev:master Jun 1, 2017
@jreback
Copy link
Contributor

jreback commented Jun 1, 2017

thanks!

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Jun 1, 2017
TomAugspurger pushed a commit that referenced this pull request Jun 4, 2017
Kiv pushed a commit to Kiv/pandas that referenced this pull request Jun 11, 2017
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.agg({'col': 'size'}) not working
3 participants