-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: DataFrame.agg({'col': 'size'}) not working #16405
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
hmm, that doesn't seem friendly. I think the issue is that |
Hey there, I started looking into this as part of the PyCon sprint. Using the example: >>> pd.DataFrame({'g': [0, 0, 1], 'v': [1, 2, None]}).agg({'v': 'size'}) I tracked down where the error that gets passed by is. In def _try_aggregate_string_function(self, arg, *args, **kwargs):
"""
if arg is a string, then try to operate on it:
- try to find a function on ourselves
- try to find a numpy function
- raise
"""
assert isinstance(arg, compat.string_types)
f = getattr(self, arg, None)
if f is not None:
return f(*args, **kwargs)
f = getattr(np, arg, None)
if f is not None:
return f(self, *args, **kwargs)
raise ValueError("{} is an unknown string function".format(arg)) The arg in this case is def apply(self, func, axis=0, broadcast=False, raw=False, reduce=None,
args=(), **kwds):
"""
Applies function along input axis of DataFrame.
...
"""
axis = self._get_axis_number(axis)
ignore_failures = kwds.pop('ignore_failures', False)
# dispatch to agg
if axis == 0 and isinstance(func, (list, dict)):
return self.aggregate(func, axis=axis, *args, **kwds)
... It seems like a good place to cut this off may be either def _try_aggregate_string_function(self, arg, *args, **kwargs):
"""
if arg is a string, then try to operate on it:
- try to find a function on ourselves
- try to find a numpy function
- raise
"""
assert isinstance(arg, compat.string_types)
f = getattr(self, arg, None)
if f is not None:
return f(*args, **kwargs) if callable(f) else f # Do people still hate the ternary operator?
... But since this is actually in the base class, there could be some unforeseen consequences of this I can't see? (And barring those consequences, should this should just be restricted to calls of 'size'? Otherwise people could do potentially-unexpected-but-also-kinda-cool things like call Appreciate any thoughts, thanks! |
@pvomelveny what you did is reasonable, but let's write it out (yes hate the ternary operator :>) maybe
and see if things pass (and add a comment about why doing this). |
@pvomelveny can you submit a pull request with those changes? I'm curious to see if the tests pass. |
size
aggregation does not seem to work properly when used with ungrouped dataframes. When replacingsize
withcount
the examples run through. However, this change null semantics, counting nulls (size
) in contrast to skipping nulls (count
). The examples below are for Pandas version0.20.1
.When used with
.groupby(...)
, size aggregation works as expected:Without
.groupby
, it raises an recursion error:The text was updated successfully, but these errors were encountered: