BUG: DataFrame.agg - why numpy.size doesn't work? #42203
Comments
Actually this works:
The only thing relevant to your issue is:
@aliceliu9988 can you please add a descriptive title for the issue.
Hi @attack68, thanks for helping pinpoint the problem and for reminding me to update the question title.
I tried this:
print("df.agg({'A': [np.size]}) is :", df.agg({'A': [np.size]}))
It ran without an error, but the output is not the row count of column A (3), but this:
df.agg({'A': [np.size]}) is : A
I hope someone knows why.
Internally, np.size is evaluated on a Series. For a UDF,
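The same function can act as an aggregation or as a transform depending on what it is called on, which may be what the "cannot combine transform and aggregation operations" message is pointing at. A minimal sketch, assuming a hypothetical three-row Series `s` (not the reporter's data); it only shows how `np.size` behaves on a whole Series versus on single elements, and makes no claim about which path pandas takes internally in a given version:

```python
import numpy as np
import pandas as pd

# Hypothetical three-row column, not taken from the original report.
s = pd.Series([1, 4, 7], name="A")

# Called on the whole Series, np.size reduces it to a single number.
whole = np.size(s)

# Called on each element, it yields one value per row, which has the
# shape of a transform rather than an aggregation.
per_element = [np.size(v) for v in s]

print(whole)        # 3
print(per_element)  # [1, 1, 1]
```

If the per-element result is what ends up next to scalar results such as the mean, the pieces no longer have a common shape to combine.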
Hi Richard @rhshadrach, thanks for your explanation. I am a Python beginner and appreciate your hints; I will look more into the difference between 'transform' and 'aggregation'. I have to say that, as a beginner, I didn't expect the syntax to behave inconsistently like this.
I realize now I wasn't very clear, but I was trying to say the same thing! Thanks for raising this issue.
Wow, this looks serious. I have another example.
so
It gets weirder
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Problem description
Intuitively, I assumed df.agg({'A':[np.mean,np.std,np.size]}) should work as df.agg({'A':['mean','std','size']}) does, but it doesn't. I wonder why? Looked through docs like the below but still didn't get it:
Expected Output
A
4.0
3.0
4.0
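For reference, the string-alias form mentioned in the problem description can be checked with a small frame. This is a sketch under assumptions: the DataFrame `df` below is hypothetical and is not the reporter's data, so its numbers are not meant to match the expected output listed here.

```python
import pandas as pd

# Hypothetical frame used only for illustration.
df = pd.DataFrame({"A": [1, 4, 7], "B": [2, 5, 8]})

# String aliases are resolved by pandas itself, so "size" is treated
# as a reduction over the whole column.
result = df.agg({"A": ["mean", "std", "size"]})
print(result)

# The row count is also available directly, without agg.
print(df["A"].size)
```

Using the string names avoids depending on how a particular pandas version dispatches a NumPy callable.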
Output of df.agg({'A':[np.mean,np.std,np.size]})
TypeError Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate_multiple_funcs(self, arg, _axis)
553 try:
--> 554 return concat(results, keys=keys, axis=1, sort=False)
555 except TypeError:
~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
280 copy=copy,
--> 281 sort=sort,
282 )
~\anaconda3\lib\site-packages\pandas\core\reshape\concat.py in __init__(self, objs, axis, join, keys, levels, names, ignore_index, verify_integrity, copy, sort)
356 )
--> 357 raise TypeError(msg)
358
TypeError: cannot concatenate object of type '<class 'float'>'; only Series and DataFrame objs are valid
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
in
1 import numpy as np
----> 2 df.agg({'A':[np.mean,np.std,np.size]})
~\anaconda3\lib\site-packages\pandas\core\frame.py in aggregate(self, func, axis, *args, **kwargs)
6704 result = None
6705 try:
-> 6706 result, how = self._aggregate(func, axis=axis, *args, **kwargs)
6707 except TypeError:
6708 pass
~\anaconda3\lib\site-packages\pandas\core\frame.py in _aggregate(self, arg, axis, *args, **kwargs)
6718 result = result.T if result is not None else result
6719 return result, how
-> 6720 return super()._aggregate(arg, *args, **kwargs)
6721
6722 agg = aggregate
~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
426
427 try:
--> 428 result = _agg(arg, _agg_1dim)
429 except SpecificationError:
430
~\anaconda3\lib\site-packages\pandas\core\base.py in _agg(arg, func)
393 result = {}
394 for fname, agg_how in arg.items():
--> 395 result[fname] = func(fname, agg_how)
396 return result
397
~\anaconda3\lib\site-packages\pandas\core\base.py in _agg_1dim(name, how, subset)
377 "nested dictionary is ambiguous in aggregation"
378 )
--> 379 return colg.aggregate(how)
380
381 def _agg_2dim(name, how):
~\anaconda3\lib\site-packages\pandas\core\series.py in aggregate(self, func, axis, *args, **kwargs)
3686 # Validate the axis parameter
3687 self._get_axis_number(axis)
-> 3688 result, how = self._aggregate(func, *args, **kwargs)
3689 if result is None:
3690
~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate(self, arg, *args, **kwargs)
484 elif is_list_like(arg):
485 # we require a list, but not an 'str'
--> 486 return self._aggregate_multiple_funcs(arg, _axis=_axis), None
487 else:
488 result = None
~\anaconda3\lib\site-packages\pandas\core\base.py in _aggregate_multiple_funcs(self, arg, _axis)
562 result = Series(results, index=keys, name=self.name)
563 if is_nested_object(result):
--> 564 raise ValueError("cannot combine transform and aggregation operations")
565 return result
566
ValueError: cannot combine transform and aggregation operations