BUG: Inconsistent result dtypes for aggregation functions (e.g. sum, mean, .apply(np.sum), ...) #50628
Open
2 of 3 tasks
Labels: Apply (Apply, Aggregate, Transform, Map); Bug; Reduction Operations (sum, mean, min, max, etc.); Timedelta (Timedelta data type)
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
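A minimal sketch of the first issue described below, not the reporter's original code: it compares summing an empty `timedelta64[ns]` Series with summing a DataFrame that holds such a column. The frame layout and the column names `n` and `td` are assumptions. The DataFrame outcome depends on the pandas version, so it is captured rather than asserted.

```python
import pandas as pd

# Assumed setup: an empty timedelta column next to an empty integer column.
# The column names "n" and "td" are illustrative only.
empty_td = pd.Series([], dtype="timedelta64[ns]")
df = pd.DataFrame({"n": pd.Series([], dtype="int64"), "td": empty_td})

# Series case: reported to work and return a zero timedelta.
series_sum = empty_td.sum()
print("Series.sum():", series_sum)

# DataFrame case: reported to raise TypeError on pandas 1.5.2.
# Other versions may behave differently, so any exception is recorded.
try:
    frame_outcome = f"dtype={df.sum().dtype}"
except Exception as exc:
    frame_outcome = f"{type(exc).__name__}: {exc}"
print("DataFrame.sum():", frame_outcome)
```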
Issue Description
There are two related issues:

First, summing over a DataFrame with an empty column of dtype `timedelta64[ns]` raises a `TypeError`. Summing over an empty Series of this dtype works fine and returns a timedelta of 0, as expected.

Second, `df.mean()` and `df.sum()` differ in their output dtype. `mean` keeps the dtypes of the input, resulting in a Series of dtype `object`, whereas `sum` converts the result values to `float64`.

`mean` and `sum` are only examples; the same argument applies to other functions such as `median`, `std`, etc.

Expected Behavior
For the first issue, I'd expect the same behavior as for the Series, i.e. timedelta 0 for the empty timedelta column.
For the second issue, I'd expect the outputs of DataFrame aggregation functions to follow a consistent dtype strategy.
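One way to check for such a consistent strategy is to tabulate the result dtype of several aggregation functions on the same empty frame. This is only a sketch: the helper `result_dtype`, the frame layout, and the column names are hypothetical, and no particular output is assumed because it varies between pandas versions.

```python
import warnings

import pandas as pd

# Assumed empty frame with mixed dtypes; column names are illustrative.
df = pd.DataFrame(
    {
        "n": pd.Series([], dtype="int64"),
        "td": pd.Series([], dtype="timedelta64[ns]"),
    }
)


def result_dtype(frame, func_name):
    """Return the dtype of frame.<func_name>() as a string, or the exception name."""
    with warnings.catch_warnings():
        # Reductions over empty data may emit warnings that are not of interest here.
        warnings.simplefilter("ignore")
        try:
            return str(getattr(frame, func_name)().dtype)
        except Exception as exc:
            return f"raises {type(exc).__name__}"


dtypes = {name: result_dtype(df, name) for name in ("sum", "mean", "median", "std")}
for name, dtype in dtypes.items():
    print(f"{name:>6}: {dtype}")
```

With a consistent strategy, every row of this table would show the same dtype.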
In my opinion, the behavior of `mean` is correct, i.e. keeping the input data types.

Removing the `if`-block from DataFrame._reduce() fixes the `TypeError` and makes the behavior of `sum` consistent with `mean`. On the other hand, the `test_apply_funcs_over_empty` test will then fail, because the results are no longer consistent with `.apply(np.func)`.

As mentioned, I'd prefer the behavior of `mean`, but that's probably for someone with better insight into the implementation and API design to decide. But even if the conversion to `float64` is intentional, it should probably be consistent across aggregation functions, i.e. apply not only to `sum` but also to `mean`, which currently preserves the dtype.

Installed Versions
INSTALLED VERSIONS
commit : 8dab54d
python : 3.11.0.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : de_DE.cp1252
pandas : 1.5.2
numpy : 1.24.1
pytz : 2022.7
dateutil : 2.8.2
setuptools : 65.5.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None