Skip to content

pivot_table raises DataError: No numeric types to aggregate with Timedelta #26938

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
s-celles opened this issue Jun 19, 2019 · 4 comments
Closed
Labels
Bug Duplicate Report Duplicate issue or pull request Groupby Timedelta Timedelta data type

Comments

@s-celles
Copy link
Contributor

Code Sample, a copy-pastable example if possible

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({"Name": ["A", "B", "C", "B", "A", "C"], "Laptime": [48.0,48.0,48.0,48.0,48.0,48.0], "Time": [12.0, 14, 16, 62, 60, 64]})

In [3]: df.dtypes
Out[3]:
Name        object
Laptime    float64
Time       float64
dtype: object

In [4]: pd.pivot_table(df, values="Laptime", index="Time", columns="Name")

returns

Name     A     B     C
Time
12.0  48.0   NaN   NaN
14.0   NaN  48.0   NaN
16.0   NaN   NaN  48.0
60.0  48.0   NaN   NaN
62.0   NaN  48.0   NaN
64.0   NaN   NaN  48.0

but when columns and/or index are Timedelta, pivot_table raises an exception

In [5]: df = pd.DataFrame({"Name": ["A", "B", "C", "B", "A", "C"], "Laptime": [48.0,48.0,48.0,48.0,48.0,48.0], "Time": [12.0, 14, 16, 62, 60, 64]})
In [6]: for colname in ["Laptime", "Time"]:
            df[colname] = pd.to_timedelta(df[colname], unit='s')
In [7]: df.dtypes
Out[7]:
Name                object
Laptime    timedelta64[ns]
Time       timedelta64[ns]
dtype: object
In [8]: pd.pivot_table(df, values="Laptime", index="Time", columns="Name")

it raises

DataError: No numeric types to aggregate

Problem description

raises:

DataError: No numeric types to aggregate

Expected Output

shouldn't raises exception but should return

Name             A         B         C
Time
00:00:12  00:00:48       NaT       NaT
00:00:14       NaT  00:00:48       NaT
00:00:16       NaT       NaT  00:00:48
00:00:60  00:00:48       NaT       NaT
00:00:62       NaT  00:00:48       NaT
00:00:64       NaT       NaT  00:00:48

Output of pd.show_versions()

pd.show_versions()
/Users/scls/anaconda3/lib/python3.7/site-packages/psycopg2/init.py:144: UserWarning: The psycopg2 wheel package will be renamed from release 2.8; in order to keep installing from binary please use "pip install psycopg2-binary" instead. For details see: http://initd.org/psycopg/docs/install.html#binary-install-from-pypi.
""")

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: fr_FR.UTF-8
LOCALE: fr_FR.UTF-8

pandas: 0.24.2
pytest: 4.5.0
pip: 19.1
setuptools: 41.0.1
Cython: 0.29.7
numpy: 1.16.4
scipy: 1.2.1
pyarrow: None
xarray: None
IPython: 7.5.0
sphinx: 2.0.1
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: 1.2.1
tables: 3.5.1
numexpr: 2.6.9
feather: None
matplotlib: 3.0.3
openpyxl: 2.6.1
xlrd: 1.2.0
xlwt: 1.3.0
xlsxwriter: 1.1.8
lxml.etree: 4.3.3
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: 1.3.3
pymysql: 0.9.3
psycopg2: 2.7.7 (dt dec pq3 ext lo64)
jinja2: 2.10.1
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@s-celles
Copy link
Contributor Author

Same problem also occurs with datetime64

In [10]: df = pd.DataFrame({"Name": ["A", "B", "C", "B", "A", "C"], "Laptime": [48.0, 48.0, 48.0, 48.0, 48.0, 48.0], "Time": [12.0, 14, 16, 62, 60, 64]})
    ...:
    ...: for colname in ["Laptime", "Time"]:
    ...:     df[colname] = pd.to_timedelta(df[colname], unit='s') + pd.to_datetime(0)
    ...:
    ...: df.dtypes
Out[10]:
Name               object
Laptime    datetime64[ns]
Time       datetime64[ns]
dtype: object

In [11]: pd.pivot_table(df, values="Laptime", index="Time", columns="Name")

also raises

DataError: No numeric types to aggregate

@simonjayhawkins
Copy link
Member

@scls19fr Thanks for the report.

This looks like a bug in the groupby machinery as the mean of TimeDelta values with missing values is a valid operation.

Further investigation and PRs welcome.

@simonjayhawkins
Copy link
Member

The root cause is the same as #17382 so i'm going to close this for now as a duplicate, but certainly the code sample would be a good test case.

@simonjayhawkins simonjayhawkins added the Duplicate Report Duplicate issue or pull request label Jun 19, 2019
@CKoes
Copy link

CKoes commented Feb 11, 2021

The problem is due to the dtype of column NAME. dtype 'object' was causing the same error on my side. I had a DataFrame with one column of dtype 'object' containing only floats . After chaning the type of this column to 'float64' I could use pivot_table as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Duplicate Report Duplicate issue or pull request Groupby Timedelta Timedelta data type
Projects
None yet
Development

No branches or pull requests

3 participants