Calling pandas.cut with series of timedelta and timedelta bins raises TypeError, but should succeed #20607

Closed
nmusolino opened this issue Apr 4, 2018 · 3 comments
Labels
Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@nmusolino
Contributor

nmusolino commented Apr 4, 2018

Problem description

Attempting to call pandas.cut with a Series that contains timedelta values and bins that are timedelta values fails with TypeError.

Code Sample

In [1]: import datetime

In [2]: import numpy

In [3]: import pandas

In [4]: s = pandas.date_range('1/1/2018', periods=5, freq='s').to_series() - datetime.datetime(2018, 1, 1, 0, 0, 0)

In [5]: s
Out[5]:
2018-01-01 00:00:00   00:00:00
2018-01-01 00:00:01   00:00:01
2018-01-01 00:00:02   00:00:02
2018-01-01 00:00:03   00:00:03
2018-01-01 00:00:04   00:00:04
Freq: S, dtype: timedelta64[ns]

In [6]: pandas.cut(s, bins=[numpy.timedelta64(t, 's') for t in [0, 2, 5]])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-42f313379671> in <module>()
----> 1 pandas.cut(s, bins=[numpy.timedelta64(t, 's') for t in [0, 2, 5]])
      2
      3

C:\...\lib\site-packages\pandas\tools\tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
    112     else:
    113         bins = np.asarray(bins)
--> 114         if (np.diff(bins) < 0).any():
    115             raise ValueError('bins must increase monotonically.')
    116

TypeError: Cannot cast ufunc less input from dtype('<m8[s]') to dtype('<m8') with casting rule 'same_kind'

This also occurs with nanosecond timedeltas ([11]) and with pandas.Timedelta values ([14]):

In [9]: bins = numpy.array([numpy.timedelta64(t * 1000000000, 'ns') for t in [0, 2, 5]])

In [10]: bins
Out[10]: array([         0, 2000000000, 5000000000], dtype='timedelta64[ns]')

In [11]: pandas.cut(s, bins=bins)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-83e991bf4051> in <module>()
----> 1 pandas.cut(s, bins=bins)

C:\...\lib\site-packages\pandas\tools\tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
    112     else:
    113         bins = np.asarray(bins)
--> 114         if (np.diff(bins) < 0).any():
    115             raise ValueError('bins must increase monotonically.')
    116

TypeError: Cannot cast ufunc less input from dtype('<m8[ns]') to dtype('<m8') with casting rule 'same_kind'

In [14]: pandas.cut(s, bins=[pandas.to_timedelta(t, unit='s') for t in [0, 2, 5]])
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-8f423565c547> in <module>()
----> 1 pandas.cut(s, bins=[pandas.to_timedelta(t, unit='s') for t in [0, 2, 5]])

C:\...\lib\site-packages\pandas\tools\tile.py in cut(x, bins, right, labels, retbins, precision, include_lowest)
    112     else:
    113         bins = np.asarray(bins)
--> 114         if (np.diff(bins) < 0).any():
    115             raise ValueError('bins must increase monotonically.')
    116

pandas\tslib.pyx in pandas.tslib._Timedelta.__richcmp__ (pandas\tslib.c:46569)()

TypeError: Cannot compare type 'Timedelta' with type 'int'

On line 114 of tile.py in the exception traceback above:

if (np.diff(bins) < 0).any():

the comparison to zero should really be a comparison to zero in the datatype of the series.
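The idea in the previous sentence can be sketched as follows. This is a minimal illustration, not the patch that went into pandas: the helper name `bins_increase_monotonically` is hypothetical, and it assumes the bins convert to a NumPy array with a native dtype (for example `timedelta64[s]` or an integer dtype).

```python
import numpy as np


def bins_increase_monotonically(bins):
    """Return True if no bin edge is smaller than the one before it.

    Hypothetical helper for illustration only; not part of pandas.
    """
    bins = np.asarray(bins)
    diffs = np.diff(bins)
    # Build the zero in the dtype of the differences, so that timedelta64
    # differences are compared with a timedelta64 zero instead of the int 0.
    zero = np.zeros((), dtype=diffs.dtype)
    return not (diffs < zero).any()


# Same bins as in the failing call above.
second_bins = [np.timedelta64(t, "s") for t in [0, 2, 5]]
print(bins_increase_monotonically(second_bins))
```

A list of pandas.Timedelta objects becomes an object-dtype array, so that case would probably still need separate handling.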

Expected Output

The call should succeed, returning a Series that gives the bin for each element.
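For reference, here is a sketch of the expected behaviour, assuming a pandas release in which `cut` accepts timedelta64 data and bins. The variable names are illustrative, and the bins are built with `pandas.to_timedelta` rather than the list comprehension used in the report.

```python
import pandas as pd

# Five timedeltas from 0 to 4 seconds, like the values in the report.
s = pd.Series(pd.to_timedelta([0, 1, 2, 3, 4], unit="s"))

# Bin edges at 0, 2 and 5 seconds.
bins = pd.to_timedelta([0, 2, 5], unit="s")

# With the default right=True the intervals are (0s, 2s] and (2s, 5s].
result = pd.cut(s, bins=bins)
print(result)
```

With the default arguments the first element (0 seconds) lies on the open left edge of the first interval and so is not assigned a bin; passing `include_lowest=True` would include it.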

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: 1.5.0
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.3
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None

@nmusolino
Contributor Author

See also issue #19891.

@nmusolino nmusolino changed the title from "Calling pandas.cut with series of timedelta and timedelta bins raises" to "Calling pandas.cut with series of timedelta and timedelta bins raises TypeError, but should succeed" Apr 4, 2018
@gfyoung gfyoung added Datetime Datetime data dtype Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Apr 10, 2018
@gfyoung
Member

gfyoung commented Apr 10, 2018

> the comparison to zero should really be a comparison to zero in the datatype of the series.

Does that patch work for you BTW?

@mroeschke
Member

Looks like this was fixed back in v0.20:

- ``pd.cut`` and ``pd.qcut`` now support datetime64 and timedelta64 dtypes (:issue:`14714`, :issue:`14798`)

We also have tests covering timedelta bins with cut.

@gfyoung gfyoung added this to the No action milestone Nov 4, 2018