BUG: pandas.cut incorrectly raises a ValueError due to an overflow #26045
Comments
It looks like a potential fix is to cast the bins to float64 before taking the diff:

diff --git a/pandas/core/reshape/tile.py b/pandas/core/reshape/tile.py
index f99fd9004b..a9271404be 100644
--- a/pandas/core/reshape/tile.py
+++ b/pandas/core/reshape/tile.py
@@ -230,7 +230,7 @@ def cut(x, bins, right=True, labels=None, retbins=False, precision=3,
     else:
         bins = np.asarray(bins)
         bins = _convert_bin_to_numeric_type(bins, dtype)
-        if (np.diff(bins) < 0).any():
+        if (np.diff(bins.astype('float64')) < 0).any():
             raise ValueError('bins must increase monotonically.')
         fac, bins = _bins_to_cuts(x, bins, right=right, labels=labels,

This appears to produce the expected output for the example data. I'm not sure whether this is the best solution, and I haven't run the tests to see whether it causes any unintended issues.
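For illustration, here is a minimal NumPy-only sketch of the overflow that the patch above works around. The bin edges are hypothetical values chosen to span the full int64 range; they are not the data from the original report, and the variable names are made up for this example. It assumes `np.diff` subtracts in the array's own integer dtype, which is how the wraparound arises.

```python
import numpy as np

# Hypothetical bin edges spanning the full int64 range
# (an assumption for illustration, not the issue's original data).
info = np.iinfo(np.int64)
bins = np.array([info.min, 0, info.max], dtype=np.int64)

# Compared element by element, the edges are strictly increasing.
truly_increasing = bool((bins[1:] > bins[:-1]).all())

# np.diff subtracts in int64, so 0 - info.min wraps around to a negative value.
int_diff = np.diff(bins)
int_check_fails = bool((int_diff < 0).any())

# Casting to float64 first, as in the proposed patch, avoids the wraparound.
float_diff = np.diff(bins.astype("float64"))
float_check_fails = bool((float_diff < 0).any())

print(truly_increasing, int_check_fails, float_check_fails)
```

With these edges the integer check reports a decrease even though the bins are increasing, while the float64 check does not. Note that float64 cannot represent every large int64 value exactly, so a direct comparison of adjacent elements (as in `truly_increasing`) may be worth considering as an alternative; that is a suggestion, not something the thread settled on.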
I see that you welcome contributions on this issue; I would like to give it a try.
@Batalex: sure, go for it!
Batalex added a commit to Batalex/pandas that referenced this issue (Apr 12, 2019)
Batalex added a commit to Batalex/pandas that referenced this issue (Apr 12, 2019)
jreback pushed a commit that referenced this issue (Apr 19, 2019)
yhaque1213 pushed a commit to yhaque1213/pandas that referenced this issue (Apr 22, 2019)
ryanreh99 pushed a commit to ryanreh99/pandas that referenced this issue (Apr 22, 2019)
Code Sample, a copy-pastable example if possible
The issue appears to be due to an overflow in np.diff, which in turn makes it appear that the bins are not monotonically increasing. Essentially the following:
Problem description
The bins are monotonically increasing but a ValueError is raised indicating that they aren't.
Expected Output
I'd expect [4] to not raise a ValueError.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 6d9b702
python: 3.6.8.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None
pandas: 0.25.0.dev0+389.g6d9b702a66
pytest: 4.2.0
pip: 19.0.1
setuptools: 40.6.3
Cython: 0.28.2
numpy: 1.14.6
scipy: 1.0.0
pyarrow: 0.6.0
xarray: 0.9.6
IPython: 7.2.0
sphinx: 1.8.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: 0.4.0
matplotlib: 2.0.2
openpyxl: 2.4.8
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 0.9.8
lxml.etree: 3.8.0
bs4: None
html5lib: 0.999
sqlalchemy: 1.1.13
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: 0.1.5
pandas_gbq: None
pandas_datareader: None
gcsfs: None