reorder_categories() on categorical index of interval values produces bizarre results #23452
Comments
Thanks, I think the root cause of this issue is a bug in
In [2]: ii1 = pd.interval_range(0, 3)
In [3]: ii1
Out[3]:
IntervalIndex([(0, 1], (1, 2], (2, 3]],
closed='right',
dtype='interval[int64]')
In [4]: ii2 = ii1[::-1]
In [5]: ii2
Out[5]:
IntervalIndex([(2, 3], (1, 2], (0, 1]],
closed='right',
dtype='interval[int64]')
In [6]: ii2.get_indexer(ii1)
Out[6]: array([-1, -1, -1], dtype=int64)
The output in
Coincidentally, I'm currently working on another issue that fixes the bug above (among other things), which does indeed fix the issue you've described. On my WIP branch:
In [2]: ii1 = pd.interval_range(0, 3)
In [3]: ii2 = ii1[::-1]
In [4]: ii2.get_indexer(ii1)
Out[4]: array([2, 1, 0], dtype=int64)
In [5]: e=pd.DataFrame(index=pd.CategoricalIndex([pd.Interval(0,1), pd.Interval(1,2), pd.Interval(2,3)],ordered=True),data={'x':[1,2,3]})
...: e.index=e.index.reorder_categories(e.index.categories[::-1])
...: e.sort_index()
...:
Out[5]:
        x
(2, 3]  3
(1, 2]  2
(0, 1]  1
Not quite ready to open up a PR yet, but I'm planning for my WIP branch to be incorporated in the next release (0.24.0), so this bug should be fixed then too.
This looks fixed on master. Could use a test
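A regression test along these lines might look like the sketch below. It is not the test that was added to pandas; the function names are hypothetical, and it only checks the two behaviours discussed in this thread (the indexer on a reversed IntervalIndex, and values surviving a category reorder).

```python
import pandas as pd


def test_get_indexer_reversed_interval_index():
    # Hypothetical test name; mirrors the session quoted earlier in the thread.
    ii1 = pd.interval_range(0, 3)
    ii2 = ii1[::-1]
    # Each interval in ii1 should be located at its mirrored position in ii2.
    assert list(ii2.get_indexer(ii1)) == [2, 1, 0]


def test_reorder_categories_keeps_interval_values():
    # Hypothetical test name; checks that reordering categories only recodes
    # the index and does not change (or blank out) its values.
    idx = pd.CategoricalIndex(
        [pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)],
        ordered=True,
    )
    reordered = idx.reorder_categories(idx.categories[::-1])
    assert list(reordered) == list(idx)
    assert list(reordered.codes) == [2, 1, 0]
```

Both functions use plain `assert`, so they can be collected by pytest or called directly.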
take
Problem description
Categorical indexes where the individual values are of type interval produce bizarre results when reordered with .reorder_categories. This is important because these indexes and columns are produced by .cut, so they occur in analysis even if the user never directly creates them.
Minimal Example
With strings, this works as expected:
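A minimal sketch of the string case, assuming the same single-column DataFrame layout as the interval snippet quoted in the comments; the names `df` and `result` are illustrative and not taken from the original report.

```python
import pandas as pd

# Ordered categorical index with string categories a < b < c.
df = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.CategoricalIndex(["a", "b", "c"], ordered=True),
)

# Reverse the category order (c < b < a), then sort by the index.
df.index = df.index.reorder_categories(df.index.categories[::-1])
result = df.sort_index()
print(result)
```

Sorting follows the new category order, so the row labelled `c` comes first.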
With intervals, it does not:
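A sketch of the interval case, adapted from the snippet quoted in the comments with variable names changed for readability (`df` and `result` are illustrative). The report describes the sorted result as wrong on pandas 0.23.0; per the comments, a version containing the fix sorts the rows in the reversed category order.

```python
import pandas as pd

# Ordered categorical index whose categories are intervals.
df = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.CategoricalIndex(
        [pd.Interval(0, 1), pd.Interval(1, 2), pd.Interval(2, 3)],
        ordered=True,
    ),
)

# Reverse the category order, then sort by the index.
df.index = df.index.reorder_categories(df.index.categories[::-1])
result = df.sort_index()
print(result)
```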
Expected Output
More Realistic Example
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 3.16.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.23.0
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None