BUG: label-based indexing fails with certain list indexers in case of mixed integers/strings columns names #14836
Upon further digging, it seems that the problem is not IndexSlice itself:
df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
#df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]
# Works
df.ix[:,[2011,2012]]
df.ix[['A','B','C'],[2011,'All']]
# Fails
df.ix[['A','B','C'],[2011,2012]]
Traceback:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
847 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848 for t, a in zip(tup, o._AXIS_ORDERS)])
849 return o.reindex(**d)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in <listcomp>(.0)
847 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848 for t, a in zip(tup, o._AXIS_ORDERS)])
849 return o.reindex(**d)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _convert_for_reindex(self, key, axis)
868 keyarr = _ensure_platform_int(keyarr)
--> 869 return labels.take(keyarr)
870
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in take(self, indices, axis, allow_fill, fill_value, **kwargs)
1514 fill_value=fill_value,
-> 1515 na_value=self._na_value)
1516 else:
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in _assert_take_fillable(self, values, indices, allow_fill, fill_value, na_value)
1538 else:
-> 1539 taken = values.take(indices)
1540 return taken
IndexError: index 2011 is out of bounds for axis 0 with size 3
During handling of the above exception, another exception occurred:
KeyError Traceback (most recent call last)
<ipython-input-328-c74834bf7769> in <module>()
6
7 # Fails
----> 8 df.ix[['A','B','C'],[2011,2012]]
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
81 pass
82
---> 83 return self._getitem_tuple(key)
84 else:
85 key = com._apply_if_callable(key, self.obj)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
802 # ugly hack for GH #836
803 if self._multi_take_opportunity(tup):
--> 804 return self._multi_take(tup)
805
806 # no shortcut needed
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
849 return o.reindex(**d)
850 except:
--> 851 raise self._exception
852
853 def _convert_for_reindex(self, key, axis=0):
KeyError:
@jmarrec Thanks for the report I think
It seems to work in some other variations:
So it seems to occur only for the combination of two lists as indexers where the list for the columns contains only integers.
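The combinations discussed in this thread can be compared side by side. The sketch below is only an illustration, not the snippet from the comment above: it uses `.loc` instead of `.ix` (which was removed in pandas 1.0), and the `variations` dictionary and its descriptions are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd

# Same shape of frame as in the report: integer and string column labels mixed.
df = pd.DataFrame(
    np.arange(9).reshape(3, 3),
    columns=[2011, 2012, "All"],
    index=list("ABC"),
)

rows = ["A", "B", "C"]

# Hypothetical helper mapping: description -> (row indexer, column indexer).
variations = {
    "slice rows, integer-only column list": (slice(None), [2011, 2012]),
    "list rows, mixed column list": (rows, [2011, "All"]),
    # The combination reported as failing with .ix on 0.19.1:
    "list rows, integer-only column list": (rows, [2011, 2012]),
}

for name, (row_key, col_key) in variations.items():
    result = df.loc[row_key, col_key]
    print(name, "->", list(result.columns))
```

With `.loc` every combination is expected to resolve the integers as labels, which matches the later report in this thread that 0.24.2 behaves correctly.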
Is this planned to be fixed?
Works on 0.24.2 and master (not tried on earlier versions).
Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda custom (64-bit) on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.24.2'
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,2013], index=list('ABC'))
>>> df
       2011      2012      2013
A  0.933769  0.374327  0.139483
B  0.183546  0.771873  0.010685
C  0.539503  0.413679  0.822604
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.933769  0.374327
B  0.183546  0.771873
C  0.539503  0.413679
>>>
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
>>>
>>> df
       2011      2012       All
A  0.681976  0.289243  0.230125
B  0.955009  0.955685  0.599972
C  0.361911  0.052804  0.826324
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.681976  0.289243
B  0.955009  0.955685
C  0.361911  0.052804
>>>
>>> df.loc[['A','B','C'],[2011,'All']]
       2011       All
A  0.681976  0.230125
B  0.955009  0.599972
C  0.361911  0.826324
>>>
@jorisvandenbossche would you be happy to close this issue if a test were added for just the .loc scenarios?
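A test along the lines suggested above might look roughly like the following. This is a sketch only, not the test that was (or would be) added to pandas: the function name `check_loc_list_indexers_mixed_columns` is hypothetical, and the expected frames are built positionally with `.iloc` purely as an independent reference.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def check_loc_list_indexers_mixed_columns():
    # Hypothetical regression check for the .loc scenarios in this issue.
    df = pd.DataFrame(
        np.arange(9.0).reshape(3, 3),
        columns=[2011, 2012, "All"],
        index=list("ABC"),
    )

    # Two list indexers, integer-only column list (the reported failure case).
    result = df.loc[["A", "B", "C"], [2011, 2012]]
    expected = df.iloc[:, [0, 1]]
    tm.assert_frame_equal(result, expected)

    # Two list indexers, mixed integer/string column list.
    result = df.loc[["A", "B", "C"], [2011, "All"]]
    expected = df.iloc[:, [0, 2]]
    tm.assert_frame_equal(result, expected)


check_loc_list_indexers_mixed_columns()
```

Fixed values from `np.arange` are used instead of `np.random.rand` so that a failure is reproducible.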
Code Sample, a copy-pastable example if possible
This works, when all columns are integer labels:
This doesn't, when columns are a mix of integers and strings:
Problem description
It seems that in the second case it's trying to find it by position rather than label.
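The difference between the two kinds of lookup can be shown directly on an `Index`, independently of `.ix`. This is a minimal sketch assuming a recent pandas; it mirrors the `labels.take(keyarr)` call visible in the traceback but is not the internal code path itself, and the variable names are illustrative.

```python
import pandas as pd

columns = pd.Index([2011, 2012, "All"])

# Label-based lookup: each label is mapped to its position in the index.
positions = columns.get_indexer([2011, 2012])
print(positions)  # [0 1]

# Position-based lookup: 2011 is treated as an offset into a 3-element index.
positional_error = None
try:
    columns.take([2011, 2012])
except IndexError as err:
    positional_error = err
    print("positional lookup failed:", err)
```

The `IndexError` here is the same kind of error as the first exception in the traceback ("index 2011 is out of bounds for axis 0 with size 3"), which supports the reading that the keys were being interpreted as positions.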
Expected Output
I would expect pandas to understand that I'm trying to lookup by label rather than position.
Output of pd.show_versions()
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.7.1
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1