BUG: label-based indexing fails with certain list indexers in case of mixed integers/strings columns names #14836

Closed
jmarrec opened this issue Dec 8, 2016 · 4 comments · Fixed by #27537
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves

@jmarrec
Contributor

jmarrec commented Dec 8, 2016

Code Sample, a copy-pastable example if possible

This works when all column labels are integers:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,2013], index=list('ABC'))
df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]

Out:
       2011      2012
A  0.289341  0.651091
B  0.528271  0.682148
C  0.742617  0.578734

This doesn't work when the column labels are a mix of integers and strings:

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))

# This crashes
df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]
> IndexError: index 2011 is out of bounds for axis 0 with size 3

# This works, though:
df.ix[pd.IndexSlice[['A','B','C'],[2011,'All']]]

Problem description

It seems that in the second case pandas is trying to look up 2011 and 2012 by position rather than by label.

Expected Output

I would expect pandas to understand that I'm trying to look up by label rather than by position.
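For readers on current pandas: `.ix` was deprecated in 0.20 and removed in 1.0, so the unambiguous way to express this intent is `.loc` for labels and `.iloc` for positions. The sketch below assumes a recent pandas release in which this bug no longer reproduces (later in this thread it is reported as working on 0.24.2); the random seed is arbitrary.

```python
import numpy as np
import pandas as pd

# Same shape of frame as in the report: mixed integer/string column labels.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, 3)), columns=[2011, 2012, "All"], index=list("ABC"))

# Label-based: 2011 and 2012 are looked up as column labels.
by_label = df.loc[["A", "B", "C"], [2011, 2012]]

# Position-based: the same two columns, addressed explicitly by position.
by_position = df.iloc[[0, 1, 2], [0, 1]]

print(by_label.columns.tolist())  # [2011, 2012]
print(by_label.equals(by_position))  # True
```

Splitting the two lookup styles across two accessors removes the guesswork that `.ix` had to do.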

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.7.1
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1

@jmarrec
Contributor Author

jmarrec commented Dec 8, 2016

Upon further digging, it seems that the problem is not specific to IndexSlice; plain ix fails too:

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
#df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]
# Works
df.ix[:,[2011,2012]]
df.ix[['A','B','C'],[2011,'All']]

# Fails
df.ix[['A','B','C'],[2011,2012]]

Traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
    847                 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848                  for t, a in zip(tup, o._AXIS_ORDERS)])
    849             return o.reindex(**d)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in <listcomp>(.0)
    847                 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848                  for t, a in zip(tup, o._AXIS_ORDERS)])
    849             return o.reindex(**d)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _convert_for_reindex(self, key, axis)
    868                 keyarr = _ensure_platform_int(keyarr)
--> 869                 return labels.take(keyarr)
    870 

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in take(self, indices, axis, allow_fill, fill_value, **kwargs)
   1514                                                fill_value=fill_value,
-> 1515                                                na_value=self._na_value)
   1516         else:

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in _assert_take_fillable(self, values, indices, allow_fill, fill_value, na_value)
   1538         else:
-> 1539             taken = values.take(indices)
   1540         return taken

IndexError: index 2011 is out of bounds for axis 0 with size 3

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-328-c74834bf7769> in <module>()
      6 
      7 # Fails
----> 8 df.ix[['A','B','C'],[2011,2012]]

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
     81                 pass
     82 
---> 83             return self._getitem_tuple(key)
     84         else:
     85             key = com._apply_if_callable(key, self.obj)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    802         # ugly hack for GH #836
    803         if self._multi_take_opportunity(tup):
--> 804             return self._multi_take(tup)
    805 
    806         # no shortcut needed

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
    849             return o.reindex(**d)
    850         except:
--> 851             raise self._exception
    852 
    853     def _convert_for_reindex(self, key, axis=0):

KeyError: 

@jorisvandenbossche
Member

jorisvandenbossche commented Dec 9, 2016

@jmarrec Thanks for the report

I think ix failing here is possibly expected, because for a non-integer index, ix falls back to integer positional indexing.
The typical recommendation we then make is: if you know you are indexing by label, you should use loc instead of ix. However, this seems to fail as well:

In [78]: df.loc[['A','B','C'],[2011,2012]]
...
IndexError: index 2011 is out of bounds for axis 0 with size 3

During handling of the above exception, another exception occurred:
...
KeyError: 
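The ix fallback described above hinges on whether the axis is integer-typed (the documented rule was that ix falls back to positions unless the axis is of integer type). A small sketch, using only public `pandas.api.types` helpers on a recent pandas, showing that the all-integer columns from the first example form an integer index, while adding the string "All" produces an object index whose inferred type is "mixed-integer":

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

# Column labels from the two examples in the report.
int_cols = pd.Index([2011, 2012, 2013])
mixed_cols = pd.Index([2011, 2012, "All"])

# Adding the string "All" turns the columns into an object index.
print(int_cols.inferred_type)    # integer
print(mixed_cols.inferred_type)  # mixed-integer
print(mixed_cols.dtype)          # object

# Only the first axis is integer-typed, so only there would ix have
# treated integer keys as labels rather than as positions.
print(is_integer_dtype(int_cols))    # True
print(is_integer_dtype(mixed_cols))  # False
```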

It seems to work in some other variations:

In [94]: df.loc[:,[2011, 2012]]
Out[94]: 
       2011      2012
A  0.554463  0.446838
B  0.053866  0.159172
C  0.131302  0.937487

In [95]: df.loc['A',[2011, 2012]]
Out[95]: 
2011    0.554463
2012    0.446838
Name: A, dtype: float64

In [96]: df.loc[['A'],[2011, 2012]]
...
KeyError: 

In [97]: df.loc[['A'],2011]
Out[97]: 
A    0.554463
Name: 2011, dtype: float64

In [99]: df.loc[['A', 'B', 'C'],[2011, 'All']]
Out[99]: 
       2011       All
A  0.554463  0.935564
B  0.053866  0.166841
C  0.131302  0.298964

So it seems to occur only for the combination of two list indexers where the column list contains only integers.
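This pattern lines up with the ix traceback in the second comment: with two list indexers the `_multi_take` shortcut is taken, and `_convert_for_reindex` ends in `labels.take(keyarr)`, which is a positional take (the loc failure raises the same IndexError/KeyError pair). Below is a hypothetical, simplified model of which column keys would be sent down that positional path. The function name is made up and this is not pandas' actual implementation; it only models the column-key half of the condition, not the "both indexers are lists" half.

```python
import pandas as pd
from pandas.api.types import infer_dtype


def list_key_is_positional(labels: pd.Index, key: list) -> bool:
    """Hypothetical, simplified model of the failing 0.19.1 code path.

    Returns True when an integer-only list key meets an axis whose labels
    are not all integers; in the traceback that combination ends in
    labels.take(keyarr), i.e. a positional lookup.
    """
    key_is_all_int = infer_dtype(key, skipna=False) == "integer"
    axis_is_int = labels.inferred_type == "integer"
    return key_is_all_int and not axis_is_int


mixed_cols = pd.Index([2011, 2012, "All"])
int_cols = pd.Index([2011, 2012, 2013])

print(list_key_is_positional(mixed_cols, [2011, 2012]))   # True: 2011 treated as a position
print(list_key_is_positional(mixed_cols, [2011, "All"]))  # False: label lookup
print(list_key_is_positional(int_cols, [2011, 2012]))     # False: label lookup
```

Under this model, `[2011, 'All']` escapes the bug only because the string makes the key non-integer, which matches In [99] above.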

@jorisvandenbossche jorisvandenbossche changed the title IndexSlice fails when columns are mixed integers and strings BUG: label-based indexing fails with certain list indexers in case of mixed integers/strings columns names Dec 9, 2016
@jorisvandenbossche jorisvandenbossche added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 9, 2016
@ron819

ron819 commented Nov 5, 2018

Is this planned to be fixed?

@simonjayhawkins
Member

> The typical recommendation we then make is: if you know you are using labels to index, you should use loc instead of ix. However, this seems to fail as well:

This works on 0.24.2 and master (I have not tried earlier versions):

Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda custom (64-bit) on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.24.2'
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,2013], index=list('ABC'))
>>> df
       2011      2012      2013
A  0.933769  0.374327  0.139483
B  0.183546  0.771873  0.010685
C  0.539503  0.413679  0.822604
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.933769  0.374327
B  0.183546  0.771873
C  0.539503  0.413679
>>>
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
>>>
>>> df
       2011      2012       All
A  0.681976  0.289243  0.230125
B  0.955009  0.955685  0.599972
C  0.361911  0.052804  0.826324
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.681976  0.289243
B  0.955009  0.955685
C  0.361911  0.052804
>>>
>>> df.loc[['A','B','C'],[2011,'All']]
       2011       All
A  0.681976  0.230125
B  0.955009  0.599972
C  0.361911  0.826324
>>>

@jorisvandenbossche would you be happy to close this issue if a test was added for just the .loc scenarios?
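A regression test along those lines might look like the sketch below. It is written in plain pytest style; the test name and the deterministic `np.arange` data (used instead of `np.random.rand` so the expected values can be written out) are illustrative choices, not taken from the pandas test suite. It covers the two failing `.loc` calls (In [78] and In [96]) and one call that already worked (In [99]).

```python
import numpy as np
import pandas as pd


def test_loc_list_indexers_mixed_int_str_columns():
    # Hypothetical test name; rows are A=[0, 1, 2], B=[3, 4, 5], C=[6, 7, 8].
    df = pd.DataFrame(
        np.arange(9, dtype=float).reshape(3, 3),
        columns=[2011, 2012, "All"],
        index=list("ABC"),
    )

    # Failed in 0.19.1: list row indexer plus an integer-only list column indexer.
    result = df.loc[["A", "B", "C"], [2011, 2012]]
    expected = pd.DataFrame(
        [[0.0, 1.0], [3.0, 4.0], [6.0, 7.0]],
        index=list("ABC"),
        columns=[2011, 2012],
    )
    # DataFrame.equals does not require the column index dtypes to match.
    assert result.equals(expected)
    assert result.columns.tolist() == [2011, 2012]

    # Also failed in 0.19.1: a single-element row list.
    assert df.loc[["A"], [2011, 2012]].equals(expected.loc[["A"]])

    # Already worked: the column list mixes an integer and a string.
    result = df.loc[["A", "B", "C"], [2011, "All"]]
    assert result.columns.tolist() == [2011, "All"]
    assert result[2011].tolist() == [0.0, 3.0, 6.0]
```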

5 participants