BUG: label-based indexing fails with certain list indexers in case of mixed integers/strings columns names #14836

jmarrec · 2016-12-08T23:30:48Z

Code Sample, a copy-pastable example if possible

This works, when all columns are integer labels:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,2013], index=list('ABC'))
df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]

Out:
       2011      2012
A  0.289341  0.651091
B  0.528271  0.682148
C  0.742617  0.578734

This doesn't, when columns are a mix of integers and strings:

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))

# This crashes
df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]
> IndexError: index 2011 is out of bounds for axis 0 with size 3

# This works, though:
df.ix[pd.IndexSlice[['A','B','C'],[2011,'All']]]

Problem description

In the second case, pandas appears to look up the column keys by position rather than by label.

Expected Output

I would expect pandas to understand that I'm trying to look up by label rather than by position.

Output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None

pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 28.7.1
Cython: 0.25.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: None
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.4
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: 0.2.1


jmarrec · 2016-12-08T23:39:12Z

On further digging, the problem is not specific to IndexSlice; plain ix fails too:

df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
#df.ix[pd.IndexSlice[['A','B','C'],[2011,2012]]]
# Works
df.ix[:,[2011,2012]]
df.ix[['A','B','C'],[2011,'All']]

# Fails
df.ix[['A','B','C'],[2011,2012]]

Traceback:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
    847                 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848                  for t, a in zip(tup, o._AXIS_ORDERS)])
    849             return o.reindex(**d)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in <listcomp>(.0)
    847                 [(a, self._convert_for_reindex(t, axis=o._get_axis_number(a)))
--> 848                  for t, a in zip(tup, o._AXIS_ORDERS)])
    849             return o.reindex(**d)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _convert_for_reindex(self, key, axis)
    868                 keyarr = _ensure_platform_int(keyarr)
--> 869                 return labels.take(keyarr)
    870 

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in take(self, indices, axis, allow_fill, fill_value, **kwargs)
   1514                                                fill_value=fill_value,
-> 1515                                                na_value=self._na_value)
   1516         else:

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/indexes/base.py in _assert_take_fillable(self, values, indices, allow_fill, fill_value, na_value)
   1538         else:
-> 1539             taken = values.take(indices)
   1540         return taken

IndexError: index 2011 is out of bounds for axis 0 with size 3

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-328-c74834bf7769> in <module>()
      6 
      7 # Fails
----> 8 df.ix[['A','B','C'],[2011,2012]]

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in __getitem__(self, key)
     81                 pass
     82 
---> 83             return self._getitem_tuple(key)
     84         else:
     85             key = com._apply_if_callable(key, self.obj)

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
    802         # ugly hack for GH #836
    803         if self._multi_take_opportunity(tup):
--> 804             return self._multi_take(tup)
    805 
    806         # no shortcut needed

/Users/julien/Virtualenvs/py35/lib/python3.5/site-packages/pandas/core/indexing.py in _multi_take(self, tup)
    849             return o.reindex(**d)
    850         except:
--> 851             raise self._exception
    852 
    853     def _convert_for_reindex(self, key, axis=0):

KeyError:
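The first IndexError in the traceback comes from `labels.take(keyarr)`, which reads the integer keys as positions. The difference can be sketched on a bare Index, assuming a recent pandas. The variable names are illustrative, and this is not necessarily the code path pandas follows internally:

```python
import pandas as pd

# Same column labels as the failing frame: two integers and a string.
labels = pd.Index([2011, 2012, "All"])

# Positional lookup: 2011 is read as "element number 2011" of a 3-element index.
try:
    labels.take([2011, 2012])
    positional_error = None
except IndexError as exc:
    positional_error = exc

# Label lookup: find the positions at which the labels 2011 and 2012 sit.
label_positions = labels.get_indexer([2011, 2012])

print(type(positional_error).__name__)
print(label_positions.tolist())
```

Positional lookup cannot succeed here because the index has only three elements, while label lookup resolves both keys.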

jorisvandenbossche · 2016-12-09T08:51:41Z

@jmarrec Thanks for the report

I think ix failing here is possibly expected, because for a non-integer index, ix will fall back to integer positional indexing.
The typical recommendation we then make is: if you know you are using labels to index, you should use loc instead of ix. However, this seems to fail as well:

In [78]: df.loc[['A','B','C'],[2011,2012]]
...
IndexError: index 2011 is out of bounds for axis 0 with size 3

During handling of the above exception, another exception occurred:
...
KeyError:

It seems to work in some other variations:

In [94]: df.loc[:,[2011, 2012]]
Out[94]: 
       2011      2012
A  0.554463  0.446838
B  0.053866  0.159172
C  0.131302  0.937487

In [95]: df.loc['A',[2011, 2012]]
Out[95]: 
2011    0.554463
2012    0.446838
Name: A, dtype: float64

In [96]: df.loc[['A'],[2011, 2012]]
...
KeyError: 

In [97]: df.loc[['A'],2011]
Out[97]: 
A    0.554463
Name: 2011, dtype: float64

In [99]: df.loc[['A', 'B', 'C'],[2011, 'All']]
Out[99]: 
       2011       All
A  0.554463  0.935564
B  0.053866  0.166841
C  0.131302  0.298964

So it seems to occur for the combination of two lists as indexers, where the one for the columns contains only integers.
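To check the variations above in one go on a given pandas version, a small script along these lines may help. It is only a sketch: the case names and the `results` dict are made up for illustration, and the outcome depends on the installed pandas (on 0.19.1 the list/list cases with integer-only column keys are the ones reported to fail).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(9.0).reshape(3, 3),
    columns=[2011, 2012, "All"],
    index=list("ABC"),
)

# (row indexer, column indexer) pairs taken from the examples in this thread.
cases = {
    "slice rows, int list cols": (slice(None), [2011, 2012]),
    "scalar row, int list cols": ("A", [2011, 2012]),
    "one-item list row, int list cols": (["A"], [2011, 2012]),
    "one-item list row, scalar col": (["A"], 2011),
    "list rows, mixed list cols": (["A", "B", "C"], [2011, "All"]),
    "list rows, int list cols": (["A", "B", "C"], [2011, 2012]),
}

results = {}
for name, (rows, cols) in cases.items():
    try:
        df.loc[rows, cols]
        results[name] = "ok"
    except (KeyError, IndexError) as exc:
        # Record the exception type instead of stopping at the first failure.
        results[name] = type(exc).__name__

for name, outcome in results.items():
    print(f"{name}: {outcome}")
```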

ron819 · 2018-11-05T14:16:46Z

Is this planned to be fixed?

simonjayhawkins · 2019-07-23T03:40:08Z

> The typical recommendation we then make is: if you know you are using labels to index, you should use loc instead of ix. However, this seems to fail as well:

Works on 0.24.2 and master (earlier versions not tried).

Python 3.7.3 (default, Mar 27 2019, 17:13:21) [MSC v.1915 64 bit (AMD64)] :: Anaconda custom (64-bit) on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.__version__
'0.24.2'
>>> import numpy as np
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,2013], index=list('ABC'))
>>> df
       2011      2012      2013
A  0.933769  0.374327  0.139483
B  0.183546  0.771873  0.010685
C  0.539503  0.413679  0.822604
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.933769  0.374327
B  0.183546  0.771873
C  0.539503  0.413679
>>>
>>> df = pd.DataFrame(np.random.rand(3,3), columns=[2011,2012,'All'], index=list('ABC'))
>>>
>>> df
       2011      2012       All
A  0.681976  0.289243  0.230125
B  0.955009  0.955685  0.599972
C  0.361911  0.052804  0.826324
>>>
>>> df.loc[['A','B','C'],[2011,2012]]
       2011      2012
A  0.681976  0.289243
B  0.955009  0.955685
C  0.361911  0.052804
>>>
>>> df.loc[['A','B','C'],[2011,'All']]
       2011       All
A  0.681976  0.230125
B  0.955009  0.599972
C  0.361911  0.826324
>>>

@jorisvandenbossche would you be happy to close this issue if a test was added for just the .loc scenarios?
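A test for the .loc scenarios could look roughly like the sketch below. It assumes a pandas version on which the session above passes; the helper name is hypothetical and this is not necessarily the test that was eventually added. The expected frame is built with iloc so that the comparison does not depend on the random data or on the dtype of the resulting column index.

```python
import numpy as np
import pandas as pd
import pandas.testing as tm


def check_loc_list_indexers_mixed_columns():
    # Hypothetical helper; mirrors the .loc calls shown in this thread.
    df = pd.DataFrame(
        np.arange(9.0).reshape(3, 3),
        columns=[2011, 2012, "All"],
        index=list("ABC"),
    )
    # (label-based column key, equivalent column positions)
    cases = [
        ([2011, 2012], [0, 1]),
        ([2011, "All"], [0, 2]),
    ]
    for column_key, positions in cases:
        result = df.loc[["A", "B", "C"], column_key]
        expected = df.iloc[:, positions]
        tm.assert_frame_equal(result, expected)
    return len(cases)


checked = check_loc_list_indexers_mixed_columns()
print(f"{checked} cases checked")
```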

jorisvandenbossche changed the title ~~IndexSlice fails when columns are mixed integers and strings~~ BUG: label-based indexing fails with certain list indexers in case of mixed integers/strings columns names Dec 9, 2016

jorisvandenbossche added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Dec 9, 2016

simonjayhawkins mentioned this issue Jul 23, 2019

TST: label-based indexing fails with certain list indexers in case of… #27537

Merged

4 tasks

jreback added this to the 1.0 milestone Jul 23, 2019

jreback closed this as completed in #27537 Jul 23, 2019
