Partial indexing with a list and hierarchical index #13501

jseabold · 2016-06-23T14:00:53Z

Code Sample, a copy-pastable example if possible

Teaching a pandas course. Attendee just came across this. Note that we index with a list instead of a tuple at the bottom.

import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame
frame.loc[['b', 2], 'Colorado']
frame.loc[['b', 1], 'Colorado']

Returns

color      Green
key1 key2
b    1         8
     2        11

in both cases on pandas 0.18.1

Expected Output

Error?

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.4.final.0
python-bits: 64
OS: Darwin
OS-release: 15.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: None
pip: 8.1.1
setuptools: 20.3
Cython: None
numpy: 1.11.0
scipy: 0.17.1
statsmodels: None
xarray: None
IPython: 4.0.3
sphinx: None
patsy: None
dateutil: 2.4.2
pytz: 2015.6
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.8
pymysql: None
psycopg2: 2.6.1 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.39.0
pandas_datareader: None


jorisvandenbossche · 2016-06-23T15:48:15Z

So if you have a non-hierarchical index, you expect a reindex:

In [59]: frame2 = frame.reset_index(level=1, drop=True)

In [60]: frame2
Out[60]:
state  Ohio     Colorado
color Green Red    Green
key1
a         0   1        2
a         3   4        5
b         6   7        8
b         9  10       11

In [63]: frame2.loc[['b', 2], 'Colorado']
Out[63]:
color  Green
key1
b        8.0
b       11.0
2        NaN

So loc is rather liberal with its inputs and does not raise with a list indexer when at least one of the labels is found.
The problem with the MultiIndex case is that reindexing is not really an option if you do not provide full indexers containing all levels (e.g. frame.loc[[('b', 2), ('c', 3)], 'Colorado'] does reindex).

So in this case it would maybe indeed make sense to raise? Another option is to keep it as is and ignore those values, or to do a reindex with an empty label for the second level (like reindexing with [(2, NaN)]).
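
For comparison, here is a minimal sketch of the flat-index case with unique labels. It assumes a recent pandas (1.0 or later, to my knowledge), where `.loc` with a list raises `KeyError` on any missing label and the reindexing has to be requested explicitly with `reindex`. The Series `s` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical flat (non-hierarchical) Series with unique labels.
s = pd.Series([8, 11], index=['b', 'c'])

# Explicit reindex: the unknown label 2 is added with a NaN value.
reindexed = s.reindex(['b', 2])

# .loc with a list that contains an unknown label.
# Assumption: a recent pandas, which raises KeyError instead of reindexing.
try:
    s.loc[['b', 2]]
    loc_raised = False
except KeyError:
    loc_raised = True
```

With a partial list on a MultiIndex there is no obvious equivalent of the `reindex` call, which is the ambiguity discussed here.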

I thought we already had a discussion on this once, but can't directly find the issue.

jreback · 2016-06-23T20:56:17Z

So you are using slicers implicitly here, and it's ambiguous; see the docs.

You want something like this?

In [12]: idx = pd.IndexSlice

In [19]: frame.loc[idx['b', 1], 'Colorado']
Out[19]: 
color
Green    8
Name: (b, 1), dtype: int64

In [21]: frame.loc[idx[['b'], 1], 'Colorado']
Out[21]: 
color      Green
key1 key2       
b    1         8

This is 'partial' indexing (e.g. I can use ':' to mean "give me everything for that level"):

In [29]: frame.loc[idx['a', :], 'Colorado']
Out[29]: 
color      Green
key1 key2       
a    1         2
     2         5

So giving a list as the entire input is an error; you can give a list for a single level (e.g. with multiple values).

I think this part is a bug:
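
The per-level forms above can be collected into one runnable sketch. It rebuilds the frame from the original post; the result variable names are only illustrative, and the comments describe what the sessions in this thread show rather than guaranteed behaviour on every pandas version.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

idx = pd.IndexSlice

# Scalar for every level: selects the single row ('b', 1) as a Series.
single_row = frame.loc[idx['b', 1], 'Colorado']

# List for one level, scalar for the other: stays a DataFrame.
one_level_list = frame.loc[idx[['b'], 1], 'Colorado']

# Scalar for key1, ':' for key2: every row under 'a'.
all_of_a = frame.loc[idx['a', :], 'Colorado']
```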

In [22]: frame.loc[idx[['b', 1]], 'Colorado']
Out[22]: 
color      Green
key1 key2       
b    1         8
     2        11

jorisvandenbossche · 2016-06-24T13:51:16Z

@jreback You don't need IndexSlice to access a single element; doing frame.loc[('b', 1), 'Colorado'] (using a tuple) is perfectly fine IMO?

I suppose the issue was raised (@jseabold correct me if I am wrong) because somebody wanted to do the above (and so should have used a tuple) but used a list, and the output was then a bit unexpected.

The frame.loc[['b', 2], 'Colorado'] is interpreted (I think) like frame.loc[(['b', 2], slice(None)), 'Colorado'] or frame.loc[idx[['b', 2], :], 'Colorado'].
This is similar to frame.loc[['a', 'b'], 'Colorado'], which correctly gives you the full frame:

In [11]: frame.loc[['a', 'b'], 'Colorado']
Out[11]:
color      Green
key1 key2
a    1         2
     2         5
b    1         8
     2        11

So the issue is more about what to do when not all labels in a list indexer are present in the case of a MultiIndex.

You can also see this issue when using the more explicit IndexSlice:

In [17]: frame.loc[idx[['b', 'c'], :], 'Colorado']
Out[17]:
color      Green
key1 key2
b    1         8
     2        11

The 'c' is ignored in this case (for a single index, you would get a reindex operation introducing NaNs).
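
To make the list-versus-tuple distinction concrete, here is a small sketch using the frame from the original post. The variable names are illustrative, and treating the list form as equivalent to the explicit per-level form follows the interpretation suggested above rather than a documented guarantee.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
)
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Tuple: one full key, with one label per index level.
full_key = frame.loc[('b', 1), 'Colorado']

# List: several labels, all looked up on the first level (key1).
by_list = frame.loc[['a', 'b'], 'Colorado']

# The same selection with the second level spelled out explicitly.
explicit = frame.loc[(['a', 'b'], slice(None)), 'Colorado']
```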

mroeschke · 2021-05-01T22:29:55Z

Looks like this example raises now, which makes sense to me. Could use a test.

In [52]: frame = pd.DataFrame(np.arange(12).reshape(( 4, 3)),
    ...:                   index =[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
    ...:                   columns =[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
    ...: frame.index.names = ['key1', 'key2']
    ...: frame.columns.names = ['state', 'color']

In [53]: frame
Out[53]:
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11

In [54]: frame.loc[['b', 2], 'Colorado']
KeyError: '[2] not in index'
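
A regression check could look roughly like the following. This is only a sketch, not necessarily the test that was eventually added, and the helper name is hypothetical. It assumes a pandas version where the list indexer raises, as in the session above.

```python
import numpy as np
import pandas as pd


def check_partial_list_indexing_raises():
    """Hypothetical helper: return the KeyError message, or fail if nothing is raised."""
    frame = pd.DataFrame(
        np.arange(12).reshape((4, 3)),
        index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
        columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],
    )
    frame.index.names = ['key1', 'key2']
    frame.columns.names = ['state', 'color']
    try:
        # 2 is not a key1 label, so this is expected to raise.
        frame.loc[['b', 2], 'Colorado']
    except KeyError as err:
        return str(err)
    raise AssertionError("expected KeyError for a list with a missing label")


message = check_partial_list_indexing_raises()
```

In a real test suite the try/except would more naturally be written with `pytest.raises(KeyError)`; plain exception handling is used here only to keep the sketch free of extra dependencies.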

jreback added the API Design, Error Reporting, MultiIndex, and Difficulty Advanced labels Jun 23, 2016

jreback added this to the Next Major Release milestone Jun 23, 2016

jbrockmendel removed the Effort Low label Oct 21, 2019

mroeschke added the good first issue and Needs Tests labels and removed the API Design, Error Reporting, and MultiIndex labels May 1, 2021

mroeschke mentioned this issue May 15, 2021: TST: Add tests for old issues #41482 (merged)

mroeschke modified the milestones: Contributions Welcome, 1.3 May 15, 2021

jreback closed this as completed in #41482 May 17, 2021
