Multi index slicing working only for the last label in the first level index #12697

ssunkara1 · 2016-03-22T23:55:33Z

When I slice a multi-indexed DataFrame, the slice interval is ignored for every entry of the first-level index except the last one. This only happens when the slice is longer than some threshold (around 30 in the cases I observed, but it is not the same across data frames).

This bug seems to have been introduced in version 0.17.1, as this works fine in version 0.16.4.

Code Sample

import pandas as pd
import numpy as np

freq = ['a', 'b', 'c', 'd']
idx = pd.MultiIndex.from_product([freq, np.arange(500)])

dfmi = pd.DataFrame(np.random.randn(2000), index=idx, columns=['Test'])
sliced_df = dfmi.loc[pd.IndexSlice[:, 30:70], :]

# Each first-level label should contain only rows 30..70 after slicing
print(sliced_df.loc['a'])
print(sliced_df.loc['d'])

Current Output

         Test
0   -2.288252
1    0.501113
2   -0.581190
3    0.366600
..        ...
496 -1.124694
497 -0.106180
498 -0.348668
499  0.659645

[500 rows x 1 columns]
        Test
30  0.079055
31  2.455371
32  0.014673
33  0.966548
..        ...
67  0.997713
68  1.235465
69 -0.320166
70 -0.968143

Expected Output

        Test
30 -1.025443
31 -1.305710
32  0.614858
33 -0.606788
34 -0.673230
.. ...
68 -1.129218
69 -1.747830
70 -0.611186
        Test
30 -0.679267
31 -0.590352
32  1.000755
33 -0.106813
34 -1.214385
.. ...
68 -1.467416
69 -0.008881
70  0.040510

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 3.13.0-63-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.18.0
nose: 1.3.7
pip: 8.1.0
setuptools: 20.2.2
Cython: 0.22.1
numpy: 1.10.4
scipy: 0.17.0
statsmodels: 0.6.1
xarray: None
IPython: 4.0.0-dev
sphinx: 1.4a1
patsy: 0.3.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.0
numexpr: 2.4.4
matplotlib: 1.5.1
openpyxl: 1.8.5
xlrd: 0.9.3
xlwt: 1.0.0
xlsxwriter: 0.7.3
lxml: 3.4.4
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.5
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.38.0


jreback · 2016-03-23T11:51:52Z

So this is chained indexing on a MultiIndex. A fully qualified selection works:

In [33]: dfmi.loc[pd.IndexSlice['d', 30:70], :]
Out[33]: 
          Test
d 30  0.244715
  31  0.692720
  32  0.100552
  33 -0.319824
  34 -0.380916
...        ...
  66  4.098519
  67 -1.592476
  68  1.057753
  69 -1.329330
  70  0.048740

[41 rows x 1 columns]

In [34]: dfmi.loc[pd.IndexSlice['a', 30:70], :]
Out[34]: 
          Test
a 30  0.295775
  31 -1.544894
  32  0.098592
  33 -1.111341
  34 -0.165816
...        ...
  66 -0.241730
  67  0.545091
  68  0.958949
  69  0.186351
  70  0.792855

[41 rows x 1 columns]

There might be something odd going on with the indexers. If you could have a look and spot where this is happening, that would be helpful. You can just use your test script and step through it in the debugger.
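As a rough sketch of how the fully qualified selections above could be used to check the reported slice, the snippet below builds the expected result one first-level label at a time and compares it with the single `pd.IndexSlice[:, 30:70]` selection. The names `parts`, `expected`, and `sliced` are illustrative only and do not come from the report; on an affected pandas version the comparison would presumably print `False`.

```python
import numpy as np
import pandas as pd

freq = ['a', 'b', 'c', 'd']
idx = pd.MultiIndex.from_product([freq, np.arange(500)])
dfmi = pd.DataFrame(np.random.randn(2000), index=idx, columns=['Test'])

# Fully qualified selection per first-level label, then stitched together.
# (Illustrative helper names; not part of the original report.)
parts = [dfmi.loc[pd.IndexSlice[key, 30:70], :] for key in freq]
expected = pd.concat(parts)

# The selection from the report: every first-level label, rows 30..70.
sliced = dfmi.loc[pd.IndexSlice[:, 30:70], :]

print(len(expected))            # 4 labels x 41 rows each
print(sliced.equals(expected))  # whether the single slice matches
```

Label slices in `.loc` include both endpoints, which is why each label contributes 41 rows rather than 40.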

ruoyu0088 · 2016-03-30T02:27:59Z

I found the problem. It's the line in convert_indexer() in MultiIndex._get_level_indexer():

m[np.in1d(labels,r,assume_unique=True)] = True

Here is the documentation for the assume_unique parameter of in1d():

    assume_unique : bool, optional
        If True, the input arrays are both assumed to be unique, which
        can speed up the calculation.  Default is False.

But labels is not unique in this case, so setting assume_unique=False fixes the problem:

m[np.in1d(labels,r,assume_unique=False)] = True
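To illustrate the point outside pandas, here is a minimal NumPy-only sketch. It assumes the second-level codes of the example index look like `0..499` repeated four times, and it uses `np.isin`, the newer spelling of `in1d` that takes the same `assume_unique` flag. The variable names mirror the snippet above but the setup is an assumption, not the actual pandas internals. Whether `assume_unique=True` really produces a wrong mask depends on the NumPy version and the array sizes, which may be why the symptom only appeared for longer slices.

```python
import numpy as np

# Assumed stand-in for the second-level codes of a 4 x 500 product index:
# 0..499 repeated four times, so the values are deliberately not unique.
labels = np.tile(np.arange(500), 4)

# Codes selected by the label slice 30:70 (both endpoints included).
r = np.arange(30, 71)

# Safe variant: make no uniqueness assumption about either input.
mask = np.zeros(len(labels), dtype=bool)
mask[np.isin(labels, r, assume_unique=False)] = True

# Unsafe variant for comparison: the documented precondition is violated
# because labels contains repeats, so the result is not guaranteed.
unsafe = np.isin(labels, r, assume_unique=True)

print(mask.sum())                    # 4 groups x 41 codes
print(np.array_equal(mask, unsafe))  # version-dependent
```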

jreback · 2016-04-26T15:30:07Z

@ruoyu0088 if you want to do a PR for this, that would be great (and set assume_unique=level.is_unique, I think).


jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex Difficulty Intermediate labels Mar 23, 2016

jreback added this to the 0.18.1 milestone Mar 23, 2016

jreback modified the milestones: 0.18.2, 0.18.1 Apr 26, 2016

jorisvandenbossche modified the milestones: 0.20.0, 0.19.0 Aug 19, 2016

mroeschke added a commit to mroeschke/pandas that referenced this issue Jan 29, 2017

TST: MultiIndex Slicing maintains first level index (pandas-dev#12697)

4ac2bcc

mroeschke mentioned this issue Jan 29, 2017

TST: MultiIndex Slicing maintains first level index (#12697) #15255

Merged


mroeschke added a commit to mroeschke/pandas that referenced this issue Jan 29, 2017

TST: MultiIndex Slicing maintains first level index (pandas-dev#12697)

6477929


jorisvandenbossche closed this as completed in #15255 Jan 29, 2017

jorisvandenbossche pushed a commit that referenced this issue Jan 29, 2017

TST: MultiIndex Slicing maintains first level index (#12697) (#15255)

7bb4980

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017

TST: MultiIndex Slicing maintains first level index (pandas-dev#12697) (pandas-dev#15255)

5e3db2a
