BUG: multiindex slice 1.1.0rc0 #35351

Closed
leo4183 opened this issue Jul 20, 2020 · 5 comments · Fixed by #35353
Labels: Bug, Indexing, MultiIndex, Regression

leo4183 commented Jul 20, 2020

MultiIndex slicing returns different results in pandas 1.1.0rc0 than in pandas<=1.0.x.

import pandas as pd

df = pd.DataFrame(
    data=[[0, 1, 'a', 1], [1, 0, 'a', 0], [1, 1, 'b', 1], [1, 1, 'c', 2]],
    columns=['index', 'date', 'author', 'price'],
)

# keep the last row (the most recent date) for every index value; this can happen in practice
idx = df.drop_duplicates('index', keep='last').set_index(['index', 'date']).index

df.set_index(['index', 'date']).loc[idx, :].reset_index()

df

index  date  author  price
    0     1       a      1
    1     0       a      0
    1     1       b      1
    1     1       c      2

Result

        author  price
(0, 1)       a      1
(1, 1)       b      1
(1, 1)       c      2
leo4183 added the Bug and Needs Triage labels Jul 20, 2020
@TomAugspurger
Contributor

What's the output for 1.0.5?

@leo4183
Author

leo4183 commented Jul 20, 2020

For pandas<=1.0.x, the result looks like this (as expected):

index  date  author  price
    0     1       a      1
    1     1       b      1
    1     1       c      2
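
To compare versions without reading tables by eye, the expected output above can be written as a frame and checked against the actual result. This is only a sketch: the `expected` frame is transcribed from the tables in this thread, and the `matches` flag is a name introduced here for illustration. On an affected build the comparison is expected to fail; on an unaffected build it should pass.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df = pd.DataFrame(
    data=[[0, 1, "a", 1], [1, 0, "a", 0], [1, 1, "b", 1], [1, 1, "c", 2]],
    columns=["index", "date", "author", "price"],
)

# MultiIndex holding the last (index, date) pair seen for each index value
idx = df.drop_duplicates("index", keep="last").set_index(["index", "date"]).index

result = df.set_index(["index", "date"]).loc[idx, :].reset_index()

# Expected frame, transcribed from the pandas<=1.0.x output shown above
expected = pd.DataFrame(
    data=[[0, 1, "a", 1], [1, 1, "b", 1], [1, 1, "c", 2]],
    columns=["index", "date", "author", "price"],
)

# `matches` is an illustrative flag, not part of any pandas API
try:
    assert_frame_equal(result, expected)
    matches = True
except AssertionError:
    matches = False

print(pd.__version__, "matches expected:", matches)
```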

@simonjayhawkins
Member

The output was good in 1.0.1, bisecting now.

@simonjayhawkins
Member

#33691 appears to be the cause of the regression, labelled 1.1

73fd8a7 is the first bad commit
commit 73fd8a7
Author: jbrockmendel [email protected]
Date: Tue Apr 21 19:31:36 2020 -0700

CLN: remove shallow_copy_with_infer (#33691)

cc @jbrockmendel

simonjayhawkins added the Indexing, MultiIndex, and Regression labels and removed the Needs Triage label Jul 20, 2020
simonjayhawkins modified the milestones: 1.1.1, 1.1 Jul 20, 2020
@simonjayhawkins
Member

simonjayhawkins commented Jul 20, 2020

A slight refactor of the OP's example to show the issue clearly:

>>> pd.__version__
'1.1.0rc0'
>>>
>>> df = pd.DataFrame(
...     data=[["a", 1], ["a", 0], ["b", 1], ["c", 2]],
...     columns=["author", "price"],
...     index=pd.MultiIndex.from_tuples(
...         [(0, 1), (1, 0), (1, 1), (1, 1)], names=["index", "date"]
...     ),
... )
>>> df
           author  price
index date
0     1         a      1
1     0         a      0
      1         b      1
      1         c      2
>>>
>>> idx = pd.MultiIndex.from_tuples([(0, 1), (1, 1)], names=["index", "date"])
>>> idx
MultiIndex([(0, 1),
            (1, 1)],
           names=['index', 'date'])
>>>
>>> df.loc[idx, :]
       author  price
(0, 1)      a      1
(1, 1)      b      1
(1, 1)      c      2
>>>
>>> _.index
Index([(0, 1), (1, 1), (1, 1)], dtype='object')
>>>

Expected index:

MultiIndex([(0, 1),
            (1, 1),
            (1, 1)],
           names=['index', 'date'])
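
Until a fixed release is available, one possible stopgap is to rebuild the MultiIndex from the flat index of tuples. This is a minimal sketch, not something proposed in this thread: `restore_multiindex` is a hypothetical helper, and the `flat` frame below is built by hand to imitate the result shown above so that the example runs on any pandas version.

```python
import pandas as pd


def restore_multiindex(frame, names):
    """Hypothetical helper: return a frame whose index is a MultiIndex.

    If the index is already a MultiIndex the frame is returned unchanged;
    otherwise the index entries are assumed to be tuples of equal length.
    """
    if isinstance(frame.index, pd.MultiIndex):
        return frame
    out = frame.copy()
    out.index = pd.MultiIndex.from_tuples(list(frame.index), names=names)
    return out


# Hand-built imitation of the affected result: a flat object Index of tuples.
# tupleize_cols=False stops pandas from turning the tuples into a MultiIndex.
flat = pd.DataFrame(
    {"author": ["a", "b", "c"], "price": [1, 1, 2]},
    index=pd.Index([(0, 1), (1, 1), (1, 1)], tupleize_cols=False),
)

fixed = restore_multiindex(flat, names=["index", "date"])
print(type(flat.index).__name__, "->", type(fixed.index).__name__)
```

The helper copies the frame instead of modifying it in place, so the original result stays available for inspection.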
