REGR: __setitem__ with integer slices on Int/RangeIndex is broken (label instead of positional) #31469

amueller · 2020-01-30T18:13:38Z

There's a backward-incompatible change in pandas 1.0 that I didn't find in the changelog. I might have just overlooked it, though.

import numpy as np
import pandas as pd
X = pd.DataFrame(np.zeros((100, 1)))
X[-4:] = 1
X

In pandas 0.25.3 or lower, this sets the last four entries of X to 1 and leaves all the others at zero. In pandas 1.0, it sets all entries of X to 1.
I assume it's a change in whether the slice indexes axis 0 or axis 1?

amueller · 2020-01-30T18:21:21Z

I wonder if it's related to #31449 but I'm not using a multi-index.

MarcoGorelli · 2020-01-30T18:38:34Z

Thanks for the report.

Seems this doesn't affect .iloc:

In [26]: import numpy as np 
    ...: X = pd.DataFrame(np.zeros((5, 1))) 
    ...: X.iloc[-4:] = 1 
    ...: X                                                                      
Out[26]: 
     0
0  0.0
1  1.0
2  1.0
3  1.0
4  1.0

will look into it
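As a minimal sketch of the observation above (not a fix for the regression itself): `.iloc` is always positional, so writing through it should change only the last four rows regardless of how plain `[]` interprets the slice. The 5-row frame mirrors the example in this comment.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.zeros((5, 1)))

# .iloc is always positional, so this sets only the last four rows,
# independent of how plain [] treats an integer slice.
X.iloc[-4:] = 1

print(X[0].tolist())
```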

jreback · 2020-01-30T22:01:53Z

you are label indexing with a slice with loc
since none of the labels exist nothing is set

did this actually work previously?

this should never have worked with .loc

it might have with [] which has fallback integer indexing

jorisvandenbossche · 2020-01-30T22:12:00Z

I do not remember any specific discussion about this, so I think it is definitely a regression.

Slicing rows in [] has always worked positionally if there is an integer index (surprising, yes, but longstanding behaviour; see e.g. my summary of this from 5 years ago in #9595)

MarcoGorelli · 2020-01-30T22:14:31Z

since none of the labels exist nothing is set

@jreback if I've understood correctly, the issue is that everything is being set

>>> import numpy as np
>>> import pandas as pd

>>> X = pd.DataFrame(np.zeros((5, 1)))
>>> X                                                                     
     0
0  0.0
1  0.0
2  0.0
3  0.0
4  0.0

>>> X[-4:]  # only prints the last 4 rows of X...
     0
1  0.0
2  0.0
3  0.0
4  0.0

>>> X[-4:] = 1
>>> X  # ...but everything (including the first row) has now been set
     0
0  1.0
1  1.0
2  1.0
3  1.0
4  1.0

jreback · 2020-01-30T22:15:16Z

I do not remember any specific discussion about this, so I think it is definitely a regression.

Slicing rows in [] has always worked positionally if there is an integer index (surprising, yes, but longstanding behaviour; see e.g. my summary of this from 5 years ago in #9595)

maybe but indexing with an out of range label on both sides should return nothing

so the results are correct

jreback · 2020-01-30T22:17:16Z

since none of the labels exist nothing is set

@jreback if I've understood correctly, the issue is that everything is being set

>>> import numpy as np
>>> import pandas as pd

>>> X = pd.DataFrame(np.zeros((5, 1)))
>>> X                                                                     
     0
0  0.0
1  0.0
2  0.0
3  0.0
4  0.0

>>> X[-4:]  # only prints the last 4 rows of X...
     0
1  0.0
2  0.0
3  0.0
4  0.0

>>> X[-4:] = 1
>>> X  # ...but everything (including the first row) has now been set
     0
0  1.0
1  1.0
2  1.0
3  1.0
4  1.0

ahh ok that is not correct; i would expect this indexer to return nothing

jreback · 2020-01-30T22:20:53Z

might be #31393

amueller · 2020-01-30T22:59:27Z

ahh ok that is not correct; i would expect this indexer to return nothing

Asking for the shape, both in 0.25 and 1.0, you get

>>> X[-4:].shape
(4, 1)

but assignment in version 1.0 assigns to everything.
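The mismatch between reading and writing with the same slice can be expressed as a small check. This is only a sketch, assuming a 5-row frame like the one used earlier in the thread; the names `selected` and `changed` are illustrative. On a release with the regression the two lists would be expected to differ, while on 0.25.x or a fixed release they should match.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.zeros((5, 1)))

# Rows selected when reading with the slice.
selected = list(X[-4:].index)

# Rows actually changed when writing with the same slice.
X[-4:] = 1
changed = list(X[X[0] == 1].index)

print(selected, changed)
```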

jorisvandenbossche · 2020-01-31T07:04:30Z

maybe but indexing with an out of range label on both sides should return nothing

This is about positional indexing, so there is no "out of range label". The -4 means start from the fourth last element to the end.

Again, I agree this is surprising behaviour. You would think it is label-based indexing, but it is not. I already described this 5 years ago in #9595.

Some examples to illustrate this:

In [21]: df = pd.DataFrame({'a': [0., 1., 2., 3.]}, index=[2, 3, 4, 5])

In [22]: df 
Out[22]: 
     a
2  0.0
3  1.0
4  2.0
5  3.0

In [23]: df[2:] 
Out[23]: 
     a
4  2.0
5  3.0

In [24]: df[:3]  
Out[24]: 
     a
2  0.0
3  1.0
4  2.0

These examples are for __getitem__, and clearly work positionally if you look at the index of the results (on both 0.25 and 1.0, and for both Int64Index and RangeIndex).
So it is __setitem__ that is broken in 1.0.0.
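For comparison, the two explicit indexers make the positional and label-based readings of the same slice visible on this frame. A short sketch, using only `.iloc` and `.loc` so the result does not depend on how `[]` behaves in a given version; the names `positional` and `by_label` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.0, 1.0, 2.0, 3.0]}, index=[2, 3, 4, 5])

# Positional: skip the first two rows, whatever their labels are.
positional = df.iloc[2:]

# Label-based: start at the row labelled 2, which is the first row here.
by_label = df.loc[2:]

print(list(positional.index))
print(list(by_label.index))
```

The positional result matches the `df[2:]` output shown above, which is what identifies `[]` slicing as positional here.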

jorisvandenbossche · 2020-01-31T12:05:53Z

This is caused by #27383 I think (cc @jbrockmendel ), specifically:

     def _setitem_slice(self, key, value):
         self._check_setitem_copy()
-        self.loc._setitem_with_indexer(key, value)
+        self.loc[key] = value
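A possible reading of why this change makes `X[-4:] = 1` set every row: once the slice goes through `.loc`, `-4` is treated as a label rather than a position. The sketch below only compares the two explicit indexers on the frame from the report; it does not reproduce the internal code path, and `n_label` and `n_positional` are illustrative names.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(np.zeros((5, 1)))

# Label-based: -4 is not a label, but it sorts before every label in the
# monotonic index 0..4, so the slice starts at the first row.
n_label = len(X.loc[-4:])

# Positional: -4 counts back from the end, so the first row is skipped.
n_positional = len(X.iloc[-4:])

print(n_label, n_positional)
```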

amueller · 2020-01-31T16:22:58Z

Thanks for investigating @jorisvandenbossche

jorisvandenbossche · 2020-01-31T16:26:24Z

BTW, I think this is a rather serious regression, since it doesn't give an error, but rather silently modifies/corrupts your data, and thus can silently lead to wrong results. We should probably try to do a 1.0.1 quickly.

TomAugspurger · 2020-01-31T16:29:34Z

Agreed. I won't be able to do that this weekend, but perhaps Monday?

I'm hoping to fix up a bunch of the reported regressions today.

jbrockmendel · 2020-01-31T19:22:50Z

I'll start a branch reverting the lines @jorisvandenbossche identified and open a PR after confirming that fixes this.

After this is fixed for 1.0.1, we should discuss deprecating the surprising behavior.

amueller mentioned this issue Jan 30, 2020

add print for debugging lol dabl/dabl#182

Merged

MarcoGorelli added the Regression Functionality that used to work in a prior pandas version label Jan 30, 2020

This was referenced Jan 30, 2020

Add example for discrete_scatter dabl/dabl#176

Merged

fix: really long column names mess up plots dabl/dabl#180

Merged

jorisvandenbossche added this to the 1.0.1 milestone Jan 30, 2020

jorisvandenbossche added the Indexing Related to indexing on series/frames, not to indexes themselves label Jan 30, 2020

jorisvandenbossche changed the title ~~Indexing change with integer slices not in changelog?~~ REGR: __setitem__ with integer slices on Int/RangeIndex is broken (label instead of positional) Jan 31, 2020

jbrockmendel added a commit to jbrockmendel/pandas that referenced this issue Jan 31, 2020

REG: DataFrame.__setitem__(slice) is positional, closes pandas-dev#31469

24b2f38

jbrockmendel mentioned this issue Jan 31, 2020

REGR: DataFrame.__setitem__(slice, val) is positional #31515

Merged


jreback closed this as completed in #31515 Feb 1, 2020
