BUG: errors with .loc[(slice(...), ), ] when modifying a subset of rows in a pandas dataframe/series in 1.1.4 #37711

Closed
taozuoqiao opened this issue Nov 9, 2020 · 4 comments · Fixed by #37787
Labels: Bug; Indexing (related to indexing on series/frames, not to indexes themselves); Regression (functionality that used to work in a prior pandas version)

@taozuoqiao

taozuoqiao commented Nov 9, 2020

I often use .loc[(slice(...),),] for both selecting and modifying values, on Series as well as DataFrames. See the documentation section
advanced-indexing-with-hierarchical-index:

In [43]: df.loc['bar']
Out[43]:
               A         B         C
second
one     0.895717  0.410835 -1.413681
two     0.805244  0.813850  1.607920

This is a shortcut for the slightly more verbose notation df.loc[('bar',),] (equivalent to df.loc['bar',] in this example).

Everything works in 0.25.4, but in 1.1.4 (also 1.0.3 and 1.1.3) only selection works; modification raises errors. Selection still runs fine:

import pandas as pd
s = pd.Series([1,2,3], index=pd.MultiIndex.from_tuples([('a','A'),  ('b','B'), ('c', 'C')]))
df = s.to_frame()
print(s.loc[('a', ), ])
print(df.loc[('a', ), ])

But when I try to modify a subset of rows using .loc[(slice(...),),], I get:

In [2]: df.loc[('a', ), ] = 0
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-2-480fc1763dde> in <module>
----> 1 df.loc[('a', ), ] = 0

~/opt/anaconda3/envs/edge/lib/python3.8/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)                                                                    
    664         else:
    665             key = com.apply_if_callable(key, self.obj)
--> 666         indexer = self._get_setitem_indexer(key)
    667         self._has_valid_setitem_indexer(key)
    668

~/opt/anaconda3/envs/edge/lib/python3.8/site-packages/pandas/core/indexing.py in _get_setitem_indexer(self, key)                                                                  
    591         """
    592         if self.name == "loc":
--> 593             self._ensure_listlike_indexer(key)
    594
    595         if self.axis is not None:

~/opt/anaconda3/envs/edge/lib/python3.8/site-packages/pandas/core/indexing.py in _ensure_listlike_indexer(self, key, axis)                                                        
    645             # key may be a tuple if we are .loc
    646             # in that case, set key to the column part of key
--> 647             key = key[column_axis]
    648             axis = column_axis
    649

IndexError: tuple index out of range

and

In [3]: s.loc[('a', ), ] = 0
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-c3a2e38a0417> in <module>
----> 1 s.loc[('a', ), ] = 0

~/opt/anaconda3/envs/edge/lib/python3.8/site-packages/pandas/core/indexing.py in __setitem__(self, key, value)
    668 
    669         iloc = self if self.name == "iloc" else self.obj.iloc
--> 670         iloc._setitem_with_indexer(indexer, value)
    671 
    672     def _validate_key(self, key, axis: int):

~/opt/anaconda3/envs/edge/lib/python3.8/site-packages/pandas/core/indexing.py in _setitem_with_indexer(self, indexer, value)
   1633         if take_split_path:
   1634             # Above we only set take_split_path to True for 2D cases
-> 1635             assert self.ndim == 2
   1636             assert info_axis == 1
   1637 

AssertionError:
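The IndexError in the first traceback can be illustrated without pandas. The sketch below uses a hypothetical KeyRecorder class (not part of pandas) to show what key Python passes to __setitem__ for each spelling. The value column_axis = 1 is an assumption based on the frame shown in the traceback (key = key[column_axis]), not a copy of the pandas source.

```python
class KeyRecorder:
    """Hypothetical helper: remembers the key passed to obj[key] = value."""

    def __setitem__(self, key, value):
        self.key = key


rec = KeyRecorder()

# Trailing comma: the key is a 1-tuple whose only element is the row key.
rec[('a', ), ] = 0
key = rec.key
print(key, len(key))

# Explicit column slice: the key is a 2-tuple (row key, column key).
rec[('a', ), :] = 0
key_with_columns = rec.key
print(key_with_columns, len(key_with_columns))

# Assumption: the failing frame treats position 1 as the column part.
column_axis = 1
try:
    key[column_axis]
    outcome = "no error"
except IndexError as exc:
    outcome = f"IndexError: {exc}"
print(outcome)
```

With the trailing-comma spelling there is no second element to read as the column part, which matches the "tuple index out of range" message above.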

Note that "partial" indexing with .loc[slice(...)] still works in 1.1.4:

In [4]: df.loc['a'] = 0

In [5]: s.loc['a'] = 0

In [6]: df
Out[6]: 
     0
a A  0
b B  2
c C  3

In [7]: s
Out[7]: 
a  A    0
b  B    2
c  C    3
dtype: int64

Of course, .loc[('a', )] is valid for a Series and .loc[('a', ), :] is valid for a DataFrame, but that means I have to write different code for Series and DataFrames.
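As a possible workaround until the regression is fixed, the two spellings reported as working can be hidden behind one function. This is only a sketch: set_rows is a hypothetical helper name, not a pandas API, and it assumes the row key is a tuple of leading MultiIndex levels.

```python
import pandas as pd


def set_rows(obj, key, value):
    """Assign value to the rows selected by key, for a Series or a DataFrame.

    Hypothetical helper: it dispatches on obj.ndim and uses only the
    spellings reported as working in this issue.
    """
    if obj.ndim == 1:
        # Series: no column axis, so pass the row key alone.
        obj.loc[key] = value
    else:
        # DataFrame: spell out the column slice explicitly.
        obj.loc[key, :] = value
    return obj


index = pd.MultiIndex.from_tuples([('a', 'A'), ('b', 'B'), ('c', 'C')])
s = pd.Series([1, 2, 3], index=index)
df = s.to_frame()

set_rows(s, ('a', ), 0)
set_rows(df, ('a', ), 0)
print(s)
print(df)
```

The helper mutates the object in place and returns it only for convenience.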

Update (2020-11-13):

  • For a Series, s.loc[('a')] = 0, s.loc[('a', )] = 0 and s.loc[('a'), ] = 0 are valid, but s.loc[('a', ), ] = 0 raises an AssertionError like the one above.
  • For a DataFrame, df.loc[('a')] = 0, df.loc[('a'), :] = 0 and df.loc[('a',), :] = 0 are valid, but df.loc[('a', )] = 0, df.loc[('a'), ] = 0 and df.loc[('a', ), ] = 0 raise an IndexError like the one above.
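To check which of these spellings fail on a given installation, a small probe script can try each assignment on a fresh copy and record the outcome. This is a sketch only: the function names are illustrative, the set of spellings is a subset of the list above, and the result for the trailing-comma forms depends on the installed pandas version, so no expected output is given for them.

```python
import pandas as pd


def make_objects():
    """Build a fresh Series and DataFrame matching the example in this issue."""
    index = pd.MultiIndex.from_tuples([('a', 'A'), ('b', 'B'), ('c', 'C')])
    s = pd.Series([1, 2, 3], index=index)
    return s, s.to_frame()


def series_tuple(s, df):
    s.loc[('a', )] = 0


def series_tuple_trailing_comma(s, df):
    s.loc[('a', ), ] = 0


def frame_tuple_colon(s, df):
    df.loc[('a', ), :] = 0


def frame_tuple_trailing_comma(s, df):
    df.loc[('a', ), ] = 0


def probe(assign):
    """Run one assignment on fresh objects; return 'ok' or the exception name."""
    s, df = make_objects()
    try:
        assign(s, df)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


spellings = (
    series_tuple,
    series_tuple_trailing_comma,
    frame_tuple_colon,
    frame_tuple_trailing_comma,
)
report = {func.__name__: probe(func) for func in spellings}

print("pandas", pd.__version__)
for name, result in report.items():
    print(f"{name}: {result}")
```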
taozuoqiao added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Nov 9, 2020
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Nov 12, 2020
@simonjayhawkins
Member

@taozuoqiao thanks for the report

first bad commit: [810a4e5] BUG: assignment to multiple columns when some column do not exist (#29334) cc @howsiwei

simonjayhawkins added the Indexing and Regression labels and removed the Needs Triage label Nov 12, 2020
simonjayhawkins added this to the 1.1.5 milestone Nov 12, 2020
@taozuoqiao
Author

Thanks very much for the answer and the fix. I just updated my comment, adding more examples of setting values that can fail in 1.1.3. I am rather new to pandas, and I am not sure whether they will all be fixed by phofl's PR:

  • For a Series, s.loc[('a')] = 0, s.loc[('a', )] = 0 and s.loc[('a'), ] = 0 are valid, but s.loc[('a', ), ] = 0 raises an AssertionError like the one above.
  • For a DataFrame, df.loc[('a')] = 0, df.loc[('a'), :] = 0 and df.loc[('a',), :] = 0 are valid, but df.loc[('a', )] = 0, df.loc[('a'), ] = 0 and df.loc[('a', ), ] = 0 raise an IndexError like the one above.

All of these spellings are valid when getting values.

@phofl
Member

phofl commented Nov 13, 2020

I am not quite sure whether s.loc[('a', ), ] = 0 should work, since a Series has no columns. I think s.loc[('a'), ] = 0 working may be the wrong behavior?

@taozuoqiao
Author

I agree with you; these spellings are really ambiguous.
