BUG: Setting multiple values via .loc produces NaNs with MultiIndex #46837

Open
2 of 3 tasks
LustigerKernSpalt opened this issue Apr 22, 2022 · 5 comments
Labels: Bug, Indexing, MultiIndex, Regression

Comments

@LustigerKernSpalt

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
s = pd.Series([True]*4, index=i)
s.loc[0,:] = s.loc[0,:] 
print(s)

Issue Description

Executing the script above on pandas 1.4.2 produces NaNs:

0  2     NaN
   3     NaN
1  2    True
   3    True

Expected Behavior

Behavior before 1.4.2 (expected):

0  2    True
   3    True
1  2    True
   3    True

Installed Versions

INSTALLED VERSIONS

commit : 4bfe3d0
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 20.6.0
Version : Darwin Kernel Version 20.6.0: Mon Aug 30 06:12:21 PDT 2021; root:xnu-7195.141.6~3/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.4.2
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.1.3
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
markupsafe : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

LustigerKernSpalt added the Bug and Needs Triage labels Apr 22, 2022
simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue May 19, 2022
@simonjayhawkins
Member

Thanks @LustigerKernSpalt for the report.

Behavior before 1.4.2 (expected):

first bad commit: [35b338e] BUG: .loc failing to drop first level (#42435)

cc @jbrockmendel

simonjayhawkins added the Indexing, Regression, and MultiIndex labels and removed the Needs Triage label May 20, 2022
simonjayhawkins added this to the 1.4.3 milestone May 20, 2022
@simonjayhawkins
Member

moving to 1.4.4

@simonjayhawkins simonjayhawkins modified the milestones: 1.4.3, 1.4.4 Jun 22, 2022
@jorisvandenbossche
Member

The reason this is now failing is similar to some of the cases discussed in #46704.

The getitem operation on the right-hand side now returns a different result:

In [12]: import pandas as pd
    ...: i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
    ...: s = pd.Series([True]*4, index=i)

In [13]: s.loc[0,:]
Out[13]: 
2    True
3    True
dtype: bool

Selecting 0 now drops that level, returning a Series without a MultiIndex. As a result of that change, setting with that Series no longer works.
I think one can argue that setting those values should still work, since the setitem operation also selects on the first level, so it should have a similar effect. But that is something that also didn't work in the past:

In [1]: pd.__version__
Out[1]: '1.3.5'

In [2]: import pandas as pd
   ...: i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
   ...: s = pd.Series([True]*4, index=i)

In [3]: s.loc[0, :].droplevel(0)   # <-- the equivalent of what now is returned directly by s.loc[0, :]
Out[3]: 
2    True
3    True
dtype: bool

In [4]: s.loc[0, :] = s.loc[0, :].droplevel(0)

In [5]: s
Out[5]: 
0  2     NaN
   3     NaN
1  2    True
   3    True
dtype: object
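
As a possible workaround for the case above (not something proposed in this thread), the right-hand side can be converted to a plain NumPy array so that no index alignment takes place and the values are set purely by position. A minimal sketch, assuming the selection on both sides covers the same rows in the same order:

```python
import pandas as pd

i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
s = pd.Series([True] * 4, index=i)

# .to_numpy() strips the index from the right-hand side, so pandas
# assigns the two values positionally instead of aligning on labels.
s.loc[0, :] = s.loc[0, :].to_numpy()

print(s)
```

This sidesteps alignment entirely, so it is only appropriate when the order of the right-hand values is known to match the rows being set.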

@jbrockmendel
Member

It looks like in Loc._convert_to_indexer we return labels.get_locs(key), which correctly gives us [0, 1], but it also needs to somehow return the information that this indexing is level-dropping, so that we take it into account when we later call align_series.
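
The alignment point can be illustrated from the user side: if the dropped level is put back on the right-hand side, the labels match the selected rows again. A minimal sketch, assuming pandas 1.4 or later (where `s.loc[0, :]` drops the first level); using `pd.concat` with a dict to re-add the level is just one convenient option and is an assumption here, not the fix discussed in this thread:

```python
import pandas as pd

i = pd.MultiIndex.from_product([(0, 1), (2, 3)])
s = pd.Series([1, 2, 3, 4], index=i)

# On pandas >= 1.4 this selection has a flat index [2, 3].
value = s.loc[0, :] * 10

# Re-add the dropped level: the dict key 0 becomes the outer level,
# giving the index [(0, 2), (0, 3)], which matches the rows being set.
restored = pd.concat({0: value})

s.loc[0, :] = restored
print(s)
```

Because the restored index equals the index of the selected rows, label alignment has something to match against and no values are lost.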

@simonjayhawkins
Member

But that is something that also didn't work in the past:

ok, let's not change that this late in the 1.4.x release cycle. removing milestone.

@simonjayhawkins simonjayhawkins removed this from the 1.4.4 milestone Aug 30, 2022