BUG: DataFrame index name is missing after call ".loc[]" in some cases. #42188

PengSY · 2021-06-22T09:37:57Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ['a', 'b', 'b']}).set_index("col_index")
df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ['a', 'b', 'c']}).set_index("col_index")

selected_index = df1.index.intersection(['a'])
print(df1.loc[selected_index])
print(df2.loc[selected_index])

Problem description

These two DataFrames are identical except for the last index value. However, when we select the row with index "a" via .loc[] on each of them, the results differ: the index name is missing from the result for df2, which is unexpected.

>>> print(df1.loc[selected_index])
           col1  col2
col_index
a             1     2

>>> print(df2.loc[selected_index])
   col1  col2
a     1     2
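To narrow down where the name gets dropped, it may help to print the index name at each step rather than comparing the rendered frames. The following is only a diagnostic sketch built on the sample above; the `names` dictionary and its labels are introduced here for illustration, and the printed values will depend on the installed pandas version.

```python
import pandas as pd

data = {"col1": [1, 2, 3], "col2": [2, 3, 4]}
df1 = pd.DataFrame({**data, "col_index": ["a", "b", "b"]}).set_index("col_index")
df2 = pd.DataFrame({**data, "col_index": ["a", "b", "c"]}).set_index("col_index")

selected_index = df1.index.intersection(["a"])

# Collect the index name at every stage so a lost name is easy to spot.
names = {
    "df1.index": df1.index.name,
    "df2.index": df2.index.name,
    "selected_index": selected_index.name,
    "df1.loc[...]": df1.loc[selected_index].index.name,
    "df2.loc[...]": df2.loc[selected_index].index.name,
}
for label, name in names.items():
    print(f"{label:<16} -> {name!r}")
```

If `selected_index` already reports `None`, the name was lost in `Index.intersection` rather than in `.loc[]` itself.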

Expected Output

The index name of df2.loc[selected_index] should be col_index, as it is for df1.
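If the goal is simply to pick the rows whose label appears in a list, one alternative that does not go through `Index.intersection` is a boolean mask. This is a sketch under the assumption that mask-based selection fits the use case; `wanted` and `mask` are names introduced here, not part of the original report.

```python
import pandas as pd

df2 = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ["a", "b", "c"]}
).set_index("col_index")

wanted = ["a"]

# Boolean mask over the existing index; the result reuses df2's own index,
# so its name travels along with the selected rows.
mask = df2.index.isin(wanted)
result = df2.loc[mask]

print(result.index.name)
```

Labels in `wanted` that do not exist in the index are silently ignored by the mask, which matches what the intersection was used for in the sample.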

Output of `pd.show_versions()`

pd.show_versions()

INSTALLED VERSIONS

commit : b5958ee
python : 3.6.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.1.5
numpy : 1.18.5
pytz : 2019.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 52.0.0.post20210125
Cython : 0.29.14
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.16.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None


raghuabhishek · 2021-06-22T13:01:55Z

I reproduced your code and I got this output

>>> df1 = pd.DataFrame({"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ['a', 'b', 'b']}).set_index("col_index")
>>> df2 = pd.DataFrame({"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ['a', 'b', 'c']}).set_index("col_index")

>>> df1
           col1  col2
col_index
a             1     2
b             2     3
b             3     4

>>> df2
           col1  col2
col_index
a             1     2
b             2     3
c             3     4

>>> selected_index = df1.index.intersection(['a'])
>>> print(df1.loc[selected_index])
           col1  col2
col_index
a             1     2

>>> print(df2.loc[selected_index])
           col1  col2
col_index
a             1     2

I didn't run into the issue you are facing. I am currently using pandas 1.4.0.dev0+58.g98e22297bb.

attack68 · 2021-06-22T18:41:44Z

> I have confirmed this bug exists on the latest version of pandas.

The latest version is 1.2.4, or the 1.3.0 pre-release. Your post suggests you are using 1.1.5.

ankurtri · 2021-06-24T21:11:15Z

I am not able to reproduce this issue with pd.__version__ == '1.2.4'.

simonjayhawkins · 2021-06-25T13:24:39Z

Fixed in commit [7e2aa42] BUG: name retention in Index.intersection (#38111), which was released in pandas 1.2.
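For anyone who cannot upgrade past 1.1.x right away, the name can be restored explicitly after the selection. The snippet below is a minimal sketch, not an official recommendation: `loc_keep_index_name` is a hypothetical helper introduced here, and the version check assumes that `pd.__version__` starts with numeric `major.minor` components.

```python
import pandas as pd


def loc_keep_index_name(df, labels):
    """Hypothetical helper: select rows by label and re-apply df's index name."""
    selected = df.loc[labels]
    # rename_axis returns a new frame whose index carries the given name.
    return selected.rename_axis(df.index.name)


# Rough version check; assumes the first two components are plain integers.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
if (major, minor) < (1, 2):
    print("pandas < 1.2: Index.intersection may drop the index name")

df2 = pd.DataFrame(
    {"col1": [1, 2, 3], "col2": [2, 3, 4], "col_index": ["a", "b", "c"]}
).set_index("col_index")

result = loc_keep_index_name(df2, df2.index.intersection(["a"]))
print(result.index.name)
```

On pandas 1.2 and later the helper should be a no-op for this sample, so it can be removed once the upgrade is done.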

PengSY added the Bug and Needs Triage labels Jun 22, 2021

attack68 added the Closing Candidate label and removed the Bug and Needs Triage labels Jun 22, 2021

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Jun 25, 2021

add code sample for pandas-dev#42188

4a7c6e1

simonjayhawkins closed this as completed Jun 25, 2021
