BUG: Pandas merge DFs on index and column #35609

carlosg-m · 2020-08-07T15:55:06Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df = pd.DataFrame([0.5,0.6,6,6,7,12],index=[0,0,1,1,1,2], columns=['distance'])

print(df)

   distance
0       0.5
0       0.6
1       6.0
1       6.0
1       7.0
2      12.0

result = df.merge(df.groupby(df.index)['distance'].min(), left_index=True, right_index=True, on='distance')

print(result)

   distance
0       0.5
0       0.6
1       6.0
1       6.0
1       7.0
2      12.0

Problem description

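Merging two DataFrames on both their indexes and a column at the same time does not seem to be possible. Passing left_index=True and right_index=True together with on='distance' returns the original rows unfiltered, as shown above.

That leaves only two options: reset the index on both DFs and merge on both columns, which is slower, or create a MultiIndex, which is unnecessary.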
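For illustration, the MultiIndex option might look like the sketch below. It is not taken from the issue: the names `minima`, `left_keys` and `right_keys` are made up here, and the merge is expressed as an index-membership test (a semi-join) rather than a call to `merge`.

```python
import pandas as pd

df = pd.DataFrame(
    [0.5, 0.6, 6, 6, 7, 12],
    index=[0, 0, 1, 1, 1, 2],
    columns=["distance"],
)

# Per-group minimum, indexed by the original index values.
minima = df.groupby(df.index)["distance"].min()

# Build (index, distance) keys for both sides (hypothetical helper names).
left_keys = pd.MultiIndex.from_arrays([df.index, df["distance"]])
right_keys = pd.MultiIndex.from_arrays([minima.index, minima])

# Keep the rows whose (index, distance) pair appears among the minima.
result = df[left_keys.isin(right_keys)]
print(result)
```

This keeps the original index on the result, at the cost of building two temporary MultiIndex objects.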

Expected Output

result = df.reset_index().merge(df.groupby(df.index)['distance'].min().reset_index(), on=['index','distance'])

print(result)

   index  distance
0      0       0.5
1      1       6.0
2      1       6.0
3      2      12.0
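A possible way to get the same rows without resetting the index, sketched under the assumption that the index can be given a name: `on` accepts index level names as well as column labels, so a named index level and a column can be combined in one call. The name "group" below is an illustrative choice, not something from the issue.

```python
import pandas as pd

df = pd.DataFrame(
    [0.5, 0.6, 6, 6, 7, 12],
    index=[0, 0, 1, 1, 1, 2],
    columns=["distance"],
)

# Give the index a name so it can be referenced in `on` ("group" is hypothetical).
left = df.rename_axis("group")

# Per-group minimum as a one-column frame sharing the same index name.
right = left.groupby(level="group")["distance"].min().to_frame()

# "group" is an index level in both frames, "distance" is a column in both.
result = left.merge(right, on=["group", "distance"])
print(result)
```

Because "group" is an index level on both sides, pandas keeps it as the index of the result instead of turning it into a column.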

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : d9fff27
python : 3.6.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.112+
Version : #1 SMP Thu Jul 23 08:00:38 PDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.0
numpy : 1.18.5
pytz : 2018.9
dateutil : 2.8.1
pip : 19.3.1
setuptools : 49.2.0
Cython : 0.29.21
pytest : 3.6.4
hypothesis : None
sphinx : 1.8.5
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.2.6
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.7.6.1 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 5.5.0
pandas_datareader: None
bs4 : 4.6.3
bottleneck : 1.3.2
fsspec : 0.8.0
fastparquet : None
gcsfs : None
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 2.5.9
pandas_gbq : 0.11.0
pyarrow : 0.14.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.4.1
sqlalchemy : 1.3.18
tables : 3.4.4
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.1.0
xlwt : 1.3.0
numba : 0.48.0

dsaxton · 2020-08-09T04:26:35Z

I think this is just how the API is written, which would require making the index a column if that's what you want to do.

I'm not sure what the specific use case is here but a faster / equivalent way to do this is to use a groupby.transform window function:

import numpy as np

mask = df.groupby(df.index)["distance"].transform(np.min) == df["distance"]
print(df[mask])
#    distance
# 0       0.5
# 1       6.0
# 1       6.0
# 2      12.0
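As a quick sanity check, the sketch below compares the transform-based filter with the reset_index merge from the issue on the same sample data. The variable names are illustrative, and the string alias "min" is used in place of np.min so the block needs no NumPy import.

```python
import pandas as pd

df = pd.DataFrame(
    [0.5, 0.6, 6, 6, 7, 12],
    index=[0, 0, 1, 1, 1, 2],
    columns=["distance"],
)

# Approach 1: broadcast each group's minimum back to its rows and compare.
mask = df.groupby(level=0)["distance"].transform("min") == df["distance"]
via_transform = df[mask]

# Approach 2: the reset_index merge from the issue's expected output.
minima = df.groupby(df.index)["distance"].min().reset_index()
via_merge = df.reset_index().merge(minima, on=["index", "distance"])

# Both approaches should select the same distances in the same order.
same = via_transform["distance"].tolist() == via_merge["distance"].tolist()
print(same)
```

The transform version keeps the original index on the result, while the merge version moves it into an "index" column.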

carlosg-m · 2020-08-09T11:40:46Z

Thank you!

simonjayhawkins · 2020-08-10T15:08:38Z

closing as answered

simonjayhawkins · 2020-08-10T15:14:27Z

xref #16228 (comment)

carlosg-m added the Bug and Needs Triage labels Aug 7, 2020

dsaxton added the Usage Question label and removed the Bug and Needs Triage labels Aug 9, 2020

simonjayhawkins closed this as completed Aug 10, 2020

simonjayhawkins added the Duplicate Report label Aug 10, 2020

appleparan added a commit to appleparan/mise.py that referenced this issue Apr 11, 2021

Fix pandas merge operation due to recent pandas update

20cea4c

* pandas [issue \#35609](pandas-dev/pandas#35609)
