PERF: pd.merge(on=index) is faster than pd.merge(left_index=True, right_index=True) if index is duplicated #56564
Labels
Performance: Memory or execution speed performance
Reshaping: Concat, Merge/Join, Stack/Unstack, Explode
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the main branch of pandas.
Reproducible Example
Installed Versions
Prior Performance
Both merges give the same output (verified using pd.testing.assert_frame_equal), but pd.merge(..., on=index) runs faster.
This behaviour is observed only when the index has duplicated values in both DataFrames. The expected behaviour is roughly the same execution time regardless of which arguments are provided.
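For readers who want to check the comparison themselves, here is a minimal sketch of how the two calls could be timed against each other. It is not the reporter's original example: the frame sizes, the index name "key", the column names "a" and "b", and the helper functions are all assumptions chosen for illustration, and the timings will vary by pandas version and machine.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical sizes: 20_000 rows drawn from 5_000 distinct keys, so the
# index contains duplicated values in both frames.
rng = np.random.default_rng(0)
n_rows, n_keys = 20_000, 5_000


def make_frame(column):
    """Build a one-column frame whose index (named "key") has duplicates."""
    keys = rng.integers(0, n_keys, size=n_rows)
    return pd.DataFrame(
        {column: rng.random(n_rows)},
        index=pd.Index(keys, name="key"),
    )


left = make_frame("a")
right = make_frame("b")


def merge_on_name():
    # `on` may refer to an index level name when no column has that name.
    return pd.merge(left, right, on="key")


def merge_on_index():
    return pd.merge(left, right, left_index=True, right_index=True)


def canonical(df):
    """Sort rows so the comparison does not depend on output row order."""
    flat = df.reset_index()[["key", "a", "b"]]
    return flat.sort_values(["key", "a", "b"]).reset_index(drop=True)


by_name = merge_on_name()
by_index = merge_on_index()

# Both calls should produce the same rows.
pd.testing.assert_frame_equal(canonical(by_name), canonical(by_index))

# Take the best of three runs for each variant.
t_name = min(timeit.repeat(merge_on_name, number=1, repeat=3))
t_index = min(timeit.repeat(merge_on_index, number=1, repeat=3))
print(f"on='key':                  {t_name:.4f} s")
print(f"left_index/right_index:    {t_index:.4f} s")
```

The rows are sorted before comparison because this sketch does not assume the two code paths return rows in the same order; drop the `canonical` step to compare the outputs exactly as returned.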
The regression on the main branch may be caused by #56523 (@lukemanley @phofl FYI).