PERF: DataFrame.duplicated with subset= for 1 column is slower than Series.duplicated #45236
Closed
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this issue exists on the latest version of pandas.
I have confirmed this issue exists on the master branch of pandas.
Reproducible Example
Set up:
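As a rough sketch of a setup along these lines, assuming a small frame modeled on the `DataFrame.duplicated` documentation example (the `style` and `rating` columns, the values, and the helper names `via_subset`, `via_getitem`, and `via_loc` are assumptions made for illustration, not taken from this report), the three duplicate checks compared here can be defined, checked for agreement, and timed with `timeit`:

```python
import timeit

import pandas as pd

# Assumed data: modeled on the DataFrame.duplicated docs example.
# Only the 'brand' column is named in this report.
df = pd.DataFrame(
    {
        "brand": ["Yum Yum", "Yum Yum", "Indomie", "Indomie", "Indomie"],
        "style": ["cup", "cup", "cup", "pack", "pack"],
        "rating": [4, 4, 3.5, 15, 5],
    }
)


def via_subset(frame):
    # (1) DataFrame.duplicated restricted to one column
    return frame.duplicated(subset=["brand"])


def via_getitem(frame):
    # (2) select the column with [] and call Series.duplicated
    return frame["brand"].duplicated()


def via_loc(frame):
    # (3) select the column with .loc and call Series.duplicated
    return frame.loc[:, "brand"].duplicated()


# All three approaches should flag the same rows.
expected = via_subset(df).tolist()
assert via_getitem(df).tolist() == expected
assert via_loc(df).tolist() == expected

# Rough per-call timing; absolute numbers depend on machine and pandas version.
for fn in (via_subset, via_getitem, via_loc):
    seconds = timeit.timeit(lambda fn=fn: fn(df), number=1000)
    print(f"{fn.__name__}: {seconds / 1000 * 1e6:.1f} us per call")
```

On a frame this small the timings mostly reflect per-call overhead rather than the cost of hashing the data, so the gap between the approaches may look different on larger frames.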
The following are multiple ways to check for duplicates of the `'brand'` column.

1. Using `subset=['brand']` averages 176 µs.
2. Using `[]` indexing averages 31.2 µs.
3. Using `.loc[:,'brand']` indexing averages 40.6 µs.

(1) is the exact example used in the doc for `pandas.DataFrame.duplicated`, but it is the slowest.

Installed Versions
I don't think this is expected 😳 this is how the environment was set up:
If it's any better, output of `conda list`:

Prior Performance
No response