-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Allow Iterable[Hashable] in drop_duplicates #59392
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR!
pandas/core/frame.py
Outdated
if isinstance(subset, set): | ||
elem = subset.pop() | ||
else: | ||
elem = subset[0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't calling .pop
mutate the passed argument? In any case, I'd recommend using next(iter(subset))
in both cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! Updated.
Also - can you give this PR an informative title. Something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should also update the docs to indicate that any iterable of labels is accepted.
pandas/core/frame.py
Outdated
@@ -6534,7 +6534,7 @@ def dropna( | |||
@overload | |||
def drop_duplicates( | |||
self, | |||
subset: Hashable | Sequence[Hashable] | None = ..., | |||
subset: Hashable | Sequence[Hashable] | set | None = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you should just replace Sequence[Hashable]
with Iterable[Hashable]
here (and in the other overloads), because even a dict
works now.
Also, I suggest making a separate PR in pandas-stubs
once this PR is accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Could you specify which docs I should be updating to reflect these changes?
Also, what is the difference between pandas
and pandas-stubs
?
I am a new contributor and am happy to raise the PR there but would like to understand more about each repo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, what is the difference between
pandas
andpandas-stubs
? I am a new contributor and am happy to raise the PR there but would like to understand more about each repo.
pandas
is the code that executes.
pandas-stubs
is a set of typing declarations meant for users to use when type checking their code. It is bundled within Visual Studio Code. See https://github.com/pandas-dev/pandas-stubs/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarifying. I will do as you suggested and open a PR there once this one is merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with this. Will let someone else make the final call.
pandas/core/frame.py
Outdated
@@ -6535,7 +6534,7 @@ def dropna( | |||
@overload | |||
def drop_duplicates( | |||
self, | |||
subset: Hashable | Sequence[Hashable] | AbstractSet | None = ..., | |||
subset: Hashable | Iterable[Hashable] | set | None = ..., |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for set
anymore now that you've changed Sequence
to Iterable
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's true. Updated.
doc/source/whatsnew/v3.0.0.rst
Outdated
@@ -51,7 +51,7 @@ Other enhancements | |||
- :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`) | |||
- :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`) | |||
- Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`) | |||
- Support passing a :class:`Set` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`) | |||
- Support passing a :class:`Set` and :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here - can remove Set
now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI is failing on code checks. I think it's the line subset = self.columns # type: ignore[assignment]
where the type: ignore
can now be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, can be merged on green.
Thanks @kirill-bash! |
Appreciate your help, @rhshadrach! |
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.