ENH: Allow Iterable[Hashable] in drop_duplicates #59392

kirill-bash · 2024-08-02T21:00:26Z

closes ENH: Reduce type requirements for the subset parameter in drop_duplicates/duplicated #59237
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

rhshadrach

Thanks for the PR!

rhshadrach · 2024-08-05T21:02:46Z

pandas/core/frame.py

+            if isinstance(subset, set):
+                elem = subset.pop()
+            else:
+                elem = subset[0]


Doesn't calling .pop mutate the passed argument? In any case, I'd recommend using next(iter(subset)) in both cases.

Good point! Updated.
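To make the concern concrete: set.pop removes an element from the caller's set, while next(iter(...)) only reads one. A minimal sketch, assuming two hypothetical helpers (first_elem_pop and first_elem_iter are illustrative names, not pandas functions):

```python
def first_elem_pop(subset):
    """Mirrors the branch in the diff above: mutates a set argument."""
    if isinstance(subset, set):
        return subset.pop()  # removes the element from the caller's set
    return subset[0]


def first_elem_iter(subset):
    """Reviewer's suggestion: reads one element from any non-empty iterable."""
    return next(iter(subset))


cols = {"a", "b"}
first_elem_pop(cols)
print(len(cols))  # 1: the caller's set lost an element

cols = {"a", "b"}
first_elem_iter(cols)
print(len(cols))  # 2: the caller's set is unchanged
```

The second form also avoids the isinstance branch entirely, since it never indexes into the argument.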

pandas/core/frame.py

rhshadrach · 2024-08-05T21:07:34Z

Also, can you give this PR an informative title? Something like ENH: Allow sets in drop_duplicates.

Dr-Irv

You should also update the docs to indicate that any iterable of labels is accepted.

Dr-Irv · 2024-08-07T14:06:27Z

pandas/core/frame.py

@@ -6534,7 +6534,7 @@ def dropna(
    @overload
    def drop_duplicates(
        self,
-        subset: Hashable | Sequence[Hashable] | None = ...,
+        subset: Hashable | Sequence[Hashable] | set | None = ...,


I think you should just replace Sequence[Hashable] with Iterable[Hashable] here (and in the other overloads), because even a dict works now.

Also, I suggest making a separate PR in pandas-stubs once this PR is accepted.
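As a rough illustration of why Iterable[Hashable] is the broader annotation, the sketch below normalizes a single label or any iterable of labels into a tuple. normalize_subset is a hypothetical helper written for this example, not the pandas implementation, and it simplifies the real handling (for instance, it always treats a tuple as an iterable of labels):

```python
from __future__ import annotations

from collections.abc import Hashable, Iterable


def normalize_subset(subset: Hashable | Iterable[Hashable]) -> tuple[Hashable, ...]:
    """Hypothetical helper: return the labels in `subset` as a tuple.

    A string is treated as one label, not as an iterable of characters.
    """
    if isinstance(subset, str) or not isinstance(subset, Iterable):
        return (subset,)
    return tuple(subset)


print(normalize_subset("a"))                 # ('a',)
print(normalize_subset(["a", "b"]))          # ('a', 'b')
print(normalize_subset({"a": 1, "b": 2}))    # ('a', 'b'): iterating a dict yields its keys
print(sorted(normalize_subset({"a", "b"})))  # ['a', 'b']
```

A list, a set, a dict, and a generator all pass through the same branch, which is why a separate set entry in the annotation adds nothing.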

@Dr-Irv,

Updated. Could you specify which docs I should be updating to reflect these changes?

Also, what is the difference between pandas and pandas-stubs?
I am a new contributor and am happy to raise the PR there but would like to understand more about each repo.


pandas is the code that executes.

pandas-stubs is a set of typing declarations meant for users to use when type checking their code. It is bundled with the Pylance extension for Visual Studio Code. See https://github.com/pandas-dev/pandas-stubs/

Thanks for clarifying. I will do as you suggested and open a PR there once this one is merged.

Dr-Irv

I'm fine with this. Will let someone else make the final call.

rhshadrach · 2024-08-07T21:17:21Z

pandas/core/frame.py

@@ -6535,7 +6534,7 @@ def dropna(
    @overload
    def drop_duplicates(
        self,
-        subset: Hashable | Sequence[Hashable] | AbstractSet | None = ...,
+        subset: Hashable | Iterable[Hashable] | set | None = ...,


No need for set anymore now that you've changed Sequence to Iterable, right?

That's true. Updated.
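The redundancy can be checked with the standard library's abstract base classes. This is only a runtime sanity check using collections.abc, not a statement about how a type checker resolves the union:

```python
from collections.abc import Hashable, Iterable

# A set is an Iterable, so Iterable[Hashable] already covers it.
set_is_iterable = issubclass(set, Iterable)
print(set_is_iterable)  # True

# A set is not itself Hashable, so it matches the Iterable branch of the
# union, not the single-label Hashable branch.
set_is_hashable = issubclass(set, Hashable)
print(set_is_hashable)  # False

# The elements of a set of column labels are Hashable.
labels = {"a", "b"}
elements_hashable = all(isinstance(label, Hashable) for label in labels)
print(elements_hashable)  # True
```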

rhshadrach · 2024-08-07T21:18:21Z

doc/source/whatsnew/v3.0.0.rst

@@ -51,7 +51,7 @@ Other enhancements
 - :meth:`Series.cummin` and :meth:`Series.cummax` now supports :class:`CategoricalDtype` (:issue:`52335`)
 - :meth:`Series.plot` now correctly handle the ``ylabel`` parameter for pie charts, allowing for explicit control over the y-axis label (:issue:`58239`)
 - Restore support for reading Stata 104-format and enable reading 103-format dta files (:issue:`58554`)
-- Support passing a :class:`Set` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)
+- Support passing a :class:`Set` and :class:`Iterable[Hashable]` input to :meth:`DataFrame.drop_duplicates` (:issue:`59237`)


Same here - can remove Set now.

Updated. Thanks!

rhshadrach

CI is failing on code checks. I think it's the line subset = self.columns # type: ignore[assignment] where the type: ignore can now be removed.

rhshadrach

lgtm, can be merged on green.

rhshadrach · 2024-08-13T23:46:28Z

Thanks @kirill-bash!

kirill-bash · 2024-08-14T13:09:39Z

Appreciate your help, @rhshadrach!

kirill-bash added 2 commits August 2, 2024 16:59

ENH: pandas-dev#59237

39cb4ca

adding whatsnew note

03c33ee

rhshadrach requested changes Aug 5, 2024


rhshadrach added the Enhancement and duplicated (duplicated, drop_duplicates) labels Aug 5, 2024

kirill-bash changed the title ~~ENH: pandas-dev#59237~~ ENH: Allow sets in drop_duplicates - pandas-dev#59237 Aug 7, 2024

updates based on feedback

4ff6206

kirill-bash requested a review from rhshadrach August 7, 2024 12:50

Dr-Irv requested changes Aug 7, 2024


updates based on feedback: Sequence[Hashable] -> Iterable[Hashable]

6e7f1b1

kirill-bash changed the title ~~ENH: Allow sets in drop_duplicates - pandas-dev#59237~~ ENH: Allow sets + Iterable[Hashable] in drop_duplicates - pandas-dev#59237 Aug 7, 2024

adding whatsnew note

e43ca96

kirill-bash requested a review from Dr-Irv August 7, 2024 16:39

Dr-Irv approved these changes Aug 7, 2024


rhshadrach requested changes Aug 7, 2024


set is part of Iterable[Hashable]

68e6e13

kirill-bash requested a review from rhshadrach August 9, 2024 13:47

kirill-bash changed the title ~~ENH: Allow sets + Iterable[Hashable] in drop_duplicates - pandas-dev#59237~~ ENH: Allow Iterable[Hashable] in drop_duplicates - pandas-dev#59237 Aug 9, 2024

rhshadrach requested changes Aug 11, 2024


kirill-bash added 2 commits August 13, 2024 14:18

removing type: ignore[assignment]

77ff9e6

Merge branch 'main' into main

cd959d4

kirill-bash requested a review from rhshadrach August 13, 2024 19:09

Remove remaining comment

f74c9f4

rhshadrach added this to the 3.0 milestone Aug 13, 2024

rhshadrach approved these changes Aug 13, 2024


rhshadrach changed the title ~~ENH: Allow Iterable[Hashable] in drop_duplicates - pandas-dev#59237~~ ENH: Allow Iterable[Hashable] in drop_duplicates Aug 13, 2024

rhshadrach merged commit 614939a into pandas-dev:main Aug 13, 2024
47 checks passed

kirill-bash mentioned this pull request Aug 26, 2024

ENH: updating subset types in drop_duplicates/duplicated pandas-dev/pandas-stubs#986

Merged

2 tasks
