DOC: Fix EX01 in DataFrame.drop_duplicates #33283
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4673,6 +4673,49 @@ def drop_duplicates( | |
See Also | ||
-------- | ||
DataFrame.value_counts: Count unique combinations of columns. | ||
|
||
Examples | ||
-------- | ||
|
||
Consider a dataset containing ramen ratings. | ||
|
||
>>> brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'] | ||
>>> df = pd.DataFrame({'brand': brand, | ||
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'], | ||
... 'rating': [4, 4, 3.5, 15, 5]}, | ||
... index=['TH', 'TH', 'ID', 'ID', 'ID']) | ||
I'm not sure if the index adds much value to the example. I'd remove it if it doesn't, so things are simpler and faster to read. Also, did you run the validation script

Hm, I agree, the index doesn't add any value; I might as well remove it.

Hehe, I think the important part is the one that you didn't paste, after the Validation header. :) That's where it says whether any error has been found or everything is ok.

Yeah, there's no error shown after the Validation header ._.

IIRC there should be a message saying there are no errors if that's the case. Maybe something is broken. But in any case, the CI is green, and this is a nice improvement. If there is any validation problem we can take care of it in the future.

But I am wondering, which part of the documentation could I improve regarding the PEP-8 indentation? I could change it and run the validation scripts once again.

It's the indentation, see:

df = pd.DataFrame({'brand': brand,
'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({'brand': brand,
'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({
'brand': brand,
'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

The first one is the one in your code, and doesn't seem correct to me. The other two seem correct. |
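For reference, the two continuation-line layouts that PEP 8 accepts can be sketched as follows. This is only an illustration: it uses plain `dict` calls instead of `pd.DataFrame` so that it runs without pandas, and the names `aligned` and `hanging` are introduced here and do not appear in the pull request.

```python
brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']

# Layout 1: continuation lines aligned vertically with the first element
# that follows the opening delimiter.
aligned = dict({'brand': brand,
                'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
                'rating': [4, 4, 3.5, 15, 5]})

# Layout 2: hanging indent. Nothing follows the opening delimiter on the
# first line, and every element is indented one level.
hanging = dict({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# Both layouts build exactly the same mapping; only the formatting differs.
print(aligned == hanging)
```

Inside a docstring the same rule applies to the text after the `...` continuation prompt.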
||
|
||
Remove the empty line, so it won't split into sections.

I missed that, I'm gonna fix this issue. |
||
>>> df | ||
brand style rating | ||
TH Yum Yum cup 4.0 | ||
TH Yum Yum cup 4.0 | ||
ID Indomie cup 3.5 | ||
ID Indomie pack 15.0 | ||
ID Indomie pack 5.0 | ||
|
||
By default, it removes duplicate rows based on all columns | ||
|
||
>>> df.drop_duplicates() | ||
brand style rating | ||
TH Yum Yum cup 4.0 | ||
ID Indomie cup 3.5 | ||
ID Indomie pack 15.0 | ||
ID Indomie pack 5.0 | ||
|
||
To remove duplicates on specific column(s), use ``subset`` | ||
|
||
>>> df.drop_duplicates(subset=['brand']) | ||
brand style rating | ||
TH Yum Yum cup 4.0 | ||
ID Indomie cup 3.5 | ||
|
||
To remove duplicates and keep last occurrences, use ``keep`` | ||
|
||
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last') | ||
brand style rating | ||
TH Yum Yum cup 4.0 | ||
ID Indomie cup 3.5 | ||
ID Indomie pack 5.0 | ||
""" | ||
if self.empty: | ||
return self.copy() | ||
|
Remove the empty line.
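
The `subset` and `keep` semantics shown in the docstring examples can be made concrete with a small pure-Python sketch. The helper `drop_duplicate_rows` is hypothetical and written only for illustration: it is not how pandas implements `DataFrame.drop_duplicates`, it works on a list of dicts rather than a DataFrame, and it omits the `keep=False` option.

```python
def drop_duplicate_rows(rows, subset=None, keep='first'):
    """Return rows with duplicates removed (illustrative helper, not pandas code).

    rows   : list of dicts, one dict per row
    subset : column names used to identify duplicates (default: all columns)
    keep   : 'first' or 'last', the occurrence to retain
    """
    if keep not in ('first', 'last'):
        raise ValueError("keep must be 'first' or 'last'")
    # To keep the last occurrence, scan in reverse and restore the order afterwards.
    ordered = rows if keep == 'first' else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        columns = subset if subset is not None else sorted(row)
        key = tuple(row[col] for col in columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept if keep == 'first' else list(reversed(kept))


# Same data as the docstring example, without the index.
rows = [
    {'brand': 'Yum Yum', 'style': 'cup', 'rating': 4.0},
    {'brand': 'Yum Yum', 'style': 'cup', 'rating': 4.0},
    {'brand': 'Indomie', 'style': 'cup', 'rating': 3.5},
    {'brand': 'Indomie', 'style': 'pack', 'rating': 15.0},
    {'brand': 'Indomie', 'style': 'pack', 'rating': 5.0},
]

all_columns = drop_duplicate_rows(rows)
by_brand = drop_duplicate_rows(rows, subset=['brand'])
last_by_brand_style = drop_duplicate_rows(
    rows, subset=['brand', 'style'], keep='last')

for row in last_by_brand_style:
    print(row['brand'], row['style'], row['rating'])
```

The three calls mirror the three docstring examples: comparing all columns, comparing only `brand`, and keeping the last row for each `brand` and `style` pair.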