DOC: Fix EX01 in DataFrame.drop_duplicates #33283
pandas/core/frame.py
Outdated
Examples
--------

Remove the empty line.
pandas/core/frame.py
Outdated
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]},
... index=['TH', 'TH', 'ID', 'ID', 'ID'])

Remove the empty line so the example isn't split into sections.
I missed that. I'm going to fix it.
LGTM
pandas/core/frame.py
Outdated
>>> df = pd.DataFrame({'brand': brand,
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]},
... index=['TH', 'TH', 'ID', 'ID', 'ID'])
I'm not sure if the index adds much value to the example. I'd remove it if it doesn't, so things are simpler and faster to read.
Also, did you run the validation script (scripts/validate_docstrings.py pandas.DataFrame.drop_duplicates)? I'm wondering if the indentation above is PEP-8 correct. The script should tell.
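A minimal sketch of why the custom index can go, assuming the docstring's statement that indexes are ignored when identifying duplicates. The variable names (`with_index`, `default_index`, and so on) are illustrative and not part of the PR.

```python
import pandas as pd

brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']
data = {'brand': brand,
        'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
        'rating': [4, 4, 3.5, 15, 5]}

# Same data twice: once with the custom index from the diff, once without.
with_index = pd.DataFrame(data, index=['TH', 'TH', 'ID', 'ID', 'ID'])
default_index = pd.DataFrame(data)

# Duplicates are identified from column values only, so both frames
# keep the same rows; only the labels on those rows differ.
kept_with_index = with_index.drop_duplicates()
kept_default = default_index.drop_duplicates()

print(kept_with_index)
print(kept_default)
```

Both results contain the same four rows, which suggests the index does not change what the example demonstrates.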
Hm, I agree, the index doesn't add anything, so I might as well remove it.
I have also run the validation script, and it returns:
(pandas-dev) E:\pandas>python scripts/validate_docstrings.py pandas.DataFrame.drop_duplicates
################################################################################
################# Docstring (pandas.DataFrame.drop_duplicates) #################
################################################################################
Return DataFrame with duplicate rows removed.
Considering certain columns is optional. Indexes, including time indexes
are ignored.
Parameters
----------
subset : column label or sequence of labels, optional
Only consider certain columns for identifying duplicates, by
default use all of the columns.
keep : {'first', 'last', False}, default 'first'
Determines which duplicates (if any) to keep.
- ``first`` : Drop duplicates except for the first occurrence.
- ``last`` : Drop duplicates except for the last occurrence.
- False : Drop all duplicates.
inplace : bool, default False
Whether to drop duplicates in place or to return a copy.
ignore_index : bool, default False
If True, the resulting axis will be labeled 0, 1, …, n - 1.
.. versionadded:: 1.0.0
Returns
-------
DataFrame
DataFrame with duplicates removed or None if ``inplace=True``.
See Also
--------
DataFrame.value_counts: Count unique combinations of columns.
Examples
--------
Consider dataset containing ramen rating.
>>> brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']
>>> df = pd.DataFrame({'brand': brand,
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]})
>>> df
brand style rating
0 Yum Yum cup 4.0
1 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
By default, it removes duplicate rows based on all columns
>>> df.drop_duplicates()
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
3 Indomie pack 15.0
4 Indomie pack 5.0
To remove duplicates on specific column(s), use ``subset``
>>> df.drop_duplicates(subset=['brand'])
brand style rating
0 Yum Yum cup 4.0
2 Indomie cup 3.5
To remove duplicates and keep last occurences, use ``keep``
>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
brand style rating
1 Yum Yum cup 4.0
2 Indomie cup 3.5
4 Indomie pack 5.0
################################################################################
################################## Validation ##################################
################################################################################
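The docstring above describes `keep=False` and `ignore_index` in its Parameters section, but its examples do not exercise them. The following is a hedged sketch of how those two options behave on the same ramen data; the variable names are illustrative, and `ignore_index` assumes pandas 1.0.0 or later, per the `versionadded` note.

```python
import pandas as pd

df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# keep=False drops every row that has a duplicate, including the first one.
no_repeats = df.drop_duplicates(keep=False)

# ignore_index=True relabels the result 0, 1, ..., n - 1.
relabelled = df.drop_duplicates(ignore_index=True)

print(no_repeats)
print(relabelled)
```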
Hehe, I think the important part is the one that you didn't paste, after the Validation header. :) That's where it says whether any errors were found or everything is OK.
Yeah, there's no error shown after the validation header ._.
IIRC there should be a message saying there are no errors if that's the case. Maybe something is broken.
But in any case, the CI is green, and this is a nice improvement. If there is any validation problem, we can take care of it in the future.
But I am wondering, which part of the documentation could I improve regarding the PEP-8 indentation? I could change it and run the validation script once again.
It's the indentation, see:
df = pd.DataFrame({'brand': brand,
'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({'brand': brand,
                   'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
The first one is the one in your code, and doesn't seem correct to me. The other two seem correct.
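As a runnable illustration of the two layouts described as correct, here is a sketch using the two continuation-line styles PEP 8 allows: aligning with the opening delimiter, or a hanging indent with nothing after the opening bracket. The names `aligned` and `hanging` are illustrative only.

```python
import pandas as pd

brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']

# Style 1: continuation lines aligned with the opening delimiter.
aligned = pd.DataFrame({'brand': brand,
                        'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
                        'rating': [4, 4, 3.5, 15, 5]})

# Style 2: hanging indent, with nothing after the opening bracket.
hanging = pd.DataFrame({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# The layout is purely cosmetic: both calls build the same frame.
print(aligned.equals(hanging))
```

Inside a docstring, the same alignment would be counted after the `... ` continuation prompt.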
Thanks @farhanreynaldo
Co-Authored-By: Marc Garcia <[email protected]>
Cool, this looks great now. Thanks @farhanreynaldo
thanks @farhanreynaldo
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff
Related to #27977.