DOC: Fix EX01 in DataFrame.drop_duplicates #33283

farhanreynaldo · 2020-04-04T12:15:30Z

Related to #27977.

################################################################################
################################## Validation ##################################
################################################################################

ShaharNaveh

@farhanreynaldo Thank you for working on this!

I have a few nits, otherwise LGTM

ShaharNaveh · 2020-04-04T17:38:59Z

pandas/core/frame.py

+
+        Examples
+        --------
+


Remove the empty line.

ShaharNaveh · 2020-04-04T17:39:25Z

pandas/core/frame.py

+        ... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
+        ... 'rating': [4, 4, 3.5, 15, 5]},
+        ...  index=['TH', 'TH', 'ID', 'ID', 'ID'])
+


Remove the empty line, so it won't split into sections

I missed that, I'm gonna fix this issue.

ShaharNaveh

LGTM

datapythonista · 2020-04-07T13:29:10Z

pandas/core/frame.py

+        >>> df = pd.DataFrame({'brand': brand,
+        ... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
+        ... 'rating': [4, 4, 3.5, 15, 5]},
+        ...  index=['TH', 'TH', 'ID', 'ID', 'ID'])


I'm not sure if the index adds much value to the example. I'd remove it if it doesn't, so things are simpler and faster to read.

Also, did you run the validation script scripts/validate_docstrings.py pandas.DataFrame.drop_duplicates? I'm wondering if the indentation above is PEP-8 correct. The script should tell.
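To illustrate why the index can be dropped from the example: `drop_duplicates` compares row values only, so a custom index does not change which rows are kept. The following is a minimal sketch, assuming pandas is installed; the names `data`, `with_index`, and `plain` are illustrative and not part of the PR.

```python
import pandas as pd

data = {
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
}

# Same data, once with the custom index from the first draft and once without.
with_index = pd.DataFrame(data, index=['TH', 'TH', 'ID', 'ID', 'ID'])
plain = pd.DataFrame(data)

# Duplicates are detected from column values only; index labels are ignored.
kept_with_index = with_index.drop_duplicates()
kept_plain = plain.drop_duplicates()

# Compare the surviving rows after discarding the index labels.
same_rows = kept_with_index.reset_index(drop=True).equals(
    kept_plain.reset_index(drop=True)
)
print(same_rows)
```

Both frames keep the same four rows, which supports leaving the index out of the docstring example.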

Hm, I agree, the index doesn't add any value, so I might as well remove it.
I also ran the validation script, and it returns:

(pandas-dev) E:\pandas>python scripts/validate_docstrings.py pandas.DataFrame.drop_duplicates

################################################################################
################# Docstring (pandas.DataFrame.drop_duplicates) #################
################################################################################

Return DataFrame with duplicate rows removed.

Considering certain columns is optional. Indexes, including time indexes
are ignored.

Parameters
----------
subset : column label or sequence of labels, optional
    Only consider certain columns for identifying duplicates, by
    default use all of the columns.
keep : {'first', 'last', False}, default 'first'
    Determines which duplicates (if any) to keep.
    - ``first`` : Drop duplicates except for the first occurrence.
    - ``last`` : Drop duplicates except for the last occurrence.
    - False : Drop all duplicates.
inplace : bool, default False
    Whether to drop duplicates in place or to return a copy.
ignore_index : bool, default False
    If True, the resulting axis will be labeled 0, 1, …, n - 1.

    .. versionadded:: 1.0.0

Returns
-------
DataFrame
    DataFrame with duplicates removed or None if ``inplace=True``.

See Also
--------
DataFrame.value_counts: Count unique combinations of columns.

Examples
--------
Consider dataset containing ramen rating.

>>> brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']
>>> df = pd.DataFrame({'brand': brand,
... 'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
... 'rating': [4, 4, 3.5, 15, 5]})
>>> df
     brand style  rating
0  Yum Yum   cup     4.0
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

By default, it removes duplicate rows based on all columns

>>> df.drop_duplicates()
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5
3  Indomie  pack    15.0
4  Indomie  pack     5.0

To remove duplicates on specific column(s), use ``subset``

>>> df.drop_duplicates(subset=['brand'])
     brand style  rating
0  Yum Yum   cup     4.0
2  Indomie   cup     3.5

To remove duplicates and keep last occurences, use ``keep``

>>> df.drop_duplicates(subset=['brand', 'style'], keep='last')
     brand style  rating
1  Yum Yum   cup     4.0
2  Indomie   cup     3.5
4  Indomie  pack     5.0

################################################################################
################################## Validation ##################################
################################################################################
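For readers who want to run the docstring examples outside the validation script, here is a standalone sketch of the same calls, assuming pandas 1.0 or later is installed. The last two calls (`keep=False` and `ignore_index=True`) are not among the docstring examples; they are added here only to illustrate the parameters the docstring describes.

```python
import pandas as pd

brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']
df = pd.DataFrame({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# Default: a row is a duplicate only if every column matches an earlier row.
all_columns = df.drop_duplicates()

# Restrict the comparison to the 'brand' column.
by_brand = df.drop_duplicates(subset=['brand'])

# Keep the last occurrence of each (brand, style) pair instead of the first.
keep_last = df.drop_duplicates(subset=['brand', 'style'], keep='last')

# Illustrative additions, not taken from the docstring examples:
# keep=False drops every row that has a duplicate,
# ignore_index=True relabels the result 0, 1, ..., n - 1.
no_duplicated_rows = df.drop_duplicates(keep=False)
relabeled = df.drop_duplicates(ignore_index=True)

print(list(all_columns.index))         # [0, 2, 3, 4]
print(list(by_brand.index))            # [0, 2]
print(list(keep_last.index))           # [1, 2, 4]
print(list(no_duplicated_rows.index))  # [2, 3, 4]
print(list(relabeled.index))           # [0, 1, 2, 3]
```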

hehe, I think the important part is the one that you didn't paste, after the Validation header. :) That's where it says whether any error has been found, or if everything is ok.

Yeah, there's no error shown after the Validation header ._.

IIRC there should be a message saying there are no errors if that's the case. Maybe something is broken.

But in any case, the CI is green, and this is a nice improvement. If there is any validation problem we can take care of it in the future.
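When the script output is inconclusive, one independent way to check that an example's printed output matches what the code produces is Python's standard `doctest` module. This is only a sketch under assumptions: it is not the pandas validation script, it covers none of that script's other checks, and the shortened example and the name `example_text` are illustrative.

```python
import doctest

import pandas as pd

# A shortened example in docstring form; the expected output follows each call.
example_text = """
>>> df = pd.DataFrame({'brand': ['Yum Yum', 'Yum Yum', 'Indomie'],
...                    'rating': [4, 4, 3.5]})
>>> df.drop_duplicates()
     brand  rating
0  Yum Yum     4.0
2  Indomie     3.5
"""

# Parse the text into a DocTest, giving it access to the name `pd`.
test = doctest.DocTestParser().get_doctest(
    example_text, {'pd': pd}, 'drop_duplicates_example', None, 0
)

# NORMALIZE_WHITESPACE makes the comparison tolerant of column padding.
runner = doctest.DocTestRunner(optionflags=doctest.NORMALIZE_WHITESPACE)
results = runner.run(test)

print(results.failed, results.attempted)
```

A failure count of zero means every example printed what the docstring claims.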

But I am wondering, which part of the documentation could I improve regarding the PEP-8 indentation? I could change it and run the validation script once again.

It's the indentation, see:

df = pd.DataFrame({'brand': brand,
'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({'brand': brand,
                   'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

df = pd.DataFrame({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],

The first one is the one in your code, and doesn't seem correct to me. The other two seem correct.
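As a complete, runnable version of the two layouts described as correct above, the sketch below builds the same DataFrame twice, once with continuation lines aligned to the opening delimiter and once with a hanging indent. It assumes pandas is installed; the names `aligned` and `hanging` are illustrative.

```python
import pandas as pd

brand = ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie']

# Continuation lines aligned with the first key after the opening brace.
aligned = pd.DataFrame({'brand': brand,
                        'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
                        'rating': [4, 4, 3.5, 15, 5]})

# Hanging indent: nothing after the opening brace, one key per indented line.
hanging = pd.DataFrame({
    'brand': brand,
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# The layout is purely cosmetic; both calls build the same DataFrame.
print(aligned.equals(hanging))
```

Inside a docstring the same layouts apply, with the extra spaces placed after the `...` continuation prompt.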

datapythonista

Thanks @farhanreynaldo

pandas/core/frame.py

Co-Authored-By: Marc Garcia <[email protected]>

datapythonista

Cool, this looks great now. Thanks @farhanreynaldo

jreback · 2020-04-10T17:52:03Z

thanks @farhanreynaldo

farhanreynaldo added 2 commits April 4, 2020 19:10

DOC: Fix EX01 in DataFrame.drop_duplicates

988542c

DOC: Fix EX01 in DataFrame.drop_duplicates

1618a0c

ShaharNaveh added the Docs label Apr 4, 2020

ShaharNaveh suggested changes Apr 4, 2020

farhanreynaldo added 2 commits April 5, 2020 11:06

Remove empty lines

f238c81

Merge branch 'master' of https://github.com/pandas-dev/pandas into dr…

e34db4f

…op-duplicates

farhanreynaldo requested a review from ShaharNaveh April 5, 2020 04:52

ShaharNaveh approved these changes Apr 5, 2020

datapythonista reviewed Apr 7, 2020

Remove index name

71b8e05

datapythonista approved these changes Apr 8, 2020

Fix indentation

caffe84

datapythonista reviewed Apr 8, 2020

farhanreynaldo and others added 3 commits April 9, 2020 09:10

Add indentation

6ff5453

Co-Authored-By: Marc Garcia <[email protected]>

move brand to fits inline

cbabca4

Add period on sentences

27a2045

datapythonista approved these changes Apr 9, 2020

jreback added this to the 1.1 milestone Apr 10, 2020

jreback merged commit 916d1f3 into pandas-dev:master Apr 10, 2020
