-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: Improve the docstring of DataFrame.equals() #22539
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: Improve the docstring of DataFrame.equals() #22539
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great docstring, added some comments.
pandas/core/generic.py
Outdated
Test whether two DataFrame objects contain the same elements. | ||
|
||
This function allows two DataFrame objects to be compared against | ||
each other to see if they same shape and elements. NaNs in the same |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
they have the same shape...
pandas/core/generic.py
Outdated
|
||
See Also | ||
-------- | ||
pandas.Series.eq : Compare two Series objects of the same length |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the pandas.
prefix, just Series.eq
is enough.
I think we need eq
and equals
of both Series
and DataFrame
, not just Series.eq
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be redundant to mention equals
since this is the docstring for that method?
pandas/core/generic.py
Outdated
and return a Series where each element is True if the element | ||
in each Series is equal, False otherwise. | ||
|
||
numpy.array_equal : Return True if two arrays have the same shape |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the blank line before this
pandas/core/generic.py
Outdated
Examples | ||
-------- | ||
>>> a = pd.DataFrame({1:[0], 0:[1]}) | ||
>>> b = pd.DataFrame({1.0:[0], 0.0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to be consistent with other docstrings, it'd be good to use df
for the main DataFrame the ones in which we'll call the method df.equals()
, and something more descriptive for the other (e.g. exactly_equal
, different_data_type
, different_index_type
...)
After creating the data, it'd be nice to display the content (i.e. >>> df
)
Not sure in this case if it'd be better or worse, but in general we try to have somehow real-world (meaningful) data for the examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used the same data in the updated PR - felt with text in order to use meaningful data it would be harder/more confusing to demonstrate. Can change that if needed.
pandas/core/generic.py
Outdated
>>> b = pd.DataFrame({1.0:[0], 0.0:[1]}) | ||
|
||
DataFrames a and b have the same element types and values, but have | ||
different types for the indices, which will still return True. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"indices" are the row labels, which are the same, in this case the different type is in the column labels.
pandas/core/generic.py
Outdated
@@ -1303,8 +1303,59 @@ def __invert__(self): | |||
|
|||
def equals(self, other): | |||
""" | |||
Determines if two NDFrame objects contain the same elements. NaNs in | |||
the same location are considered equal. | |||
Test whether two DataFrame objects contain the same elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This docstring is used for both Series and DataFrame, we should be generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed the DataFrame references in the docstring to NDFrame as originally written, however the validation script returned an error (Private classes (['NDFrame']) should not be mentioned in public docstring.). The updated PR still has NDFrame but can change it to something else (Series/DataFrame?) if that's better.
Codecov Report
@@ Coverage Diff @@
## master #22539 +/- ##
=========================================
Coverage ? 92.04%
=========================================
Files ? 169
Lines ? 50787
Branches ? 0
=========================================
Hits ? 46745
Misses ? 4042
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docstring looks really good. Just some comments about technicalities.
pandas/core/generic.py
Outdated
@@ -1303,8 +1303,81 @@ def __invert__(self): | |||
|
|||
def equals(self, other): | |||
""" | |||
Determines if two NDFrame objects contain the same elements. NaNs in | |||
the same location are considered equal. | |||
Test whether two NDFrame objects contain the same elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can leave Test whether two objects contain the same elements.
for now.
The problem is that NDFrame is not a class we expect users to know about (it's private). So, it shouldn't be used in the documentation. Series or DataFrame
or object
are probably what we want to use in most cases.
pandas/core/generic.py
Outdated
return a DataFrame where each element is True if the respective | ||
element in each DataFrame is equal, False otherwise. | ||
numpy.array_equal : Return True if two arrays have the same shape | ||
and elements, False otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We add the method being documented to the See Also
, only in cases where the docstring is being reused by many methods, and we want those methods to reference each other. But this is not the case here. May be you can still add the pandas.utils.testing.assert_frame_equal
and assert_series_equal
here, but I don't think it adds value to reference .equals
itself.
pandas/core/generic.py
Outdated
Notes | ||
----- | ||
This function requires that the elements have the same dtype as their | ||
respective elements in the other DataFrame. However, the column labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the other Series or DataFrame
pandas/core/generic.py
Outdated
|
||
Examples | ||
-------- | ||
>>> df = pd.DataFrame({1:[0], 0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing spaces after colons. I think using 0 -> 1 and 1 -> 0 is a bit confusing, may be 1 -> 10 and 2 -> 20 makes the example easier to read?
pandas/core/generic.py
Outdated
types and values, but have different types for the column labels, | ||
which will still return True. | ||
|
||
>>> different_column_label = pd.DataFrame({1.0:[0], 0.0:[1]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
different_column_type
seems a clearer name to me
…ster' into docstring_DataFrame_equals
Hello @seantchan! Thanks for updating the PR.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, really good docstring, thanks for the work on this @seantchan
git diff upstream/master -u -- "*.py" | flake8 --diff