DOC: Improve the docstring of DataFrame.equals() #22539

Merged
merged 5 commits into from
Sep 5, 2018
Changes from 3 commits
77 changes: 75 additions & 2 deletions pandas/core/generic.py
@@ -1303,8 +1303,81 @@ def __invert__(self):

    def equals(self, other):
        """
        Determines if two NDFrame objects contain the same elements. NaNs in
        the same location are considered equal.
        Test whether two NDFrame objects contain the same elements.
Member

I think you can leave `Test whether two objects contain the same elements.` for now.

The problem is that `NDFrame` is not a class we expect users to know about (it's private), so it shouldn't be used in the documentation. `Series`, `DataFrame`, or `object` are probably what we want to use in most cases.


        This function allows two NDFrame objects to be compared against
        each other to see if they have the same shape and elements. NaNs in
        the same location are considered equal. The column headers do not
        need to have the same type, but the elements within the columns must
        be the same dtype.

        Parameters
        ----------
        other : NDFrame
            The other NDFrame to be compared with the first.

        Returns
        -------
        bool
            True if all elements are the same in both NDFrames, False
            otherwise.

        See Also
        --------
        Series.eq : Compare two Series objects of the same length
            and return a Series where each element is True if the element
            in each Series is equal, False otherwise.
        DataFrame.eq : Compare two DataFrame objects of the same shape and
            return a DataFrame where each element is True if the respective
            element in each DataFrame is equal, False otherwise.
        numpy.array_equal : Return True if two arrays have the same shape
Member

no need for the blank line before this

            and elements, False otherwise.
Member

We add the method being documented to the See Also only in cases where the docstring is reused by many methods and we want those methods to reference each other. That is not the case here. Maybe you can still add `pandas.util.testing.assert_frame_equal` and `assert_series_equal` here, but I don't think it adds value to reference `.equals` itself.
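To illustrate why the testing helpers mentioned in this comment are a useful cross-reference, here is a minimal sketch contrasting them with `equals`. It assumes the helper is imported from the public `pandas.testing` module; the frames `left` and `right` are made up for illustration and do not come from the PR.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames: same labels and values, but int vs. float data.
left = pd.DataFrame({1: [10], 2: [20]})
right = pd.DataFrame({1: [10.0], 2: [20.0]})

# equals() reports the dtype mismatch as a plain boolean.
same = left.equals(right)

# assert_frame_equal() raises AssertionError instead of returning a value;
# by default it also treats differing dtypes as a failure.
try:
    assert_frame_equal(left, right)
    strict_raised = False
except AssertionError:
    strict_raised = True

# The dtype check can be relaxed, in which case the comparison passes.
assert_frame_equal(left, right, check_dtype=False)
```

The practical difference is that `equals` fits conditional logic, while the assert helpers fit tests, where the exception message points at what differs.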


        Notes
        -----
        This function requires that the elements have the same dtype as their
        respective elements in the other DataFrame. However, the column labels
Member

in the other Series or DataFrame

        do not need to have the same type, as long as they are still
        considered equal.

        Examples
        --------
        >>> df = pd.DataFrame({1:[0], 0:[1]})
Member

Missing spaces after the colons. I think using 0 -> 1 and 1 -> 0 is a bit confusing; maybe 1 -> 10 and 2 -> 20 makes the example easier to read?
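A sketch of how the opening example might look with the spacing fixed and the suggested keys and values. The variable names are reused from the diff; this illustrates the suggestion and is not necessarily the text that was merged.

```python
import pandas as pd

# Column labels 1 and 2 hold the values 10 and 20, so labels and data
# can no longer be mistaken for each other.
df = pd.DataFrame({1: [10], 2: [20]})
exactly_equal = pd.DataFrame({1: [10], 2: [20]})

# Same labels, same values, same dtypes.
result = df.equals(exactly_equal)
```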

        >>> df
           1  0
        0  0  1

        DataFrames df and exactly_equal have the same types and values for
        their elements and column labels, which will return True.

        >>> exactly_equal = pd.DataFrame({1:[0], 0:[1]})
        >>> exactly_equal
           1  0
        0  0  1
        >>> df.equals(exactly_equal)
        True

        DataFrames df and different_column_label have the same element
        types and values, but have different types for the column labels,
        which will still return True.

        >>> different_column_label = pd.DataFrame({1.0:[0], 0.0:[1]})
Member

`different_column_type` seems a clearer name to me.

        >>> different_column_label
           1.0  0.0
        0    0    1
        >>> df.equals(different_column_label)
        True

        DataFrames df and different_data_type have different types for the
        same values for their elements, and will return False even though
        their column labels are the same values and types.

        >>> different_data_type = pd.DataFrame({1:[0.0], 0:[1.0]})
        >>> different_data_type
             1    0
        0  0.0  1.0
        >>> df.equals(different_data_type)
        False
        """
        if not isinstance(other, self._constructor):
            return False
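The docstring in this hunk states that NaNs in the same location are considered equal and lists `DataFrame.eq` as the element-wise counterpart. A minimal sketch of that contrast follows; the frames `a` and `b` and the column name `"x"` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Two hypothetical frames with a NaN in the same position.
a = pd.DataFrame({"x": [1.0, np.nan]})
b = pd.DataFrame({"x": [1.0, np.nan]})

# Element-wise comparison returns a DataFrame of booleans;
# NaN never compares equal to NaN, so the second row is False.
elementwise = a.eq(b)

# Whole-object comparison returns a single boolean and treats
# NaNs in the same location as equal.
whole = a.equals(b)
```

As a consequence, `a.eq(b).all().all()` and `a.equals(b)` can disagree whenever the data contains missing values.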