Skip to content

DOC: Improve the docstring of DataFrame.equals() #22539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 5, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 53 additions & 2 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -1303,8 +1303,59 @@ def __invert__(self):

def equals(self, other):
"""
Determines if two NDFrame objects contain the same elements. NaNs in
the same location are considered equal.
Test whether two DataFrame objects contain the same elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring is used for both Series and DataFrame, we should be generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the DataFrame references in the docstring to NDFrame as originally written, however the validation script returned an error (Private classes (['NDFrame']) should not be mentioned in public docstring.). The updated PR still has NDFrame but can change it to something else (Series/DataFrame?) if that's better.


This function allows two DataFrame objects to be compared against
each other to see if they same shape and elements. NaNs in the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have the same shape...

location are considered equal. The column headers do not need to have
the same type, but the elements within the columns must be
the same dtype.

Parameters
----------
other : DataFrame
The other DataFrame to be compared with the first.

Returns
-------
bool
True if all elements are the same in both DataFrames, False
otherwise.

See Also
--------
pandas.Series.eq : Compare two Series objects of the same length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the pandas. prefix, just Series.eq is enough.

I think we need eq and equals of both Series and DataFrame, not just Series.eq.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be redundant to mention equals since this is the docstring for that method?

and return a Series where each element is True if the element
in each Series is equal, False otherwise.

numpy.array_equal : Return True if two arrays have the same shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the blank line before this

and elements, False otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add the method being documented to the See Also, only in cases where the docstring is being reused by many methods, and we want those methods to reference each other. But this is not the case here. May be you can still add the pandas.utils.testing.assert_frame_equal and assert_series_equal here, but I don't think it adds value to reference .equals itself.


Notes
-----
This function requires that the elements have the same dtype as their
respective elements in the other DataFrame. However, the indices do
not need to have the same type, as long as they are still considered
equal.

Examples
--------
>>> a = pd.DataFrame({1:[0], 0:[1]})
>>> b = pd.DataFrame({1.0:[0], 0.0:[1]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with other docstrings, it'd be good to use df for the main DataFrame the ones in which we'll call the method df.equals(), and something more descriptive for the other (e.g. exactly_equal, different_data_type, different_index_type...)

After creating the data, it'd be nice to display the content (i.e. >>> df)

Not sure in this case if it'd be better or worse, but in general we try to have somehow real-world (meaningful) data for the examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the same data in the updated PR - felt with text in order to use meaningful data it would be harder/more confusing to demonstrate. Can change that if needed.


DataFrames a and b have the same element types and values, but have
different types for the indices, which will still return True.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"indices" are the row labels, which are the same, in this case the different type is in the column labels.


>>> a.equals(b)
True

DataFrames a and c have different types for the same values for their
elements, and will return False even though the indices are the same
values and types.

>>> c = pd.DataFrame({1:[0.0], 0:[1.0]})
>>> a.equals(c)
False
"""
if not isinstance(other, self._constructor):
return False
Expand Down