Skip to content

DOC: Improve the docstring of DataFrame.equals() #22539

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 5, 2018

Conversation

seantchan
Copy link
Contributor

@seantchan seantchan commented Aug 30, 2018

################################################################################
##################### Docstring (pandas.DataFrame.equals)  #####################
################################################################################

Test whether two DataFrame objects contain the same elements.

This function allows two DataFrame objects to be compared against
each other to see if they same shape and elements. NaNs in the same
location are considered equal. The column headers do not need to have
the same type, but the elements within the columns must be
the same dtype.

Parameters
----------
other : DataFrame
    The other DataFrame to be compared with the first.

Returns
-------
bool
    True if all elements are the same in both DataFrames, False
    otherwise.

See Also
--------
pandas.Series.eq : Compare two Series objects of the same length
    and return a Series where each element is True if the element
    in each Series is equal, False otherwise.

numpy.array_equal : Return True if two arrays have the same shape
    and elements, False otherwise.

Notes
-----
This function requires that the elements have the same dtype as their
respective elements in the other DataFrame. However, the indices do
not need to have the same type, as long as they are still considered
equal.

Examples
--------
>>> a = pd.DataFrame({1:[0], 0:[1]})
>>> b = pd.DataFrame({1.0:[0], 0.0:[1]})

DataFrames a and b have the same element types and values, but have
different types for the indices, which will still return True.

>>> a.equals(b)
True

DataFrames a and c have different types for the same values for their
elements, and will return False even though the indices are the same
values and types.

>>> c = pd.DataFrame({1:[0.0], 0:[1.0]})
>>> a.equals(c)
False

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.DataFrame.equals" correct. :)

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great docstring, added some comments.

Test whether two DataFrame objects contain the same elements.

This function allows two DataFrame objects to be compared against
each other to see if they same shape and elements. NaNs in the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they have the same shape...


See Also
--------
pandas.Series.eq : Compare two Series objects of the same length
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the pandas. prefix, just Series.eq is enough.

I think we need eq and equals of both Series and DataFrame, not just Series.eq.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be redundant to mention equals since this is the docstring for that method?

and return a Series where each element is True if the element
in each Series is equal, False otherwise.

numpy.array_equal : Return True if two arrays have the same shape
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the blank line before this

Examples
--------
>>> a = pd.DataFrame({1:[0], 0:[1]})
>>> b = pd.DataFrame({1.0:[0], 0.0:[1]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to be consistent with other docstrings, it'd be good to use df for the main DataFrame the ones in which we'll call the method df.equals(), and something more descriptive for the other (e.g. exactly_equal, different_data_type, different_index_type...)

After creating the data, it'd be nice to display the content (i.e. >>> df)

Not sure in this case if it'd be better or worse, but in general we try to have somehow real-world (meaningful) data for the examples.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used the same data in the updated PR - felt with text in order to use meaningful data it would be harder/more confusing to demonstrate. Can change that if needed.

>>> b = pd.DataFrame({1.0:[0], 0.0:[1]})

DataFrames a and b have the same element types and values, but have
different types for the indices, which will still return True.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"indices" are the row labels, which are the same, in this case the different type is in the column labels.

@@ -1303,8 +1303,59 @@ def __invert__(self):

def equals(self, other):
"""
Determines if two NDFrame objects contain the same elements. NaNs in
the same location are considered equal.
Test whether two DataFrame objects contain the same elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This docstring is used for both Series and DataFrame, we should be generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the DataFrame references in the docstring to NDFrame as originally written, however the validation script returned an error (Private classes (['NDFrame']) should not be mentioned in public docstring.). The updated PR still has NDFrame but can change it to something else (Series/DataFrame?) if that's better.

@codecov
Copy link

codecov bot commented Sep 3, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@a5fe9cf). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #22539   +/-   ##
=========================================
  Coverage          ?   92.04%           
=========================================
  Files             ?      169           
  Lines             ?    50787           
  Branches          ?        0           
=========================================
  Hits              ?    46745           
  Misses            ?     4042           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.45% <ø> (?)
#single 42.29% <ø> (?)
Impacted Files Coverage Δ
pandas/core/generic.py 96.44% <ø> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a5fe9cf...9730d0b. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring looks really good. Just some comments about technicalities.

@@ -1303,8 +1303,81 @@ def __invert__(self):

def equals(self, other):
"""
Determines if two NDFrame objects contain the same elements. NaNs in
the same location are considered equal.
Test whether two NDFrame objects contain the same elements.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can leave Test whether two objects contain the same elements. for now.

The problem is that NDFrame is not a class we expect users to know about (it's private). So, it shouldn't be used in the documentation. Series or DataFrame or object are probably what we want to use in most cases.

return a DataFrame where each element is True if the respective
element in each DataFrame is equal, False otherwise.
numpy.array_equal : Return True if two arrays have the same shape
and elements, False otherwise.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We add the method being documented to the See Also, only in cases where the docstring is being reused by many methods, and we want those methods to reference each other. But this is not the case here. May be you can still add the pandas.utils.testing.assert_frame_equal and assert_series_equal here, but I don't think it adds value to reference .equals itself.

Notes
-----
This function requires that the elements have the same dtype as their
respective elements in the other DataFrame. However, the column labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the other Series or DataFrame


Examples
--------
>>> df = pd.DataFrame({1:[0], 0:[1]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing spaces after colons. I think using 0 -> 1 and 1 -> 0 is a bit confusing, may be 1 -> 10 and 2 -> 20 makes the example easier to read?

types and values, but have different types for the column labels,
which will still return True.

>>> different_column_label = pd.DataFrame({1.0:[0], 0.0:[1]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

different_column_type seems a clearer name to me

@pep8speaks
Copy link

Hello @seantchan! Thanks for updating the PR.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, really good docstring, thanks for the work on this @seantchan

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Update docstring of DataFrame/Series .equals() to explain the behavior with indices
4 participants