DOC: fix DataFrame.isin docstring and doctests #22767

Merged (7 commits) on Sep 25, 2018
Changes from 3 commits

2 changes: 1 addition & 1 deletion ci/doctests.sh
@@ -21,7 +21,7 @@ if [ "$DOCTEST" ]; then

# DataFrame / Series docstrings
pytest --doctest-modules -v pandas/core/frame.py \
- -k"-axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"
+ -k"-axes -combine -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"
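Dropping `-isin` from the `-k` filter means the `isin` examples are now collected by `pytest --doctest-modules`, so what they print has to match the expected output exactly. The old examples could not pass as written because they put explanations such as `# Note that B didn't match the 1 here.` inside the expected output, and doctest treats that text as something the code must print. A minimal sketch with the standard-library `doctest` module; the two docstrings are made-up stand-ins, not the pandas source:

```python
import doctest

# Old style: the trailing comment sits in the *expected output*, so doctest
# compares it against what print() writes and the example always fails.
OLD_DOCSTRING = """
>>> print("0   True  False")
0   True  False  # Note that B didn't match the 1 here.
"""

# New style: the explanation goes in the surrounding prose and the expected
# output is exactly what the code prints.
NEW_DOCSTRING = """
>>> print("0   True  False")
0   True  False
"""


def count_failures(docstring):
    """Run the examples in ``docstring`` and return how many of them fail."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Swallow the failure report; only the counts matter here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


print(count_failures(OLD_DOCSTRING))  # 1
print(count_failures(NEW_DOCSTRING))  # 0
```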

if [ $? -ne "0" ]; then
RET=1
59 changes: 34 additions & 25 deletions pandas/core/frame.py
@@ -7451,52 +7451,61 @@ def to_period(self, freq=None, axis=0, copy=True):

def isin(self, values):
"""
- Return boolean DataFrame showing whether each element in the
- DataFrame is contained in values.
+ Whether each element in the DataFrame is contained in values.

Parameters
----------
- values : iterable, Series, DataFrame or dictionary
+ values : iterable, Series, DataFrame or dict
The result will only be true at a location if all the
labels match. If `values` is a Series, that's the index. If
- `values` is a dictionary, the keys must be the column names,
+ `values` is a dict, the keys must be the column names,
which must match. If `values` is a DataFrame,
then both the index and column labels must match.
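The Series case in this parameter description is not covered by the examples in the diff. A small sketch, assuming the `falcon`/`dog` frame used further down: with a Series, each row is compared with the Series value that carries the same index label, and a row whose label is missing from the Series comes out `False`.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# The labels are deliberately in a different order from df.index: the match
# is made on the index label, not on the position.
values = pd.Series([0, 2], index=['dog', 'falcon'])
result = df.isin(values)
print(result)
#         num_legs  num_wings
# falcon      True       True
# dog        False       True

# 'dog' is missing from this Series, so its whole row is False.
partial = df.isin(pd.Series([2], index=['falcon']))
print(partial)
```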

Returns
-------
DataFrame
- DataFrame of boolean showing whether each element in the DataFrame
Member

boolean -> booleans

- is contained in values.

Member

Can we add a See Also section? I think DataFrame.eq and Series.str.isin are worth referencing.

+ DataFrame of booleans
+ See Also
+ --------
+ DataFrame.eq: Equality test for DataFrame.
+ Series.isin: Equivalent method on Series.
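The `Series.isin` entry can be illustrated directly. A minimal sketch, reusing the `falcon`/`dog` frame from the examples, showing that applying `Series.isin` to each column gives the same boolean DataFrame as `DataFrame.isin` with a list:

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# DataFrame.isin with a list checks every cell against the same values...
frame_result = df.isin([0, 2])

# ...which is what Series.isin does for one column, so applying it to each
# column in turn (DataFrame.apply works column-wise by default) matches.
column_result = df.apply(lambda column: column.isin([0, 2]))

print(frame_result.equals(column_result))  # True
```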

Examples
--------

+ >>> df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
+ ...                   index=['falcon', 'dog'])
+ >>> df
+         num_legs  num_wings
+ falcon         2          2
+ dog            4          0

When ``values`` is a list:

- >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
- >>> df.isin([1, 3, 12, 'a'])
-        A      B
- 0   True   True
- 1  False  False
- 2   True  False
+ >>> df.isin([2])
+         num_legs  num_wings
+ falcon      True       True
+ dog        False      False
Member

What do you think about using df.isin([0, 2])? I like the example, but I don't like showing just the case with one value in values, as it'd be better in that case to simply do df == 2.

Also, I think adding a bit more description would help (e.g. When values is a list, check whether every value in the DataFrame is present in the list (which animals have 0 or 2 legs or wings))
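The point about `df == 2` can be checked quickly. A sketch on the same `falcon`/`dog` frame: a one-element list is only an equality test, while several values replace a chain of `==` comparisons joined with `|`.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# With a single value, isin is only an equality test...
single = df.isin([2])
print(single.equals(df == 2))  # True

# ...with several values it replaces a chain of == comparisons
# (which animals have 0 or 2 legs or wings).
several = df.isin([0, 2])
print(several.equals((df == 0) | (df == 2)))  # True
```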


- When ``values`` is a dict:
+ When ``values`` is a dict.
Member

I would add something to this sentence to explain what is happening when values is a dict. Something like "When values is a dict, we can pass values to check for each column separately:"

(similar for the example below as well)


- >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [1, 4, 7]})
- >>> df.isin({'A': [1, 3], 'B': [4, 7, 12]})
-        A      B
- 0   True  False  # Note that B didn't match the 1 here.
- 1  False   True
- 2   True   True
+ >>> df.isin({'num_wings': [0, 3], 'num_legs': [0]})
+         num_legs  num_wings
+ falcon     False      False
+ dog        False       True
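Following the review suggestion that a dict lets each column be checked separately, a sketch on the same frame: the list under each key applies only to that column, and a column that has no key comes out all `False`.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# Each key names a column, and its list is checked against that column only.
# 'num_legs' has no key here, so every cell in that column is False.
result = df.isin({'num_wings': [0, 3]})
print(result)
#         num_legs  num_wings
# falcon     False      False
# dog        False       True
```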

- When ``values`` is a Series or DataFrame:
+ When ``values`` is a Series or DataFrame. Note that 'falcon' does not
+ match based on the number of legs in df2.

- >>> df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
- >>> df2 = pd.DataFrame({'A': [1, 3, 3, 2], 'B': ['e', 'f', 'f', 'e']})
+ >>> df2 = pd.DataFrame({'num_legs': [8, 0, 2], 'num_wings': [0, 2, 2]},
+ ...                    index=['spider', 'falcon', 'parrot'])
>>> df.isin(df2)
-        A      B
- 0   True  False
- 1  False  False  # Column A in `df2` has a 3, but not at index 1.
- 2   True   True
+         num_legs  num_wings
+ falcon     False       True
+ dog        False      False
Member

I like the example, but I don't like that we have to use wrong information (a falcon without legs) ;)

Not sure if better or worse, but what do you think about using a DataFrame with just the num_wings column instead? And maybe we can get rid of the parrot, and show the content of df2, so it's clearer to see what's going on?

Also, small detail, but I think we used other instead of df2 in other docstrings.

Contributor Author

I see :). No problem, I agree with the changes!
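A sketch of the variant proposed in the review: `other` (the name the reviewer suggests instead of `df2`) holds only a `num_wings` column and no parrot, so no wrong leg count is needed. The values are illustrative. Only a cell whose row label and column label both exist in `other` can be `True`.

```python
import pandas as pd

df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])

# Illustrative `other`: only a 'num_wings' column and no parrot.
other = pd.DataFrame({'num_wings': [0, 2]}, index=['spider', 'falcon'])
print(other)

# Cells are matched by row *and* column label: 'falcon'/'num_wings' is the
# only cell present in both frames, so it is the only one that can be True.
result = df.isin(other)
print(result)
#         num_legs  num_wings
# falcon     False       True
# dog        False      False
```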

"""
if isinstance(values, dict):
from pandas.core.reshape.concat import concat
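The hunk stops at the start of the `dict` branch, which imports `concat`. A sketch of what that branch plausibly does, assuming it checks each column against its own list and concatenates the one-column results; `isin_by_column` is a hypothetical helper for illustration, not pandas API.

```python
import pandas as pd


def isin_by_column(df, values):
    """Hypothetical helper (not pandas API) mimicking df.isin(values) for a dict."""
    # One single-column boolean frame per column: a column without a key is
    # checked against an empty list, so it comes out all False.
    pieces = [df.iloc[:, [i]].isin(values.get(column, []))
              for i, column in enumerate(df.columns)]
    # Put the single-column results back side by side.
    return pd.concat(pieces, axis=1)


df = pd.DataFrame({'num_legs': [2, 4], 'num_wings': [2, 0]},
                  index=['falcon', 'dog'])
mapping = {'num_wings': [0, 3], 'num_legs': [0]}
print(isin_by_column(df, mapping).equals(df.isin(mapping)))  # True
```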