DOC: update the Series.str.join docstring #22174

Merged
merged 8 commits into from
Aug 8, 2018
Changes from 5 commits
19 changes: 10 additions & 9 deletions pandas/core/strings.py
@@ -1105,12 +1105,17 @@ def str_join(arr, sep):
Returns
-------
Series/Index: object
The list entries concatenated by intervening occurrences of the
delimiter.

Notes
-----
-If any of the lists does not contain string objects the result of the join
+If any of the list items is not a string object, the result of the join
will be `NaN`.

+If the Series does not contain string objects, performing a join will raise
+an AttributeError.
Member

I don't think that's very clear. Do you mean that if the Series elements are not lists or NaN, the join operation will raise an AttributeError?

Contributor Author

Hmm, I agree. How about:
"If the supplied Series does not contain string objects, performing a join will raise an AttributeError."

s = pd.Series([1.1, np.nan]) # applying str.join raises an AttributeError
s = pd.Series([np.nan, np.nan]) # ditto
s = pd.Series([1.1, np.nan, ['swan', 'fish']]) # applying str.join produces something.

Member

To be accurate, I'd say something like: "If the supplied Series does not contain strings or lists, performing a join will raise an AttributeError."

>>> import pandas
>>> pandas.Series(['foo', 'bar']).str.join('')
0    foo
1    bar
dtype: object

And if you think it's useful, you can add an example of a case where it raises the exception, in the Examples section.
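
The behaviour discussed in this thread can be checked with a short script. This is a minimal sketch, not part of the PR: the variable names are made up for illustration, and the exact error message may differ between pandas versions, so only the exception type is inspected.

```python
import numpy as np
import pandas as pd

# Strings are iterables of characters, so the separator is placed
# between every pair of characters.
chars = pd.Series(['foo', 'bar']).str.join('-')
print(chars.tolist())  # ['f-o-o', 'b-a-r']

# A float-only Series does not expose the .str accessor at all.
raised = None
try:
    pd.Series([1.1, np.nan]).str.join('-')
except AttributeError as exc:
    raised = type(exc).__name__
print(raised)  # AttributeError

# An object Series that mixes floats with a list is accepted;
# only the list element can actually be joined.
mixed = pd.Series([1.1, np.nan, ['swan', 'fish']]).str.join('-')
print(mixed.tolist())
```

With an empty separator, as in the reviewer's example, the strings come back unchanged, which is why that output looks like a no-op.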

Contributor Author

👌 We asked the question of mentioning exceptions last night - of the two examples below, which is the preferred method here?

Appreciate all your guidance on this, I'm learning a lot! Thank you.

Member

We don't pay much attention to the Raises section, to avoid adding extra complexity, but it's certainly a nice-to-have. So feel free to add it.

Notes is mainly used for implementation details. In this case, Raises can be more appropriate. I'd use Notes, for example, to explain that adding a NaN to an int column will convert it to float, and that kind of thing.

Examples are usually the best way to illustrate things. But we don't want to go crazy with them either, especially in docstrings, which are mixed in with the code and can be annoying if too long. In this case, feel free to choose whether it's useful to show an example where the AttributeError is raised, or to omit it if you don't think it adds much value. Up to you.
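
As a rough illustration of the section roles described above, here is a sketch of a numpydoc-style docstring. The function `join_items` is hypothetical and exists only to show the layout; it is not part of pandas.

```python
def join_items(items, sep):
    """
    Join the entries of a list with a separator.

    Hypothetical helper, used only to show the docstring layout.

    Parameters
    ----------
    items : list of str
        Entries to concatenate.
    sep : str
        Delimiter placed between entries.

    Returns
    -------
    str
        The entries joined by `sep`.

    Raises
    ------
    TypeError
        If any entry of `items` is not a string.

    Notes
    -----
    Implementation detail: this delegates to the built-in ``str.join``.

    Examples
    --------
    >>> join_items(['swan', 'fish'], '-')
    'swan-fish'
    """
    return sep.join(items)


print(join_items(['swan', 'fish'], '-'))  # swan-fish
```

Raises states the contract a caller can rely on, Notes holds the implementation detail, and Examples stays short enough not to crowd the code.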


See Also
--------
str.join : Standard library version of this method.
@@ -1122,21 +1127,17 @@ def str_join(arr, sep):
Example with a list that contains non-string elements.

>>> s = pd.Series([['lion', 'elephant', 'zebra'],
-... [1.1, 2.2, 3.3],
-... ['cat', np.nan, 'dog'],
-... ['cow', 4.5, 'goat']
-... ['duck', ['swan', 'fish'], 'guppy']])
+... [1.1, 2.2, 3.3],
+... ['cat', np.nan, 'dog'],
+... ['cow', 4.5, 'goat'],
Member

Good catch on the typo, but I think the indentation was correct before.

Contributor Author

D'oh! Thank you.

+... ['duck', ['swan', 'fish'], 'guppy']])
>>> s
0 [lion, elephant, zebra]
1 [1.1, 2.2, 3.3]
2 [cat, nan, dog]
3 [cow, 4.5, goat]
4 [duck, [swan, fish], guppy]
dtype: object

-Join all lists using an '-', the lists containing object(s) of types other
-than str will become a NaN.
Member

I think this comment was useful. Any particular reason to remove it?

Contributor Author

We removed it after misdiagnosing an issue. I will reinstate it. Thank you!


>>> s.str.join('-')
0 lion-elephant-zebra
1 NaN
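
For reference, the docstring example above can be run as a standalone script. This is a sketch that assumes a reasonably recent pandas; the name `joined` is introduced here for illustration and does not appear in the PR.

```python
import numpy as np
import pandas as pd

# Same data as the docstring example: only the first list
# consists entirely of strings.
s = pd.Series([['lion', 'elephant', 'zebra'],
               [1.1, 2.2, 3.3],
               ['cat', np.nan, 'dog'],
               ['cow', 4.5, 'goat'],
               ['duck', ['swan', 'fish'], 'guppy']])

# Lists holding any non-string item cannot be joined and become NaN.
joined = s.str.join('-')
print(joined)
```

Only the first row yields a joined string; every list containing a float, a NaN, or a nested list produces NaN instead of raising.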