Skip to content

DOC: update the Series.str.join docstring #20463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 27, 2018

Conversation

fdroessler
Copy link
Contributor

@fdroessler fdroessler commented Mar 22, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • PR title is "DOC: update the docstring"
  • The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • The html version looks good: python doc/make.py --single <your-function-or-method>
  • It has been proofread on language by another sprint participant

Please include the output of the validation script below between the "```" ticks:

################################################################################
###################### Docstring (pandas.Series.str.join) ######################
################################################################################

Join lists contained as elements in the Series/Index with passed delimiter.

If the elements of a Series are lists themselves, join the content of these
lists using the delimiter passed to the function.
This function is an equivalent to :meth:`str.join`.

Parameters
----------
sep : str
    Delimiter to use between list entries.

Returns
-------
Series/Index of objects

Notes
-----
If any of the lists does not contain string objects the result of the join
will be `NaN`.

See Also
--------
str.join : Standard library version of this method.
Series.str.split : Split strings around given separator/delimiter.

Examples
--------

Example with a list that contains non-string elements.

>>> s = pd.Series([['lion', 'elephant', 'zebra'],
...                [1.1, 2.2, 3.3],
...                ["cat", np.nan, "dog"],
...                ["cow", 4.5, "goat"]])
>>> s
0    [lion, elephant, zebra]
1            [1.1, 2.2, 3.3]
2            [cat, nan, dog]
3           [cow, 4.5, goat]
dtype: object

Join all lists using an '-', the list of floats will become a NaN.

>>> s.str.join('-')
0    lion-elephant-zebra
1                    NaN
2                    NaN
3                    NaN
dtype: object

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.str.join" correct. :)

If the validation script still gives errors, but you think there is a good reason
to deviate in this case (and there are certainly such cases), please state this
explicitly.

@codecov
Copy link

codecov bot commented Mar 23, 2018

Codecov Report

Merging #20463 into master will increase coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20463      +/-   ##
==========================================
+ Coverage    91.8%   91.85%   +0.04%     
==========================================
  Files         152      152              
  Lines       49223    49231       +8     
==========================================
+ Hits        45191    45220      +29     
+ Misses       4032     4011      -21
Flag Coverage Δ
#multiple 90.23% <ø> (+0.04%) ⬆️
#single 41.83% <ø> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/strings.py 98.32% <ø> (ø) ⬆️
pandas/core/arrays/categorical.py 96.2% <0%> (-0.02%) ⬇️
pandas/core/frame.py 97.18% <0%> (ø) ⬆️
pandas/core/generic.py 95.85% <0%> (ø) ⬆️
pandas/plotting/_core.py 82.5% <0%> (ø) ⬆️
pandas/io/formats/csvs.py 98.13% <0%> (+0.08%) ⬆️
pandas/io/parsers.py 95.45% <0%> (+0.12%) ⬆️
pandas/core/groupby.py 92.55% <0%> (+0.41%) ⬆️
pandas/util/testing.py 84.73% <0%> (+0.61%) ⬆️
pandas/io/common.py 70.04% <0%> (+1.26%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 02477da...323d072. Read the comment docs.

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just a comment about naming conventions, and an idea to make examples more compact and more illustrative.

Examples
--------

>>> df = pd.Series({1: ['dog', 'cat', 'fish'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you rename df to s. df is used for DataFrame, and makes it a bit confusing.


Example with a list that contains non-string elements.

>>> df = pd.Series({1: [1.1, 2.2, 3.3],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as before.

1 NaN
2 lion-elephant-zebra
3 cow-pig-chicken
dtype: object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this second example, the first one is actually a bit redundant. As this exemplifies both cases, strings and not strings.

And I think we could even show in one of the rows the floats, another the strings (both as you did), and use the third one to illustrate a list with a NaN, which I assume it returns a NaN, but it may be not obvious for all users.

@fdroessler
Copy link
Contributor Author

Thanks for the feedback @datapythonista have updated the PR

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, added couple of ideas.


>>> s = pd.Series({1: ['lion', 'elephant', 'zebra'],
... 2: [1.1, 2.2, 3.3],
... 3: [np.nan, np.nan, np.nan]})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment, but I think it'd be more useful for users to see that joining ['cat', np.nan, 'dog'] becomes NaN, than that joining some NaNs become NaN. That could also apply in the number example ['cat', 23, 'dog']. Feel free to disagree if you think it's more clear with unique types.

Also, it's something very subtle, but I find slightly distracting using a specific index in the example, that is not used. It surely doesn't make a big difference, but I'd construct the Series with a list of lists, and use the default index, so nobody thinks the index has an impact on joining the elements.

3 [nan, nan, nan]
dtype: object

Join all lists using an '-', the list of floats will become a NaN.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment needs to be updated after adding the NaN?


Returns
-------
joined : Series/Index of objects
Series/Index of objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use object instead of objects, as this is more a type definition than an explanations (may be one day we can use these types as annotations?)

@fdroessler
Copy link
Contributor Author

good points, thanks a lot 👍

Copy link
Member

@datapythonista datapythonista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jreback jreback added Docs Strings String extension data type and string data labels Mar 26, 2018
@TomAugspurger TomAugspurger merged commit 89444ad into pandas-dev:master Mar 27, 2018
@TomAugspurger
Copy link
Contributor

Thanks @fdroessler !

@TomAugspurger TomAugspurger added this to the 0.23.0 milestone Mar 27, 2018
javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs Strings String extension data type and string data
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants