DOC: Update the Series.str.len docstring #22187

Merged
merged 5 commits into from
Aug 9, 2018
Changes from 2 commits
47 changes: 45 additions & 2 deletions pandas/core/strings.py
@@ -2801,11 +2801,54 @@ def rindex(self, sub, start=0, end=None):
return self._wrap_result(result)

_shared_docs['len'] = ("""
Compute length of each string in the Series/Index.
Compute length of each element in the Series/Index. The element may be
a sequence (such as a string, tuple or list) or a collection
(such as a dictionary).

Returns
-------
lengths : Series/Index of integer values
Series or Index of integer values
Member:
We usually use Python types; "Series or Index of int" would be more consistent with the rest of the docstrings.

A Series or Index of integer values indicating the length of each
element in the Series or Index.

See Also
--------
str.len : Python built-in function returning the length of an object.
Member:
What do you think about adding Series.size here? I think some users could land on this page by mistake while looking for how to check the length of the Series, and it would be nice to help them find it.

Contributor Author:
Good idea! Included in new commit.
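
To illustrate the distinction raised in this thread, here is a minimal sketch (not taken from the PR) contrasting `Series.size`, which counts the elements of the Series, with `Series.str.len()`, which measures each element. The sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Illustrative data (an assumption, not from the PR).
s = pd.Series(["dog", "", "bird"])

# Number of elements in the Series itself.
n_elements = s.size

# Length of each individual element, computed element-wise.
lengths = s.str.len()

print(n_elements)        # size of the Series
print(lengths.tolist())  # per-element lengths
```

A user who wants to know "how long is my Series" needs the first call; the second answers "how long is each value in it".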


Examples
--------

Member:
I don't think we need this blank line here

Contributor Author:
You're right - I have taken it out

Returning a series of integer values as floats when `NaN` is returned as a
result.
Member:
I think that's a useful comment, but I'm not sure it's very relevant in this context (a user checking the examples for str.len). As this is a general pandas "problem", I'd simply omit it here. But if you think it's important on this page, the Notes section is probably a better place.

Contributor Author:
Fair enough! A user can still see that the values are integers even if they happen to be floats.
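
The behaviour discussed in this thread can be reproduced with a short sketch. It reuses the Series from the example under review and adds a hypothetical all-string Series (`only_strings`, an assumption for comparison) to show that the float result comes from the missing values, not from `str.len` itself.

```python
import numpy as np
import pandas as pd

# Series from the example under review: strings, an int, and a missing value.
s = pd.Series(["dog", 5, "bird", np.nan])
lengths = s.str.len()

# Elements without a length (the int 5 and NaN) become NaN, and NaN
# cannot be stored in an integer NumPy array, so the result is float64.
print(lengths.dtype)

# Hypothetical comparison: with only strings there is nothing missing,
# so the lengths keep an integer dtype.
only_strings = pd.Series(["dog", "bird"])
print(only_strings.str.len().dtype)
```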


>>> s = pd.Series(['dog', 5, 'bird', np.nan])
Member:
Very minor thing, but it feels as if we're trying to confuse the user by not placing the two string values one after the other. And maybe I'd show the empty string '' instead of one of the animals, since both illustrate the same case.

Contributor Author:
Have changed the structure of the consolidated example to incorporate an empty string.

>>> s
0 dog
1 5
2 bird
3 NaN
dtype: object
>>> s.str.len()
0 3.0
1 NaN
2 4.0
3 NaN
dtype: float64

Returning the length (number of entries) of dictionaries, lists or
tuples as integer values.

>>> s = pd.Series([{'foo':'bar'}, [2,3,5,7], ('one', 'two', 'three')])
Member:
Missing spaces after the colon and commas.

Contributor Author:
Ah yes - thanks for spotting! Have changed in new commit.

>>> s
0 {'foo': 'bar'}
1 [2, 3, 5, 7]
2 (one, two, three)
dtype: object
>>> s.str.len()
0 1
1 4
2 3
dtype: int64
Member:
I would just create a single Series s with the elements of both examples. I think that would make the example more concise, and it'd still illustrate the same cases.

Contributor Author:
Good point. Changed in new commit.
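
As a rough sketch of what a single consolidated example could look like, the snippet below merges the values from both examples in the diff and swaps one animal for the empty string, as suggested earlier in the review. The exact Series is an assumption; the version committed to the PR may differ.

```python
import pandas as pd

# One Series covering every case: a string, the empty string, an int
# (which has no length), a dict, a list and a tuple.
s = pd.Series(
    ["dog", "", 5, {"foo": "bar"}, [2, 3, 5, 7], ("one", "two", "three")]
)

lengths = s.str.len()

# The int has no len(), so its entry is NaN and the result is float64.
print(lengths)
```

Placing the two strings next to each other keeps the string case visually grouped, and the single `NaN` row shows what happens to elements that have no length.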

""")
len = _noarg_wrapper(len, docstring=_shared_docs['len'], dtype=int)
