DOC: docstring to series.unique #20474

Merged (11 commits) on Mar 27, 2018

@minggli (Contributor) commented Mar 23, 2018

It seems that no one has picked this up. This PR moves _shared_docs['unique'], which is used only by Series.unique(), into the Series.unique docstring itself.

################################################################################
####################### Docstring (pandas.Series.unique) #######################
################################################################################

Return unique values of Series object.

Uniques are returned in order of appearance. Hash table-based unique,
therefore does NOT sort.

Returns
-------
unique values.
  - If the input is an Index, the return is an Index
  - If the input is a Categorical dtype, the return is a Categorical
  - If the input is a Series/ndarray, the return will be an ndarray

See Also
--------
unique : return unique values of 1d array-like objects.
Index.unique : return Index with unique values from an Index object.

Examples
--------
>>> pd.Series([2, 1, 3, 3], name='A').unique()
array([2, 1, 3])

>>> pd.Series([2] + [1] * 5).unique()
array([2, 1])

>>> pd.Series([pd.Timestamp('20160101') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

>>> pd.Series([pd.Timestamp('20160101', tz='US/Eastern')
...            for _ in range(3)]).unique()
array([Timestamp('2016-01-01 00:00:00-0500', tz='US/Eastern')],
      dtype=object)

An unordered Categorical will return categories in the order of
appearance.

>>> pd.Series(pd.Categorical(list('baabc'))).unique()
[b, a, c]
Categories (3, object): [b, a, c]

An ordered Categorical preserves the category ordering.

>>> pd.Series(pd.Categorical(list('baabc'), categories=list('abc'),
...                          ordered=True)).unique()
[b, a, c]
Categories (3, object): [a < b < c]

################################################################################
################################## Validation ##################################
################################################################################

Docstring for "pandas.Series.unique" correct. :)
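The two Categorical examples above can also be checked programmatically. This is a minimal sketch assuming a recent pandas install; it checks only the order of the returned values and the ordered flag, because how the categories of an unordered result are arranged may differ between pandas versions.

```python
import pandas as pd

# Unordered categorical: values come back in order of first appearance
unordered = pd.Series(pd.Categorical(list("baabc"))).unique()

# Ordered categorical: the declared category order a < b < c is kept
ordered = pd.Series(
    pd.Categorical(list("baabc"), categories=list("abc"), ordered=True)
).unique()

print(list(unordered))           # ['b', 'a', 'c']
print(list(ordered.categories))  # ['a', 'b', 'c']
print(ordered.ordered)           # True
```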

@minggli changed the title from "remove _shared_docs and add doctring to series.unique" to "DOC: docstring to series.unique" on Mar 23, 2018
def unique(self):
"""
Return unique values in the object. Uniques are returned in order

Contributor

This should be a single line. Can you simplify? You can use an extended summary for the bit about uniques.

Contributor Author

Broken into two lines, with minor rewording.

def unique(self):
"""
Return unique values in the object. Uniques are returned in order
of appearance, this does NOT sort. Hash table-based unique.

Contributor

I'm not sure hash table-based unique will be meaningful to many users.

Contributor Author

Agreed, but that's the same wording as in pandas.unique. I think it makes sense to people who understand data structures, and for those who don't, there is enough clarification around it. It's important to let people know it's hash table-based, so there is no sorting.
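To illustrate why a hash table-based unique returns values in order of appearance rather than sorted, here is a minimal pure-Python sketch of the technique. It is not pandas' implementation (pandas uses its own compiled hash tables), and the helper name unique_in_order is hypothetical.

```python
def unique_in_order(values):
    """Return the distinct items of `values` in order of first appearance.

    Hypothetical helper: a set (a hash table) records what has been seen,
    so each membership check is O(1) on average and no sorting happens.
    """
    seen = set()
    result = []
    for item in values:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


data = [2, 1, 3, 3]
print(unique_in_order(data))  # [2, 1, 3]  (order of appearance)
print(sorted(set(data)))      # [1, 2, 3]  (what a sorting unique would give)
```

Because the output order depends only on the input order, callers who want sorted values have to sort explicitly.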

unique
Index.unique
Series.unique
"""

Contributor

Can you add examples?

Contributor Author

Added. I also removed the self-reference in See Also and added descriptions.

codecov bot commented Mar 24, 2018

Codecov Report

Merging #20474 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20474      +/-   ##
==========================================
- Coverage   91.85%   91.82%   -0.03%     
==========================================
  Files         152      152              
  Lines       49231    49248      +17     
==========================================
+ Hits        45220    45224       +4     
- Misses       4011     4024      +13

Flag        Coverage Δ
#multiple   90.21% <ø> (-0.03%) ⬇️
#single     41.89% <ø> (+0.05%) ⬆️

Impacted Files                      Coverage Δ
pandas/core/series.py               93.84% <ø> (ø) ⬆️
pandas/core/base.py                 96.79% <ø> (ø) ⬆️
pandas/plotting/_converter.py       65.07% <0%> (-1.74%) ⬇️
pandas/core/arrays/categorical.py   96.19% <0%> (-0.02%) ⬇️
pandas/core/indexes/datetimes.py    95.73% <0%> (-0.01%) ⬇️
pandas/core/strings.py              98.32% <0%> (ø) ⬆️
pandas/core/generic.py              95.85% <0%> (ø) ⬆️
pandas/core/dtypes/missing.py       91.07% <0%> (ø) ⬆️
pandas/core/reshape/reshape.py      100% <0%> (ø) ⬆️
pandas/core/indexes/period.py       92.61% <0%> (ø) ⬆️
... and 5 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4fb963b...4931fcf.

@minggli (Contributor Author) commented Mar 24, 2018

@TomAugspurger comments welcome 👍

@jreback added the Docs label Mar 24, 2018
def unique(self):
"""

Contributor

It might be worth trying to share this docstring with pd.unique (at least the examples), no?

Contributor Author

Good point, but they have some differences:

  • pd.unique takes a parameter, whereas Series.unique doesn't.
  • pd.unique handles 1d array-like objects, including Index, whereas Series.unique applies to self.
  • pd.unique has more examples, whereas Series.unique only has Series examples.

Do we have a pattern somewhere for conditionally showing docstring lines?

Contributor

https://github.com/pandas-dev/pandas/pull/20361/files is doing something similar for factorize. It's somewhat complex, since we have pd.unique, Series/Index.unique, and Categorical.unique. I'd be OK with improving the docstring here and merging them later.
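One way to share a docstring between several unique implementations while varying a few lines is a template with substitutions. The sketch below is hypothetical and uses only plain str.format; the names fill_doc, _unique_doc, top_level_unique and series_unique are illustrative, not pandas APIs (pandas keeps shared texts in dictionaries such as _shared_docs).

```python
# Hypothetical sketch: one docstring template shared by several functions,
# with the parts that differ filled in per function.
_unique_doc = """
Return unique values{where}.

Uniques are returned in order of appearance; this does NOT sort.

Returns
-------
{returns}
"""


def fill_doc(template, **params):
    """Decorator that sets func.__doc__ from a str.format template."""
    def decorator(func):
        func.__doc__ = template.format(**params)
        return func
    return decorator


@fill_doc(_unique_doc, where=" of a 1-d array-like",
          returns="ndarray, Index or Categorical")
def top_level_unique(values):
    ...


@fill_doc(_unique_doc, where=" of the Series",
          returns="ndarray or Categorical")
def series_unique(self):
    ...


print(series_unique.__doc__)
```

Omitting a line for one caller amounts to substituting an empty string for its placeholder; the more callers share a template, the more placeholders it tends to need, which is the complexity mentioned above.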

Contributor Author

Agreed. This PR is mainly about improving the docstring; synthesising the docs of all the unique methods should be a separate PR.

Contributor

that's fine too


See Also
--------
unique : return unique values of 1d array-like objects.

Contributor

Can you include the "pandas." prefix for this one? Otherwise it's unclear what's being referenced. The description can say "Top-level unique method for any 1-d array-like object."

Contributor Author

done


An ordered Categorical preserves the category ordering.

>>> pd.Series(pd.Categorical(list('baabc'), categories=list('abc'), \

Contributor

Remove the \ here, and start the continuation line with ... instead.

Contributor Author

done.
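In a doctest, a statement that spans several lines continues with the ... prompt rather than a backslash. A minimal, self-contained sketch using the standard library doctest module; the function example_docstring is hypothetical and exists only to carry the docstring.

```python
import doctest


def example_docstring():
    """
    The second line of the statement starts with the '...' prompt.

    >>> sorted([3, 1, 2,
    ...         2, 1])
    [1, 1, 2, 2, 3]
    """


# Collect and run the examples embedded in the docstring above
finder = doctest.DocTestFinder()
runner = doctest.DocTestRunner(verbose=False)
for test in finder.find(example_docstring, "example_docstring"):
    runner.run(test)

results = runner.summarize(verbose=False)
print(results.failed, results.attempted)  # 0 1
```

The two prompt lines form a single example, so one example is attempted and none fail.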

>>> pd.Series([pd.Timestamp('20160101') for _ in range(3)]).unique()
array(['2016-01-01T00:00:00.000000000'], dtype='datetime64[ns]')

>>> pd.Series([pd.Timestamp('20160101', tz='US/Eastern') \

Contributor

Remove the \ here, and start the continuation line with ... instead.

Contributor Author

done.

Returns
-------
unique values.
- If the input is an Index, the return is an Index

Member

This part can be simplified, since it is now only the docstring of Series.unique, so there is also no "input".

In this case it is always an array, except for categorical dtype.

Contributor Author

done.

@jreback added this to the 0.23.0 milestone Mar 26, 2018
@minggli (Contributor Author) commented Mar 27, 2018

@jorisvandenbossche does it look good?


Returns
-------
unique values : Series or Categorical

Member

It's never a Series, only a NumPy array or a Categorical. I would also include something about that, like:

ndarray or Categorical
    The unique values returned as a NumPy array. In case of categorical data type, returned as a Categorical

Contributor Author

done.
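The Returns section settled on here (ndarray or Categorical) can be checked with a short sketch, assuming a recent pandas version. Newer releases document the return type more broadly as ndarray or ExtensionArray (tz-aware datetimes, for example, no longer come back as an object ndarray as in the PR description), so only the two cases discussed in this thread are checked.

```python
import numpy as np
import pandas as pd

# A plain numeric Series: the unique values come back as a NumPy array
plain = pd.Series([2, 1, 3, 3], name="A").unique()

# Categorical dtype: the unique values come back as a Categorical
categorical = pd.Series(pd.Categorical(list("baabc"))).unique()

print(isinstance(plain, np.ndarray))            # True
print(isinstance(categorical, pd.Categorical))  # True
```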

@jorisvandenbossche merged commit 23cf851 into pandas-dev:master Mar 27, 2018
@minggli deleted the doc/unique branch March 27, 2018 20:51
@jorisvandenbossche (Member)

@minggli Thanks a lot!

@minggli (Contributor Author) commented Mar 27, 2018

Thanks @jorisvandenbossche

javadnoorb pushed a commit to javadnoorb/pandas that referenced this pull request Mar 29, 2018
dworvos pushed a commit to dworvos/pandas that referenced this pull request Apr 2, 2018
kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018
Development: successfully merging this pull request may close the issue "DOC: Series.unique docstring can be moved to series.py".

4 participants