-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
CLN: Moving Series.rank and DataFrame.rank to generic.py #11924
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1842,35 +1842,6 @@ def argsort(self, axis=0, kind='quicksort', order=None): | |||
np.argsort(values, kind=kind), index=self.index, | |||
dtype='int64').__finalize__(self) | |||
|
|||
def rank(self, method='average', na_option='keep', ascending=True, | |||
pct=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should have the signature:
def rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)
this will be more consistent with other methods.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've modified the signature
need a whatsnew entry to explain this change (can be pretty short, just demonstrating the old/new) |
@@ -247,6 +247,8 @@ Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`1 | |||
ser | |||
pd.Timestamp('2012-01-01') - ser | |||
|
|||
- ``Series.rank()``, ``DataFrame.rank()`` and their groupby equivalents have a new, common signature ``rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``. (:issue:`11759`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add this in a sub-section and show usage (of the changes, mainly to DataFrame).
Is it ok like that? |
|
||
# this method does not make sense when ndim > 2 | ||
if self.ndim > 2: | ||
raise NotImplementedError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a nice message
I've changed the error message, and removed the check for |
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
||
The **new** signature is ``.rank(method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche what do you think?
might be more useful to have df.rank(1)
allowed (and use the original DataFrame
signature, as probably more useful then specifying the
method``
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and this is more consistent with things like df.sum(1)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about Series.rank
then? If axis
comes first, then it means the first parameter is the least useful...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this is what I did for sorting, so precedent for what you have here as well.
In [1]: Series.sort_values?
Type: instancemethod
String form: <unbound method Series.sort_values>
File: c:\miniconda\envs\neat\lib\site-packages\pandas\core\series.py
Definition: Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Docstring:
Sort by the values along either axis
.. versionadded:: 0.17.0
Parameters
----------
by : string name or list of names which refer to the axis items
axis : index to direct sorting
ascending : bool or list of bool
Sort ascending vs. descending. Specify list for multiple sort orders.
If this is a list of bools, must match the length of the by
inplace : bool
if True, perform operation in-place
kind : {`quicksort`, `mergesort`, `heapsort`}
Choice of sorting algorithm. See also ndarray.np.sort for more information.
`mergesort` is the only stable algorithm. For DataFrames, this option is
only applied when sorting on a single column or label.
na_position : {'first', 'last'}
`first` puts NaNs at the beginning, `last` puts NaNs at the end
Returns
-------
sorted_obj : Series
can you rebase/update |
|
||
Parameters | ||
---------- | ||
axis: {index (0), columns (1)), default 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use the same type description as before in the dataframe docstring? ({0 or 'index', 1 or 'columns'}, default 0
) -> quotes to indicate it is a string argument
@jorisvandenbossche i've changed the docstring as you suggested, and added a test to check if |
Uh?
I don't understand. The same error appeared in the pull request #11582. Is it me, or is this another issue? |
neh, just back network connectivity with Travis, ignore / try again |
It's my lucky day
|
I must have jumped in a parallel universe at some point, the tests used to have failed... @jreback all green now |
sorry, no parallel universe, I just restarted the test .. :-) |
1 2 | ||
dtype: float64 | ||
|
||
In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True, pct=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wrap this so it lines up nicely ([3]) as well
minor comments. ping when pushed and green. |
All green |
thanks! |
I'm pulling this out from PR #11918. It is a first step toward solving issue #11759
The signature of
Series.rank
changes, and is now the same asDataFrame.rank
. This may cause some problems...