CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

nbonnotte · 2015-12-29T22:27:20Z

I'm pulling this out from PR #11918. It is a first step toward solving issue #11759

The signature of Series.rank changes, and is now the same as DataFrame.rank. This may cause some problems...

jreback · 2015-12-29T22:33:48Z

pandas/core/series.py

@@ -1842,35 +1842,6 @@ def argsort(self, axis=0, kind='quicksort', order=None):
                np.argsort(values, kind=kind), index=self.index,
                dtype='int64').__finalize__(self)

-    def rank(self, method='average', na_option='keep', ascending=True,
-             pct=False):


I think that we should have the signature:

def rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)

this will be more consistent with other methods.

I've modified the signature

jreback · 2015-12-29T22:34:29Z

need a whatsnew entry to explain this change (can be pretty short, just demonstrating the old/new)

jreback · 2016-01-11T13:51:10Z

doc/source/whatsnew/v0.18.0.txt

@@ -247,6 +247,8 @@ Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`1
   ser
   pd.Timestamp('2012-01-01') - ser

+- ``Series.rank()``, ``DataFrame.rank()`` and their groupby equivalents have a new, common signature ``rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``. (:issue:`11759`)


can you add this in a sub-section and show usage (of the changes, mainly to DataFrame).

nbonnotte · 2016-01-14T08:43:51Z

Is it ok like that?

jreback · 2016-01-14T12:50:33Z

pandas/core/generic.py

+
+        # this method does not make sense when ndim > 2
+        if self.ndim > 2:
+            raise NotImplementedError


give a nice message

nbonnotte · 2016-01-16T08:11:58Z

I've changed the error message, and removed the check for Series.

jreback · 2016-01-16T15:48:04Z

doc/source/whatsnew/v0.18.0.txt

+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The **new** signature is ``.rank(method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``.
+


@jorisvandenbossche what do you think?

might be more useful to have df.rank(1) allowed (and use the original DataFrame signature, as probably more useful then specifying themethod``

and this is more consistent with things like df.sum(1)

What about Series.rank then? If axis comes first, then it means the first parameter is the least useful...

hmm, this is what I did for sorting, so precedent for what you have here as well.

In [1]: Series.sort_values? Type: instancemethod String form: <unbound method Series.sort_values> File: c:\miniconda\envs\neat\lib\site-packages\pandas\core\series.py Definition: Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last') Docstring: Sort by the values along either axis .. versionadded:: 0.17.0 Parameters ---------- by : string name or list of names which refer to the axis items axis : index to direct sorting ascending : bool or list of bool Sort ascending vs. descending. Specify list for multiple sort orders. If this is a list of bools, must match the length of the by inplace : bool if True, perform operation in-place kind : {`quicksort`, `mergesort`, `heapsort`} Choice of sorting algorithm. See also ndarray.np.sort for more information. `mergesort` is the only stable algorithm. For DataFrames, this option is only applied when sorting on a single column or label. na_position : {'first', 'last'} `first` puts NaNs at the beginning, `last` puts NaNs at the end Returns ------- sorted_obj : Series

jreback · 2016-01-24T22:18:07Z

can you rebase/update

jorisvandenbossche · 2016-01-26T13:14:46Z

pandas/core/generic.py

+
+        Parameters
+        ----------
+        axis: {index (0), columns (1)), default 0


can you use the same type description as before in the dataframe docstring? ({0 or 'index', 1 or 'columns'}, default 0) -> quotes to indicate it is a string argument

nbonnotte · 2016-01-26T14:19:58Z

@jorisvandenbossche i've changed the docstring as you suggested, and added a test to check if .rank(axis=0) and .rank(axis='index') on the one hand, and .rank(axis=1) and .rank(axis='columns') on the other hand, are equivalent.

nbonnotte · 2016-01-26T15:51:49Z

Uh?

ERROR: test_get_underlying_price (pandas.io.tests.test_data.TestYahooOptions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1831, in wrapper
    return t(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_data.py", line 402, in test_get_underlying_price
    quote_price = options_object._underlying_price_from_root(root)
  File "/home/travis/build/pydata/pandas/pandas/io/data.py", line 739, in _underlying_price_from_root
    underlying_price = root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
IndexError: list index out of range

I don't understand.

The same error appeared in the pull request #11582. Is it me, or is this another issue?

jreback · 2016-01-26T15:56:56Z

neh, just back network connectivity with Travis, ignore / try again

nbonnotte · 2016-01-27T08:35:43Z

It's my lucky day

======================================================================
ERROR: test_get_goog_volume (pandas.io.tests.test_data.TestGoogle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1900, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 807, in pandas.tslib.get_value_box (pandas/tslib.c:17638)
    cpdef object get_value_box(ndarray arr, object loc):
  File "pandas/tslib.pyx", line 816, in pandas.tslib.get_value_box (pandas/tslib.c:17306)
    i = <Py_ssize_t> loc
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1344, in get_value
    return _maybe_box(self, Index.get_value(self, series, key),
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1908, in get_value
    raise e1
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1894, in get_value
    return self._engine.get_value(s, k)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
    cpdef get_value(self, ndarray arr, object key):
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
    loc = self.get_loc(key)
  File "pandas/index.pyx", line 575, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10977)
    self._date_check_type(val)
  File "pandas/index.pyx", line 581, in pandas.index.DatetimeEngine._date_check_type (pandas/index.c:11145)
    raise KeyError(val)
KeyError: 'JAN-02-2015'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas/index.pyx", line 565, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10766)
    return self.mapping.get_item(val.value)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6527)
    cpdef get_item(self, int64_t val):
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6465)
    raise KeyError(val)
KeyError: 1420156800000000000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1354, in get_value
    return self.get_value_maybe_box(series, key)
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1364, in get_value_maybe_box
    values = self._engine.get_value(_values_from_object(series), key)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
    cpdef get_value(self, ndarray arr, object key):
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
    loc = self.get_loc(key)
  File "pandas/index.pyx", line 567, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10832)
    raise KeyError(val)
KeyError: Timestamp('2015-01-02 00:00:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1831, in wrapper
    return t(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_data.py", line 93, in test_get_goog_volume
    self.assertEqual(df.Volume.ix['JAN-02-2015'], 1446662)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 77, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 993, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 88, in _get_label
    return self.obj[label]
  File "/home/travis/build/pydata/pandas/pandas/core/series.py", line 558, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1356, in get_value
    raise KeyError(key)
KeyError: 'JAN-02-2015'
----------------------------------------------------------------------
Ran 9254 tests in 337.827s
FAILED (SKIP=152, errors=1)

nbonnotte · 2016-01-27T12:19:40Z

I must have jumped in a parallel universe at some point, the tests used to have failed...

@jreback all green now

jorisvandenbossche · 2016-01-27T12:42:35Z

sorry, no parallel universe, I just restarted the test .. :-)
Maybe time to remove those data tests

jreback · 2016-01-27T13:21:18Z

doc/source/whatsnew/v0.18.0.txt

+     1    2
+     dtype: float64
+
+     In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True, pct=False)


wrap this so it lines up nicely ([3]) as well

jreback · 2016-01-27T13:24:42Z

minor comments. ping when pushed and green.

nbonnotte · 2016-01-27T16:37:11Z

All green

jreback · 2016-01-28T21:41:45Z

thanks!

jreback changed the title ~~CLN Moving Series.rank and DataFrame.rank to generic.py~~ CLN: Moving Series.rank and DataFrame.rank to generic.py Dec 29, 2015

jreback added API Design Clean labels Dec 29, 2015

jreback reviewed Dec 29, 2015
View reviewed changes

jreback reviewed Jan 11, 2016
View reviewed changes

jreback added this to the 0.18.0 milestone Jan 14, 2016

jreback reviewed Jan 14, 2016
View reviewed changes

jreback reviewed Jan 16, 2016
View reviewed changes

jorisvandenbossche reviewed Jan 26, 2016
View reviewed changes

jreback reviewed Jan 27, 2016
View reviewed changes

CLN: Moving Series.rank and DataFrame.rank to generic.py

46f4a1b

jreback closed this in 41abbe5 Jan 28, 2016

nbonnotte deleted the move-rank-to-generic branch January 28, 2016 21:46

nbonnotte mentioned this pull request Jan 29, 2016

BUG in DataFrameGroupBy.rank raising an obscure TypeError #11918

Closed

nbonnotte mentioned this pull request Mar 13, 2016

TypeError: rank() got an unexpected keyword argument 'numeric_only' #11759

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

nbonnotte commented Dec 29, 2015

jreback Dec 29, 2015

nbonnotte Jan 10, 2016

jreback commented Dec 29, 2015

jreback Jan 11, 2016

nbonnotte commented Jan 14, 2016

jreback Jan 14, 2016

nbonnotte commented Jan 16, 2016

jreback Jan 16, 2016

jreback Jan 16, 2016

nbonnotte Jan 16, 2016

jreback Jan 19, 2016

jreback commented Jan 24, 2016

jorisvandenbossche Jan 26, 2016

nbonnotte commented Jan 26, 2016

nbonnotte commented Jan 26, 2016

jreback commented Jan 26, 2016

nbonnotte commented Jan 27, 2016

nbonnotte commented Jan 27, 2016

jorisvandenbossche commented Jan 27, 2016

jreback Jan 27, 2016

jreback commented Jan 27, 2016

nbonnotte commented Jan 27, 2016

jreback commented Jan 28, 2016

		^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

		The new signature is ``.rank(method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``.

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

Conversation

nbonnotte commented Dec 29, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Dec 29, 2015

Choose a reason for hiding this comment

nbonnotte commented Jan 14, 2016

Choose a reason for hiding this comment

nbonnotte commented Jan 16, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Jan 24, 2016

Choose a reason for hiding this comment

nbonnotte commented Jan 26, 2016

nbonnotte commented Jan 26, 2016

jreback commented Jan 26, 2016

nbonnotte commented Jan 27, 2016

nbonnotte commented Jan 27, 2016

jorisvandenbossche commented Jan 27, 2016

Choose a reason for hiding this comment

jreback commented Jan 27, 2016

nbonnotte commented Jan 27, 2016

jreback commented Jan 28, 2016