Skip to content

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

CLN: Moving Series.rank and DataFrame.rank to generic.py #11924

wants to merge 1 commit into from

Conversation

nbonnotte
Copy link
Contributor

I'm pulling this out from PR #11918. It is a first step toward solving issue #11759

The signature of Series.rank changes, and is now the same as DataFrame.rank. This may cause some problems...

@jreback jreback changed the title CLN Moving Series.rank and DataFrame.rank to generic.py CLN: Moving Series.rank and DataFrame.rank to generic.py Dec 29, 2015
@@ -1842,35 +1842,6 @@ def argsort(self, axis=0, kind='quicksort', order=None):
np.argsort(values, kind=kind), index=self.index,
dtype='int64').__finalize__(self)

def rank(self, method='average', na_option='keep', ascending=True,
pct=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should have the signature:

def rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)

this will be more consistent with other methods.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified the signature

@jreback
Copy link
Contributor

jreback commented Dec 29, 2015

need a whatsnew entry to explain this change (can be pretty short, just demonstrating the old/new)

@@ -247,6 +247,8 @@ Subtraction by ``Timedelta`` in a ``Series`` by a ``Timestamp`` works (:issue:`1
ser
pd.Timestamp('2012-01-01') - ser

- ``Series.rank()``, ``DataFrame.rank()`` and their groupby equivalents have a new, common signature ``rank(self, method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``. (:issue:`11759`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add this in a sub-section and show usage (of the changes, mainly to DataFrame).

@nbonnotte
Copy link
Contributor Author

Is it ok like that?

@jreback jreback added this to the 0.18.0 milestone Jan 14, 2016

# this method does not make sense when ndim > 2
if self.ndim > 2:
raise NotImplementedError
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give a nice message

@nbonnotte
Copy link
Contributor Author

I've changed the error message, and removed the check for Series.

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The **new** signature is ``.rank(method='average', axis=0, numeric_only=None, na_option='keep', ascending=True, pct=False)``.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche what do you think?

might be more useful to have df.rank(1) allowed (and use the original DataFrame signature, as probably more useful then specifying themethod``

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this is more consistent with things like df.sum(1)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about Series.rank then? If axis comes first, then it means the first parameter is the least useful...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is what I did for sorting, so precedent for what you have here as well.

In [1]: Series.sort_values?
Type:        instancemethod
String form: <unbound method Series.sort_values>
File:        c:\miniconda\envs\neat\lib\site-packages\pandas\core\series.py
Definition:  Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last')
Docstring:
Sort by the values along either axis

.. versionadded:: 0.17.0

Parameters
----------
by : string name or list of names which refer to the axis items
axis : index to direct sorting
ascending : bool or list of bool
     Sort ascending vs. descending. Specify list for multiple sort orders.
     If this is a list of bools, must match the length of the by
inplace : bool
     if True, perform operation in-place
kind : {`quicksort`, `mergesort`, `heapsort`}
     Choice of sorting algorithm. See also ndarray.np.sort for more information.
     `mergesort` is the only stable algorithm. For DataFrames, this option is
     only applied when sorting on a single column or label.
na_position : {'first', 'last'}
     `first` puts NaNs at the beginning, `last` puts NaNs at the end

Returns
-------
sorted_obj : Series

@jreback
Copy link
Contributor

jreback commented Jan 24, 2016

can you rebase/update


Parameters
----------
axis: {index (0), columns (1)), default 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use the same type description as before in the dataframe docstring? ({0 or 'index', 1 or 'columns'}, default 0) -> quotes to indicate it is a string argument

@nbonnotte
Copy link
Contributor Author

@jorisvandenbossche i've changed the docstring as you suggested, and added a test to check if .rank(axis=0) and .rank(axis='index') on the one hand, and .rank(axis=1) and .rank(axis='columns') on the other hand, are equivalent.

@nbonnotte
Copy link
Contributor Author

Uh?

ERROR: test_get_underlying_price (pandas.io.tests.test_data.TestYahooOptions)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1831, in wrapper
    return t(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_data.py", line 402, in test_get_underlying_price
    quote_price = options_object._underlying_price_from_root(root)
  File "/home/travis/build/pydata/pandas/pandas/io/data.py", line 739, in _underlying_price_from_root
    underlying_price = root.xpath('.//*[@class="time_rtq_ticker Fz-30 Fw-b"]')[0]\
IndexError: list index out of range

I don't understand.

The same error appeared in the pull request #11582. Is it me, or is this another issue?

@jreback
Copy link
Contributor

jreback commented Jan 26, 2016

neh, just back network connectivity with Travis, ignore / try again

@nbonnotte
Copy link
Contributor Author

It's my lucky day

======================================================================
ERROR: test_get_goog_volume (pandas.io.tests.test_data.TestGoogle)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1900, in get_value
    return tslib.get_value_box(s, key)
  File "pandas/tslib.pyx", line 807, in pandas.tslib.get_value_box (pandas/tslib.c:17638)
    cpdef object get_value_box(ndarray arr, object loc):
  File "pandas/tslib.pyx", line 816, in pandas.tslib.get_value_box (pandas/tslib.c:17306)
    i = <Py_ssize_t> loc
TypeError: 'str' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1344, in get_value
    return _maybe_box(self, Index.get_value(self, series, key),
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1908, in get_value
    raise e1
  File "/home/travis/build/pydata/pandas/pandas/indexes/base.py", line 1894, in get_value
    return self._engine.get_value(s, k)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
    cpdef get_value(self, ndarray arr, object key):
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
    loc = self.get_loc(key)
  File "pandas/index.pyx", line 575, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10977)
    self._date_check_type(val)
  File "pandas/index.pyx", line 581, in pandas.index.DatetimeEngine._date_check_type (pandas/index.c:11145)
    raise KeyError(val)
KeyError: 'JAN-02-2015'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pandas/index.pyx", line 565, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10766)
    return self.mapping.get_item(val.value)
  File "pandas/hashtable.pyx", line 303, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6527)
    cpdef get_item(self, int64_t val):
  File "pandas/hashtable.pyx", line 309, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:6465)
    raise KeyError(val)
KeyError: 1420156800000000000

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1354, in get_value
    return self.get_value_maybe_box(series, key)
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1364, in get_value_maybe_box
    values = self._engine.get_value(_values_from_object(series), key)
  File "pandas/index.pyx", line 103, in pandas.index.IndexEngine.get_value (pandas/index.c:3204)
    cpdef get_value(self, ndarray arr, object key):
  File "pandas/index.pyx", line 111, in pandas.index.IndexEngine.get_value (pandas/index.c:2903)
    loc = self.get_loc(key)
  File "pandas/index.pyx", line 567, in pandas.index.DatetimeEngine.get_loc (pandas/index.c:10832)
    raise KeyError(val)
KeyError: Timestamp('2015-01-02 00:00:00')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/travis/build/pydata/pandas/pandas/util/testing.py", line 1831, in wrapper
    return t(*args, **kwargs)
  File "/home/travis/build/pydata/pandas/pandas/io/tests/test_data.py", line 93, in test_get_goog_volume
    self.assertEqual(df.Volume.ix['JAN-02-2015'], 1446662)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 77, in __getitem__
    return self._getitem_axis(key, axis=0)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 993, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/home/travis/build/pydata/pandas/pandas/core/indexing.py", line 88, in _get_label
    return self.obj[label]
  File "/home/travis/build/pydata/pandas/pandas/core/series.py", line 558, in __getitem__
    result = self.index.get_value(self, key)
  File "/home/travis/build/pydata/pandas/pandas/tseries/index.py", line 1356, in get_value
    raise KeyError(key)
KeyError: 'JAN-02-2015'
----------------------------------------------------------------------
Ran 9254 tests in 337.827s
FAILED (SKIP=152, errors=1)

@nbonnotte
Copy link
Contributor Author

I must have jumped in a parallel universe at some point, the tests used to have failed...

@jreback all green now

@jorisvandenbossche
Copy link
Member

sorry, no parallel universe, I just restarted the test .. :-)
Maybe time to remove those data tests

1 2
dtype: float64

In [4]: pd.DataFrame([0,1]).rank(axis=0, numeric_only=None, method='average', na_option='keep', ascending=True, pct=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrap this so it lines up nicely ([3]) as well

@jreback
Copy link
Contributor

jreback commented Jan 27, 2016

minor comments. ping when pushed and green.

@nbonnotte
Copy link
Contributor Author

All green

@jreback jreback closed this in 41abbe5 Jan 28, 2016
@jreback
Copy link
Contributor

jreback commented Jan 28, 2016

thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants