ERR: Consistent errors for non-numeric ranking. (#19560) #20670

mapehe · 2018-04-12T17:21:59Z (edited by jreback)

closes Raise ValueError When Attempting to Rank Object Dtypes #19560
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

* This is only a partial solution to issue #19560.
* There were some errors with the tests, but I guess they are unrelated to these changes since I also have them with master.
* I modified some tests so that they don't contradict the update "don't allow objects to be ranked unless they are ordered categoricals" that was suggested in #19560.

jreback · 2018-04-14T12:55:19Z

it looks like another commit is included here. can you rebase on master.

codecov · 2018-04-14T14:05:26Z

Codecov Report

Merging #20670 into master will decrease coverage by 0.03%.
The diff coverage is 71.42%.

@@            Coverage Diff             @@
##           master   #20670      +/-   ##
==========================================
- Coverage   91.84%   91.81%   -0.04%     
==========================================
  Files         153      153              
  Lines       49275    49277       +2     
==========================================
- Hits        45255    45242      -13     
- Misses       4020     4035      +15

Flag	Coverage Δ
#multiple	`90.2% <71.42%> (-0.04%)`	⬇️
#single	`41.89% <0%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/algorithms.py	`94.1% <71.42%> (-0.26%)`	⬇️
pandas/plotting/_converter.py	`65.07% <0%> (-1.74%)`	⬇️
pandas/util/_test_decorators.py	`92% <0%> (-0.5%)`	⬇️
pandas/core/generic.py	`95.89% <0%> (-0.05%)`	⬇️
pandas/core/base.py	`96.79% <0%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d104ecd...d330a46.

mapehe · 2018-04-14T14:15:40Z

Rebased and added a whatsnew entry @jreback.

jreback · 2018-04-14T14:18:08Z

doc/source/whatsnew/v0.23.0.txt

@@ -418,6 +418,8 @@ Other Enhancements
 Backwards incompatible API changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+- Using :func:`DataFrame.rank` on a data frame with non-numeric entries other than ordered categoricals will raise a ValueError.


move to the list of issues. use double-backticks around ValueError.

'data frame' -> DataFrame

jreback · 2018-04-14T14:20:13Z

pandas/core/algorithms.py

+            raise ValueError("pandas.core.algorithms.rank "
+                             "not supported for unordered "
+                             "non-numeric data")
+        if is_categorical_dtype(values):


could be simpler, maybe

if is_object_dtype(values) and not (is_categorical_dtype(values) and values.ordered): raise ...

in the error message
just say "{}.rank()".format(type(value).__name__)
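
A minimal sketch of the kind of check discussed above, written against the public pandas API rather than the internals this PR touches. The helper name `ensure_rankable` and the exact message wording are assumptions for illustration only. The unordered-categorical branch is spelled out separately because a categorical is not object dtype.

```python
import pandas as pd
from pandas.api.types import is_object_dtype


def ensure_rankable(values):
    """Raise ValueError for data that has no well-defined ordering.

    Hypothetical helper: rejects object dtype and unordered categoricals,
    and returns everything else (numeric data, ordered categoricals) unchanged.
    """
    dtype = values.dtype
    unordered_cat = isinstance(dtype, pd.CategoricalDtype) and not dtype.ordered
    if is_object_dtype(dtype) or unordered_cat:
        # The message starts with the caller's type name, e.g. "Series.rank()".
        raise ValueError(
            "{}.rank() not supported for unordered non-numeric data".format(
                type(values).__name__
            )
        )
    return values
```

Calling `ensure_rankable(pd.Series(["b", "a"], dtype=object))` raises, while a numeric Series or an ordered categorical passes through untouched.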

jreback · 2018-04-14T14:21:23Z

pandas/tests/frame/test_rank.py

@@ -71,23 +71,22 @@ def test_rank2(self):
        result = df.rank(0, pct=True)
        tm.assert_frame_equal(result, expected)

-        df = DataFrame([['b', 'c', 'a'], ['a', 'c', 'b']])


can you make this a separate test and move all of the cases. Ideally you could parameterize them
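
One way the requested parametrized test could look, as a sketch only. Released pandas does not necessarily raise for object data, so `strict_rank` below is a hypothetical stand-in for the behavior proposed in this PR. The first case is the frame removed in the diff above; the second case (with a missing value) is an assumed extra example.

```python
import pandas as pd
import pytest
from pandas.api.types import is_object_dtype


def strict_rank(df):
    # Hypothetical stand-in for the proposed behavior: refuse to rank any
    # object-dtype column, otherwise defer to DataFrame.rank.
    if any(is_object_dtype(dtype) for dtype in df.dtypes):
        raise ValueError("DataFrame.rank() not supported for object data")
    return df.rank()


@pytest.mark.parametrize("data", [
    [["b", "c", "a"], ["a", "c", "b"]],
    [["b", None, "a"], ["a", "c", "b"]],
])
def test_rank_object_raises(data):
    # dtype=object is passed explicitly so the frame is object dtype
    # regardless of how the installed pandas infers string columns.
    df = pd.DataFrame(data, dtype=object)
    with pytest.raises(ValueError, match="not supported"):
        strict_rank(df)
```

Each parameter set becomes its own test case, so a failure report names the exact input that did not raise.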

jreback · 2018-04-14T14:22:14Z

pandas/tests/frame/test_rank.py

@@ -218,7 +217,7 @@ def test_rank_methods_frame(self):
                    tm.assert_frame_equal(result, expected)

    def test_rank_descending(self):


can you add a test with assert that the object ones raise (unless it duplicates too much the above tests)

jreback · 2018-04-14T14:22:30Z

pandas/tests/frame/test_rank.py

        results = self.results

        for method, axis, dtype in product(results, [0, 1], dtypes):
-            if (dtype, method) in disabled:
+            if dtype == object:


use is_object_dtype

jreback · 2018-04-14T14:22:36Z

pandas/tests/series/test_rank.py

@@ -134,22 +134,27 @@ def test_rank_categorical(self):
        assert_series_equal(ordered.rank(), exp)
        assert_series_equal(ordered.rank(ascending=False), exp_desc)

-        # Unordered categoricals should be ranked as objects


same as above

jreback · 2018-04-14T14:24:01Z

this can fully close the issue. e.g. the issue is about raising an error, not actually supporting this.

WillAyd · 2018-04-16T21:20:46Z

pandas/tests/series/test_rank.py

+                     "not supported for unordered "
+                     "non-numeric data")
+
+        # Ranking unordered categorials depreciated per #19560


I'd prefer to say "not supported" instead of "deprecated"

WillAyd · 2018-04-16T21:25:58Z

pandas/tests/series/test_rank.py

-        assert_series_equal(res1, exp_unordered1)
+
+        # Won't raise ValueError because entries not objects.
+        unordered1.rank()


Is this right? Certainly the example passes the "eye test" but if the Categorical is not ordered what semantics are we using for ranking?
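
For context on the question above, a small sketch of why a check that only looks for object dtype lets an unordered categorical through. The data here is illustrative; the contents of `unordered1` in the test are not shown in this thread.

```python
import pandas as pd
from pandas.api.types import is_object_dtype

# Illustrative data, not the fixture used in the test under review.
unordered = pd.Series(pd.Categorical(["first", "second", "third"]))
ordered = pd.Series(
    pd.Categorical(
        ["first", "second", "third"],
        categories=["first", "second", "third"],
        ordered=True,
    )
)

print(is_object_dtype(unordered))  # False: the dtype is "category", not object
print(unordered.dtype.ordered)     # False: no ordering is declared
print(ordered.dtype.ordered)       # True
```

An object-only check therefore never sees the unordered case, so whether it should raise has to be decided from the `ordered` flag.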

WillAyd · 2018-04-16T21:26:33Z

pandas/tests/series/test_rank.py

@@ -379,9 +375,15 @@ def test_rank_int(self):
    def test_rank_object_bug(self):


Name of this test can be changed to test_rank_na_object_raises

jreback · 2018-09-25T16:24:09Z

can you rebase

jreback · 2018-11-23T03:28:18Z

closing as stale. if you'd like to continue, pls ping.

jreback added the Groupby and Error Reporting (Incorrect or improved errors from pandas) labels Apr 14, 2018

mapehe force-pushed the submit3 branch from 0990a89 to a0f38d7 Compare April 14, 2018 14:05

ERR: Consistent errors for non-numeric ranking. (pandas-dev#19560)

d330a46

mapehe force-pushed the submit3 branch from a0f38d7 to d330a46 Compare April 14, 2018 14:14

jreback requested changes Apr 14, 2018

WillAyd requested changes Apr 16, 2018

jreback closed this Nov 23, 2018
