Skip to content

ERR: Consistent errors for non-numeric ranking. (#19560) #20670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

mapehe
Copy link
Contributor

@mapehe mapehe commented Apr 12, 2018

*This is only partial solution to issue #19560.
*There were some errors with the tests, but I guess they are unrelated to these changes since I also
have them with master.
*I modified some tests so that they don't contradict with the update "don't allow objects to be ranked unless they are ordered categorials" that was suggested in #19560.

@jreback
Copy link
Contributor

jreback commented Apr 14, 2018

it looks like another commit is included here. can you rebase on master.

@jreback jreback added Groupby Error Reporting Incorrect or improved errors from pandas labels Apr 14, 2018
@codecov
Copy link

codecov bot commented Apr 14, 2018

Codecov Report

Merging #20670 into master will decrease coverage by 0.03%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20670      +/-   ##
==========================================
- Coverage   91.84%   91.81%   -0.04%     
==========================================
  Files         153      153              
  Lines       49275    49277       +2     
==========================================
- Hits        45255    45242      -13     
- Misses       4020     4035      +15
Flag Coverage Δ
#multiple 90.2% <71.42%> (-0.04%) ⬇️
#single 41.89% <0%> (-0.02%) ⬇️
Impacted Files Coverage Δ
pandas/core/algorithms.py 94.1% <71.42%> (-0.26%) ⬇️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/util/_test_decorators.py 92% <0%> (-0.5%) ⬇️
pandas/core/generic.py 95.89% <0%> (-0.05%) ⬇️
pandas/core/base.py 96.79% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d104ecd...d330a46. Read the comment docs.

@mapehe
Copy link
Contributor Author

mapehe commented Apr 14, 2018

Rebased and added a whatsnew entry @jreback.

@@ -418,6 +418,8 @@ Other Enhancements
Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Using :func:`DataFrame.rank` on a data frame with non-numeric entries other than ordered categoricals will raise a ValueError.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the list of issues. use double-backticks around ValueError.

'data frame' -> DataFrame

raise ValueError("pandas.core.algorithms.rank "
"not supported for unordered "
"non-numeric data")
if is_categorical_dtype(values):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could be simpler, maybe

if is_object_dtype(values) and not (is_categorical_dtype(values) and values.ordered):
    raise ...

in the error message
just say ".rank().format(type(value).__name__)"

@@ -71,23 +71,22 @@ def test_rank2(self):
result = df.rank(0, pct=True)
tm.assert_frame_equal(result, expected)

df = DataFrame([['b', 'c', 'a'], ['a', 'c', 'b']])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make this a separate test and move all of the cases. Ideally you could parameterize them

@@ -218,7 +217,7 @@ def test_rank_methods_frame(self):
tm.assert_frame_equal(result, expected)

def test_rank_descending(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test with assert that the object ones raise (unless it duplicates too much the above tests)

results = self.results

for method, axis, dtype in product(results, [0, 1], dtypes):
if (dtype, method) in disabled:
if dtype == object:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use is_object_dtype

@@ -134,22 +134,27 @@ def test_rank_categorical(self):
assert_series_equal(ordered.rank(), exp)
assert_series_equal(ordered.rank(ascending=False), exp_desc)

# Unordered categoricals should be ranked as objects
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above

@jreback
Copy link
Contributor

jreback commented Apr 14, 2018

this can fully close the issue. e.g. the issue is about raising an error, not actually supporting this.

"not supported for unordered "
"non-numeric data")

# Ranking unordered categorials depreciated per #19560
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to say "not supported" instead of "deprecated"

assert_series_equal(res1, exp_unordered1)

# Won't raise ValueError because entries not objects.
unordered1.rank()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? Certainly the example passes the "eye test" but if the Categorical is not ordered what semantics are we using for ranking?

@@ -379,9 +375,15 @@ def test_rank_int(self):
def test_rank_object_bug(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Name of this test can be changed to test_rank_na_object_raises

@jreback
Copy link
Contributor

jreback commented Sep 25, 2018

can you rebase

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

closing as stale. if you'd like to continue, pls ping.

@jreback jreback closed this Nov 23, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Error Reporting Incorrect or improved errors from pandas Groupby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Raise ValueError When Attempting to Rank Object Dtypes
3 participants