BUG: Series/DataFrame.rank returns empty object on failure #40418

Closed
rhshadrach opened this issue Mar 13, 2021 · 5 comments
Labels
API - Consistency Internal Consistency of API/Behavior Bug Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply

Comments

@rhshadrach
Member

This was noticed when working on #40288

Example:

from pandas import DataFrame

df = DataFrame({"A": 3 * [object]})
print(df.rank())
print(df.A.rank())

The last two lines print an empty DataFrame and an empty Series, respectively.

The docstring for numeric_only says:

For DataFrame objects, rank only numeric columns if set to True.

The current behavior with numeric_only=None (the default value) is:

Try with all columns. If a TypeError is raised, try with numeric_only=True.

When numeric_only is True and the Series/DataFrame contain no numeric columns, rank then operates on an empty object returning an empty result.
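The fallback described above can be modeled without pandas. The following is a minimal sketch in plain Python, not the pandas implementation: rank_column, is_numeric and rank_frame are hypothetical names, a frame is modeled as a dict of lists, and "numeric" is approximated as "every value is a real number".

```python
from numbers import Real


def rank_column(values):
    """Rank values 1..n by sort order (ties keep first-seen order in this sketch)."""
    # sorted() raises TypeError when the values cannot be compared with "<".
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = float(position)
    return ranks


def is_numeric(values):
    """Assumption for this sketch: a column is numeric if all values are real numbers."""
    return all(isinstance(v, Real) for v in values)


def rank_frame(columns, numeric_only=None):
    """Model of the numeric_only=None fallback: try everything, then numeric only."""
    if numeric_only is None:
        try:
            return rank_frame(columns, numeric_only=False)
        except TypeError:
            # The TypeError is swallowed here, which is what hides the failure.
            return rank_frame(columns, numeric_only=True)
    if numeric_only:
        columns = {name: vals for name, vals in columns.items() if is_numeric(vals)}
    return {name: rank_column(vals) for name, vals in columns.items()}


print(rank_frame({"A": 3 * [object]}))  # {}
print(rank_frame({"A": 3 * [object], "B": [30, 10, 20]}))  # {'B': [3.0, 1.0, 2.0]}
```

With only the non-numeric column A, the fallback silently drops it and the caller gets an empty result rather than the TypeError that the first attempt raised.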

This is causing issues when rank is used in transform lists and dictionaries. Namely, we'd like to have partial-failure for TypeErrors, but rank is returning an empty result instead of raising a TypeError. This could be special-cased, but it seems to me that returning an empty object (at least when numeric_only is not True) is undesirable itself.

Some options I see are:

  • Remove None as being an option and replace the default numeric_only=None with numeric_only=False.
  • If numeric_only is None and the fallback of selecting only numeric columns results in an empty object, raise a TypeError.
  • If selecting only numeric columns results in an empty object (even when numeric_only=True), raise a TypeError.
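The second and third options both amount to a guard on the numeric selection. A minimal sketch, assuming a hypothetical helper select_numeric_or_raise (not part of pandas) and using DataFrame.select_dtypes as a stand-in for whatever selection rank performs internally:

```python
import pandas as pd


def select_numeric_or_raise(df):
    """Return the numeric columns of df; raise instead of returning an empty frame."""
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        raise TypeError("no numeric columns to rank")
    return numeric


mixed = pd.DataFrame({"A": 3 * [object], "B": [30, 10, 20]})
print(select_numeric_or_raise(mixed).rank())  # ranks column B only

try:
    select_numeric_or_raise(pd.DataFrame({"A": 3 * [object]}))
except TypeError as err:
    print(f"TypeError: {err}")
```

Whether the guard applies only to the numeric_only=None fallback or also to an explicit numeric_only=True is exactly the difference between the two options; the helper above does not take a side.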
@rhshadrach rhshadrach added Bug Nuisance Columns Identifying/Dropping nuisance columns in reductions, groupby.add, DataFrame.apply labels Mar 13, 2021
@iamrajhans
Contributor

@rhshadrach would like to take this one

@rhshadrach
Member Author

Sounds great @iamrajhans. It's not clear to me what the desired behavior here is. I would recommend looking into the behavior of numeric_only for other pandas methods and making this consistent.

@iamrajhans
Contributor

@rhshadrach so I checked the numeric_only behaviour. With numeric_only=True it returns an empty DataFrame/Series:

>>> import pandas as pd
>>> df = pd.DataFrame({"A": 3 * [object]})
>>> df.sum(numeric_only=True)
Series([], dtype: float64)

but with numeric_only=False, it raises TypeError: unsupported operand type(s) for +: 'type' and 'type'

@rhshadrach
Member Author

Thanks @iamrajhans, it seems to me there are two separate APIs for reducer vs transformer:

  • Reducers have a numeric_only argument and return an empty object when numeric_only=True if there are no numeric columns.
  • Transformers don't have a numeric_only argument and raise when asked to operate on non-numeric data they can't handle.

The only exception I see to this (but I may be missing others) is rank, which is a transformer whose API fits in the reducer camp.

I don't see any advantage to having two different APIs for reducers vs transformers; it'd be nice if they had more in common. One step in this direction that I think would be non-controversial would be to add a numeric_only argument to the rest of the transformers.
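As a rough illustration of what a reducer-style flag on a transformer could look like, here is a sketch built around cumsum. The wrapper cumsum_numeric is hypothetical and only shows the shape of the idea; it says nothing about how pandas would implement such an argument.

```python
import pandas as pd


def cumsum_numeric(df, numeric_only=False):
    """Hypothetical wrapper giving the cumsum transformer a numeric_only flag."""
    # With numeric_only=True, non-numeric columns are dropped before transforming.
    data = df.select_dtypes(include="number") if numeric_only else df
    return data.cumsum()


df = pd.DataFrame({"A": ["x", "y", "z"], "B": [1, 2, 3]})
print(cumsum_numeric(df, numeric_only=True))  # column B only: 1, 3, 6
```

Note that a frame with no numeric columns would again produce an empty result here, which is the behavior questioned in the next paragraph.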

Another question I have is whether the returning of an empty DataFrame (for both transformers and reducers) is desired behavior. I personally would expect/like a TypeError to be raised instead, but I wonder if there are other thoughts or aspects of pandas that depend on returning an empty object.

@rhshadrach rhshadrach added API - Consistency Internal Consistency of API/Behavior and removed Bug labels Mar 26, 2021
@mroeschke mroeschke added the Bug label Aug 19, 2021
@rhshadrach
Member Author

No longer relevant now that numeric_only=None has been removed.
