
BUG: DataFrame.mode index dtype is not type stable #33321

Closed
3 tasks done
TomAugspurger opened this issue Apr 6, 2020 · 2 comments · Fixed by #38732
Labels: Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff), Bug, Dtype Conversions (unexpected or buggy dtype conversions)

@TomAugspurger
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

The .index.dtype of the result of DataFrame.mode is not stable. It depends on whether the DataFrame is empty, and possibly on the column dtypes.

In [60]: pd.DataFrame([], columns=['a', 'b']).mode().index.dtype
Out[60]: dtype('O')

Problem description

The index dtype should always be int64, to match the non-empty case.

Expected Output

In [61]: pd.DataFrame({"A": ['a']}).mode().index.dtype
Out[61]: dtype('int64')
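
A minimal sketch for comparing the two cases side by side, using the same two frames as above. The dictionary keys and variable names are illustrative only, and no output is shown because the dtype reported for the empty case depends on the pandas version installed.

```python
import pandas as pd

# The two frames from the report: one empty, one with a single row.
frames = {
    "empty": pd.DataFrame([], columns=["a", "b"]),
    "non-empty": pd.DataFrame({"A": ["a"]}),
}

# Collect the index dtype of DataFrame.mode() for each frame.
index_dtypes = {name: df.mode().index.dtype for name, df in frames.items()}

for name, dtype in index_dtypes.items():
    print(f"{name}: {dtype}")
```

If the index dtype is type stable, both lines print the same dtype.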
@TomAugspurger added the Bug and Needs Triage labels Apr 6, 2020
@TomAugspurger added this to the Contributions Welcome milestone Apr 6, 2020
@TomAugspurger added the Algos and Dtype Conversions labels and removed the Needs Triage label Apr 6, 2020
@TomAugspurger mentioned this issue Apr 6, 2020
@gregorylivschitz

@TomAugspurger

This is happening because the empty DataFrame has an empty index, and therefore the dtype is dtype('O'):

pd.DataFrame([], columns=['a', 'b']).index.dtype
dtype('O')
pd.DataFrame({"A": ['a']}).index.dtype
dtype('int64')

So there are 2 solutions I can think of.

  1. We change mode so that, for an empty DataFrame, it returns a DataFrame in which the mode of each column is NaN.
    So go from:
pd.DataFrame([], columns=['a', 'b']).mode()
Empty DataFrame
Columns: [a, b]
Index: []

To something like:

pd.DataFrame([], columns=['a', 'b']).mode()
     a    b
0  NaN  NaN
  2. We change what an empty DataFrame returns when .index is accessed.

Do you like any of those solutions?

@TomAugspurger
Contributor Author

We can construct an empty index with the right dtype (int64)

In [5]: pd.DataFrame(columns=['a', 'b'], index=pd.Index([], dtype='int64')).index.dtype
Out[5]: dtype('int64')
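
As a rough illustration of that idea applied to mode, the sketch below wraps DataFrame.mode and swaps in an empty int64 index whenever the result has no rows. The helper name `mode_with_int64_index` is hypothetical and not part of pandas, and this is not necessarily how the linked fix implements it.

```python
import pandas as pd


def mode_with_int64_index(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical wrapper (not a pandas API) around DataFrame.mode."""
    result = df.mode()
    if len(result.index) == 0:
        # No rows: replace whatever empty index came back with an
        # empty index that is explicitly int64.
        result.index = pd.Index([], dtype="int64")
    return result


empty_result = mode_with_int64_index(pd.DataFrame([], columns=["a", "b"]))
nonempty_result = mode_with_int64_index(pd.DataFrame({"A": ["a"]}))
print(empty_result.index.dtype, nonempty_result.index.dtype)
```

With this wrapper, the empty and non-empty cases should both report an int64 index dtype, which is the behaviour requested in the report.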
