BUG: dtype compat with get_dummies #8725

Closed
jreback opened this issue Nov 3, 2014 · 2 comments
Labels
Bug Categorical Categorical Data Type Dtype Conversions Unexpected or buggy dtype conversions

@jreback
Contributor

jreback commented Nov 3, 2014

From SO:

I think get_dummies should not coerce to floats here; the result should be an int (or a categorical).

In [14]: df = pd.DataFrame({'hour': [0, 1, 3, 8, 13, 14], 'val': np.random.randn(6)})

In [15]: df['hour_cat'] = pd.Categorical(df['hour'], categories=range(24))

In [16]: df
Out[16]: 
   hour       val hour_cat
0     0  0.326395        0
1     1  1.179421        1
2     3 -1.043560        3
3     8  0.430482        8
4    13  0.558709       13
5    14 -0.385931       14

In [17]: df.dtypes
Out[17]: 
hour           int64
val          float64
hour_cat    category
dtype: object

In [6]: pd.get_dummies(df['hour_cat'])
Out[6]: 
   0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23
0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
2   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
3   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
5   0   0   0   0   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0

In [7]: pd.get_dummies(df['hour_cat']).dtypes
Out[7]: 
0     float64
1     float64
2     float64
3     float64
4     float64
5     float64
6     float64
7     float64
8     float64
9     float64
10    float64
11    float64
12    float64
13    float64
14    float64
15    float64
16    float64
17    float64
18    float64
19    float64
20    float64
21    float64
22    float64
23    float64
dtype: object

@jreback jreback added Bug Dtype Conversions Unexpected or buggy dtype conversions labels Nov 3, 2014
@jreback jreback added this to the 0.15.2 milestone Nov 3, 2014
@onesandzeroes
Contributor

You didn't call get_dummies() on the categorical in your example code; you called it on the original int64 variable. So does this issue apply to get_dummies() in general? The same happens on the categorical:

pd.get_dummies(df['hour_cat']).dtypes
Out[12]: 
0     float64
1     float64
2     float64
3     float64
4     float64
5     float64
6     float64
7     float64
8     float64
9     float64
# etc.
dtype: object

@jreback
Contributor Author

jreback commented Nov 3, 2014

Sorry, that was a typo, but it has the same problem either way.
I have to think about this. I just fixed unstack with a categorical, and this is almost the same: it has to reshape, then convert at the end.
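To illustrate the general idea of building the indicators without ever passing through floats, here is a minimal sketch. It is not how pandas implements get_dummies; `dummies_from_codes` is a hypothetical helper, and the uint8 default is an assumption chosen for the example. It indexes an integer identity matrix with the categorical's codes, so the result keeps an integer dtype throughout.

```python
import numpy as np
import pandas as pd


def dummies_from_codes(values, dtype=np.uint8):
    """Hypothetical helper: one indicator column per category, in `dtype`."""
    cat = values if isinstance(values, pd.Categorical) else pd.Categorical(values)
    codes = np.asarray(cat.codes)
    n_categories = len(cat.categories)

    # Row i of the identity matrix is the indicator vector for category i.
    # An extra all-zero row is appended so that code -1 (missing value),
    # which indexes the last row, maps to "no category set".
    lookup = np.vstack([
        np.eye(n_categories, dtype=dtype),
        np.zeros((1, n_categories), dtype=dtype),
    ])
    return pd.DataFrame(lookup[codes], columns=cat.categories)


hours = pd.Categorical([0, 1, 3, 8, 13, 14], categories=range(24))
out = dummies_from_codes(hours)
print(out.dtypes.unique())
```

Unobserved categories still get a column, as in the output above, because the columns come from `cat.categories` rather than from the observed values.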

@jreback jreback added the Categorical Categorical Data Type label Nov 3, 2014
@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 3, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
avishaylivne added a commit to avishaylivne/pandas that referenced this issue May 25, 2016
Make get_dummies() return columns of dtype=bool instead of np.float64
BUG: dtype compat with get_dummies pandas-dev#8725

Since the whole point of this method is to return binary features, there's no point wasting RAM by representing them as floats.
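A rough sketch of the memory argument, assuming a pandas version whose get_dummies accepts a `dtype` argument (later releases do; the version discussed in this issue did not). The variable names are only for illustration. float64 uses eight bytes per cell and bool uses one, so the same indicator table is eight times larger as floats.

```python
import numpy as np
import pandas as pd

hours = pd.Series(pd.Categorical([0, 1, 3, 8, 13, 14], categories=range(24)))

# Same indicator table, stored with two different element types.
as_float = pd.get_dummies(hours, dtype=np.float64)
as_bool = pd.get_dummies(hours, dtype=bool)

float_bytes = as_float.values.nbytes
bool_bytes = as_bool.values.nbytes
print(float_bytes, bool_bytes)
```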
@jreback jreback modified the milestones: 0.19.0, Next Major Release Jul 27, 2016
@jreback jreback modified the milestones: 0.20.0, 0.19.0 Aug 18, 2016
TomAugspurger added a commit to TomAugspurger/pandas that referenced this issue Sep 1, 2016
Closes pandas-dev#8725

Ensures that get_dummies on a DataFrame whose output is a mix of
floats / ints & dummy-encoded columns doesn't coerce the dummy-encoded
cols from uint8 to ints / floats.
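The behaviour described in this commit message can be checked with a short sketch like the one below. It assumes a pandas version where get_dummies takes `columns` and `dtype` arguments; the dtype is passed explicitly because the default dummy dtype has differed between releases. The frame mirrors the one in the original report, with fixed values instead of random ones.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "hour": [0, 1, 3, 8, 13, 14],
    "val": [0.3, 1.2, -1.0, 0.4, 0.6, -0.4],
})
df["hour_cat"] = pd.Categorical(df["hour"], categories=range(24))

# Encode only the categorical column; "hour" and "val" pass through as-is.
result = pd.get_dummies(df, columns=["hour_cat"], dtype=np.uint8)

dummy_cols = [c for c in result.columns if str(c).startswith("hour_cat_")]
print(result.dtypes.value_counts())
```

The integer and float columns should keep their own dtypes, and the dummy-encoded columns should stay uint8 rather than being upcast to match them.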
@jreback jreback closed this as completed in ccec504 Sep 2, 2016
@jreback jreback modified the milestones: 0.19.0, 0.20.0 Sep 2, 2016
2 participants