Integer dtype is promoted to int64 #2759

Closed
pprett opened this issue Jan 27, 2013 · 11 comments

@pprett
Contributor

pprett commented Jan 27, 2013

It seems that DataFrame (both constructor and astype) promotes the dtypes np.int[8,16,32] to np.int64 ::

>>> pd.DataFrame(data=np.ones((10, 2), dtype=np.int32), dtype=np.int32).get_dtype_counts()
int64    2

I'm not sure whether this is a bug or on purpose (if so, a note in the docstring would be great).
Is there a specific reason why this is the case (e.g. to accommodate NA values)? If there are no NAs in the frame, can the promotion be turned off?
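
A minimal sketch of how the report could be re-checked on a recent pandas. It assumes `get_dtype_counts()` is not available (it was removed in pandas 1.0) and uses `dtypes.value_counts()` as a stand-in; the variable names are illustrative only:

```python
import numpy as np
import pandas as pd

# Build an int32 frame the same way as in the report above.
df = pd.DataFrame(data=np.ones((10, 2), dtype=np.int32), dtype=np.int32)

# Stand-in for get_dtype_counts(): count columns per dtype name.
dtype_counts = df.dtypes.astype(str).value_counts().to_dict()
print(dtype_counts)

# astype to a narrower integer type should also be kept as requested.
small = df.astype(np.int8)
print(small.dtypes.astype(str).value_counts().to_dict())
```

If the promotion described here were still happening, both printed dictionaries would report `int64` instead of the requested dtype.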

@jreback
Contributor

jreback commented Jan 27, 2013

this is a known issue (see #622)
and will be fixed in next release (see #2708)

@pprett
Contributor Author

pprett commented Jan 27, 2013

thanks - I checked for related issues but I didn't see that one - I'll close this one

@pprett pprett closed this as completed Jan 27, 2013
@jreback
Contributor

jreback commented Jan 27, 2013

can u provide a small example of how u plan to use the different dtype frames?
for testing and possibly to make available as an example

@pprett
Contributor Author

pprett commented Jan 27, 2013

I have a dataset comprising two blocks: block A contains numerical variables (float64), block B contains indicator variables (aka dummy features aka one-hot encoding - preferably with dtype int8). I wanted to concatenate both blocks using pd.concat - however, this promotes block B to float64 which is quite memory intensive.

Here is an example (with float32 standing in for block B)::

>>> A = pd.DataFrame(data=np.ones((10, 2)), columns=['foo', 'bar'], dtype=np.float64)
>>> B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)
>>> pd.concat((A, B), axis=1).get_dtype_counts()
float64    4

If I insert the columns of B into A it preserves the dtype::

>>> A[B.columns] = B
>>> A.get_dtype_counts()
float32    2
float64    2
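
To put a rough number on the memory cost mentioned above, here is a small sketch comparing an int8 indicator block with its float64 promotion. The column names `ind_0` and `ind_1` and the row count are made up for illustration:

```python
import numpy as np
import pandas as pd

n_rows = 10  # illustrative size only

# Indicator (one-hot style) block stored as int8: 1 byte per value.
indicators = pd.DataFrame(
    np.ones((n_rows, 2), dtype=np.int8), columns=['ind_0', 'ind_1']
)

# The same block after promotion to float64: 8 bytes per value.
promoted = indicators.astype(np.float64)

# Compare the column data only, ignoring the index.
int8_bytes = int(indicators.memory_usage(index=False).sum())
float64_bytes = int(promoted.memory_usage(index=False).sum())
print(int8_bytes, float64_bytes)
```

The column data grows by a factor of eight, which is why keeping the narrow dtype through `pd.concat` matters for wide indicator blocks.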

@wesm
Member

wesm commented Jan 27, 2013

reopening so this can be converted to a test case for 0.11

@wesm wesm reopened this Jan 27, 2013
@jreback
Contributor

jreback commented Jan 27, 2013

With the PR #2708, this works (I'll add as a test case)

In [24]: A = pd.DataFrame(data=np.ones((10, 2)), columns=['foo', 'bar'], dtype=np.float64)

In [26]: B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)

In [27]: pd.concat((A, B), axis=1).get_dtype_counts()
Out[27]: 
float32    2
float64    2
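
One way this could be phrased as a test case is sketched below. It is not the test that was actually added to pandas, only an assumption about its general shape, and it uses `dtypes.value_counts()` because `get_dtype_counts()` is not available on current releases:

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(data=np.ones((10, 2)), columns=['foo', 'bar'], dtype=np.float64)
B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)

# Concatenate column-wise; each block should keep its own dtype.
result = pd.concat((A, B), axis=1)

# Count columns per dtype name and compare with the expected split.
counts = result.dtypes.astype(str).value_counts().to_dict()
assert counts == {'float64': 2, 'float32': 2}, counts
```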

@jreback
Contributor

jreback commented Jan 27, 2013

I suspect you also want to do something like this:

In [3]: df = pd.DataFrame(np.random.rand(5,3),columns=['a','b','c'])

In [4]: df
Out[4]: 
          a         b         c
0  0.366995  0.256089  0.742321
1  0.236588  0.478794  0.173399
2  0.625570  0.997990  0.862075
3  0.699456  0.391814  0.098816
4  0.793263  0.417381  0.488489

In [5]: mask = pd.DataFrame(np.zeros((5,3)),dtype='int8')

In [8]: mask.ix[0,0] = 1; mask.ix[1,1] = 1; mask.ix[3,2] = 1

In [9]: mask
Out[9]: 
   0  1  2
0  1  0  0
1  0  1  0
2  0  0  0
3  0  0  1
4  0  0  0

In [10]: mask.dtypes
Out[10]: 
0    int8
1    int8
2    int8

# this operation actually needs a dtype change because NA is not supported in ints
# it's better to change to object than to let shift upcast you to float64
ms = mask.astype('O').shift().fillna(0).astype('int8')

In [17]: ms
Out[17]: 
   0  1  2
0  0  0  0
1  1  0  0
2  0  1  0
3  0  0  0
4  0  0  1

In [18]: ms.dtypes
Out[18]: 
0    int8
1    int8
2    int8

# what you are ultimately after ?
In [19]: df.where(ms)
Out[19]: 
          a        b         c
0       NaN      NaN       NaN
1  0.236588      NaN       NaN
2       NaN  0.99799       NaN
3       NaN      NaN       NaN
4       NaN      NaN  0.488489
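
On more recent pandas versions the same idea can probably be written without the object round trip. A sketch under a few assumptions: `.iloc` replaces `.ix` (which was later removed), `shift` accepts a `fill_value` argument (added in a later release than the one discussed here), and the mask is handed to `where` as a boolean NumPy array so that the differing column labels of `df` and `mask` do not matter. The seeded random data is illustrative only:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)), columns=['a', 'b', 'c'])

# int8 mask with the same three positions set as in the transcript above.
mask = pd.DataFrame(np.zeros((5, 3), dtype='int8'))
mask.iloc[0, 0] = 1
mask.iloc[1, 1] = 1
mask.iloc[3, 2] = 1

# Shift down one row, filling the new first row with 0 instead of NaN,
# so no missing values are introduced. The trailing astype is only a
# guard in case a given version changes the dtype during the shift.
ms = mask.shift(1, fill_value=0).astype('int8')

# where() keeps values where the condition is True and puts NaN elsewhere.
selected = df.where(ms.to_numpy(dtype=bool))
print(selected)
```

Passing a plain boolean array sidesteps label alignment; if `mask` had the same column labels as `df`, `ms.astype(bool)` could be passed directly instead.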

@pprett
Contributor Author

pprett commented Jan 27, 2013

cool - looking forward to #2708 being merged - thanks guys

@jreback
Contributor

jreback commented Feb 11, 2013

this is merged into master...ok to close?

@wesm
Member

wesm commented Feb 11, 2013

yep. closed, thanks!

@wesm wesm closed this as completed Feb 11, 2013
@pprett
Contributor Author

pprett commented Feb 11, 2013

thanks guys - Pandas rocks!

