Integer dtype is promoted to int64 #2759

pprett · 2013-01-27T10:17:34Z

It seems that DataFrame (both constructor and astype) promotes the dtypes np.int[8,16,32] to np.int64 ::

>>> pd.DataFrame(data=np.ones((10, 2), dtype=np.int32), dtype=np.int32).get_dtype_counts()
int64    2

I'm not sure whether this is a bug or intentional (if it is intentional, a note in the docstring would be great).
Is there a specific reason for this (e.g. to accommodate NA values)? If there are no NAs in the frame, can the promotion be turned off?
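For reference, here is a minimal sketch of how this behaves in a recent pandas (1.x or later), where the promotion described here no longer happens. It assumes `get_dtype_counts()`, which was removed in pandas 1.0, can be replaced by `dtypes.value_counts()`. It also shows the NA reason: NumPy integer arrays cannot hold NaN, so introducing a missing value still upcasts to float64 unless the nullable `"Int32"` extension dtype is used.

```python
import numpy as np
import pandas as pd

# Constructor and astype keep the requested integer width.
df = pd.DataFrame(np.ones((10, 2), dtype=np.int32), dtype=np.int32)
df8 = df.astype(np.int8)

# Reindexing to a new row label has no data for that row, so it is filled
# with NaN; NumPy int32 cannot represent NaN, so both columns become float64.
with_na = df.reindex(range(11))

# The nullable "Int32" extension dtype tracks missing values with a separate
# mask (shown as pd.NA), so the same reindex keeps an integer dtype.
nullable = df.astype("Int32").reindex(range(11))

print(df.dtypes.value_counts())
print(with_na.dtypes.value_counts())
print(nullable.dtypes.value_counts())
```

So with no NAs there is nothing to turn off; the upcast only occurs once missing values have to be stored.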


jreback · 2013-01-27T12:32:42Z

this is a known issue (see #622)
and will be fixed in the next release (see #2708)

pprett · 2013-01-27T12:34:48Z

thanks - I checked for related issues but I didn't see that one - I'll close this one

jreback · 2013-01-27T13:18:10Z

can you provide a small example of how you plan to use frames with different dtypes?
it would be useful for testing and possibly to make available as an example

pprett · 2013-01-27T18:13:01Z

I have a dataset comprising two blocks: block A contains numerical variables (float64), block B contains indicator variables (aka dummy features aka one-hot encoding - preferably with dtype int8). I wanted to concatenate both blocks using pd.concat - however, this promotes block B to float64 which is quite memory intensive.

Here is an example::

>>> A = pd.DataFrame(data=np.ones((10, 2)), columns=['foo', 'bar'], dtype=np.float64)
>>> B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)
>>> pd.concat((A, B), axis=1).get_dtype_counts()
float64    4

If I insert the columns of B into A it preserves the dtype::

>>> A[B.columns] = B
>>> A.get_dtype_counts()
float32    2
float64    2
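A sketch of the use case described above, assuming a modern pandas where `pd.concat(..., axis=1)` keeps each column's dtype. The indicator column names `is_x`, `is_y`, `is_z` and the row count are hypothetical. It also quantifies the memory argument: an int8 column takes 1 byte per value versus 8 bytes for float64.

```python
import numpy as np
import pandas as pd

n = 1_000
# Block A: numerical features stored as float64.
A = pd.DataFrame(np.ones((n, 2)), columns=["foo", "bar"], dtype=np.float64)
# Block B: hypothetical one-hot indicator columns stored as int8.
B = pd.DataFrame(np.zeros((n, 3), dtype=np.int8), columns=["is_x", "is_y", "is_z"])

combined = pd.concat((A, B), axis=1)
print(combined.dtypes)

# Bytes per column, excluding the index: 8 per float64 value, 1 per int8 value.
per_column = combined.memory_usage(index=False)
print(per_column)

# What the indicator block would cost if it were upcast to float64.
upcast_bytes = B.astype(np.float64).memory_usage(index=False).sum()
```

Assigning with `A[B.columns] = B` also keeps the dtypes, but it modifies `A` in place, whereas `pd.concat` returns a new frame.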

wesm · 2013-01-27T18:15:44Z

reopening so this can be converted to a test case for 0.11

jreback · 2013-01-27T21:22:02Z

With PR #2708 this works (I'll add it as a test case):

In [24]: A = pd.DataFrame(data=np.ones((10, 2)), columns=['foo', 'bar'], dtype=np.float64)

In [26]: B = pd.DataFrame(data=np.ones((10, 2)), dtype=np.float32)

In [27]: pd.concat((A, B), axis=1).get_dtype_counts()
Out[27]: 
float32    2
float64    2
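A hypothetical regression test in the spirit of the one mentioned here might look like the following. The function name and the list of dtypes are illustrative assumptions, not the test that was actually added to pandas; it only checks that concatenating along columns leaves each block's dtype untouched.

```python
import numpy as np
import pandas as pd


def test_concat_preserves_column_dtypes():
    # Hypothetical test; name and dtype list are illustrative only.
    A = pd.DataFrame(np.ones((10, 2)), columns=["foo", "bar"], dtype=np.float64)
    for dt in (np.float32, np.int8, np.int16, np.int32):
        B = pd.DataFrame(np.ones((10, 2)), dtype=dt)
        result = pd.concat((A, B), axis=1)
        # A's float64 columns come first, followed by B's columns in their own dtype.
        expected = [np.dtype(np.float64)] * 2 + [np.dtype(dt)] * 2
        assert list(result.dtypes) == expected
```

It can be called directly or collected by a test runner such as pytest by its `test_` prefix.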

jreback · 2013-01-27T21:33:33Z

I suspect you also want to do something like this:

In [3]: df = pd.DataFrame(np.random.rand(5,3),columns=['a','b','c'])

In [4]: df
Out[4]: 
          a         b         c
0  0.366995  0.256089  0.742321
1  0.236588  0.478794  0.173399
2  0.625570  0.997990  0.862075
3  0.699456  0.391814  0.098816
4  0.793263  0.417381  0.488489

In [5]: mask = pd.DataFrame(np.zeros((5,3)),dtype='int8')

In [8]: mask.ix[0,0] = 1; mask.ix[1,1] = 1; mask.ix[3,2] = 1

In [9]: mask
Out[9]: 
   0  1  2
0  1  0  0
1  0  1  0
2  0  0  0
3  0  0  1
4  0  0  0

In [10]: mask.dtypes
Out[10]: 
0    int8
1    int8
2    int8

# this operation actually needs a dtype change because NA is not supported in ints
# it's better to change to object than to let shift upcast you to float64
ms = mask.astype('O').shift().fillna(0).astype('int8')

In [17]: ms
Out[17]: 
   0  1  2
0  0  0  0
1  1  0  0
2  0  1  0
3  0  0  0
4  0  0  1

In [18]: ms.dtypes
Out[18]: 
0    int8
1    int8
2    int8

# is this what you are ultimately after?
In [19]: df.where(ms)
Out[19]: 
          a        b         c
0       NaN      NaN       NaN
1  0.236588      NaN       NaN
2       NaN  0.99799       NaN
3       NaN      NaN       NaN
4       NaN      NaN  0.488489
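The session above uses `.ix`, which was removed in pandas 1.0, so here is a sketch of the same workflow for a current pandas. It assumes `shift(fill_value=...)` (available since pandas 0.24) as a replacement for the round trip through object dtype. It also accounts for `where()` expecting a boolean condition and aligning a DataFrame condition on its labels, which matters because the mask has column labels 0..2 while `df` has `a`..`c`. The random data and seed are arbitrary.

```python
import numpy as np
import pandas as pd

# Arbitrary random data; the seed only makes runs repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((5, 3)), columns=["a", "b", "c"])

# int8 indicator mask with the same three set cells as in the session above.
mask = pd.DataFrame(np.zeros((5, 3), dtype=np.int8))
mask.iloc[0, 0] = 1
mask.iloc[1, 1] = 1
mask.iloc[3, 2] = 1

# A plain shift must put NaN into row 0; int8 cannot hold NaN, so it upcasts.
upcast = mask.shift()            # every column becomes float64

# With fill_value no NaN is needed, so the int8 dtype is kept.
ms = mask.shift(fill_value=0)    # every column stays int8

# Pass a plain bool array so where() does not try to align 0..2 with a..c.
result = df.where(ms.to_numpy(dtype=bool))
print(result)
```

`result` keeps `df` values only at (1, `a`), (2, `b`) and (4, `c`), matching the pattern shown in the session above.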

pprett · 2013-01-27T21:45:05Z

cool - looking forward to #2708 being merged - thanks guys

jreback · 2013-02-11T02:16:51Z

this is merged into master...ok to close?

wesm · 2013-02-11T02:17:32Z

yep. closed, thanks!

pprett · 2013-02-11T07:44:07Z

thanks guys - Pandas rocks!


pprett closed this as completed Jan 27, 2013

wesm reopened this Jan 27, 2013

jreback mentioned this issue Jan 27, 2013

ENH: should shift return same dtype objects as input? #2761 (Closed)

jreback mentioned this issue Feb 7, 2013

BUG: issue in HDFStore with too many selectors in a where #2755 (Merged)

wesm closed this as completed Feb 11, 2013
