You can do this by specifying the downcast keyword. This is NOT automatic because, as a general operation, it can be expensive.
In [10]: df.set_index(['x','y']).unstack().fillna(-1,downcast='infer')
Out[10]:
   z
y  j  k
x
a  0  1
b  2 -1
In [11]: df.set_index(['x','y']).unstack().fillna(-1,downcast='infer').dtypes
Out[11]:
   y
z  j    int64
   k    int64
dtype: object
There may be some merit to this being allowed directly, even if the functionality can be accomplished with a series of operations. For instance, when trying to limit memory usage on a big dataset, perhaps it would be preferable to keep the data as np.int8.
In [15]: idx=np.array([0, 0, 1], dtype=np.int32)
In [16]: idx2=np.array([0, 1, 0], dtype=np.int8)
In [17]: value=np.array([0, 1, 2], dtype=np.int8)
In [18]: df=pd.DataFrame({'idx':idx, 'idx2':idx2, 'value':value})
In [19]: df.dtypes
Out[19]:
idx      int32
idx2      int8
value     int8
dtype: object
In [20]: df.set_index(['idx', 'idx2']).unstack().dtypes
Out[20]:
       idx2
value  0       float64
       1       float64
dtype: object
After the unstack my data table is suddenly much larger than necessary.
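To make the size increase concrete, here is a minimal sketch that rebuilds the frame from the session above and compares the raw buffer sizes before and after the unstack. The variable names `wide`, `bytes_before` and `bytes_after` are illustrative only, and the exact overhead on a real dataset will depend on its shape and on how many cells are missing.

```python
import numpy as np
import pandas as pd

# Same toy data as in the session above.
df = pd.DataFrame({
    "idx": np.array([0, 0, 1], dtype=np.int32),
    "idx2": np.array([0, 1, 0], dtype=np.int8),
    "value": np.array([0, 1, 2], dtype=np.int8),
})

# The (idx=1, idx2=1) combination is missing, so unstack introduces a NaN
# and the int8 values are upcast to float64.
wide = df.set_index(["idx", "idx2"]).unstack()

bytes_before = df["value"].to_numpy().nbytes  # 3 int8 values
bytes_after = wide.to_numpy().nbytes          # 2 x 2 float64 cells

print(wide.dtypes)
print(bytes_before, bytes_after)
```

Each cell grows from 1 byte to 8 bytes, in addition to the extra cells created for the missing combinations.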
Also, from looking at the code this would be fairly trivial to implement, without much impact on existing code.
Currently:
If I want to fill with -1, I need to call fillna and then astype back to int.
Ideally:
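For illustration, the two-step workaround can be sketched as follows, next to what a direct fill could look like. The `fill_value` argument shown here is accepted by `unstack` in later pandas releases; whether your installed version supports it is an assumption of this sketch, and the names `indexed`, `two_step` and `direct` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "idx": np.array([0, 0, 1], dtype=np.int32),
    "idx2": np.array([0, 1, 0], dtype=np.int8),
    "value": np.array([0, 1, 2], dtype=np.int8),
})
indexed = df.set_index(["idx", "idx2"])

# Two-step workaround: unstack (upcasts to float64), fill the NaN,
# then cast back to the small integer type.
two_step = indexed.unstack().fillna(-1).astype(np.int8)

# Direct fill. Assumes a pandas version whose unstack accepts fill_value;
# the missing cell is filled during the reshape itself.
direct = indexed.unstack(fill_value=-1)

print(two_step.dtypes)
print(direct.dtypes)
```

The two-step form still materializes the intermediate float64 frame, which is the memory cost the request is trying to avoid.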