Unstack with mixed dtypes coerces everything to object #11847
Comments
Please show a copy-pastable example.
Ah, looking for an example helped me narrow down the bug. It is specific to passing a list of levels to unstack, even when that list only has a single entry. E.g. compare:
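The original snippet for this comparison did not survive in the thread, so the following is only a minimal sketch of the kind of comparison described. The frame, the column names `f32` and `i64`, and the level names `outer` and `inner` are all made up for illustration. On affected pandas versions the list form was reported to return object columns; on versions that include the fix, both forms should give the same dtypes.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame with a two-level index (names are illustrative).
index = pd.MultiIndex.from_product([["x", "y"], [1, 2]], names=["outer", "inner"])
df = pd.DataFrame(
    {
        "f32": np.arange(4, dtype=np.float32),
        "i64": np.arange(4, dtype=np.int64),
    },
    index=index,
)

# Level passed as a scalar.
scalar_form = df.unstack("inner")

# Same level passed as a single-entry list, the case reported as buggy.
list_form = df.unstack(["inner"])

# Compare the resulting dtypes side by side.
print(pd.DataFrame({"scalar": scalar_form.dtypes, "list": list_form.dtypes}))
```

If the `list` column shows `object` while the `scalar` column shows the original dtypes, the installed pandas still has the behaviour described in this issue.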
So a workaround in my case with multiple levels is to replace |
So it looks like what you want is #9023, which is almost finished. In fact, if you are looking for something to do... it could use some updating :)
I'll mark this as a bug, which may be independent. Want to see if you can put in a fix with the existing framework?
Thanks! I will try, but I do not use pandas from master and I've never played with the source, so it won't be quick.
Picking this up to take a look.
closes #11847
Changed the way in which the original data frame is copied (dropped use of .values, since it does not preserve dtypes).
Author: Pawel Kordek
Closes #14053 from kordek/#11847 and squashes the following commits:
6a381ce [Pawel Kordek] BUG: GH11847 Unstack with mixed dtypes coerces everything to object
(cherry picked from commit d531718)
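The commit message attributes the coercion to copying through `.values`. The sketch below is not the actual patch; it only illustrates, on a made-up frame with hypothetical column names, why `.values` loses per-column dtypes while `DataFrame.copy()` keeps them.

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing a float32 column with a string (object) column.
df = pd.DataFrame(
    {
        "f32": np.array([1.0, 2.0], dtype=np.float32),
        "label": ["a", "b"],
    }
)

# .values returns one 2-D array, so every column is forced to a common dtype.
as_array = df.values
print(as_array.dtype)

# Rebuilding a frame from that array therefore loses the float32 column dtype.
rebuilt = pd.DataFrame(as_array, index=df.index, columns=df.columns)
print(rebuilt.dtypes)

# copy() keeps each column's own dtype.
copied = df.copy()
print(copied.dtypes)
```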
Related to #2929: if I unstack a DataFrame with mixed dtypes, all columns get coerced to object and I have to recast them back, which is surprisingly slow (30 seconds for 400k rows and 400 np.float32 columns).
Is there any reason pandas doesn't keep the np.float32 dtype? Since float32 supports missing values, even missing index/column positions shouldn't pose a problem.
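To make the point about missing positions concrete, here is a small sketch using invented data. It unstacks a float32 Series whose index lacks one outer/inner combination, so the gap has to be filled with NaN, which float32 can represent without changing dtype.

```python
import numpy as np
import pandas as pd

# Hypothetical index that is missing the ("y", 2) combination.
index = pd.MultiIndex.from_tuples(
    [("x", 1), ("x", 2), ("y", 1)], names=["outer", "inner"]
)
s = pd.Series(np.array([1.0, 2.0, 3.0], dtype=np.float32), index=index)

# Unstacking creates a cell for ("y", 2) that has no source value.
wide = s.unstack("inner")

print(wide)
print(wide.dtypes)
```

The missing cell is filled with NaN, so no upcast to object is needed to represent it.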