ENH: pd.unstack(level, values=["a", "b", "c"]) #36916
Comments
you can just filter before the unstack
@jreback Thanks for your answer. However, filtering does not help if there is no value for As written above, as an extreme example you can test unstacking an empty dataframe.
well after then i don't really see this as very common
Actually, I run into this problem quite often when I want to ensure that the dataframe schema keeps being propagated. Also, I cannot think of any solution to this problem besides changing the "unstack" function. Do you know a workaround, @jreback?
It's not clear to me what would happen if the index level being unstacked has more or fewer values. Would they both raise? If so, why not just check the columns of the result and raise then?
No error should be raised. Instead, extra values should just be dropped and missing values should be filled with N/A. When specifying the values which should be unstacked, I want to ensure that the resulting dataframe always has the same schema, independent of the input. Compare this to e.g. Spark. There you always know beforehand what the resulting schema will be, but in exchange you have to explicitly specify the values which should be converted to columns.
@Hoeze: Here is an attempt, I'd be interested in hearing if there are edge-cases this misses. It works on the empty frame.
e.g.
produces
u can just reindex
Ahh, thanks @jreback - comment updated.
Thanks a lot for solving this @rhshadrach, looks great 😀
I can do you one better @Hoeze - that function can be executed on the current version of pandas ;) In all seriousness though, this is three separate operations: select, unstack, and reindex. I do not think we should add a new function to the API just because there is a use case where they are used together.
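The three steps named above (select, unstack, reindex) can be sketched as follows. This is only an illustration and not the function posted in this thread: the helper name unstack_with_values is hypothetical, and the sketch assumes a frame with flat (single-level) columns and a MultiIndex index.

```python
import pandas as pd


def unstack_with_values(df, level, values):
    """Hypothetical helper: unstack `level` so that exactly `values` appear as columns.

    Assumes `df` has flat (single-level) columns and a MultiIndex index.
    """
    level_values = df.index.get_level_values(level)

    # 1. select: keep only rows whose label on `level` is one of `values`
    selected = df[level_values.isin(values)]

    # 2. unstack: move `level` from the index into the columns
    wide = selected.unstack(level)

    # 3. reindex: force the column schema to every (column, value) pair,
    #    so values missing from the input become all-NaN columns
    target = pd.MultiIndex.from_product(
        [df.columns, values],
        names=[df.columns.name, level_values.name],
    )
    return wide.reindex(columns=target)


# Example input: "C" is requested but absent, "D" is present but not requested.
idx = pd.MultiIndex.from_tuples(
    [("A", 1), ("B", 1), ("D", 2)], names=["k", "g"]
)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx)
out = unstack_with_values(df, "k", ["A", "B", "C"])
print(out)
```

With this input, the result has the columns (x, A), (x, B) and (x, C) regardless of which values occur in the data: the "D" row is dropped by the select step, and the "C" column is filled with NaN by the reindex step.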
Yes, but having this as an option inside
this is not a good default as it's a bit unexpected
Is your feature request related to a problem?
I would like to unstack only certain values from a DataFrame to know exactly which columns will be in the resulting dataframe.
For example, when the dataframe is empty,
df.unstack(0)
will just return an empty dataframe without columns: #21255
Advantages:
Describe the solution you'd like
I'd propose a values option for
pd.DataFrame.unstack()
that lets one specify which values will be unstacked into columns. E.g.
df.unstack(0, values=["A", "B", "C"])
would ensure that the resulting dataframe has exactly the top-level columns A, B and C.
API breaking implications
N/A
Describe alternatives you've considered
I checked out
pd.DataFrame.pivot()
; however, it is more complicated to use and cannot use index levels for unstacking.