ENH: pd.unstack(level, values=["a", "b", "c"]) #36916

Closed
Hoeze opened this issue Oct 6, 2020 · 13 comments
Labels
Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@Hoeze

Hoeze commented Oct 6, 2020

Is your feature request related to a problem?

I would like to unstack only certain values from a DataFrame so that I know exactly which columns will be in the resulting dataframe.
For example, when the dataframe is empty, df.unstack(0) just returns an empty dataframe without columns (see #21255).

Advantages:

  • predictable column layout
  • subsetting for certain values

Describe the solution you'd like

I'd propose a values option for pd.DataFrame.unstack() that lets one specify which values will be unstacked into columns.
E.g. df.unstack(0, values=["A", "B", "C"]) would ensure that the resulting dataframe has exactly the toplevel columns A, B and C.

API breaking implications

N/A

Describe alternatives you've considered

I looked at pd.DataFrame.pivot(), but it is more complicated to use and cannot use index levels for unstacking.
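
For comparison, here is a minimal sketch of the pivot() route, using made-up data (the names key, row and x are illustrative, not from the issue). The index levels are first moved into columns with reset_index(), because pivot() is given column names here. On this data the result matches a direct unstack().

```python
import pandas as pd

# Illustrative data; "key", "row" and "x" are made-up names.
idx = pd.MultiIndex.from_tuples(
    [("A", 0), ("C", 0), ("A", 1)], names=["key", "row"]
)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0]}, index=idx)

# unstack() works directly on an index level.
via_unstack = df["x"].unstack("key")

# pivot() is given column names, so the index levels are reset into columns first.
via_pivot = df.reset_index().pivot(index="row", columns="key", values="x")

print(via_unstack.equals(via_pivot))
```

Neither route adds a column for a key that is absent from the data, so pivot() alone does not give a fixed layout either.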

Hoeze added the Enhancement and Needs Triage labels Oct 6, 2020
@jreback
Contributor

jreback commented Oct 6, 2020

you can just filter before the unstack
no need to add yet another api

@Hoeze
Author

Hoeze commented Oct 6, 2020

@jreback Thanks for your answer.

However, filtering does not help if there is no value for B.
In this case, B will be missing in the resulting toplevel columns.

As written above, as an extreme example you can test unstacking an empty dataframe.
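
A small sketch of this point, with made-up data in which the level being unstacked (here called key, an illustrative name) never contains "B". Filtering before the unstack removes the unwanted "D", but it cannot create a column for a value that no row carries.

```python
import pandas as pd

# Illustrative data: the "key" level holds "A", "C" and "D", but never "B".
idx = pd.MultiIndex.from_tuples(
    [("A", 0), ("C", 0), ("D", 0), ("A", 1)], names=["key", "row"]
)
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0]}, index=idx)

wanted = ["A", "B", "C"]

# Filter first, then unstack: "D" is dropped, but "B" cannot appear
# because no row carries it.
filtered = df[df.index.get_level_values("key").isin(wanted)]
wide = filtered.unstack("key")

print(wide.columns.tolist())
```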

@jreback
Contributor

jreback commented Oct 6, 2020

well, after the unstack then

i don't really see this as very common

@Hoeze
Author

Hoeze commented Oct 6, 2020

Actually, I end up having this problem quite often when I want to ensure that the dataframe schema keeps being propagated. Also, I cannot think of any solution to this problem besides changing the "unstack" function.

Do you know a workaround @jreback?

@rhshadrach
Member

would ensure that the resulting dataframe has exactly the toplevel columns

It's not clear to me what would happen if the index level being unstacked has more or fewer values. Would they both raise? If so, why not just check the columns of the result and raise then?
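
The check-and-raise alternative raised here could be sketched as follows. The helper name strict_unstack and the example data are hypothetical. The sketch only covers the "both cases raise" reading of the question.

```python
import pandas as pd

def strict_unstack(df, level, values):
    """Unstack `level` and raise unless it yields exactly `values` (hypothetical helper)."""
    result = df.unstack(level)
    # The unstacked level becomes the innermost column level.
    found = set(result.columns.get_level_values(-1))
    if found != set(values):
        raise ValueError(f"expected {sorted(values)}, got {sorted(found)}")
    return result

# Illustrative data: "key" holds "A" and "C" only.
idx = pd.MultiIndex.from_tuples([("A", 0), ("C", 0)], names=["key", "row"])
df = pd.DataFrame({"x": [1.0, 2.0]}, index=idx)

strict_unstack(df, "key", ["A", "C"])  # passes: the level holds exactly these values
try:
    strict_unstack(df, "key", ["A", "B", "C"])  # "B" is missing from the data
except ValueError as err:
    print(err)
```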

@Hoeze
Author

Hoeze commented Feb 13, 2021

would ensure that the resulting dataframe has exactly the toplevel columns

It's not clear to me what would happen if the index level being unstacked has more or fewer values. Would they both raise? If so, why not just check the columns of the result and raise then?

No error should be raised. Instead, values that are not listed should just be dropped, and listed values that are missing from the data should be filled with N/A.
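
These semantics (drop what is not listed, fill what is missing) can be illustrated with existing pandas calls. The sketch below uses a Series with made-up data so that the unstacked columns have a single level. It is only an illustration of the requested behavior, not a proposed implementation.

```python
import pandas as pd

# Illustrative data: "key" holds "A", "C" and "D", but never "B".
idx = pd.MultiIndex.from_tuples(
    [("A", 0), ("C", 0), ("D", 0)], names=["key", "row"]
)
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# Unstacking a Series gives single-level columns, which keeps the sketch short.
# reindex() then drops "D" (not listed) and adds "B" filled with NaN.
wide = s.unstack("key").reindex(columns=["A", "B", "C"])

print(wide)
```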

When specifying the values which should be unstacked, I want to ensure that the resulting dataframe always has the same schema, independent of the input.

Compare this to e.g. Spark. There you always know beforehand what the resulting schema will be, but in exchange you have to explicitly specify the values which should be converted to columns.

@rhshadrach
Member

rhshadrach commented Feb 13, 2021

@Hoeze: Here is an attempt; I'd be interested in hearing if there are edge cases this misses. It works on the empty frame.

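import numpy as np
import pandas as pd

def rigid_unstack(df, level, values, fill_value=np.nan):
    return (
        # select: keep only rows whose `level` entry is one of `values`
        df[df.index.get_level_values(level).isin(values)]
        # unstack: move `level` from the index into the columns
        .unstack(level)
        # reindex: force exactly `values` under every original column
        .reindex(
            columns=pd.MultiIndex.from_product([df.columns, values]),
            fill_value=fill_value,
        )
    )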

e.g.

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [4, 5], 'd': [6, 7]}).set_index(['a', 'b'])
print(rigid_unstack(df, 0, [1, 2, 3]))

produces

   c          d        
a  1   2   3  1   2   3
b                      
3  5 NaN NaN  7 NaN NaN
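
To check the "same schema, independent of the input" property, here is a hedged sketch of a variant of the helper above. The name rigid_unstack_named and the second example frame are made up. Passing names= to MultiIndex.from_product is an added assumption so that the new column level carries the name of the unstacked index level. The sketch assumes the input frame has single-level columns.

```python
import numpy as np
import pandas as pd

def rigid_unstack_named(df, level, values, fill_value=np.nan):
    """Variant of the helper above that also names the new column level (assumption)."""
    level_name = df.index.get_level_values(level).name
    target = pd.MultiIndex.from_product(
        [df.columns, values], names=[df.columns.name, level_name]
    )
    return (
        df[df.index.get_level_values(level).isin(values)]
        .unstack(level)
        .reindex(columns=target, fill_value=fill_value)
    )

# Same frame as in the example above.
first = pd.DataFrame(
    {"a": [0, 1], "b": [2, 3], "c": [4, 5], "d": [6, 7]}
).set_index(["a", "b"])
# Made-up second frame whose level "a" holds different values (3 and 9).
second = pd.DataFrame(
    {"a": [3, 9], "b": [2, 3], "c": [4, 5], "d": [6, 7]}
).set_index(["a", "b"])

out_first = rigid_unstack_named(first, 0, [1, 2, 3])
out_second = rigid_unstack_named(second, 0, [1, 2, 3])

# Both results share one column layout, whatever values the inputs contain.
print(out_first.columns.equals(out_second.columns))
```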

@jreback
Contributor

jreback commented Feb 13, 2021

you can just reindex

@rhshadrach
Member

rhshadrach commented Feb 13, 2021

Ahh, thanks @jreback - comment updated.

@Hoeze
Author

Hoeze commented Mar 23, 2021

Thanks a lot for solving this @rhshadrach, looks great 😀
Can we have that in the next Pandas version?

@rhshadrach
Member

I can do you one better @Hoeze - that function can be executed on the current version of pandas ;)

In all seriousness though, this is three separate operations: select, unstack, and reindex. I do not think we should add a new function to the API just because there is a use case where they are used together.

@Hoeze
Author

Hoeze commented Mar 25, 2021

Yes, but having this as an option inside unstack would have been nice 🙂
Otherwise feel free to close, I do have a working solution to my problem now 👍

@jreback
Contributor

jreback commented Mar 25, 2021

this is not a good default as it's a bit unexpected

jreback added this to the No action milestone Mar 25, 2021
jreback closed this as completed Mar 25, 2021
3 participants