ENH: pd.unstack(level, values=["a", "b", "c"]) #36916

Hoeze · 2020-10-06T14:02:32Z

Is your feature request related to a problem?

I would like to unstack only certain values from a DataFrame to know exactly which columns will be in the resulting dataframe.
For example, when the dataframe is empty, df.unstack(0) will just return an empty dataframe without columns:
#21255

Advantages:

predictable column layout
subsetting for certain values

Describe the solution you'd like

I'd propose a values option for pd.DataFrame.unstack() that lets one specify which values will be unstacked into columns.
E.g. df.unstack(0, values=["A", "B", "C"]) would ensure that the resulting dataframe has exactly the toplevel columns A, B and C.

API breaking implications

N/A

Describe alternatives you've considered

I checked out pd.DataFrame.pivot(), however, it is more complicated to use and cannot use index levels for unstacking.

jreback · 2020-10-06T14:56:16Z

you can just filter before the unstack
no need to add yet another api

Hoeze · 2020-10-06T15:14:11Z

@jreback Thanks for your answer.

However, filtering does not help if there is no value for B.
In this case, B will be missing in the resulting toplevel columns.

As written above, as an extreme example you can test unstacking an empty dataframe.
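
To illustrate the point, here is a minimal sketch with made-up data (the names `key`, `row`, `x` and `wanted` are hypothetical and only serve the example). The first index level never contains "B", so filtering before the unstack cannot make a "B" column appear:

```python
import pandas as pd

# Hypothetical data: the first index level only ever contains "A" and "C".
df = pd.DataFrame(
    {"key": ["A", "C"], "row": [0, 0], "x": [1.0, 2.0]}
).set_index(["key", "row"])

wanted = ["A", "B", "C"]

# Filter first, as suggested, then unstack the first index level.
filtered = df[df.index.get_level_values(0).isin(wanted)]
result = filtered.unstack(0)

# "B" never occurs in the data, so it is absent from the resulting columns.
observed = list(result.columns.get_level_values(1))
print(observed)
```

The resulting column layout depends on which values happen to be present in the input, which is exactly what I would like to avoid.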

jreback · 2020-10-06T15:44:41Z

well, after the unstack then

i don't really see this as very common

Hoeze · 2020-10-06T16:22:31Z

Actually, I run into this problem quite often when I want to make sure that the dataframe schema is propagated unchanged. Also, I cannot think of any solution to this problem besides changing the "unstack" function.

Do you know a workaround @jreback?

rhshadrach · 2021-02-11T02:22:17Z

would ensure that the resulting dataframe has exactly the toplevel columns

It's not clear to me what would happen if the index level being unstacked has more or less values. Would they both raise? If so, why not just check the column of the result and raise then?

Hoeze · 2021-02-13T15:07:03Z

would ensure that the resulting dataframe has exactly the toplevel columns

It's not clear to me what would happen if the index level being unstacked has more or less values. Would they both raise? If so, why not just check the column of the result and raise then?

No error should be raised. Instead, values that are not in the list should simply be dropped, and listed values that are missing from the index level should be filled with N/A.

When specifying the values which should be unstacked, I want to ensure that the resulting dataframe always has the same schema, independent of the input.

Compare this to e.g. Spark. There you always know beforehand what the resulting schema will be, but therefore you have to explicitly specify the values which should be converted to columns.

rhshadrach · 2021-02-13T15:53:39Z

@Hoeze: Here is an attempt, I'd be interested in hearing if there are edge-cases this misses. It works on the empty frame.

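import numpy as np
import pandas as pd


def rigid_unstack(df, level, values, fill_value=np.nan):
    # Keep only rows whose `level` entry is one of the requested values,
    # unstack that level, then reindex so every (column, value) pair exists.
    return (
        df[df.index.get_level_values(level).isin(values)]
        .unstack(level)
        .reindex(
            columns=pd.MultiIndex.from_product([df.columns, values]),
            fill_value=fill_value,
        )
    )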

e.g.

df = pd.DataFrame({'a': [0, 1], 'b': [2, 3], 'c': [4, 5], 'd': [6, 7]}).set_index(['a', 'b'])
print(rigid_unstack(df, 0, [1, 2, 3]))

produces

   c          d        
a  1   2   3  1   2   3
b                      
3  5 NaN NaN  7 NaN NaN

jreback · 2021-02-13T16:10:31Z

u can just reindex

rhshadrach · 2021-02-13T16:21:36Z

Ahh, thanks @jreback - comment updated.

Hoeze · 2021-03-23T19:33:43Z

Thanks a lot for solving this @rhshadrach, looks great 😀
Can we have that in the next Pandas version?

rhshadrach · 2021-03-24T11:17:17Z

I can do you one better @Hoeze - that function can be executed on the current version of pandas ;)

In all seriousness though, this is three separate operations: select, unstack, and reindex. I do not think we should add a new function to the API just because there is a use case where they are used together.
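
As a rough sketch of what those three steps look like when written out one at a time (the intermediate names `selected`, `unstacked`, `target` and `result` are only for illustration; the data is the small example frame from earlier in the thread):

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [0, 1], "b": [2, 3], "c": [4, 5], "d": [6, 7]}
).set_index(["a", "b"])
values = [1, 2, 3]

# 1. Select: keep rows whose level-0 entry is one of the requested values.
selected = df[df.index.get_level_values(0).isin(values)]

# 2. Unstack: move level 0 into the columns; only the value 1 is present.
unstacked = selected.unstack(0)

# 3. Reindex: force the full (column, value) grid, filling gaps with NaN.
target = pd.MultiIndex.from_product([df.columns, values])
result = unstacked.reindex(columns=target)

print(selected.shape, unstacked.shape, result.shape)
```

Each step is an existing pandas operation, so the fixed column layout comes entirely from the final reindex.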

Hoeze · 2021-03-25T11:49:35Z

Yes, but having this as an option inside unstack would have been nice 🙂
Otherwise feel free to close, I do have a working solution to my problem now 👍

jreback · 2021-03-25T11:53:39Z

this is not a good default as it's a bit unexpected

Hoeze added the Enhancement and Needs Triage labels Oct 6, 2020

jreback added this to the No action milestone Mar 25, 2021

jreback closed this as completed Mar 25, 2021
