Skip to content

Simultaneously melt multiple columns #17676

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
tdpetrou opened this issue Sep 25, 2017 · 3 comments
Open

Simultaneously melt multiple columns #17676

tdpetrou opened this issue Sep 25, 2017 · 3 comments
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@tdpetrou
Copy link
Contributor

Melt Enhancement

Summary: This is a proposal with a pull request to enhance melt to simultaneously melt multiple groups of columns and to add functionality from wide_to_long along with better MultiIndexing capabilities. See this notebook for more examples.

  • Melts different groups of columns by passing a list of lists into value_vars. Each group gets melted into its own column. This feature replaces the need for lreshape.
  • When melting different groups of columns, groups do not have to be the same length. The shorter groups are filled with missing values.
  • Adds parameters stubnames(boolean), prefix and sep from function wide_to_long. It keeps the suffixes in separate columns and does not align them in the same way.
  • Can select any number of MultiIndex levels and greatly increase MultiIndex functionality
  • Works with repeated column names, which normally show up when selecting a subset of MultiIndex levels
  • Performance is ~30-40% faster than original melt, slightly slower than lreshape and much faster than wide_to_long
>>> df = pd.DataFrame({'City': ['Houston', 'Austin', 'Hoover'],
                   'State': ['Texas', 'Texas', 'Alabama'],
                   'Name':['Aria', 'Penelope', 'Niko'],
                   'Mango':[4, 10, 90],
                   'Orange': [10, 8, 14], 
                   'Watermelon':[40, 99, 43],
                   'Gin':[16, 200, 34],
                   'Vodka':[20, 33, 18]},
                 columns=['City', 'State', 'Name', 'Mango', 'Orange', 'Watermelon', 'Gin', 'Vodka'])

      City    State      Name  Mango  Orange  Watermelon  Gin  Vodka
0  Houston    Texas      Aria      4      10          40   16     20
1   Austin    Texas  Penelope     10       8          99  200     33
2   Hoover  Alabama      Niko     90      14          43   34     18

Use a list of lists in value_vars to melt the fruit and drinks

>>> df.melt(id_vars=['City', 'State'], value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']], 
                    var_name=['Fruit', 'Drink'], value_name=['Pounds', 'Ounces'])

      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    nan     NaN
7   Austin    Texas  Watermelon      99    nan     NaN
8   Hoover  Alabama  Watermelon      43    nan     NaN

wide_to_long functionality. Added parameters stubnames(boolean), sep and suffix.

>>> df1 = pd.DataFrame({'group': ['a', 'b', 'c'],
                   'exp_1':[4, 10, -9],
                   'exp_2': [10, 8, 14], 
                   'res_1':[8, 5, 4],
                   'res_3':[11, 0, 7]}, columns=['group', 'exp_1', 'exp_2', 'res_1', 'res_3'])

  group  exp_1  exp_2  res_1  res_3
0     a      4     10      8     11
1     b     10      8      5      0
2     c     -9     14      4      7

>>> df1.melt(id_vars='group', value_vars=['exp','res'], stubnames=True, sep='_')

  group  variable_exp  exp  variable_res  res
0     a             1    4             1    8
1     b             1   10             1    5
2     c             1   -9             1    4
3     a             2   10             3   11
4     b             2    8             3    0
5     c             2   14             3    7

Also adds support for all kinds of multiindexing

>>> df2 = df.copy()
>>> df2.columns = pd.MultiIndex.from_arrays([list('aabbcccd'), list('ffffgggg'), df.columns], 
                                       names=[None, None, 'some vars'])

                 a                  b            c                     d
                 f                  f            g                     g
some vars     City    State      Name Mango Orange Watermelon  Gin Vodka
0          Houston    Texas      Aria     4     10         40   16    20
1           Austin    Texas  Penelope    10      8         99  200    33
2           Hoover  Alabama      Niko    90     14         43   34    18

>>> df2.melt(id_vars=[('a', 'f', 'State')], 
           value_vars=[[('b', 'f', 'Name'), ('c', 'g', 'Watermelon')],
                       [('b','f','Mango'), ('c','g', 'Orange'), ('d', 'g', 'Vodka')]],
           var_name=[['myvar1', 'myvar2', 'myvar3'],
                     ['next_myvar1', 'next_myvar2', 'next_myvar3']],
           value_name=['some values', 'more_values'])

 (a, f, State) myvar1 myvar2      myvar3 some values next_myvar1 next_myvar2  \
0         Texas      b      f        Name        Aria           b           f   
1         Texas      b      f        Name    Penelope           b           f   
2       Alabama      b      f        Name        Niko           b           f   
3         Texas      c      g  Watermelon          40           c           g   
4         Texas      c      g  Watermelon          99           c           g   
5       Alabama      c      g  Watermelon          43           c           g   
6         Texas    nan    nan         nan         NaN           d           g   
7         Texas    nan    nan         nan         NaN           d           g   
8       Alabama    nan    nan         nan         NaN           d           g   

  next_myvar3  more_values  
0       Mango            4  
1       Mango           10  
2       Mango           90  
3      Orange           10  
4      Orange            8  
5      Orange           14  
6       Vodka           20  
7       Vodka           33  
8       Vodka           18 

Problem description

Currently, there is poor support for simultaneous melting of multiple groups of columns. lreshape is old and undocumented. wide_to_long api does not match melt and it's slow.

@tdpetrou tdpetrou mentioned this issue Sep 25, 2017
2 tasks
@sinhrks sinhrks added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Sep 26, 2017
@mansi195
Copy link

Its showing me following error when I am using - list of lists in value_vars:
Location based indexing can only have [labels (MUST BE IN THE INDEX), slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean] types

@Nasiek
Copy link

Nasiek commented Sep 28, 2019

the example with fruits and drinks is throwing an ValueError: Location based indexing can only have [labels (MUST BE IN THE INDEX), slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean] types for the fruits, drinks example...

@samukweku
Copy link
Contributor

samukweku commented Apr 21, 2022

One efficient option for simultaneously melting multiple columns is pivot_longer from pyjanitor ... it offers a variety of options for transforming to long form; for this particular use case, simply pass a list of regular expressions that match the columns to be melted :

# pip install pyjanitor 
import pandas as pd
import janitor

Solution to the first example:

(df
.pivot_longer(
    index = ['City', 'State'], 
    column_names = slice('Mango', 'Vodka'), 
    names_to = ('Fruit', 'Drink'), 
    values_to = ['Pounds', 'Ounces'], 
    names_pattern = ['M|O|W', 'G|V'])
)
      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    NaN     NaN
7   Austin    Texas  Watermelon      99    NaN     NaN
8   Hoover  Alabama  Watermelon      43    NaN     NaN

Solution to the second example:

(df1
.pivot_longer(
    index = 'group', 
    names_to = ('variable_exp', 'variable_res'), 
    values_to = ('exp', 'res'), 
    names_pattern = ('exp', 'res'))
.assign(
    variable_exp = lambda df: df.variable_exp.str[-1], 
    variable_res = lambda df: df.variable_res.str[-1])
)
  group variable_exp  exp variable_res  res
0     a            1    4            1    8
1     b            1   10            1    5
2     c            1   -9            1    4
3     a            2   10            3   11
4     b            2    8            3    0
5     c            2   14            3    7

You can view the source_code, specifically from line 815.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants