Simultaneously melt multiple columns #17676

tdpetrou · 2017-09-25T23:50:32Z

Melt Enhancement

Summary: This is a proposal with a pull request to enhance melt to simultaneously melt multiple groups of columns and to add functionality from wide_to_long along with better MultiIndexing capabilities. See this notebook for more examples.

Melts different groups of columns by passing a list of lists into value_vars. Each group gets melted into its own column. This feature replaces the need for lreshape.
When melting different groups of columns, groups do not have to be the same length. The shorter groups are filled with missing values.
Adds parameters stubnames(boolean), prefix and sep from function wide_to_long. It keeps the suffixes in separate columns and does not align them in the same way.
Can select any number of MultiIndex levels and greatly increase MultiIndex functionality
Works with repeated column names, which normally show up when selecting a subset of MultiIndex levels
Performance is ~30-40% faster than original melt, slightly slower than lreshape and much faster than wide_to_long

>>> df = pd.DataFrame({'City': ['Houston', 'Austin', 'Hoover'],
                   'State': ['Texas', 'Texas', 'Alabama'],
                   'Name':['Aria', 'Penelope', 'Niko'],
                   'Mango':[4, 10, 90],
                   'Orange': [10, 8, 14], 
                   'Watermelon':[40, 99, 43],
                   'Gin':[16, 200, 34],
                   'Vodka':[20, 33, 18]},
                 columns=['City', 'State', 'Name', 'Mango', 'Orange', 'Watermelon', 'Gin', 'Vodka'])

      City    State      Name  Mango  Orange  Watermelon  Gin  Vodka
0  Houston    Texas      Aria      4      10          40   16     20
1   Austin    Texas  Penelope     10       8          99  200     33
2   Hoover  Alabama      Niko     90      14          43   34     18

Use a list of lists in value_vars to melt the fruit and drinks

>>> df.melt(id_vars=['City', 'State'], value_vars=[['Mango', 'Orange', 'Watermelon'], ['Gin', 'Vodka']], 
                    var_name=['Fruit', 'Drink'], value_name=['Pounds', 'Ounces'])

      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    nan     NaN
7   Austin    Texas  Watermelon      99    nan     NaN
8   Hoover  Alabama  Watermelon      43    nan     NaN

wide_to_long functionality. Added parameters stubnames(boolean), sep and suffix.

>>> df1 = pd.DataFrame({'group': ['a', 'b', 'c'],
                   'exp_1':[4, 10, -9],
                   'exp_2': [10, 8, 14], 
                   'res_1':[8, 5, 4],
                   'res_3':[11, 0, 7]}, columns=['group', 'exp_1', 'exp_2', 'res_1', 'res_3'])

  group  exp_1  exp_2  res_1  res_3
0     a      4     10      8     11
1     b     10      8      5      0
2     c     -9     14      4      7

>>> df1.melt(id_vars='group', value_vars=['exp','res'], stubnames=True, sep='_')

  group  variable_exp  exp  variable_res  res
0     a             1    4             1    8
1     b             1   10             1    5
2     c             1   -9             1    4
3     a             2   10             3   11
4     b             2    8             3    0
5     c             2   14             3    7

Also adds support for all kinds of multiindexing

>>> df2 = df.copy()
>>> df2.columns = pd.MultiIndex.from_arrays([list('aabbcccd'), list('ffffgggg'), df.columns], 
                                       names=[None, None, 'some vars'])

                 a                  b            c                     d
                 f                  f            g                     g
some vars     City    State      Name Mango Orange Watermelon  Gin Vodka
0          Houston    Texas      Aria     4     10         40   16    20
1           Austin    Texas  Penelope    10      8         99  200    33
2           Hoover  Alabama      Niko    90     14         43   34    18

>>> df2.melt(id_vars=[('a', 'f', 'State')], 
           value_vars=[[('b', 'f', 'Name'), ('c', 'g', 'Watermelon')],
                       [('b','f','Mango'), ('c','g', 'Orange'), ('d', 'g', 'Vodka')]],
           var_name=[['myvar1', 'myvar2', 'myvar3'],
                     ['next_myvar1', 'next_myvar2', 'next_myvar3']],
           value_name=['some values', 'more_values'])

 (a, f, State) myvar1 myvar2      myvar3 some values next_myvar1 next_myvar2  \
0         Texas      b      f        Name        Aria           b           f   
1         Texas      b      f        Name    Penelope           b           f   
2       Alabama      b      f        Name        Niko           b           f   
3         Texas      c      g  Watermelon          40           c           g   
4         Texas      c      g  Watermelon          99           c           g   
5       Alabama      c      g  Watermelon          43           c           g   
6         Texas    nan    nan         nan         NaN           d           g   
7         Texas    nan    nan         nan         NaN           d           g   
8       Alabama    nan    nan         nan         NaN           d           g   

  next_myvar3  more_values  
0       Mango            4  
1       Mango           10  
2       Mango           90  
3      Orange           10  
4      Orange            8  
5      Orange           14  
6       Vodka           20  
7       Vodka           33  
8       Vodka           18

Problem description

Currently, there is poor support for simultaneous melting of multiple groups of columns. lreshape is old and undocumented. wide_to_long api does not match melt and it's slow.

The text was updated successfully, but these errors were encountered:

mansi195 · 2018-07-19T13:04:47Z

Its showing me following error when I am using - list of lists in value_vars:
Location based indexing can only have [labels (MUST BE IN THE INDEX), slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean] types

Nasiek · 2019-09-28T22:31:55Z

the example with fruits and drinks is throwing an ValueError: Location based indexing can only have [labels (MUST BE IN THE INDEX), slices of labels (BOTH endpoints included! Can be slices of integers if the index is integers), listlike of labels, boolean] types for the fruits, drinks example...

samukweku · 2022-04-21T01:19:00Z

One efficient option for simultaneously melting multiple columns is pivot_longer from pyjanitor ... it offers a variety of options for transforming to long form; for this particular use case, simply pass a list of regular expressions that match the columns to be melted :

# pip install pyjanitor 
import pandas as pd
import janitor

Solution to the first example:

(df
.pivot_longer(
    index = ['City', 'State'], 
    column_names = slice('Mango', 'Vodka'), 
    names_to = ('Fruit', 'Drink'), 
    values_to = ['Pounds', 'Ounces'], 
    names_pattern = ['M|O|W', 'G|V'])
)
      City    State       Fruit  Pounds  Drink  Ounces
0  Houston    Texas       Mango       4    Gin    16.0
1   Austin    Texas       Mango      10    Gin   200.0
2   Hoover  Alabama       Mango      90    Gin    34.0
3  Houston    Texas      Orange      10  Vodka    20.0
4   Austin    Texas      Orange       8  Vodka    33.0
5   Hoover  Alabama      Orange      14  Vodka    18.0
6  Houston    Texas  Watermelon      40    NaN     NaN
7   Austin    Texas  Watermelon      99    NaN     NaN
8   Hoover  Alabama  Watermelon      43    NaN     NaN

Solution to the second example:

(df1
.pivot_longer(
    index = 'group', 
    names_to = ('variable_exp', 'variable_res'), 
    values_to = ('exp', 'res'), 
    names_pattern = ('exp', 'res'))
.assign(
    variable_exp = lambda df: df.variable_exp.str[-1], 
    variable_res = lambda df: df.variable_res.str[-1])
)
  group variable_exp  exp variable_res  res
0     a            1    4            1    8
1     b            1   10            1    5
2     c            1   -9            1    4
3     a            2   10            3   11
4     b            2    8            3    0
5     c            2   14            3    7

You can view the source_code, specifically from line 815.

tdpetrou mentioned this issue Sep 25, 2017

Melt enhance #17677

Closed

2 tasks

sinhrks added Reshaping Concat, Merge/Join, Stack/Unstack, Explode API Design labels Sep 26, 2017

mroeschke added Enhancement and removed API Design labels Jun 12, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simultaneously melt multiple columns #17676

Simultaneously melt multiple columns #17676

tdpetrou commented Sep 25, 2017

mansi195 commented Jul 19, 2018

Nasiek commented Sep 28, 2019

samukweku commented Apr 21, 2022 •

edited

Loading

Simultaneously melt multiple columns #17676

Simultaneously melt multiple columns #17676

Comments

tdpetrou commented Sep 25, 2017

Melt Enhancement

Problem description

mansi195 commented Jul 19, 2018

Nasiek commented Sep 28, 2019

samukweku commented Apr 21, 2022 • edited Loading

samukweku commented Apr 21, 2022 •

edited

Loading