CLN/API: wide_to_long or lreshape #15003

jreback · 2016-12-28T11:11:41Z

In [27]: data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
    ...:                       'team': ['Red Sox', 'Yankees'],
    ...:                       'year1': [2007, 2008], 'year2': [2008, 2008]})
    ...: 

In [28]: data
Out[28]: 
   hr1  hr2     team  year1  year2
0  514  545  Red Sox   2007   2008
1  573  526  Yankees   2008   2008

In [29]: pd.lreshape(data, {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
Out[29]: 
      team  year   hr
0  Red Sox  2007  514
1  Yankees  2008  573
2  Red Sox  2008  545
3  Yankees  2008  526

In [30]: pd.wide_to_long(data, ['hr', 'year'], 'team', 'index')
Out[30]: 
                hr  year
team    index           
Red Sox 1      514  2007
Yankees 1      573  2008
Red Sox 2      545  2008
Yankees 2      526  2008

So we should drop one of these.


jreback · 2016-12-28T11:12:15Z

cc @Nuffe

erikcs · 2016-12-28T16:35:24Z

Yes, having both is redundant, but I think wide_to_long is more flexible?

lreshape does not handle group variables of different length
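A minimal sketch of the different-length case on the wide_to_long side, assuming a variant of the frame above in which year2 has been dropped (that variant is an assumption for illustration, not one of the test cases mentioned in this thread). The hr group then has two columns and the year group only one, and the missing combination shows up as NaN:

```python
import pandas as pd

# Variant of the example frame: 'hr' has suffixes 1 and 2, 'year' only 1.
data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
                     'team': ['Red Sox', 'Yankees'],
                     'year1': [2007, 2008]})

# Same call as In [30]; sort_index() only makes the row order predictable.
long = pd.wide_to_long(data, ['hr', 'year'], 'team', 'index').sort_index()

# The (team, 2) rows keep their hr value and get NaN for year.
print(long)
```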

wide_to_long produces the correct result for lreshape's test case (but that is with dropna=False, which is also the output Stata gives)

I could not make lreshape produce the intended output for all of wide_to_long's test cases. This one, or this, for example.

tdpetrou · 2017-08-18T19:35:04Z

Is lreshape getting deprecated? There are some SO answers getting a decent amount of upvotes.

tdpetrou · 2017-08-22T23:11:27Z

@jreback I really like wide_to_long as it's the easiest way to 'simultaneously melt' different sets of columns. It would be nice if the identification variables, i, were optional, as lreshape is slightly easier when there are no identification variables. Also, it would be good if i were changed to id_vars and j changed to var_name. Maybe this can all be solved if melt were to take a list of lists of columns.

jreback · 2017-08-24T12:48:07Z

@tdpetrou well reducing the API surface area is good. not averse to modifying .melt() to do this. if you have a proposal pls put it up.

jreback · 2017-08-24T12:49:08Z

the issue is we have 3 functions to do somewhat similar things. happy to consolidate the API. (aside from which documentation on lreshape is nil and wide_to_long not much better)

tdpetrou · 2017-08-24T15:44:00Z

The simplest addition to melt would be to add functionality to do the simultaneous melting of different sets of columns. I think this would be achievable with the value_vars parameter accepting a list of lists or even a dictionary of lists (like lreshape). I think this would eliminate any use of lreshape.
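To make the dictionary-of-lists idea concrete, here is a rough sketch of what such a simultaneous melt could do. `melt_groups` is a hypothetical helper written for this illustration; it is not a pandas function and not the proposed melt signature, and it assumes every group lists the same number of columns.

```python
import numpy as np
import pandas as pd


def melt_groups(df, id_vars, groups):
    """Hypothetical helper: stack each list of columns into one output column.

    groups maps an output column name to the wide columns that feed it,
    in the same spirit as the dict passed to pd.lreshape.
    """
    lengths = {len(cols) for cols in groups.values()}
    if len(lengths) != 1:
        raise ValueError("all column lists must have the same length")
    k = lengths.pop()
    # Repeat the identifier columns once per stacked block of rows.
    out = {col: np.tile(df[col].to_numpy(), k) for col in id_vars}
    # Concatenate the wide columns of each group end to end.
    for target, cols in groups.items():
        out[target] = np.concatenate([df[c].to_numpy() for c in cols])
    return pd.DataFrame(out)


data = pd.DataFrame({'hr1': [514, 573], 'hr2': [545, 526],
                     'team': ['Red Sox', 'Yankees'],
                     'year1': [2007, 2008], 'year2': [2008, 2008]})

result = melt_groups(data, ['team'],
                     {'year': ['year1', 'year2'], 'hr': ['hr1', 'hr2']})
print(result)
```

On the frame from the top of the thread this gives the same rows as the lreshape call in In [29].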

To add the functionality of pd.wide_to_long, you might have to add three parameters, stubs, sep and suffix, where stubs would be a boolean indicating whether the value_vars are stubnames.
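For context on what sep and suffix would carry over, this is a small sketch of how wide_to_long uses them in current pandas releases. The column names hr_first and hr_second and the index name half are made up for the illustration; the values are reused from the frame at the top of the thread.

```python
import pandas as pd

# Made-up wide layout: stub 'hr', separator '_', non-numeric suffixes.
df = pd.DataFrame({'team': ['Red Sox', 'Yankees'],
                   'hr_first': [514, 573],
                   'hr_second': [545, 526]})

# sep is the text between stub and suffix; suffix is a regex for the suffix.
# The default suffix pattern only matches digits, so words need r'\w+'.
long = pd.wide_to_long(df, stubnames=['hr'], i='team', j='half',
                       sep='_', suffix=r'\w+').sort_index()
print(long)
```

Non-numeric suffixes are kept as strings in the new index level, so rows are addressed as ('Red Sox', 'first') and so on.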

erikcs · 2017-10-13T07:49:27Z

I agree the current configuration is not elegant. I made an earlier PR to wide_to_long to fix some edge cases that were wrong (which I discovered while cleaning a data set) but don't think it fits nicely into a consistent "calculus of data manipulations".

Looking to R and the "tidyverse" they now and then change their API and introduce new "verbs" for existing concepts: before, long was melt, and wide was done with dcast. Now it's gather and spread. In econometrics and statistics, long and wide is the common nomenclature, and is what Stata adheres to. Stata may be a dinosaur, but they are extremely consistent in their API and naming scheme.

Pandas' melt is a copy of Hadley Wickham's melt, which is a modification of base R's reshape (same command name as Stata, by the way) with a new name, giving the API an impression of bits and pieces taken from here and there.

I don't really have a good and general proposal for a solution here, more than that IMHO a nomenclature should perhaps be chosen and stuck with.

tdpetrou · 2017-10-13T14:08:06Z

@erikcs I made a major enhancement to melt in #17677. With that, it can simultaneously melt any number of columns, and supports any kind of multiindex (it had very poor support before that) and handles duplicate column names as well. It also has wide_to_long functionality and with a little more tweaking it will exactly replicate it.

mroeschke · 2021-05-02T01:04:34Z

It appears #34314 and #34313 are the more current discussion issues for deprecating one of these functions, so closing in favor of those.

jreback added API Design Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 28, 2016

jreback added this to the Next Major Release milestone Dec 28, 2016

jreback mentioned this issue Sep 22, 2017

wide_to_long does not convert integer suffixes to int #17627

Closed

tpanza mentioned this issue Apr 9, 2020

DOC: lreshape and wide_to_long references #33417

Closed

This was referenced May 22, 2020

DEPR: pd.lreshape #34313

Open

DEPR: pd.wide_to_long #34314

Open

simonjayhawkins mentioned this issue Jun 22, 2020

lreshape and wide_to_long documentation (Closes #33417) #33418

Merged

5 tasks

mroeschke closed this as completed May 2, 2021
