Melt enhance #17677

tdpetrou · 2017-09-25T23:51:25Z

[x] closes Simultaneously melt multiple columns #17676
[x] tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2017-09-26T00:12:22Z

pandas/core/reshape/reshape.py

-def melt(frame, id_vars=None, value_vars=None, var_name=None,
-         value_name='value', col_level=None):
+def _melt(frame, id_vars=None, value_vars=None, var_name=None,
+          value_name='value', col_level=None, stubnames=False,


I would be fine with another module called melt.py that contains all of the code for melt.

That way helper functions can be broken out and the implementation is understandable.

I created melt.py and test_melt.py. I moved melt out of reshape.py and corrected some imports. I heavily commented the code as well. I do need at least 20 more tests as there is a huge range of possibilities now with this new implementation.

jreback · 2017-09-26T13:32:48Z

pandas/tests/reshape/test_reshape.py

@@ -33,6 +34,33 @@ def setup_method(self, method):
        self.df1.columns = [list('ABC'), list('abc')]
        self.df1.columns.names = ['CAP', 'low']

+        self.df2 = DataFrame(
+            {'City': ['Houston', 'Austin', 'Hoover'],


also move to test_melt.py (all melt related)

jreback · 2017-09-26T16:25:23Z

@tdpetrou I wouldn't go too far atm. This needs quite a lot of review. I am not real happy with all of these arguments.

tdpetrou · 2017-09-26T16:29:44Z

Yea, the wide_to_long functionality can be nixed. It doesn't even exactly replicate it and then we'd be back to the same function signature

tdpetrou · 2017-09-26T16:35:06Z

If you really wanted to incorporate wide_to_long functionality and minimize arguments, you could have a single extra parameter prefix that takes either a string or a two-item tuple with the first item being the separator.
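For context, here is a minimal sketch of what the existing `pd.wide_to_long` already does with its `stubnames`, `sep`, and `suffix` arguments. The proposed single `prefix` parameter is not shown, since its exact form was never settled in this thread; the frame and column names below are made up for illustration.

```python
import pandas as pd

# Illustrative wide frame: two stub groups ("A" and "B"), each with suffixes 1 and 2.
wide = pd.DataFrame({
    "id": [1, 2],
    "A_1": [1, 2],
    "A_2": [3, 4],
    "B_1": [5, 6],
    "B_2": [7, 8],
})

# Existing pandas API: columns are matched as <stubname><sep><suffix>.
long = pd.wide_to_long(
    wide, stubnames=["A", "B"], i="id", j="num", sep="_", suffix=r"\d+"
)
print(long)
```

Each stub becomes one value column, and the suffix ends up in the `num` index level alongside `id`.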

tdpetrou · 2017-10-30T01:06:53Z

@jreback Can we review this? This new melt function adds a ton of new functionality that I think would be very beneficial.

jreback · 2017-10-30T01:11:26Z

if you can rebase would be great

jorisvandenbossche · 2017-10-30T10:12:10Z

@tdpetrou I would like to review this PR, but it is very difficult to see what has changed or what enhancements were added to melt. Can you therefore update the docstring, add a whatsnew note, and add narrative docs in reshaping.rst?
I agree with moving the implementation to a separate melt.py module, but for reviewing purposes I would prefer to keep it where it was and only move it afterwards (so not refactor + move at the same time).

> Yea, the wide_to_long functionality can be nixed. It doesn't even exactly replicate it and then we'd be back to the same function signature

For reviewing and discussion purposes, it might also be better to split this into two separate PRs: the ability to melt on multiple columns (the referenced issue) and the introduction of wide_to_long-like functionality. But I don't know to what extent the implementation of one is needed for the other (how independent are the two additions?).

jreback · 2017-10-30T12:54:14Z

This proposed API is very complicated. I would be much more in favor of something much simpler. Having a single function do a single thing is much more intuitive. A possible resolution is to introduce a lazy object with its own methods, akin to groupby, e.g.

r = df.reshape(.....)
r.melt()
r.to_long()

or somesuch, with each function performing 1 action
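A rough sketch of what such a lazy object could look like. Everything here is hypothetical: `df.reshape(...)` does not exist in pandas, and the `Reshaper` class, its constructor arguments, and its method signatures are assumptions made only to illustrate the idea. It simply delegates to the existing `DataFrame.melt` and `pd.wide_to_long`.

```python
import pandas as pd


class Reshaper:
    """Hypothetical lazy wrapper (not part of pandas): holds a frame and its
    id columns, and exposes one method per reshaping action."""

    def __init__(self, frame, id_vars):
        self._frame = frame
        self._id_vars = list(id_vars)

    def melt(self, **kwargs):
        # Delegates to the existing DataFrame.melt.
        return self._frame.melt(id_vars=self._id_vars, **kwargs)

    def to_long(self, stubnames, j, sep=""):
        # Delegates to the existing pd.wide_to_long.
        return pd.wide_to_long(
            self._frame, stubnames=stubnames, i=self._id_vars, j=j, sep=sep
        )


wide = pd.DataFrame({"id": [1, 2], "A_1": [1, 2], "A_2": [3, 4]})
r = Reshaper(wide, id_vars=["id"])
melted = r.melt(var_name="column")
long = r.to_long(stubnames=["A"], j="num", sep="_")
print(melted)
print(long)
```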

jorisvandenbossche · 2017-10-30T13:34:46Z

@jreback I am not a fan of that, as those are not really 'two-step' functions like you have with groupby/resample/rolling. So I don't see the point of doing something like that. If there is nothing really in common (apart from both being reshaping methods), they should just be separate functions/methods (that could live in the same namespace, although in pandas we tend to add things either top-level or as a method, so they already end up in the one big namespace).

Let's focus here on the ability to handle multiple columns in melt. I don't think that part is that controversial, and if I am not wrong (but I didn't look in detail yet) it does not add new keywords (is that correct, @tdpetrou?)

tdpetrou · 2017-10-30T14:19:02Z

To first address the two-step function: I agree with @jorisvandenbossche and don't think that is necessary. Although the wide_to_long functionality can be captured with a single additional parameter, let's keep things simple and just drop it completely from this PR.

So, I will:

Keep melt in its original file reshape.py
Remove the wide to long functionality - meaning no extra parameters
Change the whatsnew
Add notes to reshaping.rst

TomAugspurger · 2017-10-30T15:41:04Z

@tdpetrou can you take a look at #17459 and see how those would conflict / overlap with each other?

I haven't had a chance to go through the changes here in detail yet.

tdpetrou · 2017-10-30T18:54:08Z

I think everything is updated and rebased appropriately. I had to delete a few changes from #17628 that were mixed in here. I made a little notebook on the enhancement as well (https://nbviewer.jupyter.org/github/tdpetrou/Machine-Learning-Books-With-Python/blob/master/melt%20enhancemenet.ipynb)

@TomAugspurger I don't think it will be much of a problem to keep the index. It'll just be like another entry to id_vars

One more thing: There are two test failures in the wide_to_long test class. This is because my new melt function treats None and an empty list as the same thing when passed to either value_vars or id_vars. The current melt function will return an empty dataframe if you pass an empty list to value_vars. However, if you pass an empty list to id_vars then it returns the same thing as if it were None.
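The "None and empty list are the same" treatment can be sketched as below. `normalize_vars` is a hypothetical helper written for illustration, not the PR's actual code. The check on `id_vars` uses the existing `pd.melt`; the result for an empty `value_vars` list is deliberately not asserted here, since it may depend on the pandas version.

```python
import pandas as pd


def normalize_vars(vars_):
    """Hypothetical helper: map None, a scalar, or a list-like to a plain list,
    so that None and [] are handled identically downstream."""
    if vars_ is None:
        return []
    if isinstance(vars_, (list, tuple)):
        return list(vars_)
    return [vars_]


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# For id_vars, existing melt already treats an empty list like None.
default_ids = pd.melt(df)
empty_ids = pd.melt(df, id_vars=[])
print(default_ids.equals(empty_ids))

# With the helper, both spellings collapse to the same value.
print(normalize_vars(None) == normalize_vars([]))
```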

jreback · 2017-10-31T00:58:12Z

so, this is pretty much impossible to review as is. I would first do a simple PR which moves the implementation of .melt to a separate module. This would be a simple copy/paste. Then you can make this implementation much simpler by, for example, not in-lining functions inside melt itself, but rather defining them in the module.

jreback · 2017-10-31T00:55:54Z

doc/source/whatsnew/v0.22.0.txt

@@ -13,7 +13,7 @@ version.
 New features
 ~~~~~~~~~~~~

-
+- Simultaneous melting of independent groups of columns is now possible with ``melt``.


you will need a full example, plus pointers to the docs. but let's settle on an API first.

Added a sentence to the 'highlights' section and two examples to the 'New Features' section.

The API will be backwards compatible.

value_vars can take a list of lists

value_name and var_name can take a list of new column names

@TomAugspurger mentioned the addition of keep_index parameter. This can easily be added.

wide_to_long functionality can be exactly duplicated (and more) with a single additional parameter that takes a tuple of suffix and sep, but this might be too much for one function.

Regardless, wide_to_long can be refactored to be faster.

This new melt enhancement makes lreshape obsolete (other than it being faster)
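To make the list-of-lists idea concrete, here is a minimal sketch that emulates simultaneous melting with the existing `DataFrame.melt` and `pd.concat`. It is not the PR's implementation: `melt_groups` is a hypothetical helper, the data is invented, and the sketch assumes every column group has the same length so the melted pieces line up row by row.

```python
import pandas as pd

# Illustrative data: one id column and two independent groups of two columns each.
df = pd.DataFrame({
    "City": ["Houston", "Miami"],
    "Mango": [4, 10],
    "Orange": [10, 8],
    "Gin": [16, 200],
    "Vodka": [20, 33],
})


def melt_groups(frame, id_vars, value_groups, var_names, value_names):
    """Hypothetical helper: melt each column group on its own, then place the
    results side by side. Assumes all groups contain the same number of columns."""
    pieces = []
    for i, (cols, var, val) in enumerate(zip(value_groups, var_names, value_names)):
        part = frame.melt(
            id_vars=id_vars, value_vars=cols, var_name=var, value_name=val
        )
        if i > 0:
            # Keep the id columns only once.
            part = part.drop(columns=id_vars)
        pieces.append(part)
    return pd.concat(pieces, axis=1)


result = melt_groups(
    df,
    id_vars=["City"],
    value_groups=[["Mango", "Orange"], ["Gin", "Vodka"]],
    var_names=["Fruit", "Drink"],
    value_names=["Pounds", "Ounces"],
)
print(result)
```

Each group gets its own variable column and value column, which is the behaviour described above for a list of lists in `value_vars` combined with lists for `var_name` and `value_name`.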

jreback · 2017-10-31T00:56:21Z

pandas/core/frame.py

+
+    Simultaneously melt multiple groups of columns:
+
+    >>> df2 = pd.DataFrame({'City': ['Houston', 'Miami'],


simple examples first

This is as simple as it gets: one id column, two column groups each of length two, and only two rows of data.

tdpetrou · 2017-11-02T19:39:49Z

@jreback
I ....

put melt in its own module
removed any nested functions
made a separate tests module
added a couple examples to whatsnew
simplified an example and added one in the reshaping docs

jreback

I want a separate PR entirely which simply moves melt to melt.py. Don't add anything (except changes that are required, in .api for example) beyond testing to make sure it all passes green. We will then merge that. You can rebase this PR on top. This way it's easy to see what you actually changed.

jreback · 2017-11-03T00:06:01Z

doc/source/reshaping.rst

+   df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
+                      'Mango':[4, 10, 90],
+                      'Orange': [10, 8, 14], 
+                      'Watermelon':[40, 99, 43]},


generally like to keep docs to 80 chars or less (readability)

jreback · 2017-11-03T00:06:37Z

doc/source/whatsnew/v0.22.0.txt

+
+Simultaneous unpivoting of independent groups of columns with ``melt``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Previously, ``melt`` was only able to unpivot a single group of columns. This was done by passing all the column names in the group as a list to the ``value_vars`` parameter.


add the issue number(s)

jreback · 2017-11-03T00:07:09Z

pandas/core/frame.py

-    var_name : scalar
-        Name to use for the 'variable' column. If None it uses
+        Column(s) to unpivot. If list of lists, simultaneously unpivot
+        each sublist into its own variable column. If not specified, uses all


add a mention where this changes as appropriate (meaning in 0.22.0) in the doc-string

tdpetrou · 2017-11-07T13:09:02Z

@jreback Getting back to this now. I put melt in its own module as requested: #18148

pep8speaks · 2017-12-10T22:34:12Z

Hello @tdpetrou! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 10, 2017 at 22:57 Hours UTC

tdpetrou · 2017-12-10T22:40:16Z

@jreback I just rebased this and would like to work on it

jreback · 2017-12-10T23:07:00Z

ok. this is doing a lot of things. I would appreciate it if you broke this apart into separate, distinct changes. The code needs some work. I can comment, but it is easier to do in smaller pieces.

tdpetrou · 2017-12-11T16:41:55Z

@jreback There is basically one piece, now that the wide_to_long functionality has been removed. Essentially, you pass a list of lists into value_vars to simultaneously melt those sets of columns.

I suppose you could break it up into 3 pieces

Make only original melt work with my new way
Add in MultiIndex support - this handles all varieties of string/integer MultiIndex
Finally add in simultaneous melting

I think I can break it up like this without too much effort. What do you think?

jreback · 2018-02-10T18:43:05Z

closing as stale. ping if you want to update.

jreback requested changes Sep 26, 2017


jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Sep 26, 2017

tdpetrou mentioned this pull request Oct 13, 2017

CLN/API: wide_to_long or lreshape #15003

Closed

jreback requested changes Oct 31, 2017


jreback requested changes Nov 3, 2017


tdpetrou mentioned this pull request Nov 7, 2017

melt moved into its own module #18148

Merged

jreback mentioned this pull request Nov 12, 2017

BUG: coerce pd.wide_to_long suffixes to ints #17628

Merged

3 tasks

tdpetrou added 4 commits December 10, 2017 17:07

BUG: coerce pd.wide_to_long suffixes to numeric

ce2499a

ENH: simultaneous melting

b4f3a30

added lots of comments

d570a71

updated melt docs and put melt in own module

755c3db

tdpetrou added 2 commits December 10, 2017 17:50

added newline

68e55d9

0.21 whatsnew back to original

614fc01

jreback closed this Feb 10, 2018

smsaladi mentioned this pull request Oct 8, 2019

ENH: Add optional argument keep_index to dataframe melt method (merged master onto old PR) #28859

Closed

5 tasks
