Skip to content

Melt enhance #17677

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Melt enhance #17677

wants to merge 6 commits into from

Conversation

tdpetrou
Copy link
Contributor

def melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None):
def _melt(frame, id_vars=None, value_vars=None, var_name=None,
value_name='value', col_level=None, stubnames=False,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would be fine with another module called melt.py
which contained all of the code for melt;

that way helper functions can be broken out and impl is understandable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I created melt.py and test_melt.py. I moved melt out of reshape.py and corrected some imports. I heavily commented the code as well. I do need at least 20 more tests as there is a huge range of possibilities now with this new implementation.

@@ -33,6 +34,33 @@ def setup_method(self, method):
self.df1.columns = [list('ABC'), list('abc')]
self.df1.columns.names = ['CAP', 'low']

self.df2 = DataFrame(
{'City': ['Houston', 'Austin', 'Hoover'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also move to test_melt.py (all melt related)

@jreback jreback added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Sep 26, 2017
@jreback
Copy link
Contributor

jreback commented Sep 26, 2017

@tdpetrou I wouldn't go too far atm. This needs quite a lot of review. I am not real happy with all of these arguments.

@tdpetrou
Copy link
Contributor Author

Yea, the wide_to_long functionality can be nixed. It doesn't even exactly replicate it and then we'd be back to the same function signature

@tdpetrou
Copy link
Contributor Author

If you really wanted to incorporate wide_to_long functionality and minimize arguments, you could have a single extra parameter prefix that takes either a string or a two-item tuple with the first item being the separator.

@tdpetrou
Copy link
Contributor Author

@jreback Can we review this? This new melt function adds a ton of new functionality that I think would be very beneficial.

@jreback
Copy link
Contributor

jreback commented Oct 30, 2017

if you can rebase would be great

@jorisvandenbossche
Copy link
Member

@tdpetrou I would like to review this PR, but it is very difficult to see what has changed or what are the added enhancements to melt. Therefore, can you update the docstring, add a whatsnew note and add narrative docs in reshaping.rst ?
I agree with moving the implementation to a separate melt.py module, but for reviewing purposes I maybe would like to keep it where it was, and only afterwards move it (so not refactor + move at the same time).

Yea, the wide_to_long functionality can be nixed. It doesn't even exactly replicate it and then we'd be back to the same function signature

For reviewing and discussing purposes, it might also be better to keep it in two separate PRs (the ability to melt on multiple columns (the referenced issue) and the introduction of wide_to_long similar functionality). But I don't know to what extent the implementation of the one is needed for the other? (how independent both additions are?)

@jreback
Copy link
Contributor

jreback commented Oct 30, 2017

This proposed API is very complicated. i would be much more in favor of something much simpler. Having a single function do a single thing is much much more intutive. A possible resolution is to introduce a lazy object with methods itself, akin to groupby. e.g.

r = df.reshape(.....)
r.melt()
r.to_long()

or somesuch, with each function performing 1 action

@jorisvandenbossche
Copy link
Member

@jreback I am not a fan of that, as those are not really 'two-step' functions like you have with groupby/resample/rolling. So I don't see the point of doing something like that. If there is nothing really in common (apart from that it are reshaping methods), it should just be separate functions/methods (that could live in the same namespace, although in pandas we tend to either add it top-level or as method, so then already is in the one big namespace).

Let's focus here on the ability of handling multiple columns in melt, I don't think that part is that controversial, and if I am not wrong (but didn't look yet in detail) does not add new keywords (is that correct @tdpetrou ?)

@tdpetrou
Copy link
Contributor Author

To first address the two-step function. I agree with @jorisvandenbossche and don't think that is necessary. Although, the wide_to_long functionality can be captured with a single additional parameter, let's keep things simple and just drop it completely from this PR.

So, I will:

  • Keep melt in its original file reshape.py
  • Remove the wide to long functionality - meaning no extra parameters
  • Change the whatsnew
  • Add notes to reshaping.rst

@TomAugspurger
Copy link
Contributor

@tdpetrou can you take a look at #17459 and see how those would conflict / overlap with eachother?

I haven't had a chance to go through the changes here in detail yet.

@tdpetrou
Copy link
Contributor Author

tdpetrou commented Oct 30, 2017

I think everything is updated and rebased appropriately. I had to delete a few changes #17628 that were mixed in here. I made a little notebook on the enhancement as well (https://nbviewer.jupyter.org/github/tdpetrou/Machine-Learning-Books-With-Python/blob/master/melt%20enhancemenet.ipynb)

@TomAugspurger I don't think it will be much of a problem to keep the index. It'll just be like another entry toid_vars

One more thing: There are two test failures in the wide_to_long test class. This is because my new melt function treats None and an empty list as the same thing when passed to either value_vars or id_vars. The current melt function will return an empty dataframe if you pass an empty list to value_vars. However, if you pass an empty list to id_vars then it returns the same thing as if it were None.

@jreback
Copy link
Contributor

jreback commented Oct 31, 2017

so, this is pretty much impossible to review as is. I would first do a simple PR which moves the implementation of .melt to a separate module. This would be a simple copy/paste. Then you can make this implementation much more simple, but for example not in-lining functions inside melt itself, rather defining them in the module.

@@ -13,7 +13,7 @@ version.
New features
~~~~~~~~~~~~

-
- Simultaneous melting of independent groups of columns is now possible with ``melt``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you will need a full example, pls pointers to the docs. but let's settle on an API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added Sentence to 'highlights' section and two examples to 'New Features' section.

API will be backwards compatiable.

  • value_vars can take a list of lists
  • value_name and var_name can take a list of new column names

@TomAugspurger mentioned the addition of keep_index parameter. This can easily be added.

wide_to_long funcitonality can be exactly duplicated (and more) with a single additional parameter that takes a tuple of suffix and sep but this might be too much for one function.

Regardless, wide_to_long can be refactored to be faster.

This new melt enhancement makes lreshape obsolete (other than it being faster)


Simultaneously melt multiple groups of columns:

>>> df2 = pd.DataFrame({'City': ['Houston', 'Miami'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simple examples first

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is as simple as it gets. One id column, two column groups of each length two and only two rows of data.

@tdpetrou
Copy link
Contributor Author

tdpetrou commented Nov 2, 2017

@jreback
I ....

  • put melt in its own module
  • removed any nested functions
  • made a separate tests module
  • added a couple examples to whatsnew
  • simplified an example and added one in the reshaping docs

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want a separate PR entirely which simply moves melt to melt.py. don't add anything (except changes that are required in .api for example) or testing to make sure it all passes green. we will then merge that. you can rebase this pr on top. this way its easy to see what you actually changed.

df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'],
'Mango':[4, 10, 90],
'Orange': [10, 8, 14],
'Watermelon':[40, 99, 43]},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generaly like to keep docs to 80 chars or less (readibility)


Simultaneous unpivoting of independent groups of columns with ``melt``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Previously, ``melt`` was only able to unpivot a single group of columns. This was done by passing all the column names in the group as a list to the ``value_vars`` parameter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number(s)

var_name : scalar
Name to use for the 'variable' column. If None it uses
Column(s) to unpivot. If list of lists, simultaneously unpivot
each sublist into its own variable column. If not specified, uses all
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a mention where this changes as appropriate (meaning in 0.22.0) in the doc-string

@tdpetrou
Copy link
Contributor Author

tdpetrou commented Nov 7, 2017

@jreback Getting back to this now. I put melt in its own module as requested #18148

@pep8speaks
Copy link

pep8speaks commented Dec 10, 2017

Hello @tdpetrou! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 10, 2017 at 22:57 Hours UTC

@tdpetrou
Copy link
Contributor Author

@jreback I just rebased this and would like to work on it

@jreback
Copy link
Contributor

jreback commented Dec 10, 2017

ok. this is doing a lot of things. I would appreciate to break this apart into separate distinct changes. The code needs some work. I can comment, but easier to do it in smaller pieces.

@tdpetrou
Copy link
Contributor Author

@jreback There is basically one piece, now that the wide_to_long functionality has been removed. Essentially, you pass a list of lists into value_vars to simultaneously melt those sets of columns.

I suppose you could break it up into 3 pieces

  • Make only original melt work with my new way
  • Add in MultiIndex support - this handles all varieties of string/integer MultiIndex
  • Finally add in simultaneously melt

I think I can break it up like this without too much effort. What do you think?

@jreback
Copy link
Contributor

jreback commented Feb 10, 2018

closing as stale. ping if you want to update.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simultaneously melt multiple columns
5 participants