-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Melt enhance #17677
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Melt enhance #17677
Conversation
pandas/core/reshape/reshape.py
Outdated
def melt(frame, id_vars=None, value_vars=None, var_name=None, | ||
value_name='value', col_level=None): | ||
def _melt(frame, id_vars=None, value_vars=None, var_name=None, | ||
value_name='value', col_level=None, stubnames=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would be fine with another module called melt.py
which contained all of the code for melt;
that way helper functions can be broken out and impl is understandable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created melt.py and test_melt.py. I moved melt out of reshape.py and corrected some imports. I heavily commented the code as well. I do need at least 20 more tests as there is a huge range of possibilities now with this new implementation.
pandas/tests/reshape/test_reshape.py
Outdated
@@ -33,6 +34,33 @@ def setup_method(self, method): | |||
self.df1.columns = [list('ABC'), list('abc')] | |||
self.df1.columns.names = ['CAP', 'low'] | |||
|
|||
self.df2 = DataFrame( | |||
{'City': ['Houston', 'Austin', 'Hoover'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also move to test_melt.py (all melt related)
@tdpetrou I wouldn't go too far atm. This needs quite a lot of review. I am not real happy with all of these arguments. |
Yea, the wide_to_long functionality can be nixed. It doesn't even exactly replicate it and then we'd be back to the same function signature |
If you really wanted to incorporate wide_to_long functionality and minimize arguments, you could have a single extra parameter prefix that takes either a string or a two-item tuple with the first item being the separator. |
@jreback Can we review this? This new melt function adds a ton of new functionality that I think would be very beneficial. |
if you can rebase would be great |
@tdpetrou I would like to review this PR, but it is very difficult to see what has changed or what are the added enhancements to melt. Therefore, can you update the docstring, add a whatsnew note and add narrative docs in reshaping.rst ?
For reviewing and discussing purposes, it might also be better to keep it in two separate PRs (the ability to melt on multiple columns (the referenced issue) and the introduction of wide_to_long similar functionality). But I don't know to what extent the implementation of the one is needed for the other? (how independent both additions are?) |
This proposed API is very complicated. i would be much more in favor of something much simpler. Having a single function do a single thing is much much more intutive. A possible resolution is to introduce a lazy object with methods itself, akin to groupby. e.g.
or somesuch, with each function performing 1 action |
@jreback I am not a fan of that, as those are not really 'two-step' functions like you have with groupby/resample/rolling. So I don't see the point of doing something like that. If there is nothing really in common (apart from that it are reshaping methods), it should just be separate functions/methods (that could live in the same namespace, although in pandas we tend to either add it top-level or as method, so then already is in the one big namespace). Let's focus here on the ability of handling multiple columns in melt, I don't think that part is that controversial, and if I am not wrong (but didn't look yet in detail) does not add new keywords (is that correct @tdpetrou ?) |
To first address the two-step function. I agree with @jorisvandenbossche and don't think that is necessary. Although, the So, I will:
|
I think everything is updated and rebased appropriately. I had to delete a few changes #17628 that were mixed in here. I made a little notebook on the enhancement as well (https://nbviewer.jupyter.org/github/tdpetrou/Machine-Learning-Books-With-Python/blob/master/melt%20enhancemenet.ipynb) @TomAugspurger I don't think it will be much of a problem to keep the index. It'll just be like another entry to One more thing: There are two test failures in the wide_to_long test class. This is because my new melt function treats |
so, this is pretty much impossible to review as is. I would first do a simple PR which moves the implementation of .melt to a separate module. This would be a simple copy/paste. Then you can make this implementation much more simple, but for example not in-lining functions inside melt itself, rather defining them in the module. |
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -13,7 +13,7 @@ version. | |||
New features | |||
~~~~~~~~~~~~ | |||
|
|||
- | |||
- Simultaneous melting of independent groups of columns is now possible with ``melt``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you will need a full example, pls pointers to the docs. but let's settle on an API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added Sentence to 'highlights' section and two examples to 'New Features' section.
API will be backwards compatiable.
value_vars
can take a list of listsvalue_name
andvar_name
can take a list of new column names
@TomAugspurger mentioned the addition of keep_index
parameter. This can easily be added.
wide_to_long
funcitonality can be exactly duplicated (and more) with a single additional parameter that takes a tuple of suffix
and sep
but this might be too much for one function.
Regardless, wide_to_long
can be refactored to be faster.
This new melt enhancement makes lreshape obsolete (other than it being faster)
|
||
Simultaneously melt multiple groups of columns: | ||
|
||
>>> df2 = pd.DataFrame({'City': ['Houston', 'Miami'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
simple examples first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is as simple as it gets. One id column, two column groups of each length two and only two rows of data.
@jreback
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want a separate PR entirely which simply moves melt to melt.py. don't add anything (except changes that are required in .api for example) or testing to make sure it all passes green. we will then merge that. you can rebase this pr on top. this way its easy to see what you actually changed.
df = pd.DataFrame({'State': ['Texas', 'Florida', 'Alabama'], | ||
'Mango':[4, 10, 90], | ||
'Orange': [10, 8, 14], | ||
'Watermelon':[40, 99, 43]}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generaly like to keep docs to 80 chars or less (readibility)
|
||
Simultaneous unpivoting of independent groups of columns with ``melt`` | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
Previously, ``melt`` was only able to unpivot a single group of columns. This was done by passing all the column names in the group as a list to the ``value_vars`` parameter. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number(s)
var_name : scalar | ||
Name to use for the 'variable' column. If None it uses | ||
Column(s) to unpivot. If list of lists, simultaneously unpivot | ||
each sublist into its own variable column. If not specified, uses all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a mention where this changes as appropriate (meaning in 0.22.0) in the doc-string
Hello @tdpetrou! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 10, 2017 at 22:57 Hours UTC |
@jreback I just rebased this and would like to work on it |
ok. this is doing a lot of things. I would appreciate to break this apart into separate distinct changes. The code needs some work. I can comment, but easier to do it in smaller pieces. |
@jreback There is basically one piece, now that the wide_to_long functionality has been removed. Essentially, you pass a list of lists into I suppose you could break it up into 3 pieces
I think I can break it up like this without too much effort. What do you think? |
closing as stale. ping if you want to update. |
git diff upstream/master -u -- "*.py" | flake8 --diff