ENH: Add optional argument keep_index to dataframe melt method #17459

NiklasKeck · 2017-09-07T07:20:52Z

Setting keep_index to True will reuse the original DataFrame index +
names of melted columns as additional level. closes issue #17440

closes Index gets lost when DataFrame melt method is used #17440
passes git diff upstream/master -u -- "*.py" | flake8 --diff

I appreciate any corrections, comments and/or help very much, as this is my first pull request on such a big project. Thank you.


codecov · 2017-09-07T09:41:29Z

Codecov Report

Merging #17459 into master will decrease coverage by 0.02%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master   #17459      +/-   ##
==========================================
- Coverage   91.15%   91.13%   -0.03%     
==========================================
  Files         163      163              
  Lines       49591    49599       +8     
==========================================
- Hits        45207    45200       -7     
- Misses       4384     4399      +15

Flag        Coverage Δ
#multiple   88.91% <33.33%> (-0.02%)
#single     40.24% <0%> (-0.07%)

Impacted Files                    Coverage Δ
pandas/core/frame.py              97.72% <ø> (-0.1%)
pandas/core/reshape/reshape.py    98.25% <33.33%> (-1.04%)
pandas/io/gbq.py                  25% <0%> (-58.34%)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 20fee85...0c64bf0.

TomAugspurger · 2017-09-07T14:35:02Z

Thanks.

We'll need tests and docs as well. Test can go in tests/reshape. For the docs, it'll need a whatsnew note in doc/source/whatsnew/v0.21.0.txt, prose docs in doc/source/reshaping.rst. It'd also be nice to have an example in the docstring.

About the implementation, the keep_index keyword doesn't fully describe what we're doing, since it's "keep the index, and append var_name". I wonder if it makes more sense to have a keyword like index={None, 'full', 'original', 'var'} (names TBD).

None: the current behavior, discard the original index and end with a RangeIndex
'full': original index + the metadata from var_name
'original': the original index
'var': The newly created var

But as I write this, I wonder if the last two would ever be useful? Do we just need a better name than keep_index?
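For illustration only, the four proposed values can be emulated with calls that already exist (reset_index, melt, set_index). The helper name melt_with_index and the option strings below are assumptions based on the names floated above, not an existing pandas API, and the sketch assumes a single-level, named index.

```python
import pandas as pd


def melt_with_index(frame, index=None, var_name="variable", value_name="value"):
    """Hypothetical helper emulating the proposed ``index`` options.

    Assumes ``frame`` has a single-level, named index.
    """
    key = frame.index.name
    # Move the index into a column so melt carries it along as an id variable.
    melted = frame.reset_index().melt(
        id_vars=[key], var_name=var_name, value_name=value_name
    )
    if index is None:
        # Current behavior: the original index is discarded (RangeIndex result).
        return melted.drop(columns=key)
    if index == "full":
        # Original index plus the melted column names as an extra level.
        return melted.set_index([key, var_name])
    if index == "original":
        # Original index only; labels repeat once per melted column.
        return melted.set_index(key)
    if index == "var":
        # Only the newly created variable column becomes the index.
        return melted.drop(columns=key).set_index(var_name)
    raise ValueError(f"unknown index option: {index!r}")


df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]}, index=pd.Index(["x", "y"], name="key")
)
full = melt_with_index(df, index="full")
print(full)
```

Only 'full' yields a unique index in this sketch; 'original' and 'var' both produce repeated labels, which may be why they look less useful.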

NiklasKeck · 2017-09-08T10:25:22Z

Thank you for the comments!

I agree that keep_index is not descriptive enough. I find it hard to come up with something short that describes the whole idea within a boolean argument. Using your idea of an index keyword with multiple options looks good.

Maybe rename 'full' to 'append_variables' instead? So the whole options would be:

index = {None, 'original', 'append_variables'}

index = 'append_variables' would probably be intuitive to understand as index = index + variables

I cannot think of a good use case for the option 'var', but I started using pandas not long ago, so there might be plenty.

Another idea:

keep_index (boolean) if True:

Just keep the original index (append nothing) and let the user decide what to append in a next step to make the index unique.
Just keep the original index and append an additional RangeIndex level (the melt_id from issue Index gets lost when DataFrame melt method is used #17440) to ensure uniqueness.
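A rough sketch of these two variants, emulated with existing calls rather than a new keyword. The example frame and the variable names are made up for illustration; "melt_id" is the level name mentioned in #17440.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]}, index=pd.Index(["x", "y"], name="key")
)

# Variant 1: keep only the original index. Each label now appears once per
# melted column, so the index is no longer unique.
melted = df.reset_index().melt(id_vars=["key"]).set_index("key")

# Variant 2: append a running counter as an extra level to restore uniqueness.
counter = pd.RangeIndex(len(melted), name="melt_id")
unique = melted.set_index(counter, append=True)

print(melted.index.is_unique, unique.index.is_unique)
```

With the first variant the user still has to pick something to append before the index can be used for lookups; the second variant trades that for a level that carries no information beyond row order.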

Anyway, I would go for @TomAugspurger's idea to use a keyword with multiple options.

When we have decided what's best, I will challenge myself with writing tests and documentation :).

jreback · 2017-09-08T10:22:59Z

pandas/core/frame.py

@@ -4367,6 +4367,10 @@ def unstack(self, level=-1, fill_value=None):
        Name to use for the 'value' column.
    col_level : int or string, optional
        If columns are a MultiIndex then use this level to melt.
+    keep_index : boolean, optional, default False


this is commonly called index=False everywhere else.

add a versionadded

So better to just name it index and if True resulting in the original index with duplicate entries? What about the option @TomAugspurger proposed?

jreback · 2017-09-08T10:25:46Z

pandas/core/reshape/reshape.py

+
+    if keep_index:
+        orig_index_values = list(np.tile(frame.index.get_values(), K))
+


this is quite awkward, you have several cases which you need to disambiguate. e.g. if the original is a MI or not.

Thanks @jreback for looking over my code and the comment.

I think what I wrote should work with any number of levels.

E.g.:

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
idx_multi = pd.MultiIndex.from_tuples(tuples)
idx_single = pd.Index(arrays[0])

# Index
print(list(np.tile(idx_single, 1)))
print(list(np.tile(idx_single, 2)))

# MultiIndex
print(list(np.tile(idx_multi, 1)))
print(list(np.tile(idx_multi, 2)))

But do I have to make it more explicit (= Pythonic)? Or did I miss something else?
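One way to make the handling explicit is sketched below. It is not the code from the diff: the flattening step and the variable names are assumptions, and to_numpy() is used in place of get_values(), which is not available in recent pandas releases. The idea is to tile the original labels once per melted column, repeat each column name once per row, and flatten MultiIndex tuples so every original level stays a separate level.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    {"a": [1, 2], "b": [3, 4]},
    index=pd.MultiIndex.from_tuples([("bar", "one"), ("bar", "two")]),
)
N, K = frame.shape  # N rows, K melted columns

# Tile the original labels K times (one copy per melted column) ...
orig = list(np.tile(frame.index.to_numpy(), K))
# ... and repeat each column name N times so both sequences line up.
variables = list(np.repeat(frame.columns.to_numpy(), N))

# A MultiIndex label is already a tuple; a flat Index label is wrapped in one,
# so the same code path covers both cases.
rows = [
    (label if isinstance(label, tuple) else (label,)) + (var,)
    for label, var in zip(orig, variables)
]
new_index = pd.MultiIndex.from_tuples(rows)
print(new_index)
```

The isinstance check is the only place where the flat and MultiIndex cases differ here, which may be the kind of disambiguation the review comment asks for.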

jreback · 2017-10-28T15:48:58Z

pls rebase.

jreback · 2017-11-25T16:15:45Z

closing as stale

gitgithan · 2018-12-14T04:35:16Z

@NiklasKeck @TomAugspurger What happened to this pull request? I came from #17440 and wish to contribute.
First-time contributor here, what should I know?
Below is what I currently think I should do:

read https://pandas.pydata.org/pandas-docs/stable/contributing.html#committing-your-code,
find out which files/functions need to be changed (how do I find out all the paths a function call can take?)
find out which files/functions the changes can affect
identify their effects and ensure they do not damage existing usability (how is this done?)

Do I have to choose one of Travis-CI, Appveyor, or CircleCI to hook onto my GitHub?

TomAugspurger · 2018-12-14T12:05:24Z

We needed to merge master into this PR to see if the tests still passed.

You can see the changed files in
https://github.com/pandas-dev/pandas/pull/17459/files

And then run the tests as described in the contributing docs.

You don't have to do anything with the CI services.

ENH: Add optional argument keep_index to dataframe melt method

0c64bf0


gfyoung added the API Design and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Sep 7, 2017

jreback requested changes Sep 8, 2017


TomAugspurger mentioned this pull request Oct 30, 2017

Melt enhance #17677

Closed

2 tasks

jreback closed this Nov 25, 2017

TomAugspurger mentioned this pull request Dec 11, 2018

Index gets lost when DataFrame melt method is used #17440

Closed

smsaladi mentioned this pull request Oct 8, 2019

ENH: Add optional argument keep_index to dataframe melt method (merged master onto old PR) #28859

Closed

5 tasks

simonjayhawkins mentioned this pull request Apr 21, 2020

ENH: Add optional argument index to pd.melt to maintain index values #33659

Merged

7 tasks
