ENH: Add optional argument keep_index to dataframe melt method #17459
@@ -720,8 +720,7 @@ def _convert_level_number(level_num, columns):
                      versionadded="",
                      other='DataFrame.melt'))
 def melt(frame, id_vars=None, value_vars=None, var_name=None,
-         value_name='value', col_level=None):
-    # TODO: what about the existing index?
+         value_name='value', col_level=None, keep_index=False):
     if id_vars is not None:
         if not is_list_like(id_vars):
             id_vars = [id_vars]
@@ -779,7 +778,22 @@ def melt(frame, id_vars=None, value_vars=None, var_name=None,
         mdata[col] = np.asanyarray(frame.columns
                                    ._get_level_values(i)).repeat(N)
 
-    return DataFrame(mdata, columns=mcolumns)
+    result = DataFrame(mdata, columns=mcolumns)
+
+    if keep_index:
+        orig_index_values = list(np.tile(frame.index.get_values(), K))
+
this is quite awkward; you have several cases which you need to disambiguate, e.g. whether the original index is a MultiIndex or not.

Thanks @jreback for looking over my code and for the comment. I think what I wrote should work with any number of levels. E.g.

import numpy as np
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
          ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]
tuples = list(zip(*arrays))
idx_multi = pd.MultiIndex.from_tuples(tuples)
idx_single = pd.Index(arrays[0])
# Index
print(list(np.tile(idx_single, 1)))
print(list(np.tile(idx_single, 2)))
# MultiIndex
print(list(np.tile(idx_multi, 1)))
print(list(np.tile(idx_multi, 2)))

But do I have to make it more explicit (= Pythonic)? Or did I miss something else?
+        if len(frame.index.names) == len(set(frame.index.names)):
+            orig_index_names = frame.index.names
+        else:
+            orig_index_names = ["original_index_{i}".format(i=i)
+                                for i in range(len(frame.index.names))]
+
+        result[orig_index_names] = DataFrame(orig_index_values)
+
+        result = result.set_index(orig_index_names + list(var_name))
+
+    return result
 
 
 def lreshape(data, groups, dropna=True, label=None):
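To make the row ordering behind the `np.tile` call in the diff concrete, here is a small sketch; it is not part of the PR. It assumes a recent pandas, so it uses `Index.to_numpy()` where the diff uses `get_values()`, and the sample frame and level names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample data: two rows, two value columns, a two-level row index.
index = pd.MultiIndex.from_tuples(
    [("bar", "one"), ("baz", "two")], names=["first", "second"]
)
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=index)

K = df.shape[1]  # number of value columns being melted

# melt stacks the value columns one below the other, so the original
# index has to be repeated as a whole block K times (tile, not repeat).
tiled = list(np.tile(df.index.to_numpy(), K))

melted = pd.melt(df)
# Re-attach the tiled index; duplicate entries are expected here.
melted.index = pd.MultiIndex.from_tuples(tiled, names=df.index.names)
print(melted)
```

For a MultiIndex the tiled values are tuples, one per row, which is why the sketch rebuilds the index with `MultiIndex.from_tuples`; a flat `Index` would yield plain scalars instead.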
this is commonly called `index=False` everywhere else. Add a versionadded.
So would it be better to just name it `index`, with `True` resulting in the original index with duplicate entries? What about the option @TomAugspurger proposed?
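For comparison, the behaviour under discussion can be approximated with existing pandas API by moving the index into columns before melting. The following is only a sketch under stated assumptions: `melt_keeping_index` is a hypothetical helper name, and it assumes the index levels have unique, non-None names that do not collide with column names.

```python
import pandas as pd


def melt_keeping_index(frame, value_vars=None, var_name="variable",
                       value_name="value"):
    """Hypothetical helper: melt `frame` while carrying its index along."""
    index_names = list(frame.index.names)
    if None in index_names or len(set(index_names)) != len(index_names):
        raise ValueError("index levels need unique, non-None names")

    # Turn the index levels into ordinary columns and use them as id_vars.
    flat = frame.reset_index()
    melted = flat.melt(id_vars=index_names, value_vars=value_vars,
                       var_name=var_name, value_name=value_name)

    # Original index levels plus the variable column form the new index.
    return melted.set_index(index_names + [var_name])


# Made-up sample data for illustration.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]},
                  index=pd.Index(["x", "y"], name="key"))
result = melt_keeping_index(df)
print(result)
```

Because the variable column becomes part of the index, the result has no duplicate index entries. Recent pandas versions also accept `ignore_index=False` in `melt`, which instead keeps the original index values, repeated as necessary.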