Skip to content

ENH: Rename index when using DataFrame.reset_index #42346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

gcaria
Copy link
Contributor

@gcaria gcaria commented Jul 2, 2021

Probably the hardest part of this PR is choosing a good name for the new argument. As mentioned in #6878 the name keyword is already used in Series.reset_index, although to rename the data column. It could be argued that using name also in this PR would not cause confusion, since it's hard to assume one can rename a column's name in DataFrame.reset_index (which column would it be?).

Anyway, I've opted for names. I've also thought of rename but that sounds like asking for a boolean. new_name is an option too, although a bit ugly.

@gcaria gcaria changed the title Rename index for df reset index Rename index when using DataFrame.reset_index Jul 2, 2021
Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gcaria thank you for the PR!

Please take a look at the review comments below.


names = ["first", "second"]
stacked.index.names = names
deleveled = stacked.reset_index()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that you do not even need deleveled here. Indeed, you do not use names kwarg here. As for assertions, then you probably can do assertions against stacked instead of deleveled.

Copy link
Contributor Author

@gcaria gcaria Jul 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I was trying to test the new renaming feature, by taking advantage of the standard (before this PR) implementation of reset_index, where the level names are simply copied over. So stacked is the starting point from which reset_index creates two DataFrames, which should have exactly the same added columns, except for the names.

I notice now that setting stacked.index.names is not relevant, and could/should be skipped.

Copy link
Contributor Author

@gcaria gcaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @ivanovmg for your feedback!
I've added some comments that I hope help clarifying the intent of these changes.

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello! A couple of follow-up comments.

@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from 93163f0 to 2fca991 Compare July 6, 2021 08:04
Copy link
Contributor Author

@gcaria gcaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now I see the problem with that else statement, you're absolutely right!

By the way, could you point me to where the whatsnew file is so that I can edit it?

@ivanovmg
Copy link
Member

ivanovmg commented Jul 6, 2021

Now I see the problem with that else statement, you're absolutely right!

By the way, could you point me to where the whatsnew file is so that I can edit it?

Whatnew file is here: doc/source/whatsnew/v1.4.0.rst

@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from dd538e4 to 8b41541 Compare July 6, 2021 11:24
@gcaria gcaria changed the title Rename index when using DataFrame.reset_index ENH: Rename index when using DataFrame.reset_index Jul 6, 2021
@jreback jreback added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 7, 2021
@pep8speaks
Copy link

pep8speaks commented Jul 9, 2021

Hello @gcaria! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-09 11:12:17 UTC

@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from 3897758 to aa7cb2a Compare July 9, 2021 19:38
Copy link
Contributor Author

@gcaria gcaria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jreback for your feedback. I've added a few checks where errors are raised.

For now I've set that names has to be a tuple or list for a MultiIndex, but technically any sequence can be accepted. Accepting a dictionary like {'old' : 'new'} would be useful to rename the levels without worrying about their order.

@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Aug 19, 2021
@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from 1705967 to fcc5db2 Compare August 23, 2021 17:32
@gcaria
Copy link
Contributor Author

gcaria commented Aug 23, 2021

Sorry for the long pause, I've finally added the Index.get_default_index_names and MultiIndex.get_default_index_names methods, and rebased to master since I was at it.

@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from de49b85 to cb5249c Compare August 23, 2021 18:07
@alimcmaster1
Copy link
Member

@gcaria - this looks almost there

Please can you get the code-checks passing - see the contributing guide

pandas/core/indexes/multi.py:1407:21: PDF005 leading space in concatenated strings
pandas/core/frame.py:5639:89: E501 line too long (94 > 88 characters)
pandas/core/frame.py:5640:89: E501 line too long (89 > 88 characters)

Looks like these changes are also causing some test failures - see Azure Pipelines.

TestResetIndex.test_reset_index_rename_multiindex

Can you also merge master and address any remaining comments above

Thanks!

@alimcmaster1 alimcmaster1 self-assigned this Aug 31, 2021
@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from 928393d to fc1ee7a Compare September 6, 2021 13:44
@lithomas1 lithomas1 removed the Stale label Sep 6, 2021
@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from fc1ee7a to 6f4cd27 Compare September 7, 2021 09:19
@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from 6f4cd27 to 437a9b2 Compare September 7, 2021 09:20
@gcaria gcaria force-pushed the rename_index_for_df_reset_index branch from 437a9b2 to 96f52fb Compare September 7, 2021 09:21
def get_default_index_names(self, df: DataFrame, names: str = None):

if names is not None and not isinstance(names, str):
raise ValueError("Names must be a string")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or list/tuple of strings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah use is_hashable here to be consistent with other cases

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we likley have tests that hit this..but

Copy link
Member

@alimcmaster1 alimcmaster1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few mypy failures - otherwise looks almost there!

pandas/core/indexes/base.py:1562: error: Variable "pandas.core.indexes.base.Index.str" is not valid as a type  [valid-type]
pandas/core/indexes/base.py:1562: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
pandas/core/frame.py:5803: error: Value of type "Union[Hashable, Sequence[Hashable]]" is not indexable  [index]

Copy link
Member

@ivanovmg ivanovmg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments from my side.

@@ -1559,6 +1559,18 @@ def _validate_names(

return new_names

def get_default_index_names(self, df: DataFrame, names: str = None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you type the output?

@@ -1391,6 +1391,25 @@ def format(
# --------------------------------------------------------------------
# Names Methods

def get_default_index_names(self, names=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you type?

Comment on lines 1399 to 1404
if not names:
names = [
(n if n is not None else f"level_{i}") for i, n in enumerate(self.names)
]
else:
if len(names) != self.nlevels:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can consider returning right away and getting rid of else statement, which will simplify this code:

if not names:
    return [...]

if len(names) != self.nlevels:
    raise ValueError

return names

def get_default_index_names(self, df: DataFrame, names: str = None):

if names is not None and not isinstance(names, str):
raise ValueError("Names must be a string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah use is_hashable here to be consistent with other cases

def get_default_index_names(self, df: DataFrame, names: str = None):

if names is not None and not isinstance(names, str):
raise ValueError("Names must be a string")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we likley have tests that hit this..but

(n if n is not None else f"level_{i}")
for i, n in enumerate(self.index.names)
]
names = self.index.get_default_index_names(names)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must not be hit in tests (as you are incorrectly passing self)

if names is not None and not isinstance(names, str):
raise ValueError("Names must be a string")

default = "index" if "index" not in df else "level_0"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead test if .index._is_multi or not

@@ -1559,6 +1559,21 @@ def _validate_names(

return new_names

def get_default_index_names(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this must be the same signature as the multi-index one

@@ -1559,6 +1559,21 @@ def _validate_names(

return new_names

def get_default_index_names(
self, df: DataFrame, names: str = None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be passed the dataframe at all

@@ -1391,6 +1391,26 @@ def format(
# --------------------------------------------------------------------
# Names Methods

def get_default_index_names(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this a private method

@jreback
Copy link
Contributor

jreback commented Oct 4, 2021

this was looking good, if you'd merge master and address comments

@alimcmaster1
Copy link
Member

@gcaria - do you still want to work on this? See comment above from Jeff

@gcaria
Copy link
Contributor Author

gcaria commented Oct 13, 2021

Unfortunately I don't have time to work on this anymore, but all the work is almost done.

@mroeschke
Copy link
Member

Thanks for the update. If you happen to find time to work on this PR in the future, or if anyone else wants to finish up this feature, we can reopen.

@mroeschke mroeschke closed this Oct 31, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: name(s) argument for reset_index?
7 participants