ENH: Rename index when using DataFrame.reset_index #42346

gcaria · 2021-07-02T15:42:15Z

closes #6878 (ENH: name(s) argument for reset_index?)
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Probably the hardest part of this PR is choosing a good name for the new argument. As mentioned in #6878, the name keyword is already used in Series.reset_index, although there it renames the data column. It could be argued that using name in this PR as well would not cause confusion, since it is hard to imagine renaming a data column in DataFrame.reset_index (which column would it be?).

Anyway, I've opted for names. I've also thought of rename but that sounds like asking for a boolean. new_name is an option too, although a bit ugly.

ivanovmg

@gcaria thank you for the PR!

Please take a look at the review comments below.

pandas/core/frame.py

pandas/tests/frame/methods/test_reset_index.py

ivanovmg · 2021-07-05T03:03:12Z

pandas/tests/frame/methods/test_reset_index.py

+
+        names = ["first", "second"]
+        stacked.index.names = names
+        deleveled = stacked.reset_index()


I suppose you do not even need deleveled here, since this part does not use the names kwarg. The assertions could probably be made against stacked instead of deleveled.

Here I was trying to test the new renaming feature, by taking advantage of the standard (before this PR) implementation of reset_index, where the level names are simply copied over. So stacked is the starting point from which reset_index creates two DataFrames, which should have exactly the same added columns, except for the names.

I notice now that setting stacked.index.names is not relevant, and could/should be skipped.

gcaria

Thanks @ivanovmg for your feedback!
I've added some comments that I hope help clarify the intent of these changes.

ivanovmg

Hello! A couple of follow-up comments.

pandas/tests/frame/methods/test_reset_index.py

pandas/core/frame.py

gcaria

Now I see the problem with that else statement, you're absolutely right!

By the way, could you point me to where the whatsnew file is so that I can edit it?

ivanovmg · 2021-07-06T10:46:34Z

Now I see the problem with that else statement, you're absolutely right!

By the way, could you point me to where the whatsnew file is so that I can edit it?

The whatsnew file is here: doc/source/whatsnew/v1.4.0.rst

pandas/core/frame.py

pandas/tests/frame/methods/test_reset_index.py

pandas/core/frame.py

pep8speaks · 2021-07-09T19:33:48Z

Hello @gcaria! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-09-09 11:12:17 UTC

gcaria

Thanks @jreback for your feedback. I've added a few checks where errors are raised.

For now I've set that names has to be a tuple or list for a MultiIndex, but technically any sequence can be accepted. Accepting a dictionary like {'old' : 'new'} would be useful to rename the levels without worrying about their order.
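The dictionary idea mentioned above can be illustrated without pandas. The following is only a sketch of the suggestion, not code from this PR: the helper name `resolve_level_names` and its error messages are hypothetical, and it works on a plain list of level names rather than on a real MultiIndex.

```python
from typing import Dict, Hashable, List, Optional, Sequence, Union

# Accepted shapes for the hypothetical `names` argument.
NamesArg = Union[Sequence[Hashable], Dict[Hashable, Hashable], None]


def resolve_level_names(
    level_names: Sequence[Optional[Hashable]], names: NamesArg = None
) -> List[Optional[Hashable]]:
    """Return the new level names for a multi-level index (hypothetical helper)."""
    if names is None:
        # Nothing to rename: keep the current level names.
        return list(level_names)
    if isinstance(names, dict):
        # A mapping renames levels by their old name, so order does not matter.
        unknown = [old for old in names if old not in level_names]
        if unknown:
            raise KeyError(f"Unknown level name(s): {unknown}")
        return [names.get(n, n) for n in level_names]
    if not isinstance(names, (list, tuple)):
        raise ValueError("names must be a list, tuple or dict")
    if len(names) != len(level_names):
        raise ValueError("names must have one entry per level")
    # A list or tuple renames levels by position.
    return list(names)
```

With a mapping, `resolve_level_names(["a", "b"], {"b": "second"})` renames only the second level, whereas the list form requires one entry per level in the right order.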

github-actions · 2021-08-19T00:02:12Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

gcaria · 2021-08-23T17:33:14Z

Sorry for the long pause. I've finally added the Index.get_default_index_names and MultiIndex.get_default_index_names methods, and rebased onto master while I was at it.

alimcmaster1 · 2021-08-31T22:44:06Z

@gcaria - this looks almost there

Please can you get the code-checks passing - see the contributing guide

pandas/core/indexes/multi.py:1407:21: PDF005 leading space in concatenated strings
pandas/core/frame.py:5639:89: E501 line too long (94 > 88 characters)
pandas/core/frame.py:5640:89: E501 line too long (89 > 88 characters)

Looks like these changes are also causing some test failures - see Azure Pipelines.

TestResetIndex.test_reset_index_rename_multiindex

Can you also merge master and address any remaining comments above?

Thanks!

alimcmaster1 · 2021-09-07T21:52:31Z

pandas/core/indexes/base.py

+    def get_default_index_names(self, df: DataFrame, names: str = None):
+
+        if names is not None and not isinstance(names, str):
+            raise ValueError("Names must be a string")


Or list/tuple of strings?

yeah use is_hashable here to be consistent with other cases

we likely have tests that hit this..but

alimcmaster1

Few mypy failures - otherwise looks almost there!

pandas/core/indexes/base.py:1562: error: Variable "pandas.core.indexes.base.Index.str" is not valid as a type  [valid-type]
pandas/core/indexes/base.py:1562: note: See https://mypy.readthedocs.io/en/stable/common_issues.html#variables-vs-type-aliases
pandas/core/frame.py:5803: error: Value of type "Union[Hashable, Sequence[Hashable]]" is not indexable  [index]
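The first mypy error arises because the annotation `str` is looked up inside a class that itself defines an attribute called `str`, so the type checker resolves it to that attribute rather than to the builtin type. One common workaround is a module-level alias defined before the class. The sketch below is a minimal stand-in, not the pandas code: `IndexSketch`, `default_name` and the alias name `str_t` are assumptions made for illustration.

```python
from typing import Optional

# Module-level alias that always refers to the builtin string type.
str_t = str


class IndexSketch:
    """Stand-in for a class that exposes an attribute named `str`."""

    def __init__(self, name: Optional[str_t] = None) -> None:
        self.name = name

    @property
    def str(self) -> "IndexSketch":
        # Placeholder for a `.str` accessor. Its presence means that a bare
        # `str` annotation in this class body refers to this attribute.
        return self

    def default_name(self, names: Optional[str_t] = None) -> str_t:
        # Annotating with `str_t` avoids the "not valid as a type" error.
        if names is not None:
            return names
        return self.name if self.name is not None else "index"
```

The second error (a `Union[Hashable, Sequence[Hashable]]` value being indexed) is a separate issue: the value has to be narrowed to a sequence type before it is subscripted.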

ivanovmg

A couple of comments from my side.

ivanovmg · 2021-09-08T01:27:42Z

pandas/core/indexes/base.py

@@ -1559,6 +1559,18 @@ def _validate_names(

        return new_names

+    def get_default_index_names(self, df: DataFrame, names: str = None):


Can you type the output?

ivanovmg · 2021-09-08T01:28:02Z

pandas/core/indexes/multi.py

@@ -1391,6 +1391,25 @@ def format(
    # --------------------------------------------------------------------
    # Names Methods

+    def get_default_index_names(self, names=None):


Can you type?

ivanovmg · 2021-09-08T01:30:23Z

pandas/core/indexes/multi.py

+        if not names:
+            names = [
+                (n if n is not None else f"level_{i}") for i, n in enumerate(self.names)
+            ]
+        else:
+            if len(names) != self.nlevels:


You can consider returning right away and getting rid of else statement, which will simplify this code:

    if not names:
        return [...]
    if len(names) != self.nlevels:
        raise ValueError
    return names
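The early-return shape suggested above can be sketched as a standalone function. This is an illustration only: `default_level_names` is a hypothetical name, it takes the level names as a plain sequence instead of reading `self.names`, and the error message is made up.

```python
from typing import Hashable, List, Optional, Sequence


def default_level_names(
    level_names: Sequence[Optional[Hashable]],
    names: Optional[Sequence[Hashable]] = None,
) -> List[Hashable]:
    """Return the column names to use when resetting a multi-level index."""
    if not names:
        # No replacement names given: keep existing names and fall back to
        # "level_<i>" for unnamed levels.
        return [
            n if n is not None else f"level_{i}" for i, n in enumerate(level_names)
        ]
    if len(names) != len(level_names):
        raise ValueError(
            f"Expected {len(level_names)} names, got {len(names)}"
        )
    return list(names)
```

Returning from the first branch removes the need for an `else` block and keeps the validation at a single indentation level.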

jreback · 2021-09-09T11:49:11Z

pandas/core/indexes/base.py

+    def get_default_index_names(self, df: DataFrame, names: str = None):
+
+        if names is not None and not isinstance(names, str):
+            raise ValueError("Names must be a string")


yeah use is_hashable here to be consistent with other cases
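A hashability check is broader than `isinstance(names, str)`: it accepts integers and tuples as names while still rejecting lists. pandas exposes `pandas.api.types.is_hashable` for this; the sketch below uses a small stand-in with the same basic idea so that it runs without pandas. The function names and the error message are hypothetical.

```python
from typing import Hashable, Optional


def is_hashable_sketch(obj: object) -> bool:
    """Return True if hash(obj) succeeds (stand-in for a library helper)."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def validate_index_name(names: Optional[Hashable] = None) -> Optional[Hashable]:
    """Validate the replacement name for a single-level index."""
    if names is not None and not is_hashable_sketch(names):
        raise ValueError("Index names must be hashable")
    return names
```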

jreback · 2021-09-09T11:49:29Z

pandas/core/indexes/base.py

+    def get_default_index_names(self, df: DataFrame, names: str = None):
+
+        if names is not None and not isinstance(names, str):
+            raise ValueError("Names must be a string")


we likely have tests that hit this..but

jreback · 2021-09-09T11:50:26Z

pandas/core/frame.py

-                    (n if n is not None else f"level_{i}")
-                    for i, n in enumerate(self.index.names)
-                ]
+                names = self.index.get_default_index_names(names)


this must not be hit in tests (as you are incorrectly passing self)

jreback · 2021-09-09T11:51:26Z

pandas/core/indexes/base.py

+        if names is not None and not isinstance(names, str):
+            raise ValueError("Names must be a string")
+
+        default = "index" if "index" not in df else "level_0"


instead test if .index._is_multi or not

jreback · 2021-09-09T11:53:04Z

pandas/core/indexes/base.py

@@ -1559,6 +1559,21 @@ def _validate_names(

        return new_names

+    def get_default_index_names(


this must be the same signature as the multi-index one

jreback · 2021-09-09T11:53:22Z

pandas/core/indexes/base.py

@@ -1559,6 +1559,21 @@ def _validate_names(

        return new_names

+    def get_default_index_names(
+        self, df: DataFrame, names: str = None


this shouldn't be passed the dataframe at all
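One way to read this request is to split the work in two: the index-side helper only looks at the index's own name, and the caller, which owns the columns, decides between "index" and "level_0". The sketch below is an assumption about how such a split could look, not the code that was reviewed; both function names are hypothetical and plain Python values stand in for the index and the frame.

```python
from typing import Hashable, Iterable, List, Optional


def flat_index_names(
    index_name: Optional[Hashable], names: Optional[Hashable] = None
) -> List[Optional[Hashable]]:
    """Index-side step: needs only the index's own name, no DataFrame."""
    if names is not None:
        return [names]
    # May contain None; the caller fills in a default.
    return [index_name]


def fill_unnamed(
    names: List[Optional[Hashable]], existing_columns: Iterable[Hashable]
) -> List[Hashable]:
    """Caller-side step: choose a default that does not clash with a column."""
    default = "index" if "index" not in list(existing_columns) else "level_0"
    return [default if n is None else n for n in names]
```

For example, `fill_unnamed(flat_index_names(None), ["index", "a"])` falls back to "level_0" because an "index" column already exists.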

jreback · 2021-09-09T11:53:47Z

pandas/core/indexes/multi.py

@@ -1391,6 +1391,26 @@ def format(
    # --------------------------------------------------------------------
    # Names Methods

+    def get_default_index_names(


make this a private method

jreback · 2021-10-04T00:31:05Z

this was looking good, if you'd merge master and address comments

alimcmaster1 · 2021-10-12T23:14:59Z

@gcaria - do you still want to work on this? See comment above from Jeff

gcaria · 2021-10-13T07:02:28Z

Unfortunately I don't have time to work on this anymore, but all the work is almost done.

mroeschke · 2021-10-31T01:10:37Z

Thanks for the update. If you happen to find time to work on this PR in the future, or if anyone else wants to finish up this feature, we can reopen.

gcaria changed the title ~~Rename index for df reset index~~ Rename index when using DataFrame.reset_index Jul 2, 2021

ivanovmg reviewed Jul 5, 2021

gcaria commented Jul 5, 2021

ivanovmg reviewed Jul 5, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch from 93163f0 to 2fca991 Compare July 6, 2021 08:04

gcaria commented Jul 6, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch from dd538e4 to 8b41541 Compare July 6, 2021 11:24

gcaria changed the title ~~Rename index when using DataFrame.reset_index~~ ENH: Rename index when using DataFrame.reset_index Jul 6, 2021

jreback requested changes Jul 7, 2021

jreback added Enhancement Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jul 7, 2021

jreback requested changes Jul 8, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from 3897758 to aa7cb2a Compare July 9, 2021 19:38

gcaria commented Jul 9, 2021

github-actions bot added the Stale label Aug 19, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch from 1705967 to fcc5db2 Compare August 23, 2021 17:32

gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from de49b85 to cb5249c Compare August 23, 2021 18:07

alimcmaster1 self-assigned this Aug 31, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch 2 times, most recently from 928393d to fc1ee7a Compare September 6, 2021 13:44

lithomas1 removed the Stale label Sep 6, 2021

gcaria force-pushed the rename_index_for_df_reset_index branch from fc1ee7a to 6f4cd27 Compare September 7, 2021 09:19

Add argument to rename index when resetting it.

ef2a6a5

gcaria added 6 commits September 7, 2021 11:20

Add tests.

69b3073

Split test function

141da42

Add issue number.

0673a5f

Edit whatsnew

ab57232

Add an example in the doc-string.

ecf88ff

Edit doc-string, test failure cases.

548a30f

gcaria force-pushed the rename_index_for_df_reset_index branch from 6f4cd27 to 437a9b2 Compare September 7, 2021 09:20

Add get_default_index_names method to Index and MultiIndex

96f52fb

gcaria force-pushed the rename_index_for_df_reset_index branch from 437a9b2 to 96f52fb Compare September 7, 2021 09:21

alimcmaster1 reviewed Sep 7, 2021

alimcmaster1 requested changes Sep 7, 2021

ivanovmg reviewed Sep 8, 2021

gcaria added 2 commits September 8, 2021 12:15

Simplify logic

3c006c0

Add typing

11e4687

jreback requested changes Sep 9, 2021

mroeschke closed this Oct 31, 2021
