ENH: Rename index when using DataFrame.reset_index #42346

Closed
Changes from 8 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.4.0.rst
@@ -106,6 +106,7 @@ Other enhancements
- :meth:`read_table` now supports the argument ``storage_options`` (:issue:`39167`)
- Methods that relied on hashmap based algos such as :meth:`DataFrameGroupBy.value_counts`, :meth:`DataFrameGroupBy.count` and :func:`factorize` ignored imaginary component for complex numbers (:issue:`17927`)
- Add :meth:`Series.str.removeprefix` and :meth:`Series.str.removesuffix` introduced in Python 3.9 to remove pre-/suffixes from string-type :class:`Series` (:issue:`36944`)
- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`)

.. ---------------------------------------------------------------------------

31 changes: 25 additions & 6 deletions pandas/core/frame.py
@@ -5542,6 +5542,7 @@ def reset_index(
inplace: Literal[False] = ...,
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> DataFrame:
...

@@ -5553,6 +5554,7 @@ def reset_index(
inplace: Literal[True],
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> None:
...

@@ -5564,6 +5566,7 @@ def reset_index(
inplace: Literal[True],
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> None:
...

@@ -5575,6 +5578,7 @@ def reset_index(
inplace: Literal[True],
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> None:
...

@@ -5585,6 +5589,7 @@ def reset_index(
inplace: Literal[True],
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> None:
...

@@ -5596,6 +5601,7 @@ def reset_index(
inplace: bool = ...,
col_level: Hashable = ...,
col_fill: Hashable = ...,
names: Hashable | Sequence[Hashable] = None,
) -> DataFrame | None:
...

@@ -5607,6 +5613,7 @@ def reset_index(
inplace: bool = False,
col_level: Hashable = 0,
col_fill: Hashable = "",
names: Hashable | Sequence[Hashable] = None,
) -> DataFrame | None:
"""
Reset the index, or a level of it.
@@ -5632,6 +5639,12 @@ def reset_index(
col_fill : object, default ''
If the columns have multiple levels, determines how the other
levels are named. If None then the index name is repeated.
names : str, tuple or list, default None
Using the given string, rename the DataFrame column which contains the
index data. If the DataFrame has a MultiIndex, this has to be a list or
tuple with length equal to the number of levels.

.. versionadded:: 1.4.0

Returns
-------
@@ -5669,6 +5682,16 @@ class max_speed
2 lion mammal 80.5
3 monkey mammal NaN

Using the `names` parameter, it is possible to choose a name for
the old index column:

>>> df.reset_index(names='name')
name class max_speed
0 falcon bird 389.0
1 parrot bird 24.0
2 lion mammal 80.5
3 monkey mammal NaN

We can use the `drop` parameter to avoid the old index being added as
a column:

@@ -5767,14 +5790,10 @@ class max type
if not drop:
to_insert: Iterable[tuple[Any, Any | None]]
if isinstance(self.index, MultiIndex):
names = [
(n if n is not None else f"level_{i}")
for i, n in enumerate(self.index.names)
]
names = self.index.get_default_index_names(names)
Review comment (Contributor):

this must not be hit in tests (as you are incorrectly passing self)

to_insert = zip(self.index.levels, self.index.codes)
else:
default = "index" if "index" not in self else "level_0"
names = [default] if self.index.name is None else [self.index.name]
names = self.index.get_default_index_names(self, names)
to_insert = ((self.index, None),)

multi_col = isinstance(self.columns, MultiIndex)
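
The hunk above calls `get_default_index_names` with different arguments depending on the index type. One way to avoid that mismatch, sketched here under stated assumptions, is to give both implementations the same parameters. `FlatIndex`, `LevelledIndex`, `default_names` and `names_for_reset` are hypothetical stand-ins written for this illustration; they are not pandas classes or the PR's code.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


class FlatIndex:
    """Hypothetical stand-in for a single-level index (not pandas.Index)."""

    def __init__(self, name: Hashable | None = None) -> None:
        self.name = name

    def default_names(self, columns: Sequence[Hashable]) -> list[Hashable]:
        # Mirrors the removed lines: use the index name if set, otherwise
        # "index", or "level_0" when the frame already has an "index" column.
        if self.name is not None:
            return [self.name]
        return ["index" if "index" not in columns else "level_0"]


class LevelledIndex(FlatIndex):
    """Hypothetical stand-in for a multi-level index (not pandas.MultiIndex)."""

    def __init__(self, names: Sequence[Hashable | None]) -> None:
        super().__init__()
        self.names = list(names)

    def default_names(self, columns: Sequence[Hashable]) -> list[Hashable]:
        # `columns` is unused here; it is kept so both classes share a signature.
        return [n if n is not None else f"level_{i}" for i, n in enumerate(self.names)]


def names_for_reset(index: FlatIndex, columns: Sequence[Hashable]) -> list[Hashable]:
    # The caller passes the same arguments regardless of the index type.
    return index.default_names(columns)
```

Note that the sketch checks for an existing "index" label against the frame's columns, as the removed lines did, rather than against the index itself.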
12 changes: 12 additions & 0 deletions pandas/core/indexes/base.py
@@ -1559,6 +1559,18 @@ def _validate_names(

return new_names

def get_default_index_names(self, df: DataFrame, names: str = None):
Review comment (Member):

Can you type the output?


if names is not None and not isinstance(names, str):
raise ValueError("Names must be a string")
Review comment (Member):

Or list/tuple of strings?

Review comment (Contributor):

yeah use is_hashable here to be consistent with other cases

Review comment (Contributor):

we likely have tests that hit this..but


default = "index" if "index" not in self else "level_0"
if not names:
names = [default] if df.index.name is None else [df.index.name]
else:
names = [names]
return names

def _get_names(self) -> FrozenList:
return FrozenList((self.name,))
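
The review thread above asks for a typed return value and for a hashability check in place of `isinstance(names, str)`. A minimal sketch of what that could look like follows. pandas exposes an `is_hashable` helper under `pandas.api.types`; the local `is_hashable` below is a stand-in so the example has no dependencies. `validate_single_name` and its error message are hypothetical and not taken from the PR.

```python
from __future__ import annotations

from collections.abc import Hashable


def is_hashable(obj: object) -> bool:
    """Local stand-in: report whether hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


def validate_single_name(names: object) -> list[Hashable] | None:
    """Return [names] for a usable label, or None when no name was given."""
    if names is None:
        return None
    if not is_hashable(names):
        # Hypothetical message; the PR's wording is "Names must be a string".
        raise ValueError("Names must be hashable")
    return [names]
```

Two behaviours differ from the diff and would need agreement in review. The sketch tests `names is None` rather than `not names`, so falsy labels such as `0` or `""` are kept. It also accepts any hashable label, so the PR's test that expects `names=1` to raise would have to change.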

19 changes: 19 additions & 0 deletions pandas/core/indexes/multi.py
@@ -1391,6 +1391,25 @@ def format(
# --------------------------------------------------------------------
# Names Methods

def get_default_index_names(self, names=None):
Review comment (Member):

Can you type?


if names is not None and not isinstance(names, (tuple, list)):
raise ValueError("Names must be a tuple or list")

if not names:
names = [
(n if n is not None else f"level_{i}") for i, n in enumerate(self.names)
]
else:
if len(names) != self.nlevels:
Review comment (Member):

You can consider returning right away and getting rid of else statement, which will simplify this code:

if not names:
    return [...]

if len(names) != self.nlevels:
    raise ValueError

return names

raise ValueError(
f"The number of provided names "
f"({len(names)}) does not match the number of "
f"MultiIndex levels ({self.nlevels})"
)

return names

def _get_names(self) -> FrozenList:
return FrozenList(self._names)
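
The early-return structure suggested in the review comment above can be written out in full as follows. This is a standalone sketch, not the PR's method: `multiindex_default_names` is a hypothetical free function, and `level_names` stands in for `self.names` so the example runs without pandas. The error messages are copied from the diff.

```python
from __future__ import annotations

from collections.abc import Hashable, Sequence


def multiindex_default_names(
    level_names: Sequence[Hashable | None],
    names: Sequence[Hashable] | None = None,
) -> list[Hashable]:
    """Pick the column names used when a multi-level index is reset."""
    if names is not None and not isinstance(names, (tuple, list)):
        raise ValueError("Names must be a tuple or list")

    # No names supplied: keep existing level names, filling gaps with level_<i>.
    if not names:
        return [n if n is not None else f"level_{i}" for i, n in enumerate(level_names)]

    if len(names) != len(level_names):
        raise ValueError(
            f"The number of provided names ({len(names)}) does not match "
            f"the number of MultiIndex levels ({len(level_names)})"
        )
    return list(names)
```

Returning as soon as the default case is handled removes the `else` branch and keeps the length check at a single indentation level.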

31 changes: 31 additions & 0 deletions pandas/tests/frame/methods/test_reset_index.py
@@ -183,6 +183,37 @@ def test_reset_index_name(self):
assert return_value is None
assert df.index.name is None

def test_reset_index_rename(self, float_frame):
# GH 6878
rdf = float_frame.reset_index(names="new_name")
exp = Series(float_frame.index.values, name="new_name")
tm.assert_series_equal(rdf["new_name"], exp)

with pytest.raises(ValueError, match="Names must be a string"):
float_frame.reset_index(names=1)

def test_reset_index_rename_multiindex(self, float_frame):
# GH 6878
stacked = float_frame.stack()[::2]
stacked = DataFrame({"foo": stacked, "bar": stacked})

names = ["first", "second"]
stacked.index.names = names
deleveled = stacked.reset_index()
Review comment (Member):

I suppose you do not even need deleveled here, since it is created without the names kwarg. The assertions could probably be made against stacked instead of deleveled.

Reply (Contributor Author, @gcaria, Jul 5, 2021):

Here I was trying to test the new renaming feature, by taking advantage of the standard (before this PR) implementation of reset_index, where the level names are simply copied over. So stacked is the starting point from which reset_index creates two DataFrames, which should have exactly the same added columns, except for the names.

I notice now that setting stacked.index.names is not relevant, and could/should be skipped.

deleveled2 = stacked.reset_index(names=["new_first", "new_second"])
tm.assert_series_equal(
deleveled["first"], deleveled2["new_first"], check_names=False
)
tm.assert_series_equal(
deleveled["second"], deleveled2["new_second"], check_names=False
)

with pytest.raises(ValueError, match=r".* number of provided names .*"):
stacked.reset_index(names=["new_first"])

with pytest.raises(ValueError, match="Names must be a tuple or list"):
stacked.reset_index(names={"first": "new_first", "second": "new_second"})
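
A side note on the `match` patterns used in the tests above: `pytest.raises` checks `match` with `re.search`, so the leading and trailing `.*` in the first pattern are not needed. The sketch below shows this with the standard library only. The example counts (1 and 2) in `message` are filled in for illustration.

```python
import re

# Message text from the ValueError in the diff, with example counts filled in.
message = (
    "The number of provided names (1) does not match the number of "
    "MultiIndex levels (2)"
)

padded = r".* number of provided names .*"  # pattern used in the test
plain = "number of provided names"          # equivalent under re.search

padded_hit = re.search(padded, message) is not None
plain_hit = re.search(plain, message) is not None

# To match the full message literally, escape it so "(" and ")" are not
# treated as regex groups.
exact_hit = re.search(re.escape(message), message) is not None
```

All three searches succeed on this message. The plain substring is the shortest pattern that still pins down the error.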

def test_reset_index_level(self):
df = DataFrame([[1, 2, 3, 4], [5, 6, 7, 8]], columns=["A", "B", "C", "D"])
