Skip to content

Convert DataFrame.rename to keyword only; simplify axis validation #29140

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jan 9, 2020
62 changes: 62 additions & 0 deletions doc/source/whatsnew/v1.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -174,6 +174,68 @@ Backwards incompatible API changes

pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])

``DataFrame.rename`` now only accepts one positional argument
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

- :meth:`DataFrame.rename` would previously accept positional arguments that would lead
to ambiguous or undefined behavior. From pandas 1.0, only the very first argument, which
maps labels to their new names along the default axis, is allowed to be passed by position
(:issue:`29136`).

*pandas 0.25.x*

.. code-block:: ipython

In [1]: df = pd.DataFrame([[1]])
In [2]: df.rename({0: 1}, {0: 2})
FutureWarning: ...Use named arguments to resolve ambiguity...
Out[2]:
2
1 1

*pandas 1.0.0*
.. ipython:: python

df.rename({0: 1}, {0: 2})

Note that errors will now be raised when conflicting or potentially ambiguous arguments are provided.

*pandas 0.25.x*

.. code-block:: ipython

In [1]: df.rename({0: 1}, index={0: 2})
Out[1]:
0
1 1

In [2]: df.rename(mapper={0: 1}, index={0: 2})
Out[2]:
0
2 1

*pandas 1.0.0*

.. ipython:: python

df.rename({0: 1}, index={0: 2})
df.rename(mapper={0: 1}, index={0: 2})

You can still change the axis along which the first positional argument is applied by
supplying the ``axis`` keyword argument.

.. ipython:: python

df.rename({0: 1})
df.rename({0: 1}, axis=1)

If you would like to update both the index and column labels, be sure to use the respective
keywords.

.. ipython:: python

df.rename(index={0: 1}, columns={0: 2})


.. _whatsnew_1000.api.other:

Expand Down
1 change: 1 addition & 0 deletions pandas/_typing.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
FrameOrSeries = TypeVar("FrameOrSeries", bound="NDFrame")
Scalar = Union[str, int, float, bool]
Axis = Union[str, int]
Level = Union[str, int]
Ordered = Optional[bool]

# use Collection after we drop support for py35
Expand Down
32 changes: 24 additions & 8 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,10 +15,12 @@
import sys
from textwrap import dedent
from typing import (
Callable,
FrozenSet,
Hashable,
Iterable,
List,
Mapping,
Optional,
Sequence,
Set,
Expand Down Expand Up @@ -93,7 +95,7 @@
)
from pandas.core.dtypes.missing import isna, notna

from pandas._typing import Axes, Dtype, FilePathOrBuffer
from pandas._typing import Axis, Axes, Dtype, FilePathOrBuffer, Level
from pandas.core import algorithms, common as com, nanops, ops
from pandas.core.accessor import CachedAccessor
from pandas.core.arrays import Categorical, ExtensionArray
Expand Down Expand Up @@ -4031,7 +4033,25 @@ def drop(
"mapper",
[("copy", True), ("inplace", False), ("level", None), ("errors", "ignore")],
)
def rename(self, *args, **kwargs):
def rename(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change here was to be explicit about what is accepted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 to being explicit. The only reason we had *args, **kwargs in the first place was to support the ambiguous case.

self,
mapper: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
*,
index: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
columns: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth making a name for this in _typing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was thinking the same thing. Any suggestions? LabelReplacement?

] = None,
axis: Optional[Axis] = None,
copy: bool = True,
inplace: bool = False,
level: Optional[Level] = None,
errors: str = "ignore",
) -> "DataFrame":

"""
Alter axes labels.

Expand Down Expand Up @@ -4140,12 +4160,8 @@ def rename(self, *args, **kwargs):
2 2 5
4 3 6
"""
axes = validate_axis_style_args(self, args, kwargs, "mapper", "rename")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note - there is only one other use now of this validate_axis_style_args function (see DataFrame.reindex). The intent here is good, but I think the fact that it mutates the args and kwargs being passed it makes it very difficult to reason about. As a follow up to this would like to replace entirely with something simple that just operates on the axis argument

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything blocking a similar PR for .reindex?

As a follow up to this would like to replace entirely with something simple that just operates on the axis argument

Is this referring to DataFrame.reindex, or validate_axis_style_args? Ideally we can completely remove validate_axis_style_args.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't look before because I was trying to keep this focused, but it might not be too much extra effort. I'll take a look and if it gets out of hand let you know

kwargs.update(axes)
# Pop these, since the values are in `kwargs` under different names
kwargs.pop("axis", None)
kwargs.pop("mapper", None)
return super().rename(**kwargs)
return super().rename(mapper=mapper, index=index, columns=columns, axis=axis, copy=copy,
inplace=inplace, level=level, errors=errors)

@Substitution(**_shared_doc_kwargs)
@Appender(NDFrame.fillna.__doc__)
Expand Down
74 changes: 44 additions & 30 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@
FrozenSet,
Hashable,
List,
Mapping,
Optional,
Sequence,
Set,
Expand Down Expand Up @@ -65,7 +66,7 @@
from pandas.core.dtypes.missing import isna, notna

import pandas as pd
from pandas._typing import Dtype, FilePathOrBuffer, Scalar
from pandas._typing import Axis, Dtype, FilePathOrBuffer, Level, Scalar
from pandas.core import missing, nanops
import pandas.core.algorithms as algos
from pandas.core.base import PandasObject, SelectionMixin
Expand Down Expand Up @@ -1013,7 +1014,24 @@ def swaplevel(self, i=-2, j=-1, axis=0):
# ----------------------------------------------------------------------
# Rename

def rename(self, *args, **kwargs):
def rename(
self,
mapper: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
*,
index: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
columns: Optional[
Union[Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
axis: Optional[Axis] = None,
copy: bool = True,
inplace: bool = False,
level: Optional[Level] = None,
errors: str = "ignore",
):
"""
Alter axes input function or functions. Function / dict values must be
unique (1-to-1). Labels not contained in a dict / Series will be left
Expand Down Expand Up @@ -1126,44 +1144,40 @@ def rename(self, *args, **kwargs):

See the :ref:`user guide <basics.rename>` for more.
"""
axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
copy = kwargs.pop("copy", True)
inplace = kwargs.pop("inplace", False)
level = kwargs.pop("level", None)
axis = kwargs.pop("axis", None)
errors = kwargs.pop("errors", "ignore")
if axis is not None:
# Validate the axis
self._get_axis_number(axis)

if kwargs:
raise TypeError(
"rename() got an unexpected keyword "
'argument "{0}"'.format(list(kwargs.keys())[0])
)

if com.count_not_none(*axes.values()) == 0:
if mapper is None and index is None and columns is None:
raise TypeError("must pass an index to rename")

self._consolidate_inplace()
if (index is not None or columns is not None):
if axis is not None:
raise TypeError("Cannot specify both 'axis' and any of 'index' or 'columns'")
elif mapper is not None:
raise TypeError("Cannot specify both 'mapper' and any of 'index' or 'columns'")
else:
# use the mapper argument
if axis in {1, "columns"}:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We accept 3 arguments, mapper, index and columns. mapper can conflict with index or columns if either is provided, which the checks before this take care of. At this point if there aren't any conflicts, I think easiest to just combine the object

Note that this is arguably a little too strict, as this is no longer valid:

df.rename({"key": "val"}, axis=1, index={"key2": "val2"})

But I don't think there is a lot to gain in supporting that given the variety of other options available here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this is arguably a little too strict, as this is no longer valid:

I don't recall if that case was originally discussed, but I think it's OK to break it.

columns = mapper
else:
index = mapper

result = self if inplace else self.copy(deep=copy)

# start in the axis order to eliminate too many copies
for axis in range(self._AXIS_LEN):
v = axes.get(self._AXIS_NAMES[axis])
if v is None:
for axis_no, replacements in enumerate((index, columns)):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately I am trying to get rid of all of the internal _AXIS_* functions and mappings in core.generic, which seem like relics of the panel days

If we buy into just having index and columns being the potential axes in that order things can be greatly simplified

if replacements is None:
continue
f = com.get_rename_function(v)
baxis = self._get_block_manager_axis(axis)

axis = self._get_axis(axis_no)
baxis = self._get_block_manager_axis(axis_no)
f = com.get_rename_function(replacements)

if level is not None:
level = self.axes[axis]._get_level_number(level)
level = axis._get_level_number(level)

# GH 13473
if not callable(v):
indexer = self.axes[axis].get_indexer_for(v)
if not callable(replacements):
indexer = axis.get_indexer_for(replacements)
if errors == "raise" and len(indexer[indexer == -1]):
missing_labels = [
label for index, label in enumerate(v) if indexer[index] == -1
label for index, label in enumerate(replacements) if indexer[index] == -1
]
raise KeyError("{} not found in axis".format(missing_labels))

Expand Down
25 changes: 19 additions & 6 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@
from io import StringIO
from shutil import get_terminal_size
from textwrap import dedent
from typing import Any, Callable
from typing import Any, Callable, Hashable, Mapping, Optional, Union
import warnings

import numpy as np
Expand Down Expand Up @@ -78,6 +78,7 @@
from pandas.core.strings import StringMethods
from pandas.core.tools.datetimes import to_datetime

from pandas._typing import Axis, Level
import pandas.io.formats.format as fmt
import pandas.plotting

Expand Down Expand Up @@ -4079,7 +4080,18 @@ def align(
broadcast_axis=broadcast_axis,
)

def rename(self, index=None, **kwargs):
def rename(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Series method doesn't completely align with the DataFrame method, but I think this is an easy way to go about it while maintain backwards compat

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value in accepting axis here, if we don't also have mapper and columns?

I don't see an easy way to accept mapper. That's essentially fulfilled by index.

So maybe my question is: should we also accept columns=? (Not arguing for this btw. I don't know what's best here).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we validate axis at all, or just ignore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here axis gets ignored. Looks like this was implemented as part of #18589

So maybe my question is: should we also accept columns=? (Not arguing for this btw. I don't know what's best here).

Yea I'm not sure myself. I was thinking ideally it would have been great to align the signatures better but for where we are that would cause some API churn. So I guess best to leave unless there's a ringing need to do something here (?)

self,
index: Optional[
Union[Hashable, Mapping[Hashable, Hashable], Callable[[Hashable], Hashable]]
] = None,
*,
axis: Optional[Axis] = None,
copy: bool = True,
inplace: bool = False,
level: Optional[Level] = None,
errors: str = "ignore"
) -> "Series":
"""
Alter Series index labels or name.

Expand All @@ -4093,6 +4105,8 @@ def rename(self, index=None, **kwargs):

Parameters
----------
axis : int or str
Unused. Accepted for compatability with DataFrame method only.
index : scalar, hashable sequence, dict-like or function, optional
Functions or dict-like are transformations to apply to
the index.
Expand All @@ -4114,6 +4128,7 @@ def rename(self, index=None, **kwargs):

See Also
--------
DataFrame.rename : Corresponding DataFrame method.
Series.rename_axis : Set the name of the axis.

Examples
Expand All @@ -4140,12 +4155,10 @@ def rename(self, index=None, **kwargs):
5 3
dtype: int64
"""
kwargs["inplace"] = validate_bool_kwarg(kwargs.get("inplace", False), "inplace")

if callable(index) or is_dict_like(index):
return super().rename(index=index, **kwargs)
return super().rename(index, copy=copy, inplace=inplace, level=level, errors=errors)
else:
return self._set_name(index, inplace=kwargs.get("inplace"))
return self._set_name(index, inplace=inplace)

@Substitution(**_shared_doc_kwargs)
@Appender(generic.NDFrame.reindex.__doc__)
Expand Down
Loading