ENH: Add replace method to Index #32542

Closed
wants to merge 54 commits into from
Commits
54 commits
22d53cc
initial coommit & test
oguzhanogreden Mar 2, 2020
47a14fa
Initial commit & test
oguzhanogreden Mar 2, 2020
fbbc745
Third case, not implementing
oguzhanogreden Mar 2, 2020
5cd5d65
replace values list with scalar
oguzhanogreden Mar 4, 2020
734c750
backfill case
oguzhanogreden Mar 4, 2020
becbc09
Add replace_single and regex cases
oguzhanogreden Mar 7, 2020
ae20f64
Not implementing inplace in light of #16529.
oguzhanogreden Mar 8, 2020
4fa5f11
Clean unnecessary errors
oguzhanogreden Mar 8, 2020
4d3ec7e
remove unused inplace parameters/arguments
oguzhanogreden Mar 8, 2020
1737372
remove filter from _replace_single() method()
oguzhanogreden Mar 8, 2020
554ce48
Remove unnecessary comment
oguzhanogreden Mar 8, 2020
09f3a15
Add documentation to replace_list
oguzhanogreden Mar 8, 2020
f81efc1
Fix import
oguzhanogreden Mar 8, 2020
e1711d8
Address minor comments - i
oguzhanogreden Mar 8, 2020
1ebc201
Revert moving imports to import section
oguzhanogreden Mar 8, 2020
f93adb8
Minor comments and parametrized tests
oguzhanogreden Mar 8, 2020
0475c86
Raise NotImplemented for Categorical- and MultiIndex
oguzhanogreden Mar 11, 2020
70896a7
commit test code
oguzhanogreden Jun 19, 2020
46c4712
reuse code & add tests
oguzhanogreden Jun 19, 2020
786208b
Again, add type ignore
oguzhanogreden Jun 20, 2020
6dc5f8b
Move type ignore to correct line...
oguzhanogreden Jun 20, 2020
1512edf
add docstrings
oguzhanogreden Jul 18, 2020
241721a
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Jul 18, 2020
be1d0ac
I had removed defaults by mistake, added them again
oguzhanogreden Jul 19, 2020
aeb2759
Add defaults to multiindex as well
oguzhanogreden Jul 19, 2020
db58359
Update whatsnew
oguzhanogreden Jul 20, 2020
a6e01eb
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Jul 21, 2020
0970112
Move shared docs to common.shared_docs
oguzhanogreden Jul 27, 2020
1e2efc3
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Jul 27, 2020
7bdc7a8
Cache errors not caught by checks
oguzhanogreden Jul 27, 2020
46d5c5a
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Aug 1, 2020
cac074f
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Aug 6, 2020
ff27ce4
Merge branch 'master' into index-replace-bck
oguzhanogreden Nov 11, 2020
e315717
'public' shared_doc_kwargs
oguzhanogreden Nov 11, 2020
7406775
revert 'public' shared_doc_kwargs adjust validation
oguzhanogreden Nov 11, 2020
7b4a2b0
Move whatsnes
oguzhanogreden Nov 11, 2020
5f86494
Add replace method for categorical index
oguzhanogreden Nov 11, 2020
61b8ac5
Revert changes and a better NotImplementedError message
oguzhanogreden Nov 16, 2020
1bb05f9
Grammar fix in whatsnew
oguzhanogreden Nov 16, 2020
c955338
fix formatting
oguzhanogreden Nov 16, 2020
01ac22f
Revert unintended formatting changes in core/shared_docs.py
oguzhanogreden Nov 22, 2020
e6a4e8d
test case for change identified by the doctest & formatting by black
oguzhanogreden Nov 22, 2020
7d7e5ca
Remove stale documentation.
oguzhanogreden Nov 22, 2020
35d493a
Merge remote-tracking branch 'upstream/master' into index-replace-bck
oguzhanogreden Nov 22, 2020
b73c774
Merge branch 'master' into index-replace
oguzhanogreden Nov 27, 2020
0801abf
Merge branch 'master' of github.com:pandas-dev/pandas into index-repl…
oguzhanogreden Nov 30, 2020
800aef7
Merge branch 'master' into index-replace
oguzhanogreden Dec 13, 2020
eb784b5
Address comments and add tests
oguzhanogreden Dec 13, 2020
a23faaf
Merge branch 'index-replace' of https://github.com/oguzhanogreden/pan…
oguzhanogreden Dec 13, 2020
2a778bb
C408
oguzhanogreden Dec 13, 2020
d4ade69
Merge branch 'master' of github.com:pandas-dev/pandas into index-repl…
oguzhanogreden Jan 15, 2021
7fddde6
revert validate_unwanted_patterns
oguzhanogreden Jan 15, 2021
f4fa3d6
Move what's new and tidy up @docs
oguzhanogreden Jan 15, 2021
62bd25e
Merge branch 'master' into index-replace
oguzhanogreden Jan 18, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.1.rst
@@ -61,6 +61,7 @@ Other
- Bumped minimum pymysql version to 0.8.1 to avoid test failures (:issue:`38344`)
- Fixed build failure on MacOS 11 in Python 3.9.1 (:issue:`38766`)
- Added reference to backwards incompatible ``check_freq`` arg of :func:`testing.assert_frame_equal` and :func:`testing.assert_series_equal` in :ref:`pandas 1.1.0 whats new <whatsnew_110.api_breaking.testing.check_freq>` (:issue:`34050`)
- :class:`Index` and :class:`MultiIndex` now have a ``replace()`` method (:issue:`19495`).

.. ---------------------------------------------------------------------------

23 changes: 23 additions & 0 deletions pandas/core/indexes/base.py
@@ -98,6 +98,7 @@
from pandas.core.indexes.frozen import FrozenList
from pandas.core.ops import get_op_result_name
from pandas.core.ops.invalid import make_invalid_op
from pandas.core.shared_docs import _shared_docs
from pandas.core.sorting import ensure_key_mapped, nargsort
from pandas.core.strings import StringMethods

@@ -124,6 +125,7 @@
    "raises_section": "",
    "unique": "Index",
    "duplicated": "np.ndarray",
    "replace_iloc": "",
}
_index_shared_docs = {}
str_t = str
@@ -1536,6 +1538,27 @@ def rename(self, name, inplace=False):
        """
        return self.set_names([name], inplace=inplace)

    @doc(
        _shared_docs["replace"],
        klass=_index_doc_kwargs["klass"],
        inplace=_index_doc_kwargs["inplace"],
        replace_iloc=_index_doc_kwargs["replace_iloc"],
    )
    def replace(
        self,
        to_replace=None,
        value=None,
        limit=None,
        regex=False,
        method="pad",
    ):
        new_index = self.to_series().replace(
            to_replace=to_replace, value=value, limit=limit, regex=regex, method=method
        )
        new_index = Index(new_index)

        return new_index

# --------------------------------------------------------------------
# Level-Centric Methods

21 changes: 21 additions & 0 deletions pandas/core/indexes/category.py
@@ -628,3 +628,24 @@ def _delegate_method(self, name: str, *args, **kwargs):
        if is_scalar(res):
            return res
        return CategoricalIndex(res, name=self.name)

    def replace(

@oguzhanogreden (Contributor Author) commented on Dec 13, 2020:

Not adding @doc here since it will require a lot of conditionals due to regex being disabled. We can add it after replace() is sorted out.

        self,
        to_replace=None,
        value=None,
        limit=None,
        regex=False,
        method="pad",
    ):
        if regex is not False:
            raise NotImplementedError(
                "Regex replace is not yet implemented for CategoricalIndex."
            )

        new_index = self.to_series().replace(
            to_replace=to_replace, value=value, limit=limit, regex=regex, method=method
        )

Member:

Remind me why we need to disable this?

@oguzhanogreden (Contributor Author), Nov 22, 2020:

Otherwise it hits #37899.

edit: That is, it hits that issue after the proposed changes due to inheritance.

Contributor Author:

What do you think?

Member:

So some cases will hit the same bug that affects Series.replace and other cases will work fine (assuming we don't raise here)? If so, then we're better off matching Series behavior and allowing the subset of cases that do work.

Contributor Author:

I disabled regex= due to #38447. The rest is enabled and tests are added.

        new_index = CategoricalIndex(new_index)

        return new_index
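
Since the method above routes categorical data through `Series.replace`, its result depends on how the installed pandas handles that case (the thread above already points at #37899 and #38447). As an alternative that does not appear in this PR, the simple scalar-to-scalar case can be covered by relabelling a category with the existing `CategoricalIndex.rename_categories`. This is only a sketch; the helper name `replace_category` is hypothetical.

```python
import pandas as pd


def replace_category(index, old, new):
    """Hypothetical helper: relabel one category of a CategoricalIndex."""
    if old not in index.categories:
        # Nothing to relabel; return the index unchanged.
        return index
    # rename_categories accepts a dict mapping old labels to new ones.
    return index.rename_categories({old: new})


cat_idx = pd.CategoricalIndex(["a", "b", "c"], name="letters")
result = replace_category(cat_idx, "c", "z")
print(list(result))
```

Relabelling differs from replacement when `new` is already a category: `rename_categories` requires the resulting categories to be unique, so merging two categories needs a different approach.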
24 changes: 24 additions & 0 deletions pandas/core/indexes/multi.py
@@ -58,6 +58,7 @@
from pandas.core.indexes.frozen import FrozenList
from pandas.core.indexes.numeric import Int64Index
from pandas.core.ops.invalid import make_invalid_op
from pandas.core.shared_docs import _shared_docs
from pandas.core.sorting import (
    get_group_index,
    indexer_from_factorized,
@@ -3776,6 +3777,29 @@ def isin(self, values, level=None):
    __abs__ = make_invalid_op("__abs__")
    __inv__ = make_invalid_op("__inv__")

    @doc(
        _shared_docs["replace"],
        klass=_index_doc_kwargs["klass"],
        inplace=_index_doc_kwargs["inplace"],
        replace_iloc=_index_doc_kwargs["replace_iloc"],
    )
    def replace(
        self,
        to_replace=None,
        value=None,
        limit=None,
        regex=False,
        method="pad",
    ):
        names = self.names

        result = self.to_frame().replace(
            to_replace=to_replace, value=value, limit=limit, regex=regex, method=method
        )
        new_multi_index = self.from_frame(result, names=names)

        return new_multi_index


def _lexsort_depth(codes: List[np.ndarray], nlevels: int) -> int:
    """Count depth (up to a maximum of `nlevels`) with which codes are lexsorted."""
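
The `MultiIndex.replace` added in this hunk follows the same pattern at the frame level: one column per level via `to_frame()`, `DataFrame.replace`, then `from_frame()` with the original names. A minimal sketch with public API only is shown below. `multi_index_replace` is a hypothetical free function, arguments are forwarded unchanged, and `index=False` is an assumption made here to keep the intermediate frame simple (the PR calls `to_frame()` with its default).

```python
import pandas as pd


def multi_index_replace(mi, *args, **kwargs):
    """Hypothetical helper mirroring the PR's MultiIndex approach."""
    # One column per level; index=False avoids reusing `mi` as the row index.
    frame = mi.to_frame(index=False)
    replaced = frame.replace(*args, **kwargs)
    # Pass the original names back so unnamed levels stay unnamed.
    return pd.MultiIndex.from_frame(replaced, names=mi.names)


mi = pd.MultiIndex.from_arrays(
    [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
    names=["digits", "colors"],
)
# A nested dict targets individual levels by name, as in the PR's tests.
nested = multi_index_replace(mi, {"digits": {1: 0}, "colors": {"red": "black"}})
print(nested.tolist())
```

Passing `names=mi.names` explicitly matters for unnamed levels: without it, `from_frame` would take the integer column labels of the intermediate frame as level names.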
9 changes: 9 additions & 0 deletions pandas/tests/frame/methods/test_replace.py
@@ -1637,6 +1637,15 @@ def test_replace_unicode(self):
        expected = DataFrame({"positive": np.ones(3)})
        tm.assert_frame_equal(result, expected)

    def test_replace_multiple_bool_datetime_type_mismatch(self):
        # See https://github.com/pandas-dev/pandas/pull/32542#discussion_r528338117
        df = DataFrame({"A": [True, False, True], "B": [False, True, False]})

        result = df.replace({"a string": "new value", True: False})
        expected = DataFrame({"A": [False, False, False], "B": [False, False, False]})

        tm.assert_frame_equal(result, expected)

    def test_replace_bytes(self, frame_or_series):
        # GH#38900
        obj = frame_or_series(["o"]).astype("|S")
77 changes: 77 additions & 0 deletions pandas/tests/indexes/base_class/test_replace.py
@@ -0,0 +1,77 @@
import pytest

import pandas as pd
import pandas._testing as tm


@pytest.mark.parametrize(
    "index, to_replace, value, expected",
    [
        ([1, 2, 3], [1, 3], ["a", "c"], ["a", 2, "c"]),
        ([1, 2, 3], 1, "a", ["a", 2, 3]),
        (
            [1, None, 2],
            [1, 2],
            "a",
            ["a", None, "a"],
        ),
    ],
)
def test_index_replace(index, to_replace, value, expected):
    index = pd.Index(index)
    expected = pd.Index(expected)

    result = index.replace(to_replace=to_replace, value=value)

    tm.assert_equal(result, expected)


@pytest.mark.parametrize(
    "index, to_replace, value, regex, expected",
    [
        (
            ["bat", "foo", "baait", "bar"],
            r"^ba.$",
            "new",
            True,
            ["new", "foo", "baait", "new"],
        ),
        (
            ["bat", "foo", "baait", "bar"],
            None,
            None,
            {r"^ba.$": "new", "foo": "xyz"},
            ["new", "xyz", "baait", "new"],
        ),
    ],
)
def test_index_replace_regex(index, to_replace, value, regex, expected):
    index = pd.Index(index)
    expected = pd.Index(expected)

    result = index.replace(to_replace=to_replace, value=value, regex=regex)
    tm.assert_equal(expected, result)


def test_index_replace_dict_and_value():
    index = pd.Index([1, 2, 3])

    msg = "Series.replace cannot use dict-like to_replace and non-None value"
    with pytest.raises(ValueError, match=msg):
        index.replace({1: "a", 3: "c"}, "x")


def test_index_replace_bfill():
    index = pd.Index([0, 1, 2, 3, 4])
    expected = pd.Index([0, 3, 3, 3, 4])

    result = index.replace([1, 2], method="bfill")
    tm.assert_equal(expected, result)


def test_index_name_preserved():
    index = pd.Index(range(2), name="foo")
    expected = pd.Index([0, 0], name="foo")

    result = index.replace(1, 0)
    tm.assert_equal(expected, result)
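
`test_index_replace_bfill` above expects every replaced label to take the value of the next label that was not replaced. As an illustration of that expectation only, and not of how `Series.replace` implements `method="bfill"`, the same result can be reproduced by masking and back-filling with public API. The helper name `replace_with_bfill` is hypothetical.

```python
import pandas as pd


def replace_with_bfill(index, to_replace):
    """Hypothetical helper: replace labels with the next non-replaced label."""
    series = index.to_series()
    # Blank out the labels to replace, then pull the next kept label backwards.
    filled = series.mask(series.isin(to_replace)).bfill()
    # Masking introduces NaN and a float dtype; cast back to the original dtype.
    return pd.Index(filled.astype(index.dtype), name=index.name)


print(replace_with_bfill(pd.Index([0, 1, 2, 3, 4]), [1, 2]))
```

The final cast assumes every replaced label has a later kept label to fill from; a replaced label at the end of an integer index would stay NaN and the cast would fail.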
55 changes: 55 additions & 0 deletions pandas/tests/indexes/categorical/test_replace.py
@@ -0,0 +1,55 @@
import pytest

import pandas as pd
import pandas._testing as tm


@pytest.mark.parametrize(
    "index, to_replace, value, expected",
    [
        ([1, 2, 3], 3, "a", [1, 2, "a"]),
        (
            [1, None, 2],
            [1, 2],
            "a",
            ["a", None, "a"],
        ),
    ],
)
def test_categorical_index_replace(index, to_replace, value, expected):
    index = pd.CategoricalIndex(index)
    expected = pd.CategoricalIndex(expected)

    result = index.replace(to_replace=to_replace, value=value)

    tm.assert_equal(result, expected)


def test_categorical_index_replace_dict_and_value():
    index = pd.CategoricalIndex([1, 2, 3])

    msg = "Series.replace cannot use dict-like to_replace and non-None value"
    with pytest.raises(ValueError, match=msg):
        index.replace({1: "a", 3: "c"}, "x")


@pytest.mark.parametrize(
    "index, to_replace, value, expected",
    [
        ([1, 2, 3], [2, 3], ["b", "c"], [1, "b", "c"]),
        ([1, 2, 3], 3, "c", [1, 2, "c"]),
        (
            [1, None, 2],
            [1, 2],
            "a",
            ["a", None, "a"],
        ),
    ],
)
def test_index_replace(index, to_replace, value, expected):
    index = pd.CategoricalIndex(index)
    expected = pd.CategoricalIndex(expected)

    result = index.replace(to_replace=to_replace, value=value)

    tm.assert_equal(result, expected)
70 changes: 70 additions & 0 deletions pandas/tests/indexes/multi/test_replace.py
@@ -0,0 +1,70 @@
import pytest

import pandas as pd
import pandas._testing as tm


@pytest.mark.parametrize(
    "names, arrays, to_replace, value, expected_arrays",
    [
        (
            [None, None],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            [1, "red"],
            [0, "black"],
            [[0, 0, 2, 2], ["black", "blue", "black", "blue"]],
        ),
        # names should be preserved
        (
            ["digits", "colors"],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            1,
            0,
            [[0, 0, 2, 2], ["red", "blue", "red", "blue"]],
        ),
        (
            [None, None],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            1,
            0,
            [[0, 0, 2, 2], ["red", "blue", "red", "blue"]],
        ),
        (
            [None, None],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            [1, 2],
            0,
            [[0, 0, 0, 0], ["red", "blue", "red", "blue"]],
        ),
        (
            [None, None],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            [1, 2],
            0,
            [[0, 0, 0, 0], ["red", "blue", "red", "blue"]],
        ),
        # nested dicts
        (
            ["digits", "colors"],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            {"digits": {1: 0}, "colors": {"red": "black"}},
            None,
            [[0, 0, 2, 2], ["black", "blue", "black", "blue"]],
        ),
        # dicts and value
        (
            ["digits", "colors"],
            [[1, 1, 2, 2], ["red", "blue", "red", "blue"]],
            {"digits": [1], "colors": ["red", "blue"]},
            "x",
            [["x", "x", 2, 2], ["x", "x", "x", "x"]],
        ),
    ],
)
def test_multi_index_replace(names, arrays, to_replace, value, expected_arrays):
    multi_index = pd.MultiIndex.from_arrays(arrays, names=names)
    expected = pd.MultiIndex.from_arrays(expected_arrays, names=names)

    result = multi_index.replace(to_replace=to_replace, value=value)

    tm.assert_equal(result, expected)