ENH: Add `replace` method to `Index` #32542

oguzhanogreden · 2020-03-08T14:03:02Z

closes [Feature Request] Add replace method to Index objects #19495
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
docstring for the new Index.replace() method
won't do: type hints
merge with ENH: commit message

Added a replace method to Index classes, as well as tests.

ShaharNaveh

@oguzhanogreden Thank you so much for taking this!

Overall this looks really good!

Some minor comments :)

pandas/core/indexes/base.py

pandas/tests/indexes/base_class/test_replace.py

pandas/core/indexes/base.py

ShaharNaveh · 2020-03-08T14:56:56Z

Also, a question: can any of the newly added tests be parametrized?

oguzhanogreden · 2020-03-08T17:17:37Z

Thanks @MomIsBestFriend, helpful indeed. I'll wait for a green light from the core team before I turn to docstrings and type annotations, mainly because they take a lot of time 😇: namely (1) figuring out the mechanics of docstrings for different subclasses and (2) finding the correct types for type annotations.

ShaharNaveh · 2020-03-08T20:07:29Z

@oguzhanogreden ATM you can ignore the CI failure of CI/Web And Docs; we haven't resolved that issue yet (for now, at least).

Just be sure to merge master every now and then :)

datapythonista

Did you check if your implementation is faster than:

```python
import pandas

def replace(self, *args, **kwargs):
    return pandas.Index(self.to_series().replace(*args, **kwargs))
pandas.Index.replace = replace

idx = pandas.Index([1, 2, 3, 4, 5, 6])
idx.replace(3, 9)
```

Are there cases when converting to Series is a problem?

Seems like a huge complexity added that we can hopefully avoid.

oguzhanogreden · 2020-03-11T18:25:32Z

Regarding the speed point, this implementation seems faster, or at the very least faster more often. Be warned that I'm not on top of the %timeit mechanics. I chose the arguments below because they seemed reasonable from a measurement perspective. If there's a gotcha about performance, do let me know:

```ipython
# Running on a MacBook Pro from 2017, under some use load
# (browsers with many tabs are open etc.)

import numpy as np

import pandas as pd

# Index.replace as implemented in this PR
idx = pd.Index([1, 2, 3, 4, 5, 6])
%timeit -n 10000 -r 50 idx.replace(3, 9)

# 69.9 µs ± 6.67 µs per loop (mean ± std. dev. of 50 runs, 10000 loops each)

# The proposed to_series() wrapper, monkeypatched over the PR's method
def replace(self, *args, **kwargs):
    return pd.Index(self.to_series().replace(*args, **kwargs))
pd.Index.replace = replace

idx = pd.Index([1, 2, 3, 4, 5, 6])
%timeit -n 10000 -r 50 idx.replace(3, 9)

# 348 µs ± 102 µs per loop (mean ± std. dev. of 50 runs, 10000 loops each)
# ^^^ couldn't get higher precision here, also not a much lower point estimate
```

Happy to move this to asv if the output is considerably more reliable.
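For a measurement that can be rerun outside IPython (or later ported to asv), a plain `timeit` script is one option. The sketch below assumes the `to_series()` wrapper quoted above and, since the PR's own implementation is not reproduced here, compares it against a hypothetical NumPy-based fast path (`replace_scalar_numpy`) that only handles a single scalar pair. Absolute numbers will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd


def replace_via_series(idx, to_replace, value):
    # The wrapper proposed in review: round-trip through a Series.
    return pd.Index(idx.to_series().replace(to_replace, value))


def replace_scalar_numpy(idx, to_replace, value):
    # Hypothetical fast path: mask and substitute with NumPy.
    # Handles only one scalar to_replace/value pair; no regex, no list or
    # dict arguments, and no special NaN matching.
    values = idx.to_numpy()
    return pd.Index(np.where(values == to_replace, value, values), name=idx.name)


idx = pd.Index([1, 2, 3, 4, 5, 6])
for fn in (replace_via_series, replace_scalar_numpy):
    # Best of 5 repeats of 200 calls each, reported per call in microseconds.
    best = min(timeit.repeat(lambda: fn(idx, 3, 9), number=200, repeat=5))
    print(f"{fn.__name__}: {best / 200 * 1e6:.1f} µs per call")
```

Taking the minimum over repeats is the usual way to reduce noise from background load, which the comments above mention.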

oguzhanogreden · 2020-03-11T18:38:28Z

Regarding the complexity point: agreed. Still, I think there's some benefit to having these methods available. I found myself in a situation where I "just wanted to be able to do what I know from other parts of the pandas API". Ultimately, it's up to you folks to weigh that against the maintenance burden.

I'd be interested in contributing to a refactor around these methods. However, I can imagine it will be pretty tricky to abstract things away, given the nitty-gritty involved.

I'll survey properly in the coming days whether .to_series() fails to cover any case.

Finally, I added a few NotImplementedErrors for cases where this method, as it stands, shouldn't return anything. I'm happy to address those eventually, at the cost of some more complexity, if the overall judgement is that these are useful to have upstream.
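The comment does not list which cases raise. The sketch below assumes one of them is `inplace=True`, since the commit history mentions not implementing `inplace` in light of #16529 and `Index` objects are immutable. `index_replace` is a hypothetical stand-in, not the PR's code: it refuses in-place replacement and delegates everything else to `Series.replace`.

```python
import pandas as pd


def index_replace(idx, *args, inplace=False, **kwargs):
    """Hypothetical Index.replace stand-in that never mutates ``idx``."""
    # Index objects are immutable, so an in-place replace (which would
    # return None, as Series.replace(inplace=True) does) is refused.
    if inplace:
        raise NotImplementedError("Index is immutable; inplace=True is not supported.")
    # All other arguments go straight to Series.replace; the result is
    # wrapped back into a new Index.
    return pd.Index(idx.to_series().replace(*args, **kwargs))


idx = pd.Index([1, 2, 3], name="x")
print(index_replace(idx, 3, 9))
```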

datapythonista · 2020-03-11T19:37:43Z

Just to be clear, my proposal was not that you use that code (with to_series()) in your own code, but that the implementation in this PR use it instead of all the code you're adding.

So you'd still get the same functionality in pandas Index.replace(), but instead of adding 300 lines of code here, you'd add 30. That said, the difference in performance seems quite big.

oguzhanogreden · 2020-03-11T19:57:04Z

Thanks. That was clear. I made the point about complexity since it's not clear to me how to weigh the performance difference against growing the code base so much :)

I suggest we follow up once I have done the following:

check whether to_series() can handle most of the requirements;
try to identify the cause of the performance gap, and see whether I can address it by wrapping .to_series() to create a solution.
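A quick way to start on the first item might look like this. The hypothetical `replace_via_series` helper (the wrapper from the review thread) is run on a few index types, printing the resulting class, name, and values so that a lost name or a changed index type stands out. The chosen cases are an assumed sample, not the PR's test matrix.

```python
import pandas as pd


def replace_via_series(idx, *args, **kwargs):
    # The wrapper proposed in review: round-trip through a Series.
    return pd.Index(idx.to_series().replace(*args, **kwargs))


# A few index types to probe (an assumed sample, not an exhaustive matrix).
cases = {
    "integer": (pd.Index([1, 2, 3], name="x"), 2, 20),
    "string": (pd.Index(["a", "b", "c"], name="x"), "b", "z"),
    "datetime": (
        pd.DatetimeIndex(["2020-01-01", "2020-01-02"], name="x"),
        pd.Timestamp("2020-01-02"),
        pd.Timestamp("2020-03-08"),
    ),
}

for label, (idx, old, new) in cases.items():
    result = replace_via_series(idx, old, new)
    # Class and name show whether the round trip kept the index type and label.
    print(f"{label}: {type(result).__name__} name={result.name!r} {list(result)}")
```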

jbrockmendel · 2020-04-23T18:35:15Z

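I think we'd be much better off with

```python
from pandas import Index


def replace(self, *args, **kwargs):
    # Delegate to Series.replace and wrap the result back into an Index.
    result = self.to_series().replace(*args, **kwargs)
    return Index(result)
```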

mroeschke · 2020-05-23T04:44:58Z

@oguzhanogreden do you have time to merge in master and adapt the implementation @jbrockmendel proposed? Would be beneficial to have a shared implementation to reduce code maintenance.

oguzhanogreden · 2020-05-26T08:29:03Z

Sorry, I didn't get to prioritize this for a while. I'll try to get to it in a few days, with the suggested solution.

jbrockmendel · 2020-11-16T21:56:45Z

pandas/core/indexes/category.py

```diff
+    ):
+        raise NotImplementedError(
+            "Replacing values of a CategoricalIndex is not supported."
+        )
```


remind me why we need to disable this?

Otherwise it hits #37899.

edit: That is, it hits that issue after the proposed changes due to inheritance.

What do you think?

So some cases will hit the same bug that affects Series.replace and other cases will work fine (assuming we don't raise here)? If so, then we're better off matching Series behavior and allowing the subset of cases that do work.

I disabled regex= due to #38447. The rest is enabled, and tests have been added.
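The suggestion above (match Series behaviour and only refuse what is known to be broken) together with disabling regex= can be sketched as a thin guard in front of `Series.replace`. `categorical_index_replace` is a hypothetical name, and checking the dtype this way is an assumption rather than the PR's actual override.

```python
import pandas as pd


def categorical_index_replace(idx, *args, regex=False, **kwargs):
    """Hypothetical sketch: refuse regex for categorical data, delegate the rest."""
    # Regex replacement is refused for categorical indexes (see #38447).
    if regex and isinstance(idx.dtype, pd.CategoricalDtype):
        raise NotImplementedError(
            "Regex replacement is not supported for CategoricalIndex."
        )
    # Everything else follows Series.replace, including any cases that
    # Series.replace itself gets wrong, so Index and Series stay consistent.
    return pd.Index(idx.to_series().replace(*args, regex=regex, **kwargs))


try:
    categorical_index_replace(pd.CategoricalIndex(["a", "b"]), "a", "c", regex=True)
except NotImplementedError as err:
    print(err)
```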

pandas/core/shared_docs.py

oguzhanogreden · 2020-11-22T13:47:29Z

pandas/core/shared_docs.py

```diff
+
+    >>> df = pd.DataFrame({{'A': [True, False, True],
+    ...                    'B': [False, True, False]}})
+    >>> df.replace({{'a string': 'new value', True: False}})  # raises
```


This is one of the failing doctests.

L578 doesn't raise an error anymore. The behavior seems to have changed in upstream master as well, so I'll change the docstring and add a test case to pin down the behavior going forward.
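A regression test for the docstring example above could look like the following. It assumes a pandas version in which, as described in the comment, the call no longer raises; the test name is hypothetical, and it is written as a plain function so it runs with or without pytest.

```python
import pandas as pd


def test_replace_dict_with_unmatched_string_key_on_bool_frame():
    # The docstring example that used to be marked "# raises".
    df = pd.DataFrame({"A": [True, False, True], "B": [False, True, False]})
    result = df.replace({"a string": "new value", True: False})
    # 'a string' matches nothing and True -> False, so every cell is False.
    expected = pd.DataFrame({"A": [False] * 3, "B": [False] * 3})
    pd.testing.assert_frame_equal(result, expected)


test_replace_dict_with_unmatched_string_key_on_bool_frame()
```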

See #32542 (comment)


oguzhanogreden · 2020-11-30T19:42:58Z

(I can't replicate the build errors locally.)


oguzhanogreden · 2020-12-13T14:05:19Z

pandas/core/indexes/category.py

```diff
@@ -655,3 +655,24 @@ def _delegate_method(self, name: str, *args, **kwargs):
         if is_scalar(res):
             return res
         return CategoricalIndex(res, name=self.name)
+
+    def replace(
```


Not adding @doc here, since it would require a lot of conditionals due to regex being disabled. We can add it once replace() is sorted out.
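To illustrate the kind of conditional being avoided, here is a hypothetical shared-docstring template where the regex parameter section is included only for classes that support it. It does not use pandas' internal `@doc` decorator; the template text and `render_replace_doc` are assumptions made for illustration.

```python
# Hypothetical shared template; this is not pandas' internal @doc machinery.
_REPLACE_TEMPLATE = """Replace values given in `to_replace` with `value`.

Parameters
----------
to_replace : scalar, list or dict
    Values that will be replaced.
value : scalar, list or dict
    Values to replace any matches with.
{regex_section}
Returns
-------
{klass}
    A new object with the replaced values.
"""

_REGEX_SECTION = """regex : bool, default False
    Whether to interpret `to_replace` as a regular expression.
"""


def render_replace_doc(klass: str, supports_regex: bool) -> str:
    # The per-class conditional: the regex section is included only for
    # classes that support regex replacement.
    regex_section = _REGEX_SECTION if supports_regex else ""
    return _REPLACE_TEMPLATE.format(klass=klass, regex_section=regex_section)


print(render_replace_doc("Index", supports_regex=True))
print(render_replace_doc("CategoricalIndex", supports_regex=False))
```

Every shared section that differs between classes adds another placeholder of this kind, which is the maintenance cost the comment wants to defer.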

jreback · 2020-12-13T17:47:28Z

@oguzhanogreden can you do a precursor PR that moves things and doesn't change anything (except for imports and so on), with no functionality changes at all? It's almost impossible to tell what you are changing otherwise.


oguzhanogreden · 2021-01-16T08:43:10Z

Now that #38561 is done, what's going on here is a bit clearer (@jreback). The failing test also fails on master and is not related to the changes here.

Let me try to summarise to help with review:

The key change is the addition of a .replace() method to Index classes.
pd.Categorical does not allow regex replacement (see ENH: regex replace capacities are missing from pd.Categorical.replace() #38447), so we're disabling it.
For the same reason, I have not added documentation to CategoricalIndex. This keeps the shared docs simple (i.e. without a conditional for regex). Once the issues with Categorical are solved, we can just add the decorator and it's good to go. I'll open an issue and reference it where needed once this is merged.

Thanks for your patience here! It took too long, partly due to my wrong initial approach (a huge PR) and partly due to my absence since last March. Looking forward to getting this out of the door and paying back the investment ;)


jreback · 2021-02-11T00:36:27Z

OK, can you merge master? Then I will take a look.

mroeschke · 2021-04-11T00:29:52Z

This was looking pretty close but looks like this PR has gotten stale. Going to close but if interested in continuing please ping, merge master, and target this PR for the next release (1.3 as of writing this comment).

oguzhanogreden changed the title ~~Index replace~~ ENH: Add replace method to Index Mar 8, 2020

ShaharNaveh suggested changes Mar 8, 2020


datapythonista reviewed Mar 10, 2020


datapythonista added Enhancement Index Related to the Index class or subclasses labels Mar 10, 2020

oguzhanogreden added 15 commits June 19, 2020 14:59

initial coommit & test

22d53cc

Initial commit & test

47a14fa

Third case, not implementing

fbbc745

replace values list with scalar

5cd5d65

backfill case

734c750

Add replace_single and regex cases

becbc09

Not implementing inplace in light of #16529.

ae20f64

Clean unnecessary errors

4fa5f11

remove unused inplace parameters/arguments

4d3ec7e

remove filter from _replace_single() method()

1737372

Remove unnecessary comment

554ce48

Add documentation to replace_list

09f3a15

Fix import

f81efc1

Address minor comments - i

e1711d8

Revert moving imports to import section

1ebc201

oguzhanogreden added 2 commits November 16, 2020 21:45

Grammar fix in whatsnew

1bb05f9

fix formatting

c955338

jbrockmendel reviewed Nov 16, 2020


pandas/core/shared_docs.py (outdated)

Revert unintended formatting changes in core/shared_docs.py

01ac22f

oguzhanogreden commented Nov 22, 2020


oguzhanogreden added 5 commits November 22, 2020 14:56

test case for change identified by the doctest & formatting by black

e6a4e8d

Remove stale documentation.

7d7e5ca

See #32542 (comment)

Merge remote-tracking branch 'upstream/master' into index-replace-bck

35d493a

Merge branch 'master' into index-replace

b73c774

Merge branch 'master' of github.com:pandas-dev/pandas into index-repl…

0801abf

…ace-bck

oguzhanogreden mentioned this pull request Dec 13, 2020

ENH: regex replace capacities are missing from pd.Categorical.replace() #38447

Open

oguzhanogreden added 3 commits December 13, 2020 14:57

Merge branch 'master' into index-replace

800aef7

Address comments and add tests

eb784b5

Merge branch 'index-replace' of https://github.com/oguzhanogreden/pandas

a23faaf

into index-replace-bck

oguzhanogreden commented Dec 13, 2020


C408

2a778bb

oguzhanogreden mentioned this pull request Dec 18, 2020

Move docstring of NDFrame.replace in preparation of #32542 #38561

Merged

jreback pushed a commit that referenced this pull request Dec 22, 2020

Move docstring of NDFrame.replace in preparation of #32542 (#38561)

ad4850b

oguzhanogreden added 3 commits January 15, 2021 21:59

Merge branch 'master' of github.com:pandas-dev/pandas into index-repl…

d4ade69

…ace-bck

revert validate_unwanted_patterns

7fddde6

Move what's new and tidy up @docs

f4fa3d6

Merge branch 'master' into index-replace

62bd25e

luckyvs1 pushed a commit to luckyvs1/pandas that referenced this pull request Jan 20, 2021

Move docstring of NDFrame.replace in preparation of pandas-dev#32542 (p…

eb6a2d1

…andas-dev#38561)

mroeschke closed this Apr 11, 2021
