Skip to content

REGR: replace with multivalued regex raising #40604

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 25, 2021

Conversation

mzeitlin11
Copy link
Member

Issue comes from the first pair in the regex arg hitting the simple replace path, which causes the ObjectBlock to be split. From here, the mask indexing will be wrong, but that will not cause an error unless a later pair hits _replace_regex, which actually uses the mask.

@mzeitlin11 mzeitlin11 changed the title Bug/replace list raising REGR: replace with multivalued regex raising Mar 24, 2021
@mzeitlin11 mzeitlin11 added Internals Related to non-user accessible pandas implementation Regression Functionality that used to work in a prior pandas version replace replace method labels Mar 24, 2021
# is not indexable
m = masks[i][
mask_pos : mask_pos + blk.shape[0]
] # type: ignore[index]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think if you define mib = masks[i] or something that might make this type:ignore unnecessary

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call!

if blk.ndim == 1:
m = masks[i]
else:
# GH-39338: _replace_coerce can split a block, so we
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if it gets split, will it be all-single-column?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good question will look into that, would help simplify logic

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adjusted to account for this - split is happening in split_and_operate, which looks to always give single column result

@jreback jreback added this to the 1.2.4 milestone Mar 25, 2021
@jreback jreback merged commit 0354620 into pandas-dev:master Mar 25, 2021

# GH-39338: _replace_coerce can split a block into
# single-column blocks, so track the index so we know
# where to index into the mask
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may be hard to backport as this code has changed a lot (though possible this particular section has not).

@jreback
Copy link
Contributor

jreback commented Mar 25, 2021

thanks @mzeitlin11

@jreback
Copy link
Contributor

jreback commented Mar 25, 2021

@meeseeksdev backport 1.2.x

@lumberbot-app

This comment has been minimized.

@mzeitlin11 mzeitlin11 deleted the bug/replace_list_raising branch March 25, 2021 02:07
simonjayhawkins pushed a commit to simonjayhawkins/pandas that referenced this pull request Mar 26, 2021
jreback pushed a commit that referenced this pull request Mar 26, 2021
vladu pushed a commit to vladu/pandas that referenced this pull request Apr 5, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Internals Related to non-user accessible pandas implementation Regression Functionality that used to work in a prior pandas version replace replace method
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: replace fails with IndexError when regex parameter is passed a dictionary with multiple items
4 participants