PERF: avoid copy in replace #34737

Merged (1 commit) on Jun 14, 2020

TomAugspurger (Contributor) commented on Jun 12, 2020

Closes #34136.

Hopefully this preserves the right behavior. It could break a caller that relies on putmask(., inplace=False) returning a copy.
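A minimal sketch of the property at stake, seen from the public API rather than from putmask itself: a non-inplace replace should hand back an object that can be modified without touching the original. The 4-row frame and the variable names are assumptions made for illustration; this does not reproduce the internal change in this PR.

```python
import numpy as np
import pandas as pd

# Assumption: a tiny all-zero frame, so nothing in the replace list matches.
df = pd.DataFrame({"A": 0, "B": 0}, index=range(4))

result = df.replace([np.inf, -np.inf, 1], np.nan, inplace=False)

# The result should be a distinct object holding the same values ...
is_new_object = result is not df
values_unchanged = bool((result == 0).all().all())

# ... and writing to it should not leak back into the original frame.
result.iloc[0, 0] = 5
original_untouched = bool((df == 0).all().all())

print(is_new_object, values_unchanged, original_untouched)
```

If any of the three flags came out False, some caller-visible copy semantics would have changed.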

import pandas as pd
import numpy as np

df = pd.DataFrame({"A": 0, "B": 0}, index=range(4 * 10 ** 7))

# the 1 can be held in self._df.blocks[0], while the inf and -inf can't
%timeit df.replace([np.inf, -np.inf, 1], np.nan, inplace=False)



# 1.0.3
483 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# master
900 ms ± 18.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# PR
490 ms ± 8.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
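The timings above come from IPython's %timeit. Outside IPython, a rough equivalent can be sketched with the standard-library timeit module. The row count is reduced to 10 ** 5 (an assumption, to keep the run short), and run_replace is a hypothetical helper name. The script reports the best of five single calls rather than a mean ± std. dev., so absolute numbers are not comparable with those above.

```python
import timeit

import numpy as np
import pandas as pd

# Assumption: far fewer rows than the 4 * 10 ** 7 used in the PR benchmark.
n_rows = 10 ** 5
df = pd.DataFrame({"A": 0, "B": 0}, index=range(n_rows))


def run_replace():
    # Same call as in the PR description; no value in the frame matches.
    return df.replace([np.inf, -np.inf, 1], np.nan, inplace=False)


# Five repeats of one call each; keep the fastest as the headline number.
timings = timeit.repeat(run_replace, number=1, repeat=5)
best_ms = min(timings) * 1000
print(f"best of {len(timings)}: {best_ms:.2f} ms")
```

Running the same script under two pandas versions gives a like-for-like comparison, provided the row count and machine stay fixed.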

TomAugspurger added the Performance label (Memory or execution speed performance) on Jun 12, 2020
TomAugspurger added this to the 1.1 milestone on Jun 12, 2020
jreback merged commit 74a77b3 into pandas-dev:master on Jun 14, 2020
jreback (Contributor) commented on Jun 14, 2020

thanks

Successfully merging this pull request may close these issues.

Performance regression in replace.ReplaceList.time_replace_list_one_match