.replace() with a dict about 60x slower than .map().fillna() #12657

Closed
aldanor opened this issue Mar 17, 2016 · 1 comment
Labels
Duplicate Report (duplicate issue or pull request); Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate)

Comments

@aldanor
Contributor

aldanor commented Mar 17, 2016

(pandas versions: 0.17.0, 0.18.0)

It seems that Series.replace is orders of magnitude slower than Series.map when called with a dict, and consumes enormous amounts of RAM:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> s = pd.Series(np.random.randint(0, 10000, 1000000))
>>> r = {np.random.randint(0, 10000): np.random.randint(10000) for _ in range(1000)}
>>> assert (s.map(r).fillna(s) == s.replace(r)).all()
>>> %timeit s.replace(r)
1 loop, best of 3: 1.63 s per loop
>>> %timeit s.map(r).fillna(s)
10 loops, best of 3: 26.6 ms per loop

Memory stats are not provided here, but I've seen usage explode (e.g. 60+ GB of RAM).

An old issue, not sure if related: #6697.
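
For readers who want to reuse the faster `.map().fillna()` pattern timed above, here is a minimal sketch of it wrapped in a helper. The name `replace_via_map` is hypothetical and not part of pandas. The sketch assumes a plain value-to-value dict and a Series without missing values; it is not a drop-in substitute for everything `Series.replace` supports.

```python
import pandas as pd


def replace_via_map(s, mapping):
    """Hypothetical helper (not a pandas API): dict-based value replacement.

    Values found in `mapping` are replaced; all other values are kept.
    """
    # map() yields NaN for values that have no key in `mapping`;
    # fillna(s) then restores the original value at those positions.
    return s.map(mapping).fillna(s)


s = pd.Series([1, 2, 3, 4])
mapping = {1: 10, 3: 30}
out = replace_via_map(s, mapping)

# Same values as Series.replace for this simple case.
assert (out == s.replace(mapping)).all()
```

Two caveats apply to this sketch: an integer Series may come back as float, because the intermediate `map` result contains NaN, and any value deliberately mapped to NaN is restored to its original by `fillna`.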

@jreback
Contributor

jreback commented Mar 17, 2016

yes this is a duplicate

@jreback closed this as completed Mar 17, 2016
@jreback added the Missing-data and Duplicate Report labels Mar 17, 2016