.replace() with a dict about 60x slower than .map().fillna() #12657

aldanor · 2016-03-17T13:05:42Z

(pandas versions: 0.17.0, 0.18.0)

It seems that Series.replace called with a dict is about 60x slower than the equivalent Series.map(...).fillna(...), and it can consume enormous amounts of RAM:

>>> import numpy as np
>>> import pandas as pd
>>> np.random.seed(0)
>>> s = pd.Series(np.random.randint(0, 10000, 1000000))
>>> r = {np.random.randint(0, 10000): np.random.randint(10000) for _ in range(1000)}
>>> assert (s.map(r).fillna(s) == s.replace(r)).all()
>>> %timeit s.replace(r)
1 loop, best of 3: 1.63 s per loop
>>> %timeit s.map(r).fillna(s)
10 loops, best of 3: 26.6 ms per loop

I haven't included memory measurements here, but I've seen usage explode (e.g. 60+ GB of RAM).
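For anyone who needs a workaround in the meantime, the map/fillna pattern from the benchmark can be wrapped in a small helper. This is only a sketch: `replace_via_map` is a hypothetical name, not a pandas function, and it assumes the Series contains no NaN values and the dict maps to no NaN values, since `fillna` cannot distinguish those from "key not found".

```python
import pandas as pd


def replace_via_map(s, mapping):
    """Replace values of `s` that appear as keys in `mapping`.

    Hypothetical helper, not part of pandas. Values without a matching key
    become NaN after .map() and are then restored from `s` by .fillna().
    Assumption: neither `s` nor the mapping's values contain NaN.
    Note: the result may be upcast (e.g. int64 -> float64) because of the
    intermediate NaNs.
    """
    return s.map(mapping).fillna(s)


s = pd.Series([1, 2, 3, 4])
mapping = {1: 10, 3: 30}

fast = replace_via_map(s, mapping)
slow = s.replace(mapping)

# Both approaches agree on the values; only the dtype may differ.
assert (fast == slow).all()
```

If the original dtype matters, the result can be cast back with `.astype(s.dtype)`, provided every replacement value fits that dtype.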

An old issue, not sure if related: #6697.


jreback · 2016-03-17T13:20:07Z

yes this is a duplicate

jreback closed this as completed Mar 17, 2016

jreback added the Missing-data and Duplicate Report labels Mar 17, 2016
