
Commit 00353d5

Pd.series.map performance (#34948)
1 parent aa6298f commit 00353d5

2 files changed: +8 -3 lines changed


doc/source/whatsnew/v1.2.0.rst

+1
@@ -207,6 +207,7 @@ Performance improvements

 - Performance improvements when creating Series with dtype `str` or :class:`StringDtype` from array with many string elements (:issue:`36304`, :issue:`36317`)
 - Performance improvement in :meth:`GroupBy.agg` with the ``numba`` engine (:issue:`35759`)
+- Performance improvements when creating :meth:`pd.Series.map` from a huge dictionary (:issue:`34717`)
 - Performance improvement in :meth:`GroupBy.transform` with the ``numba`` engine (:issue:`36240`)

 .. ---------------------------------------------------------------------------

pandas/core/series.py

+7-3
@@ -357,15 +357,19 @@ def _init_dict(self, data, index=None, dtype=None):
         # Looking for NaN in dict doesn't work ({np.nan : 1}[float('nan')]
         # raises KeyError), so we iterate the entire dict, and align
         if data:
-            keys, values = zip(*data.items())
-            values = list(values)
+            # GH:34717, issue was using zip to extract key and values from data.
+            # using generators in effects the performance.
+            # Below is the new way of extracting the keys and values
+
+            keys = tuple(data.keys())
+            values = list(data.values())  # Generating list of values- faster way
         elif index is not None:
             # fastpath for Series(data=None). Just use broadcasting a scalar
             # instead of reindexing.
             values = na_value_for_dtype(dtype)
             keys = index
         else:
-            keys, values = [], []
+            keys, values = tuple([]), []

         # Input is now list-like, so rely on "standard" construction:
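The change in `_init_dict` swaps one way of splitting a dict into keys and values for another. A minimal standalone sketch of the two approaches follows, for readers who want to check that they produce the same result and compare timings locally. The helper names `extract_with_zip` and `extract_with_views` and the 100,000-entry test dict are illustrative assumptions, not part of the commit, and timings will vary by machine and Python version.

```python
import timeit


def extract_with_zip(data):
    # Pre-change approach: every (key, value) pair is unpacked as a
    # separate argument to zip, then the values tuple is copied to a list.
    keys, values = zip(*data.items())
    return keys, list(values)


def extract_with_views(data):
    # Post-change approach: build the tuple and list directly from the
    # dict's key and value views.
    return tuple(data.keys()), list(data.values())


# Hypothetical test input; the size is arbitrary.
data = {i: str(i) for i in range(100_000)}

# Both approaches yield the same keys tuple and values list.
assert extract_with_zip(data) == extract_with_views(data)

for fn in (extract_with_zip, extract_with_views):
    seconds = timeit.timeit(lambda: fn(data), number=20)
    print(f"{fn.__name__}: {seconds:.3f}s for 20 runs")
```

Both helpers assume a non-empty dict, matching the `if data:` guard in the diff. With an empty dict, `zip(*data.items())` yields nothing to unpack, which is why the `else` branch assigns empty containers explicitly.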
