PERF: Series.apply is slower on single element dict compared with multi elements dict #56942

Closed
2 of 3 tasks
Alexia-I opened this issue Jan 18, 2024 · 1 comment
Labels
Apply, Needs Info, Performance

Comments

@Alexia-I

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

Hello, I noticed that calling apply() on a Series built from a single-element dictionary takes longer than calling it on a Series built from a multi-element dictionary. I'm curious whether this is due to something I did wrong.

import pandas as pd
import timeit
import random
import string
import time

# Create dictionary inputs
single_pair_dict = {'a': 42}
# Create a dictionary containing 10 key-value pairs
multi_pair_dict = {letter: random.randint(1, 100) for letter in string.ascii_lowercase[:10]}

# Self-defined apply function
def complex_operation(x):
    return x * x - x + 42

n = 10000
# Create Series and test the execution time of apply() method
times = time.time()
for i in range(n):
    pd.Series(single_pair_dict).apply(complex_operation)
time_now_single = time.time() - times

timem = time.time()
for i in range(n):
    pd.Series(multi_pair_dict).apply(complex_operation)
time_now_multi = time.time() - timem

# Print the results
print(f"Time for apply() on Series with a single key-value pair: {time_now_single} seconds")
print(f"Time for apply() on Series with multiple key-value pairs: {time_now_multi} seconds")
Time for apply() on Series with a single key-value pair: 1.1293659210205078 seconds
Time for apply() on Series with multiple key-value pairs: 1.0656580924987793 seconds

Installed Versions

INSTALLED VERSIONS

commit : d4c8d82
python : 3.9.18.final.0
python-bits : 64
OS : Linux
OS-release : 5.15.0-88-generic
Version : #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0rc0
numpy : 1.26.3
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.8
pytest : None
hypothesis : None
...
zstandard : None
tzdata : 2023.4
qtpy : None
pyqt5 : None

Prior Performance

No response

Alexia-I added the Needs Triage and Performance labels on Jan 18, 2024
@rhshadrach
Member

rhshadrach commented Jan 18, 2024

Thanks for the report. I can't reproduce this on main. Can you reliably reproduce the results you posted?

Time for apply() on Series with a single key-value pair: 0.777146577835083 seconds
Time for apply() on Series with multiple key-value pairs: 0.80838942527771 seconds

Also confirmed using timeit on just the apply operation:

# Create dictionary inputs
single_pair_dict = {'a': 42}
# Create a dictionary containing 10 key-value pairs
multi_pair_dict = {letter: random.randint(1, 100) for letter in string.ascii_lowercase[:10]}

# Self-defined apply function
def complex_operation(x):
    return x * x - x + 42

ser1 = pd.Series(single_pair_dict)
%timeit ser1.apply(complex_operation)
# 25 µs ± 94.4 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

ser2 = pd.Series(multi_pair_dict)
%timeit ser2.apply(complex_operation)
# 26.8 µs ± 93.7 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
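For anyone without IPython, the same comparison can be sketched with the standard-library timeit module. This is only an illustrative sketch, not part of the original report: the helper name best_time_per_call and the number/repeat values are assumptions chosen for the example. It times Series construction and apply() separately, since the loop in the original script measures both together, and it reports the fastest of several repeats to reduce the effect of run-to-run noise.

```python
import random
import string
import timeit

import pandas as pd

# Same inputs as in the report above
single_pair_dict = {"a": 42}
multi_pair_dict = {
    letter: random.randint(1, 100) for letter in string.ascii_lowercase[:10]
}


def complex_operation(x):
    return x * x - x + 42


def best_time_per_call(func, number=1000, repeat=5):
    """Hypothetical helper: fastest per-call time in seconds over `repeat` runs."""
    totals = timeit.repeat(func, number=number, repeat=repeat)
    return min(totals) / number


# Build the Series once so that apply() can be timed on its own
ser1 = pd.Series(single_pair_dict)
ser2 = pd.Series(multi_pair_dict)

results = {
    "construct single": best_time_per_call(lambda: pd.Series(single_pair_dict)),
    "construct multi": best_time_per_call(lambda: pd.Series(multi_pair_dict)),
    "apply single": best_time_per_call(lambda: ser1.apply(complex_operation)),
    "apply multi": best_time_per_call(lambda: ser2.apply(complex_operation)),
}

for label, seconds in results.items():
    print(f"{label}: {seconds * 1e6:.1f} us per call")
```

If the single and multi timings differ by only a few percent and the ordering changes between runs, the difference is probably measurement noise rather than a real regression.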

rhshadrach added the Needs Info and Apply labels and removed the Needs Triage label on Jan 18, 2024