DataFrame.corr(method="kendall") calculation is slow #28329

Closed
dsaxton opened this issue Sep 6, 2019 · 1 comment · Fixed by #39132

dsaxton (Member) commented Sep 6, 2019

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

df.corr(method="kendall")
# 21.6 s ± 686 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

DataFrame.corr(method="kendall") scales poorly, perhaps because it's the only named correlation method that isn't Cythonized at the moment: we just call kendalltau from scipy once per pair of columns in a Python for loop (https://github.com/pandas-dev/pandas/blob/master/pandas/core/frame.py#L7454). It may be worthwhile to implement something more efficient within _libs/algos.pyx.
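For illustration, the per-pair loop described above can be sketched roughly as follows. This is not the pandas source: `pairwise_kendall` is a hypothetical helper, and it assumes numeric columns with no missing values. It makes one `scipy.stats.kendalltau` call per unordered pair of columns, which for the 300-column frame in the report comes to 300 * 299 / 2 = 44,850 calls.

```python
import numpy as np
import pandas as pd
from scipy.stats import kendalltau


def pairwise_kendall(df: pd.DataFrame) -> pd.DataFrame:
    """Kendall tau matrix built with one scipy call per unordered column pair.

    Hypothetical helper for illustration only; assumes numeric data without NaN.
    """
    values = df.to_numpy(dtype=float)
    n_cols = values.shape[1]
    out = np.eye(n_cols)  # each column is perfectly correlated with itself
    for i in range(n_cols):
        for j in range(i + 1, n_cols):
            # Python-level call for every pair: this is the suspected bottleneck.
            tau, _ = kendalltau(values[:, i], values[:, j])
            out[i, j] = out[j, i] = tau
    return pd.DataFrame(out, index=df.columns, columns=df.columns)


# Small frame so the sketch runs quickly.
rng = np.random.default_rng(0)
small = pd.DataFrame(rng.standard_normal((200, 5)))
result = pairwise_kendall(small)
```

The number of calls grows quadratically with the number of columns, so the per-call Python overhead adds up quickly on wide frames.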

Relevant discussion: #28151

WillAyd added the Performance (Memory or execution speed performance) label Sep 7, 2019
jorisvandenbossche added this to the Contributions Welcome milestone Sep 10, 2019
lithomas1 (Member) commented

take

jreback modified the milestones: Contributions Welcome, 1.3 Jan 20, 2021