df.corr Spearman method performance #28366

Closed
Hiill opened this issue Sep 10, 2019 · 2 comments
Labels: Duplicate Report (duplicate issue or pull request)

Comments

@Hiill

Hiill commented Sep 10, 2019

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

# Using the Spearman method: about 12.19 seconds on my computer
df.corr(method="spearman")

# Using the detour (rank first, then Pearson): about 0.72 seconds on my computer
rank_df = df.rank()
rank_df.corr(method="pearson")

Problem description

I found that calculating the Spearman correlation via the detour (ranking the data first, then applying the Pearson method) is much faster. The resulting correlation coefficients differ only minimally, and only when the data contains NaN values. I'd like to understand this difference in output, and perhaps the Spearman method could be made faster by reusing the current Pearson implementation.
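A likely explanation for the NaN difference, sketched here under the assumption that `df.corr` works on pairwise-complete observations: `df.rank()` ranks each column once over all of its non-NaN values, whereas the Spearman method drops rows with a NaN in either column of a pair and ranks only what remains. The helper `pairwise_spearman` below is hypothetical and written only for illustration; it is not part of pandas.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 3)), columns=["a", "b", "c"])
df.iloc[::7, 0] = np.nan    # inject NaNs into column "a"
df.iloc[3::11, 1] = np.nan  # inject NaNs into column "b"

# Built-in Spearman correlation
builtin = df.corr(method="spearman")

# Detour: rank every column once (NaNs stay NaN), then Pearson
detour = df.rank().corr(method="pearson")


def pairwise_spearman(frame):
    """Hypothetical helper: for each column pair, drop rows with a NaN
    in either column, rank the remaining rows, then take Pearson."""
    cols = frame.columns
    out = pd.DataFrame(np.eye(len(cols)), index=cols, columns=cols)
    for i, x in enumerate(cols):
        for y in cols[i + 1:]:
            pair = frame[[x, y]].dropna()
            r = pair[x].rank().corr(pair[y].rank())
            out.loc[x, y] = r
            out.loc[y, x] = r
    return out


pairwise = pairwise_spearman(df)

# Largest absolute deviation of each variant from the built-in result
print("detour   vs builtin:", (detour - builtin).abs().max().max())
print("pairwise vs builtin:", (pairwise - builtin).abs().max().max())
```

If the assumption holds, the pairwise variant should match the built-in result, while the detour deviates slightly because its ranks were computed before the NaN rows of the other column were dropped. On data without NaN values the two rankings coincide, so the detour and the built-in method agree.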

@Liam3851
Contributor

Duplicate of #28139, should have been fixed in #28151.

@jorisvandenbossche
Member

jorisvandenbossche commented Sep 10, 2019

Thanks for the report @Hiill, and thanks @Liam3851 for noting the other issue. I can confirm that on master the difference is now much smaller.

Closing as a duplicate of #28139 then.
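To check how large the gap still is on a given pandas version, a small timing sketch along the following lines may help. The helper `time_call` and the repeat count are assumptions made for illustration; the frame shape mirrors the code sample in the report.

```python
import time

import numpy as np
import pandas as pd


def time_call(fn, repeats=3):
    """Hypothetical helper: best wall-clock time of fn() over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


df = pd.DataFrame(np.random.default_rng(0).standard_normal((1000, 300)))

t_spearman = time_call(lambda: df.corr(method="spearman"))
t_detour = time_call(lambda: df.rank().corr(method="pearson"))

print(f"pandas {pd.__version__}")
print(f"corr(method='spearman'):        {t_spearman:.3f} s")
print(f"rank() + corr(method='pearson'): {t_detour:.3f} s")
```

The absolute numbers depend on the machine and the pandas version, so only the ratio between the two timings is meaningful.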

@jorisvandenbossche added the Duplicate Report label Sep 10, 2019
@jorisvandenbossche added this to the No action milestone Sep 10, 2019

3 participants