df.corr Spearman method performance #28366

Hiill · 2019-09-10T08:33:28Z

Code Sample

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1000, 300))

# Using the Spearman method: about 12.19 seconds on my computer
df.corr(method='spearman')

# Using the detour (rank first, then Pearson): about 0.72 seconds on my computer
rank_df = df.rank()
rank_df.corr(method='pearson')

Problem description

I found that computing the Spearman correlation via this detour (ranking first, then taking the Pearson correlation of the ranks) is much faster. The resulting correlation coefficients differ slightly when the data contains NaN values. I'd like to understand this difference in output, and perhaps the Spearman method could be made faster by reusing the current Pearson implementation.
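One likely explanation for the difference with NaN values, sketched on a small made-up frame: `df.rank()` ranks each column over all of its own non-NaN values, while a pairwise Spearman correlation ranks only the rows where both columns are present. The column names `a` and `b` and the values are purely illustrative, and the claim that the built-in method follows the pairwise definition is my understanding of the behaviour, not something stated in this thread.

```python
import numpy as np
import pandas as pd

# Illustrative data: column "a" is missing its last value.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [10.0, 50.0, 20.0, 30.0, 40.0],
})

# Detour: rank every column on its own, then Pearson on the ranks.
# "b" is ranked over all 5 rows, so its ranks on the 4 shared rows are 1, 5, 2, 3.
detour = df.rank().corr(method="pearson").loc["a", "b"]

# Pairwise definition: drop incomplete rows first, then rank and correlate.
# Now "b" is ranked over the 4 shared rows only, giving ranks 1, 4, 2, 3.
complete = df.dropna()
pairwise = complete["a"].rank().corr(complete["b"].rank())

# Built-in Spearman, expected to agree with the pairwise definition.
builtin = df.corr(method="spearman").loc["a", "b"]

print(f"detour:   {detour:.4f}")
print(f"pairwise: {pairwise:.4f}")
print(f"builtin:  {builtin:.4f}")
```

If that explanation holds, the two approaches should agree whenever the frame has no missing values, and the detour is only an approximation otherwise.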

Liam3851 · 2019-09-10T13:13:38Z

Duplicate of #28139, should have been fixed in #28151.

jorisvandenbossche · 2019-09-10T18:22:10Z

Thanks for the report @Hiill, and thanks @Liam3851 for noting the other issue. I can confirm that on master the difference is now much smaller.

Closing as a duplicate of #28139 then.
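For anyone who wants to check the gap on their own pandas version, here is a minimal timing sketch. The helper `time_call` is a hypothetical name introduced for this example, and the frame is smaller than the one in the report so it runs quickly; the numbers will vary by machine and version.

```python
import time

import numpy as np
import pandas as pd


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# NaN-free random data, smaller than the 1000 x 300 frame in the report.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((200, 50)))

t_spearman = time_call(lambda: df.corr(method="spearman"))
t_detour = time_call(lambda: df.rank().corr(method="pearson"))
print(f"pandas {pd.__version__}: spearman {t_spearman:.4f}s, detour {t_detour:.4f}s")

# Without missing values the two approaches should give the same matrix.
same = np.allclose(df.corr(method="spearman"), df.rank().corr(method="pearson"))
print("results match:", same)
```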

jorisvandenbossche closed this as completed Sep 10, 2019

jorisvandenbossche added the Duplicate Report label Sep 10, 2019

jorisvandenbossche added this to the No action milestone Sep 10, 2019
