Using corr with callable gives 1 on diagonals where the result should be NaN #25781

Closed
fbnrst opened this issue Mar 19, 2019 · 2 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement Numeric Operations Arithmetic, Comparison, and Logical operations
Milestone

Comments

@fbnrst
Contributor

fbnrst commented Mar 19, 2019

Code Sample, a copy-pastable example if possible

import numpy as np
import pandas as pd
from scipy.stats import pearsonr

df = pd.DataFrame(np.zeros((2,2)))

df.corr(method=lambda x, y: pearsonr(x, y)[0])

gives:

     0    1
0  1.0  NaN
1  NaN  1.0

Problem description

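This is a follow-up to #25726. In the example above, the self-correlation is returned as 1.0, but it should be NaN: both columns are constant, so the Pearson correlation is undefined, and the callable itself returns NaN for these inputs (as the off-diagonal entries show). I would argue that the diagonal elements of the correlation matrix should be computed by calling the callable, instead of being set to 1 here: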

pandas/pandas/core/frame.py, lines 7025 to 7026 in cb00deb:

elif i == j:
    c = 1.
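A minimal sketch of the proposed behaviour, written outside pandas so it can be tried directly. `corr_computing_diagonal` and `pearson` are hypothetical helpers for illustration, not pandas or SciPy API; the sketch assumes pairwise NaN removal and ignores `min_periods`. The only change from the snippet above is that the callable is also invoked when `i == j` instead of hard-coding `1.`.

```python
import numpy as np
import pandas as pd


def pearson(x, y):
    """Hypothetical Pearson r that returns NaN when either input has zero variance."""
    xm = x - x.mean()
    ym = y - y.mean()
    denom = np.sqrt((xm ** 2).sum() * (ym ** 2).sum())
    if denom == 0:
        return np.nan  # 0/0: correlation is undefined for a constant column
    return float((xm * ym).sum() / denom)


def corr_computing_diagonal(df, method):
    """Hypothetical helper: like DataFrame.corr(method=callable), but the
    diagonal is computed with `method` rather than set to 1.0."""
    cols = df.columns
    k = len(cols)
    mat = np.empty((k, k), dtype=float)
    for i in range(k):
        x_full = df.iloc[:, i].to_numpy(dtype=float)
        for j in range(i, k):  # includes j == i, i.e. the diagonal
            y_full = df.iloc[:, j].to_numpy(dtype=float)
            # Drop rows where either column is NaN (pairwise-complete observations)
            mask = ~(np.isnan(x_full) | np.isnan(y_full))
            c = method(x_full[mask], y_full[mask])
            mat[i, j] = mat[j, i] = c  # the matrix is symmetric
    return pd.DataFrame(mat, index=cols, columns=cols)


df = pd.DataFrame(np.zeros((2, 2)))
print(corr_computing_diagonal(df, pearson))
```

For the all-zeros frame from the report, every entry, including the diagonal, comes out as NaN, matching `df.corr()`. For non-constant columns the diagonal still evaluates to 1.0 (up to floating-point rounding), so the only visible difference from hard-coding `1.` is in degenerate cases like this one.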

df.corr() gives the output I expected:

Expected Output

df.corr()
    0   1
0 NaN NaN
1 NaN NaN
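Why NaN is the expected diagonal value: the Pearson coefficient divides by the product of the two standard deviations, and a constant column has zero deviation from its mean. For a column $x = (0, 0)$ with $\bar{x} = 0$, the self-correlation is therefore 0/0:

```latex
r_{xx}
  = \frac{\sum_i (x_i - \bar{x})(x_i - \bar{x})}
         {\sqrt{\sum_i (x_i - \bar{x})^2}\,\sqrt{\sum_i (x_i - \bar{x})^2}}
  = \frac{0}{\sqrt{0}\,\sqrt{0}}
  = \frac{0}{0}
```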

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.8.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.165-81-default
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.2
pytest: 4.1.1
pip: 18.1
setuptools: 40.6.3
Cython: 0.29.3
numpy: 1.15.4
scipy: 1.2.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: 1.8.3
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: 3.4.4
numexpr: 2.6.9
feather: None
matplotlib: 3.0.2
openpyxl: 2.4.0-b1
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: 4.3.0
bs4: 4.7.1
html5lib: 1.0.1
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@peterpanmj
Contributor

take
I ran into the same problem in #43494 and can come up with a solution.

@jreback jreback added this to the 1.4 milestone Sep 28, 2021
@jreback jreback added Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Enhancement and removed Bug labels Sep 28, 2021
@jreback
Contributor

jreback commented Nov 29, 2021

closing from discussion in #43569

@jreback jreback closed this as completed Nov 29, 2021

4 participants