BUG: DataFrame.corr assumes 1s are always on the diagonal #43494
Labels: Duplicate Report (duplicate issue or pull request), Numeric Operations (arithmetic, comparison, and logical operations)
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
Issue Description
pd.DataFrame.corr assumes the diagonal should always be filled with 1 (a vector compared with itself). However, for many distance measures, comparing two identical vectors yields 0. I think we should let users specify whether to use 1 as the default value on the diagonal, or let the underlying custom function decide.
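A minimal sketch of the behavior described above, assuming a hypothetical distance function `abs_distance` (illustrative only, not part of pandas) and a small frame with three identical columns. It is not the snippet from the report, just one way the problem can show up when a callable is passed as `method`.

```python
import numpy as np
import pandas as pd


def abs_distance(a, b):
    # Hypothetical distance measure: 0.0 when the two vectors are identical.
    return float(np.abs(a - b).sum())


# Three identical columns, so every pairwise distance is 0.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.0], "c": [1.0, 2.0, 3.0]}
)

# corr() calls abs_distance only for the off-diagonal pairs and
# writes 1.0 on the diagonal without consulting the callable.
result = df.corr(method=abs_distance).to_numpy()
print(result)
```

With pandas 1.3.2 the off-diagonal entries come from `abs_distance`, while the diagonal is set to 1.0 regardless of what the callable would return.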
Expected Behavior
[[0. 0. 0.]
[0. 0. 0.]
[0. 0. 0.]]
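Until corr offers such an option, one possible workaround is to build the pairwise matrix by hand so that the custom function is also evaluated on the diagonal. The helper name `pairwise_apply` and the distance function `abs_distance` below are hypothetical, and the sketch assumes a symmetric function and data without missing values.

```python
import numpy as np
import pandas as pd


def abs_distance(a, b):
    # Hypothetical distance measure: 0.0 when the two vectors are identical.
    return float(np.abs(a - b).sum())


def pairwise_apply(df, func):
    # Hypothetical helper (not a pandas API): evaluates func for every
    # column pair, including each column against itself.
    values = df.to_numpy(dtype=float)
    k = values.shape[1]
    out = np.empty((k, k), dtype=float)
    for i in range(k):
        for j in range(i, k):
            # func is assumed symmetric, so the upper triangle is mirrored.
            out[i, j] = out[j, i] = func(values[:, i], values[:, j])
    return pd.DataFrame(out, index=df.columns, columns=df.columns)


df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [1.0, 2.0, 3.0], "c": [1.0, 2.0, 3.0]}
)
dist = pairwise_apply(df, abs_distance)
print(dist.to_numpy())
```

For the three identical columns used here, every entry, including the diagonal, is 0. Unlike corr, this sketch does no NaN masking and ignores min_periods.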
Installed Versions
pandas : 1.3.2
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.0.2
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : 0.9.3
psycopg2 : None
jinja2 : 3.0.1
IPython : 7.26.0
pandas_datareader: 0.9.0
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 2021.07.0
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.2
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1