BUG: REGRESSION: DataFrame.corr() floating point inaccuracy #45640
Labels
Algos
Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Bug
Regression
Functionality that used to work in a prior pandas version
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
With pandas 1.4.0, df.corr() returns a matrix whose diagonal is not exactly 1 down to floating point precision. In pandas 1.3.5 the diagonal of df.corr() was exactly 1. The example above shows the difference.
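For anyone who wants to check this on their own installation, here is a minimal sketch that measures how far the diagonal of df.corr() is from 1. The data is hypothetical (random values with a fixed seed, chosen only for illustration) and is not the example from this report; whether the diagonal comes out exactly 1 depends on the pandas version installed.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration only; not the example from this report.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=list("abcd"))

corr = df.corr()
diag = np.diag(corr.to_numpy())

# Largest absolute deviation of the diagonal from 1.
max_dev = np.abs(diag - 1.0).max()

# True only if every diagonal entry is bitwise equal to 1.0.
exactly_one = bool((diag == 1.0).all())

print(f"pandas {pd.__version__}: max deviation {max_dev:.3e}, exactly 1: {exactly_one}")
```

A deviation on the order of machine epsilon is enough to make an exact equality check on the diagonal fail, even though the values are numerically indistinguishable from 1 for most purposes.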
This causes issues when using dist = 1 - df.corr().abs() as a distance matrix for clustering. In particular, the call to scipy.spatial.distance.squareform(dist) raises an error with pandas 1.4.0 when the diagonal of dist is not exactly 0.
Expected Behavior
The diagonal of df.corr() should be exactly 1 down to floating point accuracy.
Installed Versions
Replace this line with the output of pd.show_versions()