BUG: Cythonized kendall correlation implementation is incorrect #43401

zrait · 2021-09-04T16:15:29Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Problem description

The Cythonized implementation of Kendall correlation implemented here has been completely broken in the case of tied values since its original commit.

Scipy (and most other stats pacakges) implement the b variant of Kendall's tau, not the original version. Rank based correlation algorithms absolutely need to handle ties correctly as ordinal data often includes repeated values in some columns.

Even if we wanted to implement the "a variant" of Kendall's tau (which would be a breaking change from the implementation prior to the Cythonization diff), the correct code would disregard ties entirely, whereas the current implementation is including them in its calculations.

It's non-trivial to implement a Cythonized version of the correct code with tie-handling here, and I think that the Cythonized version should be reverted.

Code Sample, a copy-pastable example

df = pd.DataFrame([[1, 2], [2, 1], [1, 3]])
df[0].corr(df[1],method='kendall') == df.corr(method='kendall')[0][1]
# output
False

Expected Output

True

Change to test case that catches this bug

zrait@94a0eea is a simple change to the correlation test case that ensures that one of the columns used has repeated values and fails with the current implementation but not if the Cythonization is reverted.

More generally, I think the float test data should be updated to include duplicates, as a danger of randomly generated float data is the fact that it won't naturally include any duplicates.

The text was updated successfully, but these errors were encountered:

zrait added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 4, 2021

zrait mentioned this issue Sep 4, 2021

Revert Cythonized Kendall implementation and improve test case to prevent regressions #43403

Merged

4 tasks

simonjayhawkins added this to the 1.3.3 milestone Sep 5, 2021

jreback closed this as completed in #43403 Sep 6, 2021

zrait mentioned this issue Sep 6, 2021

Backport PR #43403: Revert Cythonized Kendall implementation and improve test case to prevent regressions #43431

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Cythonized kendall correlation implementation is incorrect #43401

BUG: Cythonized kendall correlation implementation is incorrect #43401

zrait commented Sep 4, 2021 •

edited

Loading

BUG: Cythonized kendall correlation implementation is incorrect #43401

BUG: Cythonized kendall correlation implementation is incorrect #43401

Comments

zrait commented Sep 4, 2021 • edited Loading

Problem description

Code Sample, a copy-pastable example

Expected Output

Change to test case that catches this bug

zrait commented Sep 4, 2021 •

edited

Loading