ENH: Unhelpful output from assert_frame_equal when indexes differ and check_like=True #37478

Closed
amilbourne opened this issue Oct 29, 2020 · 0 comments · Fixed by #37479
Labels: Enhancement; Testing (pandas testing functions or related to the test suite)

@amilbourne
Contributor

Problem:

Calling testing.assert_frame_equal with mismatched indexes and check_like=True generates unhelpful output.

If you run:

import pandas as pd
df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])
pd.testing.assert_frame_equal(df1, df2, check_like=True)

The output will be:

AssertionError: DataFrame.iloc[:, 0] (column name="A") are different

DataFrame.iloc[:, 0] (column name="A") values are different (33.33333 %)
[index]: [a, b, d]
[left]:  [1.0, 2.0, nan]
[right]: [1.0, 2.0, 3.0]

The data in the two input DataFrames is not actually different (neither contains a NaN). However, when check_like=True the code calls left.reindex_like(right) before comparing the indexes (and columns), to ensure that both frames are ordered the same way.
If the indexes contain different values (rather than the same values in a different order),
reindex_like fills the rows or columns for the unmatched labels with NaN.
As a result, the subsequent index checks pass, but assert_frame_equal then fails
with a "values are different" error (as above).
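
The reindexing step can be reproduced on its own. The following is a minimal sketch that calls reindex_like directly on the two frames from the example; it illustrates the behaviour described above rather than the exact code path inside assert_frame_equal. The variable names reindexed, index_matches and row_d_is_nan are introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])

# Conform df1 to df2's labels. The label "d" does not exist in df1,
# so its row is filled with NaN; the row for "c" is dropped.
reindexed = df1.reindex_like(df2)
print(reindexed)

# The index now matches df2 exactly, so an index comparison would pass...
index_matches = reindexed.index.equals(df2.index)

# ...but the row for "d" contains only NaN, so the value comparison fails.
row_d_is_nan = reindexed.loc["d"].isna().all()
```

After the reindex, the original index mismatch is no longer visible, which is why the failure surfaces as a values error.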

Even more confusingly, if the values being compared are not floats (integers, for example), the NaN fill changes the dtype of the left-hand column and you get a dtype mismatch error instead:

AssertionError: Attributes of DataFrame.iloc[:, 0] (column name="A") are different

Attribute "dtype" are different
[left]:  float64
[right]: int64
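
The dtype change can also be shown in isolation. This is a small sketch, assuming integer columns created with an explicit int64 dtype; the names left, right, dtype_before and dtype_after are illustrative and do not come from the pandas source.

```python
import pandas as pd

left = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "c"], dtype="int64")
right = pd.DataFrame({"A": [1, 2, 3]}, index=["a", "b", "d"], dtype="int64")

dtype_before = str(left["A"].dtype)

# An int64 column cannot hold NaN, so filling the unmatched label "d"
# upcasts the whole column to float64.
reindexed = left.reindex_like(right)
dtype_after = str(reindexed["A"].dtype)

print(dtype_before, "->", dtype_after)
```

Because the dtype check runs before the values are compared, the dtype error is reported first.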

These messages are unhelpful: the mismatch is in the index, so the error should logically be the same as the one raised with check_like=False.

Applies to:

The code above was run against the latest code from master.

>>> print(pd.__version__)
1.2.0.dev0+950.gd321be6

Solution:

The message for the above assertion failure should be something like:

AssertionError: DataFrame.index are different

DataFrame.index values are different (33.33333 %)
[left]:  Index(['a', 'b', 'c'], dtype='object')
[right]: Index(['a', 'b', 'd'], dtype='object')

This is the error you get if you run with check_like=False.
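
Until a fix is available, one possible workaround is to compare the labels order-insensitively before delegating to assert_frame_equal. The sketch below is not the fix proposed for pandas; assert_frame_equal_like is a hypothetical helper, and it assumes the index and column labels are sortable.

```python
import pandas as pd


def assert_frame_equal_like(left, right):
    """Hypothetical helper (not part of pandas): check labels, then data."""
    # Compare sorted labels so that ordering differences are ignored
    # but genuinely different labels are reported as an index error.
    pd.testing.assert_index_equal(
        left.index.sort_values(), right.index.sort_values(), obj="DataFrame.index"
    )
    pd.testing.assert_index_equal(
        left.columns.sort_values(), right.columns.sort_values(), obj="DataFrame.columns"
    )
    # The labels agree, so reindexing inside check_like=True cannot introduce NaN.
    pd.testing.assert_frame_equal(left, right, check_like=True)


df1 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "c"])
df2 = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]}, index=["a", "b", "d"])

# Same labels in a different order: passes.
assert_frame_equal_like(df1, df1.loc[["c", "a", "b"], ["B", "A"]])

# Different labels: fails with an error that names the index.
try:
    assert_frame_equal_like(df1, df2)
except AssertionError as err:
    print(err)
```

Note that the labels are sorted before comparison, so the [left] and [right] lines in the message show the sorted indexes rather than the original order.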

amilbourne added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels on Oct 29, 2020
jreback added this to the 1.2 milestone on Oct 31, 2020
jreback added the Testing (pandas testing functions or related to the test suite) label and removed the Needs Triage label on Oct 31, 2020