BUG: pd.compare does not recognize differences when comparing values with null Int64 data type #48966
Changes from 4 commits
f027ff9
0f3fcef
9c0a19b
0b5ec5c
d72e0e3
4412e33
9b97320
ffbb131
bae0808
4838dc0
3f739dd
@@ -245,7 +245,39 @@ def test_compare_ea_and_np_dtype():
     result = df1.compare(df2, keep_shape=True)
     expected = pd.DataFrame(
         {
-            ("a", "self"): [4.0, np.nan],
+            ("a", "self"): [4.0, 4],
             ("a", "other"): pd.Series([1, pd.NA], dtype="Int64"),
             ("b", "self"): np.nan,
             ("b", "other"): np.nan,
         }
     )
     tm.assert_frame_equal(result, expected)


 def test_compare_with_equal_null_int64_dtypes():
     # GH #48939
     # two int64 nulls are considered same.
     df1 = pd.DataFrame({"a": pd.Series([4.0, pd.NA], dtype="Int64"), "b": [1.0, 2]})
Review comments:
cc @jorisvandenbossche should pd.NA be considered equal here?
The docstring of
I think that wasn’t written with pd.NA in mind. So we could decide differently here, if it makes sense in the overall picture
compare uses current behavior -
     df2 = pd.DataFrame({"a": pd.Series([3, pd.NA], dtype="Int64"), "b": [1.0, 2]})
     result = df1.compare(df2)
     expected = pd.DataFrame(
         {
             ("a", "self"): pd.Series([4], dtype="Int64"),
             ("a", "other"): pd.Series([3], dtype="Int64"),
         }
     )
     tm.assert_frame_equal(result, expected)


 def test_compare_with_unequal_null_int64_dtypes():
     # GH #48939
Review comments:
You could parametrize the first test, would reduce the duplicated code (test_compare_ea_and_np_dtype)
parameterized the
Added one more test with 2 cases comparing only int64 dtype frames.
all with
     # comparison with int64 null dtype shouldn't obscure result.
     df1 = pd.DataFrame({"a": pd.Series([4.0, 4], dtype="Int64"), "b": [1.0, 2]})
     df2 = pd.DataFrame({"a": pd.Series([1, pd.NA], dtype="Int64"), "b": [1.0, 2]})
     result = df1.compare(df2, keep_shape=True)
     expected = pd.DataFrame(
         {
             ("a", "self"): pd.Series([4.0, 4], dtype="Int64"),
             ("a", "other"): pd.Series([1, pd.NA], dtype="Int64"),
             ("b", "self"): np.nan,
             ("b", "other"): np.nan,
Review comments:
when comparing NA with value in nullable dtypes
modified the note
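The review thread asks whether two pd.NA values should count as equal, and the tests expect a value compared against NA to be reported as a difference. A minimal sketch of that null-aware rule on nullable Int64 Series is shown here. The helper name `null_aware_diff_mask` is hypothetical and this is not the code pandas uses inside `DataFrame.compare`; it only illustrates the semantics the two new tests describe.

```python
import pandas as pd


def null_aware_diff_mask(left: pd.Series, right: pd.Series) -> pd.Series:
    """Hypothetical helper: True where the two Series differ.

    Two missing values in the same position count as equal;
    a value compared against a missing value counts as different.
    """
    # Positions where both sides are missing are treated as equal.
    both_null = left.isna() & right.isna()
    # Elementwise == on nullable dtypes yields <NA> when either side is
    # missing, so map those unknown results to False before combining.
    equal = (left == right).fillna(False).astype(bool)
    return ~(equal | both_null)


# Case from test_compare_with_equal_null_int64_dtypes: only row 0 differs.
equal_nulls = null_aware_diff_mask(
    pd.Series([4, pd.NA], dtype="Int64"),
    pd.Series([3, pd.NA], dtype="Int64"),
)
print(equal_nulls.tolist())  # [True, False]

# Case from test_compare_with_unequal_null_int64_dtypes: both rows differ.
value_vs_null = null_aware_diff_mask(
    pd.Series([4, 4], dtype="Int64"),
    pd.Series([1, pd.NA], dtype="Int64"),
)
print(value_vs_null.tolist())  # [True, True]
```

The `fillna(False)` step matters because `pd.NA == pd.NA` evaluates to `pd.NA` rather than `True`, so a plain `left != right` mask would leave missing entries in the result instead of a definite answer.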