Commit 10b1300

BUG: compare returning all nan columns when comparing ea and np dtypes (#47994)
1 parent 6ba2a67 commit 10b1300

4 files changed: +30 -0 lines changed

doc/source/whatsnew/v1.5.0.rst

+2
@@ -948,6 +948,8 @@ Indexing
 - Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`)
 - Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`)
 - Bug in :meth:`DataFrame.__getattribute__` raising ``AttributeError`` if columns have ``"string"`` dtype (:issue:`46185`)
+- Bug in :meth:`DataFrame.compare` returning all ``NaN`` column when comparing extension array dtype and numpy dtype (:issue:`44014`)
+- Bug in :meth:`DataFrame.where` setting wrong values with ``"boolean"`` mask for numpy dtype (:issue:`44014`)
 - Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`)
 - Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`)
 - Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`)

pandas/core/internals/blocks.py

+2
@@ -1062,6 +1062,8 @@ def where(self, other, cond, _downcast="infer") -> list[Block]:
 
         transpose = self.ndim == 2
 
+        cond = extract_bool_array(cond)
+
         # EABlocks override where
         values = cast(np.ndarray, self.values)
         orig_other = other
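
The added `extract_bool_array(cond)` call normalizes the condition before the NumPy code path uses it. The following is a minimal sketch of that idea using only public pandas API. It assumes the private helper behaves roughly like `to_numpy(dtype=bool, na_value=False)` for a nullable boolean array; the helper itself is not called here, and `to_plain_bool` is a hypothetical name introduced for illustration.

```python
import numpy as np
import pandas as pd


def to_plain_bool(mask):
    """Hypothetical stand-in: coerce a possibly-nullable mask to ndarray[bool]."""
    if isinstance(mask, pd.api.extensions.ExtensionArray):
        # Assumption: missing entries count as False ("condition not met").
        mask = mask.to_numpy(dtype=bool, na_value=False)
    return np.asarray(mask, dtype=bool)


nullable_mask = pd.array([True, pd.NA, False], dtype="boolean")
plain_mask = to_plain_bool(nullable_mask)

values = np.array([10, 20, 30])
# With a plain boolean ndarray, elementwise selection is unambiguous.
selected = np.where(plain_mask, values, -1)
```

Converting once at the top of `where` means the rest of the NumPy branch can treat `cond` as an ordinary boolean ndarray.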

pandas/tests/frame/indexing/test_where.py

+10
@@ -1046,3 +1046,13 @@ def test_where_mask_deprecated(frame_or_series):
 
     with tm.assert_produces_warning(FutureWarning):
         obj.mask(mask, -1, errors="raise")
+
+
+def test_where_producing_ea_cond_for_np_dtype():
+    # GH#44014
+    df = DataFrame({"a": Series([1, pd.NA, 2], dtype="Int64"), "b": [1, 2, 3]})
+    result = df.where(lambda x: x.apply(lambda y: y > 1, axis=1))
+    expected = DataFrame(
+        {"a": Series([pd.NA, pd.NA, 2], dtype="Int64"), "b": [np.nan, 2, 3]}
+    )
+    tm.assert_frame_equal(result, expected)
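
The case this test exercises can be reduced to a NumPy-backed column combined with a condition stored in the nullable `"boolean"` dtype. The sketch below is not part of the commit. It builds the condition explicitly instead of through a callable, and assumes a pandas version that includes this fix.

```python
import pandas as pd

# A plain NumPy int64 column ...
df = pd.DataFrame({"b": [1, 2, 3]})
# ... and a condition held in the nullable "boolean" extension dtype.
cond = pd.DataFrame({"b": pd.array([False, True, True], dtype="boolean")})

# Entries where the condition is False are replaced by NaN, the default `other`.
result = df.where(cond)
```

Because `NaN` is inserted, the `b` column is upcast to a float dtype, which matches the `[np.nan, 2, 3]` expectation in the test above.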

pandas/tests/frame/methods/test_compare.py

+16
@@ -236,3 +236,19 @@ def test_invalid_input_result_names(result_names):
         ),
     ):
         df1.compare(df2, result_names=result_names)
+
+
+def test_compare_ea_and_np_dtype():
+    # GH#44014
+    df1 = pd.DataFrame({"a": [4.0, 4], "b": [1.0, 2]})
+    df2 = pd.DataFrame({"a": pd.Series([1, pd.NA], dtype="Int64"), "b": [1.0, 2]})
+    result = df1.compare(df2, keep_shape=True)
+    expected = pd.DataFrame(
+        {
+            ("a", "self"): [4.0, np.nan],
+            ("a", "other"): pd.Series([1, pd.NA], dtype="Int64"),
+            ("b", "self"): np.nan,
+            ("b", "other"): np.nan,
+        }
+    )
+    tm.assert_frame_equal(result, expected)
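
For readers who want to try the fixed behaviour interactively, here is a small sketch along the lines of the test, assuming a pandas version that includes this fix. It is not part of the commit, and it deliberately avoids `pd.NA` so that the outcome does not depend on how missing values are compared.

```python
import pandas as pd

# "a" differs in both value and dtype (float64 vs. nullable Int64); "b" is identical.
left = pd.DataFrame({"a": [4.0, 4.0], "b": [1.0, 2.0]})
right = pd.DataFrame({"a": pd.Series([1, 2], dtype="Int64"), "b": [1.0, 2.0]})

# keep_shape=True keeps every row and column, filling equal cells with NaN.
diff = left.compare(right, keep_shape=True)

a_self = diff[("a", "self")]
a_other = diff[("a", "other")]
```

With the fix, the differing `a` values are reported on both sides instead of coming back as all-`NaN` columns, while the unchanged `b` column stays `NaN`.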
