BUG: Fixed bug when creating new column with missing values when setting a single string value #56321

Merged
merged 12 commits into from
Dec 20, 2023
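
For readers skimming the diff, here is a minimal reproduction of the scenario in the title. It is a sketch that mirrors the test added in this PR, with a boolean mask in place of the lambda. The comments describe the outcome the new tests expect, not a guarantee for every pandas version.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Assign a single string to a new column "c", but only where a == 1.
# The column does not exist yet, so pandas has to create it and choose
# a fill value for the rows the mask does not select.
df.loc[df["a"] == 1, "c"] = "1"

# Row 0 receives "1"; row 1 was not selected and should hold a missing value.
print(df)
print(df["c"].isna().tolist())
```

The unselected row is the part this PR is about: it should end up as a missing value rather than as text.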
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.1.4.rst
@@ -13,6 +13,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed bug when creating new column with missing values when setting a single string value (:issue:`56204`)
Member

Can you put this into the bug column?

 - Fixed regression when trying to read a pickled pandas :class:`DataFrame` from pandas 1.3 (:issue:`55137`)
 -

8 changes: 6 additions & 2 deletions pandas/core/dtypes/missing.py
@@ -624,7 +624,7 @@ def array_equals(left: ArrayLike, right: ArrayLike) -> bool:
     return array_equivalent(left, right, dtype_equal=True)


-def infer_fill_value(val):
+def infer_fill_value(val, length: int):
     """
     infer the fill value for the nan/NaT from the provided
     scalar/ndarray/list-like if we are a NaT, return the correct dtyped
@@ -643,7 +643,11 @@ def infer_fill_value(val):
             return np.array("NaT", dtype=TD64NS_DTYPE)
         return np.array(np.nan, dtype=object)
     elif val.dtype.kind == "U":
-        return np.array(np.nan, dtype=val.dtype)
+        if get_option("future.infer_string"):
+            from pandas.core.construction import array as pd_array
+
+            return pd_array([np.nan] * length, dtype="string[pyarrow_numpy]")
+        return None
     return np.nan
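
A possible way to see why the removed `np.array(np.nan, dtype=val.dtype)` line was a problem is the NumPy-only sketch below. The variable names `val` and `old_fill` are illustrative and not taken from the pandas source, and this is an assumption about the motivation rather than a statement from the PR.

```python
import numpy as np

# A single string value becomes a fixed-width unicode array (kind "U").
val = np.array(["1"])
print(val.dtype.kind)

# The removed line built the fill value with that same unicode dtype.
old_fill = np.array(np.nan, dtype=val.dtype)

# The result holds text rather than a float NaN, so it cannot serve as a
# missing-value marker for the rows that were not assigned.
print(type(old_fill.item()), repr(old_fill.item()))
```

Returning `None`, or a `string[pyarrow_numpy]` array of missing values when `future.infer_string` is enabled, avoids coercing NaN into the unicode dtype.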


4 changes: 3 additions & 1 deletion pandas/core/indexing.py
@@ -1879,7 +1879,9 @@ def _setitem_with_indexer(self, indexer, value, name: str = "iloc"):

                         else:
                             # FIXME: GH#42099#issuecomment-864326014
-                            self.obj[key] = infer_fill_value(value)
+                            self.obj[key] = infer_fill_value(
+                                value, length=len(self.obj)
+                            )

                         new_indexer = convert_from_missing_indexer_tuple(
                             indexer, self.obj.axes
22 changes: 22 additions & 0 deletions pandas/tests/frame/indexing/test_indexing.py
@@ -1922,6 +1922,28 @@ def test_adding_new_conditional_column() -> None:
     tm.assert_frame_equal(df, expected)


+def test_adding_new_conditional_column_with_string() -> None:
+    # https://github.com/pandas-dev/pandas/issues/56204
+    df = DataFrame({"a": [1, 2], "b": [3, 4]})
+    df.loc[lambda x: x.a == 1, "c"] = "1"
Member

Is it important that this be a lambda?

+    expected = DataFrame({"a": [1, 2], "b": [3, 4], "c": ["1", None]}).astype(
+        {"a": "int64", "b": "int64", "c": "object"}
+    )
+    tm.assert_frame_equal(df, expected)
+
+
+def test_adding_new_conditional_column_with_infer_string() -> None:
Member

Maybe make this one test and parametrize it? In 3.0 we'll only want to end up with one test here, right?

+    # https://github.com/pandas-dev/pandas/issues/56204
+    pytest.importorskip("pyarrow")
+    df = DataFrame({"a": [1, 2], "b": [3, 4]})
+    with pd.option_context("future.infer_string", True):
+        df.loc[lambda x: x.a == 1, "c"] = "1"
+    expected = DataFrame({"a": [1, 2], "b": [3, 4], "c": ["1", None]}).astype(
+        {"a": "int64", "b": "int64", "c": "string[pyarrow_numpy]"}
+    )
+    tm.assert_frame_equal(df, expected)
+
+
 def test_add_new_column_infer_string():
     # GH#55366
     pytest.importorskip("pyarrow")