Skip to content

DEPR: Disallow dtype inference when setting Index into DataFrame #56102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 9, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v2.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -455,6 +455,7 @@ Other Deprecations
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_xml` except ``path_or_buffer``. (:issue:`54229`)
- Deprecated allowing passing :class:`BlockManager` objects to :class:`DataFrame` or :class:`SingleBlockManager` objects to :class:`Series` (:issue:`52419`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
- Deprecated dtype inference when setting a :class:`Index` into a :class:`DataFrame`, cast explicitly instead (:issue:`56102`)
- Deprecated including the groups in computations when using :meth:`.DataFrameGroupBy.apply` and :meth:`.DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
- Deprecated indexing an :class:`Index` with a boolean indexer of length zero (:issue:`55820`)
- Deprecated not passing a tuple to :class:`.DataFrameGroupBy.get_group` or :class:`.SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
Expand Down
16 changes: 15 additions & 1 deletion pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -5219,7 +5219,21 @@ def _sanitize_column(self, value) -> tuple[ArrayLike, BlockValuesRefs | None]:

if is_list_like(value):
com.require_length_match(value, self.index)
return sanitize_array(value, self.index, copy=True, allow_2d=True), None
arr = sanitize_array(value, self.index, copy=True, allow_2d=True)
if (
isinstance(value, Index)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be more performant to do the Index check before the sanitize_array?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? We have to sanitise anyway?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you know you have an Index (and no dtype), sanitize_array boils down to:

if len(value) != len(index): raise ...
return value._values

so a lot of sanitize_array may be made unnecessary in this case

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I want in the future, but we go through maybe_infer_to_datetimelike at the moment which changes the dtype, so we can't shortcut this before we enforce the deprecation

and value.dtype == "object"
and arr.dtype != value.dtype
): #
# TODO: Remove kludge in sanitize_array for string mode when enforcing
# this deprecation
warnings.warn(
"Setting an Index with object dtype into a DataFrame will no longer "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is "will no longer" clear enough that this refers to a future version/deprecation?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I'll put up a PR to clarify

"infer another dtype. Cast the Index explicitly before setting.",
FutureWarning,
stacklevel=find_stack_level(),
)
return arr, None

@property
def _series(self):
Expand Down
18 changes: 18 additions & 0 deletions pandas/tests/frame/indexing/test_setitem.py
Original file line number Diff line number Diff line change
Expand Up @@ -786,6 +786,24 @@ def test_loc_setitem_ea_dtype(self):
df.iloc[:, 0] = Series([11], dtype="Int64")
tm.assert_frame_equal(df, expected)

def test_setitem_object_inferring(self):
# GH#56102
idx = Index([Timestamp("2019-12-31")], dtype=object)
df = DataFrame({"a": [1]})
with tm.assert_produces_warning(FutureWarning, match="infer"):
df.loc[:, "b"] = idx
with tm.assert_produces_warning(FutureWarning, match="infer"):
df["c"] = idx

expected = DataFrame(
{
"a": [1],
"b": Series([Timestamp("2019-12-31")], dtype="datetime64[ns]"),
"c": Series([Timestamp("2019-12-31")], dtype="datetime64[ns]"),
}
)
tm.assert_frame_equal(df, expected)


class TestSetitemTZAwareValues:
@pytest.fixture
Expand Down