Skip to content

CLN: enforce deprecation of interpolate with object dtype #57820

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -201,6 +201,7 @@ Removal of prior version deprecations/changes
- Changed the default value of ``observed`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` to ``True`` (:issue:`51811`)
- Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`57275`)
- Enforced deprecation of :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` allowing the ``name`` argument to be a non-tuple when grouping by a list of length 1 (:issue:`54155`)
- Enforced deprecation of :meth:`Series.interpolate` and :meth:`DataFrame.interpolate` for object-dtype (:issue:`57820`)
- Enforced deprecation of ``axis=None`` acting the same as ``axis=0`` in the DataFrame reductions ``sum``, ``prod``, ``std``, ``var``, and ``sem``, passing ``axis=None`` will now reduce over both axes; this is particularly the case when doing e.g. ``numpy.sum(df)`` (:issue:`21597`)
- Enforced deprecation of passing a dictionary to :meth:`SeriesGroupBy.agg` (:issue:`52268`)
- Enforced deprecation of string ``AS`` denoting frequency in :class:`YearBegin` and strings ``AS-DEC``, ``AS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`57793`)
Expand Down
32 changes: 15 additions & 17 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -7845,16 +7845,21 @@ def interpolate(
obj, should_transpose = self, False
else:
obj, should_transpose = (self.T, True) if axis == 1 else (self, False)
if np.any(obj.dtypes == object):
# GH#53631
if not (obj.ndim == 2 and np.all(obj.dtypes == object)):
# don't warn in cases that already raise
warnings.warn(
f"{type(self).__name__}.interpolate with object dtype is "
"deprecated and will raise in a future version. Call "
"obj.infer_objects(copy=False) before interpolating instead.",
FutureWarning,
stacklevel=find_stack_level(),
# GH#53631
if obj.ndim == 1 and obj.dtype == object:
raise TypeError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest doing this check inside Block.interpolate

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there's a TODO(3.0) comment related to this inside Block.interpolate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you, I found the comment. In #58083 I moved raising TypeError from NDFrame to Block.

I think maybe it's better to change the error message from "NumpyBlock cannot interpolate with object dtype." to "Can not interpolate with object dtype."?

f"{type(self).__name__} cannot interpolate with object dtype."
)
if obj.ndim == 2:
if np.all(obj.dtypes == object):
raise TypeError(
"Cannot interpolate with all object-dtype columns "
"in the DataFrame. Try setting at least one "
"column to a numeric dtype."
)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can now simplify things and remove this check. The reason: we raise

TypeError(
    f"{type(self).__name__} cannot interpolate with object dtype."
)

in case np.any(obj.dtypes == object) below and there is no need to raise Type Error if np.all(obj.dtypes == object). Should I remove raising

if np.all(obj.dtypes == object):
    raise TypeError(
        "Cannot interpolate with all object-dtype columns "
        "in the DataFrame. Try setting at least one "
        "column to a numeric dtype."
    )

?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Yeah try that out

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the messaging will need to probably change, this could be a Series

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I don't understand why is it a Series? I thought it's a DataFrame, because we are in the block
if obj.ndim == 2:

I suggest to use the message f"{type(self).__name__} cannot interpolate with object dtype.", the same one we use in the block below

elif np.any(obj.dtypes == object):
    raise TypeError(
        f"{type(self).__name__} cannot interpolate with object dtype."
    )

I think we can just remove the block with the old message:

if obj.ndim == 2:
    if np.all(obj.dtypes == object):
...

and change the old message in tests to the new one "DataFrame cannot interpolate with object dtype"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry I had misunderstood your initial suggestion.

You can also go a step further and replace All of these checks with

if np.any(obj.dtypes == object):
    raise TypeError(f"{type(self).__name__} cannot interpolate with object dtype.")

Since Series.dtypes and DataFrame.dtypes can be checked at the same time

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, my wording wasn't clear.

My idea was to remove the second part of the error message:

"Cannot interpolate with all object-dtype columns in the DataFrame. Try setting at least one column to a numeric dtype."

"Try setting at least one column to a numeric dtype" won't work from now on because of the enforcing in this PR. Then I thought why not to replace the first part: "Cannot interpolate with all object-dtype columns in the DataFrame." with a new msg: "... cannot interpolate with object dtype."
If we do this we can remove one check block.

I made changes. Could you please take a look at it?

elif np.any(obj.dtypes == object):
raise TypeError(
f"{type(self).__name__} cannot interpolate with object dtype."
)

if method in fillna_methods and "fill_value" in kwargs:
Expand All @@ -7871,13 +7876,6 @@ def interpolate(

limit_direction = missing.infer_limit_direction(limit_direction, method)

if obj.ndim == 2 and np.all(obj.dtypes == object):
raise TypeError(
"Cannot interpolate with all object-dtype columns "
"in the DataFrame. Try setting at least one "
"column to a numeric dtype."
)

if method.lower() in fillna_methods:
# TODO(3.0): remove this case
# TODO: warn/raise on limit_direction or kwargs which are ignored?
Expand Down
16 changes: 4 additions & 12 deletions pandas/tests/copy_view/test_interp_fillna.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,20 +111,12 @@ def test_interp_fill_functions_inplace(func, dtype):
assert view._mgr._has_no_reference(0)


def test_interpolate_cleaned_fill_method():
# Check that "method is set to None" case works correctly
def test_interpolate_cannot_with_object_dtype():
df = DataFrame({"a": ["a", np.nan, "c"], "b": 1})
df_orig = df.copy()

msg = "DataFrame.interpolate with object dtype"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.interpolate(method="linear")

assert np.shares_memory(get_array(result, "a"), get_array(df, "a"))
result.iloc[0, 0] = Timestamp("2021-12-31")

assert not np.shares_memory(get_array(result, "a"), get_array(df, "a"))
tm.assert_frame_equal(df, df_orig)
msg = "DataFrame cannot interpolate with object dtype"
with pytest.raises(TypeError, match=msg):
df.interpolate()


def test_interpolate_object_convert_no_op():
Expand Down
41 changes: 14 additions & 27 deletions pandas/tests/frame/methods/test_interpolate.py
Original file line number Diff line number Diff line change
Expand Up @@ -76,29 +76,14 @@ def test_interp_basic(self):
"D": list("abcd"),
}
)
expected = DataFrame(
{
"A": [1.0, 2.0, 3.0, 4.0],
"B": [1.0, 4.0, 9.0, 9.0],
"C": [1, 2, 3, 5],
"D": list("abcd"),
}
)
msg = "DataFrame.interpolate with object dtype"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.interpolate()
tm.assert_frame_equal(result, expected)
msg = "DataFrame cannot interpolate with object dtype"
with pytest.raises(TypeError, match=msg):
df.interpolate()

# check we didn't operate inplace GH#45791
cvalues = df["C"]._values
dvalues = df["D"].values
assert np.shares_memory(cvalues, result["C"]._values)
assert np.shares_memory(dvalues, result["D"]._values)

with tm.assert_produces_warning(FutureWarning, match=msg):
res = df.interpolate(inplace=True)
assert res is None
tm.assert_frame_equal(df, expected)
with pytest.raises(TypeError, match=msg):
df.interpolate(inplace=True)

# check we DID operate inplace
assert np.shares_memory(df["C"]._values, cvalues)
Expand All @@ -117,14 +102,16 @@ def test_interp_basic_with_non_range_index(self, using_infer_string):
}
)

msg = "DataFrame.interpolate with object dtype"
warning = FutureWarning if not using_infer_string else None
with tm.assert_produces_warning(warning, match=msg):
msg = "DataFrame cannot interpolate with object dtype"
if not using_infer_string:
with pytest.raises(TypeError, match=msg):
df.set_index("C").interpolate()
else:
result = df.set_index("C").interpolate()
expected = df.set_index("C")
expected.loc[3, "A"] = 3
expected.loc[5, "B"] = 9
tm.assert_frame_equal(result, expected)
expected = df.set_index("C")
expected.loc[3, "A"] = 3
expected.loc[5, "B"] = 9
tm.assert_frame_equal(result, expected)

def test_interp_empty(self):
# https://github.com/pandas-dev/pandas/issues/35598
Expand Down
6 changes: 3 additions & 3 deletions pandas/tests/series/methods/test_interpolate.py
Original file line number Diff line number Diff line change
Expand Up @@ -846,10 +846,10 @@ def test_interpolate_unsorted_index(self, ascending, expected_values):

def test_interpolate_asfreq_raises(self):
ser = Series(["a", None, "b"], dtype=object)
msg2 = "Series.interpolate with object dtype"
msg2 = "Series cannot interpolate with object dtype"
msg = "Invalid fill method"
with pytest.raises(ValueError, match=msg):
with tm.assert_produces_warning(FutureWarning, match=msg2):
with pytest.raises(TypeError, match=msg2):
with pytest.raises(ValueError, match=msg):
ser.interpolate(method="asfreq")

def test_interpolate_fill_value(self):
Expand Down
Loading