BUG: Respect errors="ignore" during extension astype #35979

Merged
merged 17 commits into from
Sep 6, 2020
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.2.rst
@@ -33,6 +33,7 @@ Bug fixes
- Bug in :meth:`DataFrame.eval` with ``object`` dtype column binary operations (:issue:`35794`)
- Bug in :class:`Series` constructor raising a ``TypeError`` when constructing sparse datetime64 dtypes (:issue:`35762`)
- Bug in :meth:`DataFrame.apply` with ``result_type="reduce"`` returning with incorrect index (:issue:`35683`)
- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` not respecting the ``errors`` argument when set to ``"ignore"`` for extension dtypes (:issue:`35471`)
- Bug in :meth:`DateTimeIndex.format` and :meth:`PeriodIndex.format` with ``name=True`` setting the first item to ``"None"`` where it should be ``""`` (:issue:`35712`)
- Bug in :meth:`Float64Index.__contains__` incorrectly raising ``TypeError`` instead of returning ``False`` (:issue:`35788`)
- Bug in :class:`Series` constructor incorrectly raising a ``TypeError`` when passed an ordered set (:issue:`36044`)
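The `astype` entry in the release notes above describes user-facing behaviour that can be sketched as follows. This is an illustrative snippet rather than code from the pull request; it assumes a pandas version that includes the fix (1.1.2 or later) and reuses the `"string"` dtype case from the tests added in this PR.

```python
import pandas as pd

# An extension-dtype Series whose values cannot be converted to float.
s = pd.Series(["x", "y", "z"], dtype="string")

# With errors="ignore", the failed cast is swallowed and the original
# values (and dtype) are returned unchanged.
ignored = s.astype(float, errors="ignore")

# With the default errors="raise", the same cast propagates the exception.
try:
    s.astype(float)
    raised = False
except (ValueError, TypeError):
    raised = True
```

Before the fix, the first call would have raised for extension dtypes, because the `errors` argument was not consulted on that code path.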
9 changes: 7 additions & 2 deletions pandas/core/internals/blocks.py
@@ -581,8 +581,13 @@ def astype(self, dtype, copy: bool = False, errors: str = "raise"):

         # force the copy here
         if self.is_extension:
-            # TODO: Should we try/except this astype?
-            values = self.values.astype(dtype)
+            try:
+                values = self.values.astype(dtype)
Contributor

Instead of this, add this keyword to the EA astype.

It's a little more code (you will likely need to refactor into an _astype method), but better, I think.

Member Author

I can add this, although it becomes a lot more code. I was able to pull a tiny bit into a helper function, but mostly each method seems to have its own idiosyncrasies that make it hard to abstract away details. This became quite messy, so let me know if I should revert and maybe leave it as a follow-up enhancement.

Member

Hmm, changing the signature on the base extension array is an API change. I'm not sure this is a good idea, since it only fixes pandas EAs; third-party EAs that have overridden astype will need to update their code to support it.

Member Author

It seems too large a change for a bug fix, so I went ahead and reverted.

+            except (ValueError, TypeError):
+                if errors == "ignore":
+                    values = self.values
+                else:
+                    raise
         else:
             if issubclass(dtype.type, str):
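The change to `blocks.py` and the review thread attached to it can be summarised with a small standalone sketch. It does not use pandas: `ThirdPartyArray` and `cast_values` are hypothetical names invented for illustration, standing in for an extension array that overrides `astype` without an `errors` keyword and for the call site in the block. The sketch shows why handling `errors` at the call site works for any array, whereas passing the keyword through would require every array to accept it.

```python
class ThirdPartyArray:
    """Hypothetical stand-in for an extension array with its own astype."""

    def __init__(self, data):
        self.data = list(data)

    def astype(self, dtype):
        # No `errors` keyword, like an array written before any API change.
        return ThirdPartyArray(dtype(x) for x in self.data)


def cast_values(values, dtype, errors="raise"):
    """Cast `values`; on failure return them unchanged if errors="ignore"."""
    try:
        return values.astype(dtype)
    except (ValueError, TypeError):
        if errors == "ignore":
            return values
        raise


arr = ThirdPartyArray(["x", "y", "z"])

# The cast fails inside astype, so the original object is handed back.
same = cast_values(arr, float, errors="ignore")

# A cast that succeeds is unaffected by the wrapper.
nums = cast_values(ThirdPartyArray(["1", "2"]), float)

# Passing the keyword straight to the array fails for this class,
# which is the compatibility concern raised in the review.
try:
    arr.astype(float, errors="ignore")
    accepts_errors = True
except TypeError:
    accepts_errors = False
```

One trade-off of the call-site approach, visible in the sketch, is that the wrapper cannot tell a genuine cast failure from any other `ValueError` or `TypeError` raised inside `astype`.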

22 changes: 22 additions & 0 deletions pandas/tests/frame/methods/test_astype.py
@@ -8,6 +8,7 @@
     CategoricalDtype,
     DataFrame,
     DatetimeTZDtype,
+    Interval,
     IntervalDtype,
     NaT,
     Series,
@@ -565,3 +566,24 @@ def test_astype_empty_dtype_dict(self):
         result = df.astype(dict())
         tm.assert_frame_equal(result, df)
         assert result is not df

Contributor

do we have tests for these same types for errors='raise' (the default) and errors='coerce'?

Member Author

I don't think so for all, can add those here

+    @pytest.mark.parametrize(
+        "df",
+        [
+            DataFrame(Series(["x", "y", "z"], dtype="string")),
+            DataFrame(Series(["x", "y", "z"], dtype="category")),
+            DataFrame(Series(3 * [Timestamp("2020-01-01", tz="UTC")])),
+            DataFrame(Series(3 * [Interval(0, 1)])),
+        ],
+    )
+    @pytest.mark.parametrize("errors", ["raise", "ignore"])
+    def test_astype_ignores_errors_for_extension_dtypes(self, df, errors):
+        # https://github.com/pandas-dev/pandas/issues/35471
+        if errors == "ignore":
+            expected = df
+            result = df.astype(float, errors=errors)
+            tm.assert_frame_equal(result, expected)
+        else:
+            msg = "(Cannot cast)|(could not convert)"
+            with pytest.raises((ValueError, TypeError), match=msg):
+                df.astype(float, errors=errors)
25 changes: 24 additions & 1 deletion pandas/tests/series/methods/test_astype.py
@@ -1,4 +1,6 @@
-from pandas import Series, date_range
+import pytest
+
+from pandas import Interval, Series, Timestamp, date_range
 import pandas._testing as tm


@@ -23,3 +25,24 @@ def test_astype_dt64tz_to_str(self):
             dtype=object,
         )
         tm.assert_series_equal(result, expected)

+    @pytest.mark.parametrize(
+        "values",
+        [
+            Series(["x", "y", "z"], dtype="string"),
+            Series(["x", "y", "z"], dtype="category"),
+            Series(3 * [Timestamp("2020-01-01", tz="UTC")]),
+            Series(3 * [Interval(0, 1)]),
+        ],
+    )
+    @pytest.mark.parametrize("errors", ["raise", "ignore"])
+    def test_astype_ignores_errors_for_extension_dtypes(self, values, errors):
+        # https://github.com/pandas-dev/pandas/issues/35471
+        if errors == "ignore":
+            expected = values
+            result = values.astype(float, errors="ignore")
+            tm.assert_series_equal(result, expected)
+        else:
+            msg = "(Cannot cast)|(could not convert)"
+            with pytest.raises((ValueError, TypeError), match=msg):
+                values.astype(float, errors=errors)
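Both tests rely on `pytest.raises(..., match=msg)`, which checks the pattern against the string form of the exception with `re.search`. The sketch below shows how the alternation in `msg` accepts either wording. The first message comes from Python's built-in `float`; the second and third strings are made-up samples for illustration, not messages quoted from pandas.

```python
import re

# Same pattern as in the tests: either alternative may appear anywhere.
msg = "(Cannot cast)|(could not convert)"

# A real "could not convert" message, produced by the built-in float().
try:
    float("x")
except ValueError as exc:
    text = str(exc)

matched_builtin = re.search(msg, text) is not None

# Made-up sample strings, used only to exercise the other branches.
matched_cast = re.search(msg, "Cannot cast this array to float") is not None
matched_other = re.search(msg, "division by zero") is not None
```

Because the match is a search rather than a full match, the pattern only needs to appear somewhere in the message, which is what lets a single test cover dtypes that fail with different exception types and wordings.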