BUG: Series.where not casting None to nan #39761

Merged
merged 3 commits into from
Feb 12, 2021
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.3.0.rst
@@ -450,6 +450,7 @@ Other
 - Bug in constructing a :class:`Series` from a list and a :class:`PandasDtype` (:issue:`39357`)
 - Bug in :class:`Styler` which caused CSS to duplicate on multiple renders. (:issue:`39395`)
 - ``inspect.getmembers(Series)`` no longer raises an ``AbstractMethodError`` (:issue:`38782`)
+- Bug in :meth:`Series.where` with numeric dtype and ``other = None`` not casting to ``nan`` (:issue:`39761`)
 - :meth:`Index.where` behavior now mirrors :meth:`Index.putmask` behavior, i.e. ``index.where(mask, other)`` matches ``index.putmask(~mask, other)`` (:issue:`39412`)
 - Bug in :func:`pandas.testing.assert_series_equal`, :func:`pandas.testing.assert_frame_equal`, :func:`pandas.testing.assert_index_equal` and :func:`pandas.testing.assert_extension_array_equal` incorrectly raising when an attribute has an unrecognized NA type (:issue:`39461`)
 - Bug in :class:`Styler` where ``subset`` arg in methods raised an error for some valid multiindex slices (:issue:`33562`)
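As a quick illustration of the whatsnew entry above, the following sketch calls `Series.where` with an explicit `other=None` on a float Series. The variable names are illustrative only, and the behavior described is what the entry states for pandas versions that include this fix.

```python
import numpy as np
import pandas as pd

# Numeric Series: positions where the mask is False are replaced by `other`.
ser = pd.Series([1.0, 2.0, 3.0])
mask = np.array([False, True, True])

# Pass other=None explicitly; with the fix it is cast to NaN for numeric dtypes,
# so the result stays numeric instead of holding a Python None.
result = ser.where(mask, None)

print(result.dtype)
print(result.isna().tolist())
```

The first position is the only one masked out, so it is the only one expected to be missing in the result.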
5 changes: 4 additions & 1 deletion pandas/core/internals/blocks.py
@@ -48,7 +48,7 @@
 )
 from pandas.core.dtypes.dtypes import CategoricalDtype, ExtensionDtype, PandasDtype
 from pandas.core.dtypes.generic import ABCDataFrame, ABCIndex, ABCPandasArray, ABCSeries
-from pandas.core.dtypes.missing import isna
+from pandas.core.dtypes.missing import is_valid_na_for_dtype, isna
 
 import pandas.core.algorithms as algos
 from pandas.core.array_algos.putmask import (
@@ -1298,6 +1298,9 @@ def where(self, other, cond, errors="raise", axis: int = 0) -> List[Block]:
 
         cond = _extract_bool_array(cond)
 
+        if is_valid_na_for_dtype(other, self.dtype) and not self.is_object:
+            other = self.fill_value
+
         if cond.ravel("K").all():
             result = values
         else:
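The guard added to `Block.where` above can be sketched outside pandas roughly as follows. This is a simplified stand-in, not the pandas implementation: `standardize_other` is a hypothetical helper, it treats only `None` and float NaN as NA-like, and it hard-codes the fill values that `self.fill_value` would supply for each dtype kind.

```python
import numpy as np


def standardize_other(other, dtype):
    """Hypothetical, simplified stand-in for the guard added to Block.where."""
    dtype = np.dtype(dtype)
    # Object blocks keep whatever was passed, including None.
    if dtype.kind == "O":
        return other
    # Simplification (assumption): only None and float NaN count as NA-like here.
    is_na_like = other is None or (isinstance(other, float) and np.isnan(other))
    if not is_na_like:
        return other
    # Swap the NA-like value for the dtype's own missing-value marker.
    if dtype.kind == "M":
        return np.datetime64("NaT", "ns")
    if dtype.kind == "m":
        return np.timedelta64("NaT", "ns")
    return np.nan


numeric_fill = standardize_other(None, "float64")
object_fill = standardize_other(None, object)
datetime_fill = standardize_other(None, "M8[ns]")
```

The point of the sketch is the ordering: the object check comes first so that `None` survives in object data, and every other dtype gets its own NA marker.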
47 changes: 0 additions & 47 deletions pandas/tests/indexing/test_indexing.py
@@ -837,53 +837,6 @@ def test_label_indexing_on_nan(self):
         assert result2 == expected
 
 
-class TestSeriesNoneCoercion:
-    EXPECTED_RESULTS = [
Member

What is the general rationale for deciding which tests go in /tests/indexing and which in /tests/series/indexing?

Here, these tests have both a Series version and a DataFrame version (just below). Since the two sets of tests are very similar, IMO it makes sense to keep them together?

Member Author

> What is the general rationale for deciding which tests go in /tests/indexing and which in /tests/series/indexing?

Some of this is laid out in test_writing.rst. There is no doubt room for improvement in that file if you have suggestions.

The overarching MO is to organize indexing tests by method, so anything for Series.__setitem__ would go in tests/series/indexing/test_setitem.py. Under this approach, all of the tests(/frame|/series)?/indexing/test_indexing.py files and all of the tests(/frame|/series)?/indexing/test_{dtype}.py files would eventually have their tests moved to method-specific files. Many of the remaining tests in these files are hard to attribute to a single method, so they do not (yet?) conform to this pattern.

The MO specific to this PR and recent similar ones is to get more Series.__setitem__ tests to use SetitemCastingEquivalents, which is proving very useful for finding inconsistencies. This cuts against the method-specific organization, so its final location may be reconsidered at some point.

I expect we will either extend SetitemCastingEquivalents to include DataFrame cases or implement similar classes that handle the cases SetitemCastingEquivalents can't.
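To make the path shorthand in the reply above concrete, here is a small sketch that treats `tests(/frame|/series)?/indexing/test_indexing.py` as a regular expression and checks which files it covers. The escaped dot and the sample path list are assumptions added for illustration.

```python
import re

# Pattern quoted in the comment above, with "." escaped so it matches literally.
GENERIC_INDEXING_FILE = re.compile(
    r"tests(/frame|/series)?/indexing/test_indexing\.py"
)

paths = [
    "tests/indexing/test_indexing.py",
    "tests/frame/indexing/test_indexing.py",
    "tests/series/indexing/test_indexing.py",
    "tests/series/indexing/test_setitem.py",  # method-specific file
]

# fullmatch requires the whole path to fit the pattern, not just a prefix.
matches = {path: bool(GENERIC_INDEXING_FILE.fullmatch(path)) for path in paths}
for path, is_generic in matches.items():
    print(f"{path}: {'generic' if is_generic else 'method-specific'}")
```

The three `test_indexing.py` files are the catch-all locations the reply expects to shrink over time, while `test_setitem.py` is an example of the method-specific target.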

-        # For numeric series, we should coerce to NaN.
-        ([1, 2, 3], [np.nan, 2, 3]),
-        ([1.0, 2.0, 3.0], [np.nan, 2.0, 3.0]),
-        # For datetime series, we should coerce to NaT.
-        (
-            [datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 3)],
-            [NaT, datetime(2000, 1, 2), datetime(2000, 1, 3)],
-        ),
-        # For objects, we should preserve the None value.
-        (["foo", "bar", "baz"], [None, "bar", "baz"]),
-    ]
-
-    @pytest.mark.parametrize("start_data,expected_result", EXPECTED_RESULTS)
-    def test_coercion_with_setitem(self, start_data, expected_result):
-        start_series = Series(start_data)
-        start_series[0] = None
-
-        expected_series = Series(expected_result)
-        tm.assert_series_equal(start_series, expected_series)
-
-    @pytest.mark.parametrize("start_data,expected_result", EXPECTED_RESULTS)
-    def test_coercion_with_loc_setitem(self, start_data, expected_result):
-        start_series = Series(start_data)
-        start_series.loc[0] = None
-
-        expected_series = Series(expected_result)
-        tm.assert_series_equal(start_series, expected_series)
-
-    @pytest.mark.parametrize("start_data,expected_result", EXPECTED_RESULTS)
-    def test_coercion_with_setitem_and_series(self, start_data, expected_result):
-        start_series = Series(start_data)
-        start_series[start_series == start_series[0]] = None
-
-        expected_series = Series(expected_result)
-        tm.assert_series_equal(start_series, expected_series)
-
-    @pytest.mark.parametrize("start_data,expected_result", EXPECTED_RESULTS)
-    def test_coercion_with_loc_and_series(self, start_data, expected_result):
-        start_series = Series(start_data)
-        start_series.loc[start_series == start_series[0]] = None
-
-        expected_series = Series(expected_result)
-        tm.assert_series_equal(start_series, expected_series)
-
-
 class TestDataframeNoneCoercion:
     EXPECTED_SINGLE_ROW_RESULTS = [
         # For numeric series, we should coerce to NaN.
70 changes: 68 additions & 2 deletions pandas/tests/series/indexing/test_setitem.py
@@ -1,4 +1,4 @@
-from datetime import date
+from datetime import date, datetime
 
 import numpy as np
 import pytest
@@ -297,7 +297,12 @@ def _check_inplace(self, is_inplace, orig, arr, obj):
             # We are not (yet) checking whether setting is inplace or not
             pass
         elif is_inplace:
-            assert obj._values is arr
+            if arr.dtype.kind in ["m", "M"]:
+                # We may not have the same DTA/TDA, but will have the same
+                # underlying data
+                assert arr._data is obj._values._data
+            else:
+                assert obj._values is arr
         else:
             # otherwise original array should be unchanged
             tm.assert_equal(arr, orig._values)
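The distinction drawn in the `_check_inplace` change above (a different wrapper object over the same underlying data) can be illustrated with plain NumPy views. This is an analogy only: it does not use the pandas DTA/TDA classes or their private `_data` attribute, and all names below are made up for the sketch.

```python
import numpy as np

# One underlying integer buffer.
raw = np.array([0, 1, 2], dtype="i8")

# Two separate wrapper objects viewing that same buffer as timedelta64[ns].
view_a = raw.view("m8[ns]")
view_b = raw.view("m8[ns]")

# An identity check on the wrappers fails even though nothing was copied.
same_object = view_a is view_b
same_buffer = np.shares_memory(view_a, view_b)

# A write through one wrapper is visible through the other and in the buffer.
view_a[0] = np.timedelta64(10, "ns")
print(same_object, same_buffer, view_b[0], raw[0])
```

In the same spirit, the updated assertion compares the underlying data of the two arrays rather than the wrapper objects themselves when the dtype is datetime-like.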
@@ -635,6 +640,37 @@ def is_inplace(self):
         return True
 
 
+class TestSetitemNATimedelta64Dtype(SetitemCastingEquivalents):
+    # some nat-like values should be cast to timedelta64 when inserting
+    # into a timedelta64 series. Others should coerce to object
+    # and retain their dtypes.
+
+    @pytest.fixture
+    def obj(self):
+        return Series([0, 1, 2], dtype="m8[ns]")
+
+    @pytest.fixture(
+        params=[NaT, np.timedelta64("NaT", "ns"), np.datetime64("NaT", "ns")]
+    )
+    def val(self, request):
+        return request.param
+
+    @pytest.fixture
+    def is_inplace(self, val):
+        # cast to object iff val is datetime64("NaT")
+        return val is NaT or val.dtype.kind == "m"
+
+    @pytest.fixture
+    def expected(self, obj, val, is_inplace):
+        dtype = obj.dtype if is_inplace else object
+        expected = Series([val] + list(obj[1:]), dtype=dtype)
+        return expected
+
+    @pytest.fixture
+    def key(self):
+        return 0
+
+
 class TestSetitemMismatchedTZCastsToObject(SetitemCastingEquivalents):
     # GH#24024
     @pytest.fixture
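A minimal sketch of the two in-place cases covered by `TestSetitemNATimedelta64Dtype` above: assigning `NaT` or a timedelta64 `NaT` into a `m8[ns]` Series is expected to keep the timedelta dtype. The third parametrized value, `np.datetime64("NaT", "ns")`, is deliberately left out here, since the test above expects it to cast to object and that behavior may differ in other pandas versions.

```python
import numpy as np
import pandas as pd

results = []
for nat_like in (pd.NaT, np.timedelta64("NaT", "ns")):
    # Fresh Series each time so the assignments do not interact.
    ser = pd.Series([0, 1, 2], dtype="m8[ns]")
    ser[0] = nat_like
    # Record the dtype and missing-value mask after the assignment.
    results.append((ser.dtype, ser.isna().tolist()))

for dtype, na_mask in results:
    print(dtype, na_mask)
```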
@@ -659,3 +695,33 @@ def expected(self):
             dtype=object,
         )
         return expected
+
+
+@pytest.mark.parametrize(
+    "obj,expected",
+    [
+        # For numeric series, we should coerce to NaN.
+        (Series([1, 2, 3]), Series([np.nan, 2, 3])),
+        (Series([1.0, 2.0, 3.0]), Series([np.nan, 2.0, 3.0])),
+        # For datetime series, we should coerce to NaT.
+        (
+            Series([datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 3)]),
+            Series([NaT, datetime(2000, 1, 2), datetime(2000, 1, 3)]),
+        ),
+        # For objects, we should preserve the None value.
+        (Series(["foo", "bar", "baz"]), Series([None, "bar", "baz"])),
+    ],
+)
+class TestSeriesNoneCoercion(SetitemCastingEquivalents):
+    @pytest.fixture
+    def key(self):
+        return 0
+
+    @pytest.fixture
+    def val(self):
+        return None
+
+    @pytest.fixture
+    def is_inplace(self, obj):
+        # This is specific to the 4 cases currently implemented for this class.
+        return obj.dtype.kind != "i"
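The four parametrized cases in `TestSeriesNoneCoercion` can be walked through by hand as below. This is a sketch rather than part of the test suite: the `cases` and `summary` names are illustrative, and `dtype=object` is passed explicitly for the strings case (an assumption made here so the example does not depend on the default string dtype of the installed pandas).

```python
from datetime import datetime

import numpy as np
import pandas as pd

cases = {
    "int64": pd.Series([1, 2, 3]),
    "float64": pd.Series([1.0, 2.0, 3.0]),
    "datetime64": pd.Series(
        [datetime(2000, 1, 1), datetime(2000, 1, 2), datetime(2000, 1, 3)]
    ),
    "object": pd.Series(["foo", "bar", "baz"], dtype=object),
}

summary = {}
for name, ser in cases.items():
    before = ser.dtype
    ser[0] = None  # same key and val as the fixtures above
    # Keep the dtype before and after, plus the stored first element.
    summary[name] = (before, ser.dtype, ser.iloc[0])

for name, (before, after, first) in summary.items():
    print(f"{name}: {before} -> {after}, first={first!r}")
```

Only the integer case has to change dtype to hold a missing value, which is why the `is_inplace` fixture returns `False` for dtype kind `"i"` and `True` for the other three.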