Skip to content

BUG: fix inserting tz-aware datetime to Series, closes #12862 #27322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 11, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.25.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1051,6 +1051,7 @@ Indexing
- Bug in :class:`Categorical` and :class:`CategoricalIndex` with :class:`Interval` values when using the ``in`` operator (``__contains``) with objects that are not comparable to the values in the ``Interval`` (:issue:`23705`)
- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` on a :class:`DataFrame` with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a :class:`Series` (:issue:`27110`)
- Bug in :class:`CategoricalIndex` and :class:`Categorical` incorrectly raising ``ValueError`` instead of ``TypeError`` when a list is passed using the ``in`` operator (``__contains__``) (:issue:`21729`)
- Bug in :class:`Series` setting a new key (``__setitem__``) with a timezone-aware datetime incorrectly raising ``ValueError`` (:issue:`12862`)
-

Missing
Expand Down
7 changes: 7 additions & 0 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -1230,6 +1230,13 @@ def setitem(key, value):
if _is_unorderable_exception(e):
raise IndexError(key)

if is_scalar(key) and not is_integer(key) and key not in self.index:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would trace further

you are adding a new case whereby it’s likely an existing path is just incorrect somewhere

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the path here is above when we call self._set_with_engine(key, value), _set_with_engine calls self.index._engine.set_value(self._values, key, value). The engine expects self._values to be an ndarray, but here it is a DatetimeArray. Eventually it might be worth addressing this at the low-level engine level, but definitely out of scope for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, that's where I got stuck trying to fix this same bug. I tried converting it to an ndarray before passing it into the engine but a lot of other routines failed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to _set_with and its better (after the cases are narrowed down)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

# GH#12862 adding an new key to the Series
# Note: ave to exclude integers because that is ambiguously
# position-based
self.loc[key] = value
return

if com.is_bool_indexer(key):
key = check_bool_indexer(self.index, key)
try:
Expand Down
17 changes: 16 additions & 1 deletion pandas/tests/series/indexing/test_indexing.py
Original file line number Diff line number Diff line change
Expand Up @@ -523,7 +523,22 @@ def test_setitem_with_tz_dst():
tm.assert_series_equal(s, exp)


def test_categorial_assigning_ops():
def test_setitem_new_key_tz():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is a file for timezones and timeseries tests under indexing

prefer one of those

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. for next time, when does a test go in tests.indexing vs tests.series.indexing vs tests.frame.test_indexing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both tests.series.indexing.test_datetime and tests.indexing.test_datetime are about objects with DatetimeIndex, not datetime values

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well test_indexing is really big, just like to split out when dealing with specific types (may not be perfect but that's the idea); this should clearly be there.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think figuring this out merits a dedicated pass(es) similar to tests.arithmetic and tests.reductions

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see my comment below, I am sure we test this for .loc, so this belongs with it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. Do you have a suggestion as to where that might be? I'm not eager to manually sort through 50k+ tests. (and there aren't any foo.loc setitem calls in either tests.series.test_timezones or tests.series.test_timeseries)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_loc_coerceion in pandas/tests/indexing/test_loc.py

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything in that file is testing loc. you want to mix non-loc tests in with that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved

# GH#12862 should not raise on assigning the second value
vals = [
pd.to_datetime(42).tz_localize("UTC"),
pd.to_datetime(666).tz_localize("UTC"),
]

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also test using .loc which give the same result

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we may actually have a test for this already (so good to combine)

ser = pd.Series()
ser["foo"] = vals[0]
ser["bar"] = vals[1]

expected = pd.Series(vals, index=["foo", "bar"])
tm.assert_series_equal(ser, expected)


def test_categorical_assigning_ops():
orig = Series(Categorical(["b", "b"], categories=["a", "b"]))
s = orig.copy()
s[:] = "a"
Expand Down