Skip to content

BUG: Fix issue with datetime[ns, tz] input in Block.setitem GH32395 #32479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
May 25, 2020
Merged
Show file tree
Hide file tree
Changes from 7 commits
Commits
Show all changes
25 commits
Select commit Hold shift + click to select a range
198474d
BUG: Fix issue with datetime[ns, tz] input in Block.setitem GH32395
h-vishal Mar 5, 2020
6eba1c9
Added tests for series, moved tests to test_loc.py
h-vishal Mar 6, 2020
11b4f29
Add dtype to series object in tests to suppress warning
h-vishal Mar 6, 2020
1b3cba1
Fix whatsnew
h-vishal Mar 6, 2020
d83cff9
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Mar 6, 2020
4e66228
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Mar 7, 2020
ac85aa3
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Mar 9, 2020
2d25c7a
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Mar 27, 2020
3e78985
Move whatsnew to 1.1.0
h-vishal Mar 27, 2020
7e600c7
Genralise fix for other extension array types
h-vishal Apr 2, 2020
05a788c
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Apr 2, 2020
b136ce3
Add new whatsnew
h-vishal Apr 3, 2020
d433d7c
Fix doc
h-vishal Apr 3, 2020
00199f6
Fix version number in whatsnew
h-vishal Apr 3, 2020
01f01fd
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Apr 3, 2020
8cfa045
TST: Fix precision test in tests.computation.test_eval.check_alignment()
h-vishal Apr 3, 2020
4f8fccd
Revert "TST: Fix precision test in
h-vishal Apr 4, 2020
0f8a913
CLN: Test style tests.indexing.test_loc
h-vishal Apr 4, 2020
9d4d1df
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Apr 4, 2020
a01a676
Modify tests after review
h-vishal Apr 6, 2020
26f8ed3
Merge remote-tracking branch 'remotes/upstream/master' into issue-32395
h-vishal Apr 6, 2020
3caf161
Merge remote-tracking branch 'upstream/master' into issue-32395
simonjayhawkins May 7, 2020
8d9e6d7
Merge remote-tracking branch 'upstream/master' into issue-32395
simonjayhawkins May 20, 2020
fbc31c8
nits
simonjayhawkins May 20, 2020
c7e7cf5
update tests
simonjayhawkins May 20, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ Version 1.0
.. toctree::
:maxdepth: 2

v1.0.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

v1.0.3
v1.0.2
v1.0.1
Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.0.3.rst
Original file line number Diff line number Diff line change
Expand Up @@ -26,4 +26,4 @@ Bug fixes
Contributors
~~~~~~~~~~~~

.. contributors:: v1.0.2..v1.0.3|HEAD
.. contributors:: v1.0.2..v1.0.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

28 changes: 28 additions & 0 deletions doc/source/whatsnew/v1.0.4.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@

.. _whatsnew_104:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove this and all references to 1.0.4, just put a note in 1.1


What's new in 1.0.4 (April 15, 2020)
------------------------------------

These are the changes in pandas 1.0.4. See :ref:`release` for a full changelog
including other versions of pandas.

{{ header }}

.. ---------------------------------------------------------------------------

.. _whatsnew_104.regressions:

Fixed regressions
~~~~~~~~~~~~~~~~~
- Fixed regression in :meth:`DataFrame.loc`, :meth:`Series.loc` throwing an error when a ``datetime64[ns, tz]`` value is provided (:issue:`32395`)

.. _whatsnew_104.bug_fixes:

Bug fixes
~~~~~~~~~

Contributors
~~~~~~~~~~~~

.. contributors:: v1.0.3..v1.0.4|HEAD
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -361,7 +361,7 @@ Indexing
- Fix to preserve the ability to index with the "nearest" method with xarray's CFTimeIndex, an :class:`Index` subclass (`pydata/xarray#3751 <https://github.com/pydata/xarray/issues/3751>`_, :issue:`32905`).
- Bug in :class:`Index` constructor where an unhelpful error message was raised for ``numpy`` scalars (:issue:`33017`)
- Bug in :meth:`DataFrame.lookup` incorrectly raising an ``AttributeError`` when ``frame.index`` or ``frame.columns`` is not unique; this will now raise a ``ValueError`` with a helpful error message (:issue:`33041`)
- Fixed regression in :meth:`DataFrame.loc`, :meth:`Series.loc` throwing an error when a ``datetime64[ns, tz]`` value is provided (:issue:`32395`)
- Bug in :meth:`DataFrame.iloc.__setitem__` creating a new array instead of overwriting ``Categorical`` values in-place (:issue:`32831`)

Missing
^^^^^^^
Expand Down
9 changes: 7 additions & 2 deletions pandas/core/internals/blocks.py
Original file line number Diff line number Diff line change
Expand Up @@ -818,8 +818,6 @@ def setitem(self, indexer, value):

# coerce if block dtype can store value
values = self.values
if is_object_dtype(values.dtype) and isinstance(value, DatetimeArray):
value = value.astype(object)

if self._can_hold_element(value):
# We only get here for non-Extension Blocks, so _try_coerce_args
Expand Down Expand Up @@ -852,8 +850,10 @@ def setitem(self, indexer, value):
if is_extension_array_dtype(getattr(value, "dtype", None)):
# We need to be careful not to allow through strings that
# can be parsed to EADtypes
is_ea_value = True
arr_value = value
else:
is_ea_value = False
arr_value = np.array(value)

if transpose:
Expand Down Expand Up @@ -881,6 +881,11 @@ def setitem(self, indexer, value):
values[indexer] = value
return self.make_block(Categorical(self.values, dtype=arr_value.dtype))

elif exact_match and is_ea_value:
# GH#?32395 if we're going to replace the values entirely, just
# substitute in the new array
return self.make_block(arr_value)

# if we are an exact match (ex-broadcasting),
# then use the resultant dtype
elif exact_match:
Expand Down
5 changes: 4 additions & 1 deletion pandas/tests/computation/test_eval.py
Original file line number Diff line number Diff line change
Expand Up @@ -314,7 +314,10 @@ def check_alignment(self, result, nlhs, ghs, op):

# direct numpy comparison
expected = self.ne.evaluate(f"nlhs {op} ghs")
tm.assert_numpy_array_equal(result.values, expected)
if np.issubdtype(result.values.dtype, np.floating):
tm.assert_almost_equal(result.values, expected)
else:
tm.assert_numpy_array_equal(result.values, expected)

# modulus, pow, and floor division require special casing

Expand Down
139 changes: 128 additions & 11 deletions pandas/tests/indexing/test_loc.py
Original file line number Diff line number Diff line change
Expand Up @@ -1098,14 +1098,45 @@ def test_loc_datetimelike_mismatched_dtypes():
df["a"].loc[tdi]


def test_loc_setitem_df_datetime64tz_column_with_index():
def test_loc_with_period_index_indexer():
# GH#4125
idx = pd.period_range("2002-01", "2003-12", freq="M")
df = pd.DataFrame(np.random.randn(24, 10), index=idx)
tm.assert_frame_equal(df, df.loc[idx])
tm.assert_frame_equal(df, df.loc[list(idx)])
tm.assert_frame_equal(df, df.loc[list(idx)])
tm.assert_frame_equal(df.iloc[0:5], df.loc[idx[0:5]])
tm.assert_frame_equal(df, df.loc[list(idx)])


def test_loc_setitem_df_datetime64tz_full_column_with_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this need to move to pandas/tests/extension/setitem

BUT you need to use the fixtures, so you are like writing ONE test.

df = pd.DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment with the issue number

pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), columns=["data"]
)
df2 = pd.DataFrame(index=df.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

result =
expected =

as its not really clear what you are comparing here

df2.loc[df.index, "data"] = df["data"]
tm.assert_numpy_array_equal(np.array(df.data), np.array(df2.data))
assert df2.data.dtype == np.object
tm.assert_frame_equal(df, df2)


def test_loc_setitem_df_datetime64tz_slice():
df = pd.DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), columns=["data"]
)
df2 = pd.DataFrame(index=df.index)
expected = pd.DataFrame(
[
pd.Timestamp("2020-01-01", tz=timezone.utc),
pd.Timestamp("2020-01-02", tz=timezone.utc),
pd.Timestamp("2020-01-03", tz=timezone.utc),
pd.NaT,
pd.NaT,
pd.NaT,
],
columns=["data"],
dtype="object",
)
df2.loc[df.index[:3], "data"] = df["data"][:3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
result =
expected =

tm.assert_frame_equal(df2, expected)


def test_loc_setitem_df_datetime64tz_column_without_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to repeat things like this, instead we aleady have generic tests in : pandas/tests/extension//base/setitem.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vishal this is the key comment. Please do not repeat things like this. We already have fixtures for all of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I have pushed a commit with the requested changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like anything has changed. there should be NO tests here, rather in pandas/tests/extension/base/setitem

Expand All @@ -1114,24 +1145,110 @@ def test_loc_setitem_df_datetime64tz_column_without_index():
)
df2 = pd.DataFrame(index=df.index)
df2.loc[:, "data"] = df["data"]
tm.assert_series_equal(df.data, df2.data)
tm.assert_frame_equal(df, df2)


def test_loc_setitem_series_datetime64tz_with_index():
def test_loc_setitem_series_datetime64tz_full_with_index():
s1 = pd.Series(
pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), name="data"
)
s2 = pd.Series(index=s1.index, dtype=np.object, name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
s2.loc[s1.index] = s1
tm.assert_numpy_array_equal(np.array(s1), np.array(s2))
assert s2.dtype == np.object
tm.assert_series_equal(s1, s2)


def test_loc_setitem_series_datetime64tz_slice():
s1 = pd.Series(
pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), name="data"
)
s2 = pd.Series(index=s1.index, dtype="object", name="data")
expected = pd.Series(
[
*pd.date_range("2020-01-01", "2020-01-03", 3, tz=timezone.utc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the "*"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess do

dti = pd.date_range("2020-01-01", "2020-01-03", periods=6, tz=timezone.utc)
expected = pd.Series(dti)
expected[-3:] = pd.NaT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel okay, if that is preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel please note the object dtype for the result of a slice assignement

np.nan,
np.nan,
np.nan,
],
name="data",
dtype="object",
)
s2.loc[s1.index[:3]] = s1[:3]
tm.assert_series_equal(expected, s2)


def test_loc_setitem_series_datetime64tz_without_index():
s1 = pd.Series(
pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), name="data"
)
s2 = pd.Series(index=s1.index, dtype=np.object, name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
s2.loc[:] = s1
tm.assert_numpy_array_equal(np.array(s1), np.array(s2))
assert s2.dtype == np.object
tm.assert_series_equal(s1, s2)


def test_loc_setitem_series_timedelta64_full_with_index():
s1 = pd.Series(pd.timedelta_range(start="1 day", periods=4), name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
s2.loc[s1.index] = s1
tm.assert_series_equal(s1, s2)


def test_loc_setitem_df_period_full_column_with_index():
df = pd.DataFrame(pd.period_range(start="2020Q1", periods=5), columns=["data"])
df2 = pd.DataFrame(index=df.index)
df2.loc[df.index, "data"] = df["data"]
tm.assert_frame_equal(df, df2)


def test_loc_setitem_series_interval_full_with_index():
s1 = pd.Series(pd.interval_range(start=0, end=5), name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
s2.loc[s1.index] = s1
tm.assert_series_equal(s1, s2)


def test_loc_setitem_df_sparse_full_column_with_index():
df = pd.DataFrame(np.random.randn(100), columns=["data"])
df.iloc[5:95] = np.nan
df = df.astype(pd.SparseDtype("int", np.nan))
df2 = pd.DataFrame(index=df.index)
df2.loc[df.index, "data"] = df["data"]
tm.assert_frame_equal(df, df2)


def test_loc_setitem_series_categorical_full_with_index():
s1 = pd.Series(pd.Categorical([1] * 10 + [2] * 5 + [3] * 10), name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
s2.loc[s1.index] = s1
tm.assert_series_equal(s1, s2)


def test_loc_setitem_series_categorical_slice():
s1 = pd.Series(pd.Categorical([1] * 10 + [2] * 5 + [3] * 10), name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
expected = pd.Series(
pd.Categorical([1] * 10 + [2] * 5 + [np.nan] * 10), name="data", dtype=object
)
s2.loc[s1.index[:15]] = s1[:15]
tm.assert_series_equal(expected, s2)


def test_loc_setitem_df_interval_slice():
df = pd.DataFrame(pd.interval_range(start=0, end=5), columns=["data"])
df2 = pd.DataFrame(index=df.index)
expected = pd.DataFrame(
[*pd.interval_range(start=0, end=3), np.nan, np.nan], columns=["data"]
)
df2.loc[df.index[:3], "data"] = df["data"][:3]
tm.assert_frame_equal(df2, expected)


def test_loc_setitem_series_period_slice():
s1 = pd.Series(pd.period_range(start="2020Q1", periods=5), name="data")
s2 = pd.Series(index=s1.index, dtype="object", name="data")
expected = pd.Series(
[*pd.period_range(start="2020Q1", periods=3), np.nan, np.nan],
name="data",
dtype=object,
)
s2.loc[s1.index[:3]] = s1[:3]
tm.assert_series_equal(expected, s2)