Skip to content

BUG: Fix issue with datetime[ns, tz] input in Block.setitem GH32395 #32479

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
May 25, 2020

Conversation

h-vishal
Copy link
Contributor

@h-vishal h-vishal commented Mar 6, 2020

@jreback
Copy link
Contributor

jreback commented Mar 11, 2020

#29334 might actually fix this (or if not is a bit cleaner)

@TomAugspurger
Copy link
Contributor

#29334 doesn't quite seem to fix this (some of the new tests here fail on that branch).

@TomAugspurger
Copy link
Contributor

FYI, I'm currently planning to push this to the next release.

@h-vishal
Copy link
Contributor Author

Any thoughts on - #32479 (comment)?

@simonjayhawkins
Copy link
Member

#29334 doesn't quite seem to fix this (some of the new tests here fail on that branch).

#29334 has been merged and did not fix.

if this has stalled, is reverting #29139 an option?

@h-vishal can you merge master to resolve conflicts and move whatsnew to 1.1.0

@jbrockmendel
Copy link
Member

I've been working on Block.setitem, will see if I can find a nice option that doesnt entail reverting #29139.

# Conflicts:
#	doc/source/whatsnew/v1.0.2.rst
#	pandas/tests/indexing/test_loc.py
@h-vishal
Copy link
Contributor Author

@simonjayhawkins I have updated this, please look at alternate solutions discussed in #32479 (comment) and #32479 (comment) if one of those is preferred I will push another update.

@jbrockmendel let me know if there is an update on this.

@jbrockmendel
Copy link
Member

@h-vishal the thing to do here is to add a clause into Block.setitem just after the elif exact_match and is_categorical_dtype(arr_value.dtype) check:

        elif exact_match and is_extension_array_dtype(arr_value.dtype):
            # GH#32395 if we're going to replace the values entirely, just
            #  substitute in the new array
            return self.make_block(arr_value)

Then update the tests to use tm.assert_series_equal and tm.assert_frame_equal as others have suggested.

caveat 1: i havent run the whole test suite on this yet
caveat 2: this works if we are setting obj.loc[:] = other but I havent looked at obj.loc[:halfway] = other[:halfway]

@h-vishal
Copy link
Contributor Author

@jbrockmendel this will create a column with the corresponding datetime64[ns, tz] dtype when a full column assignment occurs so the current tests can be modified to use assert_frame_equals, partial column assignments to datetime64[ns, tz] data would create object dtype columns for which I can add additional tests.

Would it be better to use is_datetime64tz_dtype(arr_value.dtype) instead of is_extension_array_dtype(arr_value.dtype)?

s2 = pd.Series(index=s1.index, dtype="object", name="data")
expected = pd.Series(
[
*pd.date_range("2020-01-01", "2020-01-03", 3, tz=timezone.utc),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the "*"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess do

dti = pd.date_range("2020-01-01", "2020-01-03", periods=6, tz=timezone.utc)
expected = pd.Series(dti)
expected[-3:] = pd.NaT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel okay, if that is preferred.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel please note the object dtype for the result of a slice assignement

@jbrockmendel
Copy link
Member

While writing the new tests I came across a bug with a slice assignment of a sparse column

does this PR fix the newly found bug?

@h-vishal
Copy link
Contributor Author

h-vishal commented Apr 4, 2020

While writing the new tests I came across a bug with a slice assignment of a sparse column

does this PR fix the newly found bug?

@jbrockmendel no, it's unrelated, confirmed in version 0.25 as well, better to create a new issue for it.

@@ -0,0 +1,28 @@

.. _whatsnew_104:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove this and all references to 1.0.4, just put a note in 1.1

@@ -24,6 +24,7 @@ Version 1.0
.. toctree::
:maxdepth: 2

v1.0.4
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this

@@ -26,4 +26,4 @@ Bug fixes
Contributors
~~~~~~~~~~~~

.. contributors:: v1.0.2..v1.0.3|HEAD
.. contributors:: v1.0.2..v1.0.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this



def test_loc_setitem_df_datetime64tz_full_column_with_index():
df = pd.DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add a comment with the issue number

df = pd.DataFrame(
pd.date_range("2020-01-01", "2020-01-06", 6, tz=timezone.utc), columns=["data"]
)
df2 = pd.DataFrame(index=df.index)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

result =
expected =

as its not really clear what you are comparing here



def test_loc_setitem_df_datetime64tz_slice():
df = pd.DataFrame(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment

columns=["data"],
dtype="object",
)
df2.loc[df.index[:3], "data"] = df["data"][:3]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
result =
expected =

tm.assert_frame_equal(df2, expected)


def test_loc_setitem_df_datetime64tz_column_without_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to repeat things like this, instead we aleady have generic tests in : pandas/tests/extension//base/setitem.py

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@h-vishal this is the key comment. Please do not repeat things like this. We already have fixtures for all of these.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I have pushed a commit with the requested changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't look like anything has changed. there should be NO tests here, rather in pandas/tests/extension/base/setitem

@h-vishal h-vishal requested a review from jreback April 8, 2020 09:35
@@ -1106,3 +1107,133 @@ def test_loc_with_period_index_indexer():
tm.assert_frame_equal(df, df.loc[list(idx)])
tm.assert_frame_equal(df.iloc[0:5], df.loc[idx[0:5]])
tm.assert_frame_equal(df, df.loc[list(idx)])


def test_loc_setitem_df_datetime64tz_full_column_with_index():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of this need to move to pandas/tests/extension/setitem

BUT you need to use the fixtures, so you are like writing ONE test.

@jbrockmendel
Copy link
Member

@h-vishal can you rebase and address comments

@simonjayhawkins simonjayhawkins mentioned this pull request May 25, 2020
@jreback jreback added this to the 1.0.4 milestone May 25, 2020
@jreback jreback merged commit 0f2ca37 into pandas-dev:master May 25, 2020
@jreback
Copy link
Contributor

jreback commented May 25, 2020

thanks @h-vishal

@lumberbot-app
Copy link

lumberbot-app bot commented May 25, 2020

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 1.0.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 0f2ca378a2b692edba7baab19ea138b887838cc0
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #32479: BUG: Fix issue with datetime[ns, tz] input in Block.setitem GH32395'
  1. Push to a named branch :
git push YOURFORK 1.0.x:auto-backport-of-pr-32479-on-1.0.x
  1. Create a PR against branch 1.0.x, I would have named this PR:

"Backport PR #32479 on branch 1.0.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

assignment with datetime64[ns, UTC] raises TypeError
6 participants