BUG: DTA/TDA/PA setitem incorrectly allowing i8 #33717

jbrockmendel · 2020-04-21T23:26:39Z

simonjayhawkins · 2020-04-22T14:18:03Z

pandas/core/arrays/datetimelike.py

-            value = self._unbox_scalar(value)
+            value = array(value)
+            if is_string_dtype(value.dtype):
+                # We got a StringArray


Is this assumption important, or is value pre-validated?

>>> value = [1, 2, object()]
>>> value = pd.array(value)
>>> value
<PandasArray>
[1, 2, <object object at 0x0000015F4B631DA0>]
Length: 3, dtype: object
>>>
>>> pd.core.dtypes.common.is_string_dtype(value.dtype)
True
>>>

might be worth adding type annotations to _validate_setitem_value

yikes, that looks like a problem with is_string_dtype

@jbrockmendel addressing here? or followup?

That's not a "problem" with is_string_dtype as such, because that function simply checks for object dtype. It has worked that way for a long time, and I'm not sure it is easy to solve, since actually checking for strings would mean inferring the dtype from the values each time. We have #15585 about this.

(it's a problem in practice of course, since it makes this function rather useless. But the solution is probably improving the new "string" dtype so this can become the default string dtype in the future)

Right, though I would say this is a bug in is_string_dtype: it just checks for object dtype rather than inferring whether the values are strings.

you can't infer string values from a dtype object, though ..

Sure you can, this is exactly what infer_from_dtype does; we just don't do it inside this function because it can be non-cheap.

updated to punt on is_string_dtype by instead checking is_dtype_equal(value.dtype, "string")
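The difference between the three kinds of check discussed in this thread can be sketched in plain Python. This is a toy model, not pandas code: a dtype is represented only by its name, and `loose_is_string_dtype`, `strict_is_string_dtype`, and `all_strings` are hypothetical helpers introduced for illustration.

```python
# Toy model: a "dtype" is just a name string; this is NOT pandas' dtype machinery.
STRING_LIKE_DTYPES = {"object", "string"}  # assumption: object is lumped in with strings


def loose_is_string_dtype(dtype_name: str) -> bool:
    """Dtype-level check that treats any object-dtype container as string-like."""
    return dtype_name in STRING_LIKE_DTYPES


def strict_is_string_dtype(dtype_name: str) -> bool:
    """Dtype-level check that accepts only the dedicated "string" dtype."""
    return dtype_name == "string"


def all_strings(values) -> bool:
    """Value-level check: needs a pass over the data, but looks at real values."""
    return all(isinstance(v, str) for v in values)


mixed = [1, 2, object()]                 # would be stored with object dtype
print(loose_is_string_dtype("object"))   # True, although no element is a string
print(strict_is_string_dtype("object"))  # False
print(all_strings(mixed))                # False
```

The strict dtype comparison stays cheap because it never touches the values, which appears to be the trade-off behind the updated check.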

jorisvandenbossche · 2020-04-23T18:51:34Z

pandas/core/arrays/datetimelike.py

+            if is_string_dtype(value.dtype):
+                # We got a StringArray
+                try:
+                    # TODO: Could use from_sequence_of_strings if implemented


So related to my comment above, this TODO item is not correct (also the "We got a StringArray" comment above is thus not correct)

Do you have a suggestion for what to do here? _from_sequence is clearly too permissive

Why is _from_sequence too permissive for this purpose? It seems that whatever is allowed to create an array of this dtype can also be allowed to set values?

This gets back to your point of from_sequence being too permissive for DTA/TDA/PA; it would let through e.g. float64 or int64, which we shouldn't allow.

This then becomes a bit ambiguous on what to allow and what not. Why allow ints when creating but not when setting, while allowing strings in both creating and setting?
So why not disallow setting with strings as well?

I suppose the answer is: that is the current behaviour. But to come back to your question of what to use here: if you only want to allow actual strings, and not e.g. integers in an object dtype array, you can use infer_dtype to check that the values are strings or datetime objects?
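As an illustration of the value-level check suggested here, the following is a minimal pure-Python sketch. `infer_kind` is a hypothetical stand-in for an inference routine such as `infer_dtype` and does not reproduce pandas' implementation or its return values; `validate_listlike_for_setitem` is likewise an assumed name.

```python
from datetime import datetime


def infer_kind(values) -> str:
    """Classify a sequence by inspecting every element (hypothetical helper)."""
    kinds = set()
    for v in values:
        if isinstance(v, bool):          # bool is a subclass of int, so test it first
            kinds.add("boolean")
        elif isinstance(v, str):
            kinds.add("string")
        elif isinstance(v, datetime):
            kinds.add("datetime")
        elif isinstance(v, int):
            kinds.add("integer")
        else:
            kinds.add("other")
    if not kinds:
        return "empty"
    if len(kinds) == 1 and "other" not in kinds:
        return kinds.pop()
    return "mixed"


ALLOWED_KINDS = {"string", "datetime", "empty"}  # assumption: ints are rejected


def validate_listlike_for_setitem(values):
    """Accept only strings or datetime objects; raise TypeError otherwise."""
    kind = infer_kind(values)
    if kind not in ALLOWED_KINDS:
        raise TypeError(f"cannot set datetime-like values from {kind} data")
    return list(values)
```

Under these assumptions, `validate_listlike_for_setitem(["2020-04-21"])` passes, while `[1, 2]` and `[1, 2, object()]` both raise `TypeError`, at the cost of scanning every element.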

I'm assuming that's a rhetorical question; LMK if you actually are looking for an explanation.

So I'm clear: are you advocating for the status quo, or do you have something else in mind?

Well, it's not really a rhetorical question, I actually find it inconsistent to allow certain data types in creation and others in setting (I am fine with being more strict in setting compared to creation, but then we should be strict altogether and also not allow strings in setting, IMO).
But it's a question that doesn't need to be solved in this PR, that for sure!

To be clear: I am fine with the code, I only commented about the comment not really being correct. But now that you have updated the dtype check, I assume the comment is actually correct.

> I actually find it inconsistent to allow certain data types in creation and others in setting

Yes. The upshot is that the preferred way of minimizing inconsistencies is to make from_sequence more strict. Let's leave that for another thread.
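The creation-versus-setting asymmetry debated in this thread can be made concrete with a toy class. This is only a sketch under stated assumptions: `ToyDatetimeArray` is hypothetical, stores nanoseconds since the Unix epoch as plain ints, and is not how DatetimeArray is implemented.

```python
from datetime import datetime

EPOCH = datetime(1970, 1, 1)


class ToyDatetimeArray:
    """Hypothetical array storing datetimes as i8 nanoseconds since the epoch."""

    def __init__(self, nanos):
        self._i8 = list(nanos)

    @staticmethod
    def _to_i8(value, allow_int):
        if isinstance(value, str):
            value = datetime.fromisoformat(value)   # strings are parsed
        if isinstance(value, datetime):
            delta = value - EPOCH
            seconds = delta.days * 86_400 + delta.seconds
            return seconds * 10**9 + delta.microseconds * 1_000
        if allow_int and isinstance(value, int) and not isinstance(value, bool):
            return value                            # raw i8 passes through
        raise TypeError(f"invalid value of type {type(value).__name__}")

    @classmethod
    def from_sequence(cls, values):
        """Permissive constructor: ints are taken as nanoseconds."""
        return cls(cls._to_i8(v, allow_int=True) for v in values)

    def __setitem__(self, index, value):
        """Strict setter: strings and datetimes only, raw i8 is rejected."""
        self._i8[index] = self._to_i8(value, allow_int=False)


arr = ToyDatetimeArray.from_sequence([0, "1970-01-02"])
arr[0] = "1970-01-03"            # accepted: parsed from a string
try:
    arr[1] = 86_400 * 10**9      # rejected: raw i8 on assignment
except TypeError as err:
    print(err)
```

Making `from_sequence` pass `allow_int=False` as well would remove the asymmetry, which is the direction the thread defers to a later discussion.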


jreback

looks fine, can you rebase.

@jorisvandenbossche ok here?

jreback · 2020-04-30T13:21:10Z

I suppose this is worth a whatsnew note as well

jorisvandenbossche · 2020-04-30T13:26:17Z

Yes, all good

jreback · 2020-04-30T13:44:45Z

thanks @jbrockmendel

if you can add a whatsnew in a followup

BUG: DTA/TDA/PA setitem incorrectly allowing i8

4fa0fd0

jbrockmendel added the Bug label Apr 21, 2020

simonjayhawkins reviewed Apr 22, 2020


jreback added the Datetime label Apr 23, 2020

jorisvandenbossche reviewed Apr 23, 2020


jreback added this to the 1.1 milestone Apr 23, 2020

jbrockmendel added 2 commits April 23, 2020 15:19

Merge branch 'master' of https://github.com/pandas-dev/pandas into dt…

e20a2f7

…like-helpers-2

stricter check

6f46bde

jreback approved these changes Apr 30, 2020


jreback merged commit 085752f into pandas-dev:master Apr 30, 2020

jbrockmendel deleted the dtlike-helpers-2 branch May 4, 2020 20:03

jbrockmendel mentioned this pull request May 4, 2020

DOC: whatsnew for 33717 #33977

Merged

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request May 10, 2020

BUG: DTA/TDA/PA setitem incorrectly allowing i8 (pandas-dev#33717)

f1f541b


