Skip to content

BUG/API: DTI/TDI/PI.insert cast to object on failure #39068

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jan 21, 2021

Conversation

jbrockmendel
Copy link
Member

@jbrockmendel jbrockmendel commented Jan 9, 2021

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

ATM DTI/TDI/PI/insert are pretty idiosyncratic. This changes them to match the base Index.insert by a) always trying _validate_fill_value (this changes the behavior for strings that can be parsed to datetime/timedelta/period) and b) always fall back to object in cases where this validation fails (this changes behavior for non-strings where the validation fails).

AFAICT the current behavior was implemented in #5819 and the only discussion I see about castable strings or non-castable non-strings is this comment, which I don't think is relevant any longer.

One corner case is mismatched tzs, which raise in master. In the PR they cast to object. They could also reasonably cast to the index's tz (xref #37605)

cc @rhshadrach IIRC you had a question about the insert tests in test_coercion. I think this addresses what you were asking about. Let me know if I missed something.

@jbrockmendel
Copy link
Member Author

xref #37774

@jreback jreback added the Datetime Datetime data dtype label Jan 9, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a more general question here. why are we trying to accomodate this at all with a non-conforming value? shouldn't we just plain raise? and leave it to the caller to astype(object)?

of course this might be a moar breaking change that what is here.

@jbrockmendel
Copy link
Member Author

a more general question here. why are we trying to accomodate this at all with a non-conforming value? shouldn't we just plain raise? and leave it to the caller to astype(object)?

if we were starting from scratch that'd be a reasonable option, but implementing that consistently would break a lot of existing code. The best we can do is try to be internally consistent.

@jorisvandenbossche
Copy link
Member

a more general question here. why are we trying to accomodate this at all with a non-conforming value? shouldn't we just plain raise? and leave it to the caller to astype(object)?

Agreed with this, and that is also why I have been arguing for this behaviour for the new extension dtypes.

I certainly follow the goal to be consistent internally, but if we prefer the strict behavior, and if we do that for the new nullable extension dtypes (both "if"s to discuss), we will have some inconsistency for a period of time anyway. In which case it might make sense to not fix the remaining current inconsistencies as done here.

@jbrockmendel
Copy link
Member Author

why are we trying to accomodate this at all with a non-conforming value? shouldn't we just plain raise? and leave it to the caller to astype(object)?

This comes up in at least two fairly-common contexts:

  1. df[new_col] = foo when new_col doesn't match df.columns.dtype
  2. obj.reset_index()

It seems awkward to ask users to do df.columns = df.columns.astype(object) before that kind of operation, especially since the reset_index can happen somewhat deep inside other functions.

and if we do that for the new nullable extension dtypes (both "if"s to discuss), we will have some inconsistency for a period of time anyway

I'm not sure i follow. This is about Index.insert, so nullable dtypes are still aspirational.

@jorisvandenbossche
Copy link
Member

Ah yes, sorry, I had "arrays" in my head when writing that, and not "indexes" (we still need to have the discussion for arrays, will need to open a dedicated issue about that at some point).

Fully agreed that those use cases you mention are important for Index objects where automatic upcasting is useful.

AFAICT the current behavior was implemented in #5819 and the only discussion I see about castable strings or non-castable non-strings is this comment, which I don't think is relevant any longer.

No longer relevant in the sense that we no longer check if a string is datetime-like ? Or ..?

One corner case is mismatched tzs, which raise in master. In the PR they cast to object. They could also reasonably cast to the index's tz (xref #37605)

Yes, and similarly as being discussed on that issue, my preference would be to not cast to object for this case, but preserve the original tz

@jbrockmendel
Copy link
Member Author

i think the earlier non-consensus was a misunderstanding and outstanding issues have been resolved?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you construct a whatsnew note about this. maybe should be a sub-section, though ok i think with just a listing of what specific types this will now upcast to object rather than raise.

@jbrockmendel
Copy link
Member Author

whatsnew added + green

@jreback jreback added this to the 1.3 milestone Jan 20, 2021
@jreback jreback added the Dtype Conversions Unexpected or buggy dtype conversions label Jan 20, 2021
@@ -283,6 +283,7 @@ Indexing
- Bug in :meth:`DataFrame.loc` dropping levels of :class:`MultiIndex` when :class:`DataFrame` used as input has only one row (:issue:`10521`)
- Bug in setting ``timedelta64`` values into numeric :class:`Series` failing to cast to object dtype (:issue:`39086`)
- Bug in setting :class:`Interval` values into a :class:`Series` or :class:`DataFrame` with mismatched :class:`IntervalDtype` incorrectly casting the new values to the existing dtype (:issue:`39120`)
- Bug in incorrectly raising when setting a new column that cannot be held in the existing ``frame.columns`` instead of casting to a compatible dtype (:issue:`39068`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm isn't this directly Index.insert ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, could flesh this out (.reset_index is also affected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i think its worth being more explict here as this is a non-trivial change

@jreback jreback merged commit 412554b into pandas-dev:master Jan 21, 2021
@jreback
Copy link
Contributor

jreback commented Jan 21, 2021

thanks @jbrockmendel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants