-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
REGR: append tz-aware DataFrame with tz-naive values #36115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: append tz-aware DataFrame with tz-naive values #36115
Conversation
pandas/core/dtypes/concat.py
Outdated
@@ -152,11 +152,11 @@ def is_nonempty(x) -> bool: | |||
target_dtype = find_common_type([x.dtype for x in to_concat]) | |||
to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat] | |||
|
|||
if isinstance(to_concat[0], ExtensionArray) and axis == 0: | |||
if isinstance(to_concat[0], ExtensionArray): # and axis == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Related to #35038 (comment), testing if this is actually needed (need to clean this up, of course)
pandas/core/internals/concat.py
Outdated
if not isinstance(concat_values, ExtensionArray): | ||
if not isinstance( | ||
concat_values, ExtensionArray | ||
) or concat_values.dtype == np.dtype("M8[ns]"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel what's the best way to check for naive DatetimeArray? (so only numpy datetime64[ns]) Eg is_datetime64_ns_dtype
still gives True for datetime64[ns, tz]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isinstance(arr, DatetimeArray) and arr.tz is None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is that necessarily better than what I did above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fine by me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't really have a preference, mostly want to be consistent with other code, so if the snippet you showed is used elsewhere, will change it
# https://github.com/pandas-dev/pandas/issues/35460 | ||
df = pd.DataFrame(columns=["a"]).astype("datetime64[ns, UTC]") | ||
|
||
# pd.NaT gets inferred as tz-naive, so append result is tz-naive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is it obvious this is the correct behavior? id expect object dtype result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, good point. Yes, it should indeed give object dtype (at least that's how we concat tz-naive/tz-aware otherwise).
Will need to check why it becomes datetime64 here ..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so the reason is that we drop the empty arrays in concat_compat
:
pandas/pandas/core/dtypes/concat.py
Lines 139 to 141 in 1a21836
non_empties = [x for x in to_concat if is_nonempty(x)] | |
if non_empties and axis == 0: | |
to_concat = non_empties |
so the empty tz-aware part is dropped, and the concat further only sees the tz-naive NaT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel thoughts on this? This PR is for sure fixing an AssertionError that otherwise bubbled to the user (and which was a regression), which should not happen, but not fully sure what we should do with the expected result.
|
pandas/core/dtypes/concat.py
Outdated
@@ -148,15 +148,18 @@ def is_nonempty(x) -> bool: | |||
any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat) | |||
|
|||
if any_ea: | |||
# we ignore axis here, as internally concatting with EAs is always | |||
# for axis=0 | |||
# assert axis == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would remove the assert as its commented (or actually uncomment ok either way)
thanks @jorisvandenbossche |
@meeseeksdev backport 1.1.x |
Closes #35460