Skip to content

REGR: append tz-aware DataFrame with tz-naive values #36115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jorisvandenbossche
Copy link
Member

Closes #35460

@jorisvandenbossche jorisvandenbossche added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Regression Functionality that used to work in a prior pandas version labels Sep 4, 2020
@jorisvandenbossche jorisvandenbossche added this to the 1.1.2 milestone Sep 4, 2020
@@ -152,11 +152,11 @@ def is_nonempty(x) -> bool:
target_dtype = find_common_type([x.dtype for x in to_concat])
to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]

if isinstance(to_concat[0], ExtensionArray) and axis == 0:
if isinstance(to_concat[0], ExtensionArray): # and axis == 0:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related to #35038 (comment), testing if this is actually needed (need to clean this up, of course)

if not isinstance(concat_values, ExtensionArray):
if not isinstance(
concat_values, ExtensionArray
) or concat_values.dtype == np.dtype("M8[ns]"):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel what's the best way to check for naive DatetimeArray? (so only numpy datetime64[ns]) Eg is_datetime64_ns_dtype still gives True for datetime64[ns, tz]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isinstance(arr, DatetimeArray) and arr.tz is None

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is that necessarily better than what I did above?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fine by me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't really have a preference, mostly want to be consistent with other code, so if the snippet you showed is used elsewhere, will change it

@jorisvandenbossche
Copy link
Member Author

cc @simonjayhawkins

# https://github.com/pandas-dev/pandas/issues/35460
df = pd.DataFrame(columns=["a"]).astype("datetime64[ns, UTC]")

# pd.NaT gets inferred as tz-naive, so append result is tz-naive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it obvious this is the correct behavior? id expect object dtype result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Yes, it should indeed give object dtype (at least that's how we concat tz-naive/tz-aware otherwise).

Will need to check why it becomes datetime64 here ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the reason is that we drop the empty arrays in concat_compat:

non_empties = [x for x in to_concat if is_nonempty(x)]
if non_empties and axis == 0:
to_concat = non_empties

so the empty tz-aware part is dropped, and the concat further only sees the tz-naive NaT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel thoughts on this? This PR is for sure fixing an AssertionError that otherwise bubbled to the user (and which was a regression), which should not happen, but not fully sure what we should do with the expected result.

@jreback
Copy link
Contributor

jreback commented Sep 5, 2020

#[error]doc/source/whatsnew/v1.1.2.rst:19:- Fix regression in :meth:DataFrame.append mixing tz-aware and tz-naive datetime columns (:issue:35460) <- trailing whitespaces found

@@ -148,15 +148,18 @@ def is_nonempty(x) -> bool:
any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat)

if any_ea:
# we ignore axis here, as internally concatting with EAs is always
# for axis=0
# assert axis == 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would remove the assert as its commented (or actually uncomment ok either way)

@jreback jreback merged commit 337f3ca into pandas-dev:master Sep 6, 2020
@jreback
Copy link
Contributor

jreback commented Sep 6, 2020

thanks @jorisvandenbossche

@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.1.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Regression Functionality that used to work in a prior pandas version Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: AssertionError: Number of Block dimensions (1) must equal number of axes (2) when typing a column
4 participants