Skip to content

REGR: append tz-aware DataFrame with tz-naive values #36115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.1.2.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Fixed regressions
~~~~~~~~~~~~~~~~~
- Regression in :meth:`DatetimeIndex.intersection` incorrectly raising ``AssertionError`` when intersecting against a list (:issue:`35876`)
- Fix regression in updating a column inplace (e.g. using ``df['col'].fillna(.., inplace=True)``) (:issue:`35731`)
- Fix regression in :meth:`DataFrame.append` mixing tz-aware and tz-naive datetime columns (:issue:`35460`)
- Performance regression for :meth:`RangeIndex.format` (:issue:`35712`)
- Regression in :meth:`DataFrame.replace` where a ``TypeError`` would be raised when attempting to replace elements of type :class:`Interval` (:issue:`35931`)
- Fix regression in pickle roundtrip of the ``closed`` attribute of :class:`IntervalIndex` (:issue:`35658`)
Expand Down
6 changes: 4 additions & 2 deletions pandas/core/dtypes/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -148,15 +148,17 @@ def is_nonempty(x) -> bool:
any_ea = any(is_extension_array_dtype(x.dtype) for x in to_concat)

if any_ea:
# we ignore axis here, as internally concatting with EAs is always
# for axis=0
if not single_dtype:
target_dtype = find_common_type([x.dtype for x in to_concat])
to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]

if isinstance(to_concat[0], ExtensionArray) and axis == 0:
if isinstance(to_concat[0], ExtensionArray):
cls = type(to_concat[0])
return cls._concat_same_type(to_concat)
else:
return np.concatenate(to_concat, axis=axis)
return np.concatenate(to_concat)

elif _contains_datetime or "timedelta" in typs:
return concat_datetime(to_concat, axis=axis, typs=typs)
Expand Down
8 changes: 6 additions & 2 deletions pandas/core/internals/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
from pandas.core.dtypes.missing import isna

import pandas.core.algorithms as algos
from pandas.core.arrays import ExtensionArray
from pandas.core.arrays import DatetimeArray, ExtensionArray
from pandas.core.internals.blocks import make_block
from pandas.core.internals.managers import BlockManager

Expand Down Expand Up @@ -335,9 +335,13 @@ def _concatenate_join_units(join_units, concat_axis, copy):
# the non-EA values are 2D arrays with shape (1, n)
to_concat = [t if isinstance(t, ExtensionArray) else t[0, :] for t in to_concat]
concat_values = concat_compat(to_concat, axis=0)
if not isinstance(concat_values, ExtensionArray):
if not isinstance(concat_values, ExtensionArray) or (
isinstance(concat_values, DatetimeArray) and concat_values.tz is None
):
# if the result of concat is not an EA but an ndarray, reshape to
# 2D to put it a non-EA Block
# special case DatetimeArray, which *is* an EA, but is put in a
# consolidated 2D block
concat_values = np.atleast_2d(concat_values)
else:
concat_values = concat_compat(to_concat, axis=concat_axis)
Expand Down
17 changes: 17 additions & 0 deletions pandas/tests/reshape/test_concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -1110,6 +1110,23 @@ def test_append_empty_frame_to_series_with_dateutil_tz(self):
result = df.append([s, s], ignore_index=True)
tm.assert_frame_equal(result, expected)

def test_append_empty_tz_frame_with_datetime64ns(self):
# https://github.com/pandas-dev/pandas/issues/35460
df = pd.DataFrame(columns=["a"]).astype("datetime64[ns, UTC]")

# pd.NaT gets inferred as tz-naive, so append result is tz-naive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it obvious this is the correct behavior? id expect object dtype result

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good point. Yes, it should indeed give object dtype (at least that's how we concat tz-naive/tz-aware otherwise).

Will need to check why it becomes datetime64 here ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, so the reason is that we drop the empty arrays in concat_compat:

non_empties = [x for x in to_concat if is_nonempty(x)]
if non_empties and axis == 0:
to_concat = non_empties

so the empty tz-aware part is dropped, and the concat further only sees the tz-naive NaT.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel thoughts on this? This PR is for sure fixing an AssertionError that otherwise bubbled to the user (and which was a regression), which should not happen, but not fully sure what we should do with the expected result.

result = df.append({"a": pd.NaT}, ignore_index=True)
expected = pd.DataFrame({"a": [pd.NaT]}).astype("datetime64[ns]")
tm.assert_frame_equal(result, expected)

# also test with typed value to append
df = pd.DataFrame(columns=["a"]).astype("datetime64[ns, UTC]")
result = df.append(
pd.Series({"a": pd.NaT}, dtype="datetime64[ns]"), ignore_index=True
)
expected = pd.DataFrame({"a": [pd.NaT]}).astype("datetime64[ns]")
tm.assert_frame_equal(result, expected)


class TestConcatenate:
def test_concat_copy(self):
Expand Down