Skip to content

BUG: DataFrame.append with empty DataFrame and Series with tz-aware datetime value allocated object column #35038

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 16, 2020
4 changes: 2 additions & 2 deletions pandas/core/dtypes/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -152,11 +152,11 @@ def is_nonempty(x) -> bool:
target_dtype = find_common_type([x.dtype for x in to_concat])
to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]

if isinstance(to_concat[0], ExtensionArray):
if isinstance(to_concat[0], ExtensionArray) and axis == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins do you remember why is was needed to add this axis=0 ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cls = type(to_concat[0])
return cls._concat_same_type(to_concat)
else:
return np.concatenate(to_concat)
return np.concatenate(to_concat, axis=axis)

elif _contains_datetime or "timedelta" in typs:
return concat_datetime(to_concat, axis=axis, typs=typs)
Expand Down
2 changes: 1 addition & 1 deletion pandas/core/internals/concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -323,7 +323,7 @@ def _concatenate_join_units(join_units, concat_axis, copy):
# concatting with at least one EA means we are concatting a single column
# the non-EA values are 2D arrays with shape (1, n)
to_concat = [t if isinstance(t, ExtensionArray) else t[0, :] for t in to_concat]
concat_values = concat_compat(to_concat, axis=concat_axis)
concat_values = concat_compat(to_concat, axis=0)
if not isinstance(concat_values, ExtensionArray):
# if the result of concat is not an EA but an ndarray, reshape to
# 2D to put it a non-EA Block
Expand Down
7 changes: 1 addition & 6 deletions pandas/tests/reshape/test_concat.py
Original file line number Diff line number Diff line change
Expand Up @@ -1088,19 +1088,14 @@ def test_append_empty_frame_to_series_with_dateutil_tz(self):
s = Series({"date": date, "a": 1.0, "b": 2.0})
df = DataFrame(columns=["c", "d"])
result = df.append(s, ignore_index=True)
# n.b. it's not clear to me that expected is correct here.
# It's possible that the `date` column should have
# datetime64[ns, tz] dtype for both result and expected.
# that would be more consistent with new columns having
# their own dtype (float for a and b, datetime64ns, tz for date).
expected = DataFrame(
[[np.nan, np.nan, 1.0, 2.0, date]],
columns=["c", "d", "a", "b", "date"],
dtype=object,
)
# These columns get cast to object after append
expected["a"] = expected["a"].astype(float)
expected["b"] = expected["b"].astype(float)
expected["date"] = pd.to_datetime(expected["date"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just fix expected to do this directly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahh, the comment # These columns get cast to object after append now makes more sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add tests for empty EA types (period, interval, categorical), we might already have them but this seems like a good place to put them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as additional values in the Series? a scalar Period and scalar Interval. not sure what to do for categorical.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't work for interval or period. will raise an issue. This was fixed for datetime64[ns, tzutc()] 'by accident', when ensuring consistency between np.concatenate and concat_compat, see #35032 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a different issue for period and interval since concatenating list of Series doesn't work either, whereas for tz-aware datetime works on master. see OP

on master

>>> interval = pd.Interval(0, 1, closed="right")
>>> interval
Interval(0, 1, closed='right')
>>>
>>> period = pd.Period("2012", freq="A-DEC")
>>> period
Period('2012', 'A-DEC')
>>>
>>> s = pd.Series({"period": period, "interval": interval})
>>> s
period        2012
interval    (0, 1]
dtype: object
>>>
>>> df = pd.DataFrame(columns=["c"])
>>> df
Empty DataFrame
Columns: [c]
Index: []
>>>
>>> result = df.append([s, s], ignore_index=True)
>>> result
     c period interval
0  NaN   2012   (0, 1]
1  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           object
period      object
interval    object
dtype: object
>>>
>>> df = pd.DataFrame([s, s])
>>> df
  period interval
0   2012   (0, 1]
1   2012   (0, 1]
>>>
>>> df.dtypes
period        period[A-DEC]
interval    interval[int64]
dtype: object
>>>

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for period and interval, the issue is not limited to an empty dataframe

master

>>> df = pd.DataFrame([1, 2], columns=["c"])
>>> df
   c
0  1
1  2
>>>
>>> result = df.append([s], ignore_index=True)
>>> result
     c period interval
0  1.0    NaN      NaN
1  2.0    NaN      NaN
2  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           float64
period       object
interval     object
dtype: object
>>>
>>> result = df.append([s, s], ignore_index=True)
>>> result
     c period interval
0  1.0    NaN      NaN
1  2.0    NaN      NaN
2  NaN   2012   (0, 1]
3  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           float64
period       object
interval     object
dtype: object
>>>

will raise an issue.

don't think necessary. I think covered by #22994 and #22957 and others linked in those issues.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kk, yeah sometimes we have scattered tests and hard to tell whether we have exhaustive cases.

tm.assert_frame_equal(result, expected)


Expand Down