BUG: DataFrame.append with empty DataFrame and Series with tz-aware datetime value allocates object column #35038


Merged: 9 commits merged into pandas-dev:master on Jul 16, 2020

Conversation

simonjayhawkins (Member):

broken off #35032

This PR fixes a situation where appending a Series consecutively to an empty DataFrame produces a different result from appending the Series in one step.

on master

>>> import pandas as pd
>>>
>>> pd.__version__
'1.1.0.dev0+1974.g0159cba6e'
>>>
>>> import dateutil
>>>
>>> date = pd.Timestamp("2018-10-24 07:30:00", tz=dateutil.tz.tzutc())
>>> date
Timestamp('2018-10-24 07:30:00+0000', tz='tzutc()')
>>>
>>> s = pd.Series({"date": date, "a": 1.0, "b": 2.0})
>>> s
date    2018-10-24 07:30:00+00:00
a                               1
b                               2
dtype: object
>>>
>>> df = pd.DataFrame(columns=["c", "d"])
>>> df
Empty DataFrame
Columns: [c, d]
Index: []
>>>
>>> result_a = df.append(s, ignore_index=True)
>>> result_a
     c    d    a    b                       date
0  NaN  NaN  1.0  2.0  2018-10-24 07:30:00+00:00
>>>
>>> result_a.dtypes
c        object
d        object
a       float64
b       float64
date     object
dtype: object
>>>
>>> result_b = result_a.append(s, ignore_index=True)
>>> result_b
     c    d    a    b                       date
0  NaN  NaN  1.0  2.0  2018-10-24 07:30:00+00:00
1  NaN  NaN  1.0  2.0  2018-10-24 07:30:00+00:00
>>>
>>> result_b.dtypes
c        object
d        object
a       float64
b       float64
date     object
dtype: object
>>>
>>> result = df.append([s, s], ignore_index=True)
>>> result
     c    d                      date    a    b
0  NaN  NaN 2018-10-24 07:30:00+00:00  1.0  2.0
1  NaN  NaN 2018-10-24 07:30:00+00:00  1.0  2.0
>>>
>>> result.dtypes
c                        object
d                        object
date    datetime64[ns, tzutc()]
a                       float64
b                       float64
dtype: object
>>>

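A side note for readers on current pandas: `DataFrame.append` was deprecated in 1.4 and removed in 2.0, so the one-step form of the example above is now spelled with `pd.concat`. A minimal sketch (not this PR's code):

```python
import pandas as pd

# Rebuild the objects from the example above.
date = pd.Timestamp("2018-10-24 07:30:00", tz="UTC")
s = pd.Series({"date": date, "a": 1.0, "b": 2.0})
df = pd.DataFrame(columns=["c", "d"])

# pd.DataFrame([s, s]) infers a tz-aware dtype for "date" column-by-column,
# matching the df.append([s, s]) result shown above.
result = pd.concat([df, pd.DataFrame([s, s])], ignore_index=True)
print(result.dtypes)
```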
@simonjayhawkins added the Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode), and Timezones (Timezone data dtype) labels on Jun 28, 2020
@simonjayhawkins (Member, Author):

Linux py37_np_dev test failures are unrelated; xref #35041

expected["a"] = expected["a"].astype(float)
expected["b"] = expected["b"].astype(float)
expected["date"] = pd.to_datetime(expected["date"])
Contributor:

can you just fix `expected` to do this directly?
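One hedged sketch of fixing `expected` directly (values taken from the snippet above; not necessarily the PR's final test code) is to construct each column with its target dtype up front:

```python
import numpy as np
import pandas as pd

date = pd.Timestamp("2018-10-24 07:30:00", tz="UTC")

# Build `expected` with the intended dtypes directly, instead of casting
# object columns afterwards with astype/to_datetime.
expected = pd.DataFrame(
    {
        "c": pd.Series([np.nan], dtype=object),
        "d": pd.Series([np.nan], dtype=object),
        "a": [1.0],
        "b": [2.0],
        "date": pd.Series([date]),  # inferred as datetime64[ns, UTC]
    }
)
print(expected.dtypes)
```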

Member Author:

ahh, the comment `# These columns get cast to object after append` now makes more sense.

expected["a"] = expected["a"].astype(float)
expected["b"] = expected["b"].astype(float)
expected["date"] = pd.to_datetime(expected["date"])
Contributor:

can you add tests for empty EA types (period, interval, categorical)? we might already have them, but this seems like a good place to put them.

Member Author:

as additional values in the Series? a scalar Period and a scalar Interval. not sure what to do for categorical.

Member Author:

doesn't work for interval or period; will open an issue. This was fixed for `datetime64[ns, tzutc()]` 'by accident' when ensuring consistency between np.concatenate and concat_compat; see #35032 (comment)

Member Author:

it's a different issue for period and interval, since concatenating a list of Series doesn't work either, whereas tz-aware datetime works on master. see OP.

on master

>>> interval = pd.Interval(0, 1, closed="right")
>>> interval
Interval(0, 1, closed='right')
>>>
>>> period = pd.Period("2012", freq="A-DEC")
>>> period
Period('2012', 'A-DEC')
>>>
>>> s = pd.Series({"period": period, "interval": interval})
>>> s
period        2012
interval    (0, 1]
dtype: object
>>>
>>> df = pd.DataFrame(columns=["c"])
>>> df
Empty DataFrame
Columns: [c]
Index: []
>>>
>>> result = df.append([s, s], ignore_index=True)
>>> result
     c period interval
0  NaN   2012   (0, 1]
1  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           object
period      object
interval    object
dtype: object
>>>
>>> df = pd.DataFrame([s, s])
>>> df
  period interval
0   2012   (0, 1]
1   2012   (0, 1]
>>>
>>> df.dtypes
period        period[A-DEC]
interval    interval[int64]
dtype: object
>>>

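The inference difference can be reproduced at the Series level; a small sketch (using freq="Y" rather than the older "A-DEC" alias):

```python
import pandas as pd

period = pd.Period("2012", freq="Y")
interval = pd.Interval(0, 1, closed="right")

# Homogeneous scalars infer the extension dtype at construction time...
assert isinstance(pd.Series([period, period]).dtype, pd.PeriodDtype)
assert isinstance(pd.Series([interval, interval]).dtype, pd.IntervalDtype)

# ...but a mixed-type Series stays object, which is why appending the
# mixed Series `s` leaves the period/interval columns as object dtype.
s = pd.Series({"period": period, "interval": interval})
assert s.dtype == object
```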
Member Author:

for period and interval, the issue is not limited to an empty dataframe

master

>>> df = pd.DataFrame([1, 2], columns=["c"])
>>> df
   c
0  1
1  2
>>>
>>> result = df.append([s], ignore_index=True)
>>> result
     c period interval
0  1.0    NaN      NaN
1  2.0    NaN      NaN
2  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           float64
period       object
interval     object
dtype: object
>>>
>>> result = df.append([s, s], ignore_index=True)
>>> result
     c period interval
0  1.0    NaN      NaN
1  2.0    NaN      NaN
2  NaN   2012   (0, 1]
3  NaN   2012   (0, 1]
>>>
>>> result.dtypes
c           float64
period       object
interval     object
dtype: object
>>>

will open an issue.

don't think that's necessary; I think it's covered by #22994 and #22957 and others linked in those issues.

Contributor:

kk, yeah, sometimes we have scattered tests and it's hard to tell whether we have exhaustive cases.

[[np.nan, np.nan, 1.0, 2.0, date]],
columns=["c", "d", "a", "b", "date"],
dtype=object,
[[np.nan, np.nan, 1.0, 2.0, date]], columns=["c", "d", "a", "b", "date"]
Contributor:

does this fully replicate the OP test?

Member Author:

extended test
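For context, a hedged sketch of what the two variants in the test fragment above construct; dropping `dtype=object` lets the constructor infer the tz-aware dtype for "date":

```python
import numpy as np
import pandas as pd

date = pd.Timestamp("2018-10-24 07:30:00", tz="UTC")

# With dtype=object, every column (including "date") stays object and
# needs the astype/to_datetime fix-ups shown earlier.
as_object = pd.DataFrame(
    [[np.nan, np.nan, 1.0, 2.0, date]],
    columns=["c", "d", "a", "b", "date"],
    dtype=object,
)

# Without it, the constructor infers float64 for "a"/"b" and a tz-aware
# datetime dtype for "date".
inferred = pd.DataFrame(
    [[np.nan, np.nan, 1.0, 2.0, date]], columns=["c", "d", "a", "b", "date"]
)
print(as_object.dtypes)
print(inferred.dtypes)
```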

@jreback (Contributor) left a comment:

does this need a whatsnew note? was this a change from 1.0.x?

@simonjayhawkins (Member, Author):

I didn't add one since no issue had been raised. Shall I add a release note and reference this PR?

@jreback (Contributor) commented Jun 30, 2020:

> I didn't add one since no issue had been raised. Shall I add a release note and reference this PR?

yep that's good (and generally policy)

@simonjayhawkins (Member, Author):

release note added

@simonjayhawkins (Member, Author):

@jreback anything more to do here?

as an aside, and not wanting to weigh in on the 2d EA debate: in order to have a Categorical scalar, would we also need a 0d EA? I wondered this before when working on #33846, where IIRC I thought at the time that the special casing would not have been necessary with a 0d EA. Also, while working on #35032, I think it is starting to make sense to not restrict EAs to 1d.

@jreback jreback added this to the 1.1 milestone Jul 16, 2020
@jreback jreback merged commit 6509028 into pandas-dev:master Jul 16, 2020
@jreback (Contributor) commented Jul 16, 2020:

thanks @simonjayhawkins

@jreback (Contributor) commented Jul 16, 2020:

> @jreback anything more to do here?
>
> as an aside, and not wanting to weigh in on the 2d EA debate: in order to have a Categorical scalar, would we also need a 0d EA? I wondered this before when working on #33846, where IIRC I thought at the time that the special casing would not have been necessary with a 0d EA. Also, while working on #35032, I think it is starting to make sense to not restrict EAs to 1d.

can of worms. well, I don't actually think you need a 0-d categorical scalar; a plain scalar will do. of course, then you cannot infer this to a categorical without context, but that's ok.

@@ -152,11 +152,11 @@ def is_nonempty(x) -> bool:
target_dtype = find_common_type([x.dtype for x in to_concat])
to_concat = [_cast_to_common_type(arr, target_dtype) for arr in to_concat]

if isinstance(to_concat[0], ExtensionArray):
if isinstance(to_concat[0], ExtensionArray) and axis == 0:
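One possible reading, as a sketch (my illustration, not the PR's code): ExtensionArrays are one-dimensional, so the EA shortcut can only be valid when concatenating along axis 0. `_concat_same_type` below is part of the documented EA subclass interface:

```python
import pandas as pd

# ExtensionArrays have ndim == 1, so there is no axis=1 to concatenate
# along at the array level; column-wise concat has to go through 2-D blocks.
arr = pd.array(["2018-10-24 07:30:00"], dtype="datetime64[ns, UTC]")
assert arr.ndim == 1

# Axis-0 concatenation can stay within the extension type:
combined = type(arr)._concat_same_type([arr, arr])
assert len(combined) == 2
assert combined.dtype == arr.dtype
```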
Member:

@simonjayhawkins do you remember why it was needed to add this `axis == 0`?

