BUG: fillna() for tz-aware timestamps causes an exception #15855

chrisaycock · 2017-03-31T19:34:58Z

Suppose I have a DataFrame of timestamps:

df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00'),
                         pd.Timestamp('2012-11-11 00:00:00')]})

I can attempt to fill any nulls:

In [23]: df.fillna(method='pad')
Out[23]:
           A
0 2012-11-11
1 2012-11-11

Now suppose my timestamps are tz-aware:

df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+00:00'),
                         pd.Timestamp('2012-11-11 00:00:00+00:00')]})

This causes an error:

In [25]: df.fillna(method='pad')
...
ValueError: total size of new array must be unchanged

This does not appear to be fixed by #14960.

jreback · 2017-03-31T19:52:47Z

Hmm, does look buggy. These should be coerced to/from i8, filled, then re-localized.
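Here is a minimal sketch of that coerce/fill/re-localize approach using only public pandas API. It is not how pandas internals implement it, and the helper name pad_tz_aware is hypothetical. The values are converted to UTC and viewed as i8 (int64 counts in whatever resolution the index uses). They are then forward-filled in integer space and localized back to the original zone.

```python
import numpy as np
import pandas as pd


def pad_tz_aware(values: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Forward-fill ("pad") a tz-aware DatetimeIndex through its i8 view.

    Hypothetical helper for illustration only; not pandas' internal code path.
    """
    tz = values.tz
    # Convert to UTC and drop the tz so every value is an absolute instant
    naive_utc = values.tz_convert("UTC").tz_localize(None).to_numpy()
    i8 = naive_utc.view("i8")  # NaT is stored as the minimum int64
    missing = np.isnat(naive_utc)

    # For each position, find the index of the most recent non-missing value
    last_valid = np.where(missing, 0, np.arange(len(i8)))
    last_valid = np.maximum.accumulate(last_valid)
    # Leading NaTs map to position 0, which is itself NaT, so they stay missing
    filled = i8[last_valid]

    # Reinterpret as datetime64 in the original unit, then re-localize
    restored = pd.DatetimeIndex(filled.view(naive_utc.dtype))
    return restored.tz_localize("UTC").tz_convert(tz)


idx = pd.DatetimeIndex([pd.Timestamp("2012-11-11 00:00:00+01:00"), pd.NaT])
print(pad_tz_aware(idx))
```

Working on UTC instants means the fill never has to reason about offsets or DST. The zone is only reapplied once the integer values are settled.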

chrisaycock · 2017-04-03T17:48:36Z

Also, this works as expected:

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00'),
                       pd.Timestamp('2012-01-01 13:00:00')]}) == -1

However, this raises an exception:

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00+00:00'),
                       pd.Timestamp('2012-01-01 13:00:00+00:00')]}) == -1

Furthermore, this works as expected:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).head(1)

However, this loses the timezone:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).first()

jreback · 2017-04-03T18:22:08Z

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00+00:00'),
                       pd.Timestamp('2012-01-01 13:00:00+00:00')]}) == -1

is the exact same bug (the issue is that it can't reshape properly); it should have a try/except around that .reshape. Same issue as in #15869.
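The reshape failure can be reproduced with NumPy alone. This sketch assumes, as the error message suggests, that the block hands back a 2-D result with a single row (shape (1, n)). collapse_by_len is a hypothetical stand-in for the reshape(len(result)) line in DatetimeTZBlock, not pandas code.

```python
import numpy as np


def collapse_by_len(result: np.ndarray) -> np.ndarray:
    """Stand-in for the pre-patch branch: collapse a >1-dim result using len()."""
    if result.ndim > 1:
        # len() counts only the first axis, so a (1, n) result asks for 1 element
        result = result.reshape(len(result))
    return result


# One row holding two timestamps as i8: 2012-11-11 00:00:00 UTC in nanoseconds
block = np.array([[1352592000000000000, 1352592000000000000]], dtype="i8")
print(block.shape, len(block), block.size)  # (1, 2) 1 2

try:
    collapse_by_len(block)
except ValueError as exc:
    # Message wording varies by NumPy version
    print("ValueError:", exc)

# A single-element block survives, since len() and size agree there
print(collapse_by_len(block[:, :1]).shape)  # (1,)
```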

jreback · 2017-04-03T18:24:37Z

can you make a new issue for:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).first()

I thought we had an issue for this. But can't seem to find it.

chrisaycock · 2017-04-03T22:21:32Z

I have split-out that issue as #15884.

chrisaycock · 2017-04-06T17:22:57Z

This is something that fixes the fillna() issue:

diff --git a/pandas/core/internals.py b/pandas/core/internals.py
index 8db801f..5736188 100644
--- a/pandas/core/internals.py
+++ b/pandas/core/internals.py
@@ -2475,7 +2475,7 @@ class DatetimeTZBlock(NonConsolidatableMixIn, DatetimeBlock):
         if isinstance(result, np.ndarray):
             # allow passing of > 1dim if its trivial
             if result.ndim > 1:
-                result = result.reshape(len(result))
+                result = result.reshape(np.prod(result.shape))
             result = self.values._shallow_copy(result)
 
         return result

I.e., use the total size of the result instead of just the length of one dimension. Here's an example of it at work:

In [12]: df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+01:00'),
    ...:                          pd.NaT]})
    ...: df.fillna(method='pad')
    ...: 
Out[12]: 
                          A
0 2012-11-11 00:00:00+01:00
1 2012-11-11 00:00:00+01:00
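Isolated from pandas, the patched line amounts to reshaping by the total element count. Here is a NumPy sketch. collapse_by_total_size is a hypothetical stand-in for the patched branch, and the i8 values are 2012-11-11 00:00:00 UTC in nanoseconds.

```python
import numpy as np


def collapse_by_total_size(result: np.ndarray) -> np.ndarray:
    """Stand-in for the patched branch: collapse a >1-dim result using every element."""
    if result.ndim > 1:
        # np.prod(shape) is the total element count, i.e. result.size
        result = result.reshape(np.prod(result.shape))
    return result


block = np.array([[1352592000000000000, 1352592000000000000]], dtype="i8")
flat = collapse_by_total_size(block)

# Row-major reshape keeps element order, so this matches ravel()
assert np.array_equal(flat, block.ravel())
# Viewing the i8 values as datetime64[ns] recovers the UTC instants
print(flat.view("M8[ns]"))
```

Because np.prod(result.shape) equals result.size, result.ravel() or result.reshape(-1) would express the same intent. The one-row empty case (1, 0) also collapses cleanly. By contrast, len() would ask for one element there and fail.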

Does this make sense as a fix, or is there more to it than that?

jreback · 2017-04-06T17:26:01Z

In other cases like this, we have a try/except around reshape (in internals). But this is ok too. Does it pass everything?

chrisaycock · 2017-04-06T17:31:00Z

It passes

$ find . -name test_missing.py | xargs nosetests

I can package this as a proper PR and see what happens with the CI.

…#15924)
* BUG: use entire size of DatetimeTZBlock when coercing result (#15855)
* Moved test
* Removed unnecessary 'self'
* Removed unnecessary 'self', again

chrisaycock · 2017-04-10T14:04:06Z

I've split-out the other issue I reported as its own item: #15966

jreback added the Bug, Difficulty Intermediate, Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), and Timezones (Timezone data dtype) labels Mar 31, 2017

jreback added this to the Next Minor Release milestone Mar 31, 2017

chrisaycock mentioned this issue Apr 6, 2017

BUG: use entire size of DatetimeTZBlock when coercing result (#15855) #15924

Merged


jreback modified the milestones: 0.20.0, Next Minor Release Apr 7, 2017

jreback closed this as completed in #15924 Apr 7, 2017
