BUG: fillna() for tz-aware timestamps causes an exception #15855

Closed
chrisaycock opened this issue Mar 31, 2017 · 9 comments · Fixed by #15924
Labels
Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timezones Timezone data dtype
Milestone

Comments

@chrisaycock
Contributor

Suppose I have a DataFrame of timestamps:

df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00'),
                         pd.Timestamp('2012-11-11 00:00:00')]})

I can attempt to fill any nulls:

In [23]: df.fillna(method='pad')
Out[23]:
           A
0 2012-11-11
1 2012-11-11

Now suppose my timestamps are tz-aware:

df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+00:00'),
                         pd.Timestamp('2012-11-11 00:00:00+00:00')]})

This causes an error:

In [25]: df.fillna(method='pad')
...
ValueError: total size of new array must be unchanged

This does not appear to be fixed by #14960.
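
A minimal regression check for this report might look like the sketch below. It assumes a pandas release that includes the fix, and it uses `ffill()`, which performs the same forward fill as `fillna(method='pad')`. A `pd.NaT` is added so that there is actually something to fill.

```python
import pandas as pd

# One real tz-aware timestamp followed by a missing value.
df = pd.DataFrame({"A": [pd.Timestamp("2012-11-11 00:00:00+00:00"), pd.NaT]})

# Forward fill; on an affected version the equivalent fillna(method='pad')
# call raised the ValueError shown above.
filled = df.ffill()

print(filled)
print(filled["A"].dtype)  # should still be a tz-aware datetime dtype
```

The thing to look for is that the second row is filled and that the column keeps its timezone instead of falling back to a naive or object dtype.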

@jreback jreback added Bug Difficulty Intermediate Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Timezones Timezone data dtype labels Mar 31, 2017
@jreback jreback added this to the Next Minor Release milestone Mar 31, 2017
@jreback
Contributor

jreback commented Mar 31, 2017

hmm, does look buggy. These should be coerced to/from i8, filled, then re-localized.
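
The "coerce to i8, fill, re-localize" idea can be sketched in user-level code. This is only an illustration of the round trip, not the pandas internals: `ffill_via_i8` and `I8_NAT` are hypothetical names, and the loop is a deliberately simple forward fill.

```python
import numpy as np
import pandas as pd

# Integer sentinel that datetime64 uses to represent NaT.
I8_NAT = np.iinfo(np.int64).min


def ffill_via_i8(ser: pd.Series) -> pd.Series:
    """Forward-fill a tz-aware Series via int64 (hypothetical helper)."""
    tz = ser.dt.tz
    # Coerce: UTC-normalized nanoseconds since the epoch, as plain integers.
    i8 = ser.to_numpy(dtype="datetime64[ns]").view("i8").copy()
    # Fill: copy the previous value over each NaT sentinel.
    for k in range(1, len(i8)):
        if i8[k] == I8_NAT:
            i8[k] = i8[k - 1]
    # Re-localize: reinterpret as UTC datetimes, then restore the original tz.
    idx = pd.DatetimeIndex(i8.view("M8[ns]")).tz_localize("UTC").tz_convert(tz)
    return pd.Series(idx, index=ser.index, name=ser.name)


ser = pd.Series([pd.Timestamp("2012-11-11 00:00:00+01:00"), pd.NaT])
out = ffill_via_i8(ser)
print(out)
```

Because the fill happens on UTC integers, the timezone never takes part in the fill itself; it is only reattached at the end.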

@chrisaycock
Contributor Author

Also, this works as expected:

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00'),
                       pd.Timestamp('2012-01-01 13:00:00')]}) == -1

However, this raises an exception:

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00+00:00'),
                       pd.Timestamp('2012-01-01 13:00:00+00:00')]}) == -1

Furthermore, this works as expected:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).head(1)

However, this loses the timezone:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).first()
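
A small sketch for comparing the two groupby paths side by side, assuming a pandas release in which the timezone loss has been fixed. It only inspects the dtype of the `time` column that each call returns.

```python
import pandas as pd

df = pd.DataFrame({"time": [pd.Timestamp("2012-01-01 13:00:00+00:00")],
                   "A": [3]})
grouped = df.groupby("A", as_index=False)

# head() returns the original rows; first() goes through an aggregation path.
head_out = grouped.head(1)
first_out = grouped.first()

print("head :", head_out["time"].dtype)
print("first:", first_out["time"].dtype)
```

On an affected version the second dtype would be a tz-naive `datetime64[ns]`, while the first keeps the timezone.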

@jreback
Contributor

jreback commented Apr 3, 2017

pd.DataFrame({'test': [pd.Timestamp('2012-01-01 13:00:00+00:00'),
                       pd.Timestamp('2012-01-01 13:00:00+00:00')]}) == -1

is exactly the same bug: the issue is that it can't reshape the result properly. It should have a try/except around that .reshape. Same issue as in #15869.
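
A quick way to check the comparison case, assuming a pandas release that includes the fix: build the naive and tz-aware frames from the earlier comment and confirm that `== -1` yields the same all-False mask for both.

```python
import pandas as pd

naive = pd.DataFrame({"test": [pd.Timestamp("2012-01-01 13:00:00")] * 2})
aware = pd.DataFrame({"test": [pd.Timestamp("2012-01-01 13:00:00+00:00")] * 2})

# A timestamp never equals the integer -1, so both masks should be all False.
naive_eq = naive == -1
aware_eq = aware == -1

print(aware_eq)
print(naive_eq.equals(aware_eq))
```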

@jreback
Contributor

jreback commented Apr 3, 2017

can you make a new issue for:

pd.DataFrame({'time': [pd.Timestamp('2012-01-01 13:00:00+00:00')],
              'A': [3]}).groupby('A', as_index=False).first()

I thought we had an issue for this, but I can't seem to find it.

@chrisaycock
Contributor Author

I have split out that issue as #15884.

@chrisaycock
Contributor Author

This is something that fixes the fillna() issue:

diff --git a/pandas/core/internals.py b/pandas/core/internals.py
index 8db801f..5736188 100644
--- a/pandas/core/internals.py
+++ b/pandas/core/internals.py
@@ -2475,7 +2475,7 @@ class DatetimeTZBlock(NonConsolidatableMixIn, DatetimeBlock):
         if isinstance(result, np.ndarray):
             # allow passing of > 1dim if its trivial
             if result.ndim > 1:
-                result = result.reshape(len(result))
+                result = result.reshape(np.prod(result.shape))
             result = self.values._shallow_copy(result)
 
         return result

I.e., use the total size of the result instead of just the length of one dimension. Here's an example of it at work:

In [12]: df = pd.DataFrame({'A': [pd.Timestamp('2012-11-11 00:00:00+01:00'),
    ...:                          pd.NaT]})
    ...: df.fillna(method='pad')
    ...: 
Out[12]: 
                          A
0 2012-11-11 00:00:00+01:00
1 2012-11-11 00:00:00+01:00

Does this make sense as a fix, or is there more to it than that?
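
The difference between the two reshape calls can be reproduced with plain NumPy. The `(1, 2)` shape below is an assumption for illustration (one block row holding two values); the point is only that `len()` reports the first dimension, not the total number of elements.

```python
import numpy as np

# Assumed shape: one block row holding two values.
result = np.zeros((1, 2), dtype="i8")

try:
    # len(result) == 1, but the array has 2 elements, so this cannot work.
    result.reshape(len(result))
    old_reshape_failed = False
except ValueError:
    old_reshape_failed = True

# The total size (1 * 2 == 2) always matches the number of elements.
flat = result.reshape(np.prod(result.shape))

print(old_reshape_failed, flat.shape)
```

`result.reshape(-1)` or `result.ravel()` would flatten the array the same way without computing the size explicitly.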

@jreback
Contributor

jreback commented Apr 6, 2017

in other cases like this, we have a try/except around reshape (in internals). but this is ok too. does it pass everything?

@chrisaycock
Contributor Author

It passes

$ find . -name test_missing.py | xargs nosetests

I can package this as a proper PR and see what happens with the CI.

@jreback jreback modified the milestones: 0.20.0, Next Minor Release Apr 7, 2017
jreback pushed a commit that referenced this issue Apr 7, 2017
…#15924)

* BUG: use entire size of DatetimeTZBlock when coercing result (#15855)

* Moved test

* Removed unnecessary 'self'

* Removed unnecessary 'self', again
@chrisaycock
Contributor Author

I've split out the other issue I reported as its own item: #15966
