BUG: timezone lost in groupby-agg with cython functions #15426
Comments
Could you try it out on a more recent version of pandas, or add a copy-pastable example so that someone else can check? It might have been fixed already.
further use
Actually, here's a repro:
In [63]: ts = pd.Series(pd.date_range('2016', periods=12, freq='H').tz_localize("UTC").tz_convert("US/Eastern"))
In [64]: ts
Out[64]:
0 2015-12-31 19:00:00-05:00
1 2015-12-31 20:00:00-05:00
2 2015-12-31 21:00:00-05:00
3 2015-12-31 22:00:00-05:00
4 2015-12-31 23:00:00-05:00
...
7 2016-01-01 02:00:00-05:00
8 2016-01-01 03:00:00-05:00
9 2016-01-01 04:00:00-05:00
10 2016-01-01 05:00:00-05:00
11 2016-01-01 06:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [65]: ts.groupby(level=0).agg(np.min)
Out[65]:
0 2016-01-01 00:00:00-05:00
1 2016-01-01 01:00:00-05:00
2 2016-01-01 02:00:00-05:00
3 2016-01-01 03:00:00-05:00
4 2016-01-01 04:00:00-05:00
...
7 2016-01-01 07:00:00-05:00
8 2016-01-01 08:00:00-05:00
9 2016-01-01 09:00:00-05:00
10 2016-01-01 10:00:00-05:00
11 2016-01-01 11:00:00-05:00
dtype: datetime64[ns, US/Eastern]
In [66]: ts.groupby(level=0).min()
Out[66]:
0 2016-01-01 00:00:00-05:00
1 2016-01-01 01:00:00-05:00
2 2016-01-01 02:00:00-05:00
3 2016-01-01 03:00:00-05:00
4 2016-01-01 04:00:00-05:00
...
7 2016-01-01 07:00:00-05:00
8 2016-01-01 08:00:00-05:00
9 2016-01-01 09:00:00-05:00
10 2016-01-01 10:00:00-05:00
11 2016-01-01 11:00:00-05:00
dtype: datetime64[ns, US/Eastern]
my thought too, but
I think the expected output there is identical to the input (since the index is already unique).
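The expectation stated above can be written as a small check. This is only a sketch, assuming a pandas version where the fix has landed; it uses a lowercase `"h"` frequency alias and the `America/New_York` zone name (the same zone as `US/Eastern`) to stay compatible with recent releases, neither of which comes from the thread.

```python
import pandas as pd

# Same shape of data as the repro: hourly UTC stamps shown in Eastern time.
ts = pd.Series(
    pd.date_range("2016-01-01", periods=12, freq="h", tz="UTC")
    .tz_convert("America/New_York")
)

# Every index label is unique, so each group holds exactly one row and
# the per-group minimum should simply reproduce the input values.
result = ts.groupby(level=0).min()

unchanged = bool((result == ts).all())
print(unchanged)
```

On an affected version the comparison would be expected to come out false, because the result carries UTC wall-clock values under an Eastern label.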
dupe of this: #10668 though I like this example.
actually, let's leave this one open instead.
@munierSalem if you'd like to debug, that would be great! The groupby tz support is a bit buggy. Basically, since these are converted to i8 under the hood to actually do the operations, we need to:
roughly here:
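As a rough illustration of the i8 round trip described above (not the actual pandas internals or the eventual patch), the sketch below aggregates on plain integers and then restores the timezone two ways. The group labels, variable names, and the `America/New_York` zone name (same zone as `US/Eastern`) are assumptions made for the example. The "wrong" path is one plausible reading of the bug, chosen because it matches the shifted output in the repro.

```python
import numpy as np
import pandas as pd

tz = "America/New_York"  # assumption: canonical name for the thread's US/Eastern
ts = pd.Series(
    pd.date_range("2016-01-01", periods=4, freq="h", tz="UTC").tz_convert(tz)
)
keys = np.array([0, 0, 1, 1])  # illustrative group labels

# Step 1: express the values as naive UTC, then view them as int64 ("i8").
naive_utc = ts.dt.tz_convert("UTC").dt.tz_localize(None)
raw_dtype = naive_utc.to_numpy().dtype
i8 = pd.Series(naive_utc.to_numpy().view("i8"))

# Step 2: the aggregation itself only ever sees integers.
i8_min = i8.groupby(keys).min()

# Step 3: turn the integers back into naive datetimes (still UTC wall time).
naive_min = pd.Series(i8_min.to_numpy().view(raw_dtype), index=i8_min.index)

# Wrong: stamp the original zone straight onto UTC wall-clock values.
wrong = naive_min.dt.tz_localize(tz)

# Right: declare the values as UTC first, then convert to the original zone.
right = naive_min.dt.tz_localize("UTC").dt.tz_convert(tz)

print(wrong.iloc[0] - right.iloc[0])
```

The difference between the two results is the zone's UTC offset, which is the same five-hour shift visible between the input and the groupby output in the repro.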
@jreback I can fix in my local repo, but I'll need to wait to do so from home to push back ... working behind a draconian corporate firewall :(
sure np
closes pandas-dev#15426
Author: Stephen Rauch <[email protected]>
Closes pandas-dev#15433 from stephenrauch/tz-lost-in-groupby-agg and squashes the following commits:
64a84ca [Stephen Rauch] BUG: GH15426 timezone lost in groupby-agg with cython functions
xref #10668 (for more examples)
Hello!
I'm running into some odd behavior trying to group rows of a pandas dataframe by ID and then selecting out max/min datetimes (w/ timezones). This is with python 2.7, pandas 0.18.1 and numpy 1.11.1 (I saw in earlier posts a similar problem was apparently fixed w/ pandas 0.15).
Specifically, if I try:
print orders.groupby('OrderID')['start_time'].agg(np.min).iloc[:5]
I get:
The raw data had times closer to 8 am (US/Eastern). In other words, the values reverted to UTC times, even though the output labels them as Eastern time with a UTC-4 offset.
But if I instead try:
print orders.groupby('OrderID')['start_time'].agg(lambda x: np.min(x)).iloc[:5]
I now get:
Which is the behavior I intended. This second method is vastly slower, and I would have assumed the two approaches would yield identical results ...
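For anyone stuck on an affected version, one possible workaround that avoids the slow lambda is to aggregate in UTC and convert back afterwards. This is a sketch under assumptions: the data below is made up (only the column names `OrderID` and `start_time` come from the report), the zone is written as `America/New_York` (same zone as `US/Eastern`), and it has not been verified against pandas 0.18.1.

```python
import pandas as pd

tz = "America/New_York"  # assumption: canonical name for US/Eastern

# Illustrative orders; 12:00 UTC is 08:00 Eastern in June (UTC-4).
orders = pd.DataFrame({
    "OrderID": [1, 1, 2, 2],
    "start_time": pd.to_datetime(
        ["2016-06-01 12:00", "2016-06-01 12:30",
         "2016-06-01 13:00", "2016-06-01 13:15"],
        utc=True,
    ).tz_convert(tz),
})

grouped = orders.groupby("OrderID")["start_time"]
fast = grouped.min()                      # cython path
slow = grouped.agg(lambda x: x.min())     # python path, as in the report

# Workaround sketch: aggregate in UTC, where a mislabelled offset cannot
# shift anything, then convert the result back to the original zone.
utc_min = orders["start_time"].dt.tz_convert("UTC").groupby(orders["OrderID"]).min()
via_utc = utc_min.dt.tz_convert(tz)

print(bool((fast == slow).all()), bool((fast == via_utc).all()))
```

On a fixed version all three results should agree, so the same comparison doubles as a quick check of whether an installation is affected.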