-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Avoid AmbiguousTimeError on groupby #14683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 2 commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,7 @@ | |
|
||
import pandas as pd | ||
from pandas.core.base import AbstractMethodError, GroupByMixin | ||
from pandas.core.config_init import pc_ambiguous_as_wide_doc | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. why are you importing this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. IDE mischieve, I'll remove. |
||
|
||
from pandas.core.groupby import (BinGrouper, Grouper, _GroupBy, GroupBy, | ||
SeriesGroupBy, groupby, PanelGroupBy) | ||
|
@@ -1284,8 +1285,13 @@ def _adjust_dates_anchored(first, last, offset, closed='right', base=0): | |
# See https://github.com/pandas-dev/pandas/issues/8683 | ||
|
||
first_tzinfo = first.tzinfo | ||
last_tzinfo = last.tzinfo | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. put the issue number as a comment and a brief expl. future readers will appreciate! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will |
||
first_dst = bool(first.dst()) | ||
last_dst = bool(last.dst()) | ||
|
||
first = first.tz_localize(None) | ||
last = last.tz_localize(None) | ||
|
||
start_day_nanos = first.normalize().value | ||
|
||
base_nanos = (base % offset.n) * offset.nanos // offset.n | ||
|
@@ -1320,11 +1326,8 @@ def _adjust_dates_anchored(first, last, offset, closed='right', base=0): | |
else: | ||
lresult = last.value + offset.nanos | ||
|
||
# return (Timestamp(fresult, tz=first.tz), | ||
# Timestamp(lresult, tz=last.tz)) | ||
|
||
return (Timestamp(fresult).tz_localize(first_tzinfo), | ||
Timestamp(lresult).tz_localize(first_tzinfo)) | ||
return (Timestamp(fresult).tz_localize(first_tzinfo, ambiguous=first_dst), | ||
Timestamp(lresult).tz_localize(last_tzinfo, ambiguous=last_dst)) | ||
|
||
|
||
def asfreq(obj, freq, method=None, how=None, normalize=False): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1912,7 +1912,18 @@ def test_resample_size(self): | |
right = Series(val, index=ix) | ||
assert_series_equal(left, right) | ||
|
||
def test_resmaple_dst_anchor(self): | ||
def test_resample_across_dst(self): | ||
#14682 | ||
df1 = DataFrame([1477786980, 1477790580], columns=['ts']) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. give a 1-liner about what you are testing here There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I will. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One question... where's the whatsnew file I need to add the description to? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is a bug, so you can put in 0.19.2 |
||
df2 = DataFrame([1477785600, 1477789200], columns=['ts']) | ||
dti1 = DatetimeIndex(pd.to_datetime(df1.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')) | ||
dti2 = DatetimeIndex(pd.to_datetime(df2.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid')) | ||
df = DataFrame([5, 5], index=dti1) | ||
assert_frame_equal( | ||
df.resample(rule='H').sum(), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ok |
||
DataFrame([5, 5], index=dti2)) | ||
|
||
def test_resample_dst_anchor(self): | ||
# 5172 | ||
dti = DatetimeIndex([datetime(2012, 11, 4, 23)], tz='US/Eastern') | ||
df = DataFrame([5], index=dti) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this should not be necessary; str comparison already does this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get an exception if I don't... apparently the timezone with and without dst are not the same thing:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, can you repro that directly via a DatetimeIndex construction call?
does it break anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I've traced the issue to the tz of input index:
I do:
Now, this dti1 index has a funny tz attribute:
<DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
When resampling, the index tz is used in creating the new index and it is where the assert comes.
That probably means I don't need this change... but the fix is needed elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my bad.... To reproduce:
I've also found that the same check is done in pandas.tseries.tools.py:
I'll submit now the changes and see if this time they are ok.