-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Avoid AmbiguousTimeError on groupby #14683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Could you add your example from the issue as a test? Also a release note in the whatsnew file. |
it seems I've broken something.... I'll try to fix. |
@@ -5,6 +5,7 @@ | |||
|
|||
import pandas as pd | |||
from pandas.core.base import AbstractMethodError, GroupByMixin | |||
from pandas.core.config_init import pc_ambiguous_as_wide_doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you importing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IDE mischieve, I'll remove.
def test_resmaple_dst_anchor(self): | ||
def test_resample_across_dst(self): | ||
#14682 | ||
df1 = DataFrame([1477786980, 1477790580], columns=['ts']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give a 1-liner about what you are testing here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One question... where's the whatsnew file I need to add the description to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a bug, so you can put in 0.19.2
@@ -1284,8 +1285,13 @@ def _adjust_dates_anchored(first, last, offset, closed='right', base=0): | |||
# See https://github.com/pandas-dev/pandas/issues/8683 | |||
|
|||
first_tzinfo = first.tzinfo | |||
last_tzinfo = last.tzinfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put the issue number as a comment and a brief expl. future readers will appreciate!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will
@@ -439,7 +439,7 @@ def _generate(cls, start, end, periods, name, offset, | |||
tz = tz.localize(date.replace(tzinfo=None)).tzinfo | |||
|
|||
if tz is not None and inferred_tz is not None: | |||
if not inferred_tz == tz: | |||
if not tslib.get_timezone(inferred_tz) == tslib.get_timezone(tz): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this should not be necessary; str comparison already does this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get an exception if I don't... apparently the timezone with and without dst are not the same thing:
Traceback (most recent call last):
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\tests\test_resample.py", line 1923, in test_resample_across_dst
df.resample(rule='H').sum(),
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 541, in f
return self._downsample(_method)
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 672, in _downsample
self._set_binner()
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 238, in _set_binner
self.binner, self.grouper = self._get_binner()
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 246, in _get_binner
binner, bins, binlabels = self._get_binner_for_time()
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 661, in _get_binner_for_time
return self.groupby._get_time_bins(self.ax)
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 1130, in _get_time_bins
name=ax.name)
File "C:\Users\jsantand\Code\video\be\pandas\pandas\util\decorators.py", line 91, in wrapper
return func(*args, **kwargs)
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\index.py", line 301, in __new__
ambiguous=ambiguous)
File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\index.py", line 444, in _generate
raise AssertionError("Inferred time zone not equal to passed "
AssertionError: Inferred time zone not equal to passed time zone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, can you repro that directly via a DatetimeIndex construction call?
does it break anything else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, I've traced the issue to the tz of input index:
I do:
df1 = DataFrame([1477786980, 1477790580], columns=['ts'])
dti1 = DatetimeIndex(pd.to_datetime(df1.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid'))
Now, this dti1 index has a funny tz attribute: <DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
When resampling, the index tz is used in creating the new index and it is where the assert comes.
That probably means I don't need this change... but the fix is needed elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, my bad.... To reproduce:
t1=Timestamp(2016,10,30,2,23).tz_localize('Europe/Madrid',ambiguous=True)
t2=Timestamp(2016,10,30,2,13).tz_localize('Europe/Madrid',ambiguous=False)
dti=DatetimeIndex(start=t1,end=t2,freq='H',tz='Europe/Madrid')
I've also found that the same check is done in pandas.tseries.tools.py:
def _infer_tzinfo(start, end):
def _infer(a, b):
tz = a.tzinfo
if b and b.tzinfo:
if not (tslib.get_timezone(tz) == tslib.get_timezone(b.tzinfo)):
raise AssertionError('Inputs must both have the same timezone,'
' {0} != {1}'.format(tz, b.tzinfo))
return tz
tz = None
if start is not None:
tz = _infer(start, end)
elif end is not None:
tz = _infer(end, start)
return tz
I'll submit now the changes and see if this time they are ok.
Current coverage is 85.20% (diff: 100%)@@ master #14683 diff @@
==========================================
Files 140 143 +3
Lines 50706 50790 +84
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43248 43275 +27
- Misses 7458 7515 +57
Partials 0 0
|
@@ -43,3 +43,4 @@ Bug Fixes | |||
|
|||
|
|||
- Explicit check in ``to_stata`` and ``StataWriter`` for out-of-range values when writing doubles (:issue:`14618`) | |||
- Fix AmbiguousTimeError exception when resampling a DatetimeIndex in local TZ, covering a DST change (:issue:`14682`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug in Ambiguous....
(use double-backticks around exception types and DatetimeIndex
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll change
.dt.tz_convert('Europe/Madrid')) | ||
df = DataFrame([5, 5], index=dti1) | ||
assert_frame_equal( | ||
df.resample(rule='H').sum(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do
result = df.resample(....)
expected = ....
assert_frame_equal(result, expected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
lgtm. cc @rockg |
Looks fine. |
thanks @j-santander keep em' coming! |
closes pandas-dev#14682 Author: Julian Santander <[email protected]> Author: Julian Santander <[email protected]> Closes pandas-dev#14683 from j-santander/master and squashes the following commits: d90afaf [Julian Santander] Addressing additional code inspection comments 817ed97 [Julian Santander] Addressing code inspections comments 99a5367 [Julian Santander] Fix unittest error and lint warning 940fb22 [Julian Santander] Avoid AmbiguousTimeError on groupby (cherry picked from commit 9f2e453)
closes #14682