Skip to content

BUG: Avoid AmbiguousTimeError on groupby #14683

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

j-santander
Copy link

@j-santander j-santander commented Nov 17, 2016

closes #14682

@TomAugspurger
Copy link
Contributor

Could you add your example from the issue as a test? Also a release note in the whatsnew file.

@j-santander
Copy link
Author

it seems I've broken something.... I'll try to fix.

@jreback jreback added Groupby Timezones Timezone data dtype labels Nov 18, 2016
@@ -5,6 +5,7 @@

import pandas as pd
from pandas.core.base import AbstractMethodError, GroupByMixin
from pandas.core.config_init import pc_ambiguous_as_wide_doc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you importing this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDE mischieve, I'll remove.

def test_resmaple_dst_anchor(self):
def test_resample_across_dst(self):
#14682
df1 = DataFrame([1477786980, 1477790580], columns=['ts'])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

give a 1-liner about what you are testing here

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question... where's the whatsnew file I need to add the description to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bug, so you can put in 0.19.2

@@ -1284,8 +1285,13 @@ def _adjust_dates_anchored(first, last, offset, closed='right', base=0):
# See https://github.com/pandas-dev/pandas/issues/8683

first_tzinfo = first.tzinfo
last_tzinfo = last.tzinfo
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put the issue number as a comment and a brief expl. future readers will appreciate!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will

@@ -439,7 +439,7 @@ def _generate(cls, start, end, periods, name, offset,
tz = tz.localize(date.replace(tzinfo=None)).tzinfo

if tz is not None and inferred_tz is not None:
if not inferred_tz == tz:
if not tslib.get_timezone(inferred_tz) == tslib.get_timezone(tz):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this should not be necessary; str comparison already does this

Copy link
Author

@j-santander j-santander Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get an exception if I don't... apparently the timezone with and without dst are not the same thing:

Traceback (most recent call last):
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\tests\test_resample.py", line 1923, in test_resample_across_dst
    df.resample(rule='H').sum(),
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 541, in f
    return self._downsample(_method)
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 672, in _downsample
    self._set_binner()
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 238, in _set_binner
    self.binner, self.grouper = self._get_binner()
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 246, in _get_binner
    binner, bins, binlabels = self._get_binner_for_time()
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 661, in _get_binner_for_time
    return self.groupby._get_time_bins(self.ax)
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\resample.py", line 1130, in _get_time_bins
    name=ax.name)
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\util\decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\index.py", line 301, in __new__
    ambiguous=ambiguous)
  File "C:\Users\jsantand\Code\video\be\pandas\pandas\tseries\index.py", line 444, in _generate
    raise AssertionError("Inferred time zone not equal to passed "
AssertionError: Inferred time zone not equal to passed time zone

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can you repro that directly via a DatetimeIndex construction call?

does it break anything else?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I've traced the issue to the tz of input index:

I do:

df1 = DataFrame([1477786980, 1477790580], columns=['ts'])
dti1 = DatetimeIndex(pd.to_datetime(df1.ts, unit='s').dt.tz_localize('UTC').dt.tz_convert('Europe/Madrid'))

Now, this dti1 index has a funny tz attribute: <DstTzInfo 'Europe/Paris' LMT+0:09:00 STD>
When resampling, the index tz is used in creating the new index and it is where the assert comes.

That probably means I don't need this change... but the fix is needed elsewhere.

Copy link
Author

@j-santander j-santander Nov 18, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my bad.... To reproduce:

t1=Timestamp(2016,10,30,2,23).tz_localize('Europe/Madrid',ambiguous=True)
t2=Timestamp(2016,10,30,2,13).tz_localize('Europe/Madrid',ambiguous=False)
dti=DatetimeIndex(start=t1,end=t2,freq='H',tz='Europe/Madrid')

I've also found that the same check is done in pandas.tseries.tools.py:

def _infer_tzinfo(start, end):
    def _infer(a, b):
        tz = a.tzinfo
        if b and b.tzinfo:
            if not (tslib.get_timezone(tz) == tslib.get_timezone(b.tzinfo)):
                raise AssertionError('Inputs must both have the same timezone,'
                                     ' {0} != {1}'.format(tz, b.tzinfo))
        return tz

    tz = None
    if start is not None:
        tz = _infer(start, end)
    elif end is not None:
        tz = _infer(end, start)
    return tz

I'll submit now the changes and see if this time they are ok.

@codecov-io
Copy link

codecov-io commented Nov 18, 2016

Current coverage is 85.20% (diff: 100%)

Merging #14683 into master will decrease coverage by 0.08%

@@             master     #14683   diff @@
==========================================
  Files           140        143     +3   
  Lines         50706      50790    +84   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43248      43275    +27   
- Misses         7458       7515    +57   
  Partials          0          0          

Powered by Codecov. Last update 45543ec...d90afaf

@@ -43,3 +43,4 @@ Bug Fixes


- Explicit check in ``to_stata`` and ``StataWriter`` for out-of-range values when writing doubles (:issue:`14618`)
- Fix AmbiguousTimeError exception when resampling a DatetimeIndex in local TZ, covering a DST change (:issue:`14682`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug in Ambiguous.... (use double-backticks around exception types and DatetimeIndex)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change

.dt.tz_convert('Europe/Madrid'))
df = DataFrame([5, 5], index=dti1)
assert_frame_equal(
df.resample(rule='H').sum(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do

result = df.resample(....)
expected = ....
assert_frame_equal(result, expected)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@jreback jreback added the Bug label Nov 21, 2016
@jreback jreback added this to the 0.19.2 milestone Nov 21, 2016
@jreback jreback changed the title Avoid AmbiguousTimeError on groupby BUG: Avoid AmbiguousTimeError on groupby Nov 21, 2016
@jreback
Copy link
Contributor

jreback commented Nov 21, 2016

lgtm.

cc @rockg

@rockg
Copy link
Contributor

rockg commented Nov 21, 2016

Looks fine.

@jreback jreback closed this in 9f2e453 Nov 22, 2016
@jreback
Copy link
Contributor

jreback commented Nov 22, 2016

thanks @j-santander

keep em' coming!

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Dec 14, 2016
closes pandas-dev#14682

Author: Julian Santander <[email protected]>
Author: Julian Santander <[email protected]>

Closes pandas-dev#14683 from j-santander/master and squashes the following commits:

d90afaf [Julian Santander] Addressing additional code inspection comments
817ed97 [Julian Santander] Addressing code inspections comments
99a5367 [Julian Santander] Fix unittest error and lint warning
940fb22 [Julian Santander] Avoid AmbiguousTimeError on groupby

(cherry picked from commit 9f2e453)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Groupby Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

AmbiguousTimeError on groupby when including a DST change
6 participants