BUG/API: master issue for resample on DST days #5172

Closed
5 of 7 tasks
dflatow opened this issue Oct 10, 2013 · 6 comments
Labels
API Design Bug Master Tracker High level tracker for similar issues Resample resample method Timezones Timezone data dtype
Comments

@dflatow

dflatow commented Oct 10, 2013

from datetime import datetime
import pandas as pd

df = pd.DataFrame([0], index=[datetime(2012, 11, 4, 23, 0, 0)])
df = df.tz_localize('America/New_York')
df.resample(rule='D', how='sum')

raises an AmbiguousTimeError even though datetime(2012, 11, 4, 23, 0, 0) is not ambiguous.
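
To make "not ambiguous" concrete, here is a minimal sketch using only the standard library (`zoneinfo`, Python 3.9+) rather than pandas. The helper name `is_ambiguous` is hypothetical and not part of any library; it assumes a wall-clock time is ambiguous when its first occurrence (`fold=0`) has a larger UTC offset than its second (`fold=1`).

```python
from datetime import datetime
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")


def is_ambiguous(naive, zone):
    """Return True if the wall-clock time occurs twice in the zone (hypothetical helper)."""
    first = naive.replace(tzinfo=zone, fold=0)
    second = naive.replace(tzinfo=zone, fold=1)
    # In a fall-back overlap the first occurrence is still on the DST offset,
    # so its UTC offset is larger than the second occurrence's.
    return first.utcoffset() > second.utcoffset()


# 01:30 happens twice on the fall-back day; 23:00 happens once.
print(is_ambiguous(datetime(2012, 11, 4, 1, 30), NEW_YORK))
print(is_ambiguous(datetime(2012, 11, 4, 23, 0), NEW_YORK))
```

Under this check, only the hour between 01:00 and 02:00 on 2012-11-04 is ambiguous in New York, so the error has to come from some intermediate value rather than from the input timestamp.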

idx = pd.date_range('2014-10-08 00:00', '2014-10-09 00:00', freq='D', tz='Europe/Berlin')
pd.Series(5, idx).resample('MS')

import numpy as np

index = pd.to_datetime(pd.Series([
'2014-10-26 07:35:49',
'2014-10-26 07:45:08',
'2014-10-26 08:04:58'
]))

df = pd.DataFrame(np.arange(len(index)), index=index)
df = df.tz_localize('Asia/Krasnoyarsk', ambiguous='NaT')
df.resample('D')

AmbiguousTimeError: Cannot infer dst time from Timestamp('2014-10-26 01:00:00'), try using the 'ambiguous' argument

import datetime
import pytz as tz
import pandas as pd

rome = tz.timezone('Europe/Rome')

dr = []
for i in range(2):
    dp = datetime.datetime(2014, 10, 25) + datetime.timedelta(days=i)
    dr.append(rome.localize(dp))

series = {}
for i, ddr in enumerate(dr):
    series[ddr] = i * 10

s1 = pd.Series(series)
s1 = s1.resample('D', how='mean')

@jreback
Contributor

jreback commented Oct 10, 2013

cc @rockg

can you take a look?

@rockg
Contributor

rockg commented Oct 17, 2013

A little background: behind the scenes, resample builds a DatetimeIndex at the passed frequency, and that index forms the basis for the resample. In this case, it's a daily curve. Unfortunately, the AmbiguousTimeError is related to what we've seen with other things (e.g., #5175). Basically your date is normalized (_adjust_dates_anchored in tseries.resample), but it returns 2012-11-04 00:00:00-05:00 when it should return 2012-11-04 00:00:00-04:00. The dates are normalized using datetime.replace() by setting the time components to zero, but the tz offset does not change. I believe the fix is to localize. Let me do some digging on possible solutions. @dflatow, in the meantime a simple workaround would be to use UTC or not localize at all.
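
The stale-offset effect described here can be sketched with the standard library alone. This is an illustration under assumptions, not the pandas code path: the fixed-offset `timezone` object stands in for a localized value whose tzinfo carries a fixed -05:00 offset, and `zoneinfo` stands in for the "localize" step.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# Assumption: a fixed -05:00 tzinfo models the offset attached to the 23:00 value.
EST_FIXED = timezone(timedelta(hours=-5))
late = datetime(2012, 11, 4, 23, 0, tzinfo=EST_FIXED)

# Zeroing the time fields with replace() keeps the stale -05:00 offset.
stale_midnight = late.replace(hour=0, minute=0, second=0, microsecond=0)

# Dropping the tzinfo, normalizing the wall time, and attaching the zone again
# lets the zone rules choose the offset that applies at local midnight.
naive_midnight = late.replace(tzinfo=None, hour=0, minute=0, second=0, microsecond=0)
relocalized_midnight = naive_midnight.replace(tzinfo=NEW_YORK)

print(stale_midnight.isoformat())        # 2012-11-04T00:00:00-05:00
print(relocalized_midnight.isoformat())  # 2012-11-04T00:00:00-04:00
```

The two results are one hour apart in absolute time, which is why a bin edge built from the stale value lands in the wrong place.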

@dflatow dflatow closed this as completed Oct 30, 2013
@dflatow dflatow reopened this Oct 30, 2013
@dflatow
Author

dflatow commented Oct 30, 2013

My workaround is to use a groupby instead of resample (the timezones matter for the analysis I'm doing - so UTC won't work). Any word on a fix though?
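
The thread does not show the groupby code, so the following is only a sketch of the idea in plain Python: group on the local calendar date instead of building a fixed-frequency bin index. The observations are made-up values for illustration.

```python
from collections import defaultdict
from datetime import datetime
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

# Hypothetical observations around the 2012 fall-back day.
observations = [
    (datetime(2012, 11, 3, 22, 0, tzinfo=NEW_YORK), 1),
    (datetime(2012, 11, 4, 12, 0, tzinfo=NEW_YORK), 2),
    (datetime(2012, 11, 4, 23, 0, tzinfo=NEW_YORK), 3),
]

# Key each value by its local date; no bin edges are computed,
# so no normalized midnight with a stale offset is ever created.
daily_sum = defaultdict(int)
for stamp, value in observations:
    daily_sum[stamp.date()] += value

for day, total in sorted(daily_sum.items()):
    print(day, total)
```

Because the grouping key is the local date, the result still reflects the timezone, which is the property a UTC-only workaround would lose.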

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Apr 9, 2014
@jreback jreback modified the milestones: 0.15.2, 0.16.0 Nov 6, 2014
@jreback jreback changed the title from "resample issue on DST days?" to "BUG/API: master issue for resample on DST days" Nov 6, 2014
@rockg
Contributor

rockg commented Nov 8, 2014

Good news is that I have all of these figured out. One issue I'm having is related to #8601. Basically when you take a time in non-DST and apply the +MS rule, it goes to the last hour of the month because of DST (see #5175 for more explanation). In that commit the logic was applied to DateOffset, but all the other offsets need to remove the timezone, apply the offset, and then put it back. This seems pretty ugly and I'm looking for suggestions to handle this in a uniform way rather than duplicating the logic for every single offset.

import pandas as pd
ts = pd.Timestamp('11/2/2012', tz='US/Eastern')
ts + pd.offsets.MonthBegin()
Out[4]: Timestamp('2012-11-30 23:00:00-0500', tz='US/Eastern') #Should be 2012-12-01 00

The below basically needs to be added to apply for each offset.

        tzinfo = getattr(other, 'tzinfo', None)
        if tzinfo is not None:
            other = other.replace(tzinfo=None)

        # Apply offset, e.g., other = other + relativedelta(...)
        if tzinfo is not None:
            other = tslib._localize_pydatetime(other, tzinfo)
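
The strip, apply, relocalize pattern above can be sketched outside pandas. In this standard-library illustration, `month_begin_wall` is a hypothetical helper (not a pandas function), `America/New_York` is used as the same zone as `US/Eastern`, and a fixed -04:00 tzinfo stands in for a timestamp that keeps its old offset while the offset is applied.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

EASTERN = ZoneInfo("America/New_York")


def month_begin_wall(stamp):
    """Roll to the first day of the next month on wall-clock time (hypothetical helper)."""
    tzinfo = stamp.tzinfo
    naive = stamp.replace(tzinfo=None)  # remove the timezone
    # Apply the offset to the naive wall-clock value.
    if naive.month == 12:
        rolled = naive.replace(year=naive.year + 1, month=1, day=1)
    else:
        rolled = naive.replace(month=naive.month + 1, day=1)
    return rolled.replace(tzinfo=tzinfo)  # put the timezone back


# With zone rules attached, the result picks up the December offset (-05:00).
good = month_begin_wall(datetime(2012, 11, 2, tzinfo=EASTERN))

# Assumption: a fixed -04:00 tzinfo models the November offset being carried along.
stale = month_begin_wall(datetime(2012, 11, 2, tzinfo=timezone(timedelta(hours=-4))))
stale_in_zone = stale.astimezone(EASTERN)

print(good.isoformat())           # 2012-12-01T00:00:00-05:00
print(stale_in_zone.isoformat())  # 2012-11-30T23:00:00-05:00
```

The second value reproduces the "last hour of the month" symptom from the MonthBegin example, while the first is the expected midnight.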

The other issue was caused by _adjust_dates_anchored and the assumption made about the hours in a day. This assumption fails when you are on a DST day, but it was a pretty simple fix (basically move everything to UTC, calculate the anchor, and then relocalize).
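
A small standard-library sketch of why the hours-in-a-day assumption fails, and of the "move to UTC, calculate, relocalize" idea. This is not the `_adjust_dates_anchored` code; the 12-hour step is an arbitrary value chosen for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")

day_start = datetime(2012, 11, 4, tzinfo=NEW_YORK)
day_end = datetime(2012, 11, 5, tzinfo=NEW_YORK)

# Both values share one tzinfo, so Python returns the wall-clock difference.
wall_clock_length = day_end - day_start

# Converting to UTC first measures the time that actually elapsed.
elapsed_length = day_end.astimezone(timezone.utc) - day_start.astimezone(timezone.utc)

# Arithmetic done in UTC, then converted back, gets the correct local offset.
anchor = (day_start.astimezone(timezone.utc) + timedelta(hours=12)).astimezone(NEW_YORK)

print(wall_clock_length)   # 1 day, 0:00:00
print(elapsed_length)      # 1 day, 1:00:00
print(anchor.isoformat())  # 2012-11-04T11:00:00-05:00
```

Twelve elapsed hours after local midnight is 11:00 local time on the fall-back day, not 12:00, because the day is 25 hours long.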

Another problem is that tslib.normalize_date doesn't handle DST properly. I'm inclined to change this so that when you normalize an hour on the DST day and after the offset change, it returns the right offset.

Wrong:

import pandas as pd
ts = pd.Timestamp('11/4/2012 23:00', tz='US/Eastern')
ts
Out[4]: Timestamp('2012-11-04 23:00:00-0500', tz='US/Eastern')
pd.tseries.tools.normalize_date(ts)
Out[5]: Timestamp('2012-11-04 00:00:00-0500', tz='US/Eastern') #Should be -0400

Right:

import pandas as pd
ts = pd.Timestamp('11/4/2012 23:00', tz='US/Eastern')
ts
Out[4]: Timestamp('2012-11-04 23:00:00-0500', tz='US/Eastern')
pd.tseries.tools.normalize_date(ts)
Out[5]: Timestamp('2012-11-04 00:00:00-0400', tz='US/Eastern')

@rockg
Contributor

rockg commented Nov 8, 2014

I guess the removal of tzinfo from the offset should really happen in apply_wraps, which already has some logic for this, but the tzinfo is not removed from other.

@jreback jreback modified the milestones: 0.16.0, 0.15.2 Dec 4, 2014
@jreback jreback added the Master Tracker High level tracker for similar issues label Mar 6, 2015
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 10, 2015
@jreback
Contributor

jreback commented Mar 11, 2015

closed by #9623

@jreback jreback closed this as completed Mar 11, 2015
3 participants