Skip to content

BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST #7798

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 3, 2014

Conversation

sinhrks
Copy link
Member

@sinhrks sinhrks commented Jul 18, 2014

These functions may return different result in case of DST. There seems to be 2 problems:

  • tslib.tz_convert checks DST and change deltas by adjusting pos by 1. If input has time-gaps more than 2 DST spans, result will be incorrect.
  • tslib.tz_convert_single results incorrect if input is just on DST edge.
import pandas as pd
import numpy as np
import datetime
import pytz

idx = pd.date_range('2014-03-01', '2015-01-10', freq='H')

f = lambda x: pd.tslib.tz_convert_single(x, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result = pd.tslib.tz_convert(idx.asi8, pd.tslib.maybe_get_tz('US/Eastern'), 'UTC')
result_single = np.vectorize(f)(idx.asi8)
result[result != result_single]
# [1394370000000000000 1394373600000000000 1394377200000000000 ...,
#  1414918800000000000 1414922400000000000 1414926000000000000]

Note

Additionally, it was modifed to close #7880 also.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 18, 2014

Perf result. I think it increases a performance of inferred_freq for tz-aware data a bit.

...
write_store                                  |  11.6420 |   9.7557 |   1.1934 |
reindex_fillna_pad                           |   1.0404 |   0.8606 |   1.2089 |
-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------

Ratio < 1.0 means the target commit is faster then the baseline.
Seed used: 1234

Target [7a61a96] : BUG: tslib.tzconvert handles DST incorrectly
Base   [30764bd] : Merge pull request #7783 from lexual/docs_categorical_tidyup

@jreback
Copy link
Contributor

jreback commented Jul 18, 2014

can you run the perf test here: #7652 (and the 'test' in #7633,
which couldn't figure out how to put in a vbench, its basically a str(df) with a big period index).

just to make sure the perf is still ok

(your fix looks good though)

@jreback jreback added this to the 0.15.0 milestone Jul 18, 2014
@sinhrks
Copy link
Member Author

sinhrks commented Jul 19, 2014

OK, done.

-------------------------------------------------------------------------------
Test name                                    | head[ms] | base[ms] |  ratio   |
-------------------------------------------------------------------------------
datetimeindex_normalize                      |   5.6077 |   5.7670 |   0.9724 |

Target [3c8e2b3] : BUG: tslib.tzconvert handles DST incorrectly
Base   [8cd3dd6] : Merge pull request #7652 from jreback/dst_transitions

@sinhrks
Copy link
Member Author

sinhrks commented Jul 19, 2014

And test in #7633 takes few seconds.

import pandas as pd

import numpy as np
period = 1.0 / 2048 * 1e9
freq = pd.datetools.Nano(period)
df = pd.DataFrame({'a': np.random.randn(6e6)}, index=pd.date_range('now', periods=6e6, freq=freq, tz='EST'))
print(df.a)
...
# 2014-07-19 03:08:12.685023438-05:00    0.650118
# 2014-07-19 03:08:12.685511719-05:00   -0.683781
# Freq: 488281N, Name: a, Length: 6000000
# [Finished in 1.9s]

@rockg
Copy link
Contributor

rockg commented Jul 20, 2014

Should we add a test for the fall DST transition similar to what you did for spring? I understand there is not really any specific "spring" code per se, but sometimes strange things popup. Otherwise the changes look fine.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

@rockg Yep, added tests. Could you check whether it is valid, otherwise lmk better one?


def test_tslib_tz_convert_dst(self):
# Start DST
idx = date_range('2014-03-08 23:00', '2014-03-09 09:00', freq='1min', tz='UTC')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a freq loop around this with several freqs? (I know a bit annoying as you have to specify the result as well). you have T, maybe H and D? (as D should be unaffected by the DST transitions, while T and H are)

@rockg
Copy link
Contributor

rockg commented Jul 22, 2014

That's exactly what I was thinking.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 22, 2014

OK, modified test to cover other freqs.

@sinhrks
Copy link
Member Author

sinhrks commented Jul 30, 2014

Added the fix for #7880.

@sinhrks
Copy link
Member Author

sinhrks commented Aug 2, 2014

@jreback I think it is ready, lmk if anything.

@jreback
Copy link
Contributor

jreback commented Aug 2, 2014

looks fine, what was the problem with #7880 ? (e.g. what fixed it)

@sinhrks
Copy link
Member Author

sinhrks commented Aug 3, 2014

Currently, it includes 2 fixes (described in release note)

jreback added a commit that referenced this pull request Aug 3, 2014
BUG: tslib.tz_convert and tslib.tz_convert_single may output different result in DST
@jreback jreback merged commit ce5a4ef into pandas-dev:master Aug 3, 2014
@jreback
Copy link
Contributor

jreback commented Aug 3, 2014

thanks!

@sinhrks sinhrks deleted the tz_convert_bug branch August 4, 2014 13:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Intersection of non-overlapping DatetimeIndexes with tz set fails
3 participants