
BUG: UTC offset incorrect on dst transition with dateutil tz when using date_range #46499

Closed
2 of 3 tasks
tnielens opened this issue Mar 24, 2022 · 3 comments
Labels
Bug, Timezones (Timezone data dtype)

Comments

@tnielens

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pytz
from dateutil.tz import gettz
# CET = pytz.timezone("Europe/Paris")  # the pytz zone does not show the problem
CET = gettz("Europe/Paris")
start = pd.Timestamp("2021-10-31 01:45", tz=CET)
idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
pd.Series(idx, idx)

# 2021-10-31 01:45:00+02:00   2021-10-31 01:45:00+02:00
# 2021-10-31 02:15:00+02:00   2021-10-31 02:15:00+02:00
# 2021-10-31 02:45:00+02:00   2021-10-31 02:45:00+02:00
# 2021-10-31 02:15:00+01:00   2021-10-31 02:15:00+02:00 --> offset of index changes here
# 2021-10-31 02:45:00+01:00   2021-10-31 02:45:00+02:00
# 2021-10-31 03:15:00+01:00   2021-10-31 03:15:00+01:00 --> offset of values changes here
# 2021-10-31 03:45:00+01:00   2021-10-31 03:45:00+01:00
# Freq: 30T, dtype: datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]

Issue Description

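I noticed unexpected duplicate values when building a timestamp range across the autumn DST transition with the dateutil "Europe/Paris" time zone. In the output above, the index shows the UTC offset switching from +02:00 to +01:00 at the transition, but the values keep +02:00 for two more rows, so 02:15 and 02:45 each appear twice with the same offset. The same time zone from pytz does not show the problem.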

Expected Behavior

The values of the series above should contain no duplicates; each value should show the same UTC offset as its index label.
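
To make the duplication concrete without running pandas, the two columns of the reported output can be parsed as ISO 8601 strings and reduced to distinct instants. This is only a minimal standard-library sketch: the strings are copied from the output above, and the helper name `distinct_instants` is illustrative, not part of pandas.

```python
from datetime import datetime, timezone

# Index labels and displayed values, copied from the reported output.
index_labels = [
    "2021-10-31 01:45:00+02:00",
    "2021-10-31 02:15:00+02:00",
    "2021-10-31 02:45:00+02:00",
    "2021-10-31 02:15:00+01:00",
    "2021-10-31 02:45:00+01:00",
    "2021-10-31 03:15:00+01:00",
    "2021-10-31 03:45:00+01:00",
]
displayed_values = [
    "2021-10-31 01:45:00+02:00",
    "2021-10-31 02:15:00+02:00",
    "2021-10-31 02:45:00+02:00",
    "2021-10-31 02:15:00+02:00",
    "2021-10-31 02:45:00+02:00",
    "2021-10-31 03:15:00+01:00",
    "2021-10-31 03:45:00+01:00",
]


def distinct_instants(strings):
    """Return the set of UTC instants named by offset-aware ISO 8601 strings."""
    return {datetime.fromisoformat(s).astimezone(timezone.utc) for s in strings}


n_index = len(distinct_instants(index_labels))
n_values = len(distinct_instants(displayed_values))
print(n_index, n_values)  # 7 5
```

The index names seven distinct instants, while the displayed values collapse to five, which is the duplication described in the report.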

Installed Versions

INSTALLED VERSIONS

commit : bb1f651
python : 3.10.2.final.0
python-bits : 64
OS : Darwin
OS-release : 21.3.0
Version : Darwin Kernel Version 21.3.0: Wed Jan 5 21:37:58 PST 2022; root:xnu-8019.80.24~20/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.4.0
numpy : 1.21.0
pytz : 2020.1
dateutil : 2.8.2
pip : 22.0.4
setuptools : 58.1.0
Cython : None
pytest : None
hypothesis : None
...
xarray : None
xlrd : 2.0.1
xlwt : None
zstandard : None

@tnielens added the Bug and Needs Triage labels on Mar 24, 2022
@Anupam-USP
Contributor

I am able to understand the issue, but how should I proceed?

@mroeschke added the Timezones label and removed the Needs Triage label on Mar 28, 2022
@mroeschke
Member

The issue might be in the repr, as the underlying int64 values are the same whether the time zone is given as the string "Europe/Paris" or as gettz("Europe/Paris"):

In [17]: import pandas as pd
    ...: import pytz
    ...: from  dateutil.tz import gettz
    ...: # CET = pytz.timezone("Europe/Paris")
    ...: CET = "Europe/Paris"
    ...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
    ...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
    ...: pd.Series(idx, idx).astype("i8")
Out[17]:
2021-10-31 01:45:00+02:00    1635637500000000000
2021-10-31 02:15:00+02:00    1635639300000000000
2021-10-31 02:45:00+02:00    1635641100000000000
2021-10-31 02:15:00+01:00    1635642900000000000
2021-10-31 02:45:00+01:00    1635644700000000000
2021-10-31 03:15:00+01:00    1635646500000000000
2021-10-31 03:45:00+01:00    1635648300000000000
Freq: 30T, dtype: int64

In [18]: import pandas as pd
    ...: import pytz
    ...: from  dateutil.tz import gettz
    ...: # CET = pytz.timezone("Europe/Paris")
    ...: CET = gettz("Europe/Paris")
    ...: start = pd.Timestamp("2021-10-31 01:45", tz=CET)
    ...: idx = pd.date_range(start, pd.Timestamp("2021-10-31 03:45", tz=CET), freq="30T")
    ...: pd.Series(idx, idx).astype("i8")
Out[18]:
2021-10-31 01:45:00+02:00    1635637500000000000
2021-10-31 02:15:00+02:00    1635639300000000000
2021-10-31 02:45:00+02:00    1635641100000000000
2021-10-31 02:15:00+01:00    1635642900000000000
2021-10-31 02:45:00+01:00    1635644700000000000
2021-10-31 03:15:00+01:00    1635646500000000000
2021-10-31 03:45:00+01:00    1635648300000000000
Freq: 30T, dtype: int64
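
The int64 values in both outputs can also be decoded by hand to confirm that they are distinct, evenly spaced instants and that the index labels carry the correct offsets. The sketch below uses only the standard library. It assumes the EU rule that daylight saving time ended at 01:00 UTC on 2021-10-31 and hard-codes the two Paris offsets instead of consulting a time zone database; `paris_wall_time` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Nanosecond epoch values copied from Out[17] / Out[18] above.
values_ns = [
    1635637500000000000,
    1635639300000000000,
    1635641100000000000,
    1635642900000000000,
    1635644700000000000,
    1635646500000000000,
    1635648300000000000,
]

# Assumption: EU summer time ended at 01:00 UTC on 2021-10-31, when
# Europe/Paris switched from UTC+02:00 (CEST) to UTC+01:00 (CET).
DST_END_UTC = datetime(2021, 10, 31, 1, 0, tzinfo=timezone.utc)
CEST_OFFSET = timezone(timedelta(hours=2))
CET_OFFSET = timezone(timedelta(hours=1))


def paris_wall_time(ns):
    """Convert epoch nanoseconds to Paris wall time around the 2021 transition."""
    utc = datetime.fromtimestamp(ns // 1_000_000_000, tz=timezone.utc)
    return utc.astimezone(CEST_OFFSET if utc < DST_END_UTC else CET_OFFSET)


rendered = [str(paris_wall_time(ns)) for ns in values_ns]
steps = {later - earlier for earlier, later in zip(values_ns, values_ns[1:])}

for line in rendered:
    print(line)
print(steps)  # a single step size: 30 minutes in nanoseconds
```

The rendered strings match the index labels in both outputs, which is consistent with the stored data being correct and only the displayed offsets of the values being wrong.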

@mroeschke changed the title from "BUG: date_range puzzler on dst transition with dateutil tz" to "BUG: UTC offset incorrect on dst transition with dateutil tz when using date_range" on Jul 6, 2022
@jbrockmendel
Member

Fixed by #49684.


4 participants