date_range returns duplicated dates when used with dateutil over a range that includes the end of daylight saving time #24373

Closed
lovasoa opened this issue Dec 20, 2018 · 4 comments
Labels: Output-Formatting (__repr__ of pandas objects, to_string), Timezones (Timezone data dtype)

Comments

lovasoa commented Dec 20, 2018

Code Sample, a copy-pastable example if possible

import pandas as pd
from dateutil.tz import gettz

TZ = 'Europe/Paris'
start = '2018-10-28T01:30' # Just before the end of daylight saving time in Paris
end = '2018-10-28T03:30' # Just after the end of daylight saving time in Paris

range_str = pd.date_range(
    start=pd.Timestamp(start, tz=TZ),
    end=pd.Timestamp(end, tz=TZ),
    freq='30min'
)


range_dateutil = pd.date_range(
    start=pd.Timestamp(start, tz=gettz(TZ)),
    end=pd.Timestamp(end, tz=gettz(TZ)),
    freq='30min'
)

# Fails on pandas 0.23.4: the string representations differ during the repeated hour
assert (range_str.astype('str') == range_dateutil.astype('str')).all()

Problem description

In the example above, range_dateutil is:

# range_dateutil
DatetimeIndex([
'2018-10-28 01:30:00+02:00',
'2018-10-28 02:00:00+02:00',
'2018-10-28 02:30:00+02:00',
'2018-10-28 02:00:00+02:00', # should be +01:00
'2018-10-28 02:30:00+02:00', # should be +01:00
'2018-10-28 03:00:00+01:00',
'2018-10-28 03:30:00+01:00'
],
dtype='datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]',
freq='30T')

Here range_str is correct, but range_dateutil displays duplicated datetimes:
'2018-10-28 02:00:00+02:00' and '2018-10-28 02:30:00+02:00' each appear twice. The second occurrence of each should carry the +01:00 offset.

I am not sure whether the problem is in pandas or dateutil.

Expected Output

# range_dateutil
DatetimeIndex([
'2018-10-28 01:30:00+02:00',
'2018-10-28 02:00:00+02:00',
'2018-10-28 02:30:00+02:00',
'2018-10-28 02:00:00+01:00',
'2018-10-28 02:30:00+01:00',
'2018-10-28 03:00:00+01:00',
'2018-10-28 03:30:00+01:00'
],
dtype='datetime64[ns, tzfile('/usr/share/zoneinfo/Europe/Paris')]',
freq='30T')

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 18.1
setuptools: 18.5
Cython: None
numpy: 1.15.4
scipy: 0.13.0b1
pyarrow: None
xarray: None
IPython: 5.8.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 1.3.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

lovasoa commented Dec 20, 2018

Here are the contents of my local /usr/share/zoneinfo/Europe/Paris:
Europe_Paris_tzfile.zip

mroeschke (Member) commented:

This may be a repr issue, since the underlying epoch timestamps of the two ranges are identical:

# range_pytz is your range_str
In [9]: range_pytz.asi8 == range_dateutil.asi8
Out[9]: array([ True,  True,  True,  True,  True,  True,  True])
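
For readers who want to see why identical epoch values can still print with repeated wall-clock times, here is a minimal standard-library sketch. It does not use pandas or dateutil. The helper to_paris and the constants DST_END_UTC, CEST, and CET are hypothetical and hard-coded for the single Europe/Paris transition on 2018-10-28, so treat it as an illustration under those assumptions and not as how pandas computes the range.

```python
from datetime import datetime, timedelta, timezone

# Assumption: hard-coded for Europe/Paris on 2018-10-28 only.
# DST ended at 01:00 UTC, when clocks went from 03:00 CEST back to 02:00 CET.
DST_END_UTC = datetime(2018, 10, 28, 1, 0, tzinfo=timezone.utc)
CEST = timezone(timedelta(hours=2))  # summer time, UTC+02:00
CET = timezone(timedelta(hours=1))   # standard time, UTC+01:00


def to_paris(utc_dt):
    """Hypothetical helper: convert a UTC instant to Paris time (this date only)."""
    return utc_dt.astimezone(CEST if utc_dt < DST_END_UTC else CET)


# 2018-10-28T01:30+02:00 corresponds to 2018-10-27T23:30Z.
start_utc = datetime(2018, 10, 27, 23, 30, tzinfo=timezone.utc)

# Seven instants, 30 minutes apart, matching the range in the issue.
instants = [start_utc + timedelta(minutes=30 * i) for i in range(7)]

epochs = [int(t.timestamp()) for t in instants]       # analogous to .asi8
labels = [to_paris(t).isoformat() for t in instants]  # what a correct repr shows

for epoch, label in zip(epochs, labels):
    print(epoch, label)
```

The seven epoch values are all distinct, while the wall-clock part of the labels repeats 02:00 and 02:30. Only the UTC offset tells the two occurrences apart, which is consistent with the problem being in how the offset is rendered.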

mroeschke added the Output-Formatting and Timezones labels on Dec 20, 2018

mroeschke (Member) commented:

Supporting this may be a heavy lift at the moment: dateutil appears to use the new fold attribute (PEP 495, Python 3.6+) to resolve ambiguous times, and pandas still supports Python 2 (though not for long).

Note that the underlying data is still correct.
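
To illustrate the fold mechanism mentioned above, here is a small standard-library sketch. The class ParisOct2018 is a hypothetical toy tzinfo that is only valid around 2018-10-28 in Paris. It is not dateutil's implementation. It only shows how a tzinfo can consult fold to choose an offset during the repeated hour.

```python
from datetime import datetime, timedelta, tzinfo


class ParisOct2018(tzinfo):
    """Hypothetical toy tzinfo, valid only around 2018-10-28 in Paris."""

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        if naive < datetime(2018, 10, 28, 2, 0):
            return timedelta(hours=1)   # still summer time (CEST)
        if naive < datetime(2018, 10, 28, 3, 0):
            # Ambiguous hour: fold=0 is the first pass (CEST),
            # fold=1 is the second pass (CET).
            return timedelta(hours=1) if dt.fold == 0 else timedelta(0)
        return timedelta(0)             # standard time (CET)

    def utcoffset(self, dt):
        return timedelta(hours=1) + self.dst(dt)

    def tzname(self, dt):
        return "CEST" if self.dst(dt) else "CET"


tz = ParisOct2018()
first = datetime(2018, 10, 28, 2, 0, tzinfo=tz)  # fold defaults to 0
second = first.replace(fold=1)                   # same wall clock, second pass

print(first.isoformat())   # 2018-10-28T02:00:00+02:00
print(second.isoformat())  # 2018-10-28T02:00:00+01:00
```

Code that formats such datetimes without propagating fold would always see the fold=0 offset, which would produce the repeated +02:00 entries shown in the report. Python 2 datetimes have no fold attribute at all.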

mroeschke (Member) commented:

Fixed by #31563
