BUG: 'infer_freq' does not work with tz != "UTC" #39556

Closed
sdementen opened this issue Feb 2, 2021 · 3 comments · Fixed by #39644
Labels: Datetime (Datetime data dtype), Timezones (Timezone data dtype)
Milestone: 1.3

Comments

@sdementen
Contributor

This issue was reported in #8772, which was closed with a request for an example (#8772 (comment)).

Here is the example:

import pandas

for tz in [None, "UTC", "CET"]:
    index = pandas.date_range("2018", "2019", tz=tz, closed="left", freq='H')
    print(f"tz = {tz}, index freq = {index.freq}, inferred index freq = {pandas.infer_freq(index)}")

Problem description

pandas.infer_freq can infer the frequency when the index has no tz or tz=="UTC". But for tz=="CET", it returns None.

This is caused by special handling in https://github.com/pandas-dev/pandas/blob/master/pandas/tseries/frequencies.py#L198 that caters for timezone DST in frequencies coarser than an hour (day, month, ...). For a regular hourly frequency, however, the delta is constant in real time/UTC but takes several values once DST is taken into account: 0 at the DST change in October, 2H at the DST change in March, and 1H the rest of the time.
This probably needs two paths: one for business frequencies (days, months, ...) and one for frequencies of an hour and below. I am not sure about frequencies between H and D (like 6H), but they should probably be treated as business frequencies.
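The multiple deltas described above can be made visible with a small sketch. It assumes a pandas installation with timezone data for "CET"; the variable names are illustrative and the index is built in UTC steps so that it is hourly in real time by construction.

```python
import pandas as pd

# One year of hourly timestamps, generated in UTC and then viewed in CET.
utc_index = pd.date_range(
    "2018-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1), tz="UTC"
)
cet_index = utc_index.tz_convert("CET")


def unique_step_hours(naive_index):
    """Return the sorted distinct gaps, in hours, between consecutive timestamps."""
    steps = pd.Series(naive_index).diff().dropna()
    return sorted(set(steps.dt.total_seconds() / 3600))


# Real-time view: drop the tz after converting to UTC.
utc_steps = unique_step_hours(cet_index.tz_convert("UTC").tz_localize(None))

# Wall-clock view: drop the tz while keeping the local CET clock readings.
wall_steps = unique_step_hours(cet_index.tz_localize(None))

print("UTC steps (hours):", utc_steps)
print("CET wall-clock steps (hours):", wall_steps)
```

The UTC view has a single step size, while the wall-clock view also contains the 0-hour and 2-hour steps at the two DST changes.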

Expected Output

# actual output (the expected inferred index freq is H in all three cases)
# tz = None, index freq = <Hour>, inferred index freq = H
# tz = UTC, index freq = <Hour>, inferred index freq = H
# tz = CET, index freq = <Hour>, inferred index freq = None

Output of pd.show_versions()

INSTALLED VERSIONS

commit : b5958ee
python : 3.8.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : French_Belgium.1252

pandas : 1.1.5
numpy : 1.19.4
pytz : 2020.4
dateutil : 2.8.1
pip : 20.2.4
setuptools : 50.3.2
Cython : 0.29.17
pytest : 6.1.2
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.19.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : 0.8.4
fastparquet : 0.4.2
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 2.0.0
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
sqlalchemy : None
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.52.0

@sdementen added the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Feb 2, 2021
@sdementen
Contributor Author

My current workaround is to bypass the conversion:

index = pandas.date_range("2018", "2019", tz="CET", closed="left", freq='H')
freq = pandas.infer_freq(index.tz_convert("UTC"))
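The workaround can be wrapped in a small helper. This is only a sketch: `infer_freq_via_utc` is a hypothetical name, not a pandas function, and the returned alias for hourly data depends on the pandas version.

```python
import pandas as pd


def infer_freq_via_utc(index):
    """Infer a frequency after converting a tz-aware DatetimeIndex to UTC.

    Hypothetical helper illustrating the workaround; not part of pandas.
    """
    if index.tz is not None:
        # Work on real-time (UTC) values so DST shifts do not distort the deltas.
        index = index.tz_convert("UTC")
    return pd.infer_freq(index)


# Hourly index in CET, built from UTC steps so it is regular in real time.
hourly_cet = pd.date_range(
    "2018-01-01", periods=365 * 24, freq=pd.Timedelta(hours=1), tz="UTC"
).tz_convert("CET")

print(infer_freq_via_utc(hourly_cet))
```

This only suits frequencies of an hour and below. A daily index at local midnight is itself irregular when viewed in UTC (23- and 25-hour steps at the DST changes), so the conversion would hide the daily frequency instead of revealing it.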

sdementen added a commit to sdementen/pandas that referenced this issue Feb 7, 2021
…and pandas-dev#8772)

Fixes the issues pandas-dev#39556 and pandas-dev#8772 by ensuring that the check for delta being a multiple of a frequency also checks the delta is not 0 (which is a multiple of any number).
@MarcoGorelli
Member

Thanks @sdementen - you have an old version of pandas (1.1.5), but I can confirm this reproduces on master

@MarcoGorelli added the Timezones (Timezone data dtype) and Datetime (Datetime data dtype) labels and removed the Bug and Needs Triage (issue that has not been reviewed by a pandas team member) labels Feb 7, 2021
@sdementen
Contributor Author

I am working on a simple fix/PR

sdementen pushed a commit to sdementen/pandas that referenced this issue Feb 7, 2021
- check that the deltas are unique before checking if they are day multiples
- add test with freq="H" that triggers the bug
@jreback jreback added this to the 1.3 milestone Feb 22, 2021
mroeschke pushed a commit that referenced this issue Feb 23, 2021
* fix #39556:
- check that the deltas are unique before checking if they are day multiples
- add test with freq="H" that triggers the bug

* fix lint

* when freq=="B", the deltas are not unique (1 or 3 days) => use the minimum of the deltas as delta and check that delta is not zero

* add whatsnew entry

* as self.deltas is already ordered, deltas[0] is the minimum delta

Co-authored-by: GFJ138 <[email protected]>
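The idea behind the final commits can be illustrated with a simplified sketch. This is not the pandas source: the function names are hypothetical and deltas are plain integers in seconds. It only shows why a zero delta must be excluded before testing whether the smallest delta is a multiple of one day.

```python
ONE_HOUR = 3600            # seconds
ONE_DAY = 24 * ONE_HOUR    # seconds


def looks_daily_without_zero_check(deltas):
    """Treat the index as daily-or-coarser if the smallest delta is a day multiple."""
    delta = min(deltas)
    # 0 % ONE_DAY == 0, so a zero delta is wrongly accepted as a day multiple.
    return delta % ONE_DAY == 0


def looks_daily_with_zero_check(deltas):
    """Same test, but a zero delta no longer counts as a day multiple."""
    delta = min(deltas)
    return delta != 0 and delta % ONE_DAY == 0


# Wall-clock deltas of an hourly CET index: 0 (October), 1H, and 2H (March).
hourly_with_dst = [0, ONE_HOUR, 2 * ONE_HOUR]
# Deltas of a business-day index: 1 day, or 3 days across a weekend.
business_days = [ONE_DAY, 3 * ONE_DAY]

print(looks_daily_without_zero_check(hourly_with_dst))  # True (misclassified)
print(looks_daily_with_zero_check(hourly_with_dst))     # False
print(looks_daily_with_zero_check(business_days))       # True
```

Using the minimum delta rather than requiring a single unique delta keeps business-day indexes on the daily path, which matches the reasoning given in the commit message for freq=="B".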