-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Can't floor timezone aware DatetimeIndex
#46703
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html
|
Thank you for looking into this. The Current (imperfect) workaround for timestamps: floored_ts1 = ts1.tz_convert('UTC').floor('H').tz_convert(ts1.tz) # NB: incorrect if ts1 has fractional offset (like 03:30 or 03:45) to UTC.
floored_ts1.freq = ts1.freq |
This is a DatetimeIndex
Since this DatetimeIndex has Timestamps that fall within the DST transition, this is an ambiguous operation since it's equivalent to
So you'll need to specify one of the
|
DatetimeIndex
Timestamp
Thanks @mroeschke for your answer. My question was originally about Either way, maybe the method below is helpful. Compared to the current implementation, it...
import pandas as pd
import pytz
def floor2(self, freq):
if self.tz is None:
return self.__class__(self.floor(freq), freq=self.freq)
# All timestamps are converted to the anchor's UTC offset.
anchor = self if isinstance(self, pd.Timestamp) else self[0]
# Turn into fixed offset to eliminate ambiguity...
fo = pytz.FixedOffset(anchor.utcoffset().total_seconds() / 60)
# (fo = 'UTC' gives incorrect result if UTC-offset is not integer number of hours)
# ...then floor and convert back to original timezone...
newinstance = self.tz_convert(fo).floor(freq).tz_convert(self.tz)
# ...and keep original frequency.
return self.__class__(newinstance, freq=self.freq)
pd.DatetimeIndex.floor2 = floor2
pd.Timestamp.floor2 = floor2
# Works for flooring DatetimeIndex to quarterhour
# ------------------------------------------------------
i0 = pd.date_range("2020-10-25 1:11", freq="H", periods=4)
i0.floor2("15T")
#DatetimeIndex(['2020-10-25 01:00:00', '2020-10-25 02:00:00',
# '2020-10-25 03:00:00', '2020-10-25 04:00:00'],
# dtype='datetime64[ns]', freq='H')
i1 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Europe/Berlin")
i1.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
# '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
# dtype='datetime64[ns, Europe/Berlin]', freq='H')
i2 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Asia/Kolkata")
i2.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+05:30', '2020-10-25 02:00:00+05:30',
# '2020-10-25 03:00:00+05:30', '2020-10-25 04:00:00+05:30'],
# dtype='datetime64[ns, Asia/Kolkata]', freq='H')
# Works for flooring Timestamp to quarterhour
# --------------------------------------------------
i0[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00', freq='H')
i1[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0200', tz='Europe/Berlin', freq='H')
i2[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0530', tz='Asia/Kolkata', freq='H') It's not finished/perfect yet - e.g. flooring to days is not done correctly. What's needed is a check: if the flooring is to the nearest hour, or shorter, the above code can be used. If it's to a frequency that's longer than an hour, the original code could be used. But this is quickly turning into a code review. Let me know if you think it's helpful if I turn this into a pull request. |
Timestamp
DatetimeIndex
@mroeschke , why is "wall" time selected for flooring the datetime? why are times not converted by default to "UTC", floored and then returned again to set timezone? |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
All timestamps in the above example are well-defined having both a timezone (Europe/Berlin) and a UTC-offset (+2 or +1). Flooring them to the nearest full (quarter)hour should be possible, but raises an error instead.
Expected Behavior
In this case, I don't think the result of either flooring operation is ambiguous. The first should return
i1
, and the second should returnts1
.Installed Versions
INSTALLED VERSIONS
commit : 06d2301
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
The text was updated successfully, but these errors were encountered: