Skip to content

BUG: Can't floor timezone aware DatetimeIndex #46703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
2 of 3 tasks
rwijtvliet opened this issue Apr 8, 2022 · 5 comments
Closed
2 of 3 tasks

BUG: Can't floor timezone aware DatetimeIndex #46703

rwijtvliet opened this issue Apr 8, 2022 · 5 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Index Related to the Index class or subclasses Needs Triage Issue that has not been reviewed by a pandas team member Timezones Timezone data dtype Usage Question

Comments

@rwijtvliet
Copy link

rwijtvliet commented Apr 8, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
i1 = pd.date_range('2020-10-25 1:00', freq='H', periods= 4, tz='Europe/Berlin')
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
#                '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
#               dtype='datetime64[ns, Europe/Berlin]', freq='H')
ts1 = i1[1]
# Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')

# This error...
i1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
# ...can be solved using the `ambiguous` argument if we floor an index:
i1.floor('H', ambiguous='infer')

# But that won't work on the single timestamp:
ts1.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
ts1.floor('H', ambiguous='infer') # ValueError: Cannot infer offset with only one time.

Issue Description

All timestamps in the above example are well-defined having both a timezone (Europe/Berlin) and a UTC-offset (+2 or +1). Flooring them to the nearest full (quarter)hour should be possible, but raises an error instead.

Expected Behavior

In this case, I don't think the result of either flooring operation is ambiguous. The first should return i1, and the second should return ts1.

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

@rwijtvliet rwijtvliet added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 8, 2022
@rhshadrach rhshadrach added Timezones Timezone data dtype Index Related to the Index class or subclasses datetime.date stdlib datetime.date support labels Apr 9, 2022
@mroeschke
Copy link
Member

I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html

If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.

@mroeschke mroeschke added Usage Question Closing Candidate May be closeable, needs more eyeballs and removed Bug datetime.date stdlib datetime.date support labels Apr 9, 2022
@rwijtvliet
Copy link
Author

rwijtvliet commented Apr 10, 2022

Thank you for looking into this.

The nonexistent does not apply here (we're dealing only with existing timestamps), and ambiguous is only used for DatetimeIndex. (I've updated the example above to use Timestamp as well, as this too is part of my use-case.)

Current (imperfect) workaround for timestamps:

floored_ts1 = ts1.tz_convert('UTC').floor('H').tz_convert(ts1.tz) # NB: incorrect if ts1 has fractional offset (like 03:30 or 03:45) to UTC.
floored_ts1.freq = ts1.freq

@mroeschke
Copy link
Member

and ambiguous is only used for DatetimeIndex

This is a DatetimeIndex

In [4]: type(i1)
Out[4]: pandas.core.indexes.datetimes.DatetimeIndex

Since this DatetimeIndex has Timestamps that fall within the DST transition, this is an ambiguous operation since it's equivalent to

In [5]: i1.tz_localize(None).floor("H").tz_localize("Europe/Berlin")
AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument

So you'll need to specify one of the ambiguous options

‘infer’ will attempt to infer fall dst-transition hours based on order

bool-ndarray where True signifies a DST time, False designates a non-DST time (note that this flag is only applicable for ambiguous times)

‘NaT’ will return NaT where there are ambiguous times

@rwijtvliet rwijtvliet changed the title BUG: Can't floor timezone aware DatetimeIndex BUG: Can't floor timezone aware Timestamp Apr 10, 2022
@rwijtvliet
Copy link
Author

rwijtvliet commented Apr 10, 2022

Thanks @mroeschke for your answer.

My question was originally about DatetimeIndex, but I now recognise the issue is primarily with Timestamp, which is why I changed the bugreport yesterday afternoon.

Either way, maybe the method below is helpful. Compared to the current implementation, it...

  • ...does work for Timestamp;
  • ...does not require the ambiguous argument for DatetimeIndex; and
  • ...retains the freq information.
import pandas as pd
import pytz


def floor2(self, freq):
    if self.tz is None:
        return self.__class__(self.floor(freq), freq=self.freq)

    # All timestamps are converted to the anchor's UTC offset.
    anchor = self if isinstance(self, pd.Timestamp) else self[0]

    # Turn into fixed offset to eliminate ambiguity...
    fo = pytz.FixedOffset(anchor.utcoffset().total_seconds() / 60)
    # (fo = 'UTC' gives incorrect result if UTC-offset is not integer number of hours)
    # ...then floor and convert back to original timezone...
    newinstance = self.tz_convert(fo).floor(freq).tz_convert(self.tz)
    # ...and keep original frequency.
    return self.__class__(newinstance, freq=self.freq)


pd.DatetimeIndex.floor2 = floor2
pd.Timestamp.floor2 = floor2

# Works for flooring DatetimeIndex to quarterhour
# ------------------------------------------------------
i0 = pd.date_range("2020-10-25 1:11", freq="H", periods=4)
i0.floor2("15T")
#DatetimeIndex(['2020-10-25 01:00:00', '2020-10-25 02:00:00',
#               '2020-10-25 03:00:00', '2020-10-25 04:00:00'],
#              dtype='datetime64[ns]', freq='H')

i1 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Europe/Berlin")
i1.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+02:00', '2020-10-25 02:00:00+02:00',
#                '2020-10-25 02:00:00+01:00', '2020-10-25 03:00:00+01:00'],
#               dtype='datetime64[ns, Europe/Berlin]', freq='H')

i2 = pd.date_range("2020-10-25 1:11", freq="H", periods=4, tz="Asia/Kolkata")
i2.floor2("15T")
# DatetimeIndex(['2020-10-25 01:00:00+05:30', '2020-10-25 02:00:00+05:30',
#                '2020-10-25 03:00:00+05:30', '2020-10-25 04:00:00+05:30'],
#               dtype='datetime64[ns, Asia/Kolkata]', freq='H')

# Works for flooring Timestamp to quarterhour
# --------------------------------------------------
i0[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00', freq='H')
i1[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0200', tz='Europe/Berlin', freq='H')
i2[0].floor2("15T")
# Timestamp('2020-10-25 01:00:00+0530', tz='Asia/Kolkata', freq='H')

It's not finished/perfect yet - e.g. flooring to days is not done correctly. What's needed is a check: if the flooring is to the nearest hour, or shorter, the above code can be used. If it's to a frequency that's longer than an hour, the original code could be used.

But this is quickly turning into a code review. Let me know if you think it's helpful if I turn this into a pull request.

@rwijtvliet rwijtvliet changed the title BUG: Can't floor timezone aware Timestamp BUG: Can't floor timezone aware DatetimeIndex Apr 12, 2022
@lucasjamar
Copy link

I think this is the expected behavior as stated in the docs in the Notes section https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html

If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.

@mroeschke , why is "wall" time selected for flooring the datetime? why are times not converted by default to "UTC", floored and then returned again to set timezone?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Index Related to the Index class or subclasses Needs Triage Issue that has not been reviewed by a pandas team member Timezones Timezone data dtype Usage Question
Projects
None yet
Development

No branches or pull requests

4 participants