BUG: Can't floor timezone-aware Timestamp #46757

Closed
2 of 3 tasks
rwijtvliet opened this issue Apr 12, 2022 · 4 comments
Labels: Closing Candidate (may be closeable, needs more eyeballs), Timezones (timezone data dtype), Usage Question

Comments

@rwijtvliet

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
ts = pd.Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')

# Can't floor single timestamp:
ts.floor('H') # AmbiguousTimeError: Cannot infer dst time from 2020-10-25 02:00:00, try using the 'ambiguous' argument
ts.floor('H', ambiguous='infer') # ValueError: Cannot infer offset with only one time.

Issue Description

The timestamp in the above example is well-defined, having both a timezone (Europe/Berlin) and a UTC offset (+02:00). Flooring it to the nearest full hour or quarter-hour should be possible, but it raises an error instead.

Expected Behavior

I don't think the result of the flooring operation is ambiguous. The method should return ts.

I've posted a proposal for a partial solution here.

Installed Versions

INSTALLED VERSIONS

commit : 06d2301
python : 3.9.7.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 165 Stepping 2, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252

pandas : 1.4.1
numpy : 1.22.3
pytz : 2022.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.4
Cython : None
pytest : 7.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.1.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.1
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.8.0
sqlalchemy : 1.4.32
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None

rwijtvliet added the Bug and Needs Triage labels on Apr 12, 2022
@mroeschke
Member

Given https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DatetimeIndex.floor.html

If the timestamps have a timezone, flooring will take place relative to the local (“wall”) time and re-localized to the same timezone. When flooring near daylight savings time, use nonexistent and ambiguous to control the re-localization behavior.

Since a wall time of 2020-10-25 02:00:00 can correspond to either +0200 or +0100, the result is still ambiguous, and passing ambiguous=True or ambiguous=False resolves it:

In [2]: ts = pd.Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')
<ipython-input-2-cdcc17322304>:1: FutureWarning: The 'freq' argument in Timestamp is deprecated and will be removed in a future version.
  ts = pd.Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin', freq='H')

In [3]: ts.floor("H", ambiguous=True)
Out[3]: Timestamp('2020-10-25 02:00:00+0200', tz='Europe/Berlin')

In [4]: ts.floor("H", ambiguous=False)
Out[4]: Timestamp('2020-10-25 02:00:00+0100', tz='Europe/Berlin')
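
The wall-time behavior described in the documentation can be reproduced by hand. The following is a minimal sketch, not pandas' actual implementation: it strips the timezone, floors the naive wall time, and re-localizes, which is the step where a choice between +0200 and +0100 has to be made. The deprecated freq argument is omitted, and the lowercase "h" alias is used on the assumption that a recent pandas version (where "H" is deprecated) may be installed.

```python
import pandas as pd

ts = pd.Timestamp("2020-10-25 02:00:00+0200", tz="Europe/Berlin")

# Step 1: drop the timezone but keep the local ("wall") time.
wall = ts.tz_localize(None)

# Step 2: floor the naive wall time; the UTC offset is no longer known here.
floored = wall.floor("h")

# Step 3: re-localize. 02:00 occurs twice on this date, so the caller
# has to say whether the DST (+0200) or standard (+0100) reading is meant.
as_dst = floored.tz_localize("Europe/Berlin", ambiguous=True)
as_std = floored.tz_localize("Europe/Berlin", ambiguous=False)

print(as_dst)
print(as_std)
```

The two results share the same wall time but lie one hour apart on the absolute time axis.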

mroeschke added the Usage Question, Timezones, and Closing Candidate labels and removed the Bug and Needs Triage labels on Apr 13, 2022
@rwijtvliet
Author

rwijtvliet commented Apr 13, 2022

Thanks @mroeschke for your answer.

I can see that you are able to dictate which of the '02:00' timestamps you want to select with the ambiguous parameter. However, the latter is not the correct flooring of the provided timestamp under any circumstance, and IMHO should never be returned, regardless of which parameters are specified and with which value.

My point is even stronger: the original ts is unambiguous in describing a point on the universal time axis. The implementation could therefore be improved, so that the ambiguous parameter is not even needed.

(I am not sure how universal this is. But then again, I'm not sure what I would even expect to get from flooring this to 2H.)
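
To illustrate the "point on the universal time axis" view, here is a hedged sketch that floors in UTC and converts back. This is a workaround under the assumption that the zone's UTC offset is a multiple of the flooring frequency (true for hourly flooring in Europe/Berlin); it is not what Timestamp.floor does, and for other frequencies or zones it can give a different answer than wall-time flooring.

```python
import pandas as pd

ts = pd.Timestamp("2020-10-25 02:00:00+0200", tz="Europe/Berlin")

# UTC has no DST transitions, so flooring there is never ambiguous.
floored_utc = ts.tz_convert("UTC").floor("h")

# Converting back picks the offset that belongs to that exact instant.
result = floored_utc.tz_convert("Europe/Berlin")

print(result)
```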

@mroeschke
Member

Given that 2:00 is floored relative to the wall time and not the absolute time, i.e. floor(2:00, "H") -> 2:00 in wall time, I am not sure how re-localizing 2020-10-25 02:00 is not ambiguous?

I understand that, on the surface, this should feel like a no-op, but I think this is an unfortunate edge case of consistently treating rounding as relative to wall time.

#44287

@rwijtvliet
Author

The UTC-offset (+0200) is present in ts.

If the constraint of the current implementation - namely, that this UTC-offset will be ignored - is set in stone, then yes, I agree with you, and the timestamp is ambiguous. After all, we get 02:00+0200 --wall--> 02:00 --floor--> 02:00 --Berlin--> ambiguous

I can imagine an additional check in the flooring method, where the UTC offset is not ignored but is instead used to determine the correct return value whenever the wall time alone is ambiguous. The ambiguous parameter would then be required only if the ambiguity remains even after taking the UTC offset into account.
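
A user-side approximation of that idea might look like the sketch below. The helper name floor_with_offset is hypothetical and not part of pandas; it reads the DST flag from the input timestamp and passes it as the ambiguous argument, so the original offset decides between the two readings. It has not been checked against frequencies whose floor crosses the DST transition.

```python
from datetime import timedelta

import pandas as pd


def floor_with_offset(ts, freq):
    """Hypothetical helper: floor `ts`, reusing its own DST flag to
    resolve an ambiguous wall time instead of raising."""
    # A non-zero dst() means the timestamp is on the DST side (+0200 here).
    is_dst = ts.dst() != timedelta(0)
    # The flag is only consulted when the floored wall time is ambiguous.
    return ts.floor(freq, ambiguous=is_dst)


first = pd.Timestamp("2020-10-25 02:00:00+0200", tz="Europe/Berlin")
second = pd.Timestamp("2020-10-25 02:00:00+0100", tz="Europe/Berlin")

print(floor_with_offset(first, "h"))
print(floor_with_offset(second, "h"))
```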

But from your link I see that someone else already unsuccessfully made that case, so we can close this as a duplicate.


2 participants