-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Timestamp fails when fold is passed and building from components as positional args (Pandas 2.0 regression) #52117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
From the original implementation in 6e04264, I think the error message could have been clarified as |
Perhaps. This could be a won'tfix, then. Still it is a bit surprising. # 1/ This works, following calls are equivalent AFAIU
Timestamp(2019, 10, 27, 1, 30, tz=tz)
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz)
# 2/ This works
Timestamp(year=2019, month=10, day=27, hour=1, minute=30, tz=tz, fold=fold)
# 3/ This does not work
Timestamp(2019, 10, 27, 1, 30, tz=tz, fold=fold) Why? Pandas understands the positional args, so how come 3/ is not equivalent to 2/? I may lack time/skills to dig into the code and understand why, but it is not intuitive to me. Besides, it worked with Pandas 1.5. In any case, the error message is misleading. And probably the docs too. |
@AlexKirko do you happen to remember when adding I'm trying to determine if originally in #31563 if 3/ was never supposed to work even in 1.5 |
Hello, this is what I can remember or could dig up: In the examples that we documented then, we instruct the user to build from a naive datetime or keywords: pd.Timestamp(datetime.datetime(2019, 10, 27, 1, 30, 0, 0),
tz='dateutil/Europe/London', fold=0)
pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
tz='dateutil/Europe/London', fold=1) We never say that the user can't build from positional arguments. Now, the code responsible for the check is here: if fold is not None:
if fold not in [0, 1]:
raise ValueError(
"Valid values for the fold argument are None, 0, or 1."
)
if (ts_input is not _no_input and not (
PyDateTime_Check(ts_input) and
getattr(ts_input, "tzinfo", None) is None)):
raise ValueError(
"Cannot pass fold with possibly unambiguous input: int, "
"float, numpy.datetime64, str, or timezone-aware "
"datetime-like. Pass naive datetime-like or build "
"Timestamp from components."
) From what I remember and see in the code, we intended to raise an exception here only when we are trying to build from a non-ambiguous datetime-like input: one that passes I would say, the behavior in 2.0 is a bug. If we support the positional argument convention to mimic My guess would be that something in 2.0 makes us pass a |
Spot on. That would be consistent, indeed.
Thanks @AlexKirko. |
Hm, on 1.5.3 # Allow fold only for unambiguous input
if fold is not None:
if fold not in [0, 1]:
raise ValueError(
"Valid values for the fold argument are None, 0, or 1."
) Also, on 1.5.3 I'll go through the constructor code and see what changed. |
Somehow we used to go into this if tzinfo is not None:
# GH#17690 tzinfo must be a datetime.tzinfo object, ensured
# by the cython annotation.
if tz is not None:
if (is_integer_object(tz)
and is_integer_object(ts_input)
and is_integer_object(freq)
):
# GH#31929 e.g. Timestamp(2019, 3, 4, 5, 6, tzinfo=foo)
# TODO(GH#45307): this will still be fragile to
# mixed-and-matched positional/keyword arguments
ts_input = datetime(
ts_input,
freq,
tz,
unit or 0,
year or 0,
month or 0,
day or 0,
fold=fold or 0,
)
nanosecond = hour
tz = tzinfo
print("auxilary constructor")
print(ts_input)
return cls(ts_input, nanosecond=nanosecond, tz=tz)
raise ValueError('Can provide at most one of tz, tzinfo') Now we no longer do, which looks correct, but where before we would pass a timezone-aware datetime without the fold argument, we now keep going and pass positional arguments. |
Looks like the culprit is the predicate. We currently have: if (ts_input is not _no_input and not (
PyDateTime_Check(ts_input) and
getattr(ts_input, "tzinfo", None) is None)): Which is equivalent to: if (ts_input is not _no_input and (
not PyDateTime_Check(ts_input) or
getattr(ts_input, "tzinfo", None) is not None)): I think we should have: if (ts_input is not _no_input and
PyDateTime_Check(ts_input) and
getattr(ts_input, "tzinfo", None) is not None): This fixes the OP's problem, let me see is this passes the test suite. |
Of course not. Anyway, I'll find a predicate variant that does what we want and passes the tests. |
Also, this looks like a kludge to me: elif is_integer_object(year):
# User passed positional arguments:
# Timestamp(year, month, day[, hour[, minute[, second[,
# microsecond[, tzinfo]]]]])
ts_input = datetime(ts_input, year, month, day or 0,
hour or 0, minute or 0, second or 0, fold=fold or 0)
unit = None In fact, ts_input is the year (which is what the datetime constructor expects). print(ts_input)
print(_date_attributes)
> 2020
> [1, 1, 0, 0, None, None, None, None] This should probably be carefully fixed in a separate PR, if it's not just me finding weird that month is assigned to the variable named |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Code works with Pandas 1.5. Possible Pandas 2.0 regression?
Search yielded #44378 except I do build from components.
When passing components as kwargs, it works:
I figured maybe I'm not using the constructor correctly but I couldn't find a mention in the docs preventing this usage.
The only test I found regarding this is this one:
pandas/pandas/tests/scalar/timestamp/test_constructors.py
Lines 839 to 847 in ee84ef2
It is this test that led me to the "pass all as kwargs" solution.
Is this a bug or a known limitation?
Expected Behavior
Installed Versions
INSTALLED VERSIONS
commit : c2a7f1a
python : 3.9.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.0-21-amd64
Version : #1 SMP Debian 5.10.162-1 (2023-01-21)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8
pandas : 2.0.0rc1
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.3.0
pip : 23.0.1
Cython : None
pytest : 7.2.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.9.5
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
snappy : None
sqlalchemy : 2.0.4
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : None
qtpy : None
pyqt5 : None
The text was updated successfully, but these errors were encountered: