Skip to content

BUG: args offset in Timestamp.__new__ causes bugs #52343

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
3 tasks done
AlexKirko opened this issue Apr 1, 2023 · 7 comments
Open
3 tasks done

BUG: args offset in Timestamp.__new__ causes bugs #52343

AlexKirko opened this issue Apr 1, 2023 · 7 comments
Labels
Bug Timestamp pd.Timestamp and associated methods

Comments

@AlexKirko
Copy link
Member

AlexKirko commented Apr 1, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# Pandas 1.5.x
import pandas as pd

ts1 = pd.Timestamp(2023, 3, 30, 10, None, None, 30)
ts2 = pd.Timestamp(2023, 3, 30, 10, day=30)

> 2023-03-30 10:00:00.000030
> 2023-03-30 10:00:00.000030

# Pandas 2.x
ts1 = pd.Timestamp(2023, 3, 30, 10, None, 30)
ts2 = pd.Timestamp(2023, 3, 30, 10, second=30)

> 2023-03-30 10:00:00.000030
> 2023-03-30 10:00:00.000030

Issue Description

When we build Timestamp from components through positional args, the year initially ends up in ts_input, and we need to shift args one to the right, which we do with some kludges (normally by calling a constructor and passing it arguments in offset positions).

The problem is that we cannot tell the difference between the construction calls given in the example from inside the current Timestamp.__new__. This is the cause of both the bug outlined here and also of #52117 .

The two available solutions are to either decorate Timestamp, which causes a performance hit (see #52221) or to refactor Timestamp to be built from *args, **kwargs and handle the logic internally.

Expected Behavior

# Pandas 2.x
ts1 = pd.Timestamp(2023, 3, 30, 10, None, None, 30)
ts2 = pd.Timestamp(2023, 3, 30, 10, second=30)

> 2023-03-30 10:00:00.000030
> 2023-03-30 10:00:30

Installed Versions

INSTALLED VERSIONS

commit : 8c7b8a4
python : 3.8.16.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22621
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1251

pandas : 2.1.0.dev0+337.g8c7b8a4f3e
numpy : 1.23.5
pytz : 2023.2
dateutil : 2.8.2
setuptools : 67.6.0
pip : 23.0.1
Cython : 0.29.33
pytest : 7.2.2
hypothesis : 6.70.0
sphinx : 4.5.0
blosc : None
feather : None
xlsxwriter : 3.0.9
lxml.etree : 4.9.2
html5lib : 1.1
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : 3.1.2
IPython : 8.11.0
pandas_datareader: None
bs4 : 4.12.0
bottleneck : 1.3.7
brotli :
fastparquet : 2023.2.0
fsspec : 2023.3.0
gcsfs : 2023.3.0
matplotlib : 3.6.3
numba : 0.56.4
numexpr : 2.8.3
odfpy : None
openpyxl : 3.1.0
pandas_gbq : None
pyarrow : 11.0.0
pyreadstat : 1.2.1
pyxlsb : 1.0.10
s3fs : 2023.3.0
scipy : 1.10.1
snappy :
sqlalchemy : 2.0.7
tables : 3.7.0
tabulate : 0.9.0
xarray : 2023.1.0
xlrd : 2.0.1
zstandard : 0.19.0
tzdata : 2023.2
qtpy : None
pyqt5 : None

@AlexKirko AlexKirko added Bug Timestamp pd.Timestamp and associated methods labels Apr 1, 2023
@AlexKirko AlexKirko self-assigned this Apr 1, 2023
@AlexKirko
Copy link
Member Author

As discussed in #52221, I'll be refactoring Timestamp.__new__ as a solution.

@rhshadrach
Copy link
Member

As an alternative to what was discussed in #52221, I was wondering what we think of making ts_input keyword-only. We could still branch the logic on the type of input received for year for backwards compatibility. We would then be able to reliably get the remaining positional args even if they are specified by name.

cc @mroeschke

@mroeschke
Copy link
Member

I think if we're considering making alterations to the constructor, I think we should make the Timestamp constructor match the datetime.datetime constructor an create from_string, from_datetime, from_unit constructors for all the other arguments it accepts.

@rhshadrach
Copy link
Member

I took a more in depth look, it's not possible to do what I suggested; namely Timestamp(1, month=2, day=3) and Timestamp(year=1, month=2, day=3) would receive the same arguments but the result needs to differ.

@j-hendricks
Copy link
Contributor

I agree with mroeschke that we should build separate factory methods like from_string and from_datetime instead of handling all possible objects in the constructor as it would lead to better error messages and a separation of concerns. I will execute this approach unless their are any objections from the team :)

@rhshadrach
Copy link
Member

I'm +1 on from_string et al.

@AlexKirko AlexKirko removed their assignment Mar 26, 2025
@AlexKirko
Copy link
Member Author

I agree that adding factory methods is the better solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants