BUG: Passing datetime strings of the form "1-01-01 00:00:00" #37071

Closed
3 tasks done
znicholls opened this issue Oct 12, 2020 · 4 comments

@znicholls
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

>>> import pandas as pd
>>> pd.Timestamp("1-01-01 00:00:00")
Timestamp('2001-01-01 00:00:00')

Problem description

Parsing "1-01-01 00:00:00" as "2001-01-01 00:00:00" is clearly wrong; this should raise an out-of-bounds datetime error.

Expected Output

>>> import pandas as pd
>>> pd.Timestamp("1-01-01 00:00:00")
OutOfBoundsDatetime

However, the problem here appears to be deeper than pandas, specifically

>>> from dateutil import parser
>>> parser.parse("1-01-01 00:00:00")
datetime.datetime(2001, 1, 1, 0, 0)

If it's helpful, I'm happy to raise an issue in https://github.com/dateutil/dateutil instead. It's not clear to me where the error really is (maybe dateutil has a very good reason for parsing "1-01-01 00:00:00" as "2001-01-01 00:00:00").

Output of pd.show_versions()

INSTALLED VERSIONS

commit : cd88ee2
python : 3.8.5.final.0
python-bits : 64
OS : Darwin
OS-release : 18.7.0
Version : Darwin Kernel Version 18.7.0: Mon Aug 31 20:53:32 PDT 2020; root:xnu-4903.278.44~1/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : en_AU.UTF-8
LOCALE : en_AU.UTF-8

pandas : 1.2.0.dev0+746.gcd88ee2dd
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.3
setuptools : 49.1.0.post20200704
Cython : 0.29.21
pytest : 6.1.1
hypothesis : 5.37.1
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.6
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.8.3
fastparquet : 0.4.0
gcsfs : 0.7.1
matplotlib : 3.2.2
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 0.17.1
pytables : None
pyxlsb : None
s3fs : 0.4.2
scipy : 1.5.1
sqlalchemy : 1.3.19
tables : 3.6.1
tabulate : 0.8.7
xarray : 0.15.1
xlrd : 1.2.0
xlwt : 1.3.0
numba : 0.48.0

@znicholls znicholls added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 12, 2020
@znicholls
Contributor Author

I have put a failing test in #37070 in case that is helpful

@jorisvandenbossche
Member

We indeed rely here on the behaviour of dateutil.

Note that specifying the year as 4 numbers gives the expected result:

In [21]: pd.Timestamp("0001-01-01 00:00:00")
...
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1-01-01 00:00:00

The funny (or ironic) thing is that the error message shows the datetime in the very format you are using, which is the one that doesn't work as input.
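For context on why year 1 is out of bounds at all: nanosecond timestamps are stored as a signed 64-bit count of nanoseconds since 1970-01-01, which only covers a few centuries. The sketch below reproduces that range with the standard library alone, truncated to microseconds because `datetime` cannot hold nanoseconds. The names `EPOCH`, `INT64_MAX`, `upper` and `lower` are illustrative, not pandas identifiers.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MAX = 2**63 - 1  # largest nanosecond offset a signed 64-bit integer can hold

# Convert the nanosecond limit to whole microseconds (datetime's resolution).
span = timedelta(microseconds=INT64_MAX // 1000)

upper = EPOCH + span  # approximate latest representable instant
lower = EPOCH - span  # approximate earliest representable instant

print(lower, upper)
print(datetime(1, 1, 1) < lower)  # year 1 lies far before the lower bound
```

Any date in year 1 falls well outside this window, which is why the four-digit spelling raises instead of returning a value.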

I was going to say that if you don't want to rely on dateutil "inferring" the date, you should pass a format string so that parsing is always consistent (https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes). But I can't find a way to specify the format for a year like "1" (both %Y and %y expect zero-padded years).
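One possible way around the missing format code, assuming the input is always ordered year-month-day, is to split the fields by hand and build a standard-library `datetime` directly. This is only a sketch: `parse_year_first` and `_PATTERN` are hypothetical names, not part of pandas or dateutil, and the result is a plain `datetime`, not a `Timestamp`.

```python
import re
from datetime import datetime

# Year may be 1 to 4 digits; nothing is inferred or expanded to a 2000s year.
_PATTERN = re.compile(r"(\d{1,4})-(\d{1,2})-(\d{1,2}) (\d{1,2}):(\d{2}):(\d{2})")


def parse_year_first(text):
    """Parse 'Y-MM-DD HH:MM:SS', taking the year literally (hypothetical helper)."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"unsupported datetime string: {text!r}")
    year, month, day, hour, minute, second = map(int, match.groups())
    # datetime() itself rejects year 0 and impossible dates with ValueError.
    return datetime(year, month, day, hour, minute, second)


print(parse_year_first("1-01-01 00:00:00"))
```

Because the year is taken literally, "1-01-01 00:00:00" and "0001-01-01 00:00:00" produce the same value, and a caller can then decide how to handle dates outside the nanosecond range.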

@znicholls
Contributor Author

We indeed rely here on the behaviour of dateutil

Ok in that case I'll add an issue in that repo and let's see what happens.

@znicholls
Contributor Author

znicholls commented Oct 13, 2020

Ok in that case I'll add an issue in that repo and let's see what happens

Having thought about it more and looked through dateutil, the input is indeed ambiguous (e.g. 2-03-04 could be 2nd March of year 4, 2nd March 2004, or 4th March of year 2), so I'll go up the chain instead to where this originally arose (xarray). Thanks for your help @jorisvandenbossche 👍
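To make the ambiguity concrete, here is a small standard-library sketch that spells out the three readings of "2-03-04" listed above. The function `candidate_readings` and its dictionary keys are made up for illustration, and this is not how dateutil resolves the string.

```python
from datetime import date


def candidate_readings(text):
    """Return the three readings of an 'a-b-c' date string (illustrative only)."""
    a, b, c = (int(part) for part in text.split("-"))
    return {
        "day-month-year, literal year": date(c, b, a),
        "day-month-year, two-digit year in the 2000s": date(2000 + c, b, a),
        "year-month-day, literal year": date(a, b, c),
    }


readings = candidate_readings("2-03-04")
for label, value in readings.items():
    print(f"{label}: {value.isoformat()}")
```

All three are valid calendar dates, so a parser has to pick a convention; nothing in the string itself says which one was meant.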

@jorisvandenbossche jorisvandenbossche added Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 13, 2020