REGR: Column with datetime values too big to be converted to pd.Timestamp leads to assertion error in groupby #36003
Comments
Thanks @Khris777. This seems to be a regression from 1.0.5:
In [1]: import pandas as pd
   ...: import datetime
   ...:
   ...: df = pd.DataFrame({'A': ['X', 'Y'], 'B': [datetime.datetime(2005, 1, 1, 10, 30, 23, 540000),
   ...:                                           datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)]})
   ...: df.groupby("A")["B"].max()
   ...:
Out[1]:
A
X    2005-01-01 10:30:23.540000
Y    3005-01-01 10:30:23.540000
Name: B, dtype: object
In [2]: pd.__version__
Out[2]: '1.0.5'
Actually this was raising a TypeError after 4edcc55 but prior to the AssertionError.
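For comparison, the 1.0.5 output is what a plain per-group maximum over the raw datetime.datetime objects gives. A minimal sketch using only the standard library; group_max is a hypothetical helper for illustration and says nothing about how pandas implements groupby internally:

```python
import datetime

# Same two rows as in the reproduction: (group key "A", value "B").
rows = [
    ("X", datetime.datetime(2005, 1, 1, 10, 30, 23, 540000)),
    ("Y", datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)),
]


def group_max(pairs):
    """Return {key: largest value} without converting the values to any other type."""
    result = {}
    for key, value in pairs:
        if key not in result or value > result[key]:
            result[key] = value
    return result


expected = group_max(rows)
for key, value in expected.items():
    print(key, value, type(value).__name__)
```

Both values stay datetime.datetime, which corresponds to the object-dtype result shown for 1.0.5.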
moved off 1.1.2 milestone (scheduled for this week) as no PRs to fix in the pipeline
can confirm first bad commit: [4edcc55] CLN: Make Series._values match Index._values (#31182) https://github.com/simonjayhawkins/pandas/runs/1170442784?check_suite_focus=true
moved off 1.1.3 milestone (overdue) as no PRs to fix in the pipeline
moved off 1.1.4 milestone (scheduled for release tomorrow) as no PRs to fix in the pipeline
[x] I have checked that this issue has not already been reported.
[x] I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
Two different dates, one within the range of what pd.Timestamp can handle, the other outside of that range:
Problem description
pd.Timestamp can't deal with a date that is too far in the future, like the year 3005, so to represent such a date I need to use the datetime.datetime type. Before 1.1.1 (1.1.0?) this wasn't an issue, but now this code throws an assertion error:
From testing with a mix of pd.Timestamp and datetime.datetime types, I presume pandas is converting the applicable dates (first line in the example) to pd.Timestamp while leaving the others as datetime.datetime, leading to a mixed-type result column and the assertion error.
Expected Output
Since I'm explicitly operating with the datetime.datetime datatype, there should be no implicit conversion to pd.Timestamp unless it is assured that all values are within the range that pd.Timestamp allows.
Output of pd.show_versions()
commit : f2ca0a2
python : 3.7.8.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19041
machine : AMD64
processor : Intel64 Family 6 Model 79 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 1.1.1
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.2
setuptools : 50.0.0.post20200830
Cython : 0.29.21
pytest : None
hypothesis : None
sphinx : 3.2.1
blosc : None
feather : None
xlsxwriter : 1.3.3
lxml.etree : 4.5.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : 4.9.1
bottleneck : None
fsspec : 0.8.0
fastparquet : 0.4.1
gcsfs : None
matplotlib : 3.3.1
numexpr : None
odfpy : None
openpyxl : 3.0.5
pandas_gbq : None
pyarrow : 1.0.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : 0.8.7
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.1
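For reference, the range limit mentioned in the problem description can be checked with the standard library alone. This is a rough sketch that assumes pd.Timestamp stores nanoseconds since 1970-01-01 in a signed 64-bit integer, as in the pandas 1.x releases discussed here; ns_since_epoch and fits_in_int64_ns are hypothetical helper names, not pandas functions:

```python
import datetime

# Assumption: nanoseconds since the Unix epoch, held in a signed 64-bit integer.
INT64_MAX = 2**63 - 1
EPOCH = datetime.datetime(1970, 1, 1)


def ns_since_epoch(dt):
    """Exact integer nanoseconds between the epoch and a naive datetime."""
    delta = dt - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


def fits_in_int64_ns(dt):
    """True if dt can be represented as int64 nanoseconds since the epoch."""
    # The lower bound skips the minimum int64 value, which pandas reserves for NaT.
    return -INT64_MAX <= ns_since_epoch(dt) <= INT64_MAX


in_range = datetime.datetime(2005, 1, 1, 10, 30, 23, 540000)
out_of_range = datetime.datetime(3005, 1, 1, 10, 30, 23, 540000)

# Latest datetime that still fits, truncated to microsecond precision.
latest = EPOCH + datetime.timedelta(microseconds=INT64_MAX // 1_000)

print(fits_in_int64_ns(in_range))
print(fits_in_int64_ns(out_of_range))
print(latest)
```

Under that assumption the 2005 value fits and the 3005 value does not, which is why the second row has to stay a datetime.datetime.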