Fixes #45506 Catch overflow error when converting to datetime #45532

Merged on Jan 31, 2022 (1 commit)

Compro-Prasad
Contributor

@Compro-Prasad Compro-Prasad commented Jan 21, 2022

An exception is raised when a date component (such as the day, month, or year) is larger than a 32-bit signed integer can hold.
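
For illustration only, here is a minimal standard-library sketch of this failure mode. In CPython, the `datetime` constructor typically raises `OverflowError` when a component does not fit in the underlying C integer, and `ValueError` when it fits but is outside the valid range. The helper name `safe_datetime` is hypothetical; it is not part of pandas and is not the change made in this PR.

```python
from datetime import datetime
from typing import Optional


def safe_datetime(year: int, month: int, day: int) -> Optional[datetime]:
    """Hypothetical helper: build a datetime, or return None if a component is unusable."""
    try:
        return datetime(year, month, day)
    except (OverflowError, ValueError):
        # OverflowError: a component is too large for the underlying C integer.
        # ValueError: a component fits in an integer but is outside the valid range.
        return None


print(safe_datetime(2015, 7, 1))   # an ordinary date
print(safe_datetime(2**31, 1, 1))  # year exceeds a 32-bit signed integer
```

Catching both exception types keeps the sketch independent of which one a given Python implementation raises for oversized components.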

@pep8speaks

pep8speaks commented Jan 21, 2022

Hello @Compro-Prasad! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-01-31 14:17:45 UTC

@mroeschke
Member

Could you add a unit test?

Also I don't think this closes the original issue. I think in the original issue the strings should be just compared as strings not datetime-like objects.

@jeff-hernandez
Contributor

Yeah I was hoping the fix would result in a boolean series.

import pandas as pd

data = [
    {"f_0": "2015-07-01", "f_2": "08335394550"},
    {"f_0": "2015-07-02", "f_2": "+49 (0) 0345 300033"},
    {"f_0": "2015-07-03", "f_2": "+49(0)2598 04457"},
    {"f_0": "2015-07-04", "f_2": "0741470003"},
    {"f_0": "2015-07-05", "f_2": "04181 83668"},
]

dtypes = dict(f_0='datetime64[ns]', f_2='string')
df = pd.DataFrame(data=data).astype(dtypes)
df['f_0'].eq(df['f_2'])
0    False
1    False
2    False
3    False
4    False
dtype: bool

Contributor

@jreback jreback left a comment

pls always start with a test that demonstrates the issue (from the OP)

@Compro-Prasad
Contributor Author

Compro-Prasad commented Jan 22, 2022

> Could you add a unit test?

Sure. On it.

> Also I don't think this closes the original issue. I think in the original issue the strings should be just compared as strings not datetime-like objects.

Guessing datatypes isn't the most pleasant thing to work with. For efficiency, it's nice to convert both into strings. But if one is already a datetime-like object, then it's more convenient to prefer datetime over string. If that fails, string should be preferred. This case from the OP is not a realistic scenario anyway.
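
The preference order described here (try datetime first, fall back to strings if that fails) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in this PR: `eq_prefer_datetime` and the fixed `%Y-%m-%d` format are hypothetical choices made for the example.

```python
from datetime import datetime


def eq_prefer_datetime(left: datetime, right: str) -> bool:
    """Hypothetical helper: compare as datetimes when possible, else as strings."""
    try:
        # Preferred path: interpret the string as a date and compare datetimes.
        return left == datetime.strptime(right, "%Y-%m-%d")
    except ValueError:
        # Fallback path: the string is not a date in the assumed format,
        # so compare the string forms instead.
        return str(left) == right


print(eq_prefer_datetime(datetime(2015, 7, 1), "2015-07-01"))
print(eq_prefer_datetime(datetime(2015, 7, 1), "08335394550"))
```

With this ordering, a phone-number-like string simply compares unequal instead of raising.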

> Yeah I was hoping the fix would result in a boolean series.

That is what I get after applying the fix:

Python 3.10.1 (main, Dec 18 2021, 23:53:45) [GCC 11.1.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> 
>>> data = [
...     {"f_0": "2015-07-01", "f_2": "08335394550"},
...     {"f_0": "2015-07-02", "f_2": "+49 (0) 0345 300033"},
...     {"f_0": "2015-07-03", "f_2": "+49(0)2598 04457"},
...     {"f_0": "2015-07-04", "f_2": "0741470003"},
...     {"f_0": "2015-07-05", "f_2": "04181 83668"},
... ]
>>> 
>>> dtypes = dict(f_0='datetime64[ns]', f_2='string')
>>> df = pd.DataFrame(data=data).astype(dtypes)
>>> df['f_0'].eq(df['f_2'])
0    False
1    False
2    False
3    False
4    False
dtype: bool

> pls always start with a test that demonstrates the issue (from the OP)

Sure thing. I am very new to pandas development. Thanks for the suggestion. I will add a test.

@Compro-Prasad
Contributor Author

Have added a test in pandas/tests/arrays/test_array.py. Is this the correct place where the test should go?

@Compro-Prasad Compro-Prasad force-pushed the fix-overflowerror branch 8 times, most recently from 7a2c829 to 5b6c6cf Compare January 22, 2022 11:56
@jreback jreback added the Dtype Conversions (Unexpected or buggy dtype conversions) and Datetime (Datetime data dtype) labels Jan 23, 2022
@Compro-Prasad Compro-Prasad force-pushed the fix-overflowerror branch 3 times, most recently from 4e82d1d to 7c7286c Compare January 23, 2022 11:18
@Compro-Prasad
Contributor Author

Compro-Prasad commented Jan 23, 2022

I found the following error from mypy:

pandas/io/parquet.py:184: error: "RawIOBase" has no attribute "name"  [attr-defined]

IMO using .name instead of .raw.name should be a good enough solution.

Shall I make the changes?
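
As a small standard-library illustration of the attribute in question: a buffered binary file object exposes `.name` directly, and it matches the `.name` of the underlying `.raw` object. This says nothing about what `pandas/io/parquet.py` actually requires; the helper `handle_name` is hypothetical.

```python
import io
import os
import tempfile
from typing import IO, Any


def handle_name(handle: IO[Any]) -> Any:
    """Hypothetical helper: read the name from the handle itself, not from handle.raw."""
    # In-memory buffers such as io.BytesIO have no name, so default to None.
    return getattr(handle, "name", None)


# Demonstrate on a real temporary file opened in buffered binary mode.
fd, path = tempfile.mkstemp()
os.close(fd)
try:
    with open(path, "rb") as fh:
        same = handle_name(fh) == fh.raw.name
        print(same)
finally:
    os.remove(path)

print(handle_name(io.BytesIO()))
```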

@jreback
Contributor

jreback commented Jan 23, 2022

> I found the following error from mypy:
>
> pandas/io/parquet.py:184: error: "RawIOBase" has no attribute "name"  [attr-defined]
>
> IMO using .name instead of .raw.name should be a good enough solution.
>
> Shall I make the changes?

ignore as this is taken care of elsewhere

@Compro-Prasad Compro-Prasad force-pushed the fix-overflowerror branch 3 times, most recently from ee0b3e2 to 9f1fb0f Compare January 24, 2022 15:43
@Compro-Prasad Compro-Prasad force-pushed the fix-overflowerror branch 6 times, most recently from 89a3907 to 583d41e Compare January 28, 2022 09:55
@Compro-Prasad Compro-Prasad requested a review from jreback January 29, 2022 09:17
@Compro-Prasad
Contributor Author

@jreback Have made the changes.

@jreback jreback added this to the 1.5 milestone Jan 29, 2022
Contributor

@jreback jreback left a comment

great, can you add a whatsnew entry in bug fixes for 1.5 (conversion or the datetime section is fine). and ping on green.

@Compro-Prasad Compro-Prasad force-pushed the fix-overflowerror branch 2 times, most recently from 45eb751 to 8a12f16 Compare January 30, 2022 02:31
@Compro-Prasad
Contributor Author

@jreback Have added the whatsnew entry.

@Compro-Prasad Compro-Prasad requested a review from jreback January 30, 2022 08:28
Exception is raised when a part of the date like day, month or year is greater
than 32 bit signed integer.

Added tests for this issue in pandas/tests/series/methods/test_compare.py

Added whatsnew entry for v1.5.0
@jreback jreback merged commit f385fc0 into pandas-dev:main Jan 31, 2022
@jreback
Contributor

jreback commented Jan 31, 2022

thanks @Compro-Prasad

Labels: Datetime (Datetime data dtype), Dtype Conversions (Unexpected or buggy dtype conversions)