BUG: fix to_datetime to handle int16 and int8 #13464
tests!
@jreback Hi, I am a bit confused about which file the tests should be written in. There is no separate file that contains tests for pandas.tseries.tools.
https://github.com/pydata/pandas/blob/master/pandas/tseries/tests/test_timeseries.py#L2453 or you can make another function in the same class
@@ -2563,6 +2563,26 @@ def test_dataframe(self):
        with self.assertRaises(ValueError):
            to_datetime(df2)

    def test_dataframe_dtypes(self):
add the issue number here as a comment
lgtm (minor comment), pls add a whatsnew entry. ping when green.
@@ -510,6 +510,9 @@ def coerce(values):
        # we allow coercion to if errors allows
        return to_numeric(values, errors=errors)

    # prevent overflow in case of int8 or int16
    arg = arg.astype('int64', copy=False)
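To illustrate why the upcast matters, here is a minimal sketch using plain NumPy. It assumes the date components are packed into a single YYYYMMDD integer (`year * 10000 + month * 100 + day`); the `pack` helper is hypothetical and is not the code in this PR.

```python
import numpy as np

# Hypothetical helper: packs date parts into a YYYYMMDD integer.
# This packing scheme is an assumption made for illustration.
def pack(year, month, day):
    return year * 10000 + month * 100 + day

year = np.array([2000, 2001], dtype=np.int16)
month = np.array([1, 1], dtype=np.int16)
day = np.array([1, 1], dtype=np.int16)

# int16 arithmetic stays int16 and silently wraps around,
# because 2000 * 10000 does not fit in 16 bits.
narrow = pack(year, month, day)

# Upcasting first keeps the arithmetic in int64.
wide = pack(year.astype('int64'), month.astype('int64'), day.astype('int64'))

print(narrow.dtype, narrow)
print(wide.dtype, wide)
```

The narrow result keeps the int16 dtype and holds wrapped values, while the upcast version yields the intended packed integers.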
This should cause another problem. On current master:
df = pd.DataFrame({'year': [2000, 2001], 'month': [1.5, 1], 'day': [1, 1]})
pd.to_datetime(df)
# ValueError: cannot assemble the datetimes: 'float' object is unsliceable
pd.to_datetime(df.astype(np.int64))
# 0 2000-01-01
# 1 2001-01-01
# dtype: datetime64[ns]
Better to raise if input contains non-integer dtype.
good point. @ravinimmi can you add a check that the passed dtypes are all integer? use is_integer_dtype, and if not, raise (need tests as well)
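A rough sketch of what such a check could look like, assuming the public `pandas.api.types.is_integer_dtype` helper. The function name `require_integer_columns` and the error message are hypothetical; this is not the code that was merged.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype

def require_integer_columns(df):
    """Upcast to int64 only if every column has an integer dtype (hypothetical helper)."""
    bad = [col for col in df.columns if not is_integer_dtype(df[col])]
    if bad:
        # Refuse to silently truncate values such as month=1.5
        raise ValueError("non-integer dtype in columns: {}".format(bad))
    return df.astype('int64')

ints = pd.DataFrame({'year': [2000, 2001], 'month': [1, 1], 'day': [1, 1]}).astype('int16')
floats = pd.DataFrame({'year': [2000, 2001], 'month': [1.5, 1], 'day': [1, 1]})

upcast = require_integer_columns(ints)   # every column becomes int64

try:
    require_integer_columns(floats)
    float_error = None
except ValueError as exc:
    float_error = str(exc)               # names the 'month' column
```

Checking before the cast means a fractional month is rejected rather than truncated to 1.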
@jreback What should the output be in the case of string type? Or should it raise ValueError?
df = pd.DataFrame({'year': [2000, 2001], 'month': [1, 1], 'day': [1, 1]})
pd.to_datetime(df.astype(str))
strings are already tested / work. internally pd.to_numeric is used, which coerces non-numeric to numeric (or raises if it fails). Thus you are left with float/int.
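For reference, a small sketch of how `pd.to_numeric` behaves on string input under its `errors` modes. The sample values are made up for illustration.

```python
import pandas as pd
from pandas.api.types import is_integer_dtype, is_float_dtype

clean = pd.Series(['2000', '2001'])
dirty = pd.Series(['2000', 'x'])

# Parseable strings become a numeric column.
parsed = pd.to_numeric(clean)

# errors='coerce' turns unparseable entries into NaN, giving a float column.
coerced = pd.to_numeric(dirty, errors='coerce')

# errors='raise' (the default) refuses unparseable input.
try:
    pd.to_numeric(dirty, errors='raise')
    raised = False
except ValueError:
    raised = True

print(parsed.dtype, coerced.dtype, raised)
```

Either way the result is an integer or float column (or an exception), which is why only those two dtypes need handling afterwards.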
Current coverage is 84.32%

@@            master   #13464   diff @@
==========================================
  Files        138      138
  Lines      51069    51072     +3
  Methods        0        0
  Messages       0        0
  Branches       0        0
==========================================
+ Hits       43066    43069     +3
  Misses      8003     8003
  Partials       0        0
thanks!
git diff upstream/master | flake8 --diff
Fixes #13451