Inconsistent pd.to_datetime() behaviour dependent on dtype #25740

austinlostinboston · 2019-03-15T16:22:55Z

I have two Series, both containing years. One Series has dtype float64 and the other has dtype object. When I apply pd.to_datetime() to these two Series, I get drastically different results.

## Pandas 0.24.1
import pandas as pd
year_string = pd.Series(['1980','1985','1990','1995','2000'])
year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

## Works as expected
pd.to_datetime(year_string, infer_datetime_format=True)
# 0   1980-01-01
# 1   1985-01-01
# 2   1990-01-01
# 3   1995-01-01
# 4   2000-01-01
# dtype: datetime64[ns]

## Automatically treats floats as epoch nanoseconds
pd.to_datetime(year_float, infer_datetime_format=True)
# 0   1970-01-01 00:00:00.000001980
# 1   1970-01-01 00:00:00.000001985
# 2   1970-01-01 00:00:00.000001990
# 3   1970-01-01 00:00:00.000001995
# 4   1970-01-01 00:00:00.000002000
# dtype: datetime64[ns]

While I understand why this difference exists (perhaps there's an assumption that if dtype == float, then the values are epoch nanoseconds?), I'm not sure I agree with the inconsistency in behavior based strictly on dtype. I would favor a solution that treated the float64 Series the same as the string/object Series. Is it possible that the logic that infers year_string as containing years could also be applied to year_float?
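Until the behavior changes, a minimal sketch of a workaround, assuming every value in `year_float` is a whole year with no missing values (NaN would break the `int` cast). The first call passes `unit="ns"` to spell out the epoch-nanosecond reading observed above; the second casts the floats to `int` (dropping the `.0`) and then to `str`, so they can be parsed with an explicit `format="%Y"`, the same year-only reading the string Series gets.

```python
import pandas as pd

year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

# Spell out what happened above: each float is read as nanoseconds since
# 1970-01-01, so 1980.0 becomes 1970-01-01 00:00:00.000001980.
as_ordinals = pd.to_datetime(year_float, unit="ns")

# Workaround (assumes whole years, no NaN): float -> int -> str, then parse
# each string as a bare four-digit year.
as_years = pd.to_datetime(year_float.astype(int).astype(str), format="%Y")
print(as_years)
```

Going through strings is not the fastest path for large data, but it reuses the exact parsing that already works for `year_string`.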


chris-b1 · 2019-03-15T17:38:43Z

I'm going to close this as a duplicate of #15836. You are right that the floats are being treated as ordinals; #15836 would have it raise an error unless a unit was specified, as this behavior can be surprising. So I agree this should be fixed!
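To make the idea in #15836 concrete, here is a hypothetical wrapper (`strict_to_datetime` is an illustrative name, not a pandas function) that refuses numeric input unless a `unit` is passed explicitly. It sketches the proposed behavior only; it is not the change made by #15836 or the linked PR.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def strict_to_datetime(values, unit=None, **kwargs):
    """Hypothetical wrapper: numeric input must come with an explicit unit."""
    series = pd.Series(values)
    if is_numeric_dtype(series) and unit is None:
        # Refuse to guess; plain pd.to_datetime would read these as epoch nanoseconds.
        raise ValueError("numeric input needs an explicit unit, e.g. unit='s'")
    return pd.to_datetime(series, unit=unit, **kwargs)


# Seconds since the epoch, with the unit stated explicitly.
print(strict_to_datetime([0.0, 86400.0], unit="s"))

try:
    strict_to_datetime([1980.0, 1985.0])
except ValueError as err:
    print("refused:", err)
```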

That said, as a general rule, you should expect different dtypes to do different things. '1995' and 1995.0 are rarely interchangeable in pandas.
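Along those lines, numeric years can also be handled as numbers rather than strings: `pd.to_datetime` accepts a DataFrame with `year`, `month` and `day` columns and assembles dates from them. A sketch, assuming whole-number years without NaN and pinning every date to January 1 (an assumption, since the data only records the year):

```python
import pandas as pd

year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

# Build component columns; the scalar month/day values are broadcast to every row.
parts = pd.DataFrame({"year": year_float.astype(int), "month": 1, "day": 1})

# pd.to_datetime assembles one timestamp per row from the named columns.
as_dates = pd.to_datetime(parts)
print(as_dates)
```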

chris-b1 · 2019-03-15T17:39:15Z

Also, a PR to fix this is welcome; it looks like there is a stalled one here: #25463

chris-b1 closed this as completed Mar 15, 2019

chris-b1 added the Datetime and Duplicate Report labels Mar 15, 2019
