
Inconsistent pd.to_datetime() behaviour dependent on dtype #25740

Closed
austinlostinboston opened this issue Mar 15, 2019 · 2 comments
Labels: Datetime (Datetime data dtype), Duplicate Report (Duplicate issue or pull request)

Comments

@austinlostinboston

I have two Series, both containing years. One Series has dtype float64 and the other has dtype object. When I apply pd.to_datetime() to these two Series, I get drastically different results.

```python
## Pandas 0.24.1
import pandas as pd
year_string = pd.Series(['1980','1985','1990','1995','2000'])
year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

## Works as expected
pd.to_datetime(year_string, infer_datetime_format=True)
# 0   1980-01-01
# 1   1985-01-01
# 2   1990-01-01
# 3   1995-01-01
# 4   2000-01-01
# dtype: datetime64[ns]

## Automatically treats floats as epoch nanoseconds
pd.to_datetime(year_float, infer_datetime_format=True)
# 0   1970-01-01 00:00:00.000001980
# 1   1970-01-01 00:00:00.000001985
# 2   1970-01-01 00:00:00.000001990
# 3   1970-01-01 00:00:00.000001995
# 4   1970-01-01 00:00:00.000002000
# dtype: datetime64[ns]
```

While I understand why this difference exists (perhaps the assumption is that if dtype == float, the values are epoch nanoseconds?), I'm not sure I agree with behavior that differs based strictly on dtype. I would favor a solution that treats the float64 Series the same as the string/object Series. Is it possible to apply the logic that infers year_string as containing years to year_float as well?
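Until the behavior changes, one possible workaround (a sketch, not something proposed in the thread) is to turn the float years into integer strings and parse them with an explicit year-only `format`, so the float Series goes through the same string-parsing path as the object Series. It assumes the float Series has no missing values, since `astype(int)` raises on NaN.

```python
import pandas as pd

# The two Series from the report: the same years stored as strings and as floats.
year_string = pd.Series(['1980', '1985', '1990', '1995', '2000'])
year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

# Assumption: no NaN values. Cast to int first so the strings are '1980',
# not '1980.0', then parse with an explicit year-only format.
from_float = pd.to_datetime(year_float.astype(int).astype(str), format='%Y')
from_string = pd.to_datetime(year_string, format='%Y')

# Both Series now hold 1 January of each year.
print(from_float.equals(from_string))
print(from_float)
```

Passing `format` explicitly also avoids depending on format inference for the string Series.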

@chris-b1
Contributor

I'm going to close this as a duplicate of #15836. You are right that the floats are being treated as ordinals; #15836 would have this raise an error unless a unit was specified, since this behavior can be surprising. So I agree this should be fixed!
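To make the ordinal interpretation visible, here is a small sketch (not from the thread) that passes `to_datetime`'s `unit` parameter explicitly. With `unit='ns'` the result matches the default, and choosing another unit only changes the scale: the values are still counted as offsets from 1970-01-01, never as calendar years.

```python
import pandas as pd

year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

# No unit given: numeric input is read as nanoseconds since the epoch.
implicit = pd.to_datetime(year_float)

# Same interpretation, stated explicitly.
explicit_ns = pd.to_datetime(year_float, unit='ns')

# A different unit rescales the offset (1980 seconds = 33 minutes after the epoch).
explicit_s = pd.to_datetime(year_float, unit='s')

print(implicit.equals(explicit_ns))
print(explicit_s.iloc[0])
```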

That said, as a general rule, you should expect different dtypes to behave differently: '1995' and 1995.0 are rarely interchangeable in pandas.
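If the years should stay numeric rather than being round-tripped through strings, another option (a sketch, not suggested in the thread) is `to_datetime`'s DataFrame assembly: a DataFrame with columns named `year`, `month` and `day` is combined into dates. Pinning every date to 1 January is an assumption made here to mirror what the string parse produced.

```python
import pandas as pd

year_float = pd.Series([1980.0, 1985.0, 1990.0, 1995.0, 2000.0])

# Columns named 'year', 'month' and 'day' are assembled into one datetime
# per row; the scalar month/day values are broadcast to every row.
parts = pd.DataFrame({'year': year_float.astype(int), 'month': 1, 'day': 1})
assembled = pd.to_datetime(parts)

print(assembled)
```

Here the column name, not the dtype, tells pandas that the numbers are years.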

@chris-b1
Contributor

Also, a PR to fix this is welcome; it looks like there is a stalled one here: #25463

chris-b1 added the Datetime and Duplicate Report labels on Mar 15, 2019