Skip to content

Timestamp(foo) vs to_datetime(foo) #17697

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jbrockmendel opened this issue Sep 27, 2017 · 11 comments · Fixed by #21822
Closed

Timestamp(foo) vs to_datetime(foo) #17697

jbrockmendel opened this issue Sep 27, 2017 · 11 comments · Fixed by #21822
Labels
Timezones Timezone data dtype
Milestone

Comments

@jbrockmendel
Copy link
Member

I assumed that these should be the same:

In [2]: pd.to_datetime("2015-11-18 15:30:00+05:30")
Out[2]: Timestamp('2015-11-18 10:00:00')

In [3]: pd.Timestamp("2015-11-18 15:30:00+05:30")
Out[3]: Timestamp('2015-11-18 15:30:00+0530', tz='pytz.FixedOffset(330)')

Is the mismatch intentional?

@jreback
Copy link
Contributor

jreback commented Sep 27, 2017

xref to #13712

these I suppose should parse the same. but this is quite tricky. should these be given pytz or dateutil stamps?

we prob have a hardcoding somewhere.

also .to_datetime doesn't accept tz atm so it can only parse to UTC

@sinhrks sinhrks added the Timezones Timezone data dtype label Sep 27, 2017
@jbrockmendel
Copy link
Member Author

should these be given pytz or dateutil stamps?

I'm partial to dateutil, but will be happy either way.

we prob have a hardcoding somewhere.

It looks like tslib.array_to_datetime and tslib.convert_str_to_tsobject both call _string_to_dts but do different post-processing. Need to do some tinkering, but I suspect that unifying these will fix this.

@jorisvandenbossche
Copy link
Member

I agree it would be nice they give the same result, but I am not sure the fixed offset behaviour of Timestamp is that useful.

This is also not only the behaviour of to_datetime on scalar input, but also for parsing sequences of strings (and thus also of read_csv, etc). Changing this may break a lot of people's code.

@jbrockmendel
Copy link
Member Author

but I am not sure the fixed offset behaviour of Timestamp is that useful.

Do you mean you would prefer something other than FixedOffset, or that you prefer the tz-naive output?

This is also not only the behaviour of to_datetime on scalar input, but also for parsing sequences of strings (and thus also of read_csv, etc). Changing this may break a lot of people's code.

Fair point. How about a proof of concept that unifies the implementations and we'll see if that causes any test failures, then reevaluate.

@jbrockmendel
Copy link
Member Author

This came up when I was looking at tests.scalar.test_timestamp. A handful of tests use e.g. stamp = date_range(...)[0] instead of just stamp = Timestamp(...). In these particular cases they turn out to be equivalent. Any objection to

a) changing these existing tests to use the Timestamp constructor and
b) adding TestEquivalency tests to assert that date_range(...)[0] == Timestamp(...) in the relevant cases?

The idea here is that tslib/tslibs would be easier to test/maintain if the tests.scalar tests were self-contained.

@jorisvandenbossche
Copy link
Member

Any objection to changing these existing tests to use the Timestamp constructor and

As long as there are other tests for date_range that checks those things (eg the different types of timezones), then I have no objection

adding TestEquivalency tests to assert that date_range(...)[0] == Timestamp(...) in the relevant cases?

do you mean to_datetime instead of date_range ? (as that was the original issue)

@jbrockmendel
Copy link
Member Author

do you mean to_datetime instead of date_range ? (as that was the original issue)

In this case the tests I'm referring to specifically use date_range. Under the hood date_range calls DatetimeIndex which calls to_datetime, so for the purposes of scalar-izing tests, I think of them as being in the same bucket.

@jbrockmendel
Copy link
Member Author

The more I think about it, the more unsettling the behavior of to_datetime is.

>>> pd.to_datetime('2017-10-30 09:53:15-08:00')
Timestamp('2017-10-30 17:53:15')

strikes me as distinctly less correct than returning a tz-aware Timestamp. Is it just me?

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 30, 2017

You are correct, but still the question is: what to return instead and how much user code will this break?

Do you mean you would prefer something other than FixedOffset, or that you prefer the tz-naive output?

I don't really know, I only know that often I don't want a pytz.FixedOffset.
Maybe returning a tz-aware datetime, but localized to UTC would make more sense as opposed to the current UTC but tz-naive result. Although that is something you can already obtain with the utc keyword:

In [11]: pd.to_datetime('2017-10-30 09:53:15-08:00')
Out[11]: Timestamp('2017-10-30 17:53:15')

In [12]: pd.to_datetime('2017-10-30 09:53:15-08:00', utc=True)
Out[12]: Timestamp('2017-10-30 17:53:15+0000', tz='UTC')

@jbrockmendel
Copy link
Member Author

what to return instead [...]

Seems like 3 cases:
a) all tzaware with same tz --> DatetimeTZIndex (or whatever its called)
b) all naive --> DatetimeIndex
c) mixed --> object-dtype Index/Series with each entry correct

Implementing this will be a minor hassle, but I'm bothered enough by this behavior to volunteer.

[...] and how much user code will this break?

That I have no idea. How would we go about figuring this out?

@jbrockmendel
Copy link
Member Author

Two more mismatched cases: "today" and "now". to_datetime goes through the iso8601 parser, while Timestamp goes through the datetime.now/datetime.today calls:

>>> pd.Timestamp("now")
Timestamp('2017-11-24 08:15:35.366420')
>>> pd.to_datetime("now")
Timestamp('2017-11-24 16:15:39')
>>> pd.Timestamp("today")
Timestamp('2017-11-24 08:15:42.678363')
>>> pd.to_datetime("today")
Timestamp('2017-11-24 00:00:00')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants