>>> import pandas as pd
>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
Timestamp('2017-01-01 20:00:00')  # timezone-naive Timestamp, although the value was converted to UTC
Problem description
Given a timezone-aware string (e.g., in ISO 8601 format), pd.to_datetime converts the value to UTC but doesn't set the tzinfo of the result to UTC.
This makes it difficult to differentiate between strings that are truly timezone-naive (e.g., '2017-01-01') and strings that are timezone-aware (e.g., '2017-01-01T15:00:00-05:00').
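As a rough illustration of the distinction, here is a minimal sketch that checks whether an ISO 8601 string carries an offset before it ever reaches pandas. It uses the standard library's datetime.fromisoformat rather than pandas' own parser, and parse_iso is a hypothetical helper name, not part of any library.

```python
from datetime import datetime, timezone


def parse_iso(text):
    """Hypothetical helper: parse an ISO 8601 string and report
    whether it carried a UTC offset."""
    dt = datetime.fromisoformat(text)
    had_offset = dt.tzinfo is not None
    return dt, had_offset


naive, naive_flag = parse_iso('2017-01-01')
aware, aware_flag = parse_iso('2017-01-01T15:00:00-05:00')

# The aware value keeps its original offset until converted explicitly.
aware_utc = aware.astimezone(timezone.utc)

print(naive, naive_flag)
print(aware, aware_flag)
print(aware_utc)
```

The flag is exactly the information that is lost when the parsed result comes back naive.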
Expected Output
Option 1: Convert the string to UTC (current behavior) and also set the timezone to UTC:
There is an open issue to add a tz= kwarg to to_datetime (xref #13712), which would make it consistent with Timestamp.
You can simply specify utc=True to get the result you want.
# default
In [1]: pd.to_datetime('2017-01-01T15:00:00-05:00')
Out[1]: Timestamp('2017-01-01 20:00:00')
# you can specify if you want utc output
In [3]: pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)
Out[3]: Timestamp('2017-01-01 20:00:00+0000', tz='UTC')
# Timestamp infers the tz by default
In [2]: Timestamp('2017-01-01T15:00:00-05:00')
Out[2]: Timestamp('2017-01-01 15:00:00-0500', tz='pytz.FixedOffset(-300)')
Normally you parse strings directly to naive (or UTC), as actually constructing the tz from the string is not generally useful (IOW, it can only be a fixed offset, rather than a common zone like US/Eastern). Then you can convert / localize into the viewing zone.
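A minimal sketch of that workflow, assuming a reasonably recent pandas: parse with utc=True, then convert to the zone you want to view the data in. 'America/New_York' is used here as the canonical tz database name for the zone referred to above as US/Eastern; that substitution is an assumption made for portability.

```python
import pandas as pd

raw = '2017-01-01T15:00:00-05:00'

# Parse to a tz-aware Timestamp in UTC.
ts_utc = pd.to_datetime(raw, utc=True)

# Convert to the viewing zone (a named zone, not a fixed offset).
ts_local = ts_utc.tz_convert('America/New_York')

print(ts_utc)
print(ts_local)
```

Both values refer to the same instant; only the display zone differs.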
So your option 1 is [3]. Option 2 could be done, I suppose, but is generally non-performant, as you can easily end up with mixed zones.
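To make the mixed-zones point concrete, here is a small sketch (assuming a reasonably recent pandas) in which two strings carry different offsets. Passing utc=True normalizes them into a single UTC index instead of keeping one fixed offset per element.

```python
import pandas as pd

mixed = [
    '2017-01-01T15:00:00-05:00',
    '2017-01-01T15:00:00+01:00',
]

# Every element is converted to UTC, so the result has one timezone.
idx = pd.to_datetime(mixed, utc=True)

print(idx)
```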
Closing, as this is essentially a duplicate of the request in #13721. Having a tz= kw removes the ambiguity about what the user wants (this would also remove the utc= kw).
Expected Output
Option 1: Convert the string to UTC (current behavior) and also set the timezone to UTC:
Option 2: Don't convert to UTC and retain the timezone. This is the behavior of dateutil.parser.parse:
Output of pd.show_versions()
pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None