
Passing a timezone-aware string to pd.to_datetime creates a timezone-naive Timestamp #16898

Closed
shshe opened this issue Jul 12, 2017 · 4 comments
Labels
Duplicate Report Duplicate issue or pull request Timezones Timezone data dtype

Comments

@shshe

shshe commented Jul 12, 2017

Code Sample

>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00')  # This is a timezone-naive Timestamp, but was converted to UTC

Problem description

Given a timezone-aware string (e.g., in ISO8601 format), pd.to_datetime converts the string to UTC time, but doesn't set the tzinfo to UTC.

This makes it difficult to differentiate between strings that are truly timezone-naive (e.g., '2017-01-01') and strings that are timezone-aware (e.g., '2017-01-01T15:00:00-05:00').
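As a rough illustration of the distinction, the check below leans on dateutil rather than pandas to tell the two kinds of string apart. The helper name `has_utc_offset` is made up for this sketch and is not part of pandas or dateutil.

```python
from dateutil import parser


def has_utc_offset(text):
    """Return True if dateutil finds timezone information in the string.

    Hypothetical helper, for illustration only.
    """
    return parser.parse(text).tzinfo is not None


print(has_utc_offset('2017-01-01'))                 # naive string
print(has_utc_offset('2017-01-01T15:00:00-05:00'))  # string with an offset
```

This parses each string twice if you then hand it to pd.to_datetime, so it is only a workaround for inspecting inputs, not a fix for the behaviour described here.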

Expected Output

Option 1: Convert the string to UTC (current behavior) and also set the timezone to UTC:

>>> pd.to_datetime('2017-01-01T15:00:00-05:00')
... Timestamp('2017-01-01 20:00:00+0000', tz='UTC')

Option 2: Don't convert to UTC and retain timezone. This is the behavior of dateutil.parser.parse:

>>> from dateutil import parser
>>> pd.to_datetime(parser.parse('2017-01-01T15:00:00-0500'))
... Timestamp('2017-01-01 15:00:00-0500', tz='tzoffset(None, -18000)')

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.1.35-pv-ts1
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.3
pytest: 3.1.2
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.1
xarray: None
IPython: 6.1.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: 1.2.1
tables: 3.3.0
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: 2.4.7
xlrd: 1.0.0
xlwt: None
xlsxwriter: 0.9.6
lxml: None
bs4: 4.6.0
html5lib: 0.999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Jul 12, 2017

There is an open issue to add a tz= kwarg to to_datetime (xref #13712), which would make it consistent with Timestamp.

You can simply specify utc=True to get the result you want.

# default
In [1]: pd.to_datetime('2017-01-01T15:00:00-05:00')
Out[1]: Timestamp('2017-01-01 20:00:00')

# you can specify if you want utc output
In [3]: pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)
Out[3]: Timestamp('2017-01-01 20:00:00+0000', tz='UTC')

# Timestamp infers the tz by default
In [2]: Timestamp('2017-01-01T15:00:00-05:00')
Out[2]: Timestamp('2017-01-01 15:00:00-0500', tz='pytz.FixedOffset(-300)')

Normally you parse strings directly to naive (or UTC), since actually constructing the tz is not generally useful (IOW it would have to be a fixed time offset, rather than a common zone like US/Eastern). You can then convert / localize into the viewing zone.
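A minimal sketch of that parse-then-convert workflow, assuming a pandas version where utc=True returns a tz-aware result. 'America/New_York' is used here as the canonical name for the US/Eastern zone; the variable names are illustrative.

```python
import pandas as pd

# Parse the offset-bearing string straight to UTC.
ts_utc = pd.to_datetime('2017-01-01T15:00:00-05:00', utc=True)

# Convert into the zone you want to view the time in.
ts_local = ts_utc.tz_convert('America/New_York')

print(ts_utc)
print(ts_local)
```

Both values refer to the same instant; only the wall-clock representation differs.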

So your option 1 is [3]. Option 2 could be done, I suppose, but it is generally non-performant, as you can easily end up with mixed zones.
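To make the mixed-zones concern concrete, here is a small sketch with two strings carrying different offsets (the input values are invented for the example). Passing utc=True normalizes both to a single UTC-aware index, which sidesteps having one fixed-offset tz per element.

```python
import pandas as pd

# Two strings with different UTC offsets (hypothetical data).
mixed = ['2017-01-01T15:00:00-05:00', '2017-01-01T15:00:00+01:00']

# utc=True converts every element to UTC, giving one uniform tz.
idx = pd.to_datetime(mixed, utc=True)

print(idx)
print(idx.tz)
```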

@jreback jreback added the Timezones Timezone data dtype label Jul 12, 2017
@jreback
Contributor

jreback commented Jul 12, 2017

Closing as this is essentially a duplicate of the request in #13721; having a tz= kw removes ambiguity about what the user wants (this would also remove the utc= kw).

@jreback jreback closed this as completed Jul 12, 2017
@jreback jreback added the Duplicate Report Duplicate issue or pull request label Jul 12, 2017
@jreback jreback added this to the No action milestone Jul 12, 2017
@ryanjdillon

I have found that the following gives me naive timestamps:

pandas.to_datetime(df['times'], utc=True)

Whereas the following gives me tz aware timestamps:

pandas.to_datetime(df['times'].values, utc=True)

What might be going on here?
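One way to narrow this down is to inspect the tz of each result explicitly. The sketch below uses an invented two-row frame; passing a Series returns a Series (checked via .dt.tz), while passing the underlying array returns a DatetimeIndex (checked via .tz). On recent pandas versions I would expect both to report UTC, so a naive result may point at the installed version rather than the call itself.

```python
import pandas as pd

# Hypothetical data standing in for df['times'].
df = pd.DataFrame({'times': ['2017-01-01T15:00:00-05:00',
                             '2017-01-02T09:30:00-05:00']})

from_series = pd.to_datetime(df['times'], utc=True)        # Series in, Series out
from_array = pd.to_datetime(df['times'].values, utc=True)  # array in, DatetimeIndex out

print(type(from_series).__name__, from_series.dt.tz)
print(type(from_array).__name__, from_array.tz)
```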

@worthy7

worthy7 commented Jun 18, 2021

[image]
Doesn't seem to care at all about UTC, it's always tz-aware.
