BUG: to_datetime ignores utc=True when arg is Series #6415

Closed
michaelaye opened this issue Feb 20, 2014 · 10 comments · Fixed by #17109
Labels: Bug, Timezones (Timezone data dtype)

Comments

@michaelaye
Contributor

michaelaye commented Feb 20, 2014

Is this a bug or a feature that I don't understand:

import pandas as pd

data = ['20100102 121314', '20100102 121315']
fmt = '%Y%m%d %H%M%S'
# list input
print(repr(pd.to_datetime(data, format=fmt, utc=True)[0]))
# Series input
print(repr(pd.to_datetime(pd.Series(data), format=fmt, utc=True)[0]))
# TimeSeries input (pd.TimeSeries exists in 0.13.1; later pandas versions removed it)
print(repr(pd.to_datetime(pd.TimeSeries(data), format=fmt, utc=True)[0]))

results in

Timestamp('2010-01-02 12:13:14+0000', tz='UTC')
Timestamp('2010-01-02 12:13:14', tz=None)
Timestamp('2010-01-02 12:13:14', tz=None)

Using version '0.13.1'


Another example: #15760

@jorisvandenbossche
Member

This is not a feature but a limitation of the current implementation. In the first case the result is a DatetimeIndex; in the second and third cases the result is a Series holding datetime64 values.
pandas does not use the timezone support of datetime64. Timezone handling is implemented specifically for DatetimeIndex, so at the moment it is not available in Series/DataFrame columns. See also the recent discussion in #6032.

But, maybe we should trigger a warning that utc=True has no effect when box=False?
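
To make the DatetimeIndex vs. Series distinction concrete, here is a minimal sketch of how the return type follows the input type. It assumes a recent pandas release, where as far as I know the box keyword is no longer accepted, so it is left out; the variable names are only illustrative.

```python
import pandas as pd

data = ['20100102 121314', '20100102 121315']
fmt = '%Y%m%d %H%M%S'

# List input: the result is a DatetimeIndex.
from_list = pd.to_datetime(data, format=fmt)
# Series input: the result is a Series.
from_series = pd.to_datetime(pd.Series(data), format=fmt)

print(type(from_list).__name__)
print(type(from_series).__name__)
```

Without utc=True both results are tz-naive; the container type is the only difference.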

@michaelaye
Contributor Author

I don't see how the status of box should influence this problem?

import pandas as pd

trueorfalse = False  # value passed to box, a to_datetime keyword in pandas 0.13.1
data = ['20100102 121314', '20100102 121315']
fmt = '%Y%m%d %H%M%S'
# list input
print(repr(pd.to_datetime(data, format=fmt, utc=True, box=trueorfalse)[0]))
# Series input
print(repr(pd.to_datetime(pd.Series(data), format=fmt, utc=True, box=trueorfalse)[0]))
# TimeSeries input
print(repr(pd.to_datetime(pd.TimeSeries(data), format=fmt, utc=True, box=trueorfalse)[0]))

gives

numpy.datetime64('2010-01-02T04:13:14.000000000-0800')
Timestamp('2010-01-02 12:13:14', tz=None)
Timestamp('2010-01-02 12:13:14', tz=None)

And since box=True is the default, setting it to True gives the results from the initial post.

@jorisvandenbossche
Member

Note that in the output you give here, the timezone info (the fact that it is UTC) is lost in all three cases. NumPy datetime64 values are always printed in your local timezone (a quirk of NumPy), which is why you see a timezone offset.
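
A small sketch of the point about lost timezone info, assuming a recent pandas and NumPy: converting a tz-aware Timestamp to a plain numpy.datetime64 keeps the UTC instant but drops the tz. Newer NumPy releases no longer print a local offset, so the repr will look different from the output quoted in this thread.

```python
import numpy as np
import pandas as pd

aware = pd.Timestamp('2010-01-02 12:13:14', tz='UTC')
# to_datetime64 returns a plain numpy.datetime64, which carries no tz.
raw = aware.to_datetime64()

print(repr(aware))
print(repr(raw))
```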

With box=True/False:

In [7]: pd.to_datetime(data, format='%Y%m%d %H%M%S', utc=True, box=True)[0]
Out[7]: Timestamp('2010-01-02 12:13:14+0000', tz='UTC')
In [8]: pd.to_datetime(data, format='%Y%m%d %H%M%S', utc=True, box=False)[0]
Out[8]: numpy.datetime64('2010-01-02T13:13:14.000000000+0100')
# without utc=True
In [29]: pd.to_datetime(data, format='%Y%m%d %H%M%S', box=False)[0]
Out[29]: numpy.datetime64('2010-01-02T13:13:14.000000000+0100')
In [30]: pd.to_datetime(data, format='%Y%m%d %H%M%S', box=True)[0]
Out[30]: Timestamp('2010-01-02 12:13:14', tz=None)

box determines whether a DatetimeIndex or an array/Series is returned. But you are right that this doesn't influence the Series case, pd.to_datetime(pd.Series(data), ..). That is because box is always set to False when the input is a Series (so that a Series is returned as well), regardless of what the user specifies for box.

@jorisvandenbossche
Member

What I want to say is: utc=True only has an effect on the output when a DatetimeIndex is returned (unless you are working with tz-aware datetime.datetime objects), and maybe we should give a warning when utc=True is used but has no effect.

@jreback
Contributor

jreback commented Feb 20, 2014

I think we should maybe remove the utc argument and accept tz instead, where tz=True does the same as utc=True, e.g. sets tz=UTC (and fix this so that a Series is returned correctly).

We could also coerce the resulting Series/Index to a timezone (and raise if the data that was read in already has a tz).
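
The coerce-and-raise idea could be sketched roughly as follows. parse_and_localize is a hypothetical helper, not part of pandas and not the API proposed here; it only combines pd.to_datetime with the existing Series.dt.tz_localize, which raises TypeError on data that is already tz-aware. A recent pandas release is assumed.

```python
import pandas as pd


def parse_and_localize(values, fmt, tz="UTC"):
    """Hypothetical helper (not pandas API): parse strings, then attach tz."""
    parsed = pd.to_datetime(pd.Series(values), format=fmt)
    # tz_localize attaches tz to naive data and raises if a tz is already set.
    return parsed.dt.tz_localize(tz)


data = ['20100102 121314', '20100102 121315']
result = parse_and_localize(data, '%Y%m%d %H%M%S')
print(result.dtype)

# Localizing data that already has a tz is rejected.
already_aware_error = None
try:
    result.dt.tz_localize("UTC")
except TypeError as exc:
    already_aware_error = exc
print(type(already_aware_error).__name__)
```

For data that is already tz-aware, Series.dt.tz_convert is the method that moves it to another timezone.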

#6398 is related, as it fixes this for Series/Index coercions.

@michaelaye want to submit a PR for this?

@jreback jreback added this to the 0.14.0 milestone Feb 20, 2014
@jorisvandenbossche
Member

@jreback Do you mean something like:

I would like that I think.

@jreback
Contributor

jreback commented Feb 20, 2014

yes

@filmor
Contributor

filmor commented Mar 7, 2014

As a side note, a similar problem appears if you pass a single datetime or Timestamp object to to_datetime: the utc argument is completely ignored.
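
A minimal way to check the scalar case is sketched below. The comments describe what I would expect on a recent pandas release, where utc=True is honoured for a single datetime; on 0.13.1 the result was reported to stay naive.

```python
import datetime

import pandas as pd

naive = datetime.datetime(2010, 1, 2, 12, 13, 14)
# Assumption: a pandas release in which utc=True applies to scalar input.
stamp = pd.to_datetime(naive, utc=True)

print(repr(stamp))
```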

@tacaswell
Contributor

bump on this, just caused me a whole bunch of trouble 😞

@jorisvandenbossche
Member

jorisvandenbossche commented Nov 2, 2016

In any case, what I said above:

This is not a feature, but a limitation of the current implementation. The difference is that in the first case the result is a DatetimeIndex and in the second and third case the result is a Series with a column with datetime64 values. Timezone support of datetime64 is at the moment not available in Series/DataFrame columns

is no longer the case. Series now supports tz aware data, so there is no reason to ignore utc=True when the input/output is a Series.
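
For reference, a minimal sketch of the expected behaviour, assuming a pandas version that includes the fix from #17109: the Series result keeps the UTC timezone.

```python
import pandas as pd

data = ['20100102 121314', '20100102 121315']
as_series = pd.to_datetime(pd.Series(data), format='%Y%m%d %H%M%S', utc=True)

print(as_series.dtype)      # a tz-aware datetime64 dtype
print(repr(as_series[0]))   # a Timestamp with a UTC tz
```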

@jorisvandenbossche changed the title from "to_datetime ignores utc=True when arg is (Time)Series" to "BUG: to_datetime ignores utc=True when arg is Series" on Mar 21, 2017
mroeschke added a commit to mroeschke/pandas that referenced this issue Jul 29, 2017
mroeschke added a commit to mroeschke/pandas that referenced this issue Aug 7, 2017
Modify test case

Comment about test edit, move conversion logic to convert_listlike

Add new section in whatsnew and update test
mroeschke added a commit to mroeschke/pandas that referenced this issue Aug 7, 2017
Modify test case

Comment about test edit, move conversion logic to convert_listlike

Add new section in whatsnew and update test

Alter SQL tests
mroeschke added a commit to mroeschke/pandas that referenced this issue Aug 10, 2017
Modify test case

Comment about test edit, move conversion logic to convert_listlike

Add new section in whatsnew and update test

Alter SQL tests

Modify whatsnew and make new wrapper function to handle UTC conversion

Simiplified whatsnew and reverted arg renaming
mroeschke added a commit to mroeschke/pandas that referenced this issue Sep 1, 2017
Modify test case

Comment about test edit, move conversion logic to convert_listlike

Add new section in whatsnew and update test

Alter SQL tests

Modify whatsnew and make new wrapper function to handle UTC conversion

Simiplified whatsnew and reverted arg renaming
5 participants