ENH: bad directive in to_datetime format - this uses std. strptime zone offset #13486

ghost · 2016-06-19T07:16:05Z

This is a real problem for me, since almost all Linux log files contain a zone offset.

As far as I can see, there is no workaround short of parsing the timestamp twice.

Code Sample, a copy-pastable example if possible

Parse a common Apache access log timestamp:

pd.to_datetime(['28/Jul/2006:10:22:04 -0300'], format='%d/%b/%Y:%H:%M:%S %z')

...
/usr/lib/python2.7/dist-packages/pandas/tseries/tools.pyc in _convert_listlike(arg, box, format, name)
    381                 return DatetimeIndex._simple_new(values, name=name, tz=tz)
    382             except (ValueError, TypeError):
--> 383                 raise e
    384 
    385     if arg is None:

ValueError: 'z' is a bad directive in format '%d/%b/%Y:%H:%M:%S %z'

Expected Output

Should parse timestamps with a zone offset.

output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Linux
OS-release: 4.4.0-24-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.17.1
nose: 1.3.7
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.11.0
scipy: 0.17.0
statsmodels: 0.6.1
IPython: 4.2.0
sphinx: 1.3.6
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2014.10
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.4.3
matplotlib: 1.5.1
openpyxl: 2.3.0
xlrd: 0.9.4
xlwt: 0.7.5
xlsxwriter: None
lxml: 3.5.0
bs4: 4.4.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
Jinja2: None

jorisvandenbossche · 2016-06-19T09:04:51Z

The %z timezone directive is indeed not supported at the moment.

Two possible workarounds (without parsing the date in two parts, with the offset handled separately):

Use Python 3, where datetime.datetime.strptime() has gained %z support:

In [2]: import datetime

In [3]: datetime.datetime.strptime('28/Jul/2006:10:22:04 -0300','%d/%b/%Y:%H:%M:%S %z')
Out[3]: datetime.datetime(2006, 7, 28, 10, 22, 4, tzinfo=datetime.timezone(datetime.timedelta(-1, 75600)))

In [4]: import sys

In [5]: sys.version
Out[5]: '3.5.1 |Anaconda 2.4.0 (64-bit)| (default, Feb 16 2016, 09:49:46) [MSC v.1900 64 bit (AMD64)]'

Use dateutil. It can parse timezone offsets, but apparently not in the format you have. If you replace the first colon with a space, it works:

In [10]: from dateutil import parser

In [11]: parser.parse('28/Jul/2006:10:22:04 -0300')
...
ValueError: Unknown string format

In [12]: parser.parse('28/Jul/2006:10:22:04 -0300'.replace(':', ' ', 1))
Out[12]: datetime.datetime(2006, 7, 28, 10, 22, 4, tzinfo=tzoffset(None, -10800))

You could apply either method to the values in your datetime column, but that will not necessarily be faster than parsing the timezone offset separately (if performance is a bottleneck).
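
A minimal sketch of the first workaround applied to several values at once, assuming Python 3 and an English/C locale (so that %b matches 'Jul'). The helper name parse_apache_timestamps and the second sample timestamp are made up for illustration; only datetime.strptime and the format string come from the thread.

```python
from datetime import datetime, timedelta

# Format from the issue: Apache access-log timestamp with a numeric UTC offset.
APACHE_FORMAT = '%d/%b/%Y:%H:%M:%S %z'


def parse_apache_timestamps(values):
    """Parse each string with datetime.strptime (hypothetical helper).

    Relies on Python 3, where strptime understands %z.
    """
    return [datetime.strptime(v, APACHE_FORMAT) for v in values]


# The second value is an invented example that denotes the same instant
# as the first one, expressed in a different UTC offset.
stamps = parse_apache_timestamps([
    '28/Jul/2006:10:22:04 -0300',
    '28/Jul/2006:15:22:04 +0200',
])

# Each result is timezone-aware and carries its own fixed offset.
offsets = [s.utcoffset() for s in stamps]
```

This is a plain Python loop, so it will likely be slow on large columns, which is the performance caveat mentioned above.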

See also http://stackoverflow.com/questions/1101508/how-to-parse-dates-with-0400-timezone-string-in-python

jreback · 2016-06-19T19:03:11Z

Actually, I don't think this would be very hard to add here (and the single whitespace should be optional).

It's a pretty regular regex, and the parsed offset could just be passed in directly as the tz arg (which already knows how to handle fixed offsets).
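
To make the idea concrete, here is a rough standard-library sketch of splitting off a trailing +HHMM / -HHMM offset with a regex (with the single whitespace optional) and turning it into a fixed-offset tzinfo. This is not the pandas implementation; OFFSET_RE and split_offset are hypothetical names, and datetime.timezone stands in for whatever pandas would pass as tz.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern: timestamp body, an optional single whitespace,
# then a sign followed by a two-digit hour and a two-digit minute.
OFFSET_RE = re.compile(
    r'^(?P<body>.*?)\s?(?P<sign>[+-])(?P<hh>\d{2})(?P<mm>\d{2})$'
)


def split_offset(value):
    """Return (timestamp without offset, fixed-offset tzinfo)."""
    m = OFFSET_RE.match(value)
    if m is None:
        raise ValueError('no trailing UTC offset in %r' % value)
    delta = timedelta(hours=int(m.group('hh')), minutes=int(m.group('mm')))
    if m.group('sign') == '-':
        delta = -delta
    return m.group('body'), timezone(delta)


body, tz = split_offset('28/Jul/2006:10:22:04 -0300')

# The remaining text can be parsed with a format that has no %z,
# and the fixed offset attached afterwards.
naive = datetime.strptime(body, '%d/%b/%Y:%H:%M:%S')
aware = naive.replace(tzinfo=tz)
```

In this sketch the offset is parsed once per string; a vectorized version would presumably need to handle inputs whose offsets differ from row to row.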

ghost · 2016-06-20T19:40:21Z

@jorisvandenbossche thanks for the great advice! Apparently the Python 2 documentation is wrong, since %z is actually not supported. I learned a lot! For now I will go with the dateutil workaround.
@jreback I am not familiar with pandas performance, but I assume this would be the fastest option. I had just assumed that pandas implements the strptime format, which is obviously not the case. I can imagine that others will run into this, too.

jreback · 2016-06-20T19:49:30Z

@markfink

> just assumed that pandas implements the strptime format which is obviously not the case.

pandas DOES implement the strptime format parsing; only the %z directive is missing.

jorisvandenbossche added the Datetime (Datetime data dtype) and Timezones (Timezone data dtype) labels Jun 19, 2016

jreback added Enhancement Difficulty Intermediate labels Jun 19, 2016

jreback changed the title ~~bad directive in to_datetime format - this uses std. strptime zone offset~~ ENH: bad directive in to_datetime format - this uses std. strptime zone offset Jun 19, 2016

mroeschke mentioned this issue Mar 3, 2018

ENH: Parse %z and %Z directive in format for to_datetime #19979

Merged

4 tasks

jreback added this to the 0.23.0 milestone Mar 13, 2018

jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018

jreback modified the milestones: Next Major Release, 0.23.1, 0.24.0 May 23, 2018

jreback closed this as completed in #19979 May 29, 2018

bmironov mentioned this issue Nov 9, 2018

Issue parsing '%z' in timestamps via pd.to_datetime #23599

Closed
