to_datetime parsing bug when using format #4152

Closed
michaelaye opened this issue Jul 7, 2013 · 3 comments · Fixed by #4166

@michaelaye
Contributor

import datetime as dt
import pandas as pd

val = '01-Apr-2011 00:00:01.978'
print('pandas version:', pd.__version__)
print('Value to parse:', val)
format = '%d-%b-%Y %H:%M:%S.%f'
# Parse the same string three ways; all three should agree.
print('datetime.strptime        :', dt.datetime.strptime(val, format))
print('to_datetime, w/out format:', pd.to_datetime(val))
print('to_datetime, w/ format   :', pd.to_datetime(val, format=format))

pandas version: 0.12.0.dev-1101391
Value to parse: 01-Apr-2011 00:00:01.978
datetime.strptime        : 2011-04-01 00:00:01.978000
to_datetime, w/out format: 2011-04-01 00:00:01.978000
to_datetime, w/ format   : 2011-03-31 23:24:13.516352
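
Since the output above shows `datetime.strptime` returning the correct value, one possible stopgap on an affected build is to parse each string with the standard library and hand the results to pandas afterwards. This is only a sketch: `parse_all` is a hypothetical helper, not part of pandas, and it gives up the speed that a vectorized `format=` path is meant to provide. Note also that `%b` is locale-dependent, so the example assumes an English (or C) locale.

```python
from datetime import datetime

# Same format string as in the report above.
FMT = "%d-%b-%Y %H:%M:%S.%f"


def parse_all(values, fmt=FMT):
    """Hypothetical helper: parse each string with datetime.strptime."""
    return [datetime.strptime(v, fmt) for v in values]


parsed = parse_all(["01-Apr-2011 00:00:01.978", "01-Apr-2011 00:00:01.001"])
for p in parsed:
    print(p)
```

The resulting list of `datetime` objects can then be passed on to pandas in place of the raw strings.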
@michaelaye
Contributor Author

FYI, the reason why I want to use to_datetime() with a given format string is speed.

@hayd
Contributor

hayd commented Jul 8, 2013

perhaps #3669 was prematurely closed (cc #2213 and #3890...)

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this issue Jul 8, 2013
@jorisvandenbossche
Member

@hayd I don't think so: #3669 was about the "format" argument being ignored completely. This issue looks like a bug in the parser that is used when format is given. For example, it doesn't matter whether you pass a single string or an array; both give the wrong result:

In [12]:  pd.to_datetime('01-Apr-2011 00:00:01.978', format= '%d-%b-%Y %H:%M:%S.%f')
Out[12]: Timestamp('2011-03-31 23:24:13.516352', tz=None)

In [14]:  pd.to_datetime(np.array(['01-Apr-2011 00:00:01.978']), format= '%d-%b-%Y %H:%M:%S.%f')
Out[14]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2011-03-31 23:24:13.516352]
Length: 1, Freq: None, Timezone: None

And it seems that it has something to do with the parsing of the microseconds:

In [17]:  pd.to_datetime('01-Apr-2011 00:00:01.000', format= '%d-%b-%Y %H:%M:%S.
Out[17]: Timestamp('2011-04-01 00:00:01', tz=None)

In [18]:  pd.to_datetime('01-Apr-2011 00:00:01.001', format= '%d-%b-%Y %H:%M:%S.
Out[18]: Timestamp('2011-04-01 00:16:41', tz=None)
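
The two wrong results in this thread follow a pattern that plain arithmetic can check. With `.001` the result is off by exactly 1000 seconds, which is what you get if the 1000 microseconds are scaled up by an extra factor of 10^6. With `.978` the result is off by exactly -2^31 microseconds, which would be consistent with the same over-scaled value (978000 * 10^6) no longer fitting in a signed 32-bit field. That interpretation is an inference from the numbers only; the thread does not state the actual cause, so treat the sketch below as a check of the arithmetic, not as a description of the parser.

```python
from datetime import datetime, timedelta

# Timestamp without the fractional part, common to both examples.
base = datetime(2011, 4, 1, 0, 0, 1)

# Wrong results reported in this thread for the ".978" and ".001" inputs.
observed_978 = datetime(2011, 3, 31, 23, 24, 13, 516352)
observed_001 = datetime(2011, 4, 1, 0, 16, 41)

# What was effectively added to `base` in each case.
offset_978 = observed_978 - base
offset_001 = observed_001 - base

INT32_MIN = -2**31  # smallest signed 32-bit integer

# ".001" is 1000 microseconds; scaled by an extra 10**6 it becomes 1000 seconds.
print(offset_001 == timedelta(microseconds=1000 * 10**6))

# ".978" lands exactly INT32_MIN microseconds away from `base`.
print(offset_978 == timedelta(microseconds=INT32_MIN))
```

Both comparisons hold, which is why `.000` parses correctly (zero stays zero under any scaling) while small and large fractions fail in different ways.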

@hayd hayd closed this as completed in #4166 Jul 9, 2013
hayd added a commit that referenced this issue Jul 9, 2013
BUG: wrong parsing of microseconds with format arg (#4152)
yarikoptic added a commit to neurodebian/pandas that referenced this issue Jul 25, 2013
* commit 'v0.12.0rc1-43-g7b2eaa4': (571 commits)
  PERF: add ix scalar get benchmark
  DOC: more prominent HDFStore store docs about storer/table formats
  BUG: invert_xaxis (negative tot_sec) triggers MilliSecondLocator (pandas-dev#3990)
  BUG: (GH4192) fixed broken unit test
  BUG: (GH4192) Fixed buglet in the broadcasting logic in Series.where
  CLN: Ignore warnings generated by 'DROP TABLE IF EXISTS' when table does not exist.
  DOC: more cookbook recipies
  DOC: update ipython_directive with changes from ipython to restart prompt number at 1 each page
  DOC: increased width of text area
  TST: fix ujson tests failures on 32-bit
  TST: raise when no data are found when trying to dld multiple symbols
  TST: Create a MySQL database and run MySQL tests on Travis.
  CLN: write the attributes in a HDFStore as strings
  TST: remove double call to yahoo finance
  DOC to_datetime warning about dayfirst strictness
  TST: to_datetime format fixes
  DOC: minor io/whatsnew doc edits
  BUG/TST: wrong parsing of microseconds with format arg (pandas-dev#4152)
  RLS: first release candidate for v0.12.0
  BLD: use the wheel url for scikits timeseries
  ...