BUG: to_datetime issue parsing non-zero padded month in 0.17.1 #11871

Closed
dpinte opened this issue Dec 20, 2015 · 5 comments
Labels
Bug Datetime Datetime data dtype

@dpinte

dpinte commented Dec 20, 2015

In pandas 0.16.2, the following date (non-zero padded month) was parsing correctly:

>>> import pandas
>>> pandas.__version__
'0.16.2'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Timestamp('2005-01-13 00:00:00')

With 0.17.1, it raises a ValueError:

>>> import pandas
>>> pandas.__version__
u'0.17.1'
>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/util/decorators.py", line 89, in wrapper
    return func(*args, **kwargs)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 276, in to_datetime
    unit=unit, infer_datetime_format=infer_datetime_format)
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 397, in _to_datetime
    return _convert_listlike(np.array([ arg ]), box, format)[0]
  File "/Users/dpinte/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas/tseries/tools.py", line 383, in _convert_listlike
    raise e
ValueError: time data '2005-1-13' does match format specified

Although %m is documented as a zero-padded month, Python's strptime function also parses non-zero-padded months correctly.
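
The strptime behaviour described above can be checked with the standard library alone. A minimal sketch (the variable names are illustrative and not part of the original report):

```python
from datetime import datetime

# Standard-library check: when parsing, %m and %d accept values
# with or without a leading zero.
unpadded = datetime.strptime('2005-1-13', '%Y-%m-%d')
padded = datetime.strptime('2005-01-13', '%Y-%m-%d')

# Both spellings yield the same datetime.
print(unpadded == padded)
```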

Is this a known issue?

@dpinte
Author

dpinte commented Dec 20, 2015

It looks like the following works:

>>> pandas.to_datetime('2005-1-13', format='%Y-%m-%d', infer_datetime_format=True)
Timestamp('2005-01-13 00:00:00')

This could be related to #11142 and could be considered a regression. Having to guess the datetime format when the given format is already the appropriate one is overkill:

>>> from pandas.tseries import tools
>>> tools._guess_datetime_format('2005-1-13')
'%Y-%m-%d'

@chris-b1
Contributor

This PR (conveniently also mine) is a more likely cause for the problem - I'll take a look later.
#10615

@chris-b1
Contributor

This happens because there is a special fastpath (in C) for ISO 8601 formatted dates, but that code doesn't handle dates without leading 0s. As a workaround, you can simply omit the format argument.

To fix this, we probably need to either:

  1. Let the fastpath code fall back to the regular parser. This code is already pretty complex, and this would just make it more so.
  2. Update the C code to handle dates without leading 0s. Not sure if this can be done in a performance-neutral way?
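
The two options can be sketched in pure Python. This is only an illustration under stated assumptions: the real fastpath is C code inside pandas, and the regular expressions and the function names parse_with_fallback and parse_lenient below are hypothetical stand-ins, not pandas APIs.

```python
import re
from datetime import datetime

# Hypothetical stand-in for the strict fastpath: zero-padded YYYY-MM-DD only.
_STRICT_ISO = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
# Hypothetical lenient variant: one- or two-digit month and day.
_LENIENT_ISO = re.compile(r'(\d{4})-(\d{1,2})-(\d{1,2})')


def parse_with_fallback(text, fmt='%Y-%m-%d'):
    """Option 1: try the strict fastpath, fall back to strptime on a miss."""
    match = _STRICT_ISO.fullmatch(text)
    if match is not None:
        year, month, day = map(int, match.groups())
        return datetime(year, month, day)
    # The regular parser accepts months and days without leading zeros.
    return datetime.strptime(text, fmt)


def parse_lenient(text):
    """Option 2: make the fastpath itself accept missing leading zeros."""
    match = _LENIENT_ISO.fullmatch(text)
    if match is None:
        raise ValueError('time data %r is not an ISO-like date' % text)
    year, month, day = map(int, match.groups())
    return datetime(year, month, day)
```

With either function, '2005-1-13' and '2005-01-13' parse to the same datetime. Option 1 keeps the strict path untouched at the cost of a second code path; option 2 widens what the fast path accepts.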

@jorisvandenbossche added Bug Datetime Datetime data dtype labels Dec 22, 2015
@jorisvandenbossche added this to the 0.18.0 milestone Dec 22, 2015
@dpinte
Author

dpinte commented Dec 29, 2015

@chris-b1 The second option is definitely the best one, as it would keep the behaviour closer to the standard behaviour of strptime. Even if it is not performance neutral, supporting dates without leading zeros in the C code should not add serious overhead.

@jreback
Contributor

jreback commented Dec 29, 2015

Yes, more flexibility is good here. BTW, this is quite straightforward to do, as the C code involved is fairly simple.

@jreback changed the title from "to_datetime issue parsing non-zero padded month in 0.17.1" to "BUG: to_datetime issue parsing non-zero padded month in 0.17.1" Dec 29, 2015