Garbled dates in pandas 0.18.0 #12808

Closed
aisthesis opened this issue Apr 5, 2016 · 9 comments
Labels
Compat pandas objects compatability with Numpy or Python functions

Comments

@aisthesis

Using pandas 0.18.0 and pandas-datareader 0.2.1:

>>> from pandas_datareader.data import Options
>>> tsla = Options('tsla', 'yahoo')
>>> data = tsla.get_all_data()
>>> data.index.levels[1]
DatetimeIndex(['2008-04-16', '2015-04-16', '2016-09-16', '2017-06-16',
       '2019-01-18', '2020-01-17', '2020-05-16', '2022-04-16',
       '2029-04-16'],
      dtype='datetime64[ns]', name='Expiry', freq=None)

The above expiries are nonsensical and result from confusing the day of the month with the year. Using pandas 0.17.1 and pandas-datareader 0.2.1, I correctly get:

>>> from pandas_datareader.data import Options
>>> tsla = Options('tsla', 'yahoo')
>>> data = tsla.get_all_data()
>>> data.index.levels[1]
DatetimeIndex(['2016-04-08', '2016-04-15', '2016-04-22', '2016-04-29',
           '2016-05-06', '2016-05-13', '2016-05-20', '2016-06-17',
           '2016-09-16', '2017-01-20', '2018-01-19'],
          dtype='datetime64[ns]', name='Expiry', freq=None)

The relevant code seems to be run by pandas.io.parsers.TextParser, but I haven't tracked it further.

Cf. pydata/pandas-datareader#193 and GriffinAustin/pynance#28
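
A minimal sketch of how this kind of swap can arise, assuming (the thread does not confirm it) that the raw expiry reaches the parser as a two-digit `yy-mm-dd` string. Reading the same string day-first reproduces the pattern in the garbled output. It uses only the standard library's `datetime.strptime`, not the pandas or dateutil code path.

```python
from datetime import datetime

# Hypothetical raw value; the actual format returned by Yahoo is not shown in this thread.
raw = "16-04-08"

# Intended reading: two-digit year, then month, then day.
year_first = datetime.strptime(raw, "%y-%m-%d")

# Misreading: the leading "16" is taken as the day and the trailing "08" as the year.
day_first = datetime.strptime(raw, "%d-%m-%y")

print(year_first.date())  # 2016-04-08
print(day_first.date())   # 2008-04-16
```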

@aisthesis
Author

The issue is actually with python-dateutil 2.5.2. Downgrading pandas didn't fix the issue in the virtual env where I was having the problem. So I compared other libraries. pandas 0.18.0 works as long as I have python-dateutil 2.4.2 installed and not 2.5.2.

@jreback
Contributor

jreback commented Apr 5, 2016

this is probably python-dateutil>=2.5.0 where some things changed.

but this is a datareader issue.

@jreback jreback closed this as completed Apr 5, 2016
@jreback
Contributor

jreback commented Apr 5, 2016

I already fixed pandas: #12731

@aisthesis
Author

It actually isn't a datareader issue but, as you surmised, a python-dateutil issue. I'm going to file it with them.

@jreback jreback added the Data Reader and Compat labels Apr 5, 2016
@jreback
Contributor

jreback commented Apr 5, 2016

@aisthesis but pandas-datareader actually needs to parse things differently. You can't rely on the dayfirst/yearfirst flags anymore.
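
One way to avoid depending on the `dayfirst`/`yearfirst` heuristics is to parse with an explicit format, so that an unexpected string raises an error instead of being silently reinterpreted. A minimal sketch using the standard library follows; the `%y-%m-%d` format and the helper name `parse_expiries` are assumptions for illustration, not pandas-datareader's actual code.

```python
from datetime import date, datetime

# Assumed input format; adjust to whatever the data source really returns.
EXPIRY_FORMAT = "%y-%m-%d"


def parse_expiries(values, fmt=EXPIRY_FORMAT):
    """Parse every string with one explicit format; raises ValueError on a mismatch."""
    return [datetime.strptime(value, fmt).date() for value in values]


expiries = parse_expiries(["16-04-08", "16-04-15", "17-01-20"])
print(expiries)
```

`pandas.to_datetime` also accepts a `format` argument, which serves the same purpose when working on a whole column.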

@pganssle
Contributor

pganssle commented Apr 6, 2016

FYI, the issues with the dayfirst argument are a bug specifically in the 2.5.2 release, and will be fixed in the forthcoming 2.5.3 release (they are fixed in master and on the 2.5.x branch). python-dateutil>=2.5.0 should be API-compatible with previous versions. The 2.5.2 release has no non-dayfirst related bugfixes in it, so pinning 2.5.1 for now should be fine.
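
As a stopgap alongside pinning, a project could check the installed version at startup. This is a sketch only: the set of affected versions is taken from the comment above and has not been verified independently, and the names `AFFECTED_VERSIONS` and `is_affected` are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Taken from the comment above: only the 2.5.2 release is reported as affected.
AFFECTED_VERSIONS = {"2.5.2"}


def is_affected(dateutil_version):
    """Return True if the given python-dateutil version string is listed as affected."""
    return dateutil_version in AFFECTED_VERSIONS


try:
    installed = version("python-dateutil")
except PackageNotFoundError:
    installed = None  # dateutil is not installed in this environment

if installed is not None and is_affected(installed):
    print(f"python-dateutil {installed} is listed as affected; consider pinning 2.5.1")
```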

@pganssle
Contributor

pganssle commented Apr 6, 2016

Actually, per the other discussion, I didn't realize that some people seem to have been counting on the bug behavior (I didn't realize that one would specify dayfirst unless you actually wanted the day to come first). That's my bad. Yes, there was a slight ongoing change in behavior. Given that Jeff was involved in one of the issue reports on this, I should have figured he actually understood the change :P.

@jreback
Contributor

jreback commented Apr 6, 2016

@pganssle yeah I mainly adjusted some of our tests to match what dateutil is doing. We use dateutil only as a fallback or if dayfirst/yearfirst are explicitly passed. We have an issue #12585 about this. The biggest problem is when someone actually passes MIXED dayfirst/yearfirst dates (weird!), but it does happen. So we have to adjust some logic in order to NOT parse some of these.

@pganssle
Contributor

pganssle commented Apr 6, 2016

@jreback Yes, that's understandably confusing. When using pandas, I tend to use per-column bound strptime or dateutil.parser.parse functions if I need to handle a mixture of date formats, but obviously that doesn't work in all instances.
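
The per-column approach can be sketched roughly as follows, using only the standard library. The column names, formats, and sample data are made up for illustration, and `make_parser` and `parse_rows` are hypothetical helpers.

```python
import csv
import io
from datetime import datetime


def make_parser(fmt):
    """Return a parser bound to one explicit strptime format."""
    def parse(text):
        return datetime.strptime(text, fmt)
    return parse


# Each date column gets its own bound parser (hypothetical columns and formats).
COLUMN_PARSERS = {
    "trade_date": make_parser("%d/%m/%Y"),  # day-first column
    "expiry": make_parser("%Y-%m-%d"),      # year-first column
}


def parse_rows(text, parsers):
    """Read CSV text and apply the matching parser to each listed column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            name: parsers[name](value) if name in parsers else value
            for name, value in row.items()
        })
    return rows


SAMPLE = "trade_date,expiry,symbol\n05/04/2016,2016-04-08,TSLA\n"
rows = parse_rows(SAMPLE, COLUMN_PARSERS)
print(rows[0])
```

With pandas, a similar dictionary of callables can be passed as the `converters` argument of `read_csv`.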

I'm hoping that resolving dateutil/dateutil#125 (plus dateutil/dateutil#214) will be of some help here. I've long felt that the "black box" nature of the parser is causing the growth of an unpleasantly complicated set of toggles to tell the parser how to handle various ambiguities, particularly because we only return "here's the date I found" rather than "here's the date I found - it only had a year and a month, is that what you wanted?"
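
To make the idea concrete, here is a toy sketch of a parser that reports which fields it actually found alongside the date. It is not dateutil's API or a proposal from the linked issues; `ParseResult`, `parse_with_report`, and the candidate formats are all hypothetical.

```python
from datetime import datetime
from typing import NamedTuple


class ParseResult(NamedTuple):
    value: datetime
    fields_found: frozenset  # which components the input actually contained


# (format, fields present) pairs, tried in order; deliberately tiny.
_CANDIDATES = [
    ("%Y-%m-%d", frozenset({"year", "month", "day"})),
    ("%Y-%m", frozenset({"year", "month"})),
    ("%Y", frozenset({"year"})),
]


def parse_with_report(text):
    """Return the parsed date together with the set of fields that were present."""
    for fmt, fields in _CANDIDATES:
        try:
            return ParseResult(datetime.strptime(text, fmt), fields)
        except ValueError:
            continue
    raise ValueError(f"no known format matches {text!r}")


result = parse_with_report("2016-04")
# The day defaults to 1, but the caller can see that no day was supplied.
print(result.value.date(), sorted(result.fields_found))
```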
