
BUG: Yahoo finance changed chart base url. Updated _get_hist_yahoo #5812

Merged: 1 commit merged on Jan 1, 2014

Conversation

@sglyon
Contributor

sglyon commented Dec 31, 2013

The start of the old url was: http://ichart.yahoo.com/ and yahoo now uses http://ichart.finance.yahoo.com/
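
A minimal sketch of how the chart URL could be assembled with the base kept in one place, so that a host change like this one becomes a one-line edit. The names `build_hist_yahoo_url` and `YAHOO_CHART_BASE` are hypothetical, not pandas names, and the query-string layout (including what look like zero-based month fields) is inferred from the failing URL in the error message in this thread.

```python
from datetime import date

# New host from this PR; the "table.csv" path is taken from the URL in the error message.
YAHOO_CHART_BASE = "http://ichart.finance.yahoo.com/table.csv"


def build_hist_yahoo_url(sym, start, end, base_url=YAHOO_CHART_BASE):
    """Build a daily-history URL (hypothetical helper, for illustration only)."""
    return (base_url +
            "?s=%s" % sym +
            "&a=%d" % (start.month - 1) +  # months appear to be zero-based
            "&b=%d" % start.day +
            "&c=%d" % start.year +
            "&d=%d" % (end.month - 1) +
            "&e=%d" % end.day +
            "&f=%d" % end.year +
            "&g=d" +
            "&ignore=.csv")


url = build_hist_yahoo_url("AAPL", date(2010, 1, 1), date(2013, 12, 31))
print(url)
```

Passing `base_url` explicitly makes it possible to point the same helper at the old host when comparing the two.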

@jreback
Contributor

jreback commented Dec 31, 2013

do the current tests for this fail?

@sglyon
Contributor Author

sglyon commented Dec 31, 2013

I didn't run the tests, although I probably should have. This is what I saw before:

In [19]: get_data_yahoo('AAPL', start='1/1/2010', end='12/31/2013')
---------------------------------------------------------------------------
IOError                                   Traceback (most recent call last)
<ipython-input-19-26f3355b8c21> in <module>()
----> 1 get_data_yahoo('AAPL', start='1/1/2010', end='12/31/2013')

/usr/local/anaconda/lib/python2.7/site-packages/pandas-0.12.0_1327_g12eb775-py2.7-macosx-10.5-x86_64.egg/pandas/io/data.pyc in get_data_yahoo(symbols, start, end, retry_count, pause, adjust_price, ret_index, chunksize, name)
    394     """
    395     return _get_data_from(symbols, start, end, retry_count, pause,
--> 396                           adjust_price, ret_index, chunksize, 'yahoo', name)
    397
    398

/usr/local/anaconda/lib/python2.7/site-packages/pandas-0.12.0_1327_g12eb775-py2.7-macosx-10.5-x86_64.egg/pandas/io/data.pyc in _get_data_from(symbols, start, end, retry_count, pause, adjust_price, ret_index, chunksize, source, name)
    340     # If a single symbol, (e.g., 'GOOG')
    341     if isinstance(symbols, (compat.string_types, int)):
--> 342         hist_data = src_fn(symbols, start, end, retry_count, pause)
    343     # Or multiple symbols, (e.g., ['GOOG', 'AAPL', 'MSFT'])
    344     elif isinstance(symbols, DataFrame):

/usr/local/anaconda/lib/python2.7/site-packages/pandas-0.12.0_1327_g12eb775-py2.7-macosx-10.5-x86_64.egg/pandas/io/data.pyc in _get_hist_yahoo(sym, start, end, retry_count, pause)
    194            '&g=d' +
    195            '&ignore=.csv')
--> 196     return _retry_read_url(url, retry_count, pause, 'Yahoo!')
    197
    198

/usr/local/anaconda/lib/python2.7/site-packages/pandas-0.12.0_1327_g12eb775-py2.7-macosx-10.5-x86_64.egg/pandas/io/data.pyc in _retry_read_url(url, retry_count, pause, name)
    173
    174     raise IOError("after %d tries, %s did not "
--> 175                   "return a 200 for url %r" % (retry_count, name, url))
    176
    177

IOError: after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2010&d=11&e=31&f=2013&g=d&ignore=.csv'

This is what I see after making the change:

In [3]: get_data_yahoo('AAPL', start='1/1/2010', end='12/31/2013')
Out[3]:
              Open    High     Low   Close    Volume  Adj Close
Date
2010-01-04  213.43  214.50  212.38  214.01  17633200     206.93
2010-01-05  214.60  215.59  213.25  214.38  21496600     207.29
2010-01-06  214.38  215.23  210.75  210.97  19720000     203.99
2010-01-07  211.75  212.00  209.05  210.58  17040400     203.61
2010-01-08  210.30  212.00  209.06  211.98  15986100     204.97
2010-01-11  212.80  213.00  208.45  210.11  16508200     203.16
2010-01-12  209.19  209.77  206.42  207.72  21230700     200.85
2010-01-13  207.87  210.93  204.10  210.65  21639000     203.68
2010-01-14  210.11  210.46  209.02  209.43  15460500     202.50
2010-01-15  210.93  211.60  205.87  205.93  21216700     199.12
2010-01-19  208.33  215.19  207.24  215.04  26071700     207.92
2010-01-20  214.91  215.55  209.50  211.73  21862600     204.72
2010-01-21  212.08  213.31  207.21  208.07  21719800     201.18
2010-01-22  206.78  207.50  197.16  197.75  31491700     191.21
2010-01-25  202.51  204.70  200.19  203.07  38060700     196.35
2010-01-26  205.95  213.71  202.58  205.94  66682500     199.12
2010-01-27  206.85  210.58  199.53  207.88  61520300     201.00
2010-01-28  204.93  205.50  198.70  199.29  41910800     192.70
2010-01-29  201.08  202.20  190.25  192.06  44498300     185.70
2010-02-01  192.37  196.00  191.30  194.73  26781300     188.29
2010-02-02  195.91  196.32  193.38  195.86  24940800     189.38
2010-02-03  195.17  200.20  194.42  199.23  21976000     192.64
2010-02-04  196.73  198.37  191.57  192.05  27059000     185.69
2010-02-05  192.63  196.00  190.85  195.46  30368100     188.99
2010-02-08  195.69  197.88  194.00  194.12  17081100     187.70
2010-02-09  196.42  197.50  194.75  196.19  22603100     189.70
2010-02-10  195.89  196.60  194.26  195.12  13227200     188.66
2010-02-11  194.88  199.75  194.06  198.67  19655200     192.10
2010-02-12  198.11  201.64  195.50  200.38  23409600     193.75
2010-02-16  201.94  203.69  201.52  203.40  19419200     196.67
2010-02-17  204.19  204.31  200.86  202.55  15585600     195.85
2010-02-18  201.63  203.89  200.92  202.93  15100900     196.21
2010-02-19  201.86  203.20  201.11  201.67  14838200     195.00
2010-02-22  202.34  202.50  199.19  200.42  13948700     193.79
2010-02-23  200.00  201.33  195.71  197.06  20539100     190.54
2010-02-24  198.23  201.44  197.84  200.66  16448800     194.02
2010-02-25  197.38  202.86  196.89  202.00  23754500     195.32
2010-02-26  202.38  205.17  202.00  204.62  18123600     197.85
2010-03-01  205.75  209.50  205.45  208.99  19646200     202.07
2010-03-02  209.93  210.83  207.74  208.85  20233800     201.94
2010-03-03  208.94  209.87  207.94  209.33  13287600     202.40
2010-03-04  209.28  210.92  208.63  210.71  13072900     203.74
2010-03-05  214.94  219.70  214.63  218.95  32129300     211.70
2010-03-08  220.01  220.09  218.25  219.08  15353200     211.83
2010-03-09  218.31  225.00  217.89  223.02  32866400     215.64
2010-03-10  223.83  225.48  223.20  224.84  21293500     217.40
2010-03-11  223.91  225.50  223.32  225.50  14489300     218.04
2010-03-12  227.37  227.73  225.75  226.60  14868700     219.10
2010-03-15  225.38  225.50  220.25  223.84  17625100     216.43
2010-03-16  224.18  224.98  222.51  224.45  15961000     217.02
2010-03-17  224.90  226.45  223.27  224.12  16105600     216.70
2010-03-18  224.10  225.00  222.61  224.65  12218200     217.22
2010-03-19  224.79  225.24  221.23  222.25  19980200     214.90
2010-03-22  220.47  226.00  220.15  224.75  16300700     217.31
2010-03-23  225.64  228.78  224.10  228.36  21515400     220.80
2010-03-24  227.64  230.20  227.51  229.37  21349300     221.78
2010-03-25  230.92  230.97  226.25  226.65  19367300     219.15
2010-03-26  228.95  231.95  228.55  230.90  22888400     223.26
2010-03-29  233.00  233.87  231.62  232.39  19312300     224.70
2010-03-30  236.60  237.48  234.25  235.85  18832500     228.05
               ...     ...     ...     ...       ...        ...

[1005 rows x 6 columns]

@jreback
Contributor

jreback commented Dec 31, 2013

the existing tests should fail if the URL is broken
odd that they don't

so maybe something is suppressing the error

@MichaelWS
Contributor

I just came across this error as well. I don't see any tests for get_data_yahoo.
Should we add a csv in data for it, or just check that the url works? (The downside of the csv is that the data can be modified, causing an error.)

@jreback
Contributor

jreback commented Jan 1, 2014

no, we should just check that it works
similar cases are in test_data.py iirc (for other sources)

@MichaelWS
Contributor

you were right, it is in the tests. the test does not fail because it is skipped when it does not receive a 200 for the url:

test_read_yahoo (__main__.TestDataReader) ... SKIP: Skipping test after 3 tries, Yahoo! did not return a 200 for url 'http://ichart.yahoo.com/table.csv?s=GS&a=0&b=1&c=2010&d=11&e=31&f=2013&g=d&ignore=.csv'

@jreback
Contributor

jreback commented Jan 1, 2014

hmm

@cpcloud do you know how these tests are supposed to validate?

@MichaelWS
Contributor

It looks like it only checks for a 200 code, but the underlying urllib2 error is actually
URLError: <urlopen error [Errno 111] Connection refused>
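
One way to see the distinction, sketched with Python 3's `urllib.error` (the code in this thread used Python 2's `urllib2`): an `HTTPError` means the server answered with a non-200 status, while a plain `URLError` means no HTTP response was received at all. The helper name `classify_fetch_error` is hypothetical.

```python
from urllib.error import HTTPError, URLError


def classify_fetch_error(exc):
    """Label a fetch failure (hypothetical helper, for illustration only)."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        return "bad status %d" % exc.code
    if isinstance(exc, URLError):
        return "no response: %s" % exc.reason
    return "other: %r" % (exc,)


print(classify_fetch_error(URLError("Connection refused")))
```

A retry loop that reports only "did not return a 200" hides which of these two cases actually happened.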


@ghost

ghost commented Jan 1, 2014

#5809
@spencerlyon2, @ljump12 can one of you fix up the tests to catch this
in addition to fixing the url? will merge.

@jmcnamara
Contributor

The reason that this wasn't caught in the tests is that the @network decorator used in pandas/io/tests/test_data.py doesn't/can't distinguish between an exception caused by not having a network connection and one caused by an invalid URL.

The @network(raise_on_error=True) option is probably intended to allow for this but then all of the tests requiring a network connection will fail if the connection isn't there.

I don't have a suggestion on how to fix that.
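
One possible direction, offered only as a sketch and not as how the pandas `@network` decorator works: when a test raises an `IOError`, probe connectivity separately and skip only if the probe fails too. The names `network_with_probe` and `can_connect` are hypothetical; `can_connect` stands for any zero-argument callable that reports whether some known-good host is reachable.

```python
import functools
import unittest


def network_with_probe(can_connect):
    """Decorator factory: skip on IOError only when the probe also fails."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(*args, **kwargs):
            try:
                return test_func(*args, **kwargs)
            except IOError as exc:
                if can_connect():
                    # The network is up, so the failure is probably a real
                    # problem such as a changed URL: let the test fail.
                    raise
                raise unittest.SkipTest("Skipping test, no network: %s" % exc)
        return wrapper
    return decorator


@network_with_probe(can_connect=lambda: False)  # pretend we are offline
def fetch_quotes():
    raise IOError("connection refused")


try:
    fetch_quotes()
except unittest.SkipTest as skip:
    print("skipped:", skip)
```

A probe like this cannot tell a permanently wrong URL from a server that is temporarily down, so some spurious failures would remain.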

@jreback
Contributor

jreback commented Jan 1, 2014

we prob need a way to test temporarily (e.g. before a release) that the network tests work and use valid urls
unfortunately we can't make this the default, because otherwise we get spurious failures on Travis, which is annoying
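
A sketch of what such an opt-in switch might look like, assuming a hypothetical environment variable `STRICT_NETWORK_TESTS` that a maintainer would set before a release. By default failures are still turned into skips, so CI stays quiet. None of these names come from pandas.

```python
import functools
import os
import unittest


def strict_network_enabled(environ=os.environ):
    """True when the (hypothetical) opt-in variable is set to "1"."""
    return environ.get("STRICT_NETWORK_TESTS", "") == "1"


def network_toggle(test_func):
    """Skip on IOError by default; re-raise it in strict mode."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except IOError as exc:
            if strict_network_enabled():
                raise  # pre-release check: surface broken URLs
            raise unittest.SkipTest("Skipping test %s" % exc)
    return wrapper
```

The strict mode would be run by hand, for example once before tagging a release, rather than on every CI build.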

ghost pushed a commit that referenced this pull request Jan 1, 2014
BUG: Yahoo finance changed chart base url. Updated _get_hist_yahoo
ghost merged commit 4bb199c into pandas-dev:master Jan 1, 2014

@ghost

ghost commented Jan 1, 2014

Thanks for the fix. We'll move the @network problem to its own issue.

@jtratner
Contributor

jtratner commented Jan 1, 2014

That error is actually a little misleading in its message. To be clear, DataReader fails with the equivalent of:

IOError("after 3 tries, Yahoo! did not return a 200 for url")

This gets caught by the network decorator because it's an IOError, and is converted into: raise SkipTest("Skipping test %s" % exc) --> raise SkipTest("Skipping test after 3 tries, Yahoo! did not return a 200 for url").
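
The conversion described here can be reproduced with a small stand-in. `network_sketch` below is a hypothetical simplification written for this illustration, not the pandas decorator itself; only the exception type and the message format come from the comment above.

```python
import functools
import unittest


def network_sketch(test_func):
    """Turn any IOError from the test into a skip (illustrative stand-in)."""
    @functools.wraps(test_func)
    def wrapper(*args, **kwargs):
        try:
            return test_func(*args, **kwargs)
        except IOError as exc:
            raise unittest.SkipTest("Skipping test %s" % exc)
    return wrapper


@network_sketch
def read_yahoo():
    # Same exception type and message shape as the failure quoted above.
    raise IOError("after 3 tries, Yahoo! did not return a 200 for url")


try:
    read_yahoo()
except unittest.SkipTest as skip:
    print(skip)
```

Because the broken URL surfaces as the same exception type as a missing network connection, the test runner reports a skip instead of a failure.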

ghost mentioned this pull request Jan 3, 2014

@ghost

ghost commented Jan 3, 2014

...so now every pandas version out there except git master is broken? nice move, yahoo.

@ghost

ghost commented Jan 4, 2014

cc @yarikoptic. You should probably add this as a distro patch for 0.13.0.

@jtratner
Contributor

jtratner commented Jan 4, 2014

@y-p would it be difficult for us to put out a new release right now with this patch? (even if we had to create a separate branch or something)

@ghost

ghost commented Jan 4, 2014

We could tag a 0.13.1 right now, but we may still have to put out a 0.13.2 in two weeks
since 0.13.x still hasn't gotten into users' hands and we haven't heard back about bugs.

If we put out 0.13.0 and commit to a 0.13.1 in a couple of weeks no matter what, we can
avoid the version churn without making users suffer too long. It's broken for everyone already,
it's not just 0.13.0.

Not sure, it's just that getting the binaries up is a bottleneck.

But if you and @jreback feel differently, we can do that.

@jreback
Contributor

jreback commented Jan 4, 2014

i think the current plan is fine
release will never be perfect
let's release 0.13.0 (with the Numexpr fix back ported if needed)

then do a 0.13.1 in a few weeks

can post an easy monkey patch for the yahoo fix to the issue / ml (just monkey patch the function, it's pretty small)
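
A rough sketch of what such a monkey patch might look like, under stated assumptions. It wraps a module's `_retry_read_url` (whose four-argument signature appears in the traceback earlier in this thread) and rewrites the old host to the new one before delegating. `patch_retry_read_url` is a hypothetical helper, and the demo uses a stand-in namespace rather than a real pandas module; on an affected install the argument would presumably be `pandas.io.data`.

```python
import functools
import types

OLD_BASE = "http://ichart.yahoo.com/"
NEW_BASE = "http://ichart.finance.yahoo.com/"


def patch_retry_read_url(module):
    """Wrap module._retry_read_url so old-host URLs are rewritten.

    Returns the original function so the patch can be undone.
    """
    original = module._retry_read_url

    @functools.wraps(original)
    def patched(url, retry_count, pause, name):
        if url.startswith(OLD_BASE):
            url = NEW_BASE + url[len(OLD_BASE):]
        return original(url, retry_count, pause, name)

    module._retry_read_url = patched
    return original


# Stand-in for the affected module, used only to exercise the patch offline.
fake_module = types.SimpleNamespace(
    _retry_read_url=lambda url, retry_count, pause, name: url)
patch_retry_read_url(fake_module)
print(fake_module._retry_read_url(
    "http://ichart.yahoo.com/table.csv?s=AAPL", 3, 0.001, "Yahoo!"))
```

Wrapping the reader instead of replacing `_get_hist_yahoo` keeps the patch independent of how the query string is built. It relies on the caller looking up `_retry_read_url` through the module at call time.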

@yarikoptic
Contributor

jreback wrote:

> i think the current plan is fine
> release will never be perfect
> let's release 0.13.0 (with the Numexpr fix back ported if needed)

I thought 0.13.0 was released already... let's not grow zombies


@ghost

ghost commented Jan 5, 2014

0.13.0 has been tagged. We're waiting for wes to push the binaries to pypi, after which
we'll send out an ANN to the ml. a couple of weeks or so after that 0.13.1 will be released
with this (yahoo) fix and any other critical fixes for bugs users report in 0.13.0.

We hope that on top of the RC that is enough to ensure a stable release is available
during the next 3-month release cycle for 0.14 without us having to siphon off attention
into maintaining and backporting patches to multiple stable branches.

That's the plan for now.

@yarikoptic
Contributor

On Sat, 04 Jan 2014, y-p wrote:

> 0.13.0 has been tagged. We're waiting for wes to push the binaries to pypi, after which
> we'll send out an ANN to the ml. a couple of weeks or so after that 0.13.1 will be released
> with this (yahoo) fix and any other critical fixes for bugs users report in 0.13.0.
>
> We hope that, on top of the RC, that is enough to ensure a stable release is available
> during the next 3-month release cycle for 0.14 without us having to siphon off attention
> into maintaining and backporting to multiple stable branches.
>
> That's the plan for now.

sounds good to me, except that I would have forgotten about the 0.13.0
announcement, absorbed all the fixes for issues found so far, released
0.13.1 asap (not meaning tomorrow, but asap ;) ), and announced that one
;)

in my case -- currently I have in the patch queue (will upload to Debian
proper shortly, since it should resolve the armel issue)

deb_skip_sequencelike_on_armel
0001-BLD-fix-cythonized-msgpack-extension-in-setup.py-GH5.patch
0001-Add-division-future-import-everywhere.patch
0002-remove-explicit-truediv-kwarg.patch
0001-BUG-Yahoo-finance-changed-ichart-url.-Fixed-here.patch

unfortunately those numexpr "fixes"
(0001-Add-division-future-import-everywhere.patch and
0002-remove-explicit-truediv-kwarg.patch) seemed to have no healing
effect though, and there is a minor failure,
#5851, e.g. on wheezy


@jtratner
Contributor

jtratner commented Jan 5, 2014

Okay, for now I'm going to block using numexpr with pandas for versions
under 1.4.2, which will make those tests skip.

@jreback
Contributor

jreback commented Jan 5, 2014

yep better -

@liuyigh

liuyigh commented Jan 6, 2014

It seems Yahoo made the old ichart url work again. I am using 0.12.

In [5]: web.get_data_yahoo('HMIN','12/23/2010','12-12-2013')
Out[5]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 748 entries, 2010-12-23 00:00:00 to 2013-12-12 00:00:00
Data columns (total 6 columns):
Open         748  non-null values
High         748  non-null values
Low          748  non-null values
Close        748  non-null values
Volume       748  non-null values
Adj Close    748  non-null values
dtypes: float64(5), int64(1)

@jtratner
Contributor

jtratner commented Jan 6, 2014

that's good, was a good opportunity to expose the URLs used anyways, so
probably net positive for us

This pull request was closed.