Replace Yahoo iCharts API #331

Closed
wants to merge 14 commits into from

Conversation

rgkimball
Contributor

Based on multiple issues raised with this and similar libraries, it appears that Yahoo's iCharts API is being phased out. The new API requires a cookie and updated URL/parameter structure, implemented by this PR. See:

This will fix standard interval price pulls for a single stock, but not yet for multiple stocks. If the maintainers agree with the approach, I can continue to debug and build this out; otherwise we can scrap it.

params=self.params, headers=self.headers)
out = str(self._sanitize_response(response))
# Matches: {"crumb":"AlphaNumeric"}
regex = re.search(r'{"crumb" ?: ?"([A-Za-z0-9.]{11,})"}', out)

Not sure if it is just me, but I had to change the regex a bit to make it work:

regex = re.search(r'"crumb":"([A-Za-z0-9.]{11})"', out)

I guess stripping out {} might be the key as I saw the crumb show up in the middle of the array:

"UHAccountSwitchStore":{"site":"fpctx","crumb":"KV.dLYWGrgK","sendRequest":false,"isEnabled":true}
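
To make the difference concrete, here is a minimal sketch that runs both patterns from this thread against the fragment quoted above. The variable names are illustrative only, and the input is just the pasted fragment, not a full Yahoo response.

```python
import re

# Fragment quoted in the comment above; the crumb sits in the middle of an object.
sample = ('"UHAccountSwitchStore":{"site":"fpctx","crumb":"KV.dLYWGrgK",'
          '"sendRequest":false,"isEnabled":true}')

# Pattern from the PR: expects "crumb" to be the only key inside the braces.
strict = re.search(r'{"crumb" ?: ?"([A-Za-z0-9.]{11,})"}', sample)

# Pattern suggested in this comment: no surrounding braces required.
relaxed = re.search(r'"crumb":"([A-Za-z0-9.]{11})"', sample)

print(strict)            # the brace-anchored pattern finds nothing here
print(relaxed.group(1))  # the captured crumb
```

On this fragment only the relaxed pattern matches, which is consistent with the observation that the braces are what breaks the original.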

Contributor Author

Good note, I'll push an update. All of my test cases found the crumbs nested in brackets but that's not very sustainable.

💯 👍

@rgkimball
Contributor Author

rgkimball commented May 18, 2017

Latest commit addresses the remaining tests. I'm noticing Yahoo's API fails periodically - I tried addressing this with a pause multiplier and cookie refresher for subsequent requests, which significantly improves successful requests, but I still hit "Unable to read URL" errors every so often. This is resolved by simply running the test again, but there ought to be a better way of getting around this. Anyone have ideas here?

Also, I'm failing the test_yahoo_DataReader test but I can't figure out why - the frames appear to be identical to me.

Not sure what's gone wrong with Eurostat but I can't get that one to pass locally either, even on master.

@aking1012

aking1012 commented May 19, 2017

When the old API bombed on me (which it did during initial fetches, but not so much on incremental updates of historical data), I suspected it had to do with bandwidth per minute as opposed to requests per minute. I say this because I had 4 processes consuming a queue of symbols, each fetching one symbol at a time. It could be more convenient and faster to fetch multiple symbols at once, but for large chunks of time, or small periods with medium chunks of time, a bandwidth-per-minute limit making it bomb out would make sense.

On unable to read url errors, have you tried giving it a second chance on all fetches inside the fetcher instead of the test?
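
A rough sketch of what "a second chance inside the fetcher" could look like, combined with the pause multiplier mentioned earlier. Everything here is hypothetical: `fetch_with_retry`, `FlakyFetcher`, the URL, and the default values are stand-ins for illustration, not part of pandas-datareader.

```python
import time


def fetch_with_retry(fetch, url, retries=3, pause=0.5, backoff=2.0):
    """Call fetch(url), retrying on IOError with a growing pause between tries."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(retries):
        try:
            return fetch(url)
        except IOError as err:
            last_error = err
            if attempt < retries - 1:
                # Pause multiplier: wait longer after each failed attempt.
                time.sleep(pause * backoff ** attempt)
    raise last_error


class FlakyFetcher:
    """Stand-in for a remote service: fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self, url):
        self.calls += 1
        if self.calls <= self.failures:
            raise IOError("Unable to read URL: " + url)
        return "Date,Close\n2017-05-18,100.0\n"


flaky = FlakyFetcher(failures=2)
text = fetch_with_retry(flaky, "https://example.invalid/prices", retries=3, pause=0.0)
print(flaky.calls, repr(text))
```

Putting the retry in the fetcher rather than the test means callers get the same resilience, at the cost of slower failures when the service is genuinely down.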

params=self.params, headers=self.headers)
out = str(self._sanitize_response(response))
# Matches: {"crumb":"AlphaNumeric"}
regex = re.search(r'"crumb" ?: ?"([A-Za-z0-9.\\]{11,})"', out)

@chris-b1 chris-b1 May 19, 2017

This token can contain some special characters and escaped unicode. For my local workaround this seems to work consistently (python3, may need adjusting for compat)

pat = r'"CrumbStore":{"crumb":"([^"]+)"}'
token = re.findall(pat, out)[0]
token = token.encode('ascii').decode('unicode-escape')
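
As a self-contained illustration of why the decode step matters, the sketch below applies the same pattern to a made-up page fragment. The fragment and its crumb value are hypothetical; the only thing taken from this thread is the pattern and the `unicode-escape` round trip (with the braces escaped for clarity).

```python
import re

# Hypothetical page fragment: the crumb contains an escaped slash (\u002F).
out = 'var x = {"CrumbStore":{"crumb":"KV.dLY\\u002FGrgK"},"other":1};'

pat = r'"CrumbStore":\{"crumb":"([^"]+)"\}'
raw = re.findall(pat, out)[0]  # still holds the literal backslash sequence
token = raw.encode('ascii').decode('unicode-escape')  # escape turned into "/"

print(raw)
print(token)
```

A character class such as `[A-Za-z0-9.]` would reject this crumb outright, whereas `[^"]+` captures it and leaves the unescaping to a separate step.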

Contributor Author

Thanks, Chris! I've updated the pattern per your suggestion; it does seem to handle edge cases that I was missing.

Contributor

@jreback jreback left a comment

ideally you can add some more tests to cover the actual error conditions. not sure how tricky that is though.

@@ -1,8 +0,0 @@
"%PYTHON%" setup.py install
if errorlevel 1 exit 1

Contributor

these are removed in master (so rebase to remove them from your PR)

@@ -85,6 +86,8 @@ def _read_url_as_StringIO(self, url, params=None):
response = self._get_response(url, params=params)
text = self._sanitize_response(response)
out = StringIO()
if len(text) == 0:
Contributor

ideally return a more informative error (e.g. service name / url)

Contributor Author

Updated the error to include subclass and requested URL, and cleaned up my PR.
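
For readers following along, a minimal sketch of an error that names the subclass and the requested URL might look like the following. The class names, the `_check_text` helper, and the message wording are hypothetical and are not taken from the actual commit.

```python
class EmptyResponseError(IOError):
    """Hypothetical error type for a response that came back with no data."""


class BaseReader:
    """Hypothetical base class; real readers would also perform the request."""

    def __init__(self, url):
        self.url = url

    def _check_text(self, text):
        # Name the concrete reader class and the URL so the failure is traceable.
        if len(text) == 0:
            service = self.__class__.__name__
            raise EmptyResponseError(
                "{} request returned no data; check that the URL is valid: {}"
                .format(service, self.url)
            )
        return text


class YahooLikeReader(BaseReader):
    """Hypothetical subclass standing in for a specific data source."""


reader = YahooLikeReader("https://example.invalid/download?symbol=GE")
try:
    reader._check_text("")
except EmptyResponseError as err:
    message = str(err)
    print(message)
```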

@@ -132,17 +127,18 @@ def test_get_data_multiple_symbols(self):
def test_get_data_multiple_symbols_two_dates(self):
pan = web.get_data_yahoo(['GE', 'MSFT', 'INTC'], 'JAN-01-12',
'JAN-31-12')
result = pan.Close.ix['01-18-12']
assert len(result) == 3
result = pan.Close['01-18-12'].transpose()
Contributor

.T is common

Contributor

@jreback jreback left a comment

pls add a release note (will push a new whatsnew a bit later)

@arose13

arose13 commented May 21, 2017

Are these changes only in the master branch and not yet on PyPI?

@rgkimball
Contributor Author

@arose13 These have not yet been merged in.

@arose13

arose13 commented May 21, 2017

@rgkimball my bad, is that because of the CI tests?

@rgkimball
Contributor Author

No worries - these changes have been approved, we just need some release documentation.

@jreback
Contributor

jreback commented May 22, 2017

@rgkimball a couple of tests are failing; can you fix them up as needed (if they are applicable to this change)?

@mijcs18

mijcs18 commented May 23, 2017

Swiss and Tokyo exchanges are not working. Tokyo isn't showing up in the English version of Yahoo Finance for some reason, but Swiss does.

@willfleury

Not all assets are returned as a numeric type. For instance, the symbol 'BRK-A' is returned as a string data type, and I must apply numeric coercion to it: df.apply(pd.to_numeric, errors='coerce')
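
A small sketch of that coercion on made-up data, assuming pandas is installed. The column names and values are hypothetical and are not actual 'BRK-A' quotes; the point is only that string columns become floats and unparseable entries become NaN.

```python
import pandas as pd

# Hypothetical frame where every price arrived as a string.
df = pd.DataFrame({
    "Open": ["100.5", "101.0", "null"],
    "Close": ["100.9", "null", "102.3"],
})

# Convert each column to numbers; values that cannot be parsed become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

print(numeric.dtypes)
print(numeric)
```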

@bkcollection

Is this still ongoing, and will it be merged into the new 0.4.1 release?

@rgkimball
Contributor Author

Eventually, that's the plan. I don't have time to work on this until next week, unfortunately.

@bkcollection

bkcollection commented Jun 3, 2017

@rgkimball I found that there are many 'null' strings in the data from the new API. Would it be possible to fix that?

2016-06-08,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-09,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-10,AWG,null,null,null,null,null,null
2016-06-13,AWG,null,null,null,null,null,null
2016-06-14,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,1000
2016-06-15,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,2000
2016-06-16,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
2016-06-17,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0
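
One way to see the problem in isolation is to parse a few of the rows above and treat the literal string 'null' as a missing value. This is only a sketch using the standard library; `parse_value` is a hypothetical helper, and the unnamed value columns are kept as a plain list because the sample does not include a header row.

```python
import csv
from io import StringIO

# Four rows copied from the sample above.
raw = "\n".join([
    "2016-06-09,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,0",
    "2016-06-10,AWG,null,null,null,null,null,null",
    "2016-06-13,AWG,null,null,null,null,null,null",
    "2016-06-14,AWG,0.600000,0.600000,0.600000,0.600000,0.600000,1000",
])


def parse_value(field):
    """Map the literal 'null' to None; parse everything else as a float."""
    return None if field == "null" else float(field)


rows = []
for record in csv.reader(StringIO(raw)):
    date, symbol = record[0], record[1]
    values = [parse_value(field) for field in record[2:]]
    rows.append((date, symbol, values))

# Dates on which every value is missing.
missing_dates = [date for date, _, values in rows
                 if all(value is None for value in values)]
print(missing_dates)
```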

@OlegShteynbuk
Contributor

OlegShteynbuk commented Jun 5, 2017

Regarding the many 'null' strings in the data from the new API, see
pandas-dev/pandas#16471
#342 has temporary fix

@jreback
Contributor

jreback commented Jun 10, 2017

can u see if u can get this passing

@gusgordon

Thanks - what's the status of this?

@gliptak
Contributor

gliptak commented Jun 17, 2017

I just submitted a documentation change PR #349 and Travis fails with a number of errors: https://travis-ci.org/pydata/pandas-datareader/builds/244005471

@gliptak
Contributor

gliptak commented Jun 24, 2017

After #352, #351, and #350 are on master, I will open a PR to update for the Yahoo API changes. Thanks

@jreback
Contributor

jreback commented Jul 2, 2017

superseded by #355

@jreback jreback closed this Jul 2, 2017