Skip to content

Update daily.py #869

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Update daily.py #869

wants to merge 2 commits into from

Conversation

s-meinhardt
Copy link

@s-meinhardt s-meinhardt commented Jul 5, 2021

Why:
Yahoo Finance must have changed their API in the last few weeks causing errors or no returned data.

What:
I've changed the default interval to "1d" instead of "d" and added the missing headers in the _get_response call in the _read_one_data method of the YahooDailyReader class. With the proposed change, I was able to get the data from Yahoo's Finance API. Note that other intervals are effected too, but the DataReader class seems to use "d" only.

I've changed the default interval to `"1d"` instead of `"d"` as Yahoo must have changed their API in the last couple of weeks. With the proposed change, I was able to get the data from Yahoo's Finance API.
@bashtage
Copy link
Contributor

bashtage commented Jul 6, 2021

Does this fix work locally?

@s-meinhardt
Copy link
Author

I also had to add the headers to the _get_response call in the _read_one_data function which I didn't in the first commit. The missing headers raised the RemoteDataError in the _get_response function. Once the headers are added, the interval update is necessary to get a non-empty DataFrame. After these two changes, it was running fine locally and returned the price data etc.

@ralexx
Copy link

ralexx commented Jul 8, 2021

Merging 389ce32 will break the use case of passing a correctly parameterized requests.Session instance to YahooDailyReader via the session argument. Second, the change will further encourage the user anti-pattern of relying on a hard-coded user agent string, with a high likelihood of further issues being opened if said user agent string is blocked. Third, should the API owners implement robot blocking by filtering user agent strings for statistical outliers, hard-coding an arbitrary browser string will quickly make that string an outlier as its browser version ages.

All of these drawbacks can be avoided simply by pointing users to a documented recipe to pass in their choice of user agent string through the existing YahooDailyReader.session attribute as described above. requests is already present where pandas-datareader is installed, so no additional dependencies are created. I'm willing to open a PR for the doc change.

Problem example

When the headers parameter is passed to requests.Session.get, it replaces any headers that have already been bound to the Session instance. Anyone using a parameterized Session as I do will be forced to monkey patch their headers to YahooDailyReader.headers if 389ce32 is merged.

# from pandas_datareader.yahoo.daily import YahooDailyReader 
# import requests

USER_AGENT = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'
                   ' Chrome/91.0.4472.124 Safari/537.36')
    }
my_headers = {**various_headers, **USER_AGENT}
sesh = requests.Session()
sesh.proxy = {<proxy settings>}  # one of the reasons I need to pass a Session instance to YDR
sesh.headers = my_headers


reader = YahooDailyReader(session=sesh, **other_kwargs)
# now I will have to monkey patch my headers
reader.headers = my_headers
df = reader.read()

Even if you disagree with me and prefer to enable the use of a 'fallback' hard-coded user agent string, please do not implement it as proposed in 389ce32. At a minimum, add a headers=None argument to YahooDailyReader.__init__ to remain consistent with the existing class pattern where all instance attributes except pause_multiplier are set via init arguments. Then move the default header to be a class attribute and bind it to the instance only where the default headers=None is passed.

@s-meinhardt
Copy link
Author

Hi @radexx,

I understand your point and see why one might want to have a user defined header. I'll close this PR and continue to contribute at PR 873 attacking the same issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants