_get_response without headers doesn't work (at least with 'yahoo' source) #867
Comments
Thanks, I tested it here and it works like that. I had also been getting an error since yesterday. |
Hi everyone, I am getting an error as well when I try to retrieve stock data from yahoo finance. Is this error related to the same issue as described above?
|
yes - looks to me the same |
I'm getting the same error as well. Thanks for creating this fix. Did Yahoo change their HTML in a recent release? |
My guess (speculation) is that Yahoo tightened the requirements on URL queries to verify that these are legitimate browser requests, which broke requests whose header section was missing (None). The suggested two-line addition to base.py (around line 152) of pandas_datareader seems to resolve the issue by explicitly forcing a valid header when one is missing. |
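The fallback described above, supplying a valid header when none was passed, can be sketched roughly as follows. This is only an illustration of the idea, not the actual lines added to base.py: the names `DEFAULT_HEADERS` and `ensure_headers` are hypothetical and do not exist in pandas_datareader, and the user agent string is just an example value.

```python
# Hypothetical fallback; neither name below exists in pandas_datareader.
DEFAULT_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/91.0.4472.124 Safari/537.36"
    )
}


def ensure_headers(headers=None):
    """Return a copy of the caller's headers, adding a User-Agent if missing."""
    merged = dict(headers or {})
    # HTTP header names are case-insensitive, so compare in lower case.
    if not any(name.lower() == "user-agent" for name in merged):
        merged.update(DEFAULT_HEADERS)
    return merged


print(ensure_headers(None))                             # falls back to the default
print(ensure_headers({"user-agent": "my-script/1.0"}))  # caller's value is kept
```

Copying the dict instead of mutating it means a caller's headers are never changed behind their back, and a user-supplied user agent always wins over the default.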
If it is true, then today's complication is just the beginning. Eventually, Yahoo engineers will force us to use paid services, but we can still make life a little more difficult for them. Instead of this:
do this:
or this
And don't forget to install and import the package: |
No change to the library is needed; you can pass in a session that carries your own user agent:
    from pandas_datareader.yahoo.daily import YahooDailyReader as ydr
    import requests

    # Example browser user agent; substitute your own preferred string.
    USER_AGENT = {
        'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko)'
                       ' Chrome/91.0.4472.124 Safari/537.36')
    }
    sesh = requests.Session()
    sesh.headers.update(USER_AGENT)
    # your_kwargs is a placeholder for your usual reader arguments (symbols, start, end, ...).
    df = ydr(**your_kwargs, session=sesh).read()
This works for me with e.g. the current Chrome user agent as above. See for example HSBC's HK listing 0005.HK (the market has closed at the time of writing, so it should match your results if you check the "Historical Data" tab on finance.yahoo.com).
|
True, but in my opinion it is more elegant to abstract these details from the user, and have the library handle the various 'subtleties' internally. |
That elegance would come at the cost of making the Yahoo module more brittle. If I were a maintainer here I would not want the stream of issues that could result if/when an arbitrary, hard-coded user agent string is blocked by Yahoo. Better to let users follow the pattern of supplying their own preferred user agent strings. |
I also noticed this broke and started digging around in the code (but I'm a total newbie to Python, so I was really just groping around in the dark). Eventually, after failing to figure out why pandas datareader stopped working for Yahoo, I wrote some code that uses a different URL. Anyway, just in case this is useful (maybe someone can use it to figure out how to fix pandas datareader without special headers, cookies and sessions), this is my amateurish code to read stock data from that URL:
    import datetime
    import urllib.parse

    import pandas as pd
    import requests

    baseUrl = 'https://query1.finance.yahoo.com/v7/finance/download'

    def timestamp(dt):
        # period1/period2 are passed as whole Unix seconds
        return round(datetime.datetime.timestamp(dt))

    def get_csv_data(ticker='SPY', days=200):
        endDate = datetime.datetime.today()
        startDate = endDate - datetime.timedelta(days=days)
        # Quote the ticker so symbols such as ^GSPC are safe in the URL path
        response = requests.get(baseUrl + "/" + urllib.parse.quote(ticker), stream=True, params={
            'period1': timestamp(startDate),
            'period2': timestamp(endDate),
            'interval': '1d',
            'events': 'history',
            'includeAdjustedClose': 'true'
        })
        response.raise_for_status()
        # Parse the streamed CSV body straight into a DataFrame
        return pd.read_csv(response.raw)

    data = get_csv_data()
    print(data)
Produces output like:
|
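The request that the snippet above builds (a URL-quoted ticker plus period1/period2 as Unix seconds) can be inspected offline before anything is sent. The following is a minimal sketch using only the standard library; the helper names `to_unix_seconds` and `build_download_url` are made up for illustration, the dates are arbitrary, and nothing here checks whether Yahoo still accepts such a request.

```python
import datetime
import urllib.parse

BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download"


def to_unix_seconds(dt):
    """Convert a timezone-aware datetime to whole Unix seconds."""
    return round(dt.timestamp())


def build_download_url(ticker, end, days=200):
    """Build the download URL for `days` of daily history ending at `end`."""
    start = end - datetime.timedelta(days=days)
    query = urllib.parse.urlencode({
        "period1": to_unix_seconds(start),
        "period2": to_unix_seconds(end),
        "interval": "1d",
        "events": "history",
        "includeAdjustedClose": "true",
    })
    # quote() escapes characters such as '^' that are not safe in a URL path.
    return f"{BASE_URL}/{urllib.parse.quote(ticker)}?{query}"


# A fixed, timezone-aware end date keeps the result reproducible.
end = datetime.datetime(2021, 7, 1, tzinfo=datetime.timezone.utc)
url = build_download_url("^GSPC", end)
print(url)
```

Using a timezone-aware datetime avoids the local-time ambiguity of `datetime.today()`, which otherwise makes the timestamps depend on the machine's timezone.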
I have been running into the same error since yesterday. Here is my code, which is meant to fetch the SPYD stock price.
The error I got is below. I think the Yahoo URL has already changed; does the Yahoo URL written in DataReader need to change?
|
I agree with a lot of the comments here. Yes, the solution proposed by @galashour may make the yahoo module more brittle, but isn't the current implementation already brittle, given that it's currently broken? I agree we have to play a balancing game here: making this user friendly vs. keeping the yahoo module flexible. My angle is a mix of both suggestions already mentioned above. I think there should be a code change, because one of the reasons to import a library or module is that it's ready to use "out of the box". The user shouldn't have to do extra and repetitive work, such as handling their own user agent. However, we do need to account for future changes on Yahoo's side so this doesn't break again. What are your thoughts? Can we achieve both goals? |
Simply switch to Alpha Vantage. 500 free requests per day is currently enough for my purposes. |
Hello. So the call resp = self._get_response(url, params=params) can become resp = self._get_response(url, params=params, headers=self.headers) |
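The effect of passing `headers=self.headers` through `_get_response` can be illustrated without touching the network. This is a sketch under stated assumptions: `RecordingSession`, `SketchBaseReader` and `SketchYahooReader` are hypothetical stand-ins written for this example, not the real pandas_datareader classes, and the user agent is a placeholder value.

```python
class RecordingSession:
    """Stand-in for a requests-style session that records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def get(self, url, params=None, headers=None):
        self.calls.append({"url": url, "params": params, "headers": headers})
        return "fake response"


class SketchBaseReader:
    """Hypothetical base class, loosely modelled on the thread's description."""

    headers = None  # subclasses may override this

    def __init__(self, session):
        self.session = session

    def _get_response(self, url, params=None, headers=None):
        # Whatever headers the caller passes are forwarded to the session.
        return self.session.get(url, params=params, headers=headers)


class SketchYahooReader(SketchBaseReader):
    headers = {"User-Agent": "Mozilla/5.0 (example value)"}

    def read(self, url, params=None):
        # The change under discussion: pass self.headers explicitly.
        return self._get_response(url, params=params, headers=self.headers)


session = RecordingSession()
SketchYahooReader(session).read("https://example.invalid/quote", params={"symbol": "SPY"})
print(session.calls[0]["headers"])
```

With the original call (no `headers` argument) the recorded headers would be None, which is the situation the issue title describes.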
I agree with the spirit of the comment, but my perspective is slightly different: So, think about it as a 'last safety measure', just in case the 'sub-class' or user didn't provide a value, so that we still end up with a valid header (lack of which introduced this little drama). |
So we have some confidence a change to Also, as a thought exercise, what are some ideas to address the "brittle"-ness of this change if Yahoo were to make future changes? Anyone on the YOLO route and just deal with Yahoo changes as they happen? Because, it's basically what we're doing right now with this current header issue. |
Indeed, it would be good if someone who has scripts that also use datareaders from other sources could verify that the suggested update doesn't degrade anything (it shouldn't, but it's worth double-checking). |
There isn't really much you can do about that. The nature of the yahoo data reader is that it is essentially using Yahoo's internal APIs. There will never be any guarantee that they won't change something that breaks datareader in the future, either deliberately or simply as a consequence of deciding to change the structure of their internal APIs for whatever reason. Personally, I don't actually think that Yahoo broke this deliberately. But in any case, using an internal API like this will always be somewhat brittle.
It would be good to check that, yes. It seems unlikely though. In some sense, sending requests without setting a 'user-agent' is a bit unusual and is more likely to cause problems than the other way around. So overall I would expect a change that guarantees such a header is present to make things less brittle (for any data provider, not just for Yahoo). |
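To see what a script identifies itself as when no user agent is set explicitly, the default can be inspected without sending anything. The sketch below uses the standard library's urllib.request as an analogue (an assumption made to keep the example dependency-free); requests behaves similarly in that it sends its own python-requests/<version> default. `BROWSER_UA` is just the example string from earlier in the thread.

```python
import urllib.request

# The default opener carries a library-specific User-agent header.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)
print(default_headers)

# Setting the header explicitly replaces that default for this request.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/91.0.4472.124 Safari/537.36"
)
request = urllib.request.Request(
    "https://finance.yahoo.com/",
    headers={"User-Agent": BROWSER_UA},
)
# Request normalises header names with str.capitalize(), hence "User-agent".
print(request.get_header("User-agent"))
```

A server that filters on the user agent can therefore tell a script's default apart from a browser's, which may be why an explicit header makes a difference here.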
I did exactly the same thing and it works now. Probably the solution is not so elegant, but for now it works. |
It doesn't look like Yahoo is going to fix the problem they created, so what's next? (Not found on server. Thank you for your patience. Our engineers are working quickly to resolve the issue.)
|
Correct, Yahoo won't "fix" because this is a deliberate header change on their part. Since this is the datareader repo, we should try to agree on a fix. Moving to another API, such as Alpha Vantage, is an option, but not related to this repo. I think we're between @marcosinging and @galashour fixes. Any preference between the two? |
This one worked for me. Thanks
|
Make one or two PRs so the changes can be reviewed.
…On Thu, Jul 8, 2021, 15:31 shortsallday ***@***.***> wrote:
Doesn't look like Yahoo going to fix problem they created so what's next?
(Not found on server. Thank you for your patience. Our engineers are
working quickly to resole the issue.)
So what's next?
- Enhance Pandas datareader to work with Yahoo with marcosinging fix?
- use aeolio yfinance fix in 868?
- discontinue using datareader for Yahoo and find other data service
such as Alpha Vantage.
Correct, Yahoo won't "fix" because this is a deliberate header change on
their part.
Since this is the datareader repo, we should try to agree on a fix. Moving
to another API, such as Alpha Vantage, is an option, but not related to
this repo.
I think we're between @marcosinging <https://github.com/marcosinging> and
@galashour <https://github.com/galashour> fixes. Any preference between
the two?
Also, who ultimately decides which fix gets pushed?
|
Thanks Kevin,
I am not the one who knows how to make changes in Pandas datareader or how this gets done. So does someone have to raise their hand to make this change to the datareader and post it for review? I see Marcosinging proposed a fix and SaeedBohlooli <https://github.com/SaeedBohlooli> commented 1 hour ago <#867 (comment)>, so I guess the review has started, and then someone who will fix and test it raises their hand before it goes into production. I am new to GitHub, so I'm not sure how this works.
From: Kevin Sheppard
Sent: Thursday, July 8, 2021 8:58 AM
Make one or two PRs so the changes can be reviewed.
|
Hi everybody. Honestly, I'm really new here on GitHub: I signed up last week because of the issue with pandas-datareader. So I don't know well how fixing an issue works, but I can say that I adopted that solution (also proposed by @vonOak) and it works. If we agree that it's the best solution we can update the code of pandas-datareader, but please help me with the process here on GitHub. |
I just made a pull request using in the yahoo/daily: and also adding a 'safeguard' check in the base class. hopefully the request will go through. Thanks all for the constructive feedback. |
@883 builds on this and addresses the complaint. |
Due to some reasons, I have to use the existing pandas_datareader as I am using python 2.7 with some other old modules.
|
Adding the 2 lines to base.py didn't work for me. Do you know if there is an updated UA I should be using? Thanks |
This thread is quite outdated, and the library has changed. Also, at the time (1.5 years ago) I think a different fix was eventually implemented. Consider moving to version 0.2, which seems to be working fine (or 0.1.94, which I think was the last on the 0.1 branch). |
|
Tried headers is None, which returned the error below:
    TypeError Traceback (most recent call last)
    File /Applications/Python/anaconda3/lib/python3.9/site-packages/pandas/util/_decorators.py:207, in deprecate_kwarg.<locals>._deprecate_kwarg.<locals>.wrapper(*args, **kwargs)
    File /Applications/Python/anaconda3/lib/python3.9/site-packages/pandas_datareader/data.py:370, in DataReader(name, data_source, start, end, retry_count, pause, session, api_key)
    File /Applications/Python/anaconda3/lib/python3.9/site-packages/pandas_datareader/base.py:256, in _DailyBaseReader.read(self)
    File /Applications/Python/anaconda3/lib/python3.9/site-packages/pandas_datareader/yahoo/daily.py:153, in YahooDailyReader._read_one_data(self, url, params)
    TypeError: string indices must be integers
Any idea what I am doing wrong? |
Can you explain what you mean by moving to version 0.2? What application are you referring to here? |
I assume you use this library in the context of a Python application.
    pip install yfinance --upgrade --no-cache-dir
|
Got it. The library I was using was 'pandas_datareader'. I did, however, install yfinance and got the error below: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts." Basically, I just want to be able to pull historical pricing using Yahoo. Is there any way you can think of that doesn't require me to contact Yahoo support? Alternatively, can you suggest any other libraries I can use to pull historical adjusted close prices for X # stocks? |
It seems you have an environment issue. Try to install Anaconda and then create an 'Env' (or virtual env) using the relevant version of Python (I recommend versions 3.8 or 3.9, since they have the majority of the relevant libraries you are likely to use). Again, the issue you have seems to be related to the environment/paths on your setup rather than to the Yahoo library (regardless of whether you use pandas-datareader or yfinance). Resolving such issues can be tedious, but it is worth it in the long run. Good luck. |
understood - thank you |
Just out of interest, what gave it away as an environment issue vs. Yahoo/yfinance? |
The message itself: "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts." It seems some dependencies were not resolved; I can't say whether that is related to the pip command itself or to something else. There were also dependency conflicts in conda-repo, a complaint that pathlib is not installed, etc. (These all seem to be generic complaints; while I can't say what triggers them, they don't seem to be related to yfinance.) |
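When it is unclear whether a problem comes from the environment or from a particular library, a first step is to check which versions are actually installed in the interpreter being used. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the helper name `installed_version` and the list of packages are illustrative choices, not something prescribed in the thread.

```python
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Show which interpreter (and therefore which environment) is in use.
print(f"python: {sys.version.split()[0]} at {sys.executable}")

for name in ("pandas-datareader", "yfinance", "pandas", "requests"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this inside and outside a freshly created environment makes it easier to see whether the script is picking up the packages you think it is.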
to fix, I put in base.py: