-
Notifications
You must be signed in to change notification settings - Fork 679
resolve issue #867 by explicitly adding headers to url _get_response #873
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm relatively new, the fix is simple and addresses issue #867 , without the fix, using yahoo source fails (/crashes). I'm not sure if / how to do some of the above, but I did test that before the fix the library crashes and after the fix it completes. Also others commenting on issue #867 confrimed the same. I'm not sure if/how to performm the above checkboxes, and in which environment, possibly you guys can help. |
Hi galashour, I made a similar PR a couple of days ago here but was made aware of an issue by @ralexx with whom you had a similar discussion here. As there is no reason to have multiple PRs for the same problem, I'll close mine and continue to contribute here. I hope the issue raised by @ralexx can be solved soon as I want the YDR back to work. However, you should test if you actually get any data back and not just an empty DataFrame. I've noticed that I had to change the default interval from |
Thanks for the comment.
|
I can also get data now without the interval modification. Well, I hope you get your PR accepted asap. |
Please do not merge this pull request as it appears now. It will break the idiomatic use of user-defined I appreciate that @galashour is intent on embedding a default user agent string in pandas_datareader. To be clear, I have already submitted that this is both unnecessary and an anti-pattern. But leaving aside my protest, this patch currently takes a harmful approach that will unnecessarily break valid use cases. ProblemThese changes force all users to monkey patch self.headers = {
"Connection": "keep-alive",
"Expires": str(-1),
"Upgrade-Insecure-Requests": str(1),
# Google Chrome:
"User-Agent": (
"Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 "
"(KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36"
),
} This is because in
The proposed change to Because no public Non-breaking approachIf you remain intent on providing a hard-coded, fallback user agent string, it can be accomplished without breaking any use case. Note that the # pandas_datareader._utils.py
DEFAULT_USER_AGENT = <some user agent string> # so far this is the only change needed for Yahoo API
def _init_session(session, retry_count=3):
if session is None:
session = requests.Session()
# do not set requests max_retries here to support arbitrary pause
session.headers['User-Agent'] = DEFAULT_USER_AGENT # <-- NON-BREAKING CHANGE
return session |
There is No HARD CODED header content in the fix, not sure if you actually read the actual fix. It puts on line 153 of the yahoo/daily.py the 3rd parameter that was missing, with NO hardcoded string: Similarly, it also put a safeguard in the base class (again happens to be line 153) only for the case somehow we end up with headers == None: |
As I said, even if you sort this issue out, your patch still forces all users to pass headers if they don't want to use the headers as defined above. There are other ways of accomplishing your purpose without disadvantaging users. |
Once again - wrong.
Not sure where all these false claims are coming from, since clearly you didn't read the fix. The original problem was that in yahoo/daily, the _read-one_data method didn't use the self.headers (that was already set correctly and just wasn't used in the url request). The very simple fix was just to add the 3rd parameter (with the existing value) to the function call: Instead of:
The fix is:
You really can't get more elegant, minimalistic and without changing any paradigm or interface. More so, in the internal testing I printed the 'existing' self.headers and verified it was (already) set with a valid value, and was just not used, and the trivial fix was to add it to resolve the issue. I do ask that the PR will be handled soon (this way or another), since currently the interface in the official build is broken when working with yahoo (and the proposed fix elegantly addresses it). |
Please be kind, I am trying to help you. I understand that your changes work for you, but so far you have not provided any tests for your proposed changes so it's not clear that your changes work for other use cases. As the author of a PR it's incumbent upon you to address fellow users' concerns. You propose a change that violates the public API of
This is not accurate. I demonstrated this to you in #867. Anyone using version 0.9 can download historical prices today by passing a Or, by making a trivial method call, you can even use the same hard-coded request headers on lines 84-93 of yahoo/daily.py that your patch points to: from pandas_datareader.yahoo.daily import YahooDailyReader
reader = YahooDailyReader('MSFT', start='2021-07-01', end='2021-07-09')
# confirm that the default headers are the ones we don't want
print(reader.session.headers)
# {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive'}
# now update the *session* headers from `YahooDailyReader.headers`
reader.session.headers.update(reader.headers)
print(reader.session.headers)
# {'Connection': 'keep-alive',
# 'Expires': '-1',
# 'Upgrade-Insecure-Requests': '1',
# 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
# download prices as usual
df = reader.read()
print(df)
# High Low Open Close Volume Adj Close
# Date
# 2021-07-01 271.839996 269.600006 269.609985 271.600006 16725300 271.600006
# 2021-07-02 278.000000 272.500000 272.820007 277.649994 26458000 277.649994
# 2021-07-06 279.369995 274.299988 278.029999 277.660004 31565600 277.660004
# 2021-07-07 280.690002 277.149994 279.399994 279.929993 23260000 279.929993
# 2021-07-08 278.730011 274.869995 276.899994 277.420013 24618600 277.420013
# 2021-07-09 278.049988 275.320007 275.720001 277.940002 23905500 277.940002 Please confirm that you are able to run this code. |
The API that used to work was simple and is used by many:
This API is now broken. It is not only for me, quite a few on issue #867 (and also on issue #868 that is a duplicate) - reported the same problem, and reported it works for them with the simple fix applied. Original API was broken due to a 'bug' in the existing that yahoo.daily as I mentioned that 'forgot' to add the self.headers that was already set internally correctly (with no explicit value from the users), while a preferred practice regrdless to yahoo is to use headers in the url-request (as can be seen also in the base class). I will say it one last time, as nicely as I can, I didn't violate any public API, nor did I change any API (public or internal). Simply added a parameter that was already there unused while it should have been used. Lastly, I will add the yfinance package used the same approach as I'm suggesting to resolve the same failure they experienced last week with yahoo source. Addition
=== === === run results seem to work perfectly (thus, as desired, no breakage introduced by the proposed simple fix):C:\Users\user\anaconda3\envs\p38web2\python.exe D:/Python/Sandbox/index_tables.py {'User-Agent': 'python-requests/2.25.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '/', 'Connection': 'keep-alive'} {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36', 'Accept-Encoding': 'gzip, deflate', 'Accept': '/', 'Connection': 'keep-alive', 'Expires': '-1', 'Upgrade-Insecure-Requests': '1'} [6 rows x 6 columns] Process finished with exit code 0 |
#883 builds on this while addressing the issues it created. |
Please try master to see if it is working now.
If there are positive reports, I'll release very soon. |
Looks good from my side. Thanks (!) |
Fixed in master. Thanks for the headstart. |
git diff upstream/master -u -- "*.py" | flake8 --diff
black --check pandas_datareader