[BUG]Fix read_html error when URL include Unicode #50259
Sorry, I'll fix some tests first.
The CI failure is just a rate-limit-exceeded issue. I'll rerun it later.
The 32-bit Linux test has now been canceled due to a timeout, which I think is being debugged in the PR below. The failed and canceled tests are unrelated to this PR, so once the test above is fixed, I think this PR is fine.
All tests pass now. Could someone help review it? @phofl, @mroeschke
I'm not sure this is something we should be special-casing. It looks like there has already been some upstream conversation on this in Python: https://bugs.python.org/issue3991. I'm not an expert on the RFCs, but I think we would just want to defer to the language. FYI, since the characters included are non-printable, the test case is a bit deceiving. If you remove the non-printable characters, everything works fine.
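To illustrate the point about non-printable characters, here is a minimal sketch of how one might check a URL string for them before passing it on. The helper name `non_printable_chars` and the sample URLs are hypothetical and are not part of this PR or of pandas; the check relies only on the standard `str.isprintable` method.

```python
def non_printable_chars(url: str) -> list[tuple[int, str]]:
    """Return (index, code point) pairs for every non-printable character.

    Hypothetical helper for illustration only; not part of pandas.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(url)
        if not ch.isprintable()
    ]


# A zero-width space (U+200B) is invisible when the URL is displayed,
# which is what can make such a test case look deceiving.
suspicious = "http://a.com/\u200bx"
clean = "https://example.com/日本"

print(non_printable_chars(suspicious))
print(non_printable_chars(clean))
```

Printable non-ASCII characters such as CJK letters are not flagged, so this separates "URL contains Unicode" from "URL contains invisible characters".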
@WillAyd Sorry for the late reply; I was away for the New Year holidays. Let's close the related issues and pull requests, and clearly state that this is the spec and not a bug.
doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.
Let's tackle the last one after getting approval for this PR.
I fixed the Unicode URL issue in the read_html function by converting the Unicode-style URL.
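The description does not say how the URL is converted. As a rough sketch of one common approach, and not necessarily what this PR does, the non-ASCII parts of a URL can be percent-encoded with the standard library before the URL is fetched. The function name `encode_unicode_url` and the example URL are assumptions made for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_unicode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path, query, and fragment.

    Hypothetical helper for illustration; not the code from this PR.
    The host is left untouched (internationalized domain names would
    need separate IDNA handling).
    """
    parts = urlsplit(url)
    # Keeping "%" in `safe` leaves existing escapes alone, so the
    # function can be applied twice without double-encoding.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%+")
    fragment = quote(parts.fragment, safe="%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


encoded = encode_unicode_url("https://example.com/wiki/日本?q=é")
print(encoded)  # https://example.com/wiki/%E6%97%A5%E6%9C%AC?q=%C3%A9
```

The encoded string is plain ASCII and could then be passed to a reader such as read_html. Whether pandas should do this automatically, or leave it to the caller, is the question discussed in the review comments above.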