Bug with read_json from str reporting "Protocol not known" #43594
Comments
Thanks for reporting this @Peterl777. Does this fall under the same umbrella as #36271?
Ah, I hadn't seen that one. I searched, but didn't come across #36271. Probable duplicate; it appears to be the same. But I don't think @steve-mavens's proposed fix would fix my issue, as my string is in fact a valid URL. I note the error message in that report has changed: that report shows "ImportError: Missing optional dependency 'fsspec'", but that code and my code now both report "ValueError: Protocol not known". I haven't delved into pandas internals, but I'm not clear why
@Peterl777: I don't think your string is a valid URL. It is The different error message is because you happen not to have fsspec installed. It is an optional dependency of pandas, so it isn't installed automatically, but people who have it installed will see the "protocol not known" error because their string is passed to fsspec, which sees the scheme The reason
@steve-mavens Yes, true, but this is a JSON string, so it's an array (list) containing a string (JSON is always double-quoted), that contains a URL. So the actual URL is Oh I didn't know that about the optional module. Will have to look into that. But it doesn't explain the different behaviour if the JSON is coming from a file? (Or from a
Possibly it is surprising, but the behaviour is documented: "The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. A local file could be: file://localhost/path/to/table.json.". The details of how it decides whether you've passed it a URL or JSON to parse are not documented.
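To illustrate how a URL-versus-JSON guess can go wrong, here is a minimal sketch. It is not pandas' actual code: `looks_like_url_naive` is a hypothetical, simplified stand-in for the kind of loose check discussed in this thread, `looks_like_url_strict` is one possible tighter check that requires an RFC 3986 style scheme at the very start of the string, and the example.com payload is a placeholder.

```python
import re


def looks_like_url_naive(text: str) -> bool:
    """Hypothetical loose check: treat any text containing '://' as a URL,
    unless it starts with http:// or https:// (assumed to be handled elsewhere)."""
    return "://" in text and not text.startswith(("http://", "https://"))


# RFC 3986 scheme: a letter followed by letters, digits, '+', '-' or '.'.
_SCHEME_AT_START = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_url_strict(text: str) -> bool:
    """Hypothetical tighter check: the scheme must begin at the first character."""
    return _SCHEME_AT_START.match(text) is not None


json_text = '["http://example.com/data"]'  # valid JSON, not a URL

print(looks_like_url_naive(json_text))                   # True: misclassified
print(looks_like_url_strict(json_text))                  # False
print(looks_like_url_strict("s3://bucket/table.json"))   # True
```

Under the loose check, the JSON text is mistaken for a URL whose "scheme" is everything before the first `://`, which is consistent with the "Protocol not known" error reported here.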
Ah! I get it. Wow, yes, it is getting the guessing of the URL pretty wrong! And that explains why it's different when given a file (or file-like object) containing that string. And so yes, your proposal appears likely to work (with the caveat that I haven't tried it and, like you, don't know of any possible repercussions). I'll mark this as a duplicate then, and make a note over at #36271. FYI, the actual JSON string had a URL buried many levels deep, in the middle of a long string that contained HTML that had an
Duplicate of #36271
read_json is supposed to parse from a str or from a file-like object. If the string being parsed contains the text http:, then pandas crashes with a ValueError.
Steps to reproduce:
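The original snippet is not preserved here. A minimal sketch consistent with the discussion in this thread, assuming a JSON array holding a single URL string and pandas 1.3.x, might look like the following. The example.com address and the `reproduce` helper are placeholders, not the reporter's data or code.

```python
import json

# Placeholder payload: a JSON array containing one URL string.
json_text = '["http://example.com/data"]'

# The text is valid JSON, so a plain JSON parser accepts it.
print(json.loads(json_text))


def reproduce(text: str = json_text):
    """Pass the raw string straight to pandas (assumes pandas 1.3.x).

    Per the thread, this is reported to fail with
    "ValueError: Protocol not known" when fsspec is installed, or with an
    ImportError about the missing optional dependency 'fsspec' otherwise.
    """
    import pandas as pd  # imported lazily so the rest runs without pandas

    return pd.read_json(text)
```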
Work around 1:
Write the string to a file.
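A minimal sketch of this workaround, assuming a temporary file is acceptable. The helper names and the example.com payload are hypothetical; the pandas import is deferred so the file-writing part runs on its own.

```python
import os
import tempfile

json_text = '["http://example.com/data"]'  # placeholder payload


def write_json_to_temp_file(text: str) -> str:
    """Write the JSON text to a temporary .json file and return its path."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(text)
    return handle.name


def read_json_via_file(text: str):
    """Hypothetical helper: give pandas a file path instead of the raw string."""
    import pandas as pd  # requires pandas to be installed

    path = write_json_to_temp_file(text)
    try:
        return pd.read_json(path)
    finally:
        os.remove(path)  # clean up the temporary file
```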
Work around 2:
Wrap the string into a file-like object.
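A minimal sketch of this workaround using `io.StringIO` from the standard library. The helper names and the example.com payload are hypothetical; because pandas receives a file-like object rather than a str, the URL guessing on the string content is not involved.

```python
import io

json_text = '["http://example.com/data"]'  # placeholder payload


def as_file_like(text: str) -> io.StringIO:
    """Wrap the JSON text in an in-memory, file-like object."""
    return io.StringIO(text)


def read_json_via_buffer(text: str):
    """Hypothetical helper: give pandas a file-like object, not a str."""
    import pandas as pd  # requires pandas to be installed

    return pd.read_json(as_file_like(text))


buffer = as_file_like(json_text)
print(buffer.read())  # the buffer yields the original text unchanged
```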
Versions
pandas 1.3.3 on Windows