Fix issue #36271 to disambiguate json string #36273

Closed
wants to merge 10 commits into from
1 change: 1 addition & 0 deletions pandas/io/common.py
@@ -161,6 +161,7 @@ def is_fsspec_url(url: FilePathOrBuffer) -> bool:
    return (
        isinstance(url, str)
        and "://" in url
        and not " " in url

Contributor

A space is allowed in URLs for at least some storage systems, so this check would reject valid paths.
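The sketch below illustrates the reviewer's point. It copies the check as proposed in this PR into a standalone function and passes it an S3-style path whose object key contains a space. `is_fsspec_url_proposed` is a hypothetical name, not the pandas function, and the bucket and key are made up.

```python
def is_fsspec_url_proposed(url) -> bool:
    # Hypothetical standalone copy of the check as proposed in this PR,
    # including the added "no space" condition.
    return (
        isinstance(url, str)
        and "://" in url
        and " " not in url
        and not url.startswith(("http://", "https://"))
    )


# Made-up path whose object key contains a space.
path_with_space = "s3://my-bucket/reports/q1 summary.csv"

# The space condition alone makes the check reject this path.
print(is_fsspec_url_proposed(path_with_space))  # False
print(is_fsspec_url_proposed("s3://my-bucket/reports/q1_summary.csv"))  # True
```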

Author

Thanks. Are there any other ideas for how to fix #36271? All my JSON strings that contain URLs now raise errors, so I downgraded to pandas==1.0.5.
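The failure in #36271 can be reproduced in isolation. The sketch below copies the pre-PR condition from the diff's context lines into a standalone function and passes it a JSON document that contains a URL. `is_fsspec_url_before` is a hypothetical name. The sketch also assumes, as the issue report suggests, that a string passing this check gets handled as a remote path instead of being parsed as JSON text.

```python
import json


def is_fsspec_url_before(url) -> bool:
    # Hypothetical standalone copy of the check before this PR
    # (the unchanged context lines of the diff).
    return (
        isinstance(url, str)
        and "://" in url
        and not url.startswith(("http://", "https://"))
    )


# A literal JSON document (not a path) that happens to contain a URL.
json_text = json.dumps([{"name": "pandas", "homepage": "https://pandas.pydata.org"}])

# It starts with "[", not "http", and contains "://", so it looks like a URL.
print(is_fsspec_url_before(json_text))  # True
```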

Contributor

A space isn't required in valid JSON either. Perhaps the only reliable way to tell is to try to parse the string?
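Compact JSON shows why the space condition is not enough. The sketch below uses Python's `json.dumps` with `separators=(",", ":")` to produce JSON with no whitespace at all, then runs it through a standalone copy of the proposed check. `is_fsspec_url_proposed` is a hypothetical name.

```python
import json


def is_fsspec_url_proposed(url) -> bool:
    # Hypothetical standalone copy of the check as proposed in this PR.
    return (
        isinstance(url, str)
        and "://" in url
        and " " not in url
        and not url.startswith(("http://", "https://"))
    )


# separators=(",", ":") makes json.dumps emit compact JSON with no spaces.
compact = json.dumps({"url": "s3://bucket/key"}, separators=(",", ":"))
print(compact)  # {"url":"s3://bucket/key"}

# No space, contains "://", does not start with http(s)://, so it is still
# classified as a URL.
print(is_fsspec_url_proposed(compact))  # True
```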

Author

I've updated the PR accordingly: it now tries to parse the string as JSON, and if parsing succeeds, the string is not treated as a URL. How about that?
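The updated code is not reproduced in the diff on this page. The following is therefore only a hypothetical sketch of the approach described: attempt `json.loads`, and if it succeeds, don't treat the string as a URL. `looks_like_json` and `is_fsspec_url_json_aware` are illustrative names, not pandas functions.

```python
import json


def looks_like_json(text: str) -> bool:
    """Return True if ``text`` parses as a JSON document."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True


def is_fsspec_url_json_aware(url) -> bool:
    # Hypothetical variant: the cheap string tests run first, and the JSON
    # decode only runs for strings that otherwise look like URLs.
    return (
        isinstance(url, str)
        and "://" in url
        and not url.startswith(("http://", "https://"))
        and not looks_like_json(url)
    )


print(is_fsspec_url_json_aware("s3://bucket/key.json"))  # True
print(is_fsspec_url_json_aware('{"url": "s3://bucket/key"}'))  # False
```

Putting the JSON check last keeps the common case cheap. The trade-off is that a large JSON document containing "://" gets decoded once by this check and again when it is actually read.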

Contributor

Sounds reasonable to me. I don't expect any URL to also be valid JSON. You could conceivably write a regex to check for JSON-ness ("^\s*[\[{]" maybe?), but calling the JSON decoder is probably faster and more correct.

(I simply didn't know that you can pass a string directly to read_json, as opposed to a StringIO file-like object.)
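The regex heuristic suggested above can be compared against an actual parse attempt. The sketch below uses the reviewer's pattern as written. `starts_like_json` and `parses_as_json` are hypothetical helper names.

```python
import json
import re

# The reviewer's suggested pattern: optional leading whitespace, then "[" or "{".
JSON_START = re.compile(r"^\s*[\[{]")


def starts_like_json(text: str) -> bool:
    # Only inspects the first non-whitespace character; never decodes the text.
    return JSON_START.match(text) is not None


def parses_as_json(text: str) -> bool:
    # Full decode; json.JSONDecodeError is a subclass of ValueError.
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


for sample in ['  {"url": "s3://bucket/key"}', "s3://bucket/key", "{not json"]:
    print(repr(sample), starts_like_json(sample), parses_as_json(sample))
```

The last sample shows where the heuristic is weaker. It matches the regex, but it is not valid JSON. Decoding the string is what actually confirms that it is JSON.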

        and not url.startswith(("http://", "https://"))
    )

6 changes: 6 additions & 0 deletions pandas/tests/io/test_common.py
@@ -52,6 +52,12 @@ class TestCommonIOCapabilities:
bar2,12,13,14,15
"""

    def test_is_fsspec_url(self):
        some_string = "some :// string"
        expected = False

        assert icom.is_fsspec_url(some_string) == expected

    def test_expand_user(self):
        filename = "~/sometest"
        expanded_name = icom._expand_user(filename)