-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix issue #36271 to disambiguate json string #36273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pd.read_json() fails currently for strings that look like fsspec_url and contain "://". adding another condition to fix this at least in most cases
Hello @tbachlechner! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2020-11-11 15:02:53 UTC |
Does this close an issue? Can you add a test? |
pandas/io/common.py
Outdated
@@ -161,6 +161,7 @@ def is_fsspec_url(url: FilePathOrBuffer) -> bool: | |||
return ( | |||
isinstance(url, str) | |||
and "://" in url | |||
and not " " in url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A space is allowed in URLs for at least some storage systems
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, any other ideas how to fix #36271 ? All my json strings with URLs in them receive errors and I downgraded to pandas==1.0.5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, a space isn't necessary for valid JSON either. Perhaps the only way to tell is to try to parse the string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR to reflect what you say, I'm now trying to parse the string as json and if it succeeds, it's not a URL. How about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds reasonable to me - I don't expect any URL is also valid JSON. You could conceivable make a regex to check for JSON-ness ("^\s*[\[{]"
maybe?), but probably calling JSON decode is faster and more correct.
(I simply didn't know that you can pass a string directly to read_json, as opposed to a StringIO file-like)
@@ -161,6 +176,7 @@ def is_fsspec_url(url: FilePathOrBuffer) -> bool: | |||
return ( | |||
isinstance(url, str) | |||
and "://" in url | |||
and not is_json(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For pandas maintainers: should this check only be done within read_json (which will presumably call json.loads anyway, repeating the work) ? Are there any other similar cases that the string passed might be handled as a path, or decoded directly, that I might have missed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_json
doesn't call json.loads
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbachlechner if you really think t his out to be caught then i suppose a 'better' check instead of and "://" in url
could be done (e.g. a regex)
@@ -161,6 +176,7 @@ def is_fsspec_url(url: FilePathOrBuffer) -> bool: | |||
return ( | |||
isinstance(url, str) | |||
and "://" in url | |||
and not is_json(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
read_json
doesn't call json.loads
pandas/io/common.py
Outdated
def is_json(url: FilePathOrBuffer) -> bool: | ||
""" | ||
Returns true if the given string looks like | ||
something json.loads can handle |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we cannot do this generally as its completely non-performant
@@ -161,6 +176,7 @@ def is_fsspec_url(url: FilePathOrBuffer) -> bool: | |||
return ( | |||
isinstance(url, str) | |||
and "://" in url | |||
and not is_json(url) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbachlechner if you really think t his out to be caught then i suppose a 'better' check instead of and "://" in url
could be done (e.g. a regex)
@jreback yes, this ought to be caught, pd.read_json() is claimed to accept a json string, but it fails for any json string containing |
I agree that |
mypy complaint |
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
are there opinions about moving def parse_json(json:str, **kwargs) -> DataFrame:
buffer = StringIO(json)
data = read_json(buffer, **kwargs)
buffer.close()
return data and a deprecation warning for 1.2 when passing a json string to edit: |
IIRC there was a discussion at some point of making a |
it wouldn't be unreasonable to deprecate passing a str (that is not a path) to read_josh and instead only accept a StringIO i don't think this is much of any issue for pickle as it needs bytes and so str always means a file excel i think is the same only read_josh accepts str as an actual transmission format (read_csv as well of course is similar but i think we are restrictive there already) |
pandas/io/common.py
Outdated
if json_pattern.match(url): | ||
return True | ||
else: | ||
return False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be simplified return json_pattern.match(url) is not None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, I committed your suggestion
Returns true if the given string looks like | ||
json | ||
""" | ||
json_pattern = re.compile(r"^\s*[\[{]") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably need something like the following for typing
if not isinstance(url, str):
return False
The pre-commit is complaining about too many new lines.
closing as stale. if you would like to continue, pls ping. |
pd.read_json() fails currently for strings that look like fsspec_url and contain "://". adding another condition to fix this at least in most cases
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff