-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Added code to check if file exists for read_json. #29104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you add a test for the bug this fixes |
We should try to make the error messages consistent across the board i.e. align with what you would get if you did |
The code snippet I took is from read_csv itself. Related to this shall we also have same error message for read_excel. I think that has something to do with the fact that read excel uses import pandas as pd
pd.read_csv('file_1.csv')
FileNotFoundError: [Errno 2] File b'file_1.csv' does not exist: b'file_1.csv'
pd.read_excel('file_2.xlsx')
FileNotFoundError: [Errno 2] No such file or directory: 'file_2.xlsx' |
Great! We can look at updating |
@gfyoung which file is to be changed for read_csv? |
Sure |
@datapythonista @gfyoung can we name |
That sounds good if you find a better name, But in a separate PR, let's keep this focussed to its goal. You need to add the required test, may be add a whatsnew section (not sure if we do for something like this, you can check previous whatsnew files), and fix the broken tests. Thanks for working on this @farziengineer |
The challenge here is, we need to differentiate between a json string passed as input and a file_path. We cannot just say that since a string is not a valid path, raise a FileNotFoundException. import pandas as pd
#This should work just fine.
df = pd.read_json('{"a": [9, 10, 32], "b": [54, 57, 66]}')
#This should throw an error.
df = pd.read_json(<invalid_path> ) I cannot find any utility function used in pandas which differentiates json string and a file_path. |
@farziengineer : How would you actually know what the intent is with absolute certainty if the string is neither valid JSON nor a valid path? |
@gfyoung def check_json_or_filepath(filepath_or_buf):
try:
json_loads(filepath_or_buf)
return 'json'
except ValueError:
if os.path.exists(filepath_or_buf):
return 'filepath'
else:
return 'strinio' Looks a bit hackish though. |
Hackish + it also doesn't answer my question (notice how you're returning I would leave this part alone as @datapythonista suggested and just add the test. |
@gfyoung @datapythonista Hi, The issue is in the method pandas/pandas/io/json/_json.py Lines 680 to 690 in b1eb97b
With the current changes I have done in my code, https://github.com/pandas-dev/pandas/pull/29104/files , it would be a problem when a json string is passed, since it just checks if |
I understand, but what I'm trying to make clear is that you can never know for certain what the user intent was in the case when it fails to be both a valid filepath OR valid JSON. I think we're getting way too much in the weeds here on something that, in the interests of 100% correctness, is really going to be a dead end. |
change error type in read_json tests
I agree with @gfyoung here I don't think it's really possible (or rather worthwhile) to infer intent, unless there is something pathlib may offer |
@WillAyd is there a way forward on this? |
I'm not strongly opposed but don't think it's worth a lot of effort here to try and infer intent. @datapythonista had the original issue though and @gfyoung has some input so let's see what they think |
Didn't realize we couldn't know whether the user provided a json string or a path when I opened the issue. I guess it makes sense to leave the code as it is. Otherwise we need to make assumptions on the user intent. Unless it's easy to capture the current error, and get something like:
|
I guess based on feedback worth closing for now, but can reopen at a later date if new ideas come up In any case thanks for the PR @farziengineer ! |
@farziengineer How about using regex to check intent? |
What's wrong with assuming file path intent if isinstance(filepath_or_buffer, str) and filepath_or_buffer.lower().endswith(('.json', '.json.gz', '.json.bz2', '.json.whatever')): I must be missing something but it seems safe to me to issue pd.read_json('missing.json')
>>> ValueError: Unexpected character found when decoding 'true' |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff