Added code to check if file exists for read_json. #29104

mohitanand001 · 2019-10-19T19:08:41Z

closes Misleading error messages when opening inexistent json file #29102
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jbrockmendel · 2019-10-19T21:20:06Z

Can you add a test for the bug this fixes?

gfyoung · 2019-10-19T22:51:05Z

We should try to make the error messages consistent across the board, i.e. align with what you would get if you called read_csv on a non-existent file.

mohitanand001 · 2019-10-20T06:02:40Z

The code snippet I took is from read_csv itself. Related to this, should we also use the same error message for read_excel? I think the difference comes from the fact that read_excel uses xlrd.

import pandas as pd
pd.read_csv('file_1.csv')
FileNotFoundError: [Errno 2] File b'file_1.csv' does not exist: b'file_1.csv'

pd.read_excel('file_2.xlsx')
FileNotFoundError: [Errno 2] No such file or directory: 'file_2.xlsx'

gfyoung · 2019-10-20T06:58:58Z

> The code snippet I took is from read_csv itself. Related to this, should we also use the same error message for read_excel? I think the difference comes from the fact that read_excel uses xlrd.

Great! We can look at updating read_excel error message in a separate PR.

krishnakatyal · 2019-10-22T06:02:12Z

@gfyoung which file is to be changed for read_csv?

mohitanand001 · 2019-10-22T17:34:47Z

> Can you add a test for the bug this fixes?

Sure

mohitanand001 · 2019-10-22T19:08:41Z

@datapythonista @gfyoung can we rename _stringify_path? The name gives the impression that the function always converts the path to a string, which is not the case: a string buffer is returned as it is.

datapythonista · 2019-10-23T05:24:26Z

That sounds good if you find a better name, but in a separate PR; let's keep this one focused on its goal.

You need to add the required test, maybe add a whatsnew entry (not sure if we do for something like this; you can check previous whatsnew files), and fix the broken tests.

Thanks for working on this @farziengineer

mohitanand001 · 2019-10-24T07:28:12Z

The challenge here is that we need to differentiate between a JSON string passed as input and a file path. We cannot simply decide that, because a string is not a valid path, we should raise a FileNotFoundError.

import pandas as pd
# This should work just fine.
df = pd.read_json('{"a": [9, 10, 32], "b": [54, 57, 66]}')

# This should throw an error.
df = pd.read_json(<invalid_path>)

I cannot find any utility function in pandas that differentiates a JSON string from a file path, or any function that tells whether a string is valid JSON or a valid file path.
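The ambiguity can be reproduced with the standard library alone. The following is a minimal sketch, assuming only `json` and `os.path`; the helper name `describe` is hypothetical and is not part of pandas.

```python
import json
import os


def describe(text):
    """Report whether `text` parses as JSON and whether it names an existing path."""
    try:
        json.loads(text)
        is_json = True
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        is_json = False
    return {"is_json": is_json, "path_exists": os.path.exists(text)}


# A JSON document: it parses, and it is not a path on disk.
print(describe('{"a": [9, 10, 32], "b": [54, 57, 66]}'))
# A mistyped path: neither check succeeds, so the intent cannot be derived.
print(describe("surely_missing_dir_29104/no_file.json"))
```

The second call is the problematic case: both checks fail, and nothing in the string itself says which of the two the caller meant.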

gfyoung · 2019-10-24T07:46:05Z

> The challenge here is, we need to differentiate between a json string passed as input and a file_path.

@farziengineer : How would you actually know what the intent is with absolute certainty if the string is neither valid JSON nor a valid path?

mohitanand001 · 2019-10-24T07:49:36Z

@gfyoung
One approach could be to have a utility function that tests whether the string provided is JSON or a file path.
Something like the following.

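import json
import os

def check_json_or_filepath(filepath_or_buf):
    # Try JSON first, then fall back to checking for an existing path.
    try:
        json.loads(filepath_or_buf)
        return 'json'
    except ValueError:
        if os.path.exists(filepath_or_buf):
            return 'filepath'
        else:
            # Neither valid JSON nor an existing path.
            return 'stringio'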

Looks a bit hackish though.

gfyoung · 2019-10-24T07:56:35Z

> One approach could be to have a utility function that tests whether the string provided is JSON or a file path.
> Something like the following.

Hackish, and it also doesn't answer my question (notice how the last branch, reached when both checks fail, still assumes the intent of the user was always to pass in JSON).

I would leave this part alone as @datapythonista suggested and just add the test.

mohitanand001 · 2019-10-24T11:12:31Z

@gfyoung @datapythonista Hi, the issue is in the method _get_data_from_filepath(self, filepath_or_buffer). If the file path exists, it reads the content and returns it. If the file path does not exist, it assumes the user intended to pass JSON and returns the value itself as the data. Nothing decides what our action should be when the file path does not exist and filepath_or_buffer is not JSON either.

pandas/pandas/io/json/_json.py

Lines 680 to 690 in b1eb97b

    def _get_data_from_filepath(self, filepath_or_buffer):
        """
        The function read_json accepts three input types:
            1. filepath (string-like)
            2. file-like object (e.g. open file object, StringIO)
            3. JSON string

        This method turns (1) into (2) to simplify the rest of the processing.
        It returns input types (2) and (3) unchanged.
        """
        data = filepath_or_buffer

With the current changes in my code, https://github.com/pandas-dev/pandas/pull/29104/files , there would be a problem when a JSON string is passed: the code only checks whether filepath_or_buffer exists, and raises FileNotFoundError if it doesn't.
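The fallback described above can be sketched outside pandas. This is a simplified stand-in, not the pandas implementation: the function name `get_data_sketch` is hypothetical, and it returns the file contents as text rather than an open handle.

```python
import os
import tempfile


def get_data_sketch(filepath_or_buffer):
    """Simplified stand-in for the path-or-data fallback (not pandas code)."""
    if isinstance(filepath_or_buffer, str) and os.path.exists(filepath_or_buffer):
        # An existing path is opened and its contents are returned.
        with open(filepath_or_buffer, encoding="utf-8") as handle:
            return handle.read()
    # Everything else is passed through as if it were data. This includes a
    # mistyped path, which is why the later JSON parse error is misleading.
    return filepath_or_buffer


with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "data.json")
    with open(real, "w", encoding="utf-8") as handle:
        handle.write('{"a": 1}')
    print(get_data_sketch(real))                            # contents of the file
    print(get_data_sketch(os.path.join(tmp, "nope.json")))  # the path string itself
```

Raising FileNotFoundError in the pass-through branch would fix the second call, but it would also reject every literal JSON string, which is the conflict raised in this comment.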

gfyoung · 2019-10-24T18:06:54Z

> It checks if the filepath exists, it reads the content and returns it. If the filepath does not exist, it treats as if the intent of user was to pass a JSON and returns that value itself as the data

I understand, but what I'm trying to make clear is that you can never know for certain what the user intent was in the case when it fails to be both a valid filepath OR valid JSON. I think we're getting way too much in the weeds here on something that, in the interests of 100% correctness, is really going to be a dead end.

WillAyd · 2019-11-08T00:48:45Z

I agree with @gfyoung here. I don't think it's really possible (or rather worthwhile) to infer intent, unless there is something pathlib may offer.

jbrockmendel · 2019-11-25T21:25:14Z

@WillAyd is there a way forward on this?

WillAyd · 2019-11-29T18:03:12Z

I'm not strongly opposed but don't think it's worth a lot of effort here to try and infer intent. @datapythonista had the original issue though and @gfyoung has some input so let's see what they think

datapythonista · 2019-11-30T17:24:09Z

Didn't realize we couldn't know whether the user provided a json string or a path when I opened the issue. I guess it makes sense to leave the code as it is. Otherwise we need to make assumptions on the user intent.

Unless it's easy to capture the current error, and get something like:

ValueError: No such file 'no_file.json'. Tried to parse as a JSON string, but got: Unexpected character found when decoding 'null'

ValueError: No such file '{"foo": 1, "bar": ...'. Tried to parse as a JSON string, but got: Unexpected character found when decoding 'null'
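A message of that shape could be produced by wrapping the parse error, without guessing the intent. The following is only a sketch using the standard library `json` module instead of the pandas parser; the helper name `load_json_or_path` and the 20-character truncation are assumptions made for illustration.

```python
import json
import os


def load_json_or_path(text):
    """Hypothetical helper: read `text` as a path if it exists, else parse it as JSON."""
    if os.path.exists(text):
        with open(text, encoding="utf-8") as handle:
            return json.load(handle)
    try:
        return json.loads(text)
    except ValueError as err:
        # Truncate long inputs so a large JSON string does not flood the message.
        shown = text if len(text) <= 20 else text[:17] + "..."
        raise ValueError(
            f"No such file {shown!r}. Tried to parse as a JSON string, but got: {err}"
        ) from err


try:
    load_json_or_path("no_file_29104.json")
except ValueError as exc:
    print(exc)
```

The exception type stays ValueError, so existing callers that catch it keep working; only the message changes to mention both interpretations.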

WillAyd · 2019-12-10T16:54:07Z

I guess, based on the feedback, this is worth closing for now, but we can reopen at a later date if new ideas come up.

In any case thanks for the PR @farziengineer !

vampypandya · 2020-05-24T08:13:58Z

@farziengineer How about using regex to check intent?
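One possible reading of this suggestion, as a sketch only: treat a string as JSON when it starts with `{` or `[` after optional whitespace. The pattern and the name `guess_intent` are assumptions, not anything proposed in the PR.

```python
import re

# Assumption: a JSON document passed as a string is an object or an array,
# so after optional whitespace it starts with "{" or "[".
LOOKS_LIKE_JSON = re.compile(r"\s*[\[{]")


def guess_intent(text):
    """Hypothetical heuristic: return 'json' or 'path' for a string input."""
    return "json" if LOOKS_LIKE_JSON.match(text) else "path"


for candidate in ['{"a": 1}', " [1, 2]", "missing.json", "[archive]/data.json"]:
    print(candidate, "->", guess_intent(candidate))
```

The last candidate is a legal relative path that the pattern classifies as JSON, so this remains a guess about intent, which is the objection raised earlier in the thread.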

janosh · 2022-04-08T13:25:58Z

What's wrong with assuming file path intent when the following check passes?

if isinstance(filepath_or_buffer, str) and filepath_or_buffer.lower().endswith(('.json', '.json.gz', '.json.bz2', '.json.whatever')):

I must be missing something, but it seems safe to me to raise FileNotFoundError if not isfile(filepath_or_buffer) whenever the above check passes. That would definitely be better than the current v1.4.2 behavior:

pd.read_json('missing.json')
>>> ValueError: Unexpected character found when decoding 'true'
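The check above can be turned into a small standalone guard. This is a sketch under stated assumptions: the name `check_path_intent`, the suffix tuple, and the error message wording are illustrative and are not taken from pandas.

```python
import os

# Suffixes treated as "clearly meant as a file path"; illustrative only.
JSON_SUFFIXES = (".json", ".json.gz", ".json.bz2")


def check_path_intent(filepath_or_buffer):
    """Raise FileNotFoundError when a path-like string points at no file."""
    if (
        isinstance(filepath_or_buffer, str)
        and filepath_or_buffer.lower().endswith(JSON_SUFFIXES)
        and not os.path.isfile(filepath_or_buffer)
    ):
        raise FileNotFoundError(f"File {filepath_or_buffer} does not exist")
    # Anything else (JSON strings, buffers, existing files) passes through.
    return filepath_or_buffer


print(check_path_intent('{"a": 1}'))
try:
    check_path_intent("missing_29104.json")
except FileNotFoundError as exc:
    print(exc)
```

Because a valid JSON document cannot end in `.json`, the suffix test should not reject well-formed JSON strings; a missing file with any other extension still falls through to the parser as before.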

Added code to check if file exists for read_json.

2596bbd

nottatdat mentioned this pull request Oct 21, 2019

Unify error messages when opening inexistent excel/csv file #29125

Open

gfyoung added the Error Reporting (Incorrect or improved errors from pandas) and IO JSON (read_json, to_json, json_normalize) labels Oct 21, 2019

change error type in read_json tests

1a6c538

tadashigaki mentioned this pull request Oct 26, 2019

change error type in read_json tests mohitanand001/pandas#1

Merged

Merge pull request #1 from tadashigaki/read_json_no_file

dafbfa4

change error type in read_json tests

WillAyd closed this Dec 10, 2019

mohitanand001 mentioned this pull request Dec 11, 2021

Misleading error messages when opening inexistent json file #29102

Closed

cheungje mentioned this pull request Dec 16, 2021

TST: Added tests to check if file exists for read_json. #44921

Closed

2 tasks

cheungje mentioned this pull request Dec 16, 2021

ENH: Implemented a check to test if filepath_or_buffer is a valid JSON string or a valid filepath and raises an error in the case that it is neither. #44926

Closed

4 tasks

This was referenced Mar 29, 2023

ENH: Deprecate literal json string input to read_json #52271

Closed

cudf.read_json returns ValueError when given file is not found rapidsai/cudf#13026

Closed
