TRACKER: Simple analysis of IO JSON open issues #55046

Open
2 tasks done
loco-philippe opened this issue Sep 7, 2023 · 0 comments
Labels
IO JSON (read_json, to_json, json_normalize) · Master Tracker (High level tracker for similar issues)

Comments

@loco-philippe
Contributor

Research

  • I have searched the [pandas] tag on StackOverflow for similar questions.

  • I have asked my usage related question on StackOverflow.

Link to question on StackOverflow

IO JSON open issues

Question about pandas

Below is a quick analysis of the open JSON issues:

  • the third column is a (personal!) classification, mainly intended to identify 'type' or 'dtype' problems
  • the fourth column is a sub-category, used only for the 'type' category
  • the fifth column identifies the issues that proposal PDEP0012 solves or for which it provides an alternative solution

My first summary is as follows:

  • the first twelve issues will be impacted by PDEP0012
  • five issues involve numeric column names -> maybe they can be grouped together
  • three issues look like they can be closed (can anyone check?)
  • eight issues concern None, NA, NaN or NaT values
  • ten issues concern the json_normalize function

| Issue | Label | category | sub-category | including PDEP0012 |
|---|---|---|---|---|
| 16492 | No way with to_json to write only date out of datetime | type | date | ok |
| 49585 | BUG: Series read_json tries to convert all column values to dates even when using keep_default_dates=True - if one column has an na value | type | datetime - NA | ok |
| 12997 | to_json converts to UTC when encoding ISO formatted datetimes | type | datetime - tz | ok |
| 53252 | ENH: simple - compact and reversible JSON interface | type | extend type | ok |
| 14358 | read_json Raises AttributeError with Valid JSON as Input | type | null - NA | ok |
| 35464 | BUG: Type mismatch in read_json | type | null - NA | ok |
| 51375 | BUG: to_json/read_json with orient="table" does not preserve types with pd.NA | type | null - NA | ok |
| 36211 | BUG: to_json for DataFrame containing Path objects crash with infinite recursion | type | Path | ok |
| 50782 | BUG: Complex Numbers Not Imported Correctly Under JSON Read | type | table - complex | ok |
| 35420 | to_json/read_json can't handle interval index | type | table - interval | ok |
| 39537 | Error when converting df to json table (utc timezone date time object causes the error) | type | table - tz | ok |
| 52595 | BUG: json that could be read by pandas 1.5.3 cannot be read by 2.0.0 | type | table - tz | ok |
| 16848 | UnicodeDecodeError with html.table_schema = True | type | binary | |
| 25336 | [BUG?] pd.read_json does not convert date before 1971-01-01 | type | datetime - conversion | |
| 22317 | Request to add more date formats in to_json method | type | datetime - format | |
| 47930 | ENH: Add new date_format option to_json matching datetime.isoformat exactly | type | datetime - tz | |
| 21454 | pd.read_json converts large floats to inf | type | float - conversion | |
| 23328 | inconsistent float rounding in to_json | type | float - conversion | |
| 44684 | BUG: the precision of big integer in read_json | type | int - conversion | |
| 28609 | OverflowError on using to_json to serialize NaN value with type Decimal | type | null - NA | |
| 31801 | to_json index with Null Value Broken in 1.0 | type | null - NA | |
| 44693 | BUG: dtypes cast when reading JSON | type | null - NA | |
| 46627 | BUG: Pandas's ujson module incorrectly returns None when it reads NaN | type | null - NA | |
| 20608 | read_json reads large integers as strings incorrectly if dtype not explicitly mentioned | type | str - conversion | |
| 42471 | BUG: read_json converts Numeric Strings to Numbers | type | str - conversion | |
| 29025 | Incorrect json round-trip with orient='table' when dataframe contains duplicate index values | type | table - index | |
| 19129 | Raise ValueError for read_json and orient='table' With Numeric Column Names | type | table - int col | |
| 38256 | BUG: pandas to_json with orient "table" returns wrong schema & data string | type | table - int col | |
| 40674 | BUG: pd.read_json sets wrong value for numeric column names | type | table - int col | |
| 46392 | BUG: Integer column index breaks json roundtrip with orient=table | type | table - int col | |
| 32037(44705) | JSON table orient not roundtripping extension types | type | table - int col | |
| 26692 | If tuples used as index pd.read_json( orient='split') does not read file saved by df.to_json(orient='split) | type | tuple index | |
| 21140 | Add Timedelta Support to JSON Reader with orient="table" | to close | | |
| 23584 | Series to_json Docstring Updates | to close | | |
| 31917 | to_json of Series with period dtype results in AttributeError | to close | | |
| 37100 | BUG: Series.to_json produces incorrect json format | to be completed | | |
| 45959 | QST: Why to_json defaults to force_ascii=True | question | | |
| 27241 | ENH: Ignore flattening certain keys in json_normalize | normalize | | |
| 33414 | ENH: Optionally pass dtypes as a dict into json_normalize | normalize | | |
| 34028 | What is the best way to normalize_json before read_json for the file with gigabytes size? | normalize | | |
| 34465 | BUG: unexpected behavior of json_normalize meta arg | normalize | | |
| 36245 | BUG: pd.json_normalize on a column loses rows that have an empty list for that column | normalize | | |
| 42311 | ENH: json_normalize flatten lists as well | normalize | | |
| 44329 | ENH: errors='ignore' should work for record_path for pandas.json_normalize function | normalize | | |
| 51452 | pd.json_normalize doesn't return data with index from series | normalize | | |
| 53126 | BUG: json_normalize does not parse nested lists consistently | normalize | | |
| 54121 | DOC: description of record_prefix param for json_normalize is wrong | normalize | | |
| 29928 | Using to_json/read_json with orient='table' on a DataFrame with a single level MultiIndex does not work | multiindex | | |
| 50456 | BUG: JSON serialization with orient split fails roundtrip with MultiIndex | multiindex | | |
| 42582 | ENH: col descriptions that'd save in df schemas - helping users avoid creating separate documentation? | metadata | | |
| 51012 | ENH: Include df.attrs in to_json output | metadata | | |
| 19261 | Standardize pandas metadata for table schema and parquet | internal | | |
| 20599 | OverflowError: Python int too large to convert to C long | internal | | |
| 28180 | to_iso methods for DatetimeLikeArray | internal | | |
| 32326 | Unexpected behaviour of df.to_json(compression="gzip") | internal | | |
| 33014 | to_json should make separators configurable (similar to json.dump) | internal | | |
| 33877 | BUG: weird interaction between pyslurm - ujson that changes function signature of ujson.dumps | internal | | |
| 35279 | pandas/tests/io/json/test_pandas.py::TestPandasContainer::test_read_json_large_numbers failing for 32-bit system | internal | | |
| 39135 | ENH: Add support for date_unit to be specified per column in to_json | internal | | |
| 41521 | ENH: Add support to read_json to encode character escape hex codes to utf-8 characters | internal | | |
| 44881 | ENH: change pd.read_json kwarg to rtype or return_type? | internal | | |
| 49604 | BUG/CLN: Vendored ujson Module | internal | | |
| 54865 | BUG: LSAN Detected Memory Leaks | internal | | |
| 17220 | Enhancement: to_json and read_json for DataFrame should have option to output/parse values by column | format | | |
| 39913 | ENH: new orient setting for read_json to support common API format | format | | |
| 46571 | ENH: Allow usage of custom library to serialize with to_json method | format | | |
| 12286 | Feature suggestion: flexible hierarchical data (json) importer (will implement if interest exists) | extension | | |
| 22853 | Add chunksize support to to_json | chunk | | |

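
For anyone who wants to double-check the counts in the summary, here is a minimal sketch that tallies them from the classification above. The issue numbers, categories and sub-categories are transcribed from the table; the names `TYPE_ISSUES`, `OTHER_ISSUES`, `PDEP0012_OK` and `summarize` are hypothetical helpers made up for this illustration. It also assumes that the "None, NA, NaN or NaT" figure is the sum of the 'null - NA' and 'datetime - NA' sub-categories, which is a guess at how that number was obtained.

```python
# Sub-category -> issue numbers for rows whose category is 'type'
# (transcribed from the table above; 32037(44705) is recorded as 32037).
TYPE_ISSUES = {
    "date": [16492],
    "datetime - NA": [49585],
    "datetime - tz": [12997, 47930],
    "extend type": [53252],
    "null - NA": [14358, 35464, 51375, 28609, 31801, 44693, 46627],
    "Path": [36211],
    "table - complex": [50782],
    "table - interval": [35420],
    "table - tz": [39537, 52595],
    "binary": [16848],
    "datetime - conversion": [25336],
    "datetime - format": [22317],
    "float - conversion": [21454, 23328],
    "int - conversion": [44684],
    "str - conversion": [20608, 42471],
    "table - index": [29025],
    "table - int col": [19129, 38256, 40674, 46392, 32037],
    "tuple index": [26692],
}

# Category -> issue numbers for the remaining rows.
OTHER_ISSUES = {
    "to close": [21140, 23584, 31917],
    "to be completed": [37100],
    "question": [45959],
    "normalize": [27241, 33414, 34028, 34465, 36245,
                  42311, 44329, 51452, 53126, 54121],
    "multiindex": [29928, 50456],
    "metadata": [42582, 51012],
    "internal": [19261, 20599, 28180, 32326, 33014, 33877,
                 35279, 39135, 41521, 44881, 49604, 54865],
    "format": [17220, 39913, 46571],
    "extension": [12286],
    "chunk": [22853],
}

# Issues marked 'ok' in the 'including PDEP0012' column.
PDEP0012_OK = [16492, 49585, 12997, 53252, 14358, 35464,
               51375, 36211, 50782, 35420, 39537, 52595]


def summarize():
    """Return the counts quoted in the summary, derived from the table."""
    type_total = sum(len(v) for v in TYPE_ISSUES.values())
    other_total = sum(len(v) for v in OTHER_ISSUES.values())
    return {
        "impacted by PDEP0012": len(PDEP0012_OK),
        "numeric column names": len(TYPE_ISSUES["table - int col"]),
        "to close": len(OTHER_ISSUES["to close"]),
        # Assumption: the 'None, NA, NaN or NaT' count combines these two.
        "null-like values": len(TYPE_ISSUES["null - NA"])
        + len(TYPE_ISSUES["datetime - NA"]),
        "json_normalize": len(OTHER_ISSUES["normalize"]),
        "type category": type_total,
        "all issues": type_total + other_total,
    }


if __name__ == "__main__":
    for name, count in summarize().items():
        print(f"{name}: {count}")
```

Keeping the sub-categories as dictionary keys makes it easy to regroup the issues differently (for example, all 'table - ...' sub-categories together) without retyping the issue numbers.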
@loco-philippe added the Needs Triage (Issue that has not been reviewed by a pandas team member) and Usage Question labels on Sep 7, 2023
@lithomas1 added the Master Tracker (High level tracker for similar issues) and IO JSON (read_json, to_json, json_normalize) labels and removed the Needs Triage and Usage Question labels on Sep 8, 2023
@lithomas1 changed the title from "QST: Simple analysis of IO JSON open issues" to "TRACKER: Simple analysis of IO JSON open issues" on Sep 8, 2023