Json normalize nan support #25619

Merged: 12 commits merged on Mar 13, 2019
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v0.25.0.rst
@@ -214,7 +214,7 @@ I/O
- Bug in :func:`read_json` for ``orient='table'`` when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema (:issue:`21345`)
- Bug in :func:`read_json` for ``orient='table'`` and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema (:issue:`25433`)
- Bug in :func:`read_json` for ``orient='table'`` and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema (:issue:`25435`)
- -
+ - Bug in :func:`json_normalize` for ``errors='ignore'`` and nullable metadata fields, the null values in dataframe were literal nan string and not numpy.nan (:issue:`25468`)
-
-

2 changes: 1 addition & 1 deletion pandas/io/json/normalize.py
@@ -281,6 +281,6 @@ def _recursive_extract(data, path, seen_meta, level=0):
             raise ValueError('Conflicting metadata name {name}, '
                              'need distinguishing prefix '.format(name=k))

-        result[k] = np.array(v).repeat(lengths)
+        result[k] = np.array(v, dtype=object).repeat(lengths)

     return result
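
The one-line change above relies on how NumPy infers a dtype for mixed input. The following is a minimal sketch of that behaviour, not code from the PR; the names `values`, `lengths`, `inferred` and `preserved` are illustrative assumptions. When a list mixes strings with `np.nan`, `np.array` picks a string dtype and the missing value becomes the text `'nan'`; passing `dtype=object` keeps the original float.

```python
import numpy as np

# Hypothetical metadata values: one record has a name, the other does not.
values = ['Alice', np.nan]
# Hypothetical number of child rows each metadata value is repeated for.
lengths = [2, 1]

# Without an explicit dtype, NumPy promotes everything to a string dtype,
# so the missing value is stored as the text 'nan'.
inferred = np.array(values).repeat(lengths)

# With dtype=object the elements are kept as-is, so np.nan stays a float.
preserved = np.array(values, dtype=object).repeat(lengths)

print(inferred.dtype.kind, repr(inferred[-1]))
print(preserved.dtype, repr(preserved[-1]))
```

Because the object array keeps a real float NaN, downstream checks such as `pandas.isna` can recognise the value as missing, which a `'nan'` string would not allow.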
44 changes: 44 additions & 0 deletions pandas/tests/io/json/test_normalize.py
@@ -66,6 +66,25 @@ def author_missing_data():
}]


+@pytest.fixture
+def address_missing_data():
+    return [
+        {'name': 'Alice',
+         'addresses': [{'number': 9562,
+                        'street': 'Morris St.',
+                        'city': 'Massillon',
+                        'state': 'OH',
+                        'zip': 44646}]
+         },
+        {'addresses': [{'number': 8449,
+                        'street': 'Spring St.',
+                        'city': 'Elizabethton',
+                        'state': 'TN',
+                        'zip': 37643}]
+         }
+    ]
+
+
 class TestJSONNormalize(object):

     def test_simple_records(self):
@@ -378,6 +397,31 @@ def test_json_normalize_errors(self):
['general', 'trade_version']],
errors='raise')

+    def test_missing_meta(self, address_missing_data):
+        # GH25468: If metadata is nullable with errors set to ignore, the null
+        # values should be numpy.nan values
+        result = json_normalize(
+            data=address_missing_data,
+            record_path='addresses',
+            meta='name',
+            errors='ignore')
+        ex_data = [
+            {'city': 'Massillon',
+             'number': 9562,
+             'state': 'OH',
+             'street': 'Morris St.',
+             'zip': 44646,
+             'name': 'Alice'},
+            {'city': 'Elizabethton',
+             'number': 8449,
+             'state': 'TN',
+             'street': 'Spring St.',
+             'zip': 37643,
+             'name': np.nan}
+        ]
+        expected = DataFrame(ex_data)
+        tm.assert_frame_equal(result, expected, check_like=True)
+
     def test_donot_drop_nonevalues(self):
         # GH21356
         data = [
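
The behaviour pinned down by `test_missing_meta` can also be checked outside the test suite. The snippet below is a rough sketch under a few assumptions: it runs against a pandas release that includes this fix, the `records` data is a trimmed-down, hypothetical version of the `address_missing_data` fixture, and the import fallback is there only because `json_normalize` is exposed as `pandas.json_normalize` in newer releases, while this PR imports it from `pandas.io.json`.

```python
import pandas as pd

try:
    # Newer pandas releases expose the function at the top level.
    from pandas import json_normalize
except ImportError:
    # Older releases, such as the one this PR targets.
    from pandas.io.json import json_normalize

# Hypothetical, shortened version of the address_missing_data fixture:
# the second record has no 'name' key.
records = [
    {'name': 'Alice', 'addresses': [{'city': 'Massillon', 'state': 'OH'}]},
    {'addresses': [{'city': 'Elizabethton', 'state': 'TN'}]},
]

result = json_normalize(
    records,
    record_path='addresses',
    meta='name',
    errors='ignore')

# With the fix, the missing metadata value is a real missing value,
# not the literal string 'nan'.
missing = result['name'].iloc[1]
print(repr(missing), pd.isna(missing))
```

On a release without the fix, the same check would be expected to report the string `'nan'`, for which `pd.isna` returns `False`.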