Avoids exception when pandas.io.json.json_normalize contains items in… #14505

dickreuter · 2016-10-26T20:29:50Z

Continued in #14583

When using pandas.io.json.json_normalize to parse nested JSON and convert it to a DataFrame, the meta parameter selects fields to use as metadata for each record in the resulting table. In some cases, not all items contain all of the specified meta fields. This change avoids raising an error in that case and outputs np.nan instead.

… meta parameter that don't always occur in every item of the list

codecov-io · 2016-10-26T21:21:02Z

Current coverage is 85.27% (diff: 90.00%)

Merging #14505 into master will increase coverage by <.01%

@@             master     #14505   diff @@
==========================================
  Files           140        140          
  Lines         50670      50698    +28   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43205      43233    +28   
  Misses         7465       7465          
  Partials          0          0

Powered by Codecov. Last update e3d943d...33495bc

jreback · 2016-10-26T22:14:31Z

what exactly is this fixing? would need a test.

…wers xref numpy/numpy#8127 closes #14489 Author: Jeff Reback <[email protected]> Closes #14498 from jreback/compat and squashes the following commits: 882872e [Jeff Reback] COMPAT/TST: fix test for range testing of negative integers to neg powers

closes #14496

Title is self-explanatory. Affects Python 2.x only. Closes #14477. Author: gfyoung <[email protected]> Closes #14492 from gfyoung/quotechar-unicode-2.x and squashes the following commits: ec9f59a [gfyoung] BUG: Accept unicode quotechars again in pd.read_csv

dickreuter · 2016-10-26T23:28:14Z

from pandas.io.json import json_normalize

# Two trades; only the first one carries "trade_version".
j = {
    "Trades": [{
        "general": {
            "tradeid": 100,
            "trade_version": 1,
            "stocks": [{
                "symbol": "AAPL",
                "name": "Apple",
                "price": "0"
            }, {
                "symbol": "GOOG",
                "name": "Google",
                "price": "0"
            }]
        }
    }, {
        "general": {
            "tradeid": 100,
            "stocks": [{
                "symbol": "AAPL",
                "name": "Apple",
                "price": "0"
            }, {
                "symbol": "GOOG",
                "name": "Google",
                "price": "0"
            }]
        }
    }]
}
json_normalize(data=j['Trades'],
               record_path=[['general', 'stocks']],
               meta=[['general', 'tradeid'], ['general', 'trade_version']])

The call above fails because trade_version is present in only one of the two items in the list, so its value cannot be looked up for the second item. With my change, the missing value is output as NaN instead of raising an error.
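To make the intended output concrete without depending on pandas, here is a minimal pure-Python sketch of the flattening described above. The function name `flatten_trades` and the dotted column names are assumptions made for illustration; this is not the pandas implementation, only a model of one row per stock with trade-level fields repeated and a missing `trade_version` filled with NaN.

```python
import math


def flatten_trades(trades):
    """Emit one row per stock, repeating trade-level fields on each row."""
    rows = []
    for trade in trades:
        general = trade["general"]
        for stock in general["stocks"]:
            row = dict(stock)  # copy the record fields (symbol, name, price)
            row["general.tradeid"] = general["tradeid"]
            # A missing optional field becomes NaN instead of raising KeyError.
            row["general.trade_version"] = general.get(
                "trade_version", float("nan"))
            rows.append(row)
    return rows


stocks = [
    {"symbol": "AAPL", "name": "Apple", "price": "0"},
    {"symbol": "GOOG", "name": "Google", "price": "0"},
]
trades = [
    {"general": {"tradeid": 100, "trade_version": 1, "stocks": stocks}},
    {"general": {"tradeid": 100, "stocks": stocks}},  # no trade_version
]

rows = flatten_trades(trades)
# Rows from the second trade carry NaN for the missing field.
missing = [math.isnan(r["general.trade_version"]) for r in rows[2:]]
```

Two trades with two stocks each give four rows, which is the shape the example call is expected to produce once missing meta fields are tolerated.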

…54) (#14501)

When the driver was not installed, but sqlalchemy itself was, when passing a URI string, you got an error indicating that SQLAlchemy was not installed, instead of the driver not being installed. This was because the import error for the driver was captured as import error for sqlalchemy.

TomAugspurger · 2016-10-29T16:22:08Z

@dickreuter Can you add that as a test and add a release note? See http://pandas.pydata.org/pandas-docs/stable/contributing.html#contributing-to-the-code-base

dickreuter · 2016-10-29T19:23:16Z

This would be the test, but it is unclear to me where I should put it. Any suggestions?

from unittest import TestCase
from pandas.io.json import json_normalize


class Tester(TestCase):
    def test_json_normalise_fix(self):
        j = {
            "Trades": [{
                "general": {
                    "tradeid": 100,
                    "trade_version": 1,
                    "stocks": [{

                        "symbol": "AAPL",
                        "name": "Apple",
                        "price": "0"

                    }, {

                        "symbol": "GOOG",
                        "name": "Google",
                        "price": "0"

                    }
                    ]
                },
            }, {
                "general": {
                    "tradeid": 100,
                    "stocks": [{

                        "symbol": "AAPL",
                        "name": "Apple",
                        "price": "0"

                    }, {
                        "symbol": "GOOG",
                        "name": "Google",
                        "price": "0"

                    }
                    ]
                },
            }
            ]
        }
        j = json_normalize(data=j['Trades'], record_path=[['general', 'stocks']],
                           meta=[['general', 'tradeid'], ['general', 'trade_version']])
        self.assertEqual(len(j), 4)

TomAugspurger · 2016-10-30T12:21:02Z

Looks like those tests are all in https://github.com/pandas-dev/pandas/blob/master/pandas/io/tests/json/test_json_norm.py

You could add it as a test method under TestJSONNormalize

dickreuter · 2016-10-30T20:08:52Z

Test and documentation are now added.

…14541)

jreback · 2016-10-31T12:28:47Z

doc/source/whatsnew/v0.19.1.txt

@@ -78,3 +78,4 @@ Bug Fixes


 - Bug in ``pd.pivot_table`` may raise ``TypeError`` or ``ValueError`` when ``index`` or ``columns`` is not scalar and ``values`` is not specified (:issue:`14380`)
+- Bug in ``pandas.io.json.json_normalize``When parsing a nested json and convert it to a dataframe, the meta parameter can be used to use fields as metadata for each record in resulting table. In some cases, not all items may contain all of the specified meta fields. This change will avoid throwing an error and output np.nan instead. (:issue '14505')


pls simplify. a user just wants to know does this issue pertain to them, and a short expl.

make the issue (:issue:`14505`)

jreback · 2016-10-31T12:29:10Z

pandas/io/json.py

-                        meta_val = _pull_field(obj, val[level:])
+                        try:
+                            meta_val = _pull_field(obj, val[level:])
+                        except:


don't use a bare except, list a specific exception KeyError?
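A small sketch of why the specific exception matters. The helpers `pull_meta_bare` and `pull_meta_keyerror` are hypothetical stand-ins written for this illustration, not the pandas internals: a bare `except` also swallows unrelated failures, such as a `TypeError` from malformed input, while `except KeyError` only covers the missing-field case.

```python
import math


def pull_meta_bare(obj, path):
    """Nested lookup that hides every error behind NaN (not recommended)."""
    try:
        result = obj
        for key in path:
            result = result[key]
        return result
    except:  # noqa: E722 - deliberately bare, to show the problem
        return float("nan")


def pull_meta_keyerror(obj, path):
    """Nested lookup that only tolerates a missing key."""
    try:
        result = obj
        for key in path:
            result = result[key]
        return result
    except KeyError:
        return float("nan")


path = ["general", "trade_version"]
missing_field = {"general": {"tradeid": 100}}  # key absent: KeyError
malformed = {"general": None}                  # None["..."]: TypeError

# Both variants agree on the legitimate missing-field case.
both_nan = (math.isnan(pull_meta_bare(missing_field, path))
            and math.isnan(pull_meta_keyerror(missing_field, path)))

# The bare except silently turns malformed input into NaN as well.
hidden_bug = math.isnan(pull_meta_bare(malformed, path))

# The specific handler lets the unrelated TypeError surface.
try:
    pull_meta_keyerror(malformed, path)
    surfaced = None
except TypeError as exc:
    surfaced = exc
```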

jreback · 2016-10-31T12:29:22Z

pandas/io/tests/json/test_json_norm.py

@@ -225,6 +225,51 @@ def test_nested_flattens(self):

        self.assertEqual(result, expected)

+    def test_json_normalise_fix(self):
+        j = {


add the issue number as a comment

jreback · 2016-10-31T12:29:40Z

pandas/io/tests/json/test_json_norm.py

+
+                    }
+                    ]
+                },


this prob does not pass linting. make sure it does.

jreback · 2016-10-31T12:30:00Z

pandas/io/tests/json/test_json_norm.py

+        }
+        j = json_normalize(data=j['Trades'], record_path=[['general', 'stocks']],
+                           meta=[['general', 'tradeid'], ['general', 'trade_version']])
+        self.assertEqual(len(j), 4)


construct the expected frame and use assert_frame_equal

jreback · 2016-10-31T12:31:17Z

pandas/io/json.py

@@ -792,7 +792,10 @@ def _recursive_extract(data, path, seen_meta, level=0):
                    if level + 1 > len(val):
                        meta_val = seen_meta[key]
                    else:
-                        meta_val = _pull_field(obj, val[level:])


I think this should be a keyword, call it errors='raise'|'ignore'. You are defining ignore. Please leave the default as raise (which is the current behavior).
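The requested keyword could look roughly like the following sketch. It is a pure-Python model under stated assumptions: `pull_meta` is a hypothetical helper, and only the option names 'raise' and 'ignore' and the default of 'raise' come from the review comment.

```python
import math


def pull_meta(obj, path, errors="raise"):
    """Follow `path` into nested dicts.

    errors='raise' (default) propagates KeyError for a missing key;
    errors='ignore' returns NaN instead.
    """
    if errors not in ("raise", "ignore"):
        raise ValueError("errors must be 'raise' or 'ignore'")
    result = obj
    try:
        for key in path:
            result = result[key]
    except KeyError:
        if errors == "ignore":
            return float("nan")
        raise  # default: keep the existing behaviour
    return result


item = {"general": {"tradeid": 100}}  # no trade_version
path = ["general", "trade_version"]

ignored = pull_meta(item, path, errors="ignore")
is_nan = math.isnan(ignored)

try:
    pull_meta(item, path)  # default errors='raise'
    raised = None
except KeyError as exc:
    raised = exc
```

Keeping 'raise' as the default means existing callers see no change in behaviour; only callers who opt in with errors='ignore' get NaN for missing meta fields.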

Closes gh-14459.

Added documenation Shortened what's new Removed commas in dictionary for linting compatibility

dickreuter · 2016-11-01T20:57:17Z

Added keyword errors {'raise'|'ignore'}
Added documentation
Shortened what's new
Removed commas in dictionary for linting compatibility

…4520)

pandas.core.common.array_equivalent was removed without deprecation warning. This commits adds it back to the core.common namespace with deprecation warning

* BUG/API: Index.append with mixed object/Categorical indices * Only coerce to object if the calling index is not categorical * Add test for the df.info() case (GH14298)

… meta parameter that don't always occur in every item of the list

Added documenation Shortened what's new Removed commas in dictionary for linting compatibility

# Conflicts: # doc/source/whatsnew/v0.19.1.txt

jreback · 2016-11-03T22:53:31Z

you need to rebase on master

git rebase -i origin/master

dickreuter · 2016-11-03T22:59:35Z

I did that earlier today. It now says: "This branch is 8 commits ahead of pandas-dev:master.". There should currently be no more conflicts.

jreback · 2016-11-03T23:03:02Z

maybe you didn't push it
it's not about conflicts

dickreuter · 2016-11-03T23:13:07Z

My fork on github seems up to date with what I have locally, so I assume it has been pushed. Are there any further changes you expected me to implement that are not present?

jreback · 2016-11-03T23:14:50Z

@dickreuter it's impossible to see until you rebase on master. this should have just your commits

https://github.com/pandas-dev/pandas/pull/14505/commits

dickreuter · 2016-11-03T23:29:32Z

I see. Isn't it showing other people's commits only because I rebased my fork (and then rebased my local copy instead of merging)? If that's a problem I could delete my fork and create a new one, then make the changes again and create a new pull request, unless you have a better suggestion.

jreback · 2016-11-03T23:31:52Z

you prob just need something like

git fetch origin
git rebase -i origin/master
git push yourremote thisbranchname -f

dickreuter · 2016-11-03T23:50:01Z

git fetch origin --> doesn't fetch anything, as my local copy is in sync with my remote origin
git rebase -i origin/master --> noop - no commits to pick

I think all my changes can be seen here: 9848837

What may be confusing is that I also did an automatic reformatting with PyCharm to make the file conform to PEP 8. But those changes only concern whitespace.


jreback · 2016-11-03T23:54:37Z

could also be called upstream

(pandas) [Thu Nov 03 19:52:46 ~/pandas]$ git remote -v|grep origin
origin  https://github.com/pandas-dev/pandas.git (fetch)
origin  https://github.com/pandas-dev/pandas.git (push)

it doesn't matter if your branch is in sync with YOUR upstream, rather it needs to be in sync with pandas master (and on top of it), that's what a rebase is.

you need to rebase to remove all of the merges of master. you shouldn't merge master into your branch; rebase instead.

dickreuter · 2016-11-04T00:47:41Z

This seems to be the problem:
http://stackoverflow.com/questions/40413071/after-rebasing-my-github-fork-commits-from-others-are-in-my-pull-request/40413455#40413455

Will try to fix it, or if it's too complicated just delete and redo.

jorisvandenbossche · 2016-11-04T11:17:23Z

@jreback Regarding your comment above (#14505 (comment)): pandas-dev repo is typically called 'upstream', and your own fork 'origin' (that's how our contributor guide also says it), so you need git fetch upstream and git rebase -i upstream/master instead of git fetch origin and git rebase -i origin/master to rebase properly.

jorisvandenbossche · 2016-11-04T11:20:14Z

Follow-up in #14583

Avoids exception when pandas.io.json.json_normalize contains items in…

a560331

… meta parameter that don't always occur in every item of the list

jreback added the IO JSON read_json, to_json, json_normalize label Oct 26, 2016

jreback and others added 4 commits October 26, 2016 18:18

BLD: Support Cython 0.25

66b4c83

closes #14496

BLD: fix 3.4 build for cython to 0.24.1

6ac759d

jorisvandenbossche and others added 5 commits October 27, 2016 09:11

TST: simplify tests for GH14346 (#14502)

31ca717

DOC: Expand on reference docs for read_json() (#14442)

e7ac84d

BUG: fix DatetimeIndex._maybe_cast_slice_bound for empty index (GH143…

d7fb5bd

…54) (#14501)

MAINT: Expand lint for *.py (#14516)

096d886

Added documentation and test for issue #14505

b793443

parthea and others added 2 commits October 31, 2016 08:23

DOC: Simplify the gbq integration testing procedure for contributors (#…

1ce6299

…14541)

BUG: tseries ceil doc fix (#14543)

47f117d


jreback added the Error Reporting Incorrect or improved errors from pandas label Oct 31, 2016

gfyoung and others added 2 commits October 31, 2016 21:39

BUG: Don't parse inline quotes in skipped lines (#14514)

b088112

Closes gh-14459.

BUG: Dataframe constructor when given dict with None value (#14392)

60a335e

Added keyword errors {'raise'|'ignore}

c3e25c6

Added documenation Shortened what's new Removed commas in dictionary for linting compatibility

jreback and others added 13 commits November 2, 2016 05:59

asv compat for py3

8b80562

BUG: don't close user-provided file handles in C parser (GH14418) (#1…

eb7bd99

…4520)

BUG: DataFrame.quantile with NaNs (GH14357) (#14536)

52f31d4

PERF: casting loc to labels dtype before searchsorted (#14551)

1d95179

DEPR: add deprecation warning for com.array_equivalent (#14567)

093aa82

pandas.core.common.array_equivalent was removed without deprecation warning. This commits adds it back to the core.common namespace with deprecation warning

DOC: rst fixes

7f0c4e0

BUG/API: Index.append with mixed object/Categorical indices (#14545)

252526c

* BUG/API: Index.append with mixed object/Categorical indices * Only coerce to object if the calling index is not categorical * Add test for the df.info() case (GH14298)

DOC: update whatsnew/release notes for 0.19.1 (#14573)

e1cdc4b

Avoids exception when pandas.io.json.json_normalize contains items in…

5aaf8fe

… meta parameter that don't always occur in every item of the list

Added documentation and test for issue #14505

8928270

Added keyword errors {'raise'|'ignore}

9848837

Added documenation Shortened what's new Removed commas in dictionary for linting compatibility

Added documentation and test for issue #14505

cc2fdc2

Merge branch 'master' of https://github.com/dickreuter/pandas

33495bc

# Conflicts: # doc/source/whatsnew/v0.19.1.txt

jorisvandenbossche closed this Nov 4, 2016

jorisvandenbossche added this to the No action milestone Nov 4, 2016

dickreuter mentioned this pull request Nov 4, 2016

Added errors{'raise','ignore'} for keys not found in meta for json_normalize #14583

Closed
