-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Fix to_json lines with escaped characters #15117
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 84.75% (diff: 100%)
|
@@ -1099,6 +1099,7 @@ def convert_json_to_lines(object arr): | |||
""" | |||
cdef: | |||
Py_ssize_t i = 0, num_open_brackets_seen = 0, in_quotes = 0, length | |||
Py_ssize_t is_escaping = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make in_quotes and is_escaping of type bint
pls add a whatsnew note (0.20.0 bug fix section) |
60bb9da
to
92f4e01
Compare
Thanks, @jreback. I made the changes you requested, rebased to master, and squashed the commits. Please note: I revised the test to include a test for escaped characters in keys and values (i.e. columns and data of dataframe). Previously I was only testing escaped characters in values/data. |
@@ -351,6 +351,7 @@ Bug Fixes | |||
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`) | |||
- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`, :issue:`14982`) | |||
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`) | |||
- Bug in ``.to_json()`` where lines=True and contents (keys or values) contain escaped characters (:issue:`15096`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use double-backticks around lines=True
@@ -972,6 +972,13 @@ def test_to_jsonl(self): | |||
self.assertEqual(result, expected) | |||
assert_frame_equal(pd.read_json(result, lines=True), df) | |||
|
|||
# GH15096 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a 1-line comment on what you are testing here
lgtm, minor comments and some linting issues to fix. ping on green. |
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. - Includes test for escaped characters in keys and values (i.e. columns and data). - Includes bug fix in whatsnew - Revised type of in_quotes and is_escaping to bint xref pandas-dev#14693 xref pandas-dev#15096
92f4e01
to
d114455
Compare
thanks! nice concise PR! keep em coming |
Thanks, @jreback. I appreciate the reviews and feedback. FYI: There is no change in runtime performance from the bug fix. This is somewhat obvious, but it was a good exercise for me, so I thought I'd share. I reran the following performance tests from #14408: Master:
This branch:
|
@rouzazari great! note that we already have some asv's for perf testing, I am not sure how much they cover for json though, you can have a look in pandas/asv/benchmarks/packers.py if you think some cases need more, then please PR! |
Got it. I'm not too familiar with this area, but I'll take a look. |
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. xref pandas-dev#14693 closes pandas-dev#15096 Author: Rouz Azari <[email protected]> Closes pandas-dev#15117 from rouzazari/to_json_lines_with_escaping and squashes the following commits: d114455 [Rouz Azari] BUG: Fix to_json lines with escaped characters
Updates existing to_json methodology by adding is_escaping variable,
which ensures escaped chars are handled correctly.
Bug description: A simple check of whether the prior char is a backslash
is insufficient because the backslash may itself be escaped.
A test is also included (previously included in #14693).
xref #14693
xref #15096
git diff upstream/master | flake8 --diff
(usedpep8radius master --diff
instead)