BUG: Fix for json lines issue with backslashed quotes #14693
Current coverage is 85.20% (diff: 100%)

@@             master   #14693   diff @@
==========================================
  Files           143      143
  Lines         50787    50787
  Methods           0        0
  Messages          0        0
  Branches          0        0
==========================================
+ Hits          43273    43274     +1
+ Misses         7514     7513     -1
  Partials          0        0
@@ -59,7 +59,7 @@ Bug Fixes
- Bug in clipboard functions on Windows 10 and python 3 (:issue:`14362`, :issue:`12807`)
- Bug in ``.to_clipboard()`` and Excel compat (:issue:`12529`)

- Bug in to_json with lines=true containing backslashed quotes (:issue:`14693`)
.to_json() with lines=True specified, containing .....
  result = df.to_json(orient="records", lines=True)
- expected = '{"a":"foo}","b":"bar"}\n{"a":"foo\\"","b":"bar"}'
+ expected = ('{"a":"foo}","b":"bar"}\n{"a":"foo\\"","b":"bar"}\n'
+             '{"a":"foo\\\\","b":"bar"}')
  self.assertEqual(result, expected)
In [5]: df.to_json(lines=True,orient='records')
Out[5]: '{"a":"foo}","b":"bar"}\n{"a":"foo\\"","b":"bar"}\n{"a":"foo\\\\","b":"bar"}'
is on current master. what is different?
This fixed an edge case, but ultimately there are a bunch more. I'm moving away from using this to the following code:
Their solution is fairly simple, but I'm not comfortable enough to update the vendored ujson package. I've added the following PR to speed it up a bit:
@jreback I think the right way to do this is to use jsonlines or to build its functionality into ujson rather than trying to transform the json formatted output. What do you think?
jfyi,
also, there is no need to use
import io
import jsonlines
buf = io.StringIO()
with jsonlines.Writer(buf) as writer:
    writer.write_all(ujson.loads(s))
return buf.getvalue()
IIRC from the original issue, @aterrel and I had discussed this. Though it's pretty performant now, the correct approach is to put it in the custom ujson code that pandas uses. That is somewhat more involved (though probably pretty straightforward).
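For illustration, the alternative raised in this thread (emit each record as its own JSON document instead of post-processing one large string) can be sketched with the standard library. This is an assumption-laden sketch, not what pandas or its vendored ujson does: it swaps in the stdlib `json` module for ujson/jsonlines, and `records_to_json_lines` is a hypothetical helper name.

```python
import json


def records_to_json_lines(records):
    """Serialize each record as its own compact JSON document, one per line.

    Hypothetical helper: uses the stdlib json module, not ujson or jsonlines.
    """
    return "\n".join(json.dumps(rec, separators=(",", ":")) for rec in records)


# Same rows as the test case in this PR: a brace, a quote,
# and a trailing backslash inside string values.
rows = [
    {"a": "foo}", "b": "bar"},
    {"a": 'foo"', "b": "bar"},
    {"a": "foo\\", "b": "bar"},
]
print(records_to_json_lines(rows))
```

Because every record is serialized separately, no separator ever has to be located inside an already-escaped string, so quotes and backslashes in the data cannot confuse the line splitting.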
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. Bug description: A simple check of whether the prior char is a backslash is insufficient because the backslash may itself be escaped. A test is also included (previously included in pandas-dev#14693). xref pandas-dev#14693 xref pandas-dev#15096
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. - Includes test for escaped characters in keys and values (i.e. columns and data). - Includes bug fix in whatsnew - Revised type of in_quotes and is_escaping to bint xref pandas-dev#14693 xref pandas-dev#15096
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. xref #14693 closes #15096 Author: Rouz Azari <[email protected]> Closes #15117 from rouzazari/to_json_lines_with_escaping and squashes the following commits: d114455 [Rouz Azari] BUG: Fix to_json lines with escaped characters
Updates existing to_json methodology by adding is_escaping variable, which ensures escaped chars are handled correctly. xref pandas-dev#14693 closes pandas-dev#15096 Author: Rouz Azari <[email protected]> Closes pandas-dev#15117 from rouzazari/to_json_lines_with_escaping and squashes the following commits: d114455 [Rouz Azari] BUG: Fix to_json lines with escaped characters
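The `in_quotes` / `is_escaping` idea described in these commit messages can be sketched in plain Python. This is not the Cython code from the referenced commits. The function name `records_body_to_lines` is hypothetical, and the sketch assumes the input is the body of a JSON array of objects (the enclosing `[` and `]` already stripped) with compact separators.

```python
import json


def records_body_to_lines(body):
    """Turn the top-level commas between records into newlines.

    Hypothetical sketch; assumes every record is a JSON object.
    """
    out = []
    depth = 0             # how many '{' are currently open outside strings
    in_quotes = False     # True while scanning inside a JSON string
    is_escaping = False   # True when the previous char was an unescaped backslash
    for ch in body:
        if is_escaping:
            # This character is escaped (it may itself be a backslash or a
            # quote), so it neither toggles in_quotes nor starts a new escape.
            is_escaping = False
        elif ch == "\\":
            is_escaping = True
        elif ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
            elif ch == "," and depth == 0:
                ch = "\n"  # separator between two records
        out.append(ch)
    return "".join(out)


# Rows from the test in this PR, serialized compactly and without [ ].
rows = [
    {"a": "foo}", "b": "bar"},
    {"a": 'foo"', "b": "bar"},
    {"a": "foo\\", "b": "bar"},
]
body = json.dumps(rows, separators=(",", ":"))[1:-1]
lines = records_body_to_lines(body)
print(lines)
```

Tracking the escape state explicitly is what distinguishes `\"` (an escaped quote, the string continues) from `\\"` (an escaped backslash followed by a closing quote); looking only at the previous character treats both the same way.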
git diff upstream/master | flake8 --diff
This is an additional fix to:
#14429
#14391