-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[BUG] handle } in line delimited json #14391
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 85.26% (diff: 100%)@@ master #14391 diff @@
==========================================
Files 140 140
Lines 50634 50639 +5
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43173 43178 +5
Misses 7461 7461
Partials 0 0
|
can you add a benchmark for |
Not sure why that test failed,, the only change was a line of whitespace |
self.f = '__test__.msg' | ||
self.N = 100000 | ||
self.C = 5 | ||
self.index = date_range('20000101', periods=self.N, freq='H') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like some things got repeated here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like self.N/self.C are repeated in most of these classes. Want me to clean them all up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure that would be great (you can also make a common base class(s) if that helps as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok that's fine (though @joshowen make sure your example doesn't have dups as this is new code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, yes, it is of course OK to remove the lines in this added code that you do not need for this benchmark
df = DataFrame([["foo}", "bar"], ['foo"', "bar"]], columns=['a', 'b']) | ||
result = df.to_json(orient="records", lines=True) | ||
expected = '{"a":"foo}","b":"bar"}\n{"a":"foo\\"","b":"bar"}' | ||
self.assertEqual(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also round trip it and user assert_frame_equal on the result (in addition to the above test)
@joshowen can you also post a run for this benchmark (versus previous); can also do it in a %timeit as well. Just checking if any perf issues. |
@jreback is there an easy way to do that? Or should I port the asv test to master and run/compare? |
you can run asv if you want, otherwise just do it in ipython (before and after), e.g. something like
and looking at this, we have a BIG perf hit when |
cc @aterrel |
@joshowen can you open another issue about the perf
something odd going on here |
lgtm. @jorisvandenbossche |
Thanks @joshowen ! |
}
#14390git diff upstream/master | flake8 --diff