PERF: improved perf in .to_json when lines=True #14429

Closed
wants to merge 1 commit into from

Conversation

jreback
Contributor

@jreback commented Oct 14, 2016

closes #14408

master

In [1]: np.random.seed(1234)

In [2]: N = 100000
   ...: C = 5
   ...: df = DataFrame(dict([('float{0}'.format(i), np.random.randn(N)) for i in range(C)]))

In [3]: %timeit df.to_json('foo.json',orient='records',lines=True)
1 loop, best of 3: 3.67 s per loop

PR (with proper encoding/decoding; in other words, it works directly on the bytes and minimizes copies)

In [5]: %timeit df.to_json('foo.json',orient='records',lines=True)
10 loops, best of 3: 137 ms per loop
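
To illustrate what "work directly on the bytes" could mean here, the following is a minimal sketch, not the code in this PR. It assumes the records are first serialized as one compact JSON array (`[{...},{...}]`, no whitespace around the outer brackets) and then converted to one record per line in a single pass over a byte buffer. The name `records_to_lines` is hypothetical.

```python
def records_to_lines(json_text):
    """Hypothetical helper: turn '[{...},{...}]' into one JSON record per line.

    Assumes a compact top-level array with no surrounding whitespace.
    """
    # Encode once and edit the buffer in place instead of building new strings.
    buf = bytearray(json_text.encode("utf-8"))
    depth = 0          # nesting depth of {...} and [...]
    in_string = False  # True while inside a JSON string literal
    escaped = False    # True right after a backslash inside a string

    for i, byte in enumerate(buf):
        if in_string:
            if escaped:
                escaped = False
            elif byte == 0x5C:  # backslash
                escaped = True
            elif byte == 0x22:  # closing double quote
                in_string = False
        elif byte == 0x22:  # opening double quote
            in_string = True
        elif byte in (0x7B, 0x5B):  # '{' or '['
            depth += 1
        elif byte in (0x7D, 0x5D):  # '}' or ']'
            depth -= 1
        elif byte == 0x2C and depth == 1:  # ',' separating top-level records
            buf[i] = 0x0A  # replace with newline; buffer length is unchanged

    # Drop the outer '[' and ']' and decode once at the end.
    return buf[1:-1].decode("utf-8")


if __name__ == "__main__":
    print(records_to_lines('[{"a":1,"b":"x,y"},{"a":2,"b":"z"}]'))
```

Commas inside strings or nested containers are left alone because only separators at depth 1 and outside a string literal are rewritten. The encode and decode steps each happen once, which is the kind of copy avoidance the description refers to.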

@jreback added the Performance (memory or execution speed performance) and IO JSON (read_json, to_json, json_normalize) labels Oct 14, 2016
@jreback added this to the 0.19.1 milestone Oct 14, 2016
@jreback
Contributor Author

jreback commented Oct 14, 2016

xref #14391

cc @joshowen

@jreback force-pushed the json branch 2 times, most recently from c26b4cd to ca94170, October 15, 2016 13:35
@codecov-io

codecov-io commented Oct 15, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14429 into master will decrease coverage by <.01%

@@             master     #14429   diff @@
==========================================
  Files           140        140          
  Lines         50639      50622    -17   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
- Hits          43177      43161    -16   
+ Misses         7462       7461     -1   
  Partials          0          0          

Powered by Codecov. Last update 286b9b9...e855e1f

@jorisvandenbossche
Member

Looks good to me

@jreback
Contributor Author

jreback commented Oct 15, 2016

closed by 7cad3f1

Successfully merging this pull request may close these issues.

PERF: to_json very slow with lines=True