PERF: 5x speedup for read_json() with orient='index' by avoiding transpose #26773
cc @TomAugspurger would this be related to #24387 at all? |
Ignore my previous comment; I was too focused on the constructor and not the transposition. This makes sense to me. |
Codecov Report
@@ Coverage Diff @@
## master #26773 +/- ##
===========================================
- Coverage 91.71% 41.21% -50.51%
===========================================
Files 178 178
Lines 50771 50771
===========================================
- Hits 46567 20926 -25641
- Misses 4204 29845 +25641
Continue to review full report at Codecov.
|
Nice! @qwhelan, mind taking a look at the test cases? (Looks like this changes the order of the index.) https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=12630
|
@alimcmaster1 Given that this only fails on 3.5, I'm guessing this is a dict-orderedness issue in |
d77a2a2 to 5edd63c
lgtm, can you add a note in Performance for 0.25.0, ping on green. |
5edd63c to cef3d80
thanks @qwhelan |
The .T operator can be quite slow on mixed-type DataFrames due to the creation of object dtype columns. In comparison, direct construction with DataFrame.from_dict() can generally be much more efficient.
Making that swap inside pd.read_json() yields a ~5-6x speedup for the orient='index' case.
git diff upstream/master -u -- "*.py" | flake8 --diff