-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Update Performance Considerations section in docs #17303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
* re-run all tests * add tests for feather and pickle
If you'd like, it'd be nice to show the new parquet functionality here too. #15838 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice update!
doc/source/io.rst
Outdated
@@ -5208,82 +5208,105 @@ easy conversion to and from pandas. | |||
Performance Considerations | |||
-------------------------- | |||
|
|||
This is an informal comparison of various IO methods, using pandas 0.13.1. | |||
This is an informal comparison of various IO methods, using pandas 0.20.3. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you maybe put a stronger warning here that the timings are machine dependent and you should not look at small differences?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Data columns (total 2 columns): | ||
A 1000000 non-null float64 | ||
B 1000000 non-null float64 | ||
dtypes: float64(2) | ||
memory usage: 22.9 MB | ||
memory usage: 15.3 MB | ||
|
||
Writing | ||
|
||
.. code-block:: ipython | ||
|
||
In [14]: %timeit test_sql_write(df) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, we could make these an ipython block (so they would run)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't do that, building the docs already takes a long time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we actually do this in various sections. And the incremental time is quite small here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's true we have them in some places, but I suppose it are a lot smaller timings.
The extra time here is not small. Timing only writing ones that already existed takes 1min30 on my laptop, and then this PR even added more cases and you also have the reading. So this will add maybe like 3 to 5 min to the doc build. Which is IMO not worth it.
doc/source/io.rst
Outdated
In [34]: %timeit test_pickle_read() | ||
5.75 ms ± 110 µs per loop (mean ± std. dev. of 7 runs, 100 loops each) | ||
|
||
In [35]: %timeit test_pickle_read_compress() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the compression ones are pretty bogus because you are using random data. maybe add a colum of 1's and a column of strings or something to make compression not horrible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
* Update all timings * Clarify wording
Codecov Report
@@ Coverage Diff @@
## master #17303 +/- ##
==========================================
- Coverage 91.03% 90.99% -0.05%
==========================================
Files 162 162
Lines 49569 49569
==========================================
- Hits 45124 45103 -21
- Misses 4445 4466 +21
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #17303 +/- ##
==========================================
+ Coverage 91.03% 91.21% +0.18%
==========================================
Files 162 163 +1
Lines 49569 49810 +241
==========================================
+ Hits 45124 45435 +311
+ Misses 4445 4375 -70
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was actually already good, so merging. Thanks!
Although, as noted by @chris-b1, it would be nice to include |
git diff upstream/master -u -- "*.py" | flake8 --diff