-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Add Zstandard compression support #43925
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @jonashaag! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-12-22 09:44:38 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is probably ok, but
- need to add this to some of the builds
- need to update the doc strings for virtually all methods as the routines you changes are for evertyhing (e.g. it should work, unless of course those formats don't support it, e.g. hdf5 likely doesn't).
Thanks for the review!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the way you are importing / skipping is not inline with other testing.
cc @twoertwein if you can have a look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok if you can merge master and address the existing comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, looks good from my side!
I think the only pending parts are 1) whatsnew entry for 1.4, 2) pytest.importorskip, 3) not testing zstd on all CI configurations, and 4) rebasing
Will push an update soon |
a831405
to
2334af8
Compare
I've consolidated all |
850be36
to
532dd8f
Compare
77c9022
to
760ba0a
Compare
If this builds OK it should be ready for re-review |
Test failures should go away when this PR is merged (missing |
You will need to add "zstd" here Line 246 in 85c221a
|
The only obvious part missing to me is a whatsnew entry. |
92ed77e
to
6843884
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks really good! need to change the shared docs from generic to shared_docs and the circular import will go away.
some small other comments. pls merge master and ping on green.
pandas/io/common.py
Outdated
@@ -1136,18 +1130,20 @@ def _is_binary_mode(handle: FilePath | BaseBuffer, mode: str) -> bool: | |||
) | |||
|
|||
|
|||
def _get_binary_io_classes() -> tuple[type, ...]: | |||
def _get_binary_io_classes(_cache: list[type] = []) -> tuple[type, ...]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about?
@functools.cache
def _get_binary_io_classes() -> tuple[type, ...]:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python 3.9+ only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@functools.lru_cache
should work as well, or?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure if you prefer that over the custom memoization I'm happy to change it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is only one "least recently used" input argument (might be a bit too much to use lru_cache) but it keeps the code slightly simpler. Will defer to @jreback
(".xz", "xz"), | ||
(".XZ", "xz"), | ||
pytest.param((".zst", "zstd"), marks=td.skip_if_no("zstandard")), | ||
pytest.param((".ZST", "zstd"), marks=td.skip_if_no("zstandard")), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the compression tests take too much time: I would expect that we need only one upper-case test, we have one .lower()
call that affects all compression formats. @jreback
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small benchmark: io/test_common.py
+ io/test_pickle.py
with full list: 4.25 s, with small list: 4.2 s
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick benchmark! Based on that we can keep all the upper cases. Compression is also used in a lot of CSV tests.
@jreback ping, should be good to go now. Test failures are because one of the test only works when this has been merged to |
4f6772c
to
84d61d0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there still seems to be a few uses of generic._share_docs, can you change these to directly use the shared_docs itself (we must alias this but its not great)
84d61d0
to
a3a6c1a
Compare
Something with CI is really broken, all the PRs are red |
f331616
to
8bd477f
Compare
@jreback should be good to go |
can you merge master once more, been having a number of CI issues |
8bd477f
to
735b1db
Compare
@jreback looks like those are only failing because those files are not yet on |
what files? |
https://github.com/pandas-dev/pandas/tree/master/pandas/tests/io/parser/data pandas/pandas/tests/io/parser/test_network.py Lines 38 to 43 in 5f36af3
Hardcoded URL access to master branch |
thanks @jonashaag very nice! |
Looks like master is failing some builds are missing zstandard (or not https://github.com/pandas-dev/pandas/runs/4612486852?check_suite_focus=true#step:10:255 |
Yep
|
Haven't found any discussion about Zstandard so far but I think it would be a valuable addition to the Pickle compression options.