DOC: fix shared compression_options and decompression_options #47609

Merged
merged 2 commits into from
Jul 6, 2022

alastair
Contributor

@alastair alastair commented Jul 6, 2022

Both compression_options and decompression_options referred to
zstandard.ZstdDecompressor. Compression uses zstandard.ZstdCompressor,
so the documentation for compression_options should refer to that
class instead.

The shared docstrings compression_options and decompression_options
contain a string placeholder that must be interpolated with the name of
the parameter that takes a filepath/buffer. This interpolation was
missing in a few places.

Also fix a small grammatical error in compression_options.
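
To illustrate the interpolation issue described above, here is a minimal sketch. The template text, the `SHARED_COMPRESSION_DOC` name, and the `render_doc` helper are hypothetical stand-ins made up for this example; they are not the actual pandas shared docstrings or decorator machinery.

```python
# Hypothetical, simplified stand-in for a shared docstring template.
# The "%s" placeholder is meant to be replaced with the name of the
# parameter that accepts the file path or buffer.
SHARED_COMPRESSION_DOC = (
    "compression : str or dict, default 'infer'\n"
    "    For on-the-fly compression of the output data. "
    "If 'infer' and '%s' is path-like, then detect compression "
    "from the file extension."
)


def render_doc(template: str, param_name: str) -> str:
    """Fill the parameter name into the shared template."""
    return template % param_name


# Intended usage: each function supplies its own parameter name.
to_csv_doc = render_doc(SHARED_COMPRESSION_DOC, "path_or_buf")

# The kind of mistake this PR fixes: using the template without
# interpolating it leaves a literal '%s' in the rendered documentation.
broken_doc = SHARED_COMPRESSION_DOC

print(to_csv_doc)
```

When the interpolation step is skipped, the rendered text keeps the raw placeholder, which matches the '%s' visible in the published docs before this change.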

It took me a while to find the documentation for the Python zstandard library to see which options can be passed in as a dictionary. Do you think it's worth making the text zstandard.ZstdDecompressor in the docs a link to https://python-zstandard.readthedocs.io/en/latest/decompressor.html?
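
For readers unfamiliar with the dictionary form, the general pattern is that one key selects the method and the remaining keys are forwarded to the compressor class. The sketch below shows that pattern using only the standard library's `gzip` module as a stand-in for zstandard, which is a third-party package. The `compress_bytes` helper is hypothetical and is not how pandas implements this internally.

```python
import gzip
import io


def compress_bytes(data: bytes, compression: dict) -> bytes:
    """Hypothetical helper: split 'method' from the remaining options."""
    options = dict(compression)  # copy so the caller's dict is not mutated
    method = options.pop("method")
    if method != "gzip":
        raise ValueError(f"unsupported method in this sketch: {method!r}")
    buffer = io.BytesIO()
    # The remaining keys go straight to the compressor's constructor,
    # which is why the docs need to name the correct class.
    with gzip.GzipFile(fileobj=buffer, mode="wb", **options) as handle:
        handle.write(data)
    return buffer.getvalue()


payload = b"a,b\n1,2\n" * 100
settings = {"method": "gzip", "compresslevel": 1, "mtime": 0}
compressed = compress_bytes(payload, settings)
restored = gzip.decompress(compressed)
```

Because the extra keys are handed to a specific class, a reader needs to know which class that is in order to look up the valid options, which is the motivation for pointing the compression docs at ZstdCompressor rather than ZstdDecompressor.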

I saw some test failures when running test_fast.sh inside the provided Docker image, but from what I can tell, these are not related to the changes that I made, and I also see the same failure when running tests from main:

test failure
pandas/tests/test_expressions.py:133: AssertionError
________________ TestExpressions.test_run_binary[gt-False-df2] _________________
[gw2] linux -- Python 3.8.13 /opt/conda/bin/python

self = <pandas.tests.test_expressions.TestExpressions object at 0x7fcc8c9f3fd0>
df =         A   B   C   D
0       0   0  12   0
1      22  22  54  17
2       0   0  21   0
3       0  95   0   0
4       ...3  91
9997   25   0  32   5
9998    0  55   0  62
9999   42  25  30   1
10000   3   0  75  56

[10001 rows x 4 columns]
flex = False, comparison_op = <built-in function gt>

    @pytest.mark.parametrize(
        "df",
        [
            _integer,
            _integer2,
            # randint to get a case with zeros
            _integer * np.random.randint(0, 2, size=np.shape(_integer)),
            _frame,
            _frame2,
            _mixed,
            _mixed2,
        ],
    )
    @pytest.mark.parametrize("flex", [True, False])
    def test_run_binary(self, df, flex, comparison_op):
        """
        tests solely that the result is the same whether or not numexpr is
        enabled.  Need to test whether the function does the correct thing
        elsewhere.
        """
        arith = comparison_op.__name__
        with option_context("compute.use_numexpr", False):
            other = df.copy() + 1

        expr._MIN_ELEMENTS = 0
        expr.set_test_mode(True)

        result, expected = self.call_op(df, other, flex, arith)

        used_numexpr = expr.get_test_result()
>       assert used_numexpr, "Did not use numexpr as expected."
E       AssertionError: Did not use numexpr as expected.
E       assert []
  • no related ticket
  • Built and inspected docs
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • N/A Added type annotations to new arguments/methods/functions.
  • N/A Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@pep8speaks

pep8speaks commented Jul 6, 2022

Hello @alastair! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-06 11:55:10 UTC

Member

@phofl phofl left a comment

Could you build the docs and show an example?

@alastair
Contributor Author

alastair commented Jul 6, 2022

original https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html:

For on-the-fly compression of the output data. If ‘infer’ and ‘%s’ path-like, then

updated:
Screen Shot 2022-07-06 at 2 57 27 PM


original https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html:

For on-the-fly decompression of on-disk data. If ‘infer’ and ‘%s’ is path-like, then

updated:
Screen Shot 2022-07-06 at 2 59 02 PM


Decompression for pandas.read_sas appears to be quite new and I can't find it in the public docs.

Updated:
Screen Shot 2022-07-06 at 3 01 38 PM


original https://pandas.pydata.org/docs/reference/api/pandas.read_stata.html:

For on-the-fly decompression of on-disk data. If ‘infer’ and ‘%s’ is path-like, then

updated:
Screen Shot 2022-07-06 at 3 03 15 PM

Member

@phofl phofl left a comment

Thx, lgtm

@mroeschke mroeschke added the Docs and IO Data (IO issues that don't fit into a more specific label) labels Jul 6, 2022
@mroeschke mroeschke added this to the 1.5 milestone Jul 6, 2022
@mroeschke mroeschke merged commit 7a29d4a into pandas-dev:main Jul 6, 2022
@mroeschke
Member

Thanks @alastair

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
…-dev#47609)

* DOC: Fix reference to ZstdCompressor in compression_options

* DOC: Fix string interpolation in compression/decompression_options