Skip to content

BUG: lzma is a required part of python, make it optional #27882

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Aug 19, 2019
Merged

BUG: lzma is a required part of python, make it optional #27882

merged 20 commits into from
Aug 19, 2019

Conversation

guilherme-salome
Copy link
Contributor

@guilherme-salome guilherme-salome commented Aug 12, 2019

Importing lzma when Python has been compiled without its support (usually due to a lack of xz in the system) will raise a warning. Substituted import lzma for helper function.

@guilherme-salome
Copy link
Contributor Author

I cannot modify the pandas/_libs/parsers.pyx without running into issues with cythonize (via pip install -e .).

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @salompas.

We'll also want

  1. A release note in 0.25.1.rst. Can go under the "IO" bug fixes section.
  2. Update the uses of lzma to raise a RuntimeError when a broken Python install tries to use lzma
  3. A test

For the test, something like

diff --git a/pandas/tests/io/test_compression.py b/pandas/tests/io/test_compression.py
index ce459ab24..0d359dd72 100644
--- a/pandas/tests/io/test_compression.py
+++ b/pandas/tests/io/test_compression.py
@@ -1,5 +1,7 @@
 import contextlib
 import os
+import subprocess
+import textwrap
 import warnings

 import pytest
@@ -125,3 +127,15 @@ def test_compression_warning(compression_only):
         with tm.assert_produces_warning(RuntimeWarning, check_stacklevel=False):
             with f:
                 df.to_csv(f, compression=compression_only)
+
+
+def test_with_missing_lzma():
+    # https://github.com/pandas-dev/pandas/issues/27575
+    code = textwrap.dedent(
+        """\
+    import sys
+    sys.modules['lzma'] = None
+    import pandas
+    """
+    )
+    subprocess.check_output(["python", "-c", code])

@TomAugspurger
Copy link
Contributor

I cannot modify the pandas/_libs/parsers.pyx without running into issues with cythonize (via pip install -e .).

What issues do you see? You'll need a C compiler, since you're updating a Cython file. Typically I use python setup.py build_ext --inplace, but pip install -e . may do the same.

@TomAugspurger TomAugspurger added this to the 0.25.1 milestone Aug 12, 2019
@TomAugspurger TomAugspurger added the IO Data IO issues that don't fit into a more specific label label Aug 12, 2019
@guilherme-salome
Copy link
Contributor Author

guilherme-salome commented Aug 12, 2019

I cannot modify the pandas/_libs/parsers.pyx without running into issues with cythonize (via pip install -e .).

What issues do you see? You'll need a C compiler, since you're updating a Cython file. Typically I use python setup.py build_ext --inplace, but pip install -e . may do the same.

I was adding an if lzma is None to one of the lines to see if RuntimeError is necessary.
However, when I run pip install -e . after modifying the file, I get:

    ERROR: Complete output from command python setup.py egg_info:
    ERROR: 
    Error compiling Cython file:
    ------------------------------------------------------------
    ...
    
    
    cdef extern from "parser/tokenizer.h":
    
        ctypedef enum ParserState:
    	START_RECORD
    ^
    ------------------------------------------------------------
    
    pandas/_libs/parsers.pyx:90:0: Mixed use of tabs and spaces
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/Users/guilhermesalome/Desktop/pandas/setup.py", line 813, in <module>
        ext_modules=maybe_cythonize(extensions, compiler_directives=directives),
      File "/Users/guilhermesalome/Desktop/pandas/setup.py", line 542, in maybe_cythonize
        return cythonize(extensions, *args, **kwargs)
      File "/Users/guilhermesalome/.pyenv/versions/miniconda3-latest/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1096, in cythonize
        cythonize_one(*args)
      File "/Users/guilhermesalome/.pyenv/versions/miniconda3-latest/envs/pandas-dev/lib/python3.7/site-packages/Cython/Build/Dependencies.py", line 1219, in cythonize_one
        raise CompileError(None, pyx_file)
    Cython.Compiler.Errors.CompileError: pandas/_libs/parsers.pyx
    Compiling pandas/_libs/parsers.pyx because it changed.
    [1/1] Cythonizing pandas/_libs/parsers.pyx
    ----------------------------------------
ERROR: Command "python setup.py egg_info" failed with error code 1 in /Users/guilhermesalome/Desktop/pandas/

I am modifying the file in Emacs without any mode for Cython.
The error shows up at a line that has nothing to do with the modification I made.

@guilherme-salome
Copy link
Contributor Author

I managed to make it work in nano and putting whitespaces by hand. It might be an issue with my Emacs using tabs.

@guilherme-salome
Copy link
Contributor Author

guilherme-salome commented Aug 12, 2019

+def test_with_missing_lzma():
+    # https://github.com/pandas-dev/pandas/issues/27575
+    code = textwrap.dedent(
+        """\
+    import sys
+    sys.modules['lzma'] = None
+    import pandas
+    """
+    )
+    subprocess.check_output(["python", "-c", code])

This is much smarter than what I was doing to test the code (change the name of the lzma.py module).

@guilherme-salome
Copy link
Contributor Author

guilherme-salome commented Aug 12, 2019

I do not understand what the command git diff upstream/master -u -- "*.py" | flake8 --diff does, but if I do git diff master -u -- "*.py" | flake8 --diff" the output is nothing.

@guilherme-salome guilherme-salome changed the title BUG: #27575 BUG: import lzma can fail #27575 Aug 12, 2019
@guilherme-salome guilherme-salome marked this pull request as ready for review August 12, 2019 22:26
@guilherme-salome
Copy link
Contributor Author

Please let me know if this is looking good, and how I can improve the solution.

@jbrockmendel
Copy link
Member

but if I do git diff master -u -- "*.py" | flake8 --diff" the output is nothing

That means it is passing that check.

The isort check is failing. Should be fixed if you run isort -rc pandas

@guilherme-salome
Copy link
Contributor Author

I made the changes you suggested, let me know how it is looking now.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty close. Can you

  1. Rename import_lzma to _import_lzma (pandas.compat is supposed to be private, but people still use it)
  2. move the if lzma is None check to a new method in pandas/compat/__init__.py called _get_lzma_file(lzma). If lzma is None it raises the RuntimeError. Otherwise, it returns lzma.LZMAFile, which we can pass the kwargs to when lzma is not None.
  3. Add at least one subprocess-style test that hits the RuntimeError.

RuntimeError if the check fails. Updated remainder of the code to use
this function.
@guilherme-salome
Copy link
Contributor Author

Looks pretty close. Can you

  1. Rename import_lzma to _import_lzma (pandas.compat is supposed to be private, but people still use it)
  2. move the if lzma is None check to a new method in pandas/compat/__init__.py called _get_lzma_file(lzma). If lzma is None it raises the RuntimeError. Otherwise, it returns lzma.LZMAFile, which we can pass the kwargs to when lzma is not None.
  3. Add at least one subprocess-style test that hits the RuntimeError.

Done.
1., 2. - Good idea! Looks way better now.
3. - The test will fail if you do not have lzma, but succeeds otherwise.

@guilherme-salome
Copy link
Contributor Author

guilherme-salome commented Aug 13, 2019

Just for my future reference.
On Emacs if you have the following in your init.el:

(progn (add-hook 'before-save-hook 'whitespace-cleanup)
       (add-hook 'before-save-hook (lambda() (delete-trailing-whitespace))))

When saving a file, the function whitespace-cleanup is called and it substitutes spaces for tabs whenever possible.
To change the behavior so that tabs are substituted for space instead, add the following to init.el:
(setq-default indent-tabs-mode nil)
The variable indent-tabs-mode is used by whitespace-cleanup and will change its behavior.
Now there should not be any issues with pyx files.

@guilherme-salome
Copy link
Contributor Author

I guess another modification to this code is to show the warning just once (right now it shows up twice), but this might not be worth the time.

Copy link

@auvipy auvipy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a double warning is fine. Don't worry about that.

One request for changes on the second test. Otherwise, this looks great.

@jreback jreback changed the title BUG: import lzma can fail #27575 BUG: lzma is a required part of python, make it optional Aug 15, 2019
@guilherme-salome
Copy link
Contributor Author

@jreback @TomAugspurger
Thanks for the suggestions. I made the changes to the code.
Could you check and let me know where to improve? Thanks!

@TomAugspurger TomAugspurger merged commit ba94f9b into pandas-dev:master Aug 19, 2019
@TomAugspurger
Copy link
Contributor

Thanks @salompas!

@lumberbot-app
Copy link

lumberbot-app bot commented Aug 19, 2019

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
$ git checkout 0.25.x
$ git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
$ git cherry-pick -m1 ba94f9baaa1a802eee4820a1188db6501a231ae6
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
$ git commit -am 'Backport PR #27882: BUG: lzma is a required part of python, make it optional'
  1. Push to a named branch :
git push YOURFORK 0.25.x:auto-backport-of-pr-27882-on-0.25.x
  1. Create a PR against branch 0.25.x, I would have named this PR:

"Backport PR #27882 on branch 0.25.x"

And apply the correct labels and milestones.

Congratulation you did some good work ! Hopefully your backport PR will be tested by the continuous integration and merged soon!

If these instruction are inaccurate, feel free to suggest an improvement.

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request Aug 19, 2019
@TomAugspurger
Copy link
Contributor

Backported in #28012.

EunSeop pushed a commit to EunSeop/pandas that referenced this pull request Aug 20, 2019
…27882)

* Importing lzma when Python has been compiled without its support will
raise a warning. Substituted import lzma for helper function.
galuhsahid pushed a commit to galuhsahid/pandas that referenced this pull request Aug 25, 2019
…27882)

* Importing lzma when Python has been compiled without its support will
raise a warning. Substituted import lzma for helper function.
@rth
Copy link
Contributor

rth commented Sep 8, 2021

I opened #43461 to re-visit this. I would like to propose to remove the import warning if lzma is missing. Please comment in the linked issue if you disagree.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

import pandas error for missing compression libraries
6 participants