
show_versions fails on master with latest lxml #23934


Closed
max-sixty opened this issue Nov 26, 2018 · 11 comments · Fixed by #23949
Labels
CI Continuous Integration good first issue

Comments

@max-sixty
Contributor

Code Sample, a copy-pastable example if possible

[ins] In [1]: import pandas as pd
[ins] In [2]: pd.show_versions()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-3d232a07e144> in <module>
----> 1 pd.show_versions()

~/workspace/pandas/pandas/util/_print_versions.py in show_versions(as_json)
    107             else:
    108                 mod = importlib.import_module(modname)
--> 109             ver = ver_f(mod)
    110             deps_blob.append((modname, ver))
    111         except ImportError:

~/workspace/pandas/pandas/util/_print_versions.py in <lambda>(mod)
     86         ("xlwt", lambda mod: mod.__VERSION__),
     87         ("xlsxwriter", lambda mod: mod.__version__),
---> 88         ("lxml", lambda mod: mod.etree.__version__),
     89         ("bs4", lambda mod: mod.__version__),
     90         ("html5lib", lambda mod: mod.__version__),

AttributeError: module 'lxml' has no attribute 'etree'

Expected Output

This seems to work OK on 0.23.4 (output below).

Let me know if there are other potential factors you'd like me to check.

Output of pd.show_versions()

[ins] In [5]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.23.4
pytest: 3.9.2
pip: 18.1
setuptools: 40.6.2
Cython: 0.28.5
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: 0.10.9
IPython: 7.1.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.5
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: 0.6.1+2.gd98c621
pandas_datareader: 0.7.0

@TomAugspurger
Contributor

Are you sure lxml is installed properly? What's the output of

import lxml.etree
lxml.etree.__version__

@max-sixty
Contributor Author

Apologies - it works now. No idea what could have caused this.


[ins] In [1]: import lxml.etree

[ins] In [2]: lxml.etree.__version__
Out[2]: '4.2.5'

[ins] In [3]: import pandas as pd

[ins] In [4]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: b7294dd3ec47dffa50f3c8bdf89aa8a01f4494f2
python: 3.7.1.final.0
python-bits: 64
OS: Darwin
OS-release: 18.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.0+4049.gb7294dd3e
pytest: 3.9.2
pip: 18.1
setuptools: 40.6.2
Cython: 0.28.5
numpy: 1.15.2
scipy: None
pyarrow: None
xarray: 0.10.9
IPython: 7.1.1
sphinx: None
patsy: 0.5.0
dateutil: 2.7.5
pytz: 2018.5
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.0
openpyxl: None
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: None
lxml: 4.2.5
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.12
pymysql: None
psycopg2: None
jinja2: 2.8.1
s3fs: None
fastparquet: None
pandas_gbq: 0.6.1+2.gd98c621
pandas_datareader: None
gcsfs: None

@max-sixty
Contributor Author

Actually - if I don't import lxml first, then it fails:

[ins] In [1]: import pandas as pd

[ins] In [2]: pd.show_versions()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-3d232a07e144> in <module>
----> 1 pd.show_versions()

~/workspace/pandas/pandas/util/_print_versions.py in show_versions(as_json)
    107             else:
    108                 mod = importlib.import_module(modname)
--> 109             ver = ver_f(mod)
    110             deps_blob.append((modname, ver))
    111         except ImportError:

~/workspace/pandas/pandas/util/_print_versions.py in <lambda>(mod)
     86         ("xlwt", lambda mod: mod.__VERSION__),
     87         ("xlsxwriter", lambda mod: mod.__version__),
---> 88         ("lxml", lambda mod: mod.etree.__version__),
     89         ("bs4", lambda mod: mod.__version__),
     90         ("html5lib", lambda mod: mod.__version__),

AttributeError: module 'lxml' has no attribute 'etree'

@max-sixty
Contributor Author

max-sixty commented Nov 26, 2018

Is this because etree isn't imported into lxml's root?


[ins] In [1]: import lxml

[ins] In [6]: lxml.etree
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-6-f7059756e878> in <module>
----> 1 lxml.etree

AttributeError: module 'lxml' has no attribute 'etree'

[ins] In [7]: from lxml import etree

[ins] In [8]:
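This behavior doesn't depend on lxml itself. A minimal sketch, using a throwaway package whose names (demo_pkg and demo_pkg.etree) are hypothetical stand-ins for lxml and lxml.etree, shows that importing a package does not bind a submodule attribute until something imports that submodule:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package whose __init__.py is empty and does NOT import
# its submodule, mirroring lxml/__init__.py (names are hypothetical).
tmp = Path(tempfile.mkdtemp())
pkg_dir = tmp / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "etree.py").write_text('__version__ = "1.0"\n')
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()  # make sure the new files are seen

pkg = importlib.import_module("demo_pkg")
# Importing only the package leaves the submodule attribute unbound.
before = hasattr(pkg, "etree")

# Importing the submodule (like `from lxml import etree`) registers it in
# sys.modules and sets it as an attribute on the parent package.
importlib.import_module("demo_pkg.etree")
after = hasattr(pkg, "etree")

print(before, after)  # False True
```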

@max-sixty max-sixty reopened this Nov 26, 2018
@gfyoung
Member

gfyoung commented Nov 26, 2018

I can't replicate this on master with the latest lxml.

I'm also inclined to believe lxml was not properly installed.

@max-sixty
Contributor Author

I installed lxml in a fresh Docker container to check:

# docker run -it python:3 bash

root@f6dd4625aeb6:/# pip install lxml

Collecting lxml
  Downloading https://files.pythonhosted.org/packages/7a/6b/a3d2d3c3075617edcbfc272d79281e812b1a94dab37923b1d06fdfe2e906/lxml-4.2.5-cp37-cp37m-manylinux1_x86_64.whl (5.8MB)
    100% |████████████████████████████████| 5.8MB 2.6MB/s
Installing collected packages: lxml
Successfully installed lxml-4.2.5

root@f6dd4625aeb6:/# python

Python 3.7.1 (default, Nov 16 2018, 22:26:09)
[GCC 6.3.0 20170516] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import lxml
>>> lxml.etree
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'lxml' has no attribute 'etree'

@gfyoung
Member

gfyoung commented Nov 27, 2018

That suggests perhaps a broken lxml wheel?

@max-sixty
Contributor Author

Are you sure it's not simply that etree isn't imported into lxml's root package? That seems like the simplest explanation.
https://github.com/lxml/lxml/blob/master/src/lxml/__init__.py

@TomAugspurger
Contributor

TomAugspurger commented Nov 27, 2018 via email

@TomAugspurger
Contributor

TomAugspurger commented Nov 27, 2018

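Reproduced. It seems like openpyxl imports lxml.etree, so when openpyxl is installed, lxml.etree is already in sys.modules by the time show_versions looks it up, and the lookup succeeds: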

bash-4.4$ pip install openpyxl
Collecting openpyxl
...
Successfully installed openpyxl-2.5.11
bash-4.4$ python -c "import pandas; pandas.show_versions()"

INSTALLED VERSIONS
------------------
commit: a7767b91539d66741a15b3cb05e31ce6c21613e0
python: 3.7.0.final.0
python-bits: 64
OS: Darwin
OS-release: 17.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
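The order dependence can be sketched the same way. In this hypothetical setup, side_effect_dep plays the role of openpyxl (a library that imports the submodule as a side effect of being imported), and version_via_parent mimics the original mod.etree.__version__ lookup; the names are illustrative, not pandas' code:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: base_pkg stands in for lxml, base_pkg.etree for
# lxml.etree, and side_effect_dep for a library that imports the submodule.
tmp = Path(tempfile.mkdtemp())
(tmp / "base_pkg").mkdir()
(tmp / "base_pkg" / "__init__.py").write_text("")
(tmp / "base_pkg" / "etree.py").write_text('__version__ = "4.2.5"\n')
(tmp / "side_effect_dep.py").write_text("import base_pkg.etree\n")
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()


def version_via_parent():
    """Mimic the buggy lookup: import the parent, then read .etree.__version__."""
    mod = importlib.import_module("base_pkg")
    try:
        return mod.etree.__version__
    except AttributeError:
        return None  # submodule attribute not bound yet


first = version_via_parent()                # nothing imported base_pkg.etree yet
importlib.import_module("side_effect_dep")  # side effect: imports base_pkg.etree
second = version_via_parent()               # now the attribute lookup succeeds
print(first, second)  # None 4.2.5
```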

To fix this, we should just change

("lxml", lambda mod: mod.etree.__version__),

so that the module name is "lxml.etree" instead of "lxml", with the lambda reading mod.__version__ from that submodule directly.
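A minimal sketch of the proposed fix, assuming a show_versions-style loop like the one in the traceback (import each module by name, apply a version function, treat ImportError as "not installed"). collect_versions is a hypothetical name, not the actual pandas function:

```python
import importlib


def collect_versions(deps):
    """Sketch of a show_versions-style loop (hypothetical helper).

    Each entry names the module that actually defines the version, so a
    dotted name like "lxml.etree" is imported directly instead of relying
    on the parent package already having the submodule attribute bound.
    """
    versions = []
    for modname, ver_f in deps:
        try:
            mod = importlib.import_module(modname)
            ver = ver_f(mod)
        except ImportError:
            ver = None  # not installed
        versions.append((modname, ver))
    return versions


deps = [
    # Fixed form of the entry: import the submodule itself.
    ("lxml.etree", lambda mod: mod.__version__),
    # A module name that should not exist, to show the fallback.
    ("surely_not_installed_pkg", lambda mod: mod.__version__),
]

for name, ver in collect_versions(deps):
    print(f"{name}: {ver}")
```

Importing the dotted name through importlib.import_module returns the submodule itself, so the version lookup no longer depends on which other libraries happened to import lxml.etree first.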

@TomAugspurger TomAugspurger added this to the Contributions Welcome milestone Nov 27, 2018
@TomAugspurger TomAugspurger added Effort Low CI Continuous Integration labels Nov 27, 2018
@jreback
Contributor

jreback commented Nov 27, 2018

Do we call .show_versions() in any tests?

Maybe we should do this in a subprocess call to make sure it works.
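Along the lines of the subprocess suggestion, here is a sketch of a helper (hypothetical name runs_in_fresh_interpreter) that runs a snippet with python -c in a new process. A fresh interpreter hasn't inherited submodules imported earlier by other tests or libraries, so it surfaces order-dependent lookups like the one in this issue. The demo package fresh_pkg is likewise hypothetical:

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def runs_in_fresh_interpreter(code, extra_path=None):
    """Run `code` via `python -c` in a new process; return (ok, stderr)."""
    env = dict(os.environ)
    if extra_path is not None:
        # Prepend extra_path so the child process can import from it.
        env["PYTHONPATH"] = os.pathsep.join(
            [str(extra_path), env.get("PYTHONPATH", "")]
        ).rstrip(os.pathsep)
    proc = subprocess.run(
        [sys.executable, "-c", code], env=env, capture_output=True, text=True
    )
    return proc.returncode == 0, proc.stderr


# Hypothetical package: empty __init__.py plus an etree submodule.
tmp = Path(tempfile.mkdtemp())
(tmp / "fresh_pkg").mkdir()
(tmp / "fresh_pkg" / "__init__.py").write_text("")
(tmp / "fresh_pkg" / "etree.py").write_text('__version__ = "1.0"\n')

# The parent-only lookup fails in a clean process...
ok_buggy, err = runs_in_fresh_interpreter(
    "import fresh_pkg; print(fresh_pkg.etree.__version__)", extra_path=tmp
)
# ...while importing the submodule directly succeeds.
ok_fixed, _ = runs_in_fresh_interpreter(
    "import fresh_pkg.etree as m; print(m.__version__)", extra_path=tmp
)
print(ok_buggy, ok_fixed)  # False True
```

Against pandas itself, the check would be something like runs_in_fresh_interpreter("import pandas; pandas.show_versions()"); the test actually added for this issue may look different.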

@jreback jreback modified the milestones: Contributions Welcome, 0.24.0 Nov 28, 2018
TomAugspurger pushed a commit that referenced this issue Nov 29, 2018
* BUG: Fix lxml import in show_versions

Fixes #23934
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019
Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this issue Feb 28, 2019

4 participants