lxml missing from requirements_dev.txt #17747

Closed
datapythonista opened this issue Oct 2, 2017 · 6 comments · Fixed by #17748
Labels
Dependencies Required and optional dependencies Testing pandas testing functions or related to the test suite

@datapythonista
Member

Problem description

Tests fail after installing a clean pandas development environment, because the lxml package is missing.

Following the instructions at https://pandas.pydata.org/pandas-docs/stable/contributing.html, I set up the environment and ran the tests with:

conda create -n pandas_dev --file ci/requirements_dev.txt
source activate pandas_dev
python setup.py build_ext --inplace
pytest pandas

And this is the result:

================================================================================================================= FAILURES =================================================================================================================
______________________________________________________________________________________________________ test_importcheck_thread_safety ______________________________________________________________________________________________________
@pytest.mark.slow
def test_importcheck_thread_safety():
    # see gh-16928

    # force import check by reinitalising global vars in html.py
    reload(pandas.io.html)

    filename = os.path.join(DATA_PATH, 'valid_markup.html')
    helper_thread1 = ErrorThread(target=read_html, args=(filename,))
    helper_thread2 = ErrorThread(target=read_html, args=(filename,))

    helper_thread1.start()
    helper_thread2.start()

    while helper_thread1.is_alive() or helper_thread2.is_alive():
        pass
    assert None is helper_thread1.err is helper_thread2.err

E AssertionError: assert None is ImportError('lxml not found, please install it',)
E + where ImportError('lxml not found, please install it',) = <ErrorThread(Thread-1, stopped 123145411235840)>.err

pandas/tests/io/test_html.py:967: AssertionError
============================================================================================================= warnings summary =============================================================================================================
pandas/tests/dtypes/test_missing.py::test_array_equivalent_compat
/Users/marcgarcia/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/numpy/core/numeric.py:2604: FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future.
return bool(asarray(a1 == a2).all())

-- Docs: http://doc.pytest.org/en/latest/warnings.html
======================================================================== 1 failed, 12619 passed, 1940 skipped, 11 xfailed, 1 xpassed, 1 warnings in 733.55 seconds =========================================================================

Expected Output

After installing lxml, all tests pass.

conda install lxml
pytest pandas

============================================================================================================= warnings summary =============================================================================================================
pandas/tests/dtypes/test_missing.py::test_array_equivalent_compat
/Users/marcgarcia/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/numpy/core/numeric.py:2604: FutureWarning: elementwise == comparison failed and returning scalar instead; this will raise an error or perform elementwise comparison in the future.
return bool(asarray(a1 == a2).all())

-- Docs: http://doc.pytest.org/en/latest/warnings.html
============================================================================= 12627 passed, 1933 skipped, 11 xfailed, 1 xpassed, 1 warnings in 724.88 seconds ==============================================================================

Output of pd.show_versions()

pandas.show_versions()

INSTALLED VERSIONS

commit: bf5b089
python: 3.6.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.21.0.dev+558.gbf5b08980.dirty
pytest: 3.2.2
pip: 9.0.1
setuptools: 36.3.0
Cython: 0.27
numpy: 1.13.3
scipy: None
pyarrow: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jreback
Contributor

jreback commented Oct 2, 2017

@datapythonista which tests are actually failing?

==== 12627 passed, 1933 skipped, 11 xfailed, 1 xpassed, 1 warnings in 724.88 seconds

no tests are actually failing, xfail is another class of tests.

@datapythonista
Member Author

It's difficult to see from the output in the problem description, but here are the relevant parts:

E AssertionError: assert None is ImportError('lxml not found, please install it',)
E + where ImportError('lxml not found, please install it',) = <ErrorThread(Thread-1, stopped 123145411235840)>.err

1 failed, 12619 passed, 1940 skipped, 11 xfailed, 1 xpassed, 1 warnings in 733.55 seconds

So, besides the 11 xfailed, there is 1 failed because of the lxml ImportError

Does it make sense?

@TomAugspurger
Contributor

TomAugspurger commented Oct 2, 2017

pandas/tests/io/test_html.py::test_importcheck_thread_safety is the test that has an (unintentional) hard dependency on lxml.
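
To see how the ImportError ends up in the assertion message, here is a minimal sketch. It assumes `ErrorThread` is a `threading.Thread` subclass that stores any exception raised by its target in an `err` attribute, which is what the failure output suggests; `needs_missing_module` is a hypothetical stand-in for `read_html` in an environment without lxml.

```python
import threading


class ErrorThread(threading.Thread):
    """Thread that records any exception raised by its target in `err`."""

    def run(self):
        try:
            super().run()
        except Exception as exc:
            self.err = exc
        else:
            self.err = None


def needs_missing_module():
    # Hypothetical stand-in for read_html when lxml is not installed.
    raise ImportError("lxml not found, please install it")


t1 = ErrorThread(target=needs_missing_module)
t2 = ErrorThread(target=needs_missing_module)
t1.start()
t2.start()
t1.join()
t2.join()

# The chained comparison means (None is t1.err) and (t1.err is t2.err).
# It is False here because t1.err holds an ImportError instead of None.
both_clean = None is t1.err is t2.err
```

Under these assumptions the test can only pass when neither thread raised, so a missing lxml shows up as an AssertionError rather than as a skip.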

@datapythonista
Member Author

Oh, thanks for clarifying @TomAugspurger. Would the fix be to import lxml only when the feature that requires it is used? I can send a PR.

@TomAugspurger
Contributor

We could either pytest.importorskip('lxml') this test or require lxml in the requirements_dev file. I think either is fine.
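
A minimal sketch of the first option, using `pytest.importorskip`. The test name and body are hypothetical and only show where the call goes; this is not the actual change made to the pandas test.

```python
import pytest


def test_uses_optional_dependency():
    # If lxml cannot be imported, pytest raises its "skipped" outcome here,
    # so the test is reported as skipped instead of failed.
    lxml = pytest.importorskip("lxml")
    assert lxml.__name__ == "lxml"
```

Run under pytest in an environment without lxml, a test written this way would be counted among the skipped tests rather than the failures.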

#17544 is talking a bit about what we want out of our requirements files.

@jreback
Contributor

jreback commented Oct 2, 2017

this should fix the test (with a skip), not change the requirements (though I'm not averse to adding lxml to the requirements), so @TomAugspurger's first suggestion.

@jreback jreback added Dependencies Required and optional dependencies Testing pandas testing functions or related to the test suite labels Oct 3, 2017
@jreback jreback added this to the 0.22.0 milestone Oct 31, 2017
@jreback jreback modified the milestones: 0.22.0, 0.21.1 Nov 2, 2017