DOC: Update outdated caveats for Anaconda and HTML parsing (#9032) #14739

Merged 1 commit on Nov 25, 2016
37 changes: 0 additions & 37 deletions doc/source/gotchas.rst
@@ -514,40 +514,6 @@ parse HTML tables in the top-level pandas io function ``read_html``.
text from the URL over the web, i.e., IO (input-output). For very large
tables, this might not be true.

**Issues with using** |Anaconda|_

* `Anaconda`_ ships with `lxml`_ version 3.2.0; the following workaround for
`Anaconda`_ was successfully used to deal with the versioning issues
surrounding `lxml`_ and `BeautifulSoup4`_.

.. note::

Unless you have *both*:

* A strong restriction on the upper bound of the runtime of some code
that incorporates :func:`~pandas.io.html.read_html`
* Complete knowledge that the HTML you will be parsing will be 100%
valid at all times

then you should install `html5lib`_ and things will work swimmingly
without you having to muck around with `conda`. If you want the best of
both worlds then install both `html5lib`_ and `lxml`_. If you do install
`lxml`_ then you need to perform the following commands to ensure that
lxml will work correctly:

.. code-block:: sh

# remove the included version
conda remove lxml

# install the latest version of lxml
pip install 'git+git://github.com/lxml/lxml.git'

# install the latest version of beautifulsoup4
pip install 'bzr+lp:beautifulsoup'

Note that you need `bzr <http://bazaar.canonical.com/en>`__ and `git
<http://git-scm.com>`__ installed to perform the last two operations.

.. |svm| replace:: **strictly valid markup**
.. _svm: http://validator.w3.org/docs/help.html#validation_basics
Contributor

I might just remove this entirely. Things install ok now? (with reasonably current versions)?
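
One quick way to answer that question for a given environment is to check which of the parser packages named in the removed text can be imported. This is a minimal sketch, not part of the PR: the helper name `available_parsers` is hypothetical, and it assumes the usual import names `lxml`, `bs4` (for BeautifulSoup4), and `html5lib`.

```python
from importlib.util import find_spec

# Import names assumed for lxml, BeautifulSoup4, and html5lib.
PARSER_MODULES = ("lxml", "bs4", "html5lib")


def available_parsers(modules=PARSER_MODULES):
    """Map each module name to True if it can be found, else False.

    find_spec only locates the module; it does not import it, so this
    says nothing about whether the installed version is recent enough.
    """
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in available_parsers().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```

If all three report as installed, the `conda remove` / `pip install` workaround in the removed section is presumably unnecessary, though this check alone does not confirm version compatibility.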

@@ -561,9 +527,6 @@ parse HTML tables in the top-level pandas io function ``read_html``.
.. |lxml| replace:: **lxml**
.. _lxml: http://lxml.de

.. |Anaconda| replace:: **Anaconda**
.. _Anaconda: https://store.continuum.io/cshop/anaconda


Byte-Ordering Issues
--------------------