Move HTML libraries gotchas to html io docs

jorisvandenbossche · jorisvandenbossche · commit 53a69707717d · 2017-01-25T11:41:31.000+01:00
diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -308,78 +308,6 @@ where the data copying occurs.
 See `this link <http://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe>`__
 for more information.
 
-.. _gotchas.html:
-
-HTML Table Parsing
-------------------
-There are some versioning issues surrounding the libraries that are used to
-parse HTML tables in the top-level pandas io function ``read_html``.
-
-**Issues with** |lxml|_
-
-   * Benefits
-
-     * |lxml|_ is very fast
-
-     * |lxml|_ requires Cython to install correctly.
-
-   * Drawbacks
-
-     * |lxml|_ does *not* make any guarantees about the results of its parse
-       *unless* it is given |svm|_.
-
-     * In light of the above, we have chosen to allow you, the user, to use the
-       |lxml|_ backend, but **this backend will use** |html5lib|_ if |lxml|_
-       fails to parse
-
-     * It is therefore *highly recommended* that you install both
-       |BeautifulSoup4|_ and |html5lib|_, so that you will still get a valid
-       result (provided everything else is valid) even if |lxml|_ fails.
-
-**Issues with** |BeautifulSoup4|_ **using** |lxml|_ **as a backend**
-
-   * The above issues hold here as well since |BeautifulSoup4|_ is essentially
-     just a wrapper around a parser backend.
-
-**Issues with** |BeautifulSoup4|_ **using** |html5lib|_ **as a backend**
-
-   * Benefits
-
-     * |html5lib|_ is far more lenient than |lxml|_ and consequently deals
-       with *real-life markup* in a much saner way rather than just, e.g.,
-       dropping an element without notifying you.
-
-     * |html5lib|_ *generates valid HTML5 markup from invalid markup
-       automatically*. This is extremely important for parsing HTML tables,
-       since it guarantees a valid document. However, that does NOT mean that
-       it is "correct", since the process of fixing markup does not have a
-       single definition.
-
-     * |html5lib|_ is pure Python and requires no additional build steps beyond
-       its own installation.
-
-   * Drawbacks
-
-     * The biggest drawback to using |html5lib|_ is that it is slow as
-       molasses.  However consider the fact that many tables on the web are not
-       big enough for the parsing algorithm runtime to matter. It is more
-       likely that the bottleneck will be in the process of reading the raw
-       text from the URL over the web, i.e., IO (input-output). For very large
-       tables, this might not be true.
-
-
-.. |svm| replace:: **strictly valid markup**
-.. _svm: http://validator.w3.org/docs/help.html#validation_basics
-
-.. |html5lib| replace:: **html5lib**
-.. _html5lib: https://github.com/html5lib/html5lib-python
-
-.. |BeautifulSoup4| replace:: **BeautifulSoup4**
-.. _BeautifulSoup4: http://www.crummy.com/software/BeautifulSoup
-
-.. |lxml| replace:: **lxml**
-.. _lxml: http://lxml.de
-
 
 Byte-Ordering Issues
 --------------------
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -285,7 +285,7 @@ Optional Dependencies
     okay.)
   * `BeautifulSoup4`_ and `lxml`_
   * `BeautifulSoup4`_ and `html5lib`_ and `lxml`_
-  * Only `lxml`_, although see :ref:`HTML Table Parsing <faq.html>`
+  * Only `lxml`_, although see :ref:`HTML Table Parsing <gotchas.html>`
     for reasons as to why you should probably **not** take this approach.
 
   .. warning::
@@ -294,7 +294,7 @@ Optional Dependencies
        `lxml`_ or `html5lib`_ or both.
        :func:`~pandas.read_html` will **not** work with *only*
        `BeautifulSoup4`_ installed.
-     * You are highly encouraged to read :ref:`HTML Table Parsing gotchas <gotchas.html>`.
+     * You are highly encouraged to read :ref:`HTML Table Parsing gotchas <io.html.gotchas>`.
        It explains issues surrounding the installation and
        usage of the above three libraries.
      * You may need to install an older version of `BeautifulSoup4`_:
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -2043,8 +2043,8 @@ Reading HTML Content
 
 .. warning::
 
-   We **highly encourage** you to read the :ref:`HTML Table Parsing gotchas<gotchas.html>`
-   regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.
+   We **highly encourage** you to read the :ref:`HTML Table Parsing gotchas<io.html.gotchas>`
+   below regarding the issues surrounding the BeautifulSoup4/html5lib/lxml parsers.
 
 .. versionadded:: 0.12.0
 
@@ -2346,6 +2346,83 @@ Not escaped:
    Some browsers may not show a difference in the rendering of the previous two
    HTML tables.
 
+
+.. _io.html.gotchas:
+
+HTML Table Parsing Gotchas
+''''''''''''''''''''''''''
+
+There are some versioning issues surrounding the libraries that are used to
+parse HTML tables in the top-level pandas io function ``read_html``.
+
+**Issues with** |lxml|_
+
+   * Benefits
+
+     * |lxml|_ is very fast
+
+     * |lxml|_ requires Cython to install correctly.
+
+   * Drawbacks
+
+     * |lxml|_ does *not* make any guarantees about the results of its parse
+       *unless* it is given |svm|_.
+
+     * In light of the above, we have chosen to allow you, the user, to use the
+       |lxml|_ backend, but **this backend will use** |html5lib|_ if |lxml|_
+       fails to parse
+
+     * It is therefore *highly recommended* that you install both
+       |BeautifulSoup4|_ and |html5lib|_, so that you will still get a valid
+       result (provided everything else is valid) even if |lxml|_ fails.
+
+**Issues with** |BeautifulSoup4|_ **using** |lxml|_ **as a backend**
+
+   * The above issues hold here as well since |BeautifulSoup4|_ is essentially
+     just a wrapper around a parser backend.
+
+**Issues with** |BeautifulSoup4|_ **using** |html5lib|_ **as a backend**
+
+   * Benefits
+
+     * |html5lib|_ is far more lenient than |lxml|_ and consequently deals
+       with *real-life markup* in a much saner way rather than just, e.g.,
+       dropping an element without notifying you.
+
+     * |html5lib|_ *generates valid HTML5 markup from invalid markup
+       automatically*. This is extremely important for parsing HTML tables,
+       since it guarantees a valid document. However, that does NOT mean that
+       it is "correct", since the process of fixing markup does not have a
+       single definition.
+
+     * |html5lib|_ is pure Python and requires no additional build steps beyond
+       its own installation.
+
+   * Drawbacks
+
+     * The biggest drawback to using |html5lib|_ is that it is slow as
+       molasses.  However consider the fact that many tables on the web are not
+       big enough for the parsing algorithm runtime to matter. It is more
+       likely that the bottleneck will be in the process of reading the raw
+       text from the URL over the web, i.e., IO (input-output). For very large
+       tables, this might not be true.
+
+
+.. |svm| replace:: **strictly valid markup**
+.. _svm: http://validator.w3.org/docs/help.html#validation_basics
+
+.. |html5lib| replace:: **html5lib**
+.. _html5lib: https://github.com/html5lib/html5lib-python
+
+.. |BeautifulSoup4| replace:: **BeautifulSoup4**
+.. _BeautifulSoup4: http://www.crummy.com/software/BeautifulSoup
+
+.. |lxml| replace:: **lxml**
+.. _lxml: http://lxml.de
+
+
+
+
 .. _io.excel:
 
 Excel files