Commit b31c033

ENH: add ability to read html tables directly into DataFrames
use __import__ on 2.6
extra code from previous merges and importlib failure on tests
fix bs4 issues with no match provided
docstring storm!
markup slow tests (bank list data) and add tests for failing parameter values
PTF
ok that is really it for docstring mania
add test for multiple matches
1 parent 99137af commit b31c033

10 files changed: +7173 -8 lines changed

ci/install.sh (+2)

@@ -75,6 +75,8 @@ if ( ! $VENV_FILE_AVAILABLE ); then
     pip install $PIP_ARGS xlrd>=0.9.0
     pip install $PIP_ARGS 'http://downloads.sourceforge.net/project/pytseries/scikits.timeseries/0.91.3/scikits.timeseries-0.91.3.tar.gz?r='
     pip install $PIP_ARGS patsy
+    pip install $PIP_ARGS lxml
+    pip install $PIP_ARGS beautifulsoup4
 
     # fool statsmodels into thinking pandas was already installed
     # so it won't refuse to install itself. We want it in the zipped venv

doc/source/api.rst (+7)

@@ -50,6 +50,13 @@ File IO
    read_csv
    ExcelFile.parse
 
+.. currentmodule:: pandas.io.html
+
+.. autosummary::
+   :toctree: generated/
+
+   read_html
+
 HDFStore: PyTables (HDF5)
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 .. currentmodule:: pandas.io.pytables

doc/source/install.rst (+6)

@@ -99,6 +99,12 @@ Optional Dependencies
   * `openpyxl <http://packages.python.org/openpyxl/>`__, `xlrd/xlwt <http://www.python-excel.org/>`__
      * openpyxl version 1.6.1 or higher
      * Needed for Excel I/O
+  * `lxml <http://lxml.de>`__, or `Beautiful Soup 4 <http://www.crummy.com/software/BeautifulSoup>`__: for reading HTML tables
+     * The differences between lxml and Beautiful Soup 4 are mostly speed (lxml
+       is faster), however sometimes Beautiful Soup returns what you might
+       intuitively expect. Both backends are implemented, so try them both to
+       see which one you like. They should return very similar results.
+     * Note that lxml requires Cython to build successfully
 
 .. note::
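
Since the install notes above describe lxml and Beautiful Soup 4 as interchangeable optional backends, it can help to check which of them is importable before choosing one. The sketch below is illustrative only and is not part of this commit: the helper name `available_backends` is hypothetical, and it assumes the usual import names `lxml` and `bs4` for the two packages.

```python
import importlib.util

# Assumed import names for the two optional parsers named in install.rst:
# lxml is imported as "lxml", Beautiful Soup 4 as "bs4".
BACKENDS = {"lxml": "lxml", "Beautiful Soup 4": "bs4"}


def available_backends():
    """Return the labels of the optional HTML parsers that can be imported."""
    return [
        label
        for label, module in BACKENDS.items()
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is not None
    ]


found = available_backends()
print("Installed HTML parsing backends:", found or "none")
```

An empty list means neither parser is installed, in which case HTML table reading would presumably not be usable until one of them is added.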

pandas/__init__.py (+1)

@@ -33,6 +33,7 @@
                                read_fwf, to_clipboard, ExcelFile,
                                ExcelWriter)
 from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
+from pandas.io.html import read_html
 from pandas.util.testing import debug
 
 from pandas.tools.describe import value_range
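
With `read_html` imported in `pandas/__init__.py`, the function becomes reachable as `pandas.read_html`. The following is a minimal usage sketch rather than code from this commit: the sample HTML and the helper `load_tables` are made up for illustration, and passing a file-like object is an assumption based on current pandas releases, since the function's signature is not shown in this diff.

```python
from io import StringIO

# Hypothetical sample document; the bank names are invented for illustration.
HTML = """
<table>
  <tr><th>bank</th><th>city</th></tr>
  <tr><td>First Example Bank</td><td>Springfield</td></tr>
  <tr><td>Second Example Bank</td><td>Shelbyville</td></tr>
</table>
"""


def load_tables(html_text):
    """Parse every <table> in html_text into a DataFrame.

    Returns None when pandas or an HTML parser backend is not installed.
    """
    try:
        import pandas as pd

        # read_html returns a list with one DataFrame per table found.
        return pd.read_html(StringIO(html_text))
    except ImportError:
        return None


tables = load_tables(HTML)
if tables is not None:
    print(tables[0])
```

The result is a list even when the page holds a single table, so callers index into it to get the DataFrame they want.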
