
Read html tables into DataFrames #3477


Merged
1 commit merged into from
May 3, 2013
2 changes: 2 additions & 0 deletions ci/install.sh
@@ -75,6 +75,8 @@ if ( ! $VENV_FILE_AVAILABLE ); then
pip install $PIP_ARGS 'xlrd>=0.9.0'
pip install $PIP_ARGS 'http://downloads.sourceforge.net/project/pytseries/scikits.timeseries/0.91.3/scikits.timeseries-0.91.3.tar.gz?r='
pip install $PIP_ARGS patsy
pip install $PIP_ARGS lxml
pip install $PIP_ARGS beautifulsoup4

# fool statsmodels into thinking pandas was already installed
# so it won't refuse to install itself. We want it in the zipped venv
7 changes: 7 additions & 0 deletions doc/source/api.rst
@@ -50,6 +50,13 @@ File IO
   read_csv
   ExcelFile.parse

.. currentmodule:: pandas.io.html

.. autosummary::
   :toctree: generated/

   read_html

HDFStore: PyTables (HDF5)
~~~~~~~~~~~~~~~~~~~~~~~~~
.. currentmodule:: pandas.io.pytables
6 changes: 6 additions & 0 deletions doc/source/install.rst
@@ -99,6 +99,12 @@ Optional Dependencies
  * `openpyxl <http://packages.python.org/openpyxl/>`__, `xlrd/xlwt <http://www.python-excel.org/>`__
     * openpyxl version 1.6.1 or higher
     * Needed for Excel I/O
  * `lxml <http://lxml.de>`__ or `Beautiful Soup 4 <http://www.crummy.com/software/BeautifulSoup>`__: needed for reading HTML tables
     * The main difference between lxml and Beautiful Soup 4 is speed (lxml
       is faster); however, Beautiful Soup sometimes returns results closer to
       what you might intuitively expect. Both backends are implemented, so try
       both to see which one you prefer. They should return very similar results.
     * Note that lxml requires Cython to build successfully.

.. note::

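The install note recommends trying both HTML backends. Before doing so, it can help to check which ones are importable in the current environment. The following is a minimal sketch using only the standard library. The helper `available_html_flavors` and the `FLAVOR_MODULES` mapping are hypothetical; `lxml` and `bs4` are the import names of lxml and Beautiful Soup 4, and the `html5lib` entry reflects the assumption that recent pandas releases also need it for the `bs4` flavor.

```python
import importlib.util

# Hypothetical mapping from a read_html "flavor" name to the modules it is
# assumed to need. Recent pandas releases also require html5lib for "bs4".
FLAVOR_MODULES = {
    "lxml": ["lxml"],
    "bs4": ["bs4", "html5lib"],
}


def available_html_flavors(flavor_modules=FLAVOR_MODULES):
    """Return the flavor names whose required modules can all be found."""
    available = []
    for flavor, modules in flavor_modules.items():
        # find_spec returns None when a top-level module is not installed
        if all(importlib.util.find_spec(m) is not None for m in modules):
            available.append(flavor)
    return available


if __name__ == "__main__":
    print("Installed HTML parsing backends:", available_html_flavors() or "none")
```

Any flavor this reports can then be passed to `read_html` through its `flavor` argument to compare the two backends on the same page.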
1 change: 1 addition & 0 deletions pandas/__init__.py
@@ -33,6 +33,7 @@
read_fwf, to_clipboard, ExcelFile,
ExcelWriter)
from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
from pandas.io.html import read_html
from pandas.util.testing import debug

from pandas.tools.describe import value_range
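With `read_html` imported into `pandas/__init__.py`, it is reachable as `pd.read_html`. The following is a minimal usage sketch. It assumes a recent pandas release with lxml (or Beautiful Soup 4 plus html5lib) installed. The HTML string is made-up sample data, and it is wrapped in `io.StringIO` because newer pandas versions deprecate passing literal HTML strings.

```python
from io import StringIO

import pandas as pd

# A small, made-up HTML table used only for illustration.
HTML = """
<table>
  <thead>
    <tr><th>city</th><th>population</th></tr>
  </thead>
  <tbody>
    <tr><td>Springfield</td><td>30720</td></tr>
    <tr><td>Shelbyville</td><td>25000</td></tr>
  </tbody>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
# In recent pandas, flavor=None (the default) tries lxml first and
# falls back to bs4 + html5lib if lxml fails.
tables = pd.read_html(StringIO(HTML))
df = tables[0]
print(df)
```

Because `read_html` always returns a list, callers index into it (here `tables[0]`) even when the page contains a single table.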