fix for read_html with bs4 failing on table with header and one column #12975

hnykda · 2016-04-24T20:17:15Z

closes #9178
The test is added and passing (while failing before the fix).
passes git diff upstream/master | flake8 --diff
whatsnew entry

This is the fix that was proposed in PR #9194, which was closed because it lacked tests. The tests are added now.

jreback · 2016-04-25T12:53:05Z

git diff master | flake8 --diff

hnykda · 2016-04-25T13:54:11Z

Should be OK now.

jreback · 2016-04-25T13:58:21Z

pandas/io/tests/test_html.py

+        """
+        Don't fail with bs4 when there is a header and only one column
+        """
+        data = StringIO('''<html>


add the issue number as a comment

jreback · 2016-04-25T13:59:22Z

small comments. ping when green.

hnykda · 2016-04-25T14:17:22Z

Done.

(I wasn't sure whether I could use :issue:`9178` there, so it's just a regular comment.)
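A minimal sketch of a regression test in this style, with the issue number kept as a plain `# GH 9178` comment. The test name is hypothetical and this is not the test committed in the PR. It assumes BeautifulSoup4 and html5lib are installed, since the bs4 flavor needs both.

```python
from io import StringIO

import pandas as pd

# Same table as in the report: a <thead> with one <th> and a single body cell
HTML = """<html>
  <body>
    <table>
      <thead>
        <tr><th>Header</th></tr>
      </thead>
      <tbody>
        <tr><td>first</td></tr>
      </tbody>
    </table>
  </body>
</html>"""


def test_bs4_header_and_one_column():
    # GH 9178: the bs4 flavor used to raise "TypeError: len() of unsized object"
    result = pd.read_html(StringIO(HTML), flavor="bs4")[0]
    assert result.columns.tolist() == ["Header"]
    assert result["Header"].tolist() == ["first"]
```

Passing `flavor="bs4"` explicitly matters: without it, `read_html` may never reach the bs4 code path.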

jreback · 2016-04-25T14:25:18Z

pandas/io/tests/test_html.py

+        data = StringIO('''<html>
+            <body>
+             <table>
+                <thead>


this doesn't seem to replicate the error message though:

In [1]: data = StringIO('''<html>
   ...: <body>
   ...: <table>
   ...: <thead>
   ...: <tr>
   ...: <th>Header</th>
   ...: </tr>
   ...: </thead>
   ...: <tbody>
   ...: <tr>
   ...: <td>first</td>
   ...: </tr>
   ...: </tbody>
   ...: </table>
   ...: </body>
   ...: </html>''')

In [2]: pd.read_html(data)
Out[2]:
[  Header
0  first]

In [3]: pd.read_html(data)[0]
Out[3]:
  Header
0  first

In [4]: pd.__version__
Out[4]: u'0.18.0'

shouldn't this raise a similar error?

You forgot to add flavor='bs4'. When I do:

In [2]: pandas.__version__
Out[2]: '0.18.0'

In [3]: s = '''<html>
   ...: <body>
   ...: <table>
   ...: <thead>
   ...: <tr>
   ...: <th>Header</th>
   ...: </tr>
   ...: </thead>
   ...: <tbody>
   ...: <tr>
   ...: <td>first</td>
   ...: </tr>
   ...: </tbody>
   ...: </table>
   ...: </body>
   ...: </html>'''

In [4]: pandas.read_html(s, flavor="bs4")
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-5f8f3ea79c02> in <module>()
----> 1 pandas.read_html(s, flavor="bs4")

/home/dan/.local/opt/miniconda3/envs/mathbs/lib/python3.5/site-packages/pandas/io/html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    868     _validate_header_arg(header)
    869     return _parse(flavor, io, match, header, index_col, skiprows,
--> 870                   parse_dates, tupleize_cols, thousands, attrs, encoding)

/home/dan/.local/opt/miniconda3/envs/mathbs/lib/python3.5/site-packages/pandas/io/html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    741                                       parse_dates=parse_dates,
    742                                       tupleize_cols=tupleize_cols,
--> 743                                       thousands=thousands))
    744         except StopIteration:  # empty table
    745             continue

/home/dan/.local/opt/miniconda3/envs/mathbs/lib/python3.5/site-packages/pandas/io/html.py in _data_to_frame(data, header, index_col, skiprows, parse_dates, tupleize_cols, thousands)
    622
    623     # fill out elements of body that are "ragged"
--> 624     _expand_elements(body)
    625
    626     tp = TextParser(body, header=header, index_col=index_col,

/home/dan/.local/opt/miniconda3/envs/mathbs/lib/python3.5/site-packages/pandas/io/html.py in _expand_elements(body)
    599
    600 def _expand_elements(body):
--> 601     lens = Series(lmap(len, body))
    602     lens_max = lens.max()
    603     not_max = lens[lens != lens_max]

/home/dan/.local/opt/miniconda3/envs/mathbs/lib/python3.5/site-packages/pandas/compat/__init__.py in lmap(*args, **kwargs)
    116
    117     def lmap(*args, **kwargs):
--> 118         return list(map(*args, **kwargs))
    119
    120     def lfilter(*args, **kwargs):

TypeError: len() of unsized object

while with the patched version it works:

In [3]: import pandas

In [4]: pandas.__version__
Out[4]: '0.18.0+145.g9b6f9f2'

In [5]: s = '''<html>
   ...: <body>
   ...: <table>
   ...: <thead>
   ...: <tr>
   ...: <th>Header</th>
   ...: </tr>
   ...: </thead>
   ...: <tbody>
   ...: <tr>
   ...: <td>first</td>
   ...: </tr>
   ...: </tbody>
   ...: </table>
   ...: </body>
   ...: </html>'''

In [6]: pandas.read_html(s, flavor="bs4")
Out[6]:
[  Header
0  first]
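The `TypeError: len() of unsized object` at the bottom of the traceback is the error NumPy raises when `len()` is called on a 0-dimensional array. A minimal sketch of how a one-cell header row can end up that way, assuming (the thread does not show the patch) that the bs4 parser squeezed the header row into a NumPy array, with `np.atleast_1d` as one way to keep it one-dimensional. Both helper names are hypothetical, and the actual change in this PR may differ in detail.

```python
import numpy as np


def squeeze_header(cells):
    # Hypothetical helper: squeezing a one-element list yields a 0-d array
    return np.array(cells).squeeze()


def squeeze_header_fixed(cells):
    # Hypothetical fix: atleast_1d turns the 0-d result back into a length-1 array
    return np.atleast_1d(np.array(cells).squeeze())


def row_lengths(body):
    # Mirrors the `lmap(len, body)` call in _expand_elements from the traceback
    return [len(row) for row in body]


broken_body = [squeeze_header(["Header"]), ["first"]]
fixed_body = [squeeze_header_fixed(["Header"]), ["first"]]

try:
    row_lengths(broken_body)
except TypeError as exc:
    print("before:", exc)

print("after:", row_lengths(fixed_body))
```

With more than one header cell, `squeeze()` leaves a 1-d array, which is why only single-column tables hit the error.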

jreback · 2016-04-25T14:48:05Z

@hnykda ahh, I see. We test with multiple flavors, and I think the default is lxml, which I have installed, so it worked for me. OK then, ping when green.
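Why the session without `flavor` did not fail: with the default `flavor=None`, `read_html` tries lxml first and falls back to BeautifulSoup4 + html5lib only if lxml fails, so the bs4 path is not exercised when lxml is installed. A sketch that runs the same input through each flavor explicitly, skipping flavors whose parser libraries cannot be imported. The helper name and the `REQUIRED` module map are assumptions for this example.

```python
import importlib.util
from io import StringIO

import pandas as pd

HTML = (
    "<table><thead><tr><th>Header</th></tr></thead>"
    "<tbody><tr><td>first</td></tr></tbody></table>"
)

# Assumption: the lxml flavor needs lxml; the bs4 flavor needs bs4 and html5lib
REQUIRED = {"lxml": ["lxml"], "bs4": ["bs4", "html5lib"]}


def parse_with_each_flavor(html):
    results = {}
    for flavor, modules in REQUIRED.items():
        if not all(importlib.util.find_spec(m) for m in modules):
            continue  # parser libraries missing, skip this flavor
        results[flavor] = pd.read_html(StringIO(html), flavor=flavor)[0]
    return results


for flavor, frame in parse_with_each_flavor(HTML).items():
    print(flavor, frame.columns.tolist(), frame["Header"].tolist())
```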

hnykda · 2016-04-25T20:08:36Z

Exactly.

Everything is green.

jreback · 2016-04-25T21:57:21Z

thanks @hnykda

fix for read_html with bs4 failing on table with header and one column

d38b11b

hnykda mentioned this pull request Apr 24, 2016

BUG: read_html with a single column table #9178 #9194

Closed

jreback added the Bug and IO HTML labels Apr 25, 2016

whats new entry, flake8 compliance

5bb6635

jreback reviewed Apr 25, 2016

jreback added this to the 0.18.1 milestone Apr 25, 2016

fixes according to comments

9b6f9f2

jreback reviewed Apr 25, 2016

jreback closed this in bec5272 Apr 25, 2016

