Commit bec5272

kotrfa authored and jreback committed
BUG: fix for read_html with bs4 failing on table with header and one column
closes #12975 closes #9178
1 parent 856c3bd commit bec5272

3 files changed, +35 -6 lines changed
doc/source/whatsnew/v0.18.1.txt

+6 -4
@@ -417,6 +417,11 @@ Bug Fixes
 - Bug in ``concat`` doesn't handle empty ``Series`` properly (:issue:`11082`)
 
 
+
+- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue `12723`)
+
+- Bug in ``pd.read_html`` when using bs4 flavor and parsing table with a header and only one column (:issue `9178`)
+
 - Bug in ``pivot_table`` when ``margins=True`` and ``dropna=True`` where nulls still contributed to margin count (:issue:`12577`)
 - Bug in ``pivot_table`` when ``dropna=False`` where table index/column names disappear (:issue:`12133`)
 - Bug in ``crosstab`` when ``margins=True`` and ``dropna=False`` which raised (:issue:`12642`)
@@ -425,7 +430,4 @@ Bug Fixes
 - Bug in ``.describe()`` resets categorical columns information (:issue:`11558`)
 - Bug where ``loffset`` argument was not applied when calling ``resample().count()`` on a timeseries (:issue:`12725`)
 - ``pd.read_excel()`` now accepts path objects (e.g. ``pathlib.Path``, ``py.path.local``) for the file path, in line with other ``read_*`` functions (:issue:`12655`)
-- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names``(:issue:`12870`)
-
-
-- Bug in ``fill_value`` is ignored if the argument to a binary operator is a constant (:issue:`12723`)
+- ``pd.read_excel()`` now accepts column names associated with keyword argument ``names``(:issue `12870`)

pandas/io/html.py

+4 -2
@@ -356,14 +356,16 @@ def _parse_raw_thead(self, table):
         res = []
         if thead:
             res = lmap(self._text_getter, self._parse_th(thead[0]))
-        return np.array(res).squeeze() if res and len(res) == 1 else res
+        return np.atleast_1d(
+            np.array(res).squeeze()) if res and len(res) == 1 else res
 
     def _parse_raw_tfoot(self, table):
         tfoot = self._parse_tfoot(table)
         res = []
         if tfoot:
             res = lmap(self._text_getter, self._parse_td(tfoot[0]))
-        return np.array(res).squeeze() if res and len(res) == 1 else res
+        return np.atleast_1d(
+            np.array(res).squeeze()) if res and len(res) == 1 else res
 
     def _parse_raw_tbody(self, table):
         tbody = self._parse_tbody(table)
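
Note on the change above: a minimal sketch of why wrapping the result in np.atleast_1d matters for a one-column table. The list `res` below is a hypothetical stand-in for the header texts the parser collects; it is not taken from the pandas code path itself, and the exact downstream failure in read_html is not reproduced here.

```python
import numpy as np

# Hypothetical stand-in for the header cell texts collected from a
# table whose <thead> row has a single <th>.
res = ["Header"]

# Pre-fix expression: squeeze() removes the only axis, so the result
# is a 0-d array rather than a 1-element sequence.
squeezed = np.array(res).squeeze()
print(squeezed.ndim, squeezed.shape)

# A 0-d array is unsized, so len() raises TypeError.
try:
    len(squeezed)
except TypeError as exc:
    print("len() failed:", exc)

# Post-fix expression: atleast_1d() restores a single axis, so the
# header behaves like a sequence with one entry again.
fixed = np.atleast_1d(squeezed)
print(fixed.ndim, fixed.shape, fixed.tolist())
```

With more than one header cell the `len(res) == 1` guard is false and the plain list is returned unchanged, so the wrapper only affects the single-column case.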

pandas/io/tests/test_html.py

+25
@@ -416,6 +416,31 @@ def test_empty_tables(self):
         res2 = self.read_html(StringIO(data2))
         assert_framelist_equal(res1, res2)
 
+    def test_header_and_one_column(self):
+        """
+        Don't fail with bs4 when there is a header and only one column
+        as described in issue #9178
+        """
+        data = StringIO('''<html>
+            <body>
+             <table>
+                 <thead>
+                     <tr>
+                         <th>Header</th>
+                     </tr>
+                 </thead>
+                 <tbody>
+                     <tr>
+                         <td>first</td>
+                     </tr>
+                 </tbody>
+             </table>
+            </body>
+        </html>''')
+        expected = DataFrame(data={'Header': 'first'}, index=[0])
+        result = self.read_html(data)[0]
+        tm.assert_frame_equal(result, expected)
+
     def test_tfoot_read(self):
         """
         Make sure that read_html reads tfoot, containing td or th.
