This is essentially a rebased and squashed pandas-dev#17054 (mad props to @jowens
for doing all the hard thinking). My tweaks:
* test_computer_sales_page (see pandas-dev#17074) no longer tests for ParserError,
because the ParserError was a bug caused by missing colspan support.
It now tests that the MultiIndex works as expected.
* I respectfully removed the fill_rowspan argument from pandas-dev#17073. Instead,
the virtual cells created by rowspan/colspan are always copies of the
real cells' text. This prevents _infer_columns() from naming virtual
cells as "Unnamed: ..."
* I removed a small layer of abstraction to respect pandas-dev#20891 (multiple
<tbody> support), which was implemented after @jowens' pull request.
Now _HtmlFrameParser has _parse_thead_trs, _parse_tbody_trs and
_parse_tfoot_trs, each returning a list of <tr>s. That let me remove
_parse_tr, Making All The Tests Pass.
* That caused a snowball effect. lxml does not fix malformed <thead>, as
tested by spam.html. The previous hacky workaround was in
_parse_raw_thead, but the new _parse_thead_trs signature returns nodes
instead of text. The new hacky solution: return the <thead> itself,
pretending it's a <tr>. This works in all the tests. A better solution
is to use html5lib with lxml, but that might belong in a separate pull
request.
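
For readers unfamiliar with the rowspan/colspan handling described above, here is a rough, standalone sketch of the idea: every virtual cell created by a span is filled with a copy of the real cell's text. This is not the code from this commit. The function name expand_spans and the (text, rowspan, colspan) tuple format are made up for illustration, and filling a gap with an empty string is an assumption of the sketch.

```python
def expand_spans(rows):
    """Expand rowspan/colspan into a rectangular-ish grid of text.

    `rows` is a list of rows; each row is a list of hypothetical
    (text, rowspan, colspan) tuples. Virtual cells created by a span
    receive a copy of the real cell's text.
    """
    grid = []
    carried = {}  # column index -> (text, rows still to fill) from rowspans above
    for row in rows:
        out = []
        next_carried = {}
        i = 0  # index of the next real cell in this row
        # Keep going while real cells remain or a carried cell sits further right.
        while i < len(row) or any(c >= len(out) for c in carried):
            col = len(out)
            if col in carried:
                # This column is occupied by a rowspan from an earlier row.
                text, rows_left = carried[col]
                out.append(text)
                if rows_left > 1:
                    next_carried[col] = (text, rows_left - 1)
            elif i < len(row):
                text, rowspan, colspan = row[i]
                i += 1
                for offset in range(colspan):
                    out.append(text)  # copy the text into every spanned column
                    if rowspan > 1:
                        next_carried[col + offset] = (text, rowspan - 1)
            else:
                # Gap between the last real cell and a carried cell (assumption:
                # fill with an empty string).
                out.append("")
        grid.append(out)
        carried = next_carried
    return grid


rows = [
    [("Region", 2, 1), ("Sales", 1, 2)],
    [("Q1", 1, 1), ("Q2", 1, 1)],
    [("North", 1, 1), ("10", 1, 1), ("12", 1, 1)],
]
grid = expand_spans(rows)
for line in grid:
    print(line)
```

Because the two header rows come out fully populated, a column-inference step sees "Region" and "Sales" repeated instead of blank cells, so it has no reason to label anything "Unnamed: ...".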
doc/source/whatsnew/v0.24.0.txt
@@ -16,6 +16,7 @@ Other Enhancements
 - :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether NaN/NaT values should be considered (:issue:`17534`)
 - :func:`to_csv` now supports ``compression`` keyword when a file handle is passed. (:issue:`21227`)
 - :meth:`Index.droplevel` is now implemented also for flat indexes, for compatibility with MultiIndex (:issue:`21115`)
+- :func:`read_html` handles colspan and rowspan arguments and attempts to infer a header if the header is not explicitly specified (:issue:`17054`)