ENH: read_html() handles tables with multiple header rows pandas-dev#13434
closes pandas-dev#13434
Author: Brian <[email protected]>
Author: S. Brian Huey <[email protected]>
Closes pandas-dev#15242 from brianhuey/thead-improvement and squashes the following commits:
fc1c80e [S. Brian Huey] Merge branch 'master' into thead-improvement
b54aa0c [Brian] removed duplicate test case
6ae2860 [Brian] updated docstring and io.rst
41fe8cd [Brian] review changes
873ea58 [Brian] switched from range to lrange
cd70225 [Brian] ENH:read_html() handles tables with multiple header rows pandas-dev#13434
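
To illustrate what "multiple header rows" means here, the following is a minimal standard-library sketch, not the pandas implementation. It collects each `<tr>` inside `<thead>` and pairs the cells column by column, producing the kind of tuple labels a multi-level column header would hold. The names `TheadRows` and `header_tuples` and the sample table are hypothetical, and the sketch assumes every header row has the same number of `<th>` cells with no `colspan` or `rowspan`.

```python
from html.parser import HTMLParser


class TheadRows(HTMLParser):
    """Collect the text of each <th> cell, grouped by <tr>, inside <thead>."""

    def __init__(self):
        super().__init__()
        self.rows = []           # one list of cell texts per header row
        self._in_thead = False
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "thead":
            self._in_thead = True
        elif self._in_thead and tag == "tr":
            self.rows.append([])             # start a new header row
        elif self._in_thead and tag == "th":
            self._in_cell = True
            self.rows[-1].append("")         # start a new header cell

    def handle_endtag(self, tag):
        if tag == "thead":
            self._in_thead = False
        elif tag == "th":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def header_tuples(html):
    """Return one tuple per column, with one entry per header row."""
    parser = TheadRows()
    parser.feed(html)
    return list(zip(*parser.rows))


# Hypothetical sample table with two header rows.
HTML = """
<table>
  <thead>
    <tr><th>size</th><th>size</th><th>price</th></tr>
    <tr><th>min</th><th>max</th><th>usd</th></tr>
  </thead>
  <tbody>
    <tr><td>1</td><td>2</td><td>3.5</td></tr>
  </tbody>
</table>
"""

columns = header_tuples(HTML)
print(columns)  # [('size', 'min'), ('size', 'max'), ('price', 'usd')]
```

Each tuple corresponds to one column, with the outer header first and the inner header second, which is the shape a multi-level column index is built from.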
doc/source/whatsnew/v0.20.0.txt (+7 -6)
@@ -283,7 +283,7 @@ Other Enhancements
 - ``DataFrame`` has gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`).
 - ``DataFrame.groupby()`` has gained a ``.nunique()`` method to count the distinct values for all columns within each group (:issue:`14336`, :issue:`15197`).
 
-- ``pd.read_excel`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
+- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
 - Multiple offset aliases with decimal points are now supported (e.g. '0.5min' is parsed as '30s') (:issue:`8419`)
 - ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
 
@@ -294,8 +294,8 @@ Other Enhancements
 - ``pd.cut`` and ``pd.qcut`` now support datetime64 and timedelta64 dtypes (:issue:`14714`, :issue:`14798`)
 - ``pd.qcut`` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`7751`)
 - ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
-- The ``usecols`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`14154`)
-- The ``skiprows`` argument in ``pd.read_csv`` now accepts a callable function as a value (:issue:`10882`)
+- The ``usecols`` argument in ``pd.read_csv()`` now accepts a callable function as a value (:issue:`14154`)
+- The ``skiprows`` argument in ``pd.read_csv()`` now accepts a callable function as a value (:issue:`10882`)
 - The ``nrows`` and ``chunksize`` arguments in ``pd.read_csv()`` are supported if both are passed (:issue:`6774`, :issue:`15755`)
 - ``pd.DataFrame.plot`` now prints a title above each subplot if ``suplots=True`` and ``title`` is a list of strings (:issue:`14753`)
 - ``pd.Series.interpolate`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
@@ -309,6 +309,7 @@ Other Enhancements
 - ``pandas.tools.hashing`` has gained a ``hash_tuples`` routine, and ``hash_pandas_object`` has gained the ability to hash a ``MultiIndex`` (:issue:`15224`)
 - ``Series/DataFrame.squeeze()`` have gained the ``axis`` parameter. (:issue:`15339`)
 - ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
+- ``pd.read_html()`` will parse multiple header rows, creating a multiindex header. (:issue:`13434`).
 - HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
 
 - ``pd.TimedeltaIndex`` now has a custom datetick formatter specifically designed for nanosecond level precision (:issue:`8711`)
@@ -813,7 +814,7 @@ Other API Changes
 ^^^^^^^^^^^^^^^^^
 
 - ``numexpr`` version is now required to be >= 2.4.6 and it will not be used at all if this requisite is not fulfilled (:issue:`15213`).
-- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)
+- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv()`` and will be removed in the future (:issue:`12665`)
 - ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
 - ``DataFrame.applymap()`` with an empty ``DataFrame`` will return a copy of the empty ``DataFrame`` instead of a ``Series`` (:issue:`8222`)
 - ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
@@ -926,7 +927,7 @@ Bug Fixes
 - Bug in ``pd.to_numeric()`` in which float and unsigned integer elements were being improperly casted (:issue:`14941`, :issue:`15005`)
 - Cleaned up ``PeriodIndex`` constructor, including raising on floats more consistently (:issue:`13277`)
 - Bug in ``pd.read_csv()`` in which the ``dialect`` parameter was not being verified before processing (:issue:`14898`)
-- Bug in ``pd.read_fwf`` where the skiprows parameter was not being respected during column width inference (:issue:`11256`)
+- Bug in ``pd.read_fwf()`` where the skiprows parameter was not being respected during column width inference (:issue:`11256`)
 - Bug in ``pd.read_csv()`` in which missing data was being improperly handled with ``usecols`` (:issue:`6710`)
 - Bug in ``pd.read_csv()`` in which a file containing a row with many columns followed by rows with fewer columns would cause a crash (:issue:`14125`)
 - Added checks in ``pd.read_csv()`` ensuring that values for ``nrows`` and ``chunksize`` are valid (:issue:`15767`)
@@ -1054,4 +1055,4 @@ Bug Fixes
 - Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
-- Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+- Bug in ``pd.read_msgpack()`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)