ENH: pd.read_html argument to extract hrefs along with text from cells (#45973)
* ENH: pd.read_html argument to extract hrefs along with text from cells
* Fix typing error
* Simplify tests
* Fix still incorrect typing
* Summarise whatsnew entry and move detailed explanation into user guide
* More flexible link extraction
* Suggested changes
* extract_hrefs -> extract_links
* Move versionadded to correct place and improve docstring for extract_links (@attack68)
* Test for invalid extract_links value
* Test all extract_link options
* Fix for MultiIndex headers (also fixes tests)
* Test that text surrounding <a> tag is still captured
* Test for multiple <a> tags in cell
* Fix all tests, with both MultiIndex -> Index and np.nan -> None conversions resolved
* Add back EOF newline to test_html.py
* Correct user guide example
* Update pandas/io/html.py
* Update pandas/io/html.py
* Update pandas/io/html.py
* Simplify MultiIndex -> Index conversion
* Move unnecessary fixtures into test body
* Simplify statement
* Fix code checks
Co-authored-by: JHM Darbyshire <[email protected]>
doc/source/whatsnew/v1.5.0.rst (+1)
@@ -289,6 +289,7 @@ Other enhancements
 - Added ``check_like`` argument to :func:`testing.assert_series_equal` (:issue:`47247`)
 - Add support for :meth:`GroupBy.ohlc` for extension array dtypes (:issue:`37493`)
 - Allow reading compressed SAS files with :func:`read_sas` (e.g., ``.sas7bdat.gz`` files)
+- :func:`pandas.read_html` now supports extracting links from table cells (:issue:`13141`)
 - :meth:`DatetimeIndex.astype` now supports casting timezone-naive indexes to ``datetime64[s]``, ``datetime64[ms]``, and ``datetime64[us]``, and timezone-aware indexes to the corresponding ``datetime64[unit, tzname]`` dtypes (:issue:`47579`)
 - :class:`Series` reducers (e.g. ``min``, ``max``, ``sum``, ``mean``) will now successfully operate when the dtype is numeric and ``numeric_only=True`` is provided; previously this would raise a ``NotImplementedError`` (:issue:`47500`)
 - :meth:`RangeIndex.union` now can return a :class:`RangeIndex` instead of a :class:`Int64Index` if the resulting values are equally spaced (:issue:`47557`, :issue:`43885`)
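
For readers unfamiliar with what "extracting links from table cells" means in practice, the following standard-library sketch may help. It is not the code from pandas/io/html.py and does not call pandas at all: `LinkTableParser`, `extract_cells`, and `SAMPLE` are hypothetical names invented for illustration. It assumes each cell should become a `(text, href)` tuple, with `None` as the href when the cell has no `<a>` tag, and it assumes that only the first `<a>` in a cell is used. Text surrounding the `<a>` tag is kept, as the commit message says the tests require.

```python
from html.parser import HTMLParser


class LinkTableParser(HTMLParser):
    """Hypothetical helper: collect each table cell as a (text, href) tuple."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of (text, href) tuples
        self._row = None    # cells of the row currently being parsed
        self._text = None   # text fragments of the cell currently being parsed
        self._href = None   # first href seen inside the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._text, self._href = [], None
        elif tag == "a" and self._text is not None and self._href is None:
            # Assumption: only the first link in a cell is recorded.
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # Keep all text in the cell, including text around the <a> tag.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            self._row.append((text, self._href))
            self._text = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_cells(html):
    """Return the table rows in `html` as lists of (text, href) tuples."""
    parser = LinkTableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


SAMPLE = """
<table>
  <tr><th>Project</th><th>Notes</th></tr>
  <tr><td>See <a href="https://pandas.pydata.org">pandas</a> docs</td><td>no link here</td></tr>
</table>
"""

rows = extract_cells(SAMPLE)
for row in rows:
    print(row)
```

In pandas itself the behaviour is opted into through the `extract_links` argument named in the commit message; the sketch above only mirrors the general idea of pairing cell text with a link target.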