
Commit 90fa87e

Merge pull request #7851 from cpcloud/read-html-date-parsing
BUG: fix greedy date parsing in read_html
2 parents bae392d + be323ae commit 90fa87e

4 files changed, with 1797 additions and 41 deletions

doc/source/v0.15.0.txt (+5)

@@ -106,6 +106,9 @@ API changes
 
 See the the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
 
+- The ``infer_types`` argument to :func:`~pandas.io.html.read_html` now has no
+  effect (:issue:`7762`, :issue:`7032`).
+
 
 .. _whatsnew_0150.cat:
 
@@ -320,6 +323,8 @@ Bug Fixes
 
 
 
+- Bug in ``read_html`` where the ``infer_types`` argument forced coercion of
+  date-likes incorrectly (:issue:`7762`, :issue:`7032`).
 
 
 
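The bug-fix entry refers to the branch in `_data_to_frame` that, whenever `infer_types` was truthy (the effective default, because `read_html` set it to `True` when it was not passed), ran the whole frame through `df.convert_objects(convert_dates='coerce')`. The stdlib-only sketch below imitates that kind of greedy coercion so the failure mode is easy to see. `greedy_coerce`, `post_process` and `DATE_FORMATS` are hypothetical stand-ins, not pandas code, and pandas' real date inference is more elaborate than trying three `strptime` formats.

```python
from datetime import datetime

# Hypothetical formats a greedy coercer might try; an assumption for
# illustration, not the list pandas used internally.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y")


def greedy_coerce(cell):
    """Return a datetime if *any* format matches, else the original string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cell, fmt)
        except ValueError:
            continue
    return cell


def post_process(column, infer_types):
    """Hypothetical mimic of the removed branch in _data_to_frame."""
    if infer_types:
        # Old default path: every cell is offered to the date coercer.
        return [greedy_coerce(c) for c in column]
    # Post-fix behaviour: leave cells exactly as the parser produced them.
    return list(column)


titles_won = ["2013", "2014", "12"]  # integer-like counts, not dates
print(post_process(titles_won, infer_types=True))
print(post_process(titles_won, infer_types=False))
```

Because `%Y` matches any four-digit string, integer-like values such as counts or IDs are silently turned into dates on the coercing path. After this commit `read_html` returns the frame as `TextParser` produced it; columns that really hold dates can still be requested explicitly through the `parse_dates` argument, which `read_html` forwards to `TextParser`.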

pandas/io/html.py (+3 -11)

@@ -607,11 +607,6 @@ def _data_to_frame(data, header, index_col, skiprows, infer_types,
                     parse_dates=parse_dates, tupleize_cols=tupleize_cols,
                     thousands=thousands)
     df = tp.read()
-
-    if infer_types: # TODO: rm this code so infer_types has no effect in 0.14
-        df = df.convert_objects(convert_dates='coerce')
-    else:
-        df = df.applymap(text_type)
     return df
 
 
@@ -757,9 +752,8 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None,
         that sequence. Note that a single element sequence means 'skip the nth
         row' whereas an integer means 'skip n rows'.
 
-    infer_types : bool, optional
-        This option is deprecated in 0.13, an will have no effect in 0.14. It
-        defaults to ``True``.
+    infer_types : None, optional
+        This has no effect since 0.15.0. It is here for backwards compatibility.
 
     attrs : dict or None, optional
         This is a dictionary of attributes that you can pass to use to identify
@@ -838,9 +832,7 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None,
     pandas.io.parsers.read_csv
     """
     if infer_types is not None:
-        warnings.warn("infer_types will have no effect in 0.14", FutureWarning)
-    else:
-        infer_types = True # TODO: remove effect of this in 0.14
+        warnings.warn("infer_types has no effect since 0.15", FutureWarning)
 
     # Type check here. We don't want to parse only to fail because of an
     # invalid value of an integer skiprows.
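After this change `infer_types` stays in the signature only for backwards compatibility: it defaults to `None`, and any other value (including `False`, because the check is `is not None`) triggers a `FutureWarning` and is then ignored. A minimal sketch of that pattern follows. `read_table_like` is a hypothetical function, not part of pandas, and `stacklevel=2` is an addition not present in the diff, used so the warning points at the caller's line rather than at the library.

```python
import warnings


def read_table_like(rows, infer_types=None):
    """Hypothetical reader that keeps a no-op ``infer_types`` keyword.

    Mirrors the pattern in the diff: the argument defaults to ``None``;
    any other value emits a FutureWarning and is otherwise ignored.
    """
    if infer_types is not None:
        warnings.warn("infer_types has no effect since 0.15", FutureWarning,
                      stacklevel=2)
    # The result does not depend on infer_types at all.
    return [list(row) for row in rows]


rows = [("a", "1"), ("b", "2")]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record every warning, even repeats
    plain = read_table_like(rows)
    legacy = read_table_like(rows, infer_types=True)

print(plain == legacy)                   # same result either way
print([str(w.message) for w in caught])  # only the legacy call warned
```

Keeping the keyword with a `None` sentinel lets old call sites such as `read_html(url, infer_types=False)` keep running while still telling their authors the argument is dead.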
