Commit fb6b803

Merge pull request pandas-dev#6447 from cpcloud/read-html-float-iterable-5129
BUG/TST: read_html should follow pandas conventions when creating empty data
2 parents d0e2a9f + 1d573b4 commit fb6b803

4 files changed: +651 -8 lines changed
doc/source/release.rst

+5 -0

@@ -184,6 +184,11 @@ Bug Fixes
 - Bug in ``sum`` of a ``timedelta64[ns]`` series (:issue:`6462`)
 - Bug in ``resample`` with a timezone and certain offsets (:issue:`6397`)
 - Bug in ``iat/iloc`` with duplicate indices on a Series (:issue:`6493`)
+- Bug in ``read_html`` where nan's were incorrectly being used to indicate
+  missing values in text. Should use the empty string for consistency with the
+  rest of pandas (:issue:`5129`).
+- Bug in ``read_html`` tests where redirected invalid URLs would make one test
+  fail (:issue:`6445`).
 
 pandas 0.13.1
 -------------
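
The first release-note entry added in this hunk says text cells should use the empty string rather than NaN for missing values. As a rough illustration of why a float NaN is an awkward placeholder among strings, here is a minimal sketch in plain Python (no pandas involved). The names `cells_nan` and `cells_empty` are hypothetical and do not come from the commit.

```python
# Hypothetical row of text cells taken from a table, with one cell missing.
cells_nan = ["a", "b", float("nan")]   # old behavior: NaN marks the gap
cells_empty = ["a", "b", ""]           # new behavior: '' marks the gap

# With NaN the row mixes str and float, so plain string operations fail.
try:
    joined_nan = ",".join(cells_nan)
except TypeError:
    joined_nan = None

# With '' every cell is a str and the row can be handled uniformly.
joined_empty = ",".join(cells_empty)

# NaN also never compares equal to itself, which complicates equality checks.
nan_equals_itself = cells_nan[2] == cells_nan[2]
```

This only shows the type mismatch at the level of Python lists; how pandas itself treats such columns is not covered by this sketch.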

pandas/io/html.py

+6 -5

@@ -579,8 +579,9 @@ def _expand_elements(body):
     lens_max = lens.max()
     not_max = lens[lens != lens_max]
 
+    empty = ['']
     for ind, length in iteritems(not_max):
-        body[ind] += [np.nan] * (lens_max - length)
+        body[ind] += empty * (lens_max - length)
 
 
 def _data_to_frame(data, header, index_col, skiprows, infer_types,
@@ -760,15 +761,15 @@ def read_html(io, match='.+', flavor=None, header=None, index_col=None,
         the table in the HTML. These are not checked for validity before being
         passed to lxml or Beautiful Soup. However, these attributes must be
         valid HTML table attributes to work correctly. For example, ::
-
+
             attrs = {'id': 'table'}
-
+
         is a valid attribute dictionary because the 'id' HTML tag attribute is
         a valid HTML attribute for *any* HTML tag as per `this document
         <http://www.w3.org/TR/html-markup/global-attributes.html>`__. ::
-
+
             attrs = {'asdf': 'table'}
-
+
         is *not* a valid attribute dictionary because 'asdf' is not a valid
         HTML attribute even if it is a valid XML attribute. Valid HTML 4.01
         table attributes can be found `here
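
The change to `_expand_elements` pads short rows with empty strings instead of NaN so that every row reaches the length of the longest one. A minimal sketch of that padding idea in plain Python follows. It assumes `body` is a list of lists of strings, and the helper name `expand_rows` and its `fill` parameter are hypothetical; the actual function computes row lengths with a pandas Series, which this sketch does not reproduce.

```python
def expand_rows(body, fill=""):
    """Pad every row of `body` in place to the length of the longest row."""
    if not body:
        return body
    longest = max(len(row) for row in body)
    for row in body:
        # Repeating a one-element list yields the missing cells;
        # rows that are already full get an empty extension.
        row += [fill] * (longest - len(row))
    return body


# Ragged rows, as might come from a table whose rows have unequal cell counts.
rows = [["a", "b", "c"], ["d"], ["e", "f"]]
expand_rows(rows)
```

After the call, each row in `rows` has three cells, with the gaps filled by `''`.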
