BUG: to_html misses truncation indicators (...) when index=False #22786

Merged 12 commits on Nov 15, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -1288,6 +1288,7 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
- :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
- Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
- Bug in :func:`to_html()` with ``index=False`` misses truncation indicators (...) on truncated DataFrame (:issue:`15019`, :issue:`22783`)
- Bug in :func:`to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
- Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).
- Bug in :meth:`HDFStore.append` when appending a :class:`DataFrame` with an empty string column and ``min_itemsize`` < 8 (:issue:`12242`)
19 changes: 17 additions & 2 deletions pandas/io/formats/html.py
@@ -305,6 +305,8 @@ def _column_header():
            align = self.fmt.justify

            if truncate_h:
                if self.fmt.index is False:
Member

Instead of testing `is False`, can you rely on implicit truthiness here and use `if not self.fmt.index`? Although it is not documented, I believe index=0 and index=None are accepted here and in other parsers, so it would be ideal to handle them consistently with False.
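
A minimal sketch of the difference the comment above points at. The two helper names are hypothetical and exist only for illustration; they are not part of pandas.

```python
def index_disabled_identity(index):
    # Mirrors `if self.fmt.index is False`: only the False singleton matches.
    return index is False


def index_disabled_truthiness(index):
    # Mirrors `if not self.fmt.index`: every falsy value matches.
    return not index


for value in (False, 0, None, True):
    print(repr(value),
          index_disabled_identity(value),
          index_disabled_truthiness(value))
```

With the identity check, index=0 and index=None would fall through to the index=True branch, which is the inconsistency the reviewer wants to avoid.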

Contributor

Can you update this?

                    row_levels = 0
                ins_col = row_levels + self.fmt.tr_col_num
                col_row.insert(ins_col, '...')
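
As a rough illustration of why row_levels is reset in the hunk above, here is a standalone sketch of the header insertion. `header_cells` and its parameters are hypothetical; the assumption (not confirmed by the diff) is that, when the index is shown, the header row starts with one blank cell per index level, so the '...' marker must be offset by that many cells.

```python
def header_cells(column_names, tr_col_num=None, index=True, row_levels=1):
    """Hypothetical helper: header cells with a '...' marker at the cut.

    tr_col_num is the position of the horizontal cut among the kept
    columns, or None when the columns are not truncated.
    """
    if not index:
        # No index cells precede the column labels.
        row_levels = 0
    cells = [''] * row_levels + list(column_names)
    if tr_col_num is not None:
        cells.insert(row_levels + tr_col_num, '...')
    return cells


print(header_cells(['0', '1', '3', '4'], tr_col_num=2, index=False))
print(header_cells(['0', '1', '3', '4'], tr_col_num=2, index=True))
```

The index=False call produces the same header labels as gh22783_expected_output.html in this PR.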

@@ -342,8 +344,21 @@ def _write_body(self, indent):
            else:
                self._write_regular_rows(fmt_values, indent)
        else:
            for i in range(min(len(self.frame), self.max_rows)):
                row = [fmt_values[j][i] for j in range(len(self.columns))]
            truncate_h = self.fmt.truncate_h
            truncate_v = self.fmt.truncate_v
            ncols = len(self.fmt.tr_frame.columns)
            nrows = len(self.fmt.tr_frame)

            row = []
Contributor

Can you add some comments on what is happening here? I would not object to a self.write_row.

Member Author

simonjayhawkins, Sep 23, 2018

@jreback
The added code is copied and pasted from self._write_regular_rows().

For the index=False case, however, the (row) index value is not added to row. The assignment to dot_col_ix and the value of nindex_levels in the self.write_tr call are also changed to account for this.

Currently the truncation tests in tests\io\formats\test_to_html.py are being skipped. Without test coverage, dot_col_ix and nindex_levels in self._write_regular_rows() can take on any value and the tests still pass.

To avoid a potential regression on the notebook display code path (index=True), I have duplicated the code rather than make any changes to self._write_regular_rows().

Tests could be added but I wanted to avoid going out of scope on this PR.

The existing code for the index=False case was not in a function, unlike the code for the index=True cases. However, it was only 3 lines.

So I agree that a self.write_row function would now be a good idea, unless a future refactoring is likely to use self._write_regular_rows() instead.

How should I proceed?
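
For readers following the thread, the index=False row-building logic discussed here can be sketched in plain Python, independent of pandas. This is only an illustration under stated assumptions: `build_rows` and its arguments are hypothetical names, `columns` stands in for fmt_values, and the separator width is computed explicitly rather than from the previous row as the diff does.

```python
def build_rows(columns, tr_row_num=None, tr_col_num=None):
    """Hypothetical sketch: body rows without an index, with '...' markers.

    columns: formatted values of the already-truncated frame, one list
        per column.
    tr_row_num / tr_col_num: position of the vertical / horizontal cut,
        or None when that axis is not truncated.
    """
    ncols = len(columns)
    nrows = len(columns[0]) if columns else 0
    rows = []
    for i in range(nrows):
        if tr_row_num is not None and i == tr_row_num:
            # Separator row: as wide as a data row, including the '...' column.
            width = ncols + (1 if tr_col_num is not None else 0)
            rows.append(['...'] * width)
        row = [columns[j][i] for j in range(ncols)]
        if tr_col_num is not None:
            # No index cell precedes the values, so the cut position
            # is used directly as the insertion point.
            row.insert(tr_col_num, '...')
        rows.append(row)
    return rows


cols = [['1', '2', '4', '5'], ['a', 'b', 'd', 'e']]
for row in build_rows(cols, tr_row_num=2):
    print(row)
```

Because no index cell is prepended, no offset is needed when inserting the '...' column, which is the difference from the index=True path described in the comment.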

Contributor

> Tests could be added but I wanted to avoid going out of scope on this PR.

Can you do a precursor PR that locks down the tests first?

Contributor

@simonjayhawkins I mean comments in the code.

            for i in range(nrows):
                if truncate_v and i == (self.fmt.tr_row_num):
                    str_sep_row = ['...'] * len(row)
                    self.write_tr(str_sep_row, indent,
                                  self.indent_delta, tags=None)
                row = [fmt_values[j][i] for j in range(ncols)]
                if truncate_h:
                    dot_col_ix = self.fmt.tr_col_num
                    row.insert(dot_col_ix, '...')
                self.write_tr(row, indent, self.indent_delta, tags=None)

        indent -= self.indent_delta
30 changes: 30 additions & 0 deletions pandas/tests/io/formats/data/gh15019_expected_output.html
@@ -0,0 +1,30 @@
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>0</th>
      <th>1</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1.764052</td>
      <td>0.400157</td>
    </tr>
    <tr>
      <td>0.978738</td>
      <td>2.240893</td>
    </tr>
    <tr>
      <td>...</td>
      <td>...</td>
    </tr>
    <tr>
      <td>0.950088</td>
      <td>-0.151357</td>
    </tr>
    <tr>
      <td>-0.103219</td>
      <td>0.410599</td>
    </tr>
  </tbody>
</table>
27 changes: 27 additions & 0 deletions pandas/tests/io/formats/data/gh22783_expected_output.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th>0</th>
      <th>1</th>
      <th>...</th>
      <th>3</th>
      <th>4</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1.764052</td>
      <td>0.400157</td>
      <td>...</td>
      <td>2.240893</td>
      <td>1.867558</td>
    </tr>
    <tr>
      <td>-0.977278</td>
      <td>0.950088</td>
      <td>...</td>
      <td>-0.103219</td>
      <td>0.410599</td>
    </tr>
  </tbody>
</table>
38 changes: 38 additions & 0 deletions pandas/tests/io/formats/test_to_html.py
@@ -22,6 +22,28 @@
pass


def expected_html(datapath, name):
    """
    Read HTML file from formats data directory.

Contributor

might be nice to change some of the existing tests to use this function (future PR though)

Member Author

PR #23747 to change existing tests to use this function.

    Parameters
    ----------
    datapath : pytest fixture
        The datapath fixture injected into a test by pytest.
    name : str
        The name of the HTML file without the suffix.

    Returns
    -------
    str : contents of HTML file.
    """
    filename = '.'.join([name, 'html'])
    filepath = datapath('io', 'formats', 'data', filename)
    with open(filepath) as f:
        html = f.read()
    return html.rstrip()
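
The trailing `rstrip()` in the helper above is presumably there because files on disk usually end with a newline while the string being compared does not. A self-contained sketch of the same pattern, using a temporary directory in place of the datapath fixture; `read_expected` is a hypothetical stand-in, not the helper from this PR.

```python
import os
import tempfile


def read_expected(directory, name):
    # Hypothetical stand-in for expected_html(): takes a plain directory
    # instead of the pytest datapath fixture.
    path = os.path.join(directory, name + '.html')
    with open(path) as f:
        # Drop the trailing newline most editors add at end of file.
        return f.read().rstrip()


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, 'example.html'), 'w') as f:
        f.write('<table>\n</table>\n')
    expected = read_expected(tmp, 'example')

print(repr(expected))
```

Without the `rstrip()`, an exact string comparison would fail on the final newline alone.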


class TestToHTML(object):

    def test_to_html_with_col_space(self):
@@ -1881,6 +1903,22 @@ def test_to_html_multiindex_max_cols(self):
</table>""")
        assert result == expected

    def test_to_html_truncation_index_false_max_rows(self, datapath):
        # GH 15019
        np.random.seed(seed=0)
Contributor

Do we really need to set the seed?

        df = pd.DataFrame(np.random.randn(5, 2))
        result = df.to_html(max_rows=4, index=False)
Contributor

Can you parameterize over index=False and index=0?

        expected = expected_html(datapath, 'gh15019_expected_output')
        assert result == expected

    def test_to_html_truncation_index_false_max_cols(self, datapath):
        # GH 22783
        np.random.seed(seed=0)
        df = pd.DataFrame(np.random.randn(2, 5))
        result = df.to_html(max_cols=4, index=False)
        expected = expected_html(datapath, 'gh22783_expected_output')
        assert result == expected

    def test_to_html_notebook_has_style(self):
        df = pd.DataFrame({"A": [1, 2, 3]})
        result = df.to_html(notebook=True)