BUG: output formatting with to_html(), index=False and/or index_names=False (#22579, #22747) #22655
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -42,6 +42,15 @@ def __init__(self, formatter, classes=None, notebook=False, border=None, | |
self.border = border | ||
self.table_id = table_id | ||
self.render_links = render_links | ||
# see gh-22579 | ||
# Column misalignment also occurs for | ||
# a standard index when the columns index is named. | ||
# Determine if ANY column names need to be displayed | ||
# since if the row index is not displayed a column of | ||
# blank cells need to be included before the DataFrame values. | ||
self.show_col_idx_names = all((self.fmt.has_column_names, | ||
self.fmt.show_index_names, | ||
self.fmt.header)) | ||
|
||
@property | ||
def is_truncated(self): | ||
|
@@ -201,7 +210,22 @@ def write_result(self, buf): | |
|
||
def _write_header(self, indent): | ||
truncate_h = self.fmt.truncate_h | ||
row_levels = self.frame.index.nlevels | ||
# see gh-22579 | ||
# Column Offset Bug with to_html(index=False) with | ||
# MultiIndex Columns and Index. | ||
# Column misalignment also occurs for | ||
# a standard index when the columns index is named. | ||
# If the row index is not displayed a column of | ||
# blank cells need to be included before the DataFrame values. | ||
# However, in this code block only the placement of the truncation | ||
# indicators within the header is affected by this. | ||
# TODO: refactor to class property as row_levels also used in | ||
# _write_regular_rows and _write_hierarchical_rows | ||
if self.fmt.index: | ||
row_levels = self.frame.index.nlevels | ||
else: | ||
row_levels = 1 if self.show_col_idx_names else 0 | ||
What gets shown when this evaluates to True?

`row_levels` is used to determine the number of header tags, the number of blank cells before the column header, etc. It is the column number (of the HTML table) where the data starts. If the (row) index is shown (`if self.fmt.index`), the number of cells in the HTML table before the data is the number of levels in the row index (`row_levels = self.frame.index.nlevels`). If the (row) index is not shown, the number of cells before the data is 1 (blank) if the column names are displayed, and 0 if the column names are not displayed or the column index is not named (`self.show_col_idx_names`).

`row_levels=1` in ... Changing the 1 to a 0 does not fail any tests, so I've added an additional test to gh-22783 to ensure we get a test failure if ...
||
|
||
if not self.fmt.header: | ||
# write nothing | ||
return indent | ||
|
@@ -267,12 +291,26 @@ def _write_header(self, indent): | |
values = (values[:ins_col] + [u('...')] + | ||
values[ins_col:]) | ||
|
||
name = self.columns.names[lnum] | ||
row = [''] * (row_levels - 1) + ['' if name is None else | ||
pprint_thing(name)] | ||
|
||
if row == [""] and self.fmt.index is False: | ||
row = [] | ||
# see gh-22579 | ||
# Column Offset Bug with to_html(index=False) with | ||
# MultiIndex Columns and Index. | ||
# Initially fill row with blank cells before column names. | ||
# TODO: Refactor to remove code duplication with code | ||
# block below for standard columns index. | ||
row = [''] * (row_levels - 1) | ||
Might be worth it to make this a function / method, as it appears to repeat below. Can you add a comment on what this is doing?

The fixes for both the standard-index issues and the MultiIndex issues have purposefully been made to use similar code, to make the subsequent refactor simpler. IMO the refactor should create a ... However, I've not yet done this refactor because I want to get the bugs fixed first and the parametrized tests in place for added assurance. I really wanted to limit the scope of this PR to the bug fixes, since there are also a couple of issues with `header=False` and `sparsify=False` which really need sorting out before refactoring. I will happily add a function, but it would only be temporary. Please let me know your thoughts.

OK, that's fine for now.
||
if self.fmt.index or self.show_col_idx_names: | ||
# see gh-22747 | ||
# If to_html(index_names=False) do not show columns | ||
# index names. | ||
# TODO: Refactor to use _get_column_name_list from | ||
# DataFrameFormatter class and create a | ||
# _get_formatted_column_labels function for code | ||
# parity with DataFrameFormatter class. | ||
if self.fmt.show_index_names: | ||
name = self.columns.names[lnum] | ||
row.append(pprint_thing(name or '')) | ||
else: | ||
row.append('') | ||
|
||
tags = {} | ||
j = len(row) | ||
|
@@ -287,17 +325,27 @@ def _write_header(self, indent): | |
self.write_tr(row, indent, self.indent_delta, tags=tags, | ||
header=True) | ||
else: | ||
if self.fmt.index: | ||
row = [''] * (self.frame.index.nlevels - 1) | ||
row.append(self.columns.name or '') | ||
else: | ||
row = [] | ||
# see gh-22579 | ||
# Column misalignment also occurs for | ||
# a standard index when the columns index is named. | ||
# Initially fill row with blank cells before column names. | ||
# TODO: Refactor to remove code duplication with code block | ||
# above for columns MultiIndex. | ||
row = [''] * (row_levels - 1) | ||
if self.fmt.index or self.show_col_idx_names: | ||
# see gh-22747 | ||
# If to_html(index_names=False) do not show columns | ||
# index names. | ||
# TODO: Refactor to use _get_column_name_list from | ||
# DataFrameFormatter class. | ||
if self.fmt.show_index_names: | ||
row.append(self.columns.name or '') | ||
else: | ||
row.append('') | ||
row.extend(self.columns) | ||
align = self.fmt.justify | ||
|
||
if truncate_h: | ||
if not self.fmt.index: | ||
row_levels = 0 | ||
ins_col = row_levels + self.fmt.tr_col_num | ||
row.insert(ins_col, '...') | ||
|
||
|
@@ -348,7 +396,14 @@ def _write_regular_rows(self, fmt_values, indent): | |
index_values = self.fmt.tr_frame.index.format() | ||
row_levels = 1 | ||
else: | ||
row_levels = 0 | ||
# see gh-22579 | ||
# Column misalignment also occurs for | ||
# a standard index when the columns index is named. | ||
# row_levels is used for the number of <th> cells and | ||
# the placement of the truncation indicators. | ||
# TODO: refactor to class property as row_levels also used in | ||
# _write_header and _write_hierarchical_rows | ||
row_levels = 1 if self.show_col_idx_names else 0 | ||
|
||
row = [] | ||
for i in range(nrows): | ||
|
@@ -361,6 +416,12 @@ def _write_regular_rows(self, fmt_values, indent): | |
row = [] | ||
if self.fmt.index: | ||
row.append(index_values[i]) | ||
# see gh-22579 | ||
# Column misalignment also occurs for | ||
# a standard index when the columns index is named. | ||
# Add blank cell before data cells. | ||
elif self.show_col_idx_names: | ||
row.append('') | ||
row.extend(fmt_values[j][i] for j in range(self.ncols)) | ||
|
||
if truncate_h: | ||
|
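To make the `row_levels` rule from the review discussion above easier to follow, here is a minimal standalone sketch of the decision, not the code in `html.py` itself. The function names `show_col_idx_names` and `leading_cells` and their plain boolean/int parameters are hypothetical stand-ins for the formatter attributes used in the diff (`self.fmt.index`, `self.fmt.has_column_names`, `self.fmt.show_index_names`, `self.fmt.header`, `self.frame.index.nlevels`).

```python
def show_col_idx_names(has_column_names, show_index_names, header):
    """Hypothetical stand-in for the flag set in __init__ in the diff.

    Column index names are shown only if the columns are named, index
    names are requested, and a header is written at all.
    """
    return all((has_column_names, show_index_names, header))


def leading_cells(index, index_nlevels, col_idx_names_shown):
    """Number of cells that precede the data in each row of the table."""
    if index:
        # Row index displayed: one cell per level of the row index.
        return index_nlevels
    # Row index hidden: keep one blank cell only when the column index
    # names are shown, so header rows and body rows stay aligned.
    return 1 if col_idx_names_shown else 0


if __name__ == "__main__":
    for index in (True, False):
        for shown in (True, False):
            print(index, shown, leading_cells(index, 2, shown))
```

With `index=False`, the count no longer depends on the number of row index levels. Only the column index names decide whether a blank cell is written.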
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,76 @@ | ||
<table border="1" class="dataframe"> | ||
<thead> | ||
<tr> | ||
<th colspan="2" halign="left">a</th> | ||
<th colspan="2" halign="left">b</th> | ||
</tr> | ||
<tr> | ||
<th>c</th> | ||
<th>d</th> | ||
<th>c</th> | ||
<th>d</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td>0</td> | ||
<td>10</td> | ||
<td>10</td> | ||
<td>10</td> | ||
</tr> | ||
<tr> | ||
<td>1</td> | ||
<td>11</td> | ||
<td>11</td> | ||
<td>11</td> | ||
</tr> | ||
<tr> | ||
<td>2</td> | ||
<td>12</td> | ||
<td>12</td> | ||
<td>12</td> | ||
</tr> | ||
<tr> | ||
<td>3</td> | ||
<td>13</td> | ||
<td>13</td> | ||
<td>13</td> | ||
</tr> | ||
<tr> | ||
<td>4</td> | ||
<td>14</td> | ||
<td>14</td> | ||
<td>14</td> | ||
</tr> | ||
<tr> | ||
<td>5</td> | ||
<td>15</td> | ||
<td>15</td> | ||
<td>15</td> | ||
</tr> | ||
<tr> | ||
<td>6</td> | ||
<td>16</td> | ||
<td>16</td> | ||
<td>16</td> | ||
</tr> | ||
<tr> | ||
<td>7</td> | ||
<td>17</td> | ||
<td>17</td> | ||
<td>17</td> | ||
</tr> | ||
<tr> | ||
<td>8</td> | ||
<td>18</td> | ||
<td>18</td> | ||
<td>18</td> | ||
</tr> | ||
<tr> | ||
<td>9</td> | ||
<td>19</td> | ||
<td>19</td> | ||
<td>19</td> | ||
</tr> | ||
</tbody> | ||
</table> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,66 @@ | ||
<table border="1" class="dataframe"> | ||
<thead> | ||
<tr> | ||
<th>foo</th> | ||
<th colspan="2" halign="left">a</th> | ||
<th>...</th> | ||
<th colspan="2" halign="left">b</th> | ||
</tr> | ||
<tr> | ||
<th></th> | ||
<th colspan="2" halign="left">c</th> | ||
<th>...</th> | ||
<th colspan="2" halign="left">d</th> | ||
</tr> | ||
<tr> | ||
<th>baz</th> | ||
<th>e</th> | ||
<th>f</th> | ||
<th>...</th> | ||
<th>e</th> | ||
<th>f</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<th></th> | ||
<td>0</td> | ||
<td>1</td> | ||
<td>...</td> | ||
<td>6</td> | ||
<td>7</td> | ||
</tr> | ||
<tr> | ||
<th></th> | ||
<td>8</td> | ||
<td>9</td> | ||
<td>...</td> | ||
<td>14</td> | ||
<td>15</td> | ||
</tr> | ||
<tr> | ||
<th>...</th> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
</tr> | ||
<tr> | ||
<th></th> | ||
<td>48</td> | ||
<td>49</td> | ||
<td>...</td> | ||
<td>54</td> | ||
<td>55</td> | ||
</tr> | ||
<tr> | ||
<th></th> | ||
<td>56</td> | ||
<td>57</td> | ||
<td>...</td> | ||
<td>62</td> | ||
<td>63</td> | ||
</tr> | ||
</tbody> | ||
</table> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,58 @@ | ||
<table border="1" class="dataframe"> | ||
<thead> | ||
<tr> | ||
<th colspan="2" halign="left">a</th> | ||
<th>...</th> | ||
<th colspan="2" halign="left">b</th> | ||
</tr> | ||
<tr> | ||
<th colspan="2" halign="left">c</th> | ||
<th>...</th> | ||
<th colspan="2" halign="left">d</th> | ||
</tr> | ||
<tr> | ||
<th>e</th> | ||
<th>f</th> | ||
<th>...</th> | ||
<th>e</th> | ||
<th>f</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td>0</td> | ||
<td>1</td> | ||
<td>...</td> | ||
<td>6</td> | ||
<td>7</td> | ||
</tr> | ||
<tr> | ||
<td>8</td> | ||
<td>9</td> | ||
<td>...</td> | ||
<td>14</td> | ||
<td>15</td> | ||
</tr> | ||
<tr> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
<td>...</td> | ||
</tr> | ||
<tr> | ||
<td>48</td> | ||
<td>49</td> | ||
<td>...</td> | ||
<td>54</td> | ||
<td>55</td> | ||
</tr> | ||
<tr> | ||
<td>56</td> | ||
<td>57</td> | ||
<td>...</td> | ||
<td>62</td> | ||
<td>63</td> | ||
</tr> | ||
</tbody> | ||
</table> |
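Since the bug is about column misalignment, one possible sanity check for fixtures like the ones above is to confirm that every `<tr>` spans the same number of columns once `colspan` is taken into account. The sketch below uses only the standard library's `html.parser`. The names `RowWidthParser` and `row_widths` are hypothetical and not part of the pandas test suite, and the two HTML strings are abbreviated illustrations rather than the real fixture files.

```python
from html.parser import HTMLParser


class RowWidthParser(HTMLParser):
    """Collect the effective width (sum of colspans) of every <tr>."""

    def __init__(self):
        super().__init__()
        self.widths = []      # one entry per completed row
        self._current = None  # running width of the row being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._current = 0
        elif tag in ("th", "td") and self._current is not None:
            # A cell without a colspan attribute occupies one column.
            self._current += int(dict(attrs).get("colspan", 1))

    def handle_endtag(self, tag):
        if tag == "tr" and self._current is not None:
            self.widths.append(self._current)
            self._current = None


def row_widths(html):
    """Return the column width of each table row in the given HTML."""
    parser = RowWidthParser()
    parser.feed(html)
    parser.close()
    return parser.widths


# Abbreviated header/body pair in the style of the truncated fixture.
aligned = (
    '<table><thead><tr><th>foo</th><th colspan="2" halign="left">a</th>'
    '<th>...</th><th colspan="2" halign="left">b</th></tr></thead>'
    '<tbody><tr><th></th><td>0</td><td>1</td><td>...</td><td>6</td>'
    '<td>7</td></tr></tbody></table>'
)
# Hypothetical misaligned variant: the header lacks its leading cell.
misaligned = aligned.replace('<th>foo</th>', '')

if __name__ == "__main__":
    print(row_widths(aligned))
    print(row_widths(misaligned))
```

A table is aligned when `len(set(row_widths(html))) == 1`. The misaligned variant reports a header row one column narrower than the body row.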
Sorry for the questions, but I'm still trying to wrap my head around the implementation. Based on the comment, why is this `all` here and not `any`? Wouldn't any of these require there to be a cell where a column index name would be placed?

`index=False` with a single-level row index and a multi-level columns index with named columns, but not all named... Note: the missing truncation indicators in the data are now fixed in master.

The misalignment of the column names is due to the logic being applied within the level-generating loop:

pandas/pandas/io/formats/html.py
Lines 270 to 275 in d43ac97

Hence a class-level variable is needed to check if ANY names need to be displayed, in order to determine alignment.

The ALL condition determines whether ANY names should be displayed given the `to_html` parameters, and uses similar logic to `to_string` etc.:

pandas/pandas/io/formats/format.py
Lines 796 to 803 in d43ac97

and the rows in `to_html`:

pandas/pandas/io/formats/html.py
Lines 307 to 309 in d43ac97

There is currently no test that explicitly covers this example, so I think the best way forward is to fully parametrize the truncation tests, in line with the parametrized basic_alignment tests, for added assurance.

I'll make `show_col_idx_names` a class property for clarity and add a note to refactor and 'inherit' from the `DataFrameFormatter` class. 'Inherit' is quoted since the `HTMLFormatter` class does not directly inherit from `DataFrameFormatter`. In the first refactor, just use mock inheritance like:

pandas/pandas/io/formats/html.py
Lines 46 to 48 in d43ac97
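
The point about deciding alignment once, rather than inside the level-generating loop, can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the pandas implementation: `old_leading_cells` mirrors what appear to be the removed lines of `_write_header` in the diff, `new_leading_cells` mirrors the replacement, `str` stands in for `pprint_thing`, and "has column names" is assumed to mean that at least one level name is not `None`. Both function names are hypothetical.

```python
def old_leading_cells(names, index, index_nlevels):
    """Per-level decision: an unnamed level drops its leading cell."""
    rows = []
    for name in names:
        row = [''] * (index_nlevels - 1) + ['' if name is None else str(name)]
        if row == [''] and index is False:
            row = []
        rows.append(row)
    return rows


def new_leading_cells(names, index, index_nlevels,
                      index_names=True, header=True):
    """Class-level decision: all levels get the same number of cells."""
    # Assumption: columns count as "named" if any level name is set.
    has_column_names = any(name is not None for name in names)
    show_col_idx_names = all((has_column_names, index_names, header))
    if index:
        row_levels = index_nlevels
    else:
        row_levels = 1 if show_col_idx_names else 0
    rows = []
    for name in names:
        row = [''] * (row_levels - 1)
        if index or show_col_idx_names:
            row.append(str(name or '') if index_names else '')
        rows.append(row)
    return rows


if __name__ == "__main__":
    names = ['foo', None, 'baz']  # partially named column levels
    print(old_leading_cells(names, index=False, index_nlevels=1))
    print(new_leading_cells(names, index=False, index_nlevels=1))
```

For the partially named levels `['foo', None, 'baz']` with `index=False`, the per-level version yields leading cells of lengths 1, 0 and 1, which shifts the middle header row. The class-level version yields one cell per level, matching the `foo`, blank, `baz` cells in the truncated fixture.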