BUG: Don't ignore na_rep in DataFrame.to_html #36690

Merged
merged 31 commits into from
Oct 24, 2020
Changes from 13 commits
e1614ae
BUG: Don't ignore na_rep in DataFrame.to_html
dsaxton Sep 27, 2020
f56d893
Back to list
dsaxton Sep 27, 2020
247e79b
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Sep 28, 2020
f33e4d0
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Sep 29, 2020
e83b7ed
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Sep 29, 2020
a240b49
Move test and cover
dsaxton Sep 29, 2020
7ee3ef2
Test for to_latex
dsaxton Sep 29, 2020
72a812a
More tests
dsaxton Sep 29, 2020
ca57e06
Maybe
dsaxton Sep 30, 2020
dd25388
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Sep 30, 2020
4a599b2
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Sep 30, 2020
dc6287a
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 2, 2020
faa8e2c
Note
dsaxton Oct 2, 2020
1374cdd
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 13, 2020
c81aa04
Refactor
dsaxton Oct 13, 2020
52f16fc
Move note
dsaxton Oct 13, 2020
8c46ab7
Nothing
dsaxton Oct 13, 2020
6cb161c
Fixup
dsaxton Oct 13, 2020
b166ffc
Remove
dsaxton Oct 13, 2020
199c560
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 14, 2020
5a054d7
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 14, 2020
4423dd7
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 17, 2020
0c49eb0
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 18, 2020
4b53c91
Doc
dsaxton Oct 18, 2020
87c172a
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 20, 2020
265e2a8
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 20, 2020
8f0ca15
Type
dsaxton Oct 20, 2020
7be7c38
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 21, 2020
5a50ad0
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 21, 2020
1af22a5
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 23, 2020
37cc78c
Merge remote-tracking branch 'upstream/master' into na_rep-with-float…
dsaxton Oct 23, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
@@ -337,7 +337,7 @@ Conversion
 Strings
 ^^^^^^^
 - Bug in :meth:`Series.to_string`, :meth:`DataFrame.to_string`, and :meth:`DataFrame.to_latex` adding a leading space when ``index=False`` (:issue:`24980`)
--
+- Bug in :meth:`DataFrame.to_html`, :meth:`DataFrame.to_string`, and :meth:`DataFrame.to_latex` ignoring the ``na_rep`` argument when ``float_format`` was also specified (:issue:`9046`, :issue:`13828`)
 -


9 changes: 8 additions & 1 deletion pandas/io/formats/format.py
@@ -1533,7 +1533,14 @@ def format_values_with(float_format):
     def _format_strings(self) -> List[str]:
         # shortcut
         if self.formatter is not None:
Contributor

actually this entire section you added should be instead on L1440

Contributor

and you will want to do something like

mask = isna(values)
values = np.array(values, dtype="object")
values[mask] = na_rep
imask = (~mask).ravel()
values.flat[imask] = np.array(
    [formatter(val) for val in values.ravel()[imask]]
)

you will want to factor that out into a function and use it above in 2 (or more places)
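
A minimal sketch of what such a factored-out helper could look like, assuming a simplified missing-value check in place of pandas' `isna`. The name `format_with_na_rep` and its signature are hypothetical and are not taken from the pandas code base.

```python
import numpy as np


def format_with_na_rep(values, formatter, na_rep):
    """Format non-missing entries with `formatter`; replace missing ones with `na_rep`.

    Hypothetical helper for illustration only (not a pandas function).
    """
    values = np.asarray(values, dtype="object")
    # Simplified stand-in for pandas' isna: None and float NaN count as missing.
    mask = np.array(
        [v is None or (isinstance(v, float) and v != v) for v in values.ravel()],
        dtype=bool,
    ).reshape(values.shape)
    out = values.copy()
    out[mask] = na_rep
    # Only the non-missing positions are passed through the formatter.
    imask = (~mask).ravel()
    out.flat[imask] = np.array(
        [formatter(val) for val in values.ravel()[imask]], dtype="object"
    )
    return out


formatted = format_with_na_rep([1.2225, None], "{:.2f}".format, "Ted")
print(list(formatted))
```

Keeping the masking in one place would let each call site pass its own formatter and `na_rep` without repeating the logic, which seems to be the point of the suggestion.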

Member Author

I'm not sure I understand, get_result_as_array never actually gets called here. Are you thinking the formatter should already be handling NaN?

Contributor

no my point is to share code; you are doing virtually the same thing, just in another way

Member Author

Ok I see what you mean

-            return [self.formatter(x) for x in self.values]
+            if self.na_rep == "NaN":
+                return [self.formatter(x) for x in self.values]
+            else:
+                na_mask = isna(self.values)
+                return [
+                    self.formatter(x) if not m else self.na_rep
+                    for x, m in zip(self.values, na_mask)
+                ]
 
         return list(self.get_result_as_array())

11 changes: 11 additions & 0 deletions pandas/tests/io/formats/test_format.py
@@ -3432,3 +3432,14 @@ def test_format_remove_leading_space_dataframe(input_array, expected):
    # GH: 24980
    df = pd.DataFrame(input_array).to_string(index=False)
    assert df == expected


@pytest.mark.parametrize("na_rep, string", [("NaN", "nan"), ("Ted", "Ted")])
Member

@simonjayhawkins simonjayhawkins Oct 2, 2020

i'm not sure about using "nan" when na_rep is "NaN", although that is the string output of "{:.2f}".format(np.nan).

Maybe could use lib.no_default for na_rep, raise if float_format also specified and use "NaN" if float_format not specified.

maybe worth considering passing np.nan to func if one is passed to float_format, and if the result is a string, assume func handles missing values, something like

try:
    func_handles_na = isinstance(func(np.nan), str)
except Exception:
    func_handles_na = False

and then use na_rep if func_handles_na is False. but I'm not sure if we gain anything.

The main issue with the custom formatters (applies to the formatters kwarg as well) is that we do not give complete control to the custom formatter. off the top of my head, strings may be trimmed, spaces added, precision changed, truncation applied.

The other issue is that the EAs use the custom formatters. So this is not necessarily an easy issue to fix in isolation.
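
The probing idea in this comment can be sketched in plain Python, assuming that a formatter "handles" missing values if it returns a string for NaN. The names `formatter_handles_na` and `format_values` are hypothetical; this is not how pandas implements formatting.

```python
import math


def formatter_handles_na(func):
    """Return True if `func` turns NaN into a string without raising."""
    try:
        return isinstance(func(float("nan")), str)
    except Exception:
        return False


def format_values(values, func, na_rep="NaN"):
    """Use `na_rep` for missing values only when `func` cannot format NaN itself."""
    handles_na = formatter_handles_na(func)
    out = []
    for v in values:
        missing = v is None or (isinstance(v, float) and math.isnan(v))
        if not missing:
            out.append(func(v))
        elif handles_na:
            # The formatter decides how NaN is rendered; na_rep is not used.
            out.append(func(float("nan")))
        else:
            out.append(na_rep)
    return out


def strict(x):
    """Example formatter that refuses NaN (hypothetical, for illustration)."""
    if math.isnan(x):
        raise ValueError("cannot format NaN")
    return f"{x:.1f}"


print(format_values([1.2225, None], "{:.2f}".format, na_rep="Ted"))
print(format_values([2.5, float("nan")], strict, na_rep="Ted"))
```

Under this scheme a formatter such as `"{:.2f}".format` always wins over `na_rep`, because it renders NaN as a string itself. That trade-off may be why the comment questions whether the approach gains anything.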

def test_to_string_na_rep_and_float_format(na_rep, string):
    # GH 13828
    df = DataFrame([["A", 1.2225], ["A", None]], columns=["Group", "Data"])
    result = df.to_string(na_rep=na_rep, float_format="{:.2f}".format)
    expected = f"""  Group  Data
0     A  1.22
1     A   {string}"""
    assert result == expected
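
The behavior this test pins down can be tried outside the test suite. The snippet below is a small sketch that assumes a pandas version containing this fix (1.2.0 or later, per the whatsnew entry); on older versions `na_rep` would be ignored whenever `float_format` is given.

```python
import pandas as pd

# One float column with a missing value, as in the test above.
df = pd.DataFrame([["A", 1.2225], ["A", None]], columns=["Group", "Data"])

# With the fix, the missing value is rendered as na_rep rather than "nan".
result = df.to_string(na_rep="Ted", float_format="{:.2f}".format)
print(result)
```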
35 changes: 35 additions & 0 deletions pandas/tests/io/formats/test_to_html.py
@@ -820,3 +820,38 @@ def test_html_repr_min_rows(datapath, max_rows, min_rows, expected):
    with option_context("display.max_rows", max_rows, "display.min_rows", min_rows):
        result = df._repr_html_()
    assert result == expected


@pytest.mark.parametrize("na_rep, string", [("NaN", "nan"), ("Ted", "Ted")])
def test_to_html_na_rep_and_float_format(na_rep, string):
    # https://github.com/pandas-dev/pandas/issues/13828
    df = DataFrame(
        [
            ["A", 1.2225],
            ["A", None],
        ],
        columns=["Group", "Data"],
    )
    result = df.to_html(na_rep=na_rep, float_format="{:.2f}".format)
    expected = f"""<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Group</th>
      <th>Data</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>1.22</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>{string}</td>
    </tr>
  </tbody>
</table>"""
    assert result == expected
Member

for html tests, can use expected_html fixture. see test_to_html_justify for usage as template.

21 changes: 21 additions & 0 deletions pandas/tests/io/formats/test_to_latex.py
@@ -1284,3 +1284,24 @@ def test_get_strrow_multindex_multicolumn(self, row_num, expected):
        )

        assert row_string_converter.get_strrow(row_num=row_num) == expected

    @pytest.mark.parametrize("na_rep, string", [("NaN", "nan"), ("Ted", "Ted")])
    def test_to_latex_na_rep_and_float_format(self, na_rep, string):
        df = DataFrame(
            [
                ["A", 1.2225],
                ["A", None],
            ],
            columns=["Group", "Data"],
        )
        result = df.to_latex(na_rep=na_rep, float_format="{:.2f}".format)
        expected = f"""\\begin{{tabular}}{{llr}}
\\toprule
{{}} & Group &  Data \\\\
\\midrule
0 &     A &  1.22 \\\\
1 &     A &   {string} \\\\
\\bottomrule
\\end{{tabular}}
"""
        assert result == expected