BUG: Don't ignore na_rep in DataFrame.to_html #36690

dsaxton · 2020-09-27T21:58:25Z

closes to_html ignores na_rep when float_format set #13828
closes DataFrame.to_latex() should honor na_rep after formatter. #9046
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

simonjayhawkins · 2020-09-29T18:56:53Z

Off the top of my head, I recall a discussion where this may not be a bug. There may be a duplicate issue.

I've not checked the code, but if a user supplies a formatter, they are responsible for the NA values too. Is that what is happening here?

dsaxton · 2020-09-29T19:14:08Z

> Off the top of my head, I recall a discussion where this may not be a bug. There may be a duplicate issue.
>
> I've not checked the code, but if a user supplies a formatter, they are responsible for the NA values too. Is that what is happening here?

There was an old PR similar to this that was almost merged but ended up being closed as stale, so I assume this is considered a real bug.

I can see how it might also be seen as ambiguous: NaN is technically a float, so the user is effectively asking for two different formats for the same values. In that case, though, maybe the docs should explicitly clarify what happens in this situation.

simonjayhawkins · 2020-09-29T19:43:59Z

does this also close #9046? and presumably to_string does the same?

simonjayhawkins · 2020-09-29T20:07:18Z

I've not yet found the issue, but from the docs:

float_format: one-parameter function, optional, default None
Formatter function to apply to columns’ elements if they are floats. The result of this function must be a unicode string.

so the changes here break existing code (this is long-standing behaviour, even if it is considered a bug):

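import numpy as np
import pandas as pd


def my_formatter(x):
    # The user's formatter handles missing values itself.
    if np.isnan(x):
        return "ted"
    else:
        return str(x)


# The second row has no "Data" value, so that cell is missing.
df = pd.DataFrame(
    [
        ["A", 1.2225],
        [
            "A",
        ],
    ],
    columns=["Group", "Data"],
)
df.to_html(float_format=my_formatter)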

1.1.2

this PR

dsaxton · 2020-09-29T20:40:47Z

> does this also close #9046? and presumably to_string does the same?

Yes, tests added for those cases

dsaxton · 2020-09-30T04:03:20Z

> so the changes here break existing code (this is a long standing behaviour even if considered a bug)

Ah, I was forgetting that "NaN" is actually the default and not None. I made a change which fixes this but leaves a different "bug" (albeit a more obscure one: it only ignores the specific string "NaN"). I don't think it's possible to respect both arguments in absolutely all cases:

[ins] In [1]: import numpy as np
         ...: import pandas as pd
         ...:
         ...: def my_formatter(x):
         ...:     if np.isnan(x):
         ...:         return "ted"
         ...:     else:
         ...:         return str(x)
         ...:
         ...:
         ...: df = pd.DataFrame(
         ...:     [
         ...:         ["A", 1.2225],
         ...:         [
         ...:             "A",
         ...:         ],
         ...:     ],
         ...:     columns=["Group", "Data"],
         ...: )
         ...: print(df.to_html(na_rep="NaN", float_format=my_formatter))
         ...:
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Group</th>
      <th>Data</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>0</th>
      <td>A</td>
      <td>1.2225</td>
    </tr>
    <tr>
      <th>1</th>
      <td>A</td>
      <td>ted</td>
    </tr>
  </tbody>
</table>

jreback · 2020-10-01T02:08:05Z

pandas/io/formats/format.py

@@ -1533,7 +1533,14 @@ def format_values_with(float_format):
    def _format_strings(self) -> List[str]:
        # shortcut
        if self.formatter is not None:


actually this entire section you added should be instead on L1440

and you will want to do something like

mask = isna(values)
values = np.array(values, dtype="object")
values[mask] = na_rep
imask = (~mask).ravel()
values.flat[imask] = np.array(
    [formatter(val) for val in values.ravel()[imask]]
)

you will want to factor that out into a function and use it above in 2 (or more places)

dsaxton · Oct 2, 2020

I'm not sure I understand; get_result_as_array never actually gets called here. Are you thinking the formatter should already be handling NaN?

jreback · Oct 2, 2020

No, my point is to share code; you are doing virtually the same thing, just in another way.

dsaxton · Oct 13, 2020

Ok, I see what you mean.
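
The masking idea from this thread can be sketched in plain Python. This is only an illustration under stated assumptions, not the code that was merged: it works on a flat list instead of a NumPy array, and its missing-value check (None or float NaN) is a simplification of pandas' isna. The function name mirrors the helper that appears later in the review, but the body here is hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence


def format_with_na_rep(
    values: Sequence[Optional[float]],
    formatter: Callable[[float], str],
    na_rep: str,
) -> List[str]:
    """Apply `formatter` to non-missing values and use `na_rep` for the rest."""
    out: List[str] = []
    for val in values:
        # Simplified stand-in for pandas' isna: None or a float NaN is missing.
        is_missing = val is None or (isinstance(val, float) and math.isnan(val))
        out.append(na_rep if is_missing else formatter(val))
    return out


result = format_with_na_rep([1.2225, float("nan"), 3.0], "{:.2f}".format, "Ted")
print(result)  # ['1.22', 'Ted', '3.00']
```

Because the formatter is never called on a missing value, the same helper can be shared by every code path that needs to honour both arguments, which is the point of the "share code" request.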

jreback · 2020-10-01T02:10:41Z

> K Can add a release note once the doc is started for 1.1.4

Generally we won't do older bug fixes on a point release, only regressions. The reason is that bug fixes can sometimes introduce a further bug :-> so we want to minimize the risk.

doc/source/whatsnew/v1.2.0.rst

simonjayhawkins · 2020-10-02T10:32:55Z

pandas/tests/io/formats/test_format.py

@@ -3432,3 +3432,14 @@ def test_format_remove_leading_space_dataframe(input_array, expected):
    # GH: 24980
    df = pd.DataFrame(input_array).to_string(index=False)
    assert df == expected
+
+
+@pytest.mark.parametrize("na_rep, string", [("NaN", "nan"), ("Ted", "Ted")])


I'm not sure about using "nan" when na_rep is "NaN", although that is the string output of "{:.2f}".format(np.nan).

Maybe we could use lib.no_default for na_rep, raise if float_format is also specified, and use "NaN" if float_format is not specified.

It may be worth considering passing np.nan to func if one is passed to float_format, and if the result is a string, assuming func handles missing values. Something like

try:
    func_handles_na = isinstance(func(np.nan), str)
except Exception:
    func_handles_na = False

and then use na_rep if func_handles_na is False. But I'm not sure we gain anything.

The main issue with the custom formatters (this applies to the formatters kwarg as well) is that we do not give complete control to the custom formatter. Off the top of my head: strings may be trimmed, spaces added, precision changed, truncation applied.

The other issue is that the EAs use the custom formatters, so this is not necessarily an easy issue to fix in isolation.
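
The probing idea above can be tried out in isolation. The sketch below is a hypothetical, stdlib-only version (handles_na, my_formatter and strict_formatter are illustrative names, and float("nan") stands in for np.nan); it is not part of the PR.

```python
import math
from typing import Callable


def handles_na(func: Callable[[float], object]) -> bool:
    """Return True if `func` returns a string for NaN without raising."""
    try:
        return isinstance(func(float("nan")), str)
    except Exception:
        return False


def my_formatter(x: float) -> str:
    # Deliberately handles missing values itself.
    return "ted" if math.isnan(x) else str(x)


def strict_formatter(x: float) -> str:
    # Refuses to format missing values.
    if math.isnan(x):
        raise ValueError("cannot format a missing value")
    return f"{x:.2f}"


print(handles_na(my_formatter))      # True
print(handles_na(strict_formatter))  # False
print(handles_na("{:.2f}".format))   # True, because the result is the string "nan"
```

The last case illustrates the doubt raised in the comment: a plain format string also returns a string for NaN, so the probe cannot tell a formatter that deliberately handles missing values from one that merely does not fail on them.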

simonjayhawkins · 2020-10-02T10:36:28Z

pandas/tests/io/formats/test_to_html.py

+    </tr>
+  </tbody>
+</table>"""
+    assert result == expected


For HTML tests, you can use the expected_html fixture. See test_to_html_justify for a usage template.

simonjayhawkins · 2020-10-02T11:02:00Z

So, returning to issue #13828: given that "{:.2f}".format(np.nan) is "nan", I think the output is correct and this is not a bug (#36690 (comment)).

But raising if na_rep is specified when float_format is specified may be a better solution.

currently, in this PR

This is inconsistent.
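
The claim about format strings is easy to check with the standard library alone. The snippet below uses float("nan") in place of np.nan and only illustrates Python's own formatting behaviour, not anything pandas-specific.

```python
nan = float("nan")

# A fixed-point format spec renders NaN as the lowercase string "nan".
plain = "{:.2f}".format(nan)

# The uppercase "F" presentation type renders it as "NAN".
upper = "{:.2F}".format(nan)

print(plain)  # nan
print(upper)  # NAN
```

A formatter built from such a format string therefore already returns a string for missing values, which is the basis for reading the original behaviour as correct rather than as a bug.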

dsaxton · 2020-10-02T14:52:44Z

> But raising if na_rep is specified when float_format is specified may be a better solution.

Raising when both are specified makes sense to me (or rather when float_format is specified and na_rep != "NaN", unless the default were also changed), since there is in fact no way to apply this consistently.
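
One way to picture the "raise when both are specified" option is with a sentinel default, so that an explicit na_rep="NaN" can be told apart from an omitted argument. This is a hypothetical sketch: _NO_DEFAULT and resolve_na_rep are illustrative names standing in for the lib.no_default idea mentioned earlier, and nothing here is what the PR finally implemented.

```python
from typing import Callable, Optional

# Hypothetical sentinel: marks "na_rep was not passed at all".
_NO_DEFAULT = object()


def resolve_na_rep(
    na_rep: object = _NO_DEFAULT,
    float_format: Optional[Callable[[float], str]] = None,
) -> Optional[str]:
    """Return the string for missing values, or None if the formatter owns them."""
    if na_rep is _NO_DEFAULT:
        # Nothing requested explicitly: fall back to "NaN" unless a formatter
        # was supplied, in which case the formatter handles missing values.
        return "NaN" if float_format is None else None
    if float_format is not None:
        raise ValueError("na_rep and float_format cannot both be specified")
    return str(na_rep)


print(resolve_na_rep())                  # NaN
print(resolve_na_rep(na_rep="-"))        # -
print(resolve_na_rep(float_format=str))  # None
```

Passing both arguments raises ValueError under this scheme, which avoids the ambiguity at the cost of rejecting calls that previously worked.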

simonjayhawkins · 2020-10-02T15:24:19Z

Despite my previous comments, I think we could only raise if it is not a mixed frame; otherwise it may cause another regression.

I looked at sorting out some of the formatting in the past, and it's very easy to break things. (It needs many more tests with lots of parameterisation to know for sure.)

It may be that we do need to break things, and the case in #36690 (comment) may be one of those.

simonjayhawkins

Thanks @dsaxton lgtm.

Regarding the corner case where a user's custom function handles missing values, which is now broken: any idea how we should communicate that to users?

dsaxton · 2020-10-14T15:11:39Z

> Thanks @dsaxton lgtm.
>
> Regarding the corner case where a user's custom function handles missing values, which is now broken: any idea how we should communicate that to users?

Maybe warrants a note in user_guide/io.rst?

simonjayhawkins · 2020-10-18T11:47:33Z

> Maybe warrants a note in user_guide/io.rst?

How about just a versionchanged tag for float_format, with a one-liner along the lines of "If a function is passed, only non-missing values are passed to the function and na_rep is used for missing values"?

dsaxton · 2020-10-18T18:02:42Z

> How about just a versionchanged tag for float_format, with a one-liner along the lines of "If a function is passed, only non-missing values are passed to the function and na_rep is used for missing values"?

Added a note and versionchanged to the common docstring
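
For users hit by the corner case, the documented rule suggests a simple migration: move the missing-value text out of the formatter and into na_rep. The sketch below uses a hypothetical render function as a stand-in for that rule (missing values get na_rep, everything else goes through float_format); it does not call pandas.

```python
import math
from typing import Callable, List, Sequence


def my_formatter(x: float) -> str:
    # The NaN branch is no longer reached under the documented rule.
    return "ted" if math.isnan(x) else str(x)


def render(
    values: Sequence[float],
    float_format: Callable[[float], str],
    na_rep: str = "NaN",
) -> List[str]:
    # Simplified stand-in: only non-missing values reach float_format.
    return [na_rep if math.isnan(v) else float_format(v) for v in values]


data = [1.2225, float("nan")]
before_migration = render(data, my_formatter)
after_migration = render(data, my_formatter, na_rep="ted")

print(before_migration)  # ['1.2225', 'NaN']
print(after_migration)   # ['1.2225', 'ted']
```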

jreback

minor comment, ping on greenish

jreback · 2020-10-20T23:03:45Z

pandas/io/formats/format.py

@@ -1444,8 +1449,19 @@ def get_result_as_array(self) -> np.ndarray:
        Returns the float values converted into strings using
        the parameters given at initialisation, as a numpy array
        """
+
+        def format_with_na_rep(values, formatter, na_rep):


Can you add types?

dsaxton · Oct 21, 2020

Added some types.

jreback · 2020-10-20T23:04:54Z

pandas/io/formats/format.py

+        def format_with_na_rep(values, formatter, na_rep):
+            mask = isna(values)
+            formatted = np.array(
+                [


Can you use this function more generally (e.g. maybe define it in the module)? This can be a followup as well.

dsaxton · Oct 21, 2020

It seems similar patterns occur elsewhere, although most are at the scalar level.

dsaxton · 2020-10-23T17:03:57Z

@jreback I think this is good to go, CI failure is unrelated

jreback · 2020-10-24T02:49:17Z

thanks @dsaxton

* BUG: Don't ignore na_rep in DataFrame.to_html
* Back to list
* Move test and cover
* Test for to_latex
* More tests
* Maybe
* Note
* Refactor
* Move note
* Nothing
* Fixup
* Remove
* Doc
* Type

dsaxton added 2 commits September 27, 2020 16:51

BUG: Don't ignore na_rep in DataFrame.to_html

e1614ae

Back to list

f56d893

dsaxton added Bug IO HTML read_html, to_html, Styler.apply, Styler.applymap labels Sep 27, 2020

dsaxton added 2 commits September 28, 2020 16:49

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

247e79b

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

f33e4d0

…_format

dsaxton added 4 commits September 29, 2020 15:09

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

e83b7ed

…_format

Move test and cover

a240b49

Test for to_latex

7ee3ef2

More tests

72a812a

dsaxton added 2 commits September 29, 2020 23:02

Maybe

ca57e06

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

dd25388

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

4a599b2

…_format

jreback added this to the 1.2 milestone Oct 1, 2020

jreback requested changes Oct 1, 2020

jreback added IO LaTeX to_latex Output-Formatting __repr__ of pandas objects, to_string labels Oct 1, 2020

dsaxton added 2 commits October 1, 2020 19:11

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

dc6287a

…_format

Note

faa8e2c

simonjayhawkins reviewed Oct 2, 2020

dsaxton added 5 commits October 13, 2020 14:19

Refactor

c81aa04

Move note

52f16fc

Nothing

8c46ab7

Fixup

6cb161c

Remove

b166ffc

simonjayhawkins approved these changes Oct 14, 2020

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

199c560

…_format

dsaxton added 2 commits October 14, 2020 12:23

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

5a054d7

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

4423dd7

…_format

dsaxton added 2 commits October 18, 2020 12:05

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

0c49eb0

…_format

Doc

4b53c91

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

87c172a

…_format

jreback requested changes Oct 20, 2020

dsaxton added 6 commits October 20, 2020 18:31

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

265e2a8

…_format

Type

8f0ca15

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

7be7c38

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

5a50ad0

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

1af22a5

…_format

Merge remote-tracking branch 'upstream/master' into na_rep-with-float…

37cc78c

…_format

jreback approved these changes Oct 24, 2020

jreback merged commit d240cb8 into pandas-dev:master Oct 24, 2020

dsaxton deleted the na_rep-with-float_format branch October 24, 2020 02:51

simonjayhawkins mentioned this pull request Apr 10, 2021

REGR: object column repr not respecting float format #40850

Merged

4 tasks
