Add Series._repr_html_ #27228

mrocklin · 2019-07-04T12:47:50Z

closes Pandas Series should provide to_html method #8829
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback · 2019-07-04T16:45:32Z

@mrocklin we just adopted black :-> so please reformat if you can.

mrocklin · 2019-07-04T19:10:55Z

Sure. Glad to hear that y'all are adopting black. I wasn't sure how to apply black to just a diff, but hopefully this passes.

I'm also happy to track down the errors in CI, but I wanted to make sure that this is something that folks want. My assumption was that folks might have thoughts on how this gets rendered.

WillAyd · 2019-07-05T15:42:39Z

IIRC @simonjayhawkins might have thoughts here

ghost · 2019-07-05T16:12:33Z

+1, this has been a wart for a long time.

Can we have support for pd.set_option("display.show_dimension", True), please?

(@jorisvandenbossche, Series.formatter(role="dimension.html"). Last example, I promise)

mrocklin · 2019-07-05T19:03:12Z

Can we have support for pd.set_option("display.show_dimension", True), please?

Unfortunately I don't know what this means.

Hopefully though it's already handled? All this does is call to_frame().to_html() and then mucks about with the header and footer.
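
For context, `display.show_dimensions` controls whether the "N rows × M columns" line is appended to the DataFrame HTML repr. A minimal sketch, assuming a pandas version where `DataFrame._repr_html_` consults this option, of how the setting carries through the `to_frame()` path described above (the variable names are illustrative only):

```python
import pandas as pd

s = pd.Series([1, 2, 3], name="Series")

# With the option forced on, the dimensions line is emitted even though
# nothing is truncated.
with pd.option_context("display.show_dimensions", True):
    html_with_dims = s.to_frame()._repr_html_()

# With the option off, the dimensions line is never emitted.
with pd.option_context("display.show_dimensions", False):
    html_without_dims = s.to_frame()._repr_html_()

print("3 rows" in html_with_dims, "3 rows" in html_without_dims)
```

Whether a Series repr should report "3 rows × 1 columns" or something Series-specific is a design question this sketch does not answer.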

simonjayhawkins · 2019-07-05T20:42:06Z

IIRC @simonjayhawkins might have thoughts here

firstly to make my position clear. I am -1 on changing the default Series display in the notebook.

That said, adding a Series.to_html() method is most desirable.

I think some pd.option is required to activate HTML output of the Series, and it needs to be independent of the DataFrame option.

As a quick and dirty implementation the current PR could do the job. But unfortunately it could be much more complex.

I'm against postprocessing the HTML as a string. This could potentially lead to future changes and fixes in the DataFrame display breaking Series display tests. This would lead to less atomic bug fixes and, without a good independent set of tests on the Series output, could lead to regressions.

But I think that potentially this postprocessing (if it was deemed an acceptable approach) could be reduced considerably.

consider this series

pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series")
index
4    1
5    2
6    3
Name: Series, dtype: int64

the DataFrame looks like.

pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series").to_frame()

	Series
index
4	1
5	2
6	3

this includes <div> and <style> tags specifically for the jupyter notebook and other formatting is applied in the <thead> section

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Series</th>
    </tr>
    <tr>
      <th>index</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>4</th>
      <td>1</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2</td>
    </tr>
    <tr>
      <th>6</th>
      <td>3</td>
    </tr>
  </tbody>
</table>
</div>

so it's not necessarily as straightforward as removing it.

there are options on DataFrame.to_html() that could be used to remove the undesired column label and index name to reduce postprocessing. (I've excluded the Index name, but that could be included for parity with the current Series repr.)

print(pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series").to_frame().to_html(header=False, index_names=False, notebook=True))

4	1
5	2
6	3

SeriesFormatter already has the logic to print Name: Series, dtype: int64 at the bottom.

this could be added in a <p> tag following the table. That's how it's done for showing dimensions of a truncated DataFrame.

pd.set_option('display.max_rows',5)
pd.DataFrame(np.arange(10))

	0
0	0
1	1
...	...
8	8
9	9

10 rows × 1 columns

so the Series representation could look like:

4	1
5	2
6	3

Name: Series, dtype: int64
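
The layout above can be sketched roughly as follows. This is only an illustration of the idea in this comment, not the PR's code: `series_html_sketch` is a hypothetical helper, and building the footer by hand is an assumption (the comment suggests reusing the logic already in SeriesFormatter).

```python
from html import escape

import pandas as pd


def series_html_sketch(s: pd.Series) -> str:
    """Hypothetical helper: header-less table followed by a <p> footer."""
    # header=False drops the column-label row; index_names=False drops the
    # index-name row; notebook=True adds the <div>/<style> wrapper.
    table = s.to_frame().to_html(header=False, index_names=False, notebook=True)
    # The footer mimics the last line of the plain-text Series repr.
    footer = f"<p>Name: {escape(str(s.name))}, dtype: {escape(str(s.dtype))}</p>"
    return table + "\n" + footer


s = pd.Series([1, 2, 3], index=pd.Index([4, 5, 6], name="index"), name="Series")
html = series_html_sketch(s)
print(html)
```

Appending the footer after the closing `</div>` keeps the sketch free of string surgery on the table itself, which is the postprocessing concern raised earlier in this comment.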

I think also that the general consensus on formatting is that the formatting should be done in a Formatter class and not in pandas/core/series.py. This immediately starts making the problem more complex as you have the relationships between SeriesFormatter, DataFrameFormatter, TableFormatter, HTMLFormatter and NotebookFormatter to contend with.

Some parameters will need to be accepted by Series.to_html() and passed to DataFrame.to_html but not all. max_cols probably doesn't make sense.
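
One way to picture the parameter question is an explicit allow-list of keywords forwarded to `DataFrame.to_html`. The sketch below is purely illustrative: `series_to_html_sketch` and the contents of `_FORWARDED` are assumptions, not a list anyone in this thread agreed on.

```python
import pandas as pd

# Assumption: an illustrative allow-list. max_cols is deliberately absent
# because a Series only ever has one column.
_FORWARDED = {"na_rep", "float_format", "max_rows", "bold_rows", "classes", "border"}


def series_to_html_sketch(s: pd.Series, **kwargs) -> str:
    """Hypothetical wrapper: forward only keywords that make sense for a Series."""
    unknown = set(kwargs) - _FORWARDED
    if unknown:
        raise TypeError(f"unsupported keyword(s) for a Series: {sorted(unknown)}")
    return s.to_frame().to_html(header=False, index_names=False, **kwargs)


s = pd.Series([1.5, None], name="x")
html = series_to_html_sketch(s, na_rep="-")
print(html)
```

Rejecting unknown keywords up front, rather than silently ignoring them, is a design choice of this sketch.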

Another issue is the Styler. With this current implementation, we would need to be able to apply a Styler to the Series and then transfer it to the DataFrame during the to_frame() operation.

jorisvandenbossche · 2019-07-05T21:37:44Z

firstly to make my position clear. I am -1 on changing the default Series display in the notebook.

Can you clarify why you wouldn't change the default repr for the notebook? (We have had an HTML repr for DataFrame for a long time.) The other arguments seem more like (relevant!) discussion points about the implementation, not about whether we should have an HTML repr for Series.

simonjayhawkins · 2019-07-05T22:03:12Z

Can you clarify why you wouldn't change the default repr for the notebook?

see #8829 (comment) : "A Series isn't a Table-like thing, it's a vector like thing" summed it up back in Nov 2014.

jorisvandenbossche · 2019-07-06T02:21:52Z

"A Series isn't a Table-like thing, it's a vector like thing" summed it up back in Nov 2014.

But an html repr, although maybe using the html table construct, doesn't necessarily need to look like a table. You can make it look like a 1D vector ?

There have been several attempts / issues (open issue for this is actually #5563, see linked issues and PRs there), and I have not seen that remark come up anywhere else. I think the discussion mainly has been: we need a good design proposal as it needs to look clearly distinct from a 1-column DataFrame.

simonjayhawkins · 2019-07-06T09:26:54Z

open issue for this is actually #5563

I would suggest then to reduce the scope of this PR to address #8829 only. i.e "Pandas Series should provide to_html method" and cover #5563 in a follow-on.

Alternatively, update this PR to reference #5563 and do a precursor PR for #8829

@mrocklin ?

mrocklin · 2019-07-06T11:44:31Z

To be honest I just set this up in a spare hour of work. It seems like there is still a decision to be made among the Pandas maintainers. I might let you all handle that and then come back if you all settle things. My personal opinion is that Series should have a `_repr_html_` method that is similar (but not exactly the same as) the DataFrame version. That is my only interest in this work. I don't personally care about to_html on its own.

simonjayhawkins · 2019-07-06T11:51:06Z

My personal opinion is that Series should have a _repr_html_ method that is similar (but not exactly the same as) the DataFrame version.

Agreed. I think the issue is only whether it should be the default.

If it is the default, it requires a complete, integrated, maintainable and well tested and documented solution (possibly with additional options to allow disgruntled users to revert to the old behavior) from the get-go.

This makes the review process more rigorous and drawn-out.

mrocklin · 2019-07-06T12:16:19Z

Yup. Fair enough. I probably don't have more than a few hours to devote to this, so I'm probably not the right person to take this on. Happy to close.

jorisvandenbossche · 2019-07-06T13:20:35Z

I would suggest then to reduce the scope of this PR to address #8829 only. i.e "Pandas Series should provide to_html method" and cover #5563 in a follow-on.

Alternatively, I would personally also be fine to focus this on only adding _repr_html_, and punting on a full blown to_html. For the repr, we don't need to support all options, and for to_html, you already have the rather easy workaround of s.to_frame().to_html() if you want to export html.
(not that we don't need to cover several corner cases etc, also for the repr, of course)

possibly with additional options to allow disgruntled users to revert to the old behavior)

We already have pd.options.display.notebook_repr_html with True/False options, I think we can simply expand that to have an option for "frame-only" (the current behaviour)
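
For reference, a small sketch of how the existing boolean option behaves for DataFrame today. The "frame-only" value mentioned above is a proposal in this thread, not an existing setting, so it is not used here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# With the option off, _repr_html_ returns None and the frontend falls back
# to the plain-text repr.
with pd.option_context("display.notebook_repr_html", False):
    html_off = df._repr_html_()

# With the option on (the default), an HTML string is returned.
with pd.option_context("display.notebook_repr_html", True):
    html_on = df._repr_html_()

print(html_off is None, isinstance(html_on, str))
```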

simonjayhawkins · 2019-07-07T11:59:41Z

Alternatively, I would personally also be fine to focus this on only adding _repr_html_, and punting on a full blown to_html. For the repr, we don't need to support all options

Agreed. Simplifies things greatly.

Not adding a new public method reduces and simplifies documentation additions (no need to setup shared docstrings).

not adding an IO method means no need to add file-like buffer support.

so definitely worth keeping #8829 and #5563 separate

mrocklin · 2019-07-07T12:36:15Z

I've removed the to_html implementation and focused on the _repr_html_ method.

mrocklin · 2019-07-24T18:31:17Z

Checking in. Is this approach generally palatable to the Pandas maintainers?

TomAugspurger

+1 to implementing just _repr_html_ now.

On the actual repr itself, I'm concerned about losing the Series dtype in the repr. I played a bit with a <caption>, but that has some poor interactions with the width of the table.

TomAugspurger · 2019-07-29T19:28:48Z

pandas/core/series.py

+    def _repr_html_(self):
+        text = self.to_frame()._repr_html_()
+
+        lines = text.split("\n")


Answering a question I had: Does this fail on data with embedded newlines? No, to_html escapes the newlines, so we're OK.
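
A quick way to check this claim, as a sketch: the `dtype=object` is an assumption made here to pin the formatting path, since other string dtypes may be formatted differently.

```python
import pandas as pd

# One cell containing a real newline character.
s = pd.Series(["first\nsecond"], dtype=object, name="text")
html = s.to_frame().to_html()

# The raw newline should not survive inside the cell; it is rendered as the
# two literal characters backslash and "n".
raw_newline_in_cell = "first\nsecond" in html
print(raw_newline_in_cell)
```

Because the cell text stays on one line, splitting the HTML on "\n" still yields one table row per group of lines.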

TomAugspurger · 2019-07-29T19:34:47Z

pandas/core/series.py

@@ -1611,6 +1611,32 @@ def __repr__(self):

        return result

+    def _repr_html_(self):
+        text = self.to_frame()._repr_html_()


Rather than all the head_start, head_stop calculation, can we instead use self.to_frame().to_html(header=False, notebook=True)?

Played with this briefly, and it seems to work reasonably well.

WillAyd · 2019-08-26T18:14:29Z

pandas/tests/io/formats/test_format.py

@@ -1769,6 +1769,24 @@ def test_repr_html(self, float_frame):

        tm.reset_display_options()

+    def test_series(series):
+        df = DataFrame({"abc": range(1000)})


Is it possible to create a smaller series and make a complete assertion about the content?
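
A sketch of what such a test might look like. Since `Series._repr_html_` from this PR was never merged, the example uses `to_frame().to_html(...)` as a stand-in; `CellCollector` and `test_small_series_html` are hypothetical names.

```python
from html.parser import HTMLParser

import pandas as pd


class CellCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data


def test_small_series_html():
    s = pd.Series([1, 2, 3], index=[4, 5, 6], name="abc")
    # Stand-in for the method under review.
    html = s.to_frame().to_html(header=False, index_names=False)
    parser = CellCollector()
    parser.feed(html)
    # Complete assertion: every index label and value, in order.
    assert parser.rows == [["4", "1"], ["5", "2"], ["6", "3"]]


test_small_series_html()
```

Parsing the cells instead of comparing the full string keeps the test independent of whitespace and attribute changes in the DataFrame HTML output.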

WillAyd · 2019-08-26T18:17:10Z

pandas/core/series.py

@@ -1611,6 +1611,32 @@ def __repr__(self):

        return result

+    def _repr_html_(self):
+        text = self.to_frame()._repr_html_()


Is there a reason why we wouldn't want this to be housed in SeriesFormatter (pandas.io.formats.format) instead? I think if we did, that could consolidate logic in core (maybe move to generic) and leave dispatching to the actual formatter code.

WillAyd · 2019-09-20T14:43:35Z

@mrocklin is this PR still active?

WillAyd · 2019-10-11T21:59:27Z

Closing as stale but certainly ping if you'd like to continue

Add Series._repr_html_

a7596d6

Fixes pandas-dev#8829

jreback added the Output-Formatting label (__repr__ of pandas objects, to_string) Jul 4, 2019

black

1129e37

mrocklin mentioned this pull request Jul 4, 2019

Pandas Series should provide to_html method #8829

Open

simonjayhawkins added the IO HTML label (read_html, to_html, Styler.apply, Styler.applymap) Jul 6, 2019

simonjayhawkins reviewed Jul 6, 2019

Only handle _repr_html_, not to_html

c623e41

cleanup tests

2fbba25

TomAugspurger reviewed Jul 30, 2019

WillAyd requested changes Aug 26, 2019

WillAyd closed this Oct 11, 2019

WillAyd mentioned this pull request Oct 28, 2019

initial HTML rendering for Series #29248

Closed

5 tasks

WillAyd mentioned this pull request Nov 4, 2019

Series repr html only #29383

Closed

5 tasks
