Add Series._repr_html_ #27228

Closed
wants to merge 4 commits into from

Conversation

mrocklin
Contributor

@mrocklin mrocklin commented Jul 4, 2019

[Screenshot: Screen Shot 2019-07-04 at 1 45 05 PM]

@jreback
Contributor

jreback commented Jul 4, 2019

@mrocklin we just adopted black :-> so please reformat if you can.

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Jul 4, 2019
@mrocklin
Contributor Author

mrocklin commented Jul 4, 2019

Sure. Glad to hear that y'all are adopting black. I wasn't sure how to apply black to just a diff, but hopefully this passes.

I'm also happy to track down the errors in CI, but I wanted to make sure that this is something that folks want. My assumption was that folks might have thoughts on how this gets rendered.

@WillAyd
Member

WillAyd commented Jul 5, 2019

IIRC @simonjayhawkins might have thoughts here

@ghost

ghost commented Jul 5, 2019

+1, this has been a wart for a long time.

Can we have support for pd.set_option("display.show_dimensions", True), please?

(@jorisvandenbossche, Series.formatter(role="dimension.html"). Last example, I promise)

@mrocklin
Contributor Author

mrocklin commented Jul 5, 2019

Can we have support for pd.set_option("display.show_dimensions", True), please?

Unfortunately I don't know what this means.

Hopefully though it's already handled? All this does is call to_frame().to_html() and then mucks about with the header and footer.

@simonjayhawkins
Member

IIRC @simonjayhawkins might have thoughts here

Firstly, to make my position clear: I am -1 on changing the default Series display in the notebook.

That said, adding a Series.to_html() method is most desirable.

I think some pd.option is required to activate HTML output of the Series, and it needs to be independent of the DataFrame option.

As a quick and dirty implementation the current PR could do the job. But unfortunately it could be much more complex.

I'm against postprocessing the HTML as a string. Future changes and fixes to the DataFrame display could then break the Series display tests. That would make bug fixes less atomic and, without a good independent set of tests on the Series output, could lead to regressions.

But I think this postprocessing (if it were deemed an acceptable approach) could be reduced considerably.

consider this series

pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series")
index
4    1
5    2
6    3
Name: Series, dtype: int64

the DataFrame looks like.

pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series").to_frame()
       Series
index
4           1
5           2
6           3

The notebook HTML for this frame includes <div> and <style> tags specifically for the Jupyter notebook, and other formatting is applied in the <thead> section:

<div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Series</th>
    </tr>
    <tr>
      <th>index</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>4</th>
      <td>1</td>
    </tr>
    <tr>
      <th>5</th>
      <td>2</td>
    </tr>
    <tr>
      <th>6</th>
      <td>3</td>
    </tr>
  </tbody>
</table>
</div>

so it's not necessarily as straightforward as removing the header.

There are options on DataFrame.to_html() that could be used to remove the undesired column label and index name and so reduce postprocessing. (I've excluded the Index name, but that could be included for parity with the current Series repr.)

print(pd.Series([1,2,3], index=pd.Index([4,5,6], name="index"), name="Series").to_frame().to_html(header=False, index_names=False, notebook=True))
4 1
5 2
6 3

SeriesFormatter already has the logic to print Name: Series, dtype: int64 at the bottom.

This could be added in a <p> tag following the table. That's how it's done for showing the dimensions of a truncated DataFrame:

pd.set_option('display.max_rows',5)
pd.DataFrame(np.arange(10))
     0
0    0
1    1
...  ...
8    8
9    9

10 rows × 1 columns
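For readers unfamiliar with the dimensions footer, here is a minimal sketch of where it comes from. It passes `max_rows` and `show_dimensions` directly to `DataFrame.to_html()` instead of setting global display options; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(10))

# Truncate to 5 rows and ask to_html() to append the dimensions footer.
html_out = df.to_html(max_rows=5, show_dimensions=True)

# The footer is emitted as a <p> element after the closing </table> tag.
footer = [
    line.strip()
    for line in html_out.splitlines()
    if line.strip().startswith("<p>")
]
print(footer)
```

The footer sits outside the table markup, which is why the same slot looks like a natural place for a Series name and dtype line.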

so the Series representation could look like:

4 1
5 2
6 3

Name: Series, dtype: int64
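A rough sketch of the layout described above, assuming the table comes from `to_frame().to_html(header=False, index_names=False, notebook=True)` and the name/dtype line is appended in a `<p>` tag. The helper `series_html_sketch` is hypothetical and is not code from this PR; a real implementation would presumably live in a formatter class.

```python
from html import escape

import pandas as pd


def series_html_sketch(s: pd.Series) -> str:
    """Hypothetical helper: render a Series as a header-less table plus footer."""
    # Reuse the DataFrame renderer, dropping the column label and index name.
    table = s.to_frame().to_html(header=False, index_names=False, notebook=True)
    # Mirror the last line of the text repr, escaped for HTML.
    footer = f"<p>Name: {escape(str(s.name))}, dtype: {escape(str(s.dtype))}</p>"
    return table + "\n" + footer


s = pd.Series(
    [1, 2, 3],
    index=pd.Index([4, 5, 6], name="index"),
    name="Series",
    dtype="int64",
)
out = series_html_sketch(s)
print(out)
```

This still builds the result by string concatenation, so it does not address the objection to postprocessing; it only shows the intended output shape.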

I also think the general consensus is that formatting should be done in a Formatter class and not in pandas/core/series.py. This immediately makes the problem more complex, as you have the relationships between SeriesFormatter, DataFrameFormatter, TableFormatter, HTMLFormatter and NotebookFormatter to contend with.

Some parameters will need to be accepted by Series.to_html() and passed to DataFrame.to_html but not all. max_cols probably doesn't make sense.

Another issue is the Styler. With this current implementation, we would need to be able to apply a Styler to the Series and then transfer it to the DataFrame during the to_frame() operation.

@jorisvandenbossche
Member

firstly to make my position clear. I am -1 on changing the default Series display in the notebook.

Can you clarify why you wouldn't change the default repr for the notebook? (We have had an HTML repr for DataFrame for a long time.) The other arguments seem more like (relevant!) discussion points about the implementation, not about whether we should have an HTML repr for Series.

@simonjayhawkins
Member

Can you clarify why you wouldn't change the default repr for the notebook?

see #8829 (comment) : "A Series isn't a Table-like thing, it's a vector like thing" summed it up back in Nov 2014.

@jorisvandenbossche
Member

"A Series isn't a Table-like thing, it's a vector like thing" summed it up back in Nov 2014.

But an HTML repr, although it may use the HTML table construct, doesn't necessarily need to look like a table. You can make it look like a 1D vector?

There have been several attempts / issues (open issue for this is actually #5563, see linked issues and PRs there), and I have not seen that remark come up anywhere else. I think the discussion mainly has been: we need a good design proposal as it needs to look clearly distinct from a 1-column DataFrame.

@simonjayhawkins
Member

open issue for this is actually #5563

I would suggest then to reduce the scope of this PR to address #8829 only. i.e "Pandas Series should provide to_html method" and cover #5563 in a follow-on.

Alternatively, update this PR to reference #5563 and do a precursor PR for #8829

@mrocklin ?

@simonjayhawkins simonjayhawkins added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Jul 6, 2019
@mrocklin
Contributor Author

mrocklin commented Jul 6, 2019 via email

@simonjayhawkins
Member

My personal opinion is that Series should have a _repr_html_ method that is similar (but not exactly the same as) the DataFrame version.

Agreed. I think the issue is only whether it should be the default.

If it is the default, it requires a complete, integrated, maintainable and well tested and documented solution (possibly with additional options to allow disgruntled users to revert to the old behavior) from the get-go.

This makes the review process more rigorous and drawn-out.

@mrocklin
Contributor Author

mrocklin commented Jul 6, 2019 via email

@jorisvandenbossche
Member

I would suggest then to reduce the scope of this PR to address #8829 only. i.e "Pandas Series should provide to_html method" and cover #5563 in a follow-on.

Alternatively, I would personally also be fine to focus this on only adding _repr_html_, and punting on a full blown to_html. For the repr, we don't need to support all options, and for to_html, you already have the rather easy workaround of s.to_frame().to_html() if you want to export html.
(not that we don't need to cover several corner cases etc, also for the repr, of course)

possibly with additional options to allow disgruntled users to revert to the old behavior)

We already have pd.options.display.notebook_repr_html with True/False options, I think we can simply expand that to have an option for "frame-only" (the current behaviour)
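As an illustration of that idea, the sketch below shows how a Series HTML repr could consult such a setting. `pd.get_option("display.notebook_repr_html")` is a real option, but the `"frame-only"` value is only the proposal from this comment and is not accepted by pandas, so the hypothetical helper takes it as a plain argument.

```python
import pandas as pd


def series_repr_html(s, mode=None):
    """Hypothetical helper; `mode` mimics the proposed option values."""
    if mode is None:
        # Fall back to the existing boolean option.
        mode = pd.get_option("display.notebook_repr_html")
    if mode is True:
        return s.to_frame().to_html(header=False, notebook=True)
    # False or the proposed "frame-only": returning None makes the
    # notebook fall back to the plain-text repr.
    return None


s = pd.Series([1, 2, 3], name="Series")
print(series_repr_html(s, mode="frame-only"))
```

Returning `None` is how `DataFrame._repr_html_` already opts out when the option is turned off, so the same convention would carry over.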

@simonjayhawkins
Member

Alternatively, I would personally also be fine to focus this on only adding _repr_html_, and punting on a full blown to_html. For the repr, we don't need to support all options

Agreed. Simplifies things greatly.

Not adding a new public method reduces and simplifies documentation additions (no need to setup shared docstrings).

not adding an IO method means no need to add file-like buffer support.

so definitely worth keeping #8829 and #5563 separate

@mrocklin
Contributor Author

mrocklin commented Jul 7, 2019

I've removed the to_html implementation and focused on the _repr_html_ method.

@mrocklin
Contributor Author

Checking in. Is this approach generally palatable to the Pandas maintainers?

Contributor

@TomAugspurger TomAugspurger left a comment

+1 to implementing just _repr_html_ now.

On the actual repr itself, I'm concerned about losing the Series dtype in the repr. I played a bit with a <caption>, but that has some poor interactions with the width of the table.

def _repr_html_(self):
    text = self.to_frame()._repr_html_()

    lines = text.split("\n")
Contributor

Answering a question I had: Does this fail on data with embedded newlines? No, to_html escapes the newlines, so we're OK.
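A quick way to check this claim, sketched here with an object-dtype Series (the dtype is pinned explicitly as an assumption, since other string dtypes may take a different formatting path):

```python
import pandas as pd

# A value containing a real newline character.
s = pd.Series(["x\ny"], dtype=object, name="text")
html_text = s.to_frame().to_html()

# If the newline were emitted raw, splitting the HTML on "\n"
# would cut this cell in two.
has_raw_break = "x\ny" in html_text
print(has_raw_break)
```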

@@ -1611,6 +1611,32 @@ def __repr__(self):

        return result

    def _repr_html_(self):
        text = self.to_frame()._repr_html_()
Contributor

Rather than all the head_start, head_stop calculation, can we instead use self.to_frame().to_html(header=False, notebook=True)?

Contributor

Played with this briefly, and it seems to work reasonably well.

Member

Is there a reason why we wouldn't want this to be housed in SeriesFormatter (pandas.io.formats.format) instead? I think if we did that, we could consolidate logic in core (maybe move to generic) and leave dispatching to the actual formatter code.

@@ -1769,6 +1769,24 @@ def test_repr_html(self, float_frame):

tm.reset_display_options()

def test_series(series):
    df = DataFrame({"abc": range(1000)})
Member

Is it possible to create a smaller series and make a complete assertion about the content?
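One possible shape for such a test, as a sketch only: it renders a three-element Series through `to_frame().to_html(header=False)` (because `Series._repr_html_` is what this PR proposes, not an existing method) and checks every body cell. `BodyCells` and `test_small_series_html` are hypothetical names.

```python
from html.parser import HTMLParser

import pandas as pd


class BodyCells(HTMLParser):
    """Collect the text of every <th>/<td> cell inside <tbody>."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag in ("th", "td") and self.in_tbody:
            self.in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag in ("th", "td"):
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.cells[-1] += data.strip()


def test_small_series_html():
    s = pd.Series([1, 2, 3], index=[4, 5, 6], name="abc")
    parser = BodyCells()
    parser.feed(s.to_frame().to_html(header=False))
    # Each row contributes its index label followed by its value.
    assert parser.cells == ["4", "1", "5", "2", "6", "3"]
```

Parsing the cells instead of comparing the full HTML string keeps the assertion complete about content while staying less sensitive to whitespace or styling changes in the DataFrame renderer.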

@WillAyd
Member

WillAyd commented Sep 20, 2019

@mrocklin is this PR still active?

@WillAyd
Member

WillAyd commented Oct 11, 2019

Closing as stale but certainly ping if you'd like to continue

@WillAyd WillAyd closed this Oct 11, 2019
@WillAyd WillAyd mentioned this pull request Oct 28, 2019
5 tasks
@WillAyd WillAyd mentioned this pull request Nov 4, 2019
5 tasks
Successfully merging this pull request may close these issues.

Pandas Series should provide to_html method
6 participants