Add _repr_html method to make DataFrames nice in IPython notebook. #772

Merged
merged 1 commit into from
Feb 13, 2012

Conversation

ellisonbg
Contributor

This makes DataFrames show up as nice HTML tables in the IPython notebook. This initial implementation is very plain, but it is better than the plaintext output. We need to think more about how we want to handle these things, but this is a start.
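
For readers unfamiliar with the hook: the IPython notebook looks for a `_repr_html_` method on an object and renders the returned string as HTML. The sketch below shows the idea with a hypothetical `TinyFrame` stand-in; it is not the pandas code from this PR, and the class and its `to_html` are assumptions made for illustration.

```python
from html import escape


class TinyFrame:
    """Hypothetical stand-in for a DataFrame; not the pandas implementation."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [list(r) for r in rows]

    def to_html(self):
        # Build a plain, unstyled HTML table, escaping every cell.
        head = "".join(f"<th>{escape(str(c))}</th>" for c in self.columns)
        body = "".join(
            "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
            for row in self.rows
        )
        return f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"

    def _repr_html_(self):
        # IPython calls this hook when it wants a rich HTML representation.
        return self.to_html()


frame = TinyFrame(["a", "b"], [[1, 2], [3, 4]])
html = frame._repr_html_()
```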

@takluyver
Contributor

Good idea, I'd been thinking about this recently.

If the dataframe is longer than some limit, the standard __repr__ will switch to a brief form just showing info about each column, rather than showing a massive table. I don't think to_html() currently does that, but it should probably be implemented for this.

@ellisonbg
Contributor Author

I think it would be nice to put the table in a scrollable div of fixed size. I haven't looked at to_html yet, but do you think that approach makes sense?

@takluyver
Contributor

That mitigates the problem, but just testing, I can quickly make a DataFrame with 10M rows. That HTML will probably take some time to generate and take up a lot of memory (first on the server, then in the browser). So I think there needs to be some cut-off.

Another option would be to do a head view - so show the first, say, 50 rows, and have something at the bottom that indicates there's more.
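
A minimal sketch of such a head view, assuming rows arrive as a plain list of tuples. The function name `head_html` and the footer wording are made up for illustration; the limit of 50 is the figure suggested above.

```python
from html import escape


def head_html(rows, limit=50):
    """Render at most `limit` rows and note how many were left out."""
    shown = rows[:limit]
    parts = ["<table>"]
    for row in shown:
        cells = "".join(f"<td>{escape(str(v))}</td>" for v in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table>")
    hidden = len(rows) - len(shown)
    if hidden > 0:
        # Footer telling the reader the table was cut short.
        parts.append(f"<p>... {hidden} more rows</p>")
    return "\n".join(parts)


rows = [(i, i * i) for i in range(120)]
snippet = head_html(rows, limit=50)
```

Only the first `limit` rows are ever turned into markup, so the cost of generating the HTML no longer grows with the size of the frame.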

@lodagro
Contributor

lodagro commented Feb 10, 2012

to_html() does not have the __repr__() cleverness to switch between a brief form and a full dump. I don't think it is really needed; there is plenty one can do in the HTML world to display tables in whatever form. FYI, I'm using to_html() combined with Mako.

Now for the IPython notebook, it sounds like a good idea to do something extra on top of to_html() (for example, a scrollable div) to handle large DataFrames. This can be done without changing to_html() itself.

@ellisonbg
Contributor Author

A scrollable div could definitely be done on top of to_html. But I
worry that large objects will still be problematic. A 1M row frame
would be sending a massive chunk of HTML to the browser.

Brian E. Granger
Cal Poly State University, San Luis Obispo

@lodagro lodagro merged commit 4103ce9 into pandas-dev:master Feb 13, 2012
@lodagro
Contributor

lodagro commented Feb 13, 2012

I pulled your code, added a scrollable div and a fallback to the info representation for large DataFrames (b570153). I also added a small unit test.
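
A rough sketch of the approach described here, not the code from b570153: wrap the table HTML in a fixed-height div with `overflow: auto`, and switch to an escaped text summary inside `<pre>` once the table exceeds some size. The function name, the row threshold and the pixel height are illustrative assumptions.

```python
from html import escape


def notebook_html(table_html, summary_text, n_rows, n_cols,
                  max_rows=200, max_cols=20, height_px=300):
    """Return scrollable table HTML, or a <pre> summary for large tables."""
    if n_rows > max_rows or n_cols > max_cols:
        # Too big to ship to the browser: send the short text summary instead.
        return "<pre>\n" + escape(summary_text) + "\n</pre>"
    style = f"max-height: {height_px}px; overflow: auto;"
    return f'<div style="{style}">\n{table_html}\n</div>'


small = notebook_html("<table></table>", "2 rows x 2 columns", 2, 2)
large = notebook_html("<table></table>", "1000000 rows x 3 columns", 1000000, 3)
```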

@takluyver
Contributor

I see the mechanics of the fallback for large dataframes returns text in a <pre> tag. I think there's a neater way of falling back to a text repr, though I forget whether it's to return None or raise an error. Brian will know, I'm sure.

@wesm
Member

wesm commented Feb 13, 2012

@lodagro the summary repr from _repr_html_ is missing the class header:

<class 'pandas.core.frame.DataFrame'>

@wesm
Member

wesm commented Feb 13, 2012

Also maybe it should use print_config.max_columns instead of 20?

@lodagro
Contributor

lodagro commented Feb 13, 2012

The reason I did not use the print_config variables is that the table is in a scrollable div. With the default setting of print_config.max_columns, there would be no vertical scroll bar, since the repr would switch over to the summary view if the DataFrame is wider than the terminal.

OK, I will use the same switch-over between full and summary for _repr_html_ and __repr__, and have a look at the missing class header.

@wesm
Member

wesm commented Feb 13, 2012

Ah, that's a good point. Maybe just have no limit then on the number of columns since you can scroll right?

@lodagro
Contributor

lodagro commented Feb 13, 2012

I did put a limit on rows/columns to avoid sending massive amounts of HTML to the notebook in the case of very large DataFrames. So what do we choose?

Now that I think of it, what about adding some CSS styling? I don't know if the notebook can handle it. I can give it a try.
Any preferences on style?

@wesm
Member

wesm commented Feb 13, 2012

No preferences there but go right ahead. I have never been the best with those kinds of aesthetics

@ellisonbg
Contributor Author

A few points:

  • I think we do want to limit the number of rows that are sent to the browser, probably at around 100-500.
  • I like the idea of having to_html obey the print_config.max_columns option for this.
  • The generated HTML should be the same regardless of whether the full frame or a subset of rows is displayed. We should not fall back to the other representation if it is big. This will require modifying the logic in to_html.
  • For this particular view, I would not do any additional styling - the notebook already has css to style these tables and leaving it alone will allow it to be consistent with the rest of the notebook.
  • The scrollable div should be pretty small - small enough that it easily fits in browser window at one time. Otherwise we will run into weird double scrolling issues when both the div and window are trying to scroll.
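
To illustrate the first three points, here is a sketch that clips both rows and columns before rendering, so the same table markup can be produced whether or not anything was dropped. The function name and the default limits are assumptions; the thread only suggests a row limit of around 100-500.

```python
def clip_table(columns, rows, max_rows=100, max_columns=20):
    """Return clipped columns/rows plus counts of what was dropped."""
    kept_cols = columns[:max_columns]
    kept_rows = [row[:max_columns] for row in rows[:max_rows]]
    dropped_rows = max(len(rows) - max_rows, 0)
    dropped_cols = max(len(columns) - max_columns, 0)
    return kept_cols, kept_rows, dropped_rows, dropped_cols


columns = [f"c{i}" for i in range(30)]
rows = [list(range(30)) for _ in range(1000)]
cols, body, dropped_rows, dropped_cols = clip_table(columns, rows)
```

The clipped result can then go through one rendering path, with the dropped counts used for a short note under the table.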

@wesm
Member

wesm commented Feb 23, 2012

Any plans to add an option to the notebook to disable enriched repr? I actually rather like the plain text output for demos

@ellisonbg
Contributor Author

We have thought a little about this, but haven't implemented anything yet.


@lodagro
Contributor

lodagro commented Feb 23, 2012

I was patiently waiting for feedback, but apparently my last comments did not reach GitHub, so I am repeating them:

  • If print_config max_rows and max_columns are to be used, I would remove the scrollable div. Also, I think there is no way to avoid double scrolling: even if you make the div small, the browser window can always be smaller, and a very small div looks rather impractical in a full-screen browser window.
  • Keeping the summary fallback option (although plain text) makes the HTML representation consistent with the plain-text display. I would keep this.
  • No CSS styling.
  • I had a look at why the class header (the first line) is not printed in the summary view. Apparently '<pre>' + multi_line_string + '</pre>' (currently used) is not the same as '<pre>\n' + multi_line_string + '\n</pre>'. From an HTML point of view, both should be OK.
  • print_config could have an extra option, notebook_html, to be used in _repr_html_(): when False, _repr_html_() falls back to __repr__() wrapped in a pre tag. This way the HTML representation can be disabled (kind of).
  • We probably need to do this for Series as well.
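
On the missing class header in the summary view: one thing worth checking, which the thread does not confirm as the cause, is escaping. The first line of the summary is `<class 'pandas.core.frame.DataFrame'>`, and a browser treats unescaped `<class ...>` inside `<pre>` as an unknown tag rather than as text. A small sketch using the standard library's `html.escape` follows; the second line of the sample summary is made up for illustration.

```python
from html import escape

# Illustrative summary text; only the first line is taken from the thread.
summary = "<class 'pandas.core.frame.DataFrame'>\nIndex: 3 entries, 0 to 2"

# Unescaped: the browser would parse "<class ...>" as markup and hide it.
unescaped = "<pre>" + summary + "</pre>"

# Escaped: angle brackets become entities, so the header shows as text.
escaped = "<pre>" + escape(summary, quote=False) + "</pre>"
```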

@ellisonbg
Contributor Author

I don't really have time to work on this right now but a few comments:

  • I do think a scrollable div should be used, even if max_rows/max_columns are used.
  • I agree about keeping the summary view.
  • I agree that there should be a notebook_html print config option. When it is False, _repr_html_ should just return None and the standard __repr__ will be used.
  • Yep on Series as well.
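
A sketch of how such a switch could look. `notebook_html` is the option name proposed above; the `PrintConfig` and `Table` classes are simplified stand-ins, not pandas code. It relies on IPython using the plain repr when `_repr_html_` returns None.

```python
class PrintConfig:
    """Stand-in for a print configuration object (hypothetical)."""
    notebook_html = True


print_config = PrintConfig()


class Table:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        return self.text

    def _repr_html_(self):
        if not print_config.notebook_html:
            # Returning None tells IPython there is no HTML form,
            # so it displays the standard __repr__ instead.
            return None
        return "<table><tr><td>" + self.text + "</td></tr></table>"


t = Table("1 2 3")
with_html = t._repr_html_()
print_config.notebook_html = False
without_html = t._repr_html_()
```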

lodagro added a commit to lodagro/pandas that referenced this pull request Feb 24, 2012
lodagro added a commit that referenced this pull request Feb 27, 2012
DataFrame_repr_html changes according to #772 discussion.
dan-nadler pushed a commit to dan-nadler/pandas that referenced this pull request Sep 23, 2019
Add global settings for caching list_libraries
4 participants