-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Series repr html only #29383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Series repr html only #29383
Conversation
Wasn't sure where to add the whatsnew entry so left it out for now. Grateful for any advice. |
This PR supercedes #29248 which was a messy attempt at creating full HTML functionality for the Series class by creating a new HTMLColumnFormatter class and changing SeriesFormatter. This PR is much smaller in scope and hopefully easier to debug/merge/maintain. The scope can be expanded later to give full to_html functionality. |
The HTML rendering looks just like that proposed in #29248: however rather than doing this via an independent formatter, it just uses DataFrame.to_html without any header or footer information, and then inserts a new footer into the HTML. |
table_id=None, | ||
render_links=False, | ||
) | ||
html = formatter.to_html(notebook=True).split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this approach drastically different than what was proposed in #27228 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, the only difference is in using the Formatter class rather than just invoking it via to_html
. This is to allow support for respecting the min_rows, max_rows and show_dimensions options (which can't be specified in DataFrame.to_html
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would prefer to define a SeriesFormatter that inherits a common base class with DataFrameFormatter (callit GenericFromatter). This should't be much more complicated and would be way less fragile than this.
else: | ||
assert "<td>{}</td>".format(val) in sm_html | ||
|
||
assert "<p>Name: <b>{}</b>".format(small.name) in sm_html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you construct sm_html literally instead? I think small enough to do; would make for stronger assertions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit for this now - personally I still prefer the old approach which is just checking for the meaningful content and is relaxed about other things like HTML indentation which don't affect the meaning of the output. I also think it makes the tests more readable. But let me know which you prefer and I can always revert if you want.
@WillAyd are you happy with the changes? I think everything is in place now but please let me know if any further work is needed. I made an assumption about where the What's new entry needs to go, let me know if this is wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonjayhawkins any thoughts on this?
# TODO: Full independent HTML generation in SeriesFormatter, rather | ||
# than depending on a limited subset of functionality via to_frame(). | ||
|
||
if get_option("display.notebook_repr_html"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a follow up I think should put this in the NDFrame superclass shared by DataFrame and Series rather than copy paste
Are you happy with everything in this PR now @WillAyd @simonjayhawkins? I noticed that some of the regression tests are failing again but I'm confident this isn't due to any of this work: Everything was passing then I had to merge in some other updates due to a clash when I updated the whatsnew RST with a new entry for my change. It looks like all the issues that are failing are from other tickets that are being worked on. All the actual code changes are in 4d40bb6, which passes all regression tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you merge master again? Will probably solve CI issue
footer.append("Length: {rows}".format(rows=len(self))) | ||
footer.append("dtype: <tt>{dtype}</tt>".format(dtype=self.dtype)) | ||
|
||
html.insert(tbl_end + 1, "<p>{footer}</p>".format(footer=", ".join(footer))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this still insert paragraph tags when no footer is required at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but a footer will always be required so there's no need to check len(footer)
I believe. Just like the string representation, there will always be at least a dtype
displayed in the footer. The footer is the main thing that distinguishes between a Series
and a single-column DataFrame
(the only other difference is the lack of column headers in Series
).
I've merged in the latest master and all CI checks pass now.
table_id=None, | ||
render_links=False, | ||
) | ||
html = formatter.to_html(notebook=True).split("\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we would prefer to define a SeriesFormatter that inherits a common base class with DataFrameFormatter (callit GenericFromatter). This should't be much more complicated and would be way less fragile than this.
This is exactly what I did originally, minus the base class (#29248), but it @WillAyd asked for it to be replaced with this PR because they preferred a smaller change, which could be expanded on in future. This PR is small and contained and would be easy to replace with a SeriesFormatter in the future. Making a SeriesFormatter with a common Base class to DataFrameFormatter is a lot of work, unless you're happy to have a lot of repeated code. This is why the original PR was rejected. If you now want a series formatter, please take a look at #29248 and decide if you want to accept that instead. If you want a more complete implementation with a shared base class then this is a large task that I wouldn't have time for right now. Please have a think and maybe discuss it with @WillAyd and then let me know how you want to proceed. I'd ask you to reconsider this PR though - it leaves a small footprint on the project, is well tested and will be easy to remove when you have a more complete solution. A complete solution will probably take a long time to implement and you'll be without the feature for a long time. |
@WillAyd @jreback have you had any thoughts on how you'd like to take this PR forward? Keen to get this feature into pandas but at the moment I've got some conflicting guidance on how to progress. If you need to think more about how you'd like to take this forward that's fine, but it would be good to know if it can be implemented in its current shape or if it needs to be reworked again first. |
html = formatter.to_html(notebook=True).split("\n") | ||
|
||
# Find out where the column ends - we will insert footer information here. | ||
tbl_end = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@big-o I think would be better on this PR if you can do this inside of DataFrameFormatter in its construction itself, rather than 'finding' then end later. is this possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean adding an extra footer
parameter to either DataFrameFormatter.__init__
or DataFrameFormatter.to_html
? Since the DataFrameFormatter
isn't limited to single column, data, what should be footer look like for frames with multiple columns? Should there even be one? Should an Exception be raised? Would appreciate some clear guidance on what is expected before I start on another rewrite, as I only have limited time to spend on this.
cc @jorisvandenbossche if you'd look here |
can you merge master and we'll look again |
Done - I'm assuming you don't want any changes other than those necessary for the merge right now as I haven't heard anything regarding which approach to take. Please let me know if you want any further changes and what they should be if so. |
I looked into it originally, as that was my plan too. It's quite a complicated class for someone who is new to the codebase so I think it would take me quite a while to unpick. That's why I originally went for a slightly messy compromise with #29248, which was rejected by @WillAyd who suggested the approach in this PR. I know this PR isn't perfect but it gets a new feature in with almost no changes to the original codebase, and only small additions that can be easily removed and replaced with a proper I'd be happy to write the |
@big-o looks like the desired design is to have the GenericFormatter as a base which SeriesFormatter / DataFrameFormatter inherit from - is that something you are still looking at? |
Closing as I think stale but ping if you'd like to give the suggested design approach a shot |
Sorry for the slow response; my time has been limited lately. I need to prioritize other things right now but if I get time to look into it again I'll leave a comment here. |
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff