Skip to content

Series repr html only #29383

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 18 commits into from
Closed

Conversation

big-o
Copy link

@big-o big-o commented Nov 3, 2019

@big-o big-o mentioned this pull request Nov 3, 2019
5 tasks
@big-o
Copy link
Author

big-o commented Nov 3, 2019

Wasn't sure where to add the whatsnew entry so left it out for now. Grateful for any advice.

@big-o
Copy link
Author

big-o commented Nov 3, 2019

This PR supercedes #29248 which was a messy attempt at creating full HTML functionality for the Series class by creating a new HTMLColumnFormatter class and changing SeriesFormatter. This PR is much smaller in scope and hopefully easier to debug/merge/maintain. The scope can be expanded later to give full to_html functionality.

@big-o
Copy link
Author

big-o commented Nov 3, 2019

The HTML rendering looks just like that proposed in #29248:

series1a

however rather than doing this via an independent formatter, it just uses DataFrame.to_html without any header or footer information, and then inserts a new footer into the HTML.

@jreback jreback added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Nov 4, 2019
table_id=None,
render_links=False,
)
html = formatter.to_html(notebook=True).split("\n")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this approach drastically different than what was proposed in #27228 ?

Copy link
Author

@big-o big-o Nov 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, the only difference is in using the Formatter class rather than just invoking it via to_html. This is to allow support for respecting the min_rows, max_rows and show_dimensions options (which can't be specified in DataFrame.to_html).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would prefer to define a SeriesFormatter that inherits a common base class with DataFrameFormatter (callit GenericFromatter). This should't be much more complicated and would be way less fragile than this.

else:
assert "<td>{}</td>".format(val) in sm_html

assert "<p>Name: <b>{}</b>".format(small.name) in sm_html
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you construct sm_html literally instead? I think small enough to do; would make for stronger assertions

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit for this now - personally I still prefer the old approach which is just checking for the meaningful content and is relaxed about other things like HTML indentation which don't affect the meaning of the output. I also think it makes the tests more readable. But let me know which you prefer and I can always revert if you want.

@pep8speaks
Copy link

pep8speaks commented Nov 5, 2019

Hello @big-o! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-12-29 22:58:31 UTC

@big-o
Copy link
Author

big-o commented Nov 11, 2019

@WillAyd are you happy with the changes? I think everything is in place now but please let me know if any further work is needed. I made an assumption about where the What's new entry needs to go, let me know if this is wrong.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonjayhawkins any thoughts on this?

# TODO: Full independent HTML generation in SeriesFormatter, rather
# than depending on a limited subset of functionality via to_frame().

if get_option("display.notebook_repr_html"):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a follow up I think should put this in the NDFrame superclass shared by DataFrame and Series rather than copy paste

@big-o
Copy link
Author

big-o commented Nov 17, 2019

Are you happy with everything in this PR now @WillAyd @simonjayhawkins? I noticed that some of the regression tests are failing again but I'm confident this isn't due to any of this work: Everything was passing then I had to merge in some other updates due to a clash when I updated the whatsnew RST with a new entry for my change. It looks like all the issues that are failing are from other tickets that are being worked on.

All the actual code changes are in 4d40bb6, which passes all regression tests.

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you merge master again? Will probably solve CI issue

footer.append("Length: {rows}".format(rows=len(self)))
footer.append("dtype: <tt>{dtype}</tt>".format(dtype=self.dtype))

html.insert(tbl_end + 1, "<p>{footer}</p>".format(footer=", ".join(footer)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this still insert paragraph tags when no footer is required at all?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, but a footer will always be required so there's no need to check len(footer) I believe. Just like the string representation, there will always be at least a dtype displayed in the footer. The footer is the main thing that distinguishes between a Series and a single-column DataFrame (the only other difference is the lack of column headers in Series).

I've merged in the latest master and all CI checks pass now.

table_id=None,
render_links=False,
)
html = formatter.to_html(notebook=True).split("\n")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would prefer to define a SeriesFormatter that inherits a common base class with DataFrameFormatter (callit GenericFromatter). This should't be much more complicated and would be way less fragile than this.

@big-o
Copy link
Author

big-o commented Nov 20, 2019

This is exactly what I did originally, minus the base class (#29248), but it @WillAyd asked for it to be replaced with this PR because they preferred a smaller change, which could be expanded on in future.

This PR is small and contained and would be easy to replace with a SeriesFormatter in the future. Making a SeriesFormatter with a common Base class to DataFrameFormatter is a lot of work, unless you're happy to have a lot of repeated code. This is why the original PR was rejected.

If you now want a series formatter, please take a look at #29248 and decide if you want to accept that instead. If you want a more complete implementation with a shared base class then this is a large task that I wouldn't have time for right now.

Please have a think and maybe discuss it with @WillAyd and then let me know how you want to proceed.

I'd ask you to reconsider this PR though - it leaves a small footprint on the project, is well tested and will be easy to remove when you have a more complete solution. A complete solution will probably take a long time to implement and you'll be without the feature for a long time.

@big-o
Copy link
Author

big-o commented Nov 29, 2019

@WillAyd @jreback have you had any thoughts on how you'd like to take this PR forward? Keen to get this feature into pandas but at the moment I've got some conflicting guidance on how to progress. If you need to think more about how you'd like to take this forward that's fine, but it would be good to know if it can be implemented in its current shape or if it needs to be reworked again first.

html = formatter.to_html(notebook=True).split("\n")

# Find out where the column ends - we will insert footer information here.
tbl_end = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@big-o I think would be better on this PR if you can do this inside of DataFrameFormatter in its construction itself, rather than 'finding' then end later. is this possible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean adding an extra footer parameter to either DataFrameFormatter.__init__ or DataFrameFormatter.to_html? Since the DataFrameFormatter isn't limited to single column, data, what should be footer look like for frames with multiple columns? Should there even be one? Should an Exception be raised? Would appreciate some clear guidance on what is expected before I start on another rewrite, as I only have limited time to spend on this.

@jreback
Copy link
Contributor

jreback commented Dec 1, 2019

cc @jorisvandenbossche if you'd look here

@jreback
Copy link
Contributor

jreback commented Dec 27, 2019

can you merge master and we'll look again

@big-o
Copy link
Author

big-o commented Dec 29, 2019

can you merge master and we'll look again

Done - I'm assuming you don't want any changes other than those necessary for the merge right now as I haven't heard anything regarding which approach to take. Please let me know if you want any further changes and what they should be if so.

@WillAyd
Copy link
Member

WillAyd commented Feb 2, 2020

@big-o any chance you took a look at the design suggested by @jreback to have a GenericFormatter that DataFrameFormatter and SeriesFormatter would both inherit from? The latter are already defined in format.py so wondering if you have ideas on how to better construct a class hierarchy for them

@big-o
Copy link
Author

big-o commented Feb 3, 2020

I looked into it originally, as that was my plan too. It's quite a complicated class for someone who is new to the codebase so I think it would take me quite a while to unpick. That's why I originally went for a slightly messy compromise with #29248, which was rejected by @WillAyd who suggested the approach in this PR.

I know this PR isn't perfect but it gets a new feature in with almost no changes to the original codebase, and only small additions that can be easily removed and replaced with a proper SeriesFormatter at a later date.

I'd be happy to write the SeriesFormatter, but it will take some time for me to implement - as well as the time needed to figure it all out and make the code changes necessary, I also need to fit this work around my job. In the meantime people could be making use of the feature via this PR without the rest of the codebase being affected, so would it be possible to go along with the original decision of taking this more limited approach, giving a workable feature for people to use until the more complete solution comes along?

@WillAyd
Copy link
Member

WillAyd commented Jul 29, 2020

@big-o looks like the desired design is to have the GenericFormatter as a base which SeriesFormatter / DataFrameFormatter inherit from - is that something you are still looking at?

@WillAyd
Copy link
Member

WillAyd commented Sep 10, 2020

Closing as I think stale but ping if you'd like to give the suggested design approach a shot

@WillAyd WillAyd closed this Sep 10, 2020
@big-o
Copy link
Author

big-o commented Sep 11, 2020

Sorry for the slow response; my time has been limited lately. I need to prioritize other things right now but if I get time to look into it again I'll leave a comment here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Series do not display HTML repr
4 participants