BUG: Convert <br> to space in pd.read_html #45972

Merged
merged 6 commits into from
Apr 10, 2022

Conversation

abmyii
Contributor

@abmyii abmyii commented Feb 13, 2022

@abmyii abmyii force-pushed the read_html-br-to-space branch from fc15428 to 0c9e027 Compare February 13, 2022 17:11
@attack68
Contributor

attack68 commented Feb 14, 2022

Since <br> is a line break, would it be better to convert it to \n?

@abmyii
Contributor Author

abmyii commented Feb 14, 2022

> Since <br> is a line break, would it be better to convert it to \n?

I do convert it to \n in the code; however, pandas either replaces it or renders it as a space. I'm not sure it is worth forcing the matter. Thoughts?
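
To illustrate why the newline does not survive, here is a minimal sketch. It assumes the parser collapses whitespace in cell text with a regular expression along these lines; `collapse_whitespace` and `cell_text` are hypothetical names used for illustration, not pandas functions.

```python
import re

# Assumed normalizer: runs of newlines, or any run of two or more
# whitespace characters, are replaced by a single space.
_WHITESPACE = re.compile(r"[\r\n]+|\s{2,}")


def collapse_whitespace(text: str) -> str:
    """Hypothetical stand-in for the parser's whitespace cleanup."""
    return _WHITESPACE.sub(" ", text.strip())


def cell_text(html_fragment: str) -> str:
    """Replace <br> tags with a newline, then normalize whitespace."""
    with_newlines = re.sub(r"<br\s*/?>", "\n", html_fragment, flags=re.IGNORECASE)
    return collapse_whitespace(with_newlines)


print(repr(cell_text("first line<br>second line")))
# -> 'first line second line'
```

Under that assumption, the \n inserted for <br> is turned into a space by the later cleanup step, so the two lines are still separated but the line break itself is lost.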

@jreback jreback added the IO HTML read_html, to_html, Styler.apply, Styler.applymap label Feb 27, 2022
@github-actions
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Mar 30, 2022
@abmyii
Contributor Author

abmyii commented Mar 30, 2022

> This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

Ping @attack68

@attack68
Contributor

Can you merge master?

This looks good to me. The solution also opens the door to a list of general replacements, if we want to go that way in the future.
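
As a rough sketch of what such a list of general replacements could look like (this is not what the PR implements; the table, its `hr` entry, and `apply_replacements` are all hypothetical):

```python
import re

# Hypothetical table mapping void tags to replacement text.
# Only the "br" entry corresponds to the behaviour discussed in this PR.
REPLACEMENTS = {
    "br": "\n",
    "hr": "\n",  # illustrative only
}


def apply_replacements(html_fragment: str, replacements=REPLACEMENTS) -> str:
    """Substitute each listed tag with its replacement text."""
    for tag, text in replacements.items():
        pattern = re.compile(rf"<{re.escape(tag)}\s*/?>", re.IGNORECASE)
        html_fragment = pattern.sub(text, html_fragment)
    return html_fragment


print(repr(apply_replacements("a<br>b<hr/>c")))
# -> 'a\nb\nc'
```

Keeping the mapping in one table would let further tags be added without touching the substitution logic.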

Contributor

@attack68 attack68 left a comment

needs merge

@abmyii
Contributor Author

abmyii commented Mar 30, 2022

> needs merge

OK now?

@abmyii abmyii requested a review from attack68 April 7, 2022 14:17
@jreback jreback added this to the 1.5 milestone Apr 9, 2022
@jreback
Contributor

jreback commented Apr 9, 2022

@attack68 ok here?

Contributor

@attack68 attack68 left a comment

lgtm

@jreback jreback merged commit cb980fa into pandas-dev:main Apr 10, 2022
@jreback
Contributor

jreback commented Apr 10, 2022

thanks @abmyii

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022
@HymanHuang

> > Since <br> is a line break, would it be better to convert it to \n?
>
> I do convert it to \n in the code; however, pandas either replaces it or renders it as a space. I'm not sure it is worth forcing the matter. Thoughts?

The _remove_whitespace function is what causes this.

@attack68 @abmyii

Labels
IO HTML read_html, to_html, Styler.apply, Styler.applymap Stale
Development

Successfully merging this pull request may close these issues.

pd.read_html() convert <br> to space
4 participants