read_html providing title from a attribute as well as the text - in effect duplicating output #20027
Labels
IO HTML
read_html, to_html, Styler.apply, Styler.applymap
Output-Formatting
__repr__ of pandas objects, to_string
Code Sample
import pandas as pd
url = "https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon"
# header=0 uses the first row of each table as the column names
tables = pd.read_html(url, header=0)
print(tables[0].head())
Problem description
The above code should extract just the displayed text in the HTML table; what's in the dataframe should be what's displayed on screen. This isn't what happens. If the HTML contains a hyperlink with a title attribute, the title is picked up and added to the dataframe alongside the link text, duplicating the data.
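To make the expectation concrete, here is a minimal sketch of the behaviour I'd expect, using only the standard library's html.parser rather than pandas' own parser. The HTML snippet, the names in it, and the CellTextParser class are made up for illustration and are not taken from the Wikipedia page. Only the text nodes inside each cell are collected; attributes such as title are never read.

```python
from html.parser import HTMLParser

# Hypothetical table, NOT copied from the Wikipedia page: each cell holds a
# link whose title attribute repeats the link text.
HTML = """
<table>
  <tr><th>Athlete</th><th>Country/State</th></tr>
  <tr>
    <td><a href="/wiki/Example_Runner" title="Example Runner">Example Runner</a></td>
    <td><a href="/wiki/Exampleland" title="Exampleland">Exampleland</a></td>
  </tr>
</table>
"""


class CellTextParser(HTMLParser):
    """Collect the text nodes of each table cell, ignoring all attributes."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        # attrs (including title) is deliberately never inspected.
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


parser = CellTextParser()
parser.feed(HTML)
rows = parser.rows
print(rows)
```

Each cell here comes out once, with no repeated title text, which is what I'd expect read_html to give for the Athlete and Country/State columns.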
Expected Output
Output
Here's the actual output; the duplication is in the Athlete and Country/State columns.