read_html picks up the `title` attribute of `<a>` tags as well as the link text, in effect duplicating output

#### Code Sample

url = """https://en.wikipedia.org/wiki/List_of_winners_of_the_Boston_Marathon"""
tables = pd.read_html(url, header=0)
print(tables[0].head())

#### Problem description

The code above *should* extract only the text displayed in the HTML table; the DataFrame should contain exactly what is shown on screen. That isn't what happens. If a cell contains a hyperlink with a `title` attribute, the attribute's value is picked up and added to the cell alongside the link text, duplicating the data.
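The page markup isn't shown in this report, so it may be worth checking where the extra text comes from: a hidden element (for example a `<span style="display:none">` sort key placed before the link) would produce the same kind of concatenation. The sketch below is only an illustration under that assumption. It uses Python's standard `html.parser` on a hypothetical table row (`ROW`, written to reproduce two strings shown under Output, not copied from the real page) and hypothetical helpers `DisplayedTextParser` and `cell_texts`. It collects the text of each cell, optionally skipping elements whose inline style contains `display:none`. Attribute values such as `title` are never emitted as text.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they are not tracked on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class DisplayedTextParser(HTMLParser):
    """Collect the text of each <td>/<th> cell, in document order.

    Attribute values (such as an <a> tag's title) are never treated as text.
    With skip_hidden=True, text inside any element whose inline style
    contains display:none is dropped. Assumes well-formed markup with
    explicit closing tags.
    """

    def __init__(self, skip_hidden=True):
        super().__init__()
        self.skip_hidden = skip_hidden
        self.cells = []       # finished cell strings
        self._current = None  # text pieces of the open cell, or None
        self._hidden = []     # one flag per open element: does it hide content?

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._current = []
        if tag in VOID_TAGS:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        self._hidden.append("display:none" in style)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._hidden:
            self._hidden.pop()
        if tag in ("td", "th") and self._current is not None:
            self.cells.append("".join(self._current).strip())
            self._current = None

    def handle_data(self, data):
        if self._current is None:
            return  # text outside any cell
        if self.skip_hidden and any(self._hidden):
            return  # text inside a display:none element
        self._current.append(data)


def cell_texts(html, skip_hidden=True):
    parser = DisplayedTextParser(skip_hidden=skip_hidden)
    parser.feed(html)
    parser.close()
    return parser.cells


# Hypothetical markup (an assumption, not copied from the Wikipedia page).
ROW = (
    "<table><tr><td>1897</td>"
    '<td><span style="display:none">McDermott, John J.</span>'
    '<a href="/wiki/John_J._McDermott" title="John J. McDermott">'
    "John J. McDermott</a></td>"
    '<td><span style="display:none">United States</span> '
    '<a href="/wiki/United_States" title="United States">United States</a>'
    " (NY)</td></tr></table>"
)

print(cell_texts(ROW))
# ['1897', 'John J. McDermott', 'United States (NY)']
print(cell_texts(ROW, skip_hidden=False))
# ['1897', 'McDermott, John J.John J. McDermott', 'United States United States (NY)']
```

With `skip_hidden=False`, the sketch reproduces the concatenated strings even though `title` is never read, so inspecting the page source for hidden elements may help narrow down the cause. Later pandas releases document a `displayed_only` parameter on `read_html` (default `True`) that controls whether elements styled `display: none` are parsed; check the documentation for the installed version.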

#### Expected Output

```
   Year                   Athlete  \
0  1897          John J. McDermott   
1  1898         Ronald J. MacDonald   
2  1899          Lawrence Brignolia   
3  1900         John "Jack" Caffery   
4  1901         John "Jack" Caffery   

                      Country/State     Time        Notes  
0                United States (NY)  2:55:10          NaN  
1                            Canada  2:42:00          NaN  
2                United States (MA)  2:54:38          NaN  
3                            Canada  2:39:44          NaN  
4                            Canada  2:29:23  2nd victory 
```

#### Output

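Here's the actual output. The duplication is in the Athlete and Country/State columns: each athlete's name is preceded by a "Surname, Forename" form of the same name, and each Country/State value repeats the country name.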

```
   Year                                  Athlete  
0  1897      McDermott, John J.John J. McDermott   
1  1898  MacDonald, Ronald J.Ronald J. MacDonald   
2  1899    Brignolia, LawrenceLawrence Brignolia   
3  1900         Caffery, JohnJohn "Jack" Caffery   
4  1901         Caffery, JohnJohn "Jack" Caffery   

                      Country/State     Time        Notes  
0  United States United States (NY)  2:55:10          NaN  
1                     Canada Canada  2:42:00          NaN  
2  United States United States (MA)  2:54:38          NaN  
3                     Canada Canada  2:39:44          NaN  
4                     Canada Canada  2:29:23  2nd victory 
```

Issue #20027
