
BUG: read_html does not parse correctly the header of non-string columns #5048


Closed
alefnula opened this issue Sep 29, 2013 · 2 comments · Fixed by #4770
Labels: Bug, IO HTML (read_html, to_html, Styler.apply, Styler.applymap)

alefnula (Contributor)

I presume the problem is that the data is parsed first and the header row is selected afterwards. When a column has a numeric dtype, the cell that should become that column's name (here `Year`) is not a valid number, so it becomes NaN.
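A minimal sketch of the suspected failure mode, assuming the cells have already been extracted as strings and that only the `Year` column (index 2) is converted to a number. The helpers `header_after_conversion` and `header_before_conversion` are hypothetical illustrations, not pandas internals; they only show why converting dtypes before choosing the header row turns `Year` into NaN.

```python
import pandas as pd

# Cell text as it might look once extracted from data2 (every cell is a string).
rows = [
    ["Country", "Municipality", "Year"],
    ["Ukraine", "Odessa", "1944"],
]


def header_after_conversion(rows, numeric_cols):
    """Hypothetical buggy order: convert dtypes over all rows, then take row 0 as header."""
    frame = pd.DataFrame(rows)
    for col in numeric_cols:
        # "Year" is not a valid number, so errors="coerce" turns it into NaN.
        frame[col] = pd.to_numeric(frame[col], errors="coerce")
    body = frame.iloc[1:].reset_index(drop=True)
    # Row 0 now holds ["Country", "Municipality", nan], so the last label is lost.
    body.columns = list(frame.iloc[0])
    return body


def header_before_conversion(rows, numeric_cols):
    """Header-first order: split off row 0, then convert only the data rows."""
    header, data = rows[0], rows[1:]
    frame = pd.DataFrame(data, columns=header)
    for col in numeric_cols:
        name = header[col]
        frame[name] = pd.to_numeric(frame[name])
    return frame


buggy = header_after_conversion(rows, numeric_cols=[2])
fixed = header_before_conversion(rows, numeric_cols=[2])
print(list(buggy.columns))
print(list(fixed.columns))
```

The only difference between the two helpers is when row 0 is split off; the thread does not say which order the refactored parser uses.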

Sample data:

```python
import io

import pandas as pd

# data1: the header row lives in <thead>
data1 = io.StringIO(u'''<table>
    <thead>
        <tr>
            <th>Country</th>
            <th>Municipality</th>
            <th>Year</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Ukraine</td>
            <th>Odessa</th>
            <td>1944</td>
        </tr>
    </tbody>
</table>''')
# data2: no <thead>; the header row is the first <tr> of <tbody>
data2 = io.StringIO(u'''
<table>
    <tbody>
        <tr>
            <th>Country</th>
            <th>Municipality</th>
            <th>Year</th>
        </tr>
        <tr>
            <td>Ukraine</td>
            <th>Odessa</th>
            <td>1944</td>
        </tr>
    </tbody>
</table>''')
```

Output:

```pycon
>>> pd.read_html(data1)[0]
   Country Municipality  Year
0  Ukraine       Odessa  1944
>>> pd.read_html(data2, header=0)[0]
0  Country Municipality   NaN
1  Ukraine       Odessa  1944
```
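With `header=0`, `data2` would be expected to produce the same frame as `pd.read_html(data1)[0]`. A stdlib-based sketch of that header-first behaviour, assuming Python's `html.parser` is adequate for this small, well-formed table (pandas itself parses with lxml or BeautifulSoup, not this parser). `CellCollector` is a hypothetical helper, not part of pandas; note that it treats `<th>` and `<td>` alike, so the `<th>Odessa</th>` cell inside `<tbody>` is kept as data.

```python
from html.parser import HTMLParser

import pandas as pd


class CellCollector(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)


# Same structure as data2: no <thead>, header row is the first <tr> of <tbody>.
html = """<table><tbody>
<tr><th>Country</th><th>Municipality</th><th>Year</th></tr>
<tr><td>Ukraine</td><th>Odessa</th><td>1944</td></tr>
</tbody></table>"""

parser = CellCollector()
parser.feed(html)
header, *data = parser.rows  # header=0: take row 0 before converting anything
frame = pd.DataFrame(data, columns=header)
frame["Year"] = pd.to_numeric(frame["Year"])
print(frame)
```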
cpcloud was assigned to this issue on Sep 29, 2013.
cpcloud (Member) commented Sep 29, 2013

@alefnula Excellent. You essentially wrote the test for me :)

cpcloud (Member) commented Sep 29, 2013

Great, this is now fixed in my refactor... didn't have to do anything :)
