No way to force read numerics as string in read_html
#10534
Comments
Thanks. It would be nice to add this; a PR is welcome :)
I just stumbled across this page with the same issue. @gte620v, can you explain how to get the raw HTML cell text with your PR? Thanks!
It should be something like this: https://github.com/gte620v/pandas/blob/5cb8243f2dd31cc2155627f29cfc89bbf6d4b84b/pandas/io/tests/test_html.py#L715 Just use a converter to convert the column to str.
@gte620v thanks for the info. It sounds like you can easily convert back to a string, but you can't prevent the automatic parsing in the first place, for example to keep the leading zeros in an integer. Thanks again!
@stevenmanton No, it does not convert back to a string; it prevents the value from being parsed as numeric in the first place. In any case, if you try a str converter, you will see that the leading zeros are preserved.
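A minimal sketch of the converter approach described above, assuming a recent pandas with lxml (or BeautifulSoup plus html5lib) installed. The table markup and the column names code and qty are made up for illustration. Mapping the column to str means the raw cell text is passed through unchanged instead of going through numeric inference.

```python
from io import StringIO

import pandas as pd

# Hypothetical table with a zero-padded "code" column
html = """
<table>
  <thead><tr><th>code</th><th>qty</th></tr></thead>
  <tbody>
    <tr><td>01</td><td>5</td></tr>
    <tr><td>002</td><td>7</td></tr>
  </tbody>
</table>
"""

# Default parsing: "01" and "002" are inferred as integers, losing the zeros
default_df = pd.read_html(StringIO(html))[0]

# With a str converter, the cell text for "code" is kept as-is
text_df = pd.read_html(StringIO(html), converters={"code": str})[0]

print(default_df["code"].tolist())
print(text_df["code"].tolist())
```

Only the columns listed in converters are affected; qty is still inferred as an integer column.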
As @jorisvandenbossche said, the converter does what you want. I made the PR to solve this exact problem.
Thanks for the clarification, guys. When I saw "converter", I assumed it converted the inferred type back to a string. I'll use this fix :-)
Should read_html accept "dtype" as an alias for "converters", to match the pd.read_csv argument?
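For comparison, here is the read_csv behavior such an alias would mirror. read_csv's parameter is spelled dtype; the inline CSV below is made up for illustration.

```python
from io import StringIO

import pandas as pd

csv_text = "code,qty\n01,5\n002,7\n"

# dtype={"code": str} keeps the column as text, so leading zeros survive
df = pd.read_csv(StringIO(csv_text), dtype={"code": str})

print(df["code"].tolist())
print(df["qty"].tolist())
```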
I tried using your solution, but it removes the "," from the column values.
@tuhinsharma121 That seems like a bug (the returned values are strings, but indeed should not remove the ","). Could you open a new issue for that?
Same problem here. Looks like it tries to parse the numbers before converting them to strings. |
Any solution to the "," problem? |
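Use converters:
import pandas as pd
from io import StringIO

converters = {
    'col1': str,  # keep col1 as raw cell text
    'col2': str,
}
# table holds the HTML <table> element; read_html returns a list of DataFrames
df = pd.read_html(StringIO(str(table)), converters=converters)[0]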
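A possible sketch for the "," problem, assuming a recent pandas. read_html also takes a thousands argument (default ","), and the comments above suggest the separator is stripped before the converter sees the value. Passing thousands=None alongside the str converters should keep the text intact. The id and amount columns are hypothetical; check the result against your pandas version.

```python
from io import StringIO

import pandas as pd

# Hypothetical table with zero-padded ids and comma-formatted amounts
html = """
<table>
  <thead><tr><th>id</th><th>amount</th></tr></thead>
  <tbody>
    <tr><td>007</td><td>1,234</td></tr>
    <tr><td>010</td><td>56,789</td></tr>
  </tbody>
</table>
"""

converters = {"id": str, "amount": str}

# thousands=None turns off thousands-separator handling, so "1,234" is not rewritten
df = pd.read_html(StringIO(html), converters=converters, thousands=None)[0]

print(df.to_dict("list"))
```

If you later need the amounts as numbers, you can still convert explicitly, for example with df["amount"].str.replace(",", "").astype(int).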
Can I work on a PR for this?
When an HTML table shows 01 in a cell, read_html reads it, interprets it as a float, and drops the leading 0 of 01. Are there options to read these values as strings?
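A minimal reproduction of the behavior described in the issue, assuming a recent pandas with an HTML parser backend installed. The table is made up, and the empty cell is an assumption added to show why the result can come back as a float: the blank is read as NaN, which forces the whole column to float64, so "01" ends up as 1.0.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: a zero-padded value plus one blank cell in "code"
html = """
<table>
  <thead><tr><th>code</th><th>name</th></tr></thead>
  <tbody>
    <tr><td>01</td><td>a</td></tr>
    <tr><td></td><td>b</td></tr>
  </tbody>
</table>
"""

df = pd.read_html(StringIO(html))[0]

# The blank cell becomes NaN, so "code" is float64 and "01" is read as 1.0
print(df["code"].dtype)
print(df["code"].tolist())
```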