-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Adding additional keywords to read_html for #13461 #13575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Adding additional keywords to read_html for #13461 #13575
Conversation
@gte620v The suggestion in #10534 was to add In the same line, I think it would be useful to at once evaluate the other keywords to see which ones could be useful for html parsing. |
</tr> | ||
</tbody> | ||
</table>""" | ||
raw_data = np.array([[u'R_l0_g0', '0.763', 0.233], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls compare with expected DataFrame
.
My initial impression is that dtype can not be easily dropped in since it is not supported by the python engine parser: https://github.com/pydata/pandas/blob/master/pandas/io/parsers.py#L463-L470 For reasons I haven't poked around enough to understand, the
Sure, I'll take a look. It seems a bit cumbersome having to copy over args in 3 or 4 places to add them to Please let me know if you have any suggestions. |
It would be easiest to just pass around kwargs from Is there an existing better way to do that other than explicitly calling out the arguments in each of these places? If not, I can probably work about a better way to do it. Let me know if you have any thoughts.
|
Current coverage is 84.53%@@ master #13575 diff @@
==========================================
Files 138 141 +3
Lines 51157 51147 -10
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43132 43235 +103
+ Misses 8025 7912 -113
Partials 0 0
|
@jorisvandenbossche I added support for a few more arguments: I also consolidated the argument definitions from being called in several places to only being enumerated in the I could not easily add dtype support since read_html uses the python engine. Let me know if you want me to make more changes. |
- The ``pd.read_html()`` has gained support for the ``converters`` option (:issue:`13461`) | ||
- The ``pd.read_html()`` has gained support for the ``keep_default_na`` option (:issue:`13461`) | ||
- The ``pd.read_html()`` has gained support for the ``squeeze`` option (:issue:`13461`) | ||
- The ``pd.read_html()`` has gained support for the ``date_parser`` option (:issue:`13461`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you combine this in one entry?
@gte620v The way of passing through looks good! Thanks Regarding the extra keywords, maybe we should evaluate each keyword on its usefulness to add (sorry if I was a bit fast in pushing you to add more keywords):
OK regarding not adding |
Sure, will do! Let me know definitively about squeeze and date_parser. I can leave or remove. |
@jreback Opinion on adding |
|
@jreback Thanks. I removed squeeze and date_parser. I think it is good to go. Let me know if I misunderstood. |
@@ -207,6 +207,8 @@ Other enhancements | |||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``na_filter`` option (:issue:`13321`) | |||
- The ``pd.read_csv()`` with ``engine='python'`` has gained support for the ``memory_map`` option (:issue:`13381`) | |||
|
|||
- The ``pd.read_html()`` has gained support for the ``na_values``, ``converters``, ``keep_default_na`` options (:issue:`13461`) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need an update in the docs themsleves? e.g. an example
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: added to io.rst
Sure, I can do that. |
I added examples to the docs. Let me know if you want a modification or if I need to add something someplace else. |
@@ -1959,6 +1959,29 @@ Specify an HTML attribute | |||
dfs2 = read_html(url, attrs={'class': 'sortable'}) | |||
print(np.array_equal(dfs1[0], dfs2[0])) # Should be True | |||
|
|||
Specify values that should be converted to NaN | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a version added tag for these
minor comment. ping when green. |
@gte620v Thanks a lot! |
My pleasure! |
git diff upstream/master | flake8 --diff