-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
DOC: read_excel doc - fixed formatting and added examples #18753
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
First |
Codecov Report
@@ Coverage Diff @@
## master #18753 +/- ##
=======================================
Coverage 91.58% 91.58%
=======================================
Files 150 150
Lines 48972 48972
=======================================
Hits 44851 44851
Misses 4121 4121
Continue to review full report at Codecov.
|
Thanks @chris-b1 ! To make sure I understand: You'd like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice examples. pls add a whatsnew note in 0.22.0, you can put this in other api changes section, just say comment arg is exposed as a named parameter in read_excel.
pandas/tests/io/test_excel.py
Outdated
@@ -1862,6 +1862,62 @@ def test_invalid_columns(self): | |||
with pytest.raises(KeyError): | |||
write_frame.to_excel(path, 'test1', columns=['C', 'D']) | |||
|
|||
def test_comment_arg(self): | |||
# Test the comment argument functionality to read_excel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add issue number here as a comment
pandas/tests/io/test_excel.py
Outdated
# Read file without comment arg | ||
read_frame = read_excel(path, 'test_c') | ||
read_frame_commented = read_excel(path, 'test_c', comment='#') | ||
tm.assert_class_equal(read_frame, read_frame_commented) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be consistent with other tests on nameing, e.g.
write_frame -> df
result = read_excel
use assert_frame_equal
pandas/tests/io/test_excel.py
Outdated
# Create file to read in | ||
write_frame = DataFrame({'A': ['one', '#one', 'one'], | ||
'B': ['two', 'two', '#two']}) | ||
write_frame.to_excel(path, 'test_c') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same on naming
pandas/tests/io/test_excel.py
Outdated
|
||
# Test that all-comment lines at EoF are ignored | ||
read_frame_short = read_excel(path, comment='#') | ||
assert (read_frame_short.shape == write_frame.iloc[0:1, :].shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comapre using assert_frame_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm with @jreback's comments addressed
Thanks for the comments @jreback, that should be all done now. |
I spent a bunch of time lining up the arguments in read_excel to match read_csv Would it be possible to keep following this pattern? Looks like |
Good point @alysivji, correct now? |
LGTM! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. just a small doc comments. ping on green.
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -193,6 +193,7 @@ Other API Changes | |||
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`) | |||
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`) | |||
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`) | |||
- comment arg is exposed as a named parameter in :func:`read_excel` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverse this statement, in :func:`read_excel`, the
comment argument is now exposed as a named parameter
(and need double-backticks)
uhoh, seems I mocked something up. |
pandas/io/excel.py
Outdated
@@ -223,7 +301,8 @@ def read_excel(io, | |||
parse_dates=False, | |||
date_parser=None, | |||
thousands=None, | |||
skipfooter=0, | |||
comment=None, | |||
skip_footer=0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tests are failing because this parameter got renamed, should still be skipfooter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks
doc/source/whatsnew/v0.22.0.txt
Outdated
@@ -198,6 +198,7 @@ Other API Changes | |||
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`) | |||
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`) | |||
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one
pandas/io/excel.py
Outdated
>>> df_out = pd.DataFrame([('string1', 1), | ||
... ('string2', 2), | ||
... ('string3', 3)], | ||
... columns=('Name', 'Value')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use a list for the column names (both are OK, but a list is IMO more idiomatic as tuples are also used for single labels of multi-indexed columns
pandas/io/excel.py
Outdated
@@ -137,7 +137,7 @@ | |||
na_values : scalar, str, list-like, or dict, default None | |||
Additional strings to recognize as NA/NaN. If dict passed, specific | |||
per-column NA values. By default the following values are interpreted | |||
as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), 70) + """'. | |||
as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), 999) + """'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see you mentioned this in the very first post of the PR. But, I don't think this is the fully correct solution, as this will make the line too long.
Apparently for the read_csv
docstring this is working properly, so maybe check how they did it there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the long line is not a problem, as line breaks in the input doc string do not translate to line breaks in the compiled output. I have tested compiling the docs locally and the output looked fine for me. However, since you referred to the read_csv doc string, I had a look at that. It uses , subsequent_indent=" "
, so I will change it here to keep things consistent.
pandas/io/excel.py
Outdated
|
||
Index and header can be specified via the `index_col` and `header` arguments | ||
|
||
>>> pd.read_excel(open('tmp.xlsx','rb'), index_col=None, header=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use here (and in the following ones as well) just the "'tmp.xlsx'" (without the open). It is fine to show this ability (above), but I would use the simpler case for the rest of the examples
pandas/io/excel.py
Outdated
as strings or lists of strings! | ||
|
||
>>> pd.read_excel(open('tmp.xlsx','rb'), | ||
... true_values='2', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has no effect on the example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right! true_values
is not working as I expected. Did I misunderstand the argument or is it broken?
610073a
to
fda8fa2
Compare
Hello @JanLauGe! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on December 30, 2017 at 12:36 Hours UTC |
pandas/io/excel.py
Outdated
@@ -132,12 +132,13 @@ | |||
nrows : int, default None | |||
Number of rows to parse | |||
|
|||
.. versionadded:: 0.23.0 | |||
.. versionadded:: 0.22.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert this (0.22 is a special release)
pandas/io/excel.py
Outdated
skip_footer : int, default 0 | ||
|
||
.. deprecated:: 0.23.0 | ||
.. deprecated:: 0.22.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
pushed a fix for the lint issue. ping on green. |
thanks @JanLauGe |
Fixes a formatting bug in the
read_excel
docs that caused a line break and bold print in list of_NA_VALUES
. Adds examples in theread_excel
docstring.comment
? #18735git diff master -u -- "*.py" | flake8 --diff