Skip to content

DOC: read_excel doc - fixed formatting and added examples #18753

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Dec 30, 2017

Conversation

JanLauGe
Copy link
Contributor

@JanLauGe JanLauGe commented Dec 12, 2017

Fixes a formatting bug in the read_excel docs that caused a line break and bold print in list of _NA_VALUES. Adds examples in the read_excel docstring.

@JanLauGe
Copy link
Contributor Author

First pandas PR, please let me know about anything that should be done differently.
Thank you!

@chris-b1
Copy link
Contributor

Thanks @JanLauGe! to close #18735 I'd like to see the following

  1. add comment in code to the explicit named keyword arguments to the read_excel function, and subsequent internal calls

  2. add comment to the docstring for the same

  3. add a few tests for comment functionality

@codecov
Copy link

codecov bot commented Dec 13, 2017

Codecov Report

Merging #18753 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #18753   +/-   ##
=======================================
  Coverage   91.58%   91.58%           
=======================================
  Files         150      150           
  Lines       48972    48972           
=======================================
  Hits        44851    44851           
  Misses       4121     4121
Flag Coverage Δ
#multiple 89.94% <ø> (ø) ⬆️
#single 41.72% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update faeac49...6afed06. Read the comment docs.

@jreback jreback added the IO Excel read_excel, to_excel label Dec 13, 2017
@JanLauGe
Copy link
Contributor Author

Thanks @chris-b1 ! To make sure I understand: You'd like comment to be a named argument instead of a keyword argument?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice examples. pls add a whatsnew note in 0.22.0, you can put this in other api changes section, just say comment arg is exposed as a named parameter in read_excel.

@@ -1862,6 +1862,62 @@ def test_invalid_columns(self):
with pytest.raises(KeyError):
write_frame.to_excel(path, 'test1', columns=['C', 'D'])

def test_comment_arg(self):
# Test the comment argument functionality to read_excel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add issue number here as a comment

# Read file without comment arg
read_frame = read_excel(path, 'test_c')
read_frame_commented = read_excel(path, 'test_c', comment='#')
tm.assert_class_equal(read_frame, read_frame_commented)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be consistent with other tests on nameing, e.g.
write_frame -> df
result = read_excel

use assert_frame_equal

# Create file to read in
write_frame = DataFrame({'A': ['one', '#one', 'one'],
'B': ['two', 'two', '#two']})
write_frame.to_excel(path, 'test_c')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same on naming


# Test that all-comment lines at EoF are ignored
read_frame_short = read_excel(path, comment='#')
assert (read_frame_short.shape == write_frame.iloc[0:1, :].shape)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comapre using assert_frame_equal

Copy link
Contributor

@chris-b1 chris-b1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm with @jreback's comments addressed

@jreback jreback added the Docs label Dec 13, 2017
@JanLauGe
Copy link
Contributor Author

Thanks for the comments @jreback, that should be all done now.
Please let me know if there is anything else outstanding.

@alysivji
Copy link
Contributor

I spent a bunch of time lining up the arguments in read_excel to match read_csv

Would it be possible to keep following this pattern? Looks like comment can go between thousands and skip_footer

@JanLauGe
Copy link
Contributor Author

Good point @alysivji, correct now?

@alysivji
Copy link
Contributor

LGTM!

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. just a small doc comments. ping on green.

@@ -193,6 +193,7 @@ Other API Changes
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`)
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)
- comment arg is exposed as a named parameter in :func:`read_excel`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverse this statement, in :func:`read_excel`, the comment argument is now exposed as a named parameter (and need double-backticks)

@jreback jreback added this to the 0.22.0 milestone Dec 18, 2017
@JanLauGe
Copy link
Contributor Author

uhoh, seems I mocked something up.
Sorry @jreback, can you explain to me why is AppVeyor suddenly failing now?

@@ -223,7 +301,8 @@ def read_excel(io,
parse_dates=False,
date_parser=None,
thousands=None,
skipfooter=0,
comment=None,
skip_footer=0,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests are failing because this parameter got renamed, should still be skipfooter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks

@@ -198,6 +198,7 @@ Other API Changes
- Rearranged the order of keyword arguments in :func:`read_excel()` to align with :func:`read_csv()` (:issue:`16672`)
- :func:`pandas.merge` now raises a ``ValueError`` when trying to merge on incompatible data types (:issue:`9780`)
- :func:`wide_to_long` previously kept numeric-like suffixes as ``object`` dtype. Now they are cast to numeric if possible (:issue:`17627`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rebase on master. can you move to 0.23 (docs were renamed), prob easiest to just check this file from master and past in new one

>>> df_out = pd.DataFrame([('string1', 1),
... ('string2', 2),
... ('string3', 3)],
... columns=('Name', 'Value'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use a list for the column names (both are OK, but a list is IMO more idiomatic as tuples are also used for single labels of multi-indexed columns

@@ -137,7 +137,7 @@
na_values : scalar, str, list-like, or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted
as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), 70) + """'.
as NaN: '""" + fill("', '".join(sorted(_NA_VALUES)), 999) + """'.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see you mentioned this in the very first post of the PR. But, I don't think this is the fully correct solution, as this will make the line too long.
Apparently for the read_csv docstring this is working properly, so maybe check how they did it there?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the long line is not a problem, as line breaks in the input doc string do not translate to line breaks in the compiled output. I have tested compiling the docs locally and the output looked fine for me. However, since you referred to the read_csv doc string, I had a look at that. It uses , subsequent_indent=" ", so I will change it here to keep things consistent.


Index and header can be specified via the `index_col` and `header` arguments

>>> pd.read_excel(open('tmp.xlsx','rb'), index_col=None, header=None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use here (and in the following ones as well) just the "'tmp.xlsx'" (without the open). It is fine to show this ability (above), but I would use the simpler case for the rest of the examples

as strings or lists of strings!

>>> pd.read_excel(open('tmp.xlsx','rb'),
... true_values='2',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has no effect on the example?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right! true_values is not working as I expected. Did I misunderstand the argument or is it broken?

@pep8speaks
Copy link

pep8speaks commented Dec 29, 2017

Hello @JanLauGe! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 30, 2017 at 12:36 Hours UTC

@@ -132,12 +132,13 @@
nrows : int, default None
Number of rows to parse

.. versionadded:: 0.23.0
.. versionadded:: 0.22.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert this (0.22 is a special release)

skip_footer : int, default 0

.. deprecated:: 0.23.0
.. deprecated:: 0.22.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback
Copy link
Contributor

jreback commented Dec 30, 2017

pushed a fix for the lint issue. ping on green.

@jreback jreback merged commit e92d788 into pandas-dev:master Dec 30, 2017
@jreback
Copy link
Contributor

jreback commented Dec 30, 2017

thanks @JanLauGe

hexgnu pushed a commit to hexgnu/pandas that referenced this pull request Jan 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

read_excel kwarg for comment?
6 participants