Skip to content

DOC: update the to_excel docstring #20185

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 24, 2018

Conversation

robmarkcole
Copy link
Contributor

@robmarkcole robmarkcole commented Mar 10, 2018

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

  • [ x] PR title is "DOC: update the docstring"
  • [x ] The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
  • [x ] The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
  • [ x] The html version looks good: python doc/make.py --single <your-function-or-method>
  • [x ] It has been proofread on language by another sprint participant
Errors found:
	Missing description for See Also "pd.read_excel" reference

Copy link

@cmdelatorre cmdelatorre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the docstring validtion script output? There are many things to fix there (No extended summary, no examples...)

@robmarkcole
Copy link
Contributor Author

robmarkcole commented Mar 10, 2018

@cmdelatorre I'm aware this is an incomplete PR and I'm sorry about that but we ran out of time on the sprint. I've added an extended summary covering the basics.

.. versionadded:: 0.20.0
.. versionadded:: 0.20.0.

Returns
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the function doesn't return anything I think it's better to omit this section.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
``io.excel.xlsm.writer``.
merge_cells : boolean, default True
Write MultiIndex and Hierarchical Rows as merged cells.
encoding: string, default None
encoding of the resulting excel file. Only necessary for xlwt,
encoding : string, default 'ascii'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you figure this out? Reading the code of DataFrame.to_excel it looks like this parameter is ignored!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct

float_format : string, default None
Format string for floating point numbers
Format string for floating point numbers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you mention some examples and/or link to where the format of this parameter is documented?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting takes place here using the string modulo operator which is described here. I've added an example.

float_format : string, default None
Format string for floating point numbers
Format string for floating point numbers.
columns : sequence, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sequence of what? I think this should be "list of str or list of int" to select columns.

Also, add "If not specified, all columns will be written".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I think this parameter probably makes no sense for Series.to_excel and the docstring is shared. We should find a way to not show this parameter in the docs in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added list of strings


See Also
--------
pd.read_excel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.read_excel

engine : string, default None
write engine to use - you can also set this via the options
Write engine to use - you can also set this via the options
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which options? I understand the string is the fully qualified import part for the write engine class. Are the ones shown there examples? Can we specify that in the docs?

Copy link
Contributor Author

@robmarkcole robmarkcole Mar 13, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears the options are 'openpyxl', 'xlsxwriter', 'xlwt', although in my testing only 'openpyxl', 'xlsxwriter', work

@codecov
Copy link

codecov bot commented Mar 13, 2018

Codecov Report

Merging #20185 into master will decrease coverage by 0.17%.
The diff coverage is 93.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20185      +/-   ##
==========================================
- Coverage      92%   91.82%   -0.18%     
==========================================
  Files         170      153      -17     
  Lines       50609    49256    -1353     
==========================================
- Hits        46561    45229    -1332     
+ Misses       4048     4027      -21
Flag Coverage Δ
#multiple 90.21% <93.33%> (-0.2%) ⬇️
#single 41.9% <23.33%> (-0.29%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 95.94% <93.33%> (-0.53%) ⬇️
pandas/util/_doctools.py 0% <0%> (-12.88%) ⬇️
pandas/core/arrays/base.py 84.14% <0%> (-3.72%) ⬇️
pandas/io/formats/printing.py 89.38% <0%> (-3.71%) ⬇️
pandas/core/groupby/groupby.py 92.55% <0%> (-3.58%) ⬇️
pandas/core/dtypes/missing.py 91.07% <0%> (-2.5%) ⬇️
pandas/core/reshape/tile.py 93.37% <0%> (-1.37%) ⬇️
pandas/core/indexes/interval.py 93.08% <0%> (-1.05%) ⬇️
pandas/plotting/_core.py 82.5% <0%> (-0.99%) ⬇️
pandas/util/testing.py 84.73% <0%> (-0.96%) ⬇️
... and 94 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e48393...b2a6496. Read the comment docs.


Parameters
----------
excel_writer : string or ExcelWriter object
File path or existing ExcelWriter
File path or existing ExcelWriter.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we link to the class? I believe

:class:`pandas.ExcelWriter`

should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the extended description we say that an ExcelWriter object is necessary to be able to use .to_excel, but in the parameter description we say that we also accept a "file path". So the ExcelWriter is not really needed? What happens if we specify a file path? I guess that a new ExcelWriter gets automatically created with default options.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this is allowed df1.to_excel("example.xlsx", sheet_name='Sheet3')

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the ExcelWriter is not required

Write %(klass)s to an excel sheet.

To write a %(klass)s to an excel .xlsx file it is necessary to first create
an ExcelWriter object with a target file name, and specify a sheet in the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since ExcelWriter is a class I believe it should be between backticks to make it itallic like

`ExcelWriter`

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

... columns=['col 1', 'col 2'])
>>> writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
>>> df1.to_excel(writer, sheet_name='Sheet1')
>>> writer.save()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've observed this is only needed if we are creating an ExcelWriter explicitly. If one specifies a file path, like df.to_excel('my_file.xml') it is not needed.

Perhaps it makes sense to show first an example with a file path, which is simpler, and then an example with an ExcelWriter where we can specify more custom options?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added

>>> writer2.save()

Limit floats to a fixed precision using float_format. For example
float_format="%.2f" will format 0.1234 to 0.12.

For compatibility with to_csv, to_excel serializes lists and dicts to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add a link to the method pandas.DataFrame.to_csv here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

To write a %(klass)s to an excel .xlsx file it is necessary to first create
an ExcelWriter object with a target file name, and specify a sheet in the
file to write to. Multiple sheets may be written to by
specifying unique sheet_name. With all data written to the file it is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sheet name between backquotes since it's a variable name, see http://numpydoc.readthedocs.io/en/latest/format.html#sections

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@robmarkcole
Copy link
Contributor Author

robmarkcole commented Mar 14, 2018

I've completely wiped by env to try and fix this error but still it persists. Means I am not able to run the validation script:

(pandas_dev) Robins-MacBook-Air:pandas robincole$ python scripts/validate_docstrings.py pandas.DataFrame.to_excel
Traceback (most recent call last):
  File "scripts/validate_docstrings.py", line 37, in <module>
    import pandas
  File "/Users/robincole/Documents/Github/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/Users/robincole/Documents/Github/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/Users/robincole/Documents/Github/pandas/pandas/core/groupby.py", line 48, in <module>
    from pandas.core.frame import DataFrame
  File "/Users/robincole/Documents/Github/pandas/pandas/core/frame.py", line 75, in <module>
    from pandas.core.series import Series
  File "/Users/robincole/Documents/Github/pandas/pandas/core/series.py", line 115, in <module>
    class Series(base.IndexOpsMixin, generic.NDFrame):
  File "/Users/robincole/Documents/Github/pandas/pandas/core/series.py", line 2942, in Series
    @Appender(generic._shared_docs['to_excel'] % _shared_doc_kwargs)
TypeError: not enough arguments for format string

Hopefully I can fix this

@pep8speaks
Copy link

pep8speaks commented Mar 14, 2018

Hello @robmarkcole! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 24, 2018 at 16:02 Hours UTC

Write %(klass)s to an excel sheet.

To write a %(klass)s to an excel .xlsx file it is necessary to first create
an `ExcelWriter` object with a target file name, and specify a sheet in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, this is not required.
But I think it is needed if you want to write multiple sheets, so that would be good for in the examples.

See http://pandas-docs.github.io/pandas-docs-travis/io.html#writing-excel-files-to-disk

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

See Also
--------
pandas.read_excel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add pandas.ExcelWriter ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

Limit floats to a fixed precision using float_format. For example
float_format="%.2f" will format 0.1234 to 0.12.

For compatibility with `to_csv <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html>`__ ,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can make this :meth:`~DataFrame.to_csv` to auto-generate the link

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

data without rewriting the whole workbook.

Limit floats to a fixed precision using float_format. For example
float_format="%.2f" will format 0.1234 to 0.12.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think this one is generating the problem you have, because due to the % the templating / shared_docs system thinks it has to fill in something here, which it can't ...
You can fix it by doing "%%.2f" (and in the actual docstring it will look like a single %)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK

@TomAugspurger
Copy link
Contributor

@robmarkcole were you able to sort out your issues? That error means that you're failing to pass a keyword to some other place that's using this shared docstring. Do you have time to finish this off?

@robmarkcole
Copy link
Contributor Author

@TomAugspurger I want able to solve and an not sure how I will. I am on holiday this week without computer so will have to take a look next week. Sorry for the delay

@jorisvandenbossche
Copy link
Member

See my comment here #20185 (comment) for a solution.

Sorry for the delay

No problem!

@robmarkcole
Copy link
Contributor Author

@TomAugspurger @jorisvandenbossche sorry this fell off my radar after my holiday, have addressed your comments now. I'm still unable to run the validation script but hopefully all changes are OK

%(versionadded_to_excel)s
Write %(klass)s to an excel sheet.

To write a single %(klass)s to an excel .xlsx file it is only necessary to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be simplified if someone addresses #21835 with a See Also link to ExcelWriter instead of all of this detail (one or two lines may still suffice)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a little verbose, but I wouldn't cut this down that drastically IMO.

* Added notes
* Moved float_format section
* Removed "default None"
@TomAugspurger
Copy link
Contributor

Thanks @robmarkcole!

@TomAugspurger TomAugspurger merged commit 171076c into pandas-dev:master Jul 24, 2018
@robmarkcole robmarkcole deleted the to_excel_docstring_fixes branch July 25, 2018 04:49
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants