DOC: update the to_excel docstring #20185

robmarkcole · 2018-03-10T15:52:45Z

Checklist for the pandas documentation sprint (ignore this if you are doing
an unrelated PR):

[ x] PR title is "DOC: update the docstring"
[x ] The validation script passes: scripts/validate_docstrings.py <your-function-or-method>
[x ] The PEP8 style check passes: git diff upstream/master -u -- "*.py" | flake8 --diff
[ x] The html version looks good: python doc/make.py --single <your-function-or-method>
[x ] It has been proofread on language by another sprint participant

Errors found:
	Missing description for See Also "pd.read_excel" reference

cmdelatorre

What about the docstring validtion script output? There are many things to fix there (No extended summary, no examples...)

robmarkcole · 2018-03-10T20:02:21Z

@cmdelatorre I'm aware this is an incomplete PR and I'm sorry about that but we ran out of time on the sprint. I've added an extended summary covering the basics.

dukebody · 2018-03-11T21:56:53Z

pandas/core/generic.py

-        .. versionadded:: 0.20.0
+        .. versionadded:: 0.20.0.
+
+    Returns


If the function doesn't return anything I think it's better to omit this section.

dukebody · 2018-03-11T22:06:20Z

pandas/core/generic.py

        ``io.excel.xlsx.writer``, ``io.excel.xls.writer``, and
        ``io.excel.xlsm.writer``.
    merge_cells : boolean, default True
        Write MultiIndex and Hierarchical Rows as merged cells.
-    encoding: string, default None
-        encoding of the resulting excel file. Only necessary for xlwt,
+    encoding : string, default 'ascii'


How did you figure this out? Reading the code of DataFrame.to_excel it looks like this parameter is ignored!

You are correct

dukebody · 2018-03-11T22:08:22Z

pandas/core/generic.py

    float_format : string, default None
-        Format string for floating point numbers
+        Format string for floating point numbers.


Can you mention some examples and/or link to where the format of this parameter is documented?

Formatting takes place here using the string modulo operator which is described here. I've added an example.

dukebody · 2018-03-11T22:10:51Z

pandas/core/generic.py

    float_format : string, default None
-        Format string for floating point numbers
+        Format string for floating point numbers.
    columns : sequence, optional


Sequence of what? I think this should be "list of str or list of int" to select columns.

Also, add "If not specified, all columns will be written".

BTW I think this parameter probably makes no sense for Series.to_excel and the docstring is shared. We should find a way to not show this parameter in the docs in this case.

Added list of strings

dukebody · 2018-03-11T22:15:23Z

pandas/core/generic.py

+
+    See Also
+    --------
+    pd.read_excel


fix formatting: https://python-sprints.github.io/pandas/guide/pandas_docstring.html#section-5-see-also

pandas.read_excel

dukebody · 2018-03-11T22:18:03Z

pandas/core/generic.py

    engine : string, default None
-        write engine to use - you can also set this via the options
+        Write engine to use - you can also set this via the options


Which options? I understand the string is the fully qualified import part for the write engine class. Are the ones shown there examples? Can we specify that in the docs?

It appears the options are 'openpyxl', 'xlsxwriter', 'xlwt', although in my testing only 'openpyxl', 'xlsxwriter', work

codecov · 2018-03-13T21:10:08Z

Codecov Report

Merging #20185 into master will decrease coverage by 0.17%.
The diff coverage is 93.33%.

@@            Coverage Diff             @@
##           master   #20185      +/-   ##
==========================================
- Coverage      92%   91.82%   -0.18%     
==========================================
  Files         170      153      -17     
  Lines       50609    49256    -1353     
==========================================
- Hits        46561    45229    -1332     
+ Misses       4048     4027      -21

Flag	Coverage Δ
#multiple	`90.21% <93.33%> (-0.2%)`	⬇️
#single	`41.9% <23.33%> (-0.29%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`95.94% <93.33%> (-0.53%)`	⬇️
pandas/util/_doctools.py	`0% <0%> (-12.88%)`	⬇️
pandas/core/arrays/base.py	`84.14% <0%> (-3.72%)`	⬇️
pandas/io/formats/printing.py	`89.38% <0%> (-3.71%)`	⬇️
pandas/core/groupby/groupby.py	`92.55% <0%> (-3.58%)`	⬇️
pandas/core/dtypes/missing.py	`91.07% <0%> (-2.5%)`	⬇️
pandas/core/reshape/tile.py	`93.37% <0%> (-1.37%)`	⬇️
pandas/core/indexes/interval.py	`93.08% <0%> (-1.05%)`	⬇️
pandas/plotting/_core.py	`82.5% <0%> (-0.99%)`	⬇️
pandas/util/testing.py	`84.73% <0%> (-0.96%)`	⬇️
... and 94 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3e48393...b2a6496. Read the comment docs.

dukebody · 2018-03-14T07:02:31Z

pandas/core/generic.py


    Parameters
    ----------
    excel_writer : string or ExcelWriter object
-        File path or existing ExcelWriter
+        File path or existing ExcelWriter.


Can we link to the class? I believe

:class:`pandas.ExcelWriter`

should work.

Also, in the extended description we say that an ExcelWriter object is necessary to be able to use .to_excel, but in the parameter description we say that we also accept a "file path". So the ExcelWriter is not really needed? What happens if we specify a file path? I guess that a new ExcelWriter gets automatically created with default options.

Yes this is allowed df1.to_excel("example.xlsx", sheet_name='Sheet3')

Yes, the ExcelWriter is not required

dukebody · 2018-03-14T07:11:21Z

pandas/core/generic.py

+    Write %(klass)s to an excel sheet.
+
+    To write a %(klass)s to an excel .xlsx file it is necessary to first create
+    an ExcelWriter object with a target file name, and specify a sheet in the


Since ExcelWriter is a class I believe it should be between backticks to make it itallic like

`ExcelWriter`

dukebody · 2018-03-14T07:13:17Z

pandas/core/generic.py

+    ...                   columns=['col 1', 'col 2'])
+    >>> writer = pd.ExcelWriter('output.xlsx', engine='xlsxwriter')
+    >>> df1.to_excel(writer, sheet_name='Sheet1')
+    >>> writer.save()


I've observed this is only needed if we are creating an ExcelWriter explicitly. If one specifies a file path, like df.to_excel('my_file.xml') it is not needed.

Perhaps it makes sense to show first an example with a file path, which is simpler, and then an example with an ExcelWriter where we can specify more custom options?

dukebody · 2018-03-14T07:14:01Z

pandas/core/generic.py

+    >>> writer2.save()
+
+    Limit floats to a fixed precision using float_format. For example
+    float_format="%.2f" will format 0.1234 to 0.12.

    For compatibility with to_csv, to_excel serializes lists and dicts to


Can we add a link to the method pandas.DataFrame.to_csv here as well?

dukebody · 2018-03-14T07:16:10Z

pandas/core/generic.py

+    To write a %(klass)s to an excel .xlsx file it is necessary to first create
+    an ExcelWriter object with a target file name, and specify a sheet in the
+    file to write to. Multiple sheets may be written to by
+    specifying unique sheet_name. With all data written to the file it is


Sheet name between backquotes since it's a variable name, see http://numpydoc.readthedocs.io/en/latest/format.html#sections

robmarkcole · 2018-03-14T09:34:54Z

I've completely wiped by env to try and fix this error but still it persists. Means I am not able to run the validation script:

(pandas_dev) Robins-MacBook-Air:pandas robincole$ python scripts/validate_docstrings.py pandas.DataFrame.to_excel
Traceback (most recent call last):
  File "scripts/validate_docstrings.py", line 37, in <module>
    import pandas
  File "/Users/robincole/Documents/Github/pandas/pandas/__init__.py", line 42, in <module>
    from pandas.core.api import *
  File "/Users/robincole/Documents/Github/pandas/pandas/core/api.py", line 10, in <module>
    from pandas.core.groupby import Grouper
  File "/Users/robincole/Documents/Github/pandas/pandas/core/groupby.py", line 48, in <module>
    from pandas.core.frame import DataFrame
  File "/Users/robincole/Documents/Github/pandas/pandas/core/frame.py", line 75, in <module>
    from pandas.core.series import Series
  File "/Users/robincole/Documents/Github/pandas/pandas/core/series.py", line 115, in <module>
    class Series(base.IndexOpsMixin, generic.NDFrame):
  File "/Users/robincole/Documents/Github/pandas/pandas/core/series.py", line 2942, in Series
    @Appender(generic._shared_docs['to_excel'] % _shared_doc_kwargs)
TypeError: not enough arguments for format string

Hopefully I can fix this

pep8speaks · 2018-03-14T10:19:14Z

Hello @robmarkcole! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 24, 2018 at 16:02 Hours UTC

jorisvandenbossche · 2018-03-14T15:18:23Z

pandas/core/generic.py

+    Write %(klass)s to an excel sheet.
+
+    To write a %(klass)s to an excel .xlsx file it is necessary to first create
+    an `ExcelWriter` object with a target file name, and specify a sheet in the


As discussed, this is not required.
But I think it is needed if you want to write multiple sheets, so that would be good for in the examples.

See http://pandas-docs.github.io/pandas-docs-travis/io.html#writing-excel-files-to-disk

jorisvandenbossche · 2018-03-14T15:21:42Z

pandas/core/generic.py

+    See Also
+    --------
+    pandas.read_excel
+


Can you add pandas.ExcelWriter ?

jorisvandenbossche · 2018-03-14T15:24:15Z

pandas/core/generic.py

+    Limit floats to a fixed precision using float_format. For example
+    float_format="%.2f" will format 0.1234 to 0.12.
+
+    For compatibility with `to_csv <https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html>`__ ,


you can make this :meth:`~DataFrame.to_csv` to auto-generate the link

jorisvandenbossche · 2018-03-14T15:29:05Z

pandas/core/generic.py

+    data without rewriting the whole workbook.
+
+    Limit floats to a fixed precision using float_format. For example
+    float_format="%.2f" will format 0.1234 to 0.12.


So I think this one is generating the problem you have, because due to the % the templating / shared_docs system thinks it has to fill in something here, which it can't ...
You can fix it by doing "%%.2f" (and in the actual docstring it will look like a single %)

TomAugspurger · 2018-03-16T20:15:44Z

@robmarkcole were you able to sort out your issues? That error means that you're failing to pass a keyword to some other place that's using this shared docstring. Do you have time to finish this off?

robmarkcole · 2018-03-16T22:27:13Z

@TomAugspurger I want able to solve and an not sure how I will. I am on holiday this week without computer so will have to take a look next week. Sorry for the delay

jorisvandenbossche · 2018-03-16T23:03:49Z

See my comment here #20185 (comment) for a solution.

Sorry for the delay

No problem!

@jorisvandenbossche

Address comments of @jorisvandenbossche

robmarkcole · 2018-04-04T04:34:37Z

@TomAugspurger @jorisvandenbossche sorry this fell off my radar after my holiday, have addressed your comments now. I'm still unable to run the validation script but hopefully all changes are OK

WillAyd · 2018-07-10T06:24:29Z

pandas/core/generic.py

-    %(versionadded_to_excel)s
+    Write %(klass)s to an excel sheet.
+
+    To write a single %(klass)s to an excel .xlsx file it is only necessary to


I think this can be simplified if someone addresses #21835 with a See Also link to ExcelWriter instead of all of this detail (one or two lines may still suffice)

It is a little verbose, but I wouldn't cut this down that drastically IMO.

…cel_docstring_fixes

* Added notes * Moved float_format section * Removed "default None"

TomAugspurger · 2018-07-24T16:02:34Z

Thanks @robmarkcole!

Fix to_excel parameters

3cdb8b3

cmdelatorre suggested changes Mar 10, 2018

View reviewed changes

robmarkcole added 2 commits March 10, 2018 20:26

Extend example, add Also

0e9f733

Extend summary

8eb8330

dukebody reviewed Mar 11, 2018

View reviewed changes

jorisvandenbossche added the Docs label Mar 12, 2018

Address reviewer comments

48116e2

dukebody reviewed Mar 14, 2018

View reviewed changes

Address second round of comments

69921c6

jorisvandenbossche reviewed Mar 14, 2018

View reviewed changes

Address comments

6755281

Address comments of @jorisvandenbossche

WillAyd reviewed Jul 10, 2018

View reviewed changes

TomAugspurger added 2 commits July 24, 2018 10:55

Merge remote-tracking branch 'upstream/master' into robmarkcole-to_ex…

ede409d

…cel_docstring_fixes

Updated

b2a6496

* Added notes * Moved float_format section * Removed "default None"

TomAugspurger merged commit 171076c into pandas-dev:master Jul 24, 2018

robmarkcole deleted the to_excel_docstring_fixes branch July 25, 2018 04:49

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

DOC: update the to_excel docstring (pandas-dev#20185)

3c269e1

DOC: update the to_excel docstring #20185

DOC: update the to_excel docstring #20185

Conversation

robmarkcole commented Mar 10, 2018 • edited Loading

cmdelatorre left a comment

Choose a reason for hiding this comment

robmarkcole commented Mar 10, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robmarkcole Mar 13, 2018 • edited Loading

Choose a reason for hiding this comment

codecov bot commented Mar 13, 2018 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

robmarkcole commented Mar 14, 2018 • edited Loading

pep8speaks commented Mar 14, 2018 • edited Loading

Comment last updated on July 24, 2018 at 16:02 Hours UTC

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Mar 16, 2018

robmarkcole commented Mar 16, 2018

jorisvandenbossche commented Mar 16, 2018

robmarkcole commented Apr 4, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TomAugspurger commented Jul 24, 2018

robmarkcole commented Mar 10, 2018 •

edited

Loading

robmarkcole commented Mar 10, 2018 •

edited

Loading

robmarkcole Mar 13, 2018 •

edited

Loading

codecov bot commented Mar 13, 2018 •

edited

Loading

robmarkcole commented Mar 14, 2018 •

edited

Loading

pep8speaks commented Mar 14, 2018 •

edited

Loading