DOC: update docs for read_csv().na_values and keep_default_na #14030

OXPHOS · 2016-08-18T08:05:13Z

closes DOC: read_excel() documentation for na_values should show default na values #13967
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

jreback · 2016-08-18T10:42:23Z

pandas/io/excel.py

@@ -25,6 +25,7 @@
 import pandas.core.common as com
 from warnings import warn
 from distutils.version import LooseVersion
+from parsers import _NA_VALUES


Add the full import path. Actually, we should move these to pandas.io.common.

codecov-io · 2016-08-18T19:02:34Z

Current coverage is 85.25% (diff: 100%)

Merging #14030 into master will decrease coverage by <.01%

@@             master     #14030   diff @@
==========================================
  Files           139        139          
  Lines         50380      50383     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42953      42955     +2   
- Misses         7427       7428     +1   
  Partials          0          0

Powered by Codecov. Last update 453bc26...ff1964b

OXPHOS · 2016-08-18T19:26:17Z

Do I need to add a whatsnew entry for this? And if so, in which whatsnew file?

Also I am not sure about this error:

2016-08-18 18:50:05 (7.84 MB/s) - 'miniconda.sh' saved [21310143/21310143]
ERROR: File or directory already exists: /Users/travis/miniconda

jorisvandenbossche · 2016-08-18T20:02:34Z

whatsnew is not needed. And don't worry about the travis error, that is something unrelated.

jorisvandenbossche · 2016-08-18T20:10:50Z

pandas/io/excel.py

+    na_values : str or list-like or dict, default None
+        Additional strings to recognize as NA/NaN. If dict passed, specific
+        per-column NA values. By default the following values are interpreted
+        as NaN: `'""" + "'`, `'".join(sorted(_NA_VALUES)) + """'`.


I am afraid this does not work with docstrings. You will have to hardcode them, as is done in the read_csv docstring.
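As a rough illustration of the point above: Python only treats a plain string literal in first position as a docstring, so a concatenation that involves a function call leaves `__doc__` empty. The sketch below uses a shortened, illustrative subset of the NA values and two hypothetical functions (`concatenated`, `literal`) that are not part of pandas.

```python
# Shortened, illustrative subset of the default NA values (an assumption,
# not the full pandas list).
_NA_VALUES = {"", "#N/A", "NA", "NULL", "NaN", "nan"}


def concatenated():
    # The first statement is an expression (string + call result + string),
    # not a bare string literal, so it is NOT stored as the docstring.
    """Values read as NaN: '""" + "', '".join(sorted(_NA_VALUES)) + """'."""


def literal():
    """Values read as NaN: '', '#N/A', 'NA', 'NULL', 'NaN', 'nan'."""


print(concatenated.__doc__)  # None
print(literal.__doc__)
```

This is why the values either have to be written out by hand or attached to the function after the string has been built.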

OXPHOS · 2016-08-19T07:36:22Z

@jorisvandenbossche Hey, I made some changes. Is there any way I can preview it?

jorisvandenbossche · 2016-08-19T07:38:05Z

You can just look at the docstring in the console to see if it looks ok (with ipython: pd.read_excel?)
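Outside IPython, a similar preview can be had from the standard library. The sketch below uses `inspect.getdoc` on a hypothetical stand-in function (`read_sketch` is an assumption, not pandas code); with pandas installed, the same call on `pd.read_excel` should show the real docstring.

```python
import inspect


def read_sketch(path, na_values=None):
    """Read a file (hypothetical stand-in for read_excel).

    na_values : str or list-like or dict, default None
        Additional strings to recognize as NA/NaN.
    """
    return path, na_values


# getdoc() returns the docstring with the common leading indentation removed,
# which is close to what `read_sketch?` would display in IPython.
doc = inspect.getdoc(read_sketch)
print(doc)
```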

jreback · 2016-08-19T10:15:23Z

pandas/io/excel.py

 def read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0,
               index_col=None, names=None, parse_cols=None, parse_dates=False,
               date_parser=None, na_values=None, thousands=None,
               convert_float=True, has_index_names=None, converters=None,
               engine=None, squeeze=False, **kwds):
-    """


I don't think it's necessary to make this an Appender docstring. We generally only do that if you need it more than once. @jorisvandenbossche ?

Well, it's to be able to automatically inject the _NA_VALUES instead of manually writing them in the docstring.

It's a bit of a trade-off here: is the added complexity of moving the docstring away from the actual function worth it for the one line of NA values that is otherwise written manually (and has to be updated if the default recognized NaN values change)?

Oh, I see. Then this is OK. We want to have the common NA values in a single place.

jorisvandenbossche · 2016-08-19T11:08:26Z

pandas/io/excel.py

+na_values : str or list-like or dict, default None
+    Additional strings to recognize as NA/NaN. If dict passed, specific
+    per-column NA values. By default the following values are interpreted
+    as NaN: `'""" + "'`, `'".join(sorted(_NA_VALUES)) + """'`.


I think the backticks are not needed around the strings. For the rest, the docstring looks fine in the console!

OXPHOS · 2016-08-20T00:31:15Z

I squashed and removed the backticks.

I did test:

In [12]: _NA_VALUES = set([
   ....:     '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
   ....:     'N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan', ''
   ....: ])

In [13]: s = """as NaN: '""" + "', '".join(sorted(_NA_VALUES)) + """'."""

In [14]: print s
as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.

I didn't check the doc with an installed build and pd.read_excel? because I started getting a weird error out of nowhere. I am trying to fix it, but I have no clue yet how this is happening. Should I post the problem somewhere if I have no luck fixing it? Like the Google group or the IRC channel? Thanks!

OXPHOS · 2016-08-20T08:11:46Z

Okay I fixed it. Something's wrong with Cython.
Ready to merge!

jreback · 2016-08-20T13:26:32Z

thanks!

Can you do a follow-up where we make this wrap on the NaNs (they generate an extra-long line)?

na_values : str or list-like or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific
    per-column NA values. By default the following values are interpreted
    as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
thousands : str, default None
    Thousands separator for parsing string columns to numeric.  Note that
    this parameter is only necessary for columns stored as TEXT in Excel,
    any numeric columns will automatically be parsed, regardless of display
    format.

OXPHOS · 2016-08-22T05:56:42Z

Sure I'll try!

sinhrks added the Docs label Aug 18, 2016

jreback reviewed Aug 18, 2016

jreback added the IO Excel read_excel, to_excel label Aug 18, 2016

OXPHOS force-pushed the excel_na branch from cd1ba19 to e67209f Compare August 18, 2016 18:39

jorisvandenbossche reviewed Aug 18, 2016

OXPHOS force-pushed the excel_na branch from e67209f to 095322e Compare August 19, 2016 04:22

jreback reviewed Aug 19, 2016

jreback added this to the 0.19.0 milestone Aug 19, 2016

jorisvandenbossche reviewed Aug 19, 2016

GH13967: move around _NA_VALUES and add doc for read_csv().na_values

ff1964b

OXPHOS force-pushed the excel_na branch from 095322e to ff1964b Compare August 19, 2016 18:14

jreback closed this in 5c78ee6 Aug 20, 2016

jorisvandenbossche mentioned this pull request Aug 22, 2016

Document the "default NaN values" in Parser #5153

Closed

OXPHOS mentioned this pull request Aug 23, 2016

DOC: split docstring into multiple lines in excel.py #14073

Merged

4 tasks
