Skip to content

DOC: update docs for read_csv().na_values and keep_default_na #14030

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

OXPHOS
Copy link
Contributor

@OXPHOS OXPHOS commented Aug 18, 2016

@sinhrks sinhrks added the Docs label Aug 18, 2016
@@ -25,6 +25,7 @@
import pandas.core.common as com
from warnings import warn
from distutils.version import LooseVersion
from parsers import _NA_VALUES
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the full import path. Actually we should move these to pandas.io.common.

@jreback jreback added the IO Excel read_excel, to_excel label Aug 18, 2016
@codecov-io
Copy link

codecov-io commented Aug 18, 2016

Current coverage is 85.25% (diff: 100%)

Merging #14030 into master will decrease coverage by <.01%

@@             master     #14030   diff @@
==========================================
  Files           139        139          
  Lines         50380      50383     +3   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42953      42955     +2   
- Misses         7427       7428     +1   
  Partials          0          0          

Powered by Codecov. Last update 453bc26...ff1964b

@OXPHOS
Copy link
Contributor Author

OXPHOS commented Aug 18, 2016

Do I need add what's new for this? And if so, in which what's new file?

Also I am not sure about this error:

2016-08-18 18:50:05 (7.84 MB/s) - 'miniconda.sh' saved [21310143/21310143]
ERROR: File or directory already exists: /Users/travis/miniconda

@jorisvandenbossche
Copy link
Member

whatsnew is not needed. And don't worry about the travis error, that is something unrelated.

na_values : str or list-like or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted
as NaN: `'""" + "'`, `'".join(sorted(_NA_VALUES)) + """'`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am afraid this does not work with docstrings. You will have to hardcode them like it is done in the read_csv docstring

@OXPHOS
Copy link
Contributor Author

OXPHOS commented Aug 19, 2016

@jorisvandenbossche Hey I made some changes. Is there anyway I can preview it?

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Aug 19, 2016

You can just look at the docstring in the console to see if it looks ok (with ipython: pd.read_excel?)

def read_excel(io, sheetname=0, header=0, skiprows=None, skip_footer=0,
index_col=None, names=None, parse_cols=None, parse_dates=False,
date_parser=None, na_values=None, thousands=None,
convert_float=True, has_index_names=None, converters=None,
engine=None, squeeze=False, **kwds):
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its necessary to make this an Appender doc-string. We generally only do that if you need it more than once. @jorisvandenbossche ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's to be able to automatically inject the NA_VALUES instead of manually writing it in the docstring.

It's a bit a trade-off here: is the added complexity of moving the docstring away from the actual function worth it for the 1 line of na values that is written manually (and has to be updated if the default recognized NaN values are changed).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I c. Then this is ok. We want to have the common NA values in a single place

@jreback jreback added this to the 0.19.0 milestone Aug 19, 2016
na_values : str or list-like or dict, default None
Additional strings to recognize as NA/NaN. If dict passed, specific
per-column NA values. By default the following values are interpreted
as NaN: `'""" + "'`, `'".join(sorted(_NA_VALUES)) + """'`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the backticks are not needed around the strings. For the rest, the docstring looks fine in the console!

@OXPHOS
Copy link
Contributor Author

OXPHOS commented Aug 20, 2016

I squashed and removed the backticks.

I did test:

In [12]: _NA_VALUES = set([
   ....:     '-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', '#N/A',
   ....:     'N/A', 'NA', '#NA', 'NULL', 'NaN', '-NaN', 'nan', '-nan', ''
   ....: ])

In [13]: s = """as NaN: '""" + "', '".join(sorted(_NA_VALUES)) + """'."""

In [14]: print s
as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.

I didn't check the doc with installation and pd.read_excel? because I start getting weird error out of nowhere. I am trying to fix it but I have no clue how this is happening yet. So should I post the problem in somewhere if I have no luck hacking it? Like google group or irc channel? Thanks!

@OXPHOS
Copy link
Contributor Author

OXPHOS commented Aug 20, 2016

Okay I fixed it. Something's wrong with Cython.
Ready to merge!

@jreback jreback closed this in 5c78ee6 Aug 20, 2016
@jreback
Copy link
Contributor

jreback commented Aug 20, 2016

thanks!

Can you do a followup where we make this wrap on the NaNs (they generate an extra long line)

na_values : str or list-like or dict, default None
    Additional strings to recognize as NA/NaN. If dict passed, specific
    per-column NA values. By default the following values are interpreted
    as NaN: '', '#N/A', '#N/A N/A', '#NA', '-1.#IND', '-1.#QNAN', '-NaN', '-nan', '1.#IND', '1.#QNAN', 'N/A', 'NA', 'NULL', 'NaN', 'nan'.
thousands : str, default None
    Thousands separator for parsing string columns to numeric.  Note that
    this parameter is only necessary for columns stored as TEXT in Excel,
    any numeric columns will automatically be parsed, regardless of display
    format.

@OXPHOS
Copy link
Contributor Author

OXPHOS commented Aug 22, 2016

Sure I'll try!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Docs IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: read_excel() documentation for na_values should show default na values
5 participants