DOC/BUG: NA string in data gets read as NaN #4318

brainysmurf · 2013-07-22T17:16:59Z

better docs for keep_default_na and na_values when used together

I have a field set where "NA" is meaningful, but not meaning "not applicable" (it refers to the name of my school's Homeroom). There's no obvious way to handle this edge case...
On 0.11.0

jreback · 2013-07-22T17:19:33Z

pass keep_default_na=False to read_csv, then it will only recognize NA's that you explicitly pass in na_values (none by default)
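
A minimal sketch of the suggestion above, assuming a reasonably recent pandas. The CSV text and column names are made up for illustration. It parses the same data twice so the default behaviour can be compared with keep_default_na=False.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a real homeroom name, not a missing value.
CSV_TEXT = "student,homeroom\nalice,NA\nbob,7B\n"

# Default behaviour: the string "NA" is on the default list and becomes NaN.
default_df = pd.read_csv(io.StringIO(CSV_TEXT))

# With keep_default_na=False and no na_values, "NA" stays a plain string.
literal_df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

print(default_df["homeroom"].isna().tolist())
print(literal_df["homeroom"].tolist())
```

The second frame keeps the homeroom name as text, which is what the original question needs.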

brainysmurf · 2013-07-22T17:21:22Z

Ah. The doc for that isn't clear:
If na_values are specified and keep_default_na is False the default NaN
values are overridden, otherwise they're appended to

jreback · 2013-07-22T17:24:25Z

so I think if you want NO values converted to NA's, do keep_default_na=False, na_values=[] ... otherwise put what you want in na_values
(i.e. you need to have na_values be not None)

maybe should put an example....(and/or make a bit clearer in docs)

let us know if that works

brainysmurf · 2013-07-23T02:13:41Z

Definitely worked. I just think this might be a common problem: the docs don't make the behaviour clear at all, and it's really convoluted. How about this:
To customize what constitutes a null value, set keep_default_na to False and pass the list of values that are to be interpreted as null to na_values. Or, if more values than the default are to be interpreted as null, leave keep_default_na as True (the default) and pass the additional values to na_values.

jreback · 2013-07-23T02:19:42Z

great
want to do a PR for this? (prob docstring and io.rst), maybe even add an example of doing this

brainysmurf · 2013-07-23T02:26:15Z

Sure, happy to contribute.

hayd · 2013-07-23T14:37:42Z

(if you use keep_default_na=False, I don't think you need to add na_values=[]... the default behaviour is to have no NA values.)

jreback · 2013-07-23T15:33:20Z

actually...just tested with keep_default_na=False and not passing na_values, which defaults it to None

unfortunately

na_values = set(['None', None])

which prob works, but seems wrong....

(just needs a small change in pandas.io.parsers._clean_na_values to fix this)
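
A quick way to check the case described above, sketched under the assumption that the installed pandas already includes the fix. The CSV content is made up. The check is that with keep_default_na=False and na_values left unset, neither the literal string "None" nor "NA" is turned into NaN.

```python
import io

import pandas as pd

# Hypothetical data containing strings that look like missing-value markers.
CSV_TEXT = "label\nNone\nNA\nok\n"

# na_values is left at its default (None); only keep_default_na is changed.
df = pd.read_csv(io.StringIO(CSV_TEXT), keep_default_na=False)

# Every cell should survive as the string that was in the file.
values = df["label"].tolist()
has_missing = bool(df["label"].isna().any())

print(values)
print(has_missing)
```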

hayd · 2013-07-23T16:03:41Z

Good spot! (I had only checked for "NA", which is NaN'd normally but left with keep_default_na=False.)

jreback · 2013-07-26T19:05:42Z

@brainysmurf

ok I fixed the bug in #4374, what do you think we should change in the docstring/docs to make more clear? (If you put an updated text I can just paste it in)

do we want to throw in an example?

you are also welcome to do a PR here if you'd like...

lmk

brainysmurf · 2013-07-27T04:25:30Z

Hi not sure what a PR is.

"To customize which values are read as NaN, use a combination of keep_default_na and na_values. To completely override the default behavior, set the former to False and pass the overriding list of values that will be read as NaN to the latter. To simply add more values to the defaults (which are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN', '#N/A N/A', 'NA', '#NA', 'NULL', 'NaN', 'nan']), pass the list that will be appended to na_values."

pd.read_csv(path, keep_default_na=False, na_values=[""]) # only an empty field is NaN
pd.read_csv(path, keep_default_na=False, na_values=["NA", "0"]) # only the strings "NA" and "0" are NaN
pd.read_csv(path, na_values=["Nope"]) # the string "Nope" is treated as NaN in addition to the default values
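
The three calls above can be tried without a file on disk. This is a sketch only: the CSV text, the column names, and the na_mask helper are hypothetical, and an in-memory buffer stands in for path.

```python
import io

import pandas as pd

# Hypothetical CSV: the "code" column holds "NA", "0", "Nope", an empty field, "kept".
CSV_TEXT = "id,code\n1,NA\n2,0\n3,Nope\n4,\n5,kept\n"


def na_mask(**read_csv_kwargs):
    """Parse CSV_TEXT and report which 'code' cells were read as NaN."""
    frame = pd.read_csv(io.StringIO(CSV_TEXT), **read_csv_kwargs)
    return frame["code"].isna().tolist()


# Override the defaults: only the empty field counts as missing.
only_empty = na_mask(keep_default_na=False, na_values=[""])

# Override the defaults: only the strings "NA" and "0" count as missing.
na_and_zero = na_mask(keep_default_na=False, na_values=["NA", "0"])

# Extend the defaults: "Nope" is added to the built-in list.
defaults_plus_nope = na_mask(na_values=["Nope"])

for name, mask in [
    ("only_empty", only_empty),
    ("na_and_zero", na_and_zero),
    ("defaults_plus_nope", defaults_plus_nope),
]:
    print(name, mask)
```

Each mask has one entry per data row, so the effect of every keep_default_na / na_values combination can be compared row by row.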

cpcloud · 2013-07-27T12:46:04Z

a PR is a pull request 😄

https://help.github.com/articles/using-pull-requests

jreback · 2013-07-29T16:14:46Z

@brainysmurf see the PR #4374, added some nice shiny docs on na_values/infinity parsing, should be a lot more clear. but pls comment if you feel that is not the case

jreback mentioned this issue Jul 26, 2013

BUG: Fixed passing keep_default_na=False when na_values=None (GH4318) #4374

Merged

jreback closed this as completed in #4374 Jul 30, 2013
