DOC/BUG: NA string in data gets read as NaN #4318

Closed
brainysmurf opened this issue Jul 22, 2013 · 13 comments · Fixed by #4374
Labels: Bug; Docs; IO CSV (read_csv, to_csv); IO Data (IO issues that don't fit into a more specific label)

Comments

@brainysmurf

better docs for keep_default_na and na_values when used together

I have a field set where "NA" is meaningful, but not meaning "not applicable" (it refers to the name of my school's Homeroom). There's no obvious way to handle this edge case...
On 0.11.0
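
A minimal sketch of the behaviour described above, assuming a hypothetical two-column CSV (the column names and rows are illustrative, not taken from the real data set):

```python
import io
import pandas as pd

# Hypothetical data: "NA" is a real homeroom name here, not a missing value.
csv_text = "student,homeroom\nAlice,NA\nBob,7B\n"

# With default settings, read_csv treats the string "NA" as a missing value.
df = pd.read_csv(io.StringIO(csv_text))
homeroom_is_missing = df["homeroom"].isna().tolist()
print(homeroom_is_missing)  # Alice's "NA" homeroom is flagged as missing
```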

@jreback
Contributor

jreback commented Jul 22, 2013

pass keep_default_na=False to read_csv; then it will only recognize NA's that you explicitly pass in na_values (none by default)

@brainysmurf
Author

Ah. The doc for that isn't clear:
If na_values are specified and keep_default_na is False the default NaN
values are overridden, otherwise they're appended to

@jreback
Contributor

jreback commented Jul 22, 2013

so I think if you want NO values converted to NA's

do keep_default_na=False, na_values=[].....otherwise put what you want in na_values
(i.e. you need to have na_values be not None)

maybe should put an example....(and/or make a bit clearer in docs)

let us know if that works
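
A short sketch of that suggestion, assuming the same kind of hypothetical homeroom CSV (the data and column names are made up for illustration):

```python
import io
import pandas as pd

# Hypothetical data: "NA" is a legitimate homeroom name.
csv_text = "student,homeroom\nAlice,NA\nBob,7B\n"

# Disable the default NA strings and supply an empty list of replacements,
# so no value in the file is converted to NaN.
df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[])
homerooms = df["homeroom"].tolist()
print(homerooms)  # "NA" survives as an ordinary string
```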

@brainysmurf
Author

Definitely worked. I just think this might be a common problem; the docs don't make the behaviour clear at all and are really convoluted. How about this:
To customize what constitutes a null value, set keep_default_na to False and pass the list of values that are to be interpreted as null to na_values. Or, if more values than the default are to be interpreted as null, leave keep_default_na as True (the default) and pass the additional values to na_values.

@jreback
Contributor

jreback commented Jul 23, 2013

great
want to do a PR for this? (prob doc string and io.rst), maybe even add an example of doing this

@brainysmurf
Author

Sure, happy to contribute.

@hayd
Contributor

hayd commented Jul 23, 2013

(if you use keep_default_na=False, I don't think you need to add na_values=[]... the default behaviour is to have no NA values.)

@jreback
Contributor

jreback commented Jul 23, 2013

actually...just tested with keep_default_na=False and not passing na_values, which defaults it to None

unfortunately

na_values = set(['None', None])

which prob works, but seems wrong....

(just needs a small change in pandas.io.parsers._clean_na_values to fix this)

@hayd
Contributor

hayd commented Jul 23, 2013

Good spot! (I had only checked for "NA", which is NaN'd normally but left with keep_default_na=False.)

@jreback
Contributor

jreback commented Jul 26, 2013

@brainysmurf

ok I fixed the bug in #4374, what do you think we should change in the docstring/docs to make more clear? (If you put an updated text I can just paste it in)

do we want to throw in an example?

you are also welcome to do a PR here if you'd like...

lmk

@brainysmurf
Author

Hi not sure what a PR is.

"To customize what is read as NaN, use a combination of keep_default_na and na_values. To completely override the default behavior, set the former to False and pass the list of values that should be read as NaN to the latter. To simply add more values to the defaults (which are ['-1.#IND', '1.#QNAN', '1.#IND', '-1.#QNAN','#N/A N/A', 'NA', '#NA', 'NULL', 'NaN','nan']), pass the additional values to na_values and they will be appended to the defaults."

pd.read_csv(path, keep_default_na=False, na_values=[""]) # only an empty field is NaN
pd.read_csv(path, keep_default_na=False, na_values=["NA", "0"]) # only the strings NA and 0 are NaN
pd.read_csv(path, na_values=["Nope"]) # the string "Nope" is treated as NaN in addition to the default values
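
The three calls above can be tried side by side. This is only a sketch: the in-memory CSV and the helper `missing_mask` are hypothetical, introduced for illustration, and the results reflect how recent pandas versions behave.

```python
import io
import pandas as pd

# Hypothetical data covering each case: "NA", "0", "Nope", an empty field, and "x".
csv_text = "id,value\n1,NA\n2,0\n3,Nope\n4,\n5,x\n"

def missing_mask(**read_csv_kwargs):
    """Read the sample CSV and report which 'value' entries became NaN."""
    df = pd.read_csv(io.StringIO(csv_text), **read_csv_kwargs)
    return df["value"].isna().tolist()

# Only an empty field is NaN.
only_empty = missing_mask(keep_default_na=False, na_values=[""])

# Only the strings "NA" and "0" are NaN; the empty field stays an empty string.
na_and_zero = missing_mask(keep_default_na=False, na_values=["NA", "0"])

# Defaults (including "NA" and the empty field) plus the extra string "Nope".
defaults_plus_nope = missing_mask(na_values=["Nope"])

print(only_empty)
print(na_and_zero)
print(defaults_plus_nope)
```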

@cpcloud
Member

cpcloud commented Jul 27, 2013

a PR is a pull request 😄

https://help.github.com/articles/using-pull-requests

@jreback
Contributor

jreback commented Jul 29, 2013

@brainysmurf see the PR #4374, added some nice shiny docs on na_values/infinity parsing, should be a lot more clear. but pls comment if you feel that is not the case
