BUG: na_values with a dict of scalars #7119

Closed
jseabold opened this issue May 13, 2014 · 9 comments
Labels
Bug IO CSV read_csv, to_csv

Comments

@jseabold
Contributor

I can't get na_values to work with a dictionary.

from StringIO import StringIO
import pandas as pd

dta = pd.read_csv(StringIO("""var1,var2,var3,var4
1,2,3,MISSING
2,3,4,4.5
"""), na_values={3 : 'MISSING'})

dta = pd.read_csv(StringIO("""var1,var2,var3,var4
1,2,3,MISSING
2,3,4,4.5
"""), na_values={'var4' : 'MISSING'})

dta = pd.read_csv(StringIO("""var1,var2,var3,var4
1,2,3,MISSING
2,3,4,4.5
"""), na_values=['MISSING'])

On

[~/]
[10]: pd.version.version
[10]: '0.13.1-753-g4614ac8'
@jreback
Contributor

jreback commented May 13, 2014

try na_values={'var4' : ['MISSING'] }

A bit non-intuitive, but that's what we are testing on... probably a bug not to accept a scalar.

@jreback jreback added this to the 0.14.1 milestone May 13, 2014
@jreback jreback changed the title from "na_values does not work as expected when specifying columns in read_" to "BUG: na_values with a dict of scalars" May 13, 2014
@jreback
Contributor

jreback commented May 13, 2014

In [9]: read_csv(StringIO(data),na_values=dict(var4 = ['MISSING']))
Out[9]: 
   var1  var2  var3  var4
0     1     2     3   NaN
1     2     3     4   4.5

[2 rows x 4 columns]

@jseabold
Contributor Author

There was similar wart-y code very early on that required lists even for single variables. E.g., in dropna, etc. though I don't remember where exactly. Probably worth a patch.

@jreback
Contributor

jreback commented May 13, 2014

The docs say list of strings, but it's not hard to accept a dict of scalars/lists for a particular column (a single scalar is accepted too).
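
Until a dict of scalars is accepted, a caller could normalize the dict before passing it in. The following is only a minimal sketch of that idea: `normalize_na_values` is a hypothetical helper written for illustration, not part of pandas and not the eventual fix.

```python
from collections.abc import Iterable


def normalize_na_values(na_values):
    """Hypothetical helper: make every per-column entry a list of markers."""
    if not isinstance(na_values, dict):
        # Scalars and plain lists are passed through untouched.
        return na_values
    normalized = {}
    for column, markers in na_values.items():
        # A string is iterable, but here it is one marker, not a sequence
        # of characters, so it is wrapped like any other scalar.
        if isinstance(markers, str) or not isinstance(markers, Iterable):
            normalized[column] = [markers]
        else:
            normalized[column] = list(markers)
    return normalized


print(normalize_na_values({"var4": "MISSING"}))  # {'var4': ['MISSING']}
```

The result has the `{'var4': ['MISSING']}` shape suggested earlier in the thread, so it can be passed as `na_values=normalize_na_values({'var4': 'MISSING'})`.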

@jreback jreback modified the milestones: 0.15.0, 0.14.1 Jun 10, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@gfyoung
Member

gfyoung commented Sep 11, 2016

@jreback : #14056 resolves the second example. However, the behaviour for the first one is not entirely clear from documentation. Should we be able to accept column indices? At the moment, the answer appears to be no:

>>> from pandas.compat import StringIO
>>> from pandas import read_csv
>>> data = 'a\nfoo\n1'
>>>
>>> read_csv(StringIO(data), na_values={0: 'foo'}, engine='c')
...
TypeError: Expected list, got set
>>> read_csv(StringIO(data), na_values={0: 'foo'}, engine='python')
     a
0  foo
1    1

@jreback
Contributor

jreback commented Sep 11, 2016

@gfyoung I think column indices are ok, except if a header is passed (in which case they must be the names; easiest to avoid conflict here).
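
If keys must be names whenever a header is present, a caller who thinks in positions could translate them up front. This is a rough sketch under the assumption that the header row is already known; `na_keys_to_names` is a hypothetical helper for illustration and does not describe how pandas resolves keys internally.

```python
def na_keys_to_names(na_values, header):
    """Hypothetical helper: replace integer positions with column names."""
    resolved = {}
    for key, markers in na_values.items():
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(key, int) and not isinstance(key, bool):
            key = header[key]  # raises IndexError for an out-of-range position
        resolved[key] = markers
    return resolved


header = ["var1", "var2", "var3", "var4"]
print(na_keys_to_names({3: ["MISSING"]}, header))  # {'var4': ['MISSING']}
```

Keys that are already names pass through unchanged, so mixed dicts such as `{0: ['x'], 'var4': ['MISSING']}` also work with this sketch.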

@gfyoung
Member

gfyoung commented Sep 11, 2016

@jreback : Hmm...okay. In any case, I think this issue should be closed in favor of another one that specifically states that we can correctly process column indices.

@jreback
Contributor

jreback commented Sep 11, 2016

ok can u make one?

@gfyoung
Member

gfyoung commented Sep 11, 2016

@jreback : Done --> #14203.


3 participants