na_filter=False ignored when index_col set #5239

cancan101 opened this issue Oct 16, 2013 · 3 comments
Labels
Bug IO CSV read_csv, to_csv Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate

@cancan101
Contributor

Given the following CSV file:

u1,u2,u3,d1,d2,d3,d4
Good Things,C,,1,1,1,1
Good Things,R,,1,1,1,1
Bad Things,C,,1,1,1,1
Bad Things,T,,1,1,1,1
Okay Things,N,B,1,1,1,1
Okay Things,N,D,1,1,1,1
Okay Things,B,,1,1,1,1
Okay Things,D,,1,1,1,1

First I parse with na_filter=True:

In [13]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=True)
Out[13]: 
            u1 u2   u3  d1  d2  d3  d4
0  Good Things  C  NaN   1   1   1   1
1  Good Things  R  NaN   1   1   1   1
2   Bad Things  C  NaN   1   1   1   1
3   Bad Things  T  NaN   1   1   1   1
4  Okay Things  N    B   1   1   1   1
5  Okay Things  N    D   1   1   1   1
6  Okay Things  B  NaN   1   1   1   1
7  Okay Things  D  NaN   1   1   1   1

then I parse with na_filter=False:

In [12]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False)
Out[12]: 
            u1 u2 u3  d1  d2  d3  d4
0  Good Things  C      1   1   1   1
1  Good Things  R      1   1   1   1
2   Bad Things  C      1   1   1   1
3   Bad Things  T      1   1   1   1
4  Okay Things  N  B   1   1   1   1
5  Okay Things  N  D   1   1   1   1
6  Okay Things  B      1   1   1   1
7  Okay Things  D      1   1   1   1

then with index_col set (the empty u3 values come back as NaN even though na_filter=False):

In [11]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False,index_col=[0,1,2],)
Out[11]: 
                    d1  d2  d3  d4
u1          u2 u3                 
Good Things C  NaN   1   1   1   1
            R  NaN   1   1   1   1
Bad Things  C  NaN   1   1   1   1
            T  NaN   1   1   1   1
Okay Things N  B     1   1   1   1
               D     1   1   1   1
            B  NaN   1   1   1   1
            D  NaN   1   1   1   1

Finally setting na_values=[], keep_default_na=False seems to fix the issue:

In [14]: pd.read_csv("/home/alex/nan_issue.csv", na_filter=False,index_col=[0,1,2],na_values=[], keep_default_na=False)
Out[14]: 
                   d1  d2  d3  d4
u1          u2 u3                
Good Things C       1   1   1   1
            R       1   1   1   1
Bad Things  C       1   1   1   1
            T       1   1   1   1
Okay Things N  B    1   1   1   1
               D    1   1   1   1
            B       1   1   1   1
            D       1   1   1   1
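
For anyone who wants to try this without a file on disk, here is a minimal sketch that feeds the same CSV through `io.StringIO` and counts the NaN entries in `u3` for each of the four calls above. The helper names `read` and `count_missing_u3` are made up for illustration. The count for the `index_col` case depends on the pandas version, so it is only printed, not assumed.

```python
import io

import pandas as pd

# Same data as the CSV file in the report, held in memory.
CSV_TEXT = """u1,u2,u3,d1,d2,d3,d4
Good Things,C,,1,1,1,1
Good Things,R,,1,1,1,1
Bad Things,C,,1,1,1,1
Bad Things,T,,1,1,1,1
Okay Things,N,B,1,1,1,1
Okay Things,N,D,1,1,1,1
Okay Things,B,,1,1,1,1
Okay Things,D,,1,1,1,1
"""


def read(**kwargs):
    """Hypothetical helper: parse the in-memory CSV with the given options."""
    return pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)


def count_missing_u3(df):
    """Count NaN entries in u3, whether it is a column or an index level."""
    if "u3" in df.columns:
        values = df["u3"]
    else:
        values = df.index.get_level_values("u3")
    return int(pd.isna(values).sum())


default = read(na_filter=True)
no_filter = read(na_filter=False)
indexed = read(na_filter=False, index_col=[0, 1, 2])
workaround = read(
    na_filter=False,
    index_col=[0, 1, 2],
    na_values=[],
    keep_default_na=False,
)

for label, frame in [
    ("na_filter=True", default),
    ("na_filter=False", no_filter),
    ("na_filter=False + index_col", indexed),  # a nonzero count here is the bug
    ("workaround", workaround),
]:
    print(f"{label:30s} NaN entries in u3: {count_missing_u3(frame)}")
```

A nonzero count on the third line would indicate the behaviour reported here; the workaround line should show zero either way.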
@jreback
Contributor

jreback commented Oct 16, 2013

There are very limited tests with na_filter=False and it's a pretty silly parameter, so moving to low-priority. You are welcome to do a PR if you'd like.

@cancan101
Contributor Author

TBH I can work around this for now. My observation is that there are probably too many parameters on the method that deal with NaN handling: keep_default_na, na_filter, na_values. It would be great to clean this up.

There is also interaction between these parameters.
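
To make that interaction concrete, here is a small sketch that reads one column under several combinations of the three parameters and counts the resulting NaN values. The sample data, the `CONFIGS` labels, and the `count_nan` helper are assumptions made for illustration. The behaviour shown is what the current `read_csv` documentation describes for ordinary (non-index) columns and may not match the 2013 release discussed in this thread.

```python
import io

import pandas as pd

# One empty field, one default NA marker ("NA"), one custom marker
# ("missing"), and one ordinary value.
CSV_TEXT = "id,x\n1,\n2,NA\n3,missing\n4,ok\n"

# Hypothetical labels mapped to read_csv keyword arguments.
CONFIGS = {
    "defaults": {},
    "na_values only": {"na_values": ["missing"]},
    "keep_default_na=False": {"keep_default_na": False},
    "keep_default_na=False + na_values": {
        "keep_default_na": False,
        "na_values": ["missing"],
    },
    "na_filter=False + na_values": {
        "na_filter": False,
        "na_values": ["missing"],
    },
}


def count_nan(**kwargs):
    """Parse the sample CSV and count NaN entries in column x."""
    df = pd.read_csv(io.StringIO(CSV_TEXT), **kwargs)
    return int(df["x"].isna().sum())


counts = {label: count_nan(**kwargs) for label, kwargs in CONFIGS.items()}
for label, n in counts.items():
    print(f"{label:35s} -> {n} NaN value(s) in x")
```

Roughly: `na_values` extends the default markers, `keep_default_na=False` drops the defaults so only `na_values` apply, and `na_filter=False` switches NA detection off so the other two are ignored.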

@jreback
Contributor

jreback commented Oct 16, 2013

sure... though to be honest, it would be easiest just to drop na_filter. But if you come up with a better API, great.

gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 5, 2017
gfyoung added Bug Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate and removed Prio-low labels Nov 5, 2017
gfyoung modified the milestones: Someday, Next Major Release Nov 5, 2017
jreback modified the milestones: Next Major Release, 0.21.1 Nov 6, 2017
gfyoung added a commit to forking-repos/pandas that referenced this issue Nov 6, 2017
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue Dec 8, 2017
TomAugspurger pushed a commit that referenced this issue Dec 11, 2017
3 participants