
Unexpected behavior -- crosstab with dropna=False #13556

Closed
cotterman opened this issue Jul 4, 2016 · 1 comment
Labels
Duplicate Report (duplicate issue or pull request), Reshaping (concat, merge/join, stack/unstack, explode)

Comments

@cotterman

Summary of issue

Going by its name, I expected that passing dropna=False to crosstab would give me results that include the rows in which one of my column values is missing. Instead, I found that missing values are still ignored.

Code Sample

import pandas as pd

df = pd.DataFrame({'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
                   'dragons': ['many', None, 'few', 'few', 'many', None]})
print(df)
pd.crosstab(df.geography, df.dragons)                # default (dropna=True)
pd.crosstab(df.geography, df.dragons, dropna=False)  # expected the None rows to be counted
pd.crosstab(df.dragons, df.geography, dropna=False)  # same call with the axes swapped

Expected Output

geography  NA  desert  tropic  tundra
dragons                              
NA          0       0       1       1
few         1       1       0       0
many        0       0       1       1

Actual output

In [210]:  print(df)
  dragons geography
0    many    tropic
1    None    tropic
2     few    desert
3     few      None
4    many    tundra
5    None    tundra

In [211]:  pd.crosstab(df.geography, df.dragons)
Out[211]: 
dragons    few  many
geography           
desert       1     0
tropic       0     1
tundra       0     1

In [212]:  pd.crosstab(df.geography, df.dragons, dropna=False)
Out[212]: 
dragons    few  many
geography           
desert       1     0
tropic       0     1
tundra       0     1

In [213]:  pd.crosstab(df.dragons, df.geography, dropna=False)
Out[213]: 
geography  desert  tropic  tundra
dragons                          
few             1       0       0
many            0       1       1

This work-around gives me what I expected.

pd.crosstab(df.dragons.fillna('NA'), df.geography.fillna('NA'), dropna=False)
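
For anyone who wants to reproduce this, here is a self-contained sketch of the workaround. It assumes the string 'NA' does not already occur as a real value in either column; the names `filled` and `table` are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
    'dragons': ['many', None, 'few', 'few', 'many', None],
})

# Replace missing values with a sentinel label so they form their own category.
# Assumption: 'NA' is not a legitimate value in either column.
filled = df.fillna('NA')

table = pd.crosstab(filled['dragons'], filled['geography'], dropna=False)
print(table)

# Every row of df is now counted somewhere in the table.
assert table.to_numpy().sum() == len(df)
```

Because the sentinel is an ordinary string, it sorts with the other labels and shows up as a regular row and column.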

I'd hate for others to overlook that NAs are dropped from the output even when dropna=False. Changing either the documentation or the behavior of this function would help.
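
Until then, one way to notice the problem is to count the rows with a missing key before tabulating. This is only a rough sketch on the same example frame; `n_missing` and `n_complete` are names I made up here.

```python
import pandas as pd

df = pd.DataFrame({
    'geography': ['tropic', 'tropic', 'desert', None, 'tundra', 'tundra'],
    'dragons': ['many', None, 'few', 'few', 'many', None],
})

# Rows where at least one of the two crosstab keys is missing.
n_missing = int(df[['geography', 'dragons']].isna().any(axis=1).sum())
n_complete = len(df) - n_missing

print(f"{n_missing} of {len(df)} rows have a missing key")
```

For this frame it reports 3 of 6 rows, which matches the crosstab output above: the counts there add up to only 3.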

@sinhrks
Member

sinhrks commented Jul 4, 2016

Thanks for the report. This is a duplicate of #10772.

sinhrks closed this as completed Jul 4, 2016
sinhrks added the Reshaping and Duplicate Report labels Jul 4, 2016
jorisvandenbossche added this to the No action milestone Jul 4, 2016