Skip to content

add dropna=False to crosstab example #21413

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 12, 2018
Merged

Conversation

testvinder
Copy link
Contributor

@testvinder testvinder commented Jun 10, 2018

>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> crosstab(foo, bar)  # 'c' and 'f' are not represented in the data,
    ...                     # but they still will be counted in the output
col_0  d  e  f
row_0
a      1  0  0
b      0  1  0
c      0  0  0

The above example code does not produce the output shown because dropna=True is default. Changing crosstab(foo, bar) to crosstab(foo, bar, dropna=False) fixes that and produces the shown output (which is also the expected and correct output).

  • closes #xxxx
  • tests added / passed
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@pep8speaks
Copy link

pep8speaks commented Jun 10, 2018

Hello @testvinder! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 11, 2018 at 09:29 Hours UTC

Copy link
Contributor

@uds5501 uds5501 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please refer to the comments.

@gfyoung , are the suggestions viable?

@@ -429,8 +429,9 @@ def crosstab(index, columns, values=None, rownames=None, colnames=None,

>>> foo = pd.Categorical(['a', 'b'], categories=['a', 'b', 'c'])
>>> bar = pd.Categorical(['d', 'e'], categories=['d', 'e', 'f'])
>>> crosstab(foo, bar) # 'c' and 'f' are not represented in the data,
... # but they still will be counted in the output
>>> crosstab(foo, bar, dropna=False) # 'c' and 'f' are not represented
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@testvinder , just a suggestion, you could've actually just changed the output regarding the given code and mentioned in comments about the default value of dropnabeing set to True

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually just add an example instead of modifying this one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@uds5501 I actually think it's better to set ´dropnatoFalse`. The whole point of specifying the extra categories is to include them in the crosstab. It's a small change but the original is very confusing because the output cannot be reproduced.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@testvinder : Ah, yes, your point is well-taken. Can you update the original example with the correct output and add your example with dropna=False ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gfyoung Sure, will do.

@codecov
Copy link

codecov bot commented Jun 10, 2018

Codecov Report

Merging #21413 into master will decrease coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #21413      +/-   ##
==========================================
- Coverage   91.89%   91.89%   -0.01%     
==========================================
  Files         153      153              
  Lines       49596    49596              
==========================================
- Hits        45576    45574       -2     
- Misses       4020     4022       +2
Flag Coverage Δ
#multiple 90.29% <ø> (-0.01%) ⬇️
#single 41.86% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/reshape/pivot.py 97.03% <ø> (ø) ⬆️
pandas/util/testing.py 84.6% <0%> (-0.21%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 415012f...4d7e36f. Read the comment docs.

@gfyoung gfyoung added the Categorical Categorical Data Type label Jun 10, 2018
@gfyoung gfyoung requested review from jreback and removed request for jreback June 10, 2018 22:04
@gfyoung
Copy link
Member

gfyoung commented Jun 11, 2018

I completely misread this PR and thought this was a larger change than it actually was. Sorry!

@testvinder
Copy link
Contributor Author

@gfyoung yes, it's really just a tiny change that makes the documentation more clear.

@gfyoung gfyoung changed the base branch from 0.23.x to master June 11, 2018 08:22
@gfyoung gfyoung changed the base branch from master to 0.23.x June 11, 2018 08:22
@gfyoung
Copy link
Member

gfyoung commented Jun 11, 2018

@testvinder : I just realized: your PR should be against the master branch, not 0.23.x. Could you fix that?

@testvinder testvinder changed the base branch from 0.23.x to master June 11, 2018 08:47
@testvinder
Copy link
Contributor Author

@gfyoung Done:-) Sorry for the mistake. This is all new to me.

@gfyoung gfyoung changed the base branch from master to 0.23.x June 11, 2018 09:24
@gfyoung gfyoung changed the base branch from 0.23.x to master June 11, 2018 09:28
@gfyoung
Copy link
Member

gfyoung commented Jun 11, 2018

@testvinder : I re-applied your commits against master because the base-switching was a lot uglier than I was hoping it would be.

Copy link
Member

@gfyoung gfyoung left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the base fixed, LGTM!

cc @jreback

@gfyoung gfyoung added the Docs label Jun 11, 2018
@jreback jreback added this to the 0.23.2 milestone Jun 12, 2018
@jreback jreback merged commit ffffa5c into pandas-dev:master Jun 12, 2018
@jreback
Copy link
Contributor

jreback commented Jun 12, 2018

thanks @testvinder

@jorisvandenbossche
Copy link
Member

@jreback just a small note, can you make sure the final commits gets a reasonable message? (normally you should see it in the github merge box), because now it is "Reapply all patches by @testvinder against master" :-)

@jorisvandenbossche jorisvandenbossche modified the milestones: 0.23.2, 0.24.0 Jun 12, 2018
@jreback
Copy link
Contributor

jreback commented Jun 12, 2018

normally i remove all comments - guess missed that one

@jorisvandenbossche
Copy link
Member

Yeah, it's not about the comments (those go into the commit message body) but the title. But it's a bit strange github did it this wrongly, because normally the title is fine (probably because the commit was rebased and squashed)

david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Docs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants