crosstab's dependency on common index produces undesirable error #20496
Comments
I think it's by design, if I'm reading the docs at http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.crosstab.html correctly
That could be rephrased slightly to mention alignment, which is the term we usually use.
I don't think that's desired here. Alignment is the default behavior in pandas, and I think it's expected by most people who've used pandas for a while. Warnings can easily become annoying, or go unnoticed in non-interactive settings.
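For readers less familiar with the term, here is a minimal sketch of what alignment means for ordinary Series arithmetic. The data and variable names are invented for illustration and do not come from the issue.

```python
import pandas as pd

# Hypothetical Series whose indices only partly overlap (labels 1 and 2).
a = pd.Series([1, 2, 3], index=[0, 1, 2])
b = pd.Series([10, 20, 30], index=[1, 2, 3])

# Values are paired by index label, not by position.
# Labels present in only one Series produce NaN in the result.
total = a + b
print(total)
```

Only labels 1 and 2 have a partner in both Series, so the other two entries come out as NaN. The same label-based pairing is what the comment above describes for crosstab().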
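Thank you for the comments. In this case, may I suggest some updates to the documentation? It would also be helpful to mention the behavior when Python lists and/or numpy arrays are provided. Also, the explanations of the values and aggfunc parameters are quite obscure, and there are no examples below to demonstrate them. May I also suggest improving this?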
Agreed with your suggestions. Perhaps you could make a PR improving them? :)
On Tue, Mar 27, 2018 at 5:04 PM, Jian Shi wrote:
Thank you for the comments.
In this case, may I suggest some updates to the documentation? It would
also be helpful to mention the behavior when Python lists and/or numpy
arrays are provided.
Also, the explanations of values and aggfunc parameters are quite
obscure, and there are no examples below to demonstrate them. May I also
suggest improving this?
I can help with explaining the behaviors when passing numpy arrays or Python lists, but I really don't understand how to use values and aggfunc.
Likewise, I’ve never used crosstab.
Incremental improvements are certainly welcome.
From: Jian Shi
Sent: Tuesday, March 27, 2018 5:12:15 PM
To: pandas-dev/pandas
Cc: Tom Augspurger
I can help with explaining the behaviors when passing numpy arrays or Python lists, but I really don't understand how to use values and aggfunc.
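Since values and aggfunc come up repeatedly in this thread without an example, here is a minimal sketch of how they can be used. The DataFrame and its column names are hypothetical, invented only for illustration; this is not taken from the pandas documentation.

```python
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
df = pd.DataFrame({
    "region": ["N", "N", "S", "S", "S"],
    "product": ["a", "b", "a", "a", "b"],
    "sales": [10, 20, 30, 40, 50],
})

# Default: each cell counts how many rows fall into that (region, product) pair.
counts = pd.crosstab(df["region"], df["product"])

# With values and aggfunc: each cell aggregates the entries of `values`
# that belong to that (region, product) pair.
totals = pd.crosstab(df["region"], df["product"], values=df["sales"], aggfunc="sum")
means = pd.crosstab(df["region"], df["product"], values=df["sales"], aggfunc="mean")

print(counts)
print(totals)
print(means)
```

In this sketch the (S, a) cell holds 2 in counts, 70 in totals, and 35 in means, because two rows (sales 30 and 40) fall into that pair. Note that values and aggfunc have to be passed together.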
Code Sample, a copy-pastable example if possible
I found that when two pandas Series of the same length but with different indices are passed to crosstab(), the cross tabulation result is incorrect.
The output:
Problem description
The second output is problematic (look at the margins and the total count, 8), because when crosstab() aggregates x and y as Series, it only looks at the elements with common indices, which has the potential to omit some (even all) values.
On the other hand, if either x or y is passed in as a numpy array (i.e., without any index), then the numpy array "adopts" the index of the other array, resulting in a correct result.
I am not sure whether such a behavior is by design or not. If so, maybe a warning can be raised telling users to expect strange cross tabulation results?
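The reporter's original code sample is not preserved above. As a rough illustration of the behavior described, here is a minimal sketch using invented data (not the original x and y), assuming a pandas version in which crosstab() aligns Series on their index labels as discussed in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical data (not the reporter's sample): same length, shifted indices.
x = pd.Series(["a", "a", "b", "b"], index=[0, 1, 2, 3])
y = pd.Series(["c", "d", "c", "d"], index=[2, 3, 4, 5])

# Both arguments are Series: rows are paired by index label,
# so only the shared labels (2 and 3) are counted.
aligned = pd.crosstab(x, y)
aligned_total = int(aligned.values.sum())

# One argument is a plain NumPy array: it carries no index,
# so its values are paired with x by position.
positional = pd.crosstab(x, np.asarray(y))
positional_total = int(positional.values.sum())

# Resetting both indices is another way to force positional pairing.
reset = pd.crosstab(x.reset_index(drop=True), y.reset_index(drop=True))
reset_total = int(reset.values.sum())

print(aligned_total, positional_total, reset_total)
```

With this data the aligned table counts only 2 of the 4 observations, while the other two calls count all 4. Converting one input to an array, or resetting both indices, is a possible workaround when positional pairing is what is wanted.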
Output of pd.show_versions()
python: 3.6.3.final.0
pandas: 0.22.0
numpy: 1.13.3