Pandas pivot_table with MultiIndex and dropna=False generates all combinations of modalities instead of keeping only the existing ones #18030
Diving into the source: lines 4430 to 4437 in 52fe6bc.
In the function definition: pandas/pandas/core/reshape/pivot.py, lines 26 to 28 in 52fe6bc.
We can search for pandas/pandas/core/reshape/pivot.py, lines 99 to 113 in 52fe6bc.
And finally, all-NaN columns are purged: pandas/pandas/core/reshape/pivot.py, lines 136 to 140 in 52fe6bc.
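As we read the referenced lines, when `dropna=False` the aggregated table is reindexed against the Cartesian product of its index levels (and likewise for multi-level columns). The sketch below mimics that step with public API only (`pd.MultiIndex.from_product` plus `reindex`) on made-up toy data; it is an illustration, not the pandas source.

```python
import pandas as pd

# Hypothetical toy data: only 3 of the 2 x 3 = 6 possible (a, b) pairs occur.
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": ["p", "q", "r"],
    "v": [1.0, 2.0, 3.0],
})

# Aggregate: the result is indexed by the observed (a, b) pairs only.
table = df.groupby(["a", "b"])["v"].mean()

# Mimic the dropna=False step: rebuild the index as the Cartesian product
# of its levels and reindex, which inserts NaN for the unobserved pairs.
full = pd.MultiIndex.from_product(table.index.levels, names=table.index.names)
expanded = table.reindex(full)

print(len(table), len(expanded))   # 3 6
print(int(expanded.isna().sum()))  # 3
```

With k index levels of sizes n1, ..., nk, the expanded table always has n1 × ... × nk rows, however few combinations the data actually contains.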
After reading these portions of code, I can tell:
Questions:
Fix proposal: Actually, I need to inhibit the final
Maybe a finer condition (or an extra switch) in the Cartesian product part will do the trick.
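Short of changing pandas itself, the effect of such a finer condition can be approximated after the fact by keeping only the row-label combinations that actually occur in the input. `restrict_to_observed` below is a hypothetical helper, not a pandas API, and it assumes the row key columns contain no NaN. It does not solve the memory problem: as we read the code, `pivot_table` still materialises the full product before the rows are trimmed.

```python
import pandas as pd


def restrict_to_observed(table: pd.DataFrame, data: pd.DataFrame,
                         index: list) -> pd.DataFrame:
    """Hypothetical helper: keep only rows whose labels occur in `data`.

    Assumes the `index` key columns of `data` contain no NaN.
    """
    observed = pd.MultiIndex.from_frame(data[index].drop_duplicates())
    return table.loc[table.index.isin(observed)]


# Hypothetical toy data: 3 of the 2 x 3 possible (a, b) pairs occur.
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": ["p", "q", "r"],
    "c": ["u", "u", "w"],
    "v": [1.0, 2.0, 3.0],
})

wide = df.pivot_table(index=["a", "b"], columns="c", values="v",
                      aggfunc="mean", dropna=False)
trimmed = restrict_to_observed(wide, df, ["a", "b"])
print(trimmed.shape)  # (3, 2)
```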
@jlandercy thanks for diving in! But actually, it seems you can get your desired result (I think) with the underlying groupby + unstack:
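A sketch of that groupby + unstack route on made-up toy data (the column names and values are hypothetical, and `mean` stands in for whatever aggregation is needed): group on both the row and column keys, aggregate, then move the column key to the columns with `unstack`. Only observed (a, b) pairs become rows, and a group whose values are all NaN stays as a NaN cell instead of being dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the value for ("x", "q", "u") is genuinely missing.
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": ["p", "q", "r"],
    "c": ["u", "u", "w"],
    "v": [1.0, np.nan, 3.0],
})

# Group on row keys (a, b) and column key c, aggregate, then pivot c out.
wide = df.groupby(["a", "b", "c"])["v"].mean().unstack("c")
print(wide.shape)  # (3, 2)
```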
Is it correct that this is what you want?
Dear @jorisvandenbossche,
Yes, this is the desired output. Thank you for pointing out this alternative to fill/replace. Unfortunately, it will not fit my project as is, simply because I extensively use
What I do not understand is: why is there a Cartesian product on the level modalities of the index and columns? Do you have any idea what the reason for, or use case of, this feature could be? I am really intrigued by this, and for now I find it a little counterintuitive.
Best regards,
Yes, I hit this problem today. Boy, so annoying.
ETA is when someone submits a patch. How about it, @Sarnath?
Challenge accepted. I have no idea about pandas development. Let me know how to go about it. I will get started! Cheers!
https://pandas.pydata.org/pandas-docs/stable/contributing.html
The docs at `/stable` are a bit out of date. Try http://pandas-docs.github.io/pandas-docs-travis/development/contributing.html
On Thu, Jan 24, 2019 at 8:22 AM Sarnath wrote:
> https://pandas.pydata.org/pandas-docs/stable/contributing.html
> I will go with this. Let me know if something else is expected.
Sure, thanks! I will start in a week. I hope I can find a place where I can sync with other developers. I will first read the guide. Thanks for your time!
Any news on this?
Still open, and will be closed when it's fixed. @marchezinixd are you interested in working on it?
Is there any update on this? Can the Cartesian product be turned into a generator so that the memory allocation won't fail? Also, is the reindexing at that point necessary when dropna is False?
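Whatever the answer on generators, the size of the product is known before anything is allocated: it is the product of the level lengths. A small sketch (`cartesian_size` is a hypothetical helper, not a pandas function) of the number a guard could compare with the count of observed labels before reindexing:

```python
import math

import pandas as pd


def cartesian_size(index: pd.MultiIndex) -> int:
    """Number of labels in the Cartesian product of the index's levels."""
    return math.prod(len(level) for level in index.levels)


# Hypothetical index: 3 observed (a, b) pairs drawn from 2 x 3 level values.
idx = pd.MultiIndex.from_tuples(
    [("x", "p"), ("x", "q"), ("y", "r")], names=["a", "b"]
)
print(len(idx), cartesian_size(idx))                   # 3 6
print(cartesian_size(idx[:1]))                         # 6: unused levels still count
print(cartesian_size(idx[:1].remove_unused_levels()))  # 1
```

Slicing a MultiIndex keeps its unused level values, so they inflate the product unless `remove_unused_levels()` is applied first.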
I'm also having this issue. I resorted to replacing missing column names with "" so that they're included in the pivot_table.
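A sketch of that workaround on made-up data: missing labels in the key columns are filled with `""` before pivoting, so the rows carrying them are not dropped as NaN group keys (grouping drops NaN keys by default). The column names are hypothetical, and the key columns are assumed to hold strings, so `""` cannot collide with a real label.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one row has a missing label in key column "a".
df = pd.DataFrame({
    "a": ["x", np.nan, "y"],
    "b": ["p", "q", "r"],
    "v": [1.0, 2.0, 3.0],
})

keys = ["a", "b"]
# Replace missing key labels with "" so the row survives the grouping.
filled = df.assign(**{k: df[k].fillna("") for k in keys})

table = filled.pivot_table(index="a", columns="b", values="v", aggfunc="sum")
print(table.loc["", "q"])  # 2.0
```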
Minimal Verifiable Working Example
Below you will find a Minimal Verifiable Working Example that reproduces the behaviour I am considering in this issue:
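A hypothetical reproduction on made-up toy data, not the reporter's: the same pivot is computed with `dropna=True` and `dropna=False`. In pandas versions that apply the Cartesian-product reindex discussed in this thread, the second table gains NaN rows for (a, b) pairs that never occur in `df`.

```python
import pandas as pd

# Hypothetical toy data: 3 of the 2 x 3 possible (a, b) row pairs occur,
# and no value is missing.
df = pd.DataFrame({
    "a": ["x", "x", "y"],
    "b": ["p", "q", "r"],
    "c": ["u", "u", "w"],
    "v": [1.0, 2.0, 3.0],
})

kept = df.pivot_table(index=["a", "b"], columns="c", values="v",
                      aggfunc="mean", dropna=True)
expanded = df.pivot_table(index=["a", "b"], columns="c", values="v",
                          aggfunc="mean", dropna=False)

print(kept.shape)      # (3, 2)
print(expanded.shape)  # 6 rows where the Cartesian-product reindex is applied
```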
Trial input looks like (`df`):
Misbehaved output looks like (`cross3`):
Expected output is similar to `cross2` but with `NaN` values instead of strings, and looks like:
Problem description
I have the need:
The problem seems to be the creation of all combinations of level modalities (instead of keeping only the existing ones), which drastically and unnecessarily increases memory usage (those combinations are not present in the original data).
Maybe it is a bug, maybe it is the intended behaviour. I just wanted to report it because it surprised me, and now I am looking for a clean way to work around this behaviour.
How I found it:
I first got a `MemoryError` with small queries (about 1000 rows and 25 channels); then I reduced the number of rows and columns, and finally dumped the data to JSON in order to get the MVWE above.
Expected Output
To my understanding, the following command:
Should return the same as:
That is, a small DataFrame with no extra columns and with NaN values not dropped; it should look like:
without generating combinations of level modalities that do not exist in the input data.
This would also prevent raising a `MemoryError` for reasonable amounts of data.
Output of `pd.show_versions()`