Row sum and column mean work but row mean gives all NaN on heterogeneous data types #33202
Comments
Sum of object dtypes may generally work, but mean does not. Try calling .infer_objects() after the transpose. Why would you do this in any event? Transposing mixed types is really odd.
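A minimal sketch of the transpose-then-infer_objects() suggestion, assuming a hypothetical two-column frame with one integer and one timedelta column (the data is illustrative, not taken from the original report). Transposing puts values from different columns into the same column, so pandas falls back to object dtype; infer_objects() can then re-infer a specific dtype for any column whose values are all of one kind.

```python
import pandas as pd

# Hypothetical data: one integer column and one timedelta column.
df = pd.DataFrame({"a": [1, 2], "b": pd.to_timedelta(["1h", "2h"])})

# After transposing, every column mixes an int with a Timedelta; there is
# no common specific dtype for that, so pandas stores them as object.
t = df.T
print([d.kind for d in t.dtypes])  # ['O', 'O']

# Transposing back and calling infer_objects() re-infers a specific
# dtype for each column whose values are all of one kind.
restored = t.T.infer_objects()
print([d.kind for d in restored.dtypes])  # ['i', 'm']
```

Only the final dtypes are relied on here, because whether the intermediate t.T already converts the timedelta column can differ between pandas versions.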
@jreback Thanks for your reply! I ran into this while doing statistical analysis of clinical data. In clinical data, some values are floating point, some are integers, some are time durations, etc., so when I compute the average value over a period of time, this bug surfaces. My current workaround for the row mean is transpose => column mean => transpose back, because on exactly the same data, averaging the columns works fine; only averaging the rows fails.
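The reporter's transpose-based workaround, combined with the infer_objects() suggestion, might look like the sketch below. It assumes a hypothetical layout in which each row is one clinical variable and each column is a time point; the variable names and values are illustrative only. Averaging each variable within its own dtype avoids the object-dtype row reduction that returned NaN in the reported version, and the sketch does not depend on how wide.mean(axis=1) behaves in any particular pandas release.

```python
import pandas as pd

# Hypothetical clinical layout: each ROW is one variable and each COLUMN is a
# time point, so every column mixes floats, ints and Timedeltas (object dtype).
wide = pd.DataFrame(
    [
        [36.6, 37.2, 38.1],  # body temperature in degrees C (float)
        [72, 80, 95],  # heart rate in beats per minute (int)
        [pd.Timedelta("30min"), pd.Timedelta("45min"), pd.Timedelta("1h")],
    ],
    index=["temp_c", "heart_rate", "duration"],
    columns=["t0", "t1", "t2"],
)

# Workaround: turn the variables into columns, let pandas re-infer one dtype
# per column, then average each column within its own dtype.
long = wide.T.infer_objects()

# These are column means of `long`, i.e. the row means of `wide`.
row_means = pd.Series({name: long[name].mean() for name in long.columns})
print(row_means)
```

The result is an object-dtype Series because it holds both floats and a Timedelta; each entry, however, was computed by a native per-dtype reduction.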
pandas is column-based. Mixing dtypes, while possible, is not recommended; column-based dtypes are efficient for both storage and computation. Doing numeric operations on heterogeneous data doesn't make any sense.
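To make the column-based point concrete, here is a small sketch using hypothetical clinical-style columns. It shows that pandas keeps one dtype per column, and that pulling out a single row of a mixed-dtype frame yields an object-dtype Series. Row-wise reductions over such data therefore work on boxed scalars rather than on a native numeric array, which is one plausible reason they can behave differently from column reductions.

```python
import pandas as pd

# Hypothetical data in the column-oriented layout pandas is designed for:
# one float, one int and one timedelta column.
df = pd.DataFrame(
    {
        "temp_c": [36.6, 37.2],
        "heart_rate": [72, 80],
        "duration": pd.to_timedelta(["30min", "45min"]),
    }
)

# Each column is stored with its own compact dtype.
print([d.kind for d in df.dtypes])  # ['f', 'i', 'm']

# A single row spans all three dtypes, so pandas can only represent it as
# an object-dtype Series of boxed scalars.
row = df.loc[0]
print(row.dtype)  # object
```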
So the issue has been closed without the bug being solved. That is why I suggested in #6963 that "for maximum usage compatibility, we should treat a table as a symmetric rank-2 tensor". Because of low-level CPU optimization constraints, sum/mean/std/max/min/etc. along the row direction may incur a performance penalty. That is acceptable, but the operation should not fail and should behave consistently with the corresponding column operations. Thus, incompetent programming and an inherently deficient/defective architectural design do not allow this bug to be solved that easily ^_^
When different DataFrame rows are of different types, row sum works but row mean gives all NaN values.
As shown above, the last statement demonstrates the bug: it returns all NaN values, which is incorrect. Interestingly, if all DataFrame entries are of the same type (uncomment the 2nd line), the bug does not occur.