BUG: STD modifies groupby target column when as_index=False #10355
Comments
Something like this would fix it. Care to do a pull request (and add some tests)?
Hi,
see contributing docs here
Whoa, I didn't consider becoming a code contributor when I reported the bug. I don't think I'm the right person for that. I'd probably do more harm than good.
Best way to start! Give it a shot.
Another case where it raises an error (when grouping by non-numerical columns): #16799
This issue still exists in pandas 0.22. Calling std() after groupby tries to apply std() to the column being grouped by and raises an error if that column is 'str', for example. This happens when using as_index=False in the groupby() call. How can I contribute to fix this issue?
I put up a patch that might work; it needs tests. See the contributing docs here: http://pandas-docs.github.io/pandas-docs-travis/contributing.html
This code raises "ValueError: cannot insert a, already exists" on pandas 0.22 with Python 3.6.4:
import pandas as pd
df = pd.DataFrame({
    'a' : [1,1,1,2,2,2,3,3,3],
    'b' : [1,2,3,4,5,6,7,8,9],
})
df.groupby('a', as_index=False).agg({'a': 'count'})
Do you think the root cause is the same?
Output of INSTALLED VERSIONS
commit: None
pandas: 0.22.0
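For readers who only need the per-group row count from the report above, one way to sidestep aggregating the key column is sketched here. This is a workaround, not the fix discussed in this thread; the output column name "count" is an arbitrary choice.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# Group with the key as the index, count rows per group with size(),
# then move the key back into a regular column.
# "count" is a name chosen here for illustration.
counts = df.groupby("a").size().reset_index(name="count")
print(counts)
```

Because the key never appears among the aggregated columns, nothing has to be inserted a second time under the name 'a'.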
Regarding #10355 (comment): I found another similar behavior and have opened a new issue, #20566.
I can confirm the same problem as reported by @TakaakiFuruse (see two posts above), with pandas 0.22 and python 3.6.2. With the same example dataframe as he has, we see that the
However, applying
Clearly,
xref #14547 for other tests
In pandas 0.16.2 (and already in 0.16.0), using std() for aggregation after groupby('my_column', as_index=False) modifies 'my_column' by taking its square root. Example:
The square root values of 'a' are returned instead of 1, 2, 3.
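The example itself did not survive in this copy of the report. Below is a minimal sketch of what a reproduction could look like, assuming a small frame with an integer key column 'a' and a value column 'b' (names borrowed from a later comment in this thread, not from the original report). On an affected version the 'a' column would be expected to come back as sqrt(1), sqrt(2), sqrt(3); on a version where the bug is fixed the keys stay 1, 2, 3.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "b": [1, 2, 3, 4, 5, 6, 7, 8, 9],
})

# The call from the report: std() with the group key kept as a column.
result = df.groupby("a", as_index=False).std()

# Reference result: keep the key as the index during aggregation,
# then turn it back into a column afterwards.
expected = df.groupby("a").std().reset_index()

# Check whether the key column survived or was square-rooted.
keys_intact = result["a"].tolist() == [1, 2, 3]
keys_sqrt = bool(np.allclose(result["a"], np.sqrt([1, 2, 3])))

print(result)
print("keys intact:", keys_intact, "| keys look like sqrt:", keys_sqrt)
```

The `expected` frame also serves as a workaround on affected versions, since the key lives in the index while std() is computed and is only restored as a column afterwards.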
INSTALLED VERSIONS
commit: None
python: 2.7.9.final.0
python-bits: 32
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 45 Stepping 7, GenuineIntel
byteorder: little
LC_ALL: None
LANG: fr_CH
pandas: 0.16.2
nose: 1.3.4
Cython: 0.22
numpy: 1.9.2
scipy: 0.15.1
statsmodels: None
IPython: 3.0.0
sphinx: 1.3.1
patsy: 0.3.0
dateutil: 2.4.0
pytz: 2015.2
bottleneck: None
tables: 3.1.1
numexpr: 2.4
matplotlib: 1.4.3
openpyxl: None
xlrd: 0.9.3
xlwt: None
xlsxwriter: 0.7.1
lxml: None
bs4: 4.3.2
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 0.9.9
pymysql: None
psycopg2: None