When a groupby operation returns exactly what was already in each group, the index is left identical to that of the object being grouped. This doesn't sound so horrible until you realize that it is inconsistent with very comparable operations. This is observed in snippet 1. Snippet 2 puts a finer point on it: the same code sample produces randomly different results.
Expected Output
A
0  1    0
   0    1
1  3    2
   2    3
Name: A, dtype: int64
A
0  1    1
   0    0
1  3    3
   2    2
Name: A, dtype: int64
nlargest and nsmallest should indeed be consistent with each other, and should always produce the hierarchical index, I think.
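For readers without the original snippets, here is a minimal sketch of the kind of comparison being discussed. The frame and the `index_levels` helper are assumptions made up for illustration, not the reporter's code. It only prints how many index levels each call returns, since the result for `n` equal to the group size may differ between pandas versions.

```python
import pandas as pd

# Hypothetical data: two groups of two rows each.
df = pd.DataFrame({"A": [0, 0, 1, 1], "B": [0, 1, 2, 3]})
grouped = df.groupby("A")["B"]


def index_levels(n):
    """Return the index depth of nlargest(n) and nsmallest(n) as a tuple."""
    return grouped.nlargest(n).index.nlevels, grouped.nsmallest(n).index.nlevels


# n=1 keeps a subset of each group; n=2 keeps every row of each group.
for n in (1, 2):
    largest_levels, smallest_levels = index_levels(n)
    print(f"n={n}: nlargest -> {largest_levels} level(s), "
          f"nsmallest -> {smallest_levels} level(s)")
```

If the two numbers on a line differ, or differ between `n=1` and `n=2`, that is the inconsistency described in the report.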
The apply one is a more difficult issue. It tries to infer what to do based on the return value. It is true that it is not good that the output shape is inconsistent in your example, but of course the sample also has a random aspect, so I'm not sure how this could be solved.
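To make the inference concrete without any randomness, the sketch below applies two hypothetical functions to the same groups: one returns each group untouched, the other returns the same rows in reverse order. A call such as `s.sample(2)` on a two-row group would land in one case or the other at random. The data and function names are assumptions for illustration, and the printed index depth for the unchanged case may depend on the pandas version.

```python
import pandas as pd

# Hypothetical data: two groups of two rows each.
df = pd.DataFrame({"A": [0, 0, 1, 1], "B": [0, 1, 2, 3]})
grouped = df.groupby("A")["B"]


def keep_order(s):
    # Returns the group unchanged, so its index equals the group's index.
    return s


def reverse_order(s):
    # Same rows, but the index order now differs from the group's.
    return s.iloc[::-1]


same = grouped.apply(keep_order)
reordered = grouped.apply(reverse_order)

print("unchanged group ->", same.index.nlevels, "index level(s)")
print("reordered group ->", reordered.index.nlevels, "index level(s)")
```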
Agree with @jorisvandenbossche here. These are implemented using .apply, which makes a best-effort attempt to coerce the final shape. But they should actually use a slightly lower-level API that directs the reshaping in a consistent manner.
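Until something like that exists, one user-level way to get a consistent shape is to build the result explicitly instead of letting apply guess. This is only a sketch of a workaround, not the internal API mentioned above; `nlargest_per_group` is a hypothetical helper name, and the data is made up.

```python
import pandas as pd


def nlargest_per_group(series, keys, n):
    """Hypothetical helper: top-n per group, always with a two-level index.

    The outer level is the group key and the inner level is the original
    index, regardless of whether n covers the whole group.
    """
    pieces = {key: grp.nlargest(n) for key, grp in series.groupby(keys)}
    # Passing a dict to concat uses its keys as the outer index level.
    return pd.concat(pieces, names=[keys.name])


df = pd.DataFrame({"A": [0, 0, 1, 1], "B": [0, 1, 2, 3]})
for n in (1, 2):
    out = nlargest_per_group(df["B"], df["A"], n)
    print(f"n={n}: {out.index.nlevels} index level(s)")
```

Because the concatenation is spelled out, the index shape does not depend on whether each piece happens to match its group.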
from this SO question
snippet 1
snippet 2
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 16.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.0
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.1
numpy: 1.11.1
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.4.6
patsy: 0.4.1
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: 1.1.0
tables: 3.2.3.1
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: 2.3.2
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.1
html5lib: 0.999999999
httplib2: None
apiclient: None
sqlalchemy: 1.0.13
pymysql: None
psycopg2: None
jinja2: 2.8
boto: 2.42.0
pandas_datareader: 0.2.1