Shifting GroupBy Object with freq inserts unwanted index level #23918
Comments
So @WillAyd, here is what I can tell so far. See pandas/pandas/core/groupby/groupby.py, lines 705 to 707 in d43ac97.
In this method, mutated comes back as True when the index is shifted, and it is then passed on line 712 as not_indexed_same. keys has the same value in both the expected and the unexpected case.
One question: is this expected behavior? The value is then passed on here: pandas/pandas/core/groupby/groupby.py, lines 886 to 888 in d43ac97.
If this block is hit, the result is fine. The problem we see is here instead: pandas/pandas/core/groupby/groupby.py, lines 906 to 918 in d43ac97.
If we debug into it, we can see the value of group_keys:
> Int64Index([1], dtype='int64')
So this code seems to be working as expected; it is just being given group_keys that don't really make sense. In fact, we can generalize the problem a little by noticing that it is unrelated to the shift function as such.
Essentially, any groupby operation that mutates the dataframe induces this strange index behavior. Interestingly, the group_keys are wrong in all cases (I'm assuming Int64Index([1], dtype='int64') is wrong), but only when the data is also mutated do we hit the conditional logic at groupby.py line 906 (referenced above), where the wrong key produces the unexpected output. The key originates from the call in the apply method in ops.py, specifically here: pandas/pandas/core/groupby/ops.py, lines 152 to 154 in d43ac97.
There, self.levels[0] returns Int64Index([1], dtype='int64'). Anyway, I probably need to read more unit tests to understand the expected behavior: when and why conditional logic for mutated indices is necessary, and when and why group_keys works as intended. I'll continue looking into this and add updates.
pandas/pandas/core/groupby/ops.py, lines 164 to 167 in d43ac97
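The claim that this is tied to index mutation rather than to shift itself can be checked with a small sketch. The frame, the column names a and b, and the two lambdas below are illustrative assumptions, not taken from the pandas code referenced above. Whether the untouched-index result gets a key level depends on the pandas version, so only the mutated case is treated as fixed here.

```python
import pandas as pd

# Hypothetical frame: a single group (key 1) on a daily DatetimeIndex.
df = pd.DataFrame(
    {"a": [1, 1, 1], "b": [10, 20, 30]},
    index=pd.date_range("2018-01-01", periods=3, freq="D"),
)
grouped = df.groupby("a")[["b"]]

# The function leaves each group's index untouched.
same_index = grouped.apply(lambda g: g * 2)

# The function moves each group's index one day forward, so the result
# is no longer indexed like the input.
moved_index = grouped.apply(lambda g: g.shift(freq="D"))

print(same_index.index.nlevels)   # version dependent
print(moved_index.index.nlevels)  # the group key is prepended as a level
print(list(moved_index.index.names))
```

In the mutated case the group key shows up as an extra outer index level, which is the same symptom reported for shift with freq.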
@WillAyd on main I'm seeing a result that looks reasonable but also doesn't match your Expected. Can you take a look? There's also a test_pct_change with an xfail that points back to this issue, and it looks like the current (raising) behavior is correct. Any idea what's going on there?
IMO somewhat unexpected behavior depending on the arguments provided to shift after a groupby:
This is arguably in contrast to #22053 which is asking for the behavior of the last item consistently, but I think that is the one that is actually incorrect out of all examples and causes misalignment with the index of the original caller.
cc @SimonAlecks who originally found this in #21235
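The reporter's original snippet is not preserved in this copy of the issue. As a rough stand-in, the sketch below compares a period-based shift with a freq-based shift on a grouped frame. The data and the column names a and b are assumptions made for illustration, and the shape of the freq-based result (extra index level or not) may differ between pandas versions, so it is printed rather than asserted.

```python
import pandas as pd

# Hypothetical data: two groups on a daily DatetimeIndex.
df = pd.DataFrame(
    {"a": [1, 1, 2, 2], "b": [1.0, 2.0, 3.0, 4.0]},
    index=pd.date_range("2018-01-01", periods=4, freq="D"),
)

# Shifting by periods moves values within each group and keeps the
# caller's index as it was.
by_periods = df.groupby("a").shift(1)

# Shifting by freq moves the index values instead of the data.
by_freq = df.groupby("a").shift(freq="D")

print(by_periods.index.equals(df.index))
print(by_freq.index.nlevels)  # more than 1 means a group-key level was added
print(by_freq)
```

If the second printed number is greater than 1, the freq-based call has inserted the group key as an index level, which is the behavior this issue describes.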
Expected Output
Output of pd.show_versions()
INSTALLED VERSIONS
commit: 1f1f705
python: 3.6.7.final.0
python-bits: 64
OS: Darwin
OS-release: 18.2.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.24.0.dev0+1123.g1f1f7053d.dirty
pytest: 4.0.0
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.11.1
xarray: 0.11.0
IPython: 7.1.1
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: 3.4.4
numexpr: 2.6.8
feather: None
matplotlib: 3.0.1
openpyxl: 2.5.9
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.1.2
lxml: 4.2.5
bs4: 4.6.3
html5lib: 1.0.1
sqlalchemy: 1.2.14
pymysql: 0.9.2
psycopg2: None
jinja2: 2.10
s3fs: 0.1.6
fastparquet: 0.1.6
pandas_gbq: None
pandas_datareader: None
gcsfs: 0.2.0