BUG: DataFrame.agg with multiple cum functions creates wrong result #35490
@qinxuye Thanks for the report. Please read this guide detailing how to provide the necessary information for us to reproduce your bug. Can you make the code sample more minimal, and also please indicate what the expected output should be (or state "same as 1.0.5")?
It appears this change in behaviour is a result of #30858, so the change is not intentional. Marking as regression. cc @MarcoGorelli
b6222ec is the first bad commit
Thanks for the CC, and I'm sorry for the breakage caused! I'll look into this next week.
Thanks @MarcoGorelli, and no need to apologise!
I think the problem's here:
    result = self._wrap_series_output(
        output=output, index=self.grouper.result_index
    )
where we have
(Pdb) output
{OutputKey(label='cumsum', position=0): 0     3
1     4
2     5
3     3
4     5
5     7
6     6
7     8
8    11
Name: a, dtype: int64}
(Pdb) result
b
1    4
3    3
4    5
5    7
6    6
Name: cumsum, dtype: int64
(Pdb) self.grouper.result_index
Int64Index([1, 3, 4, 5, 6], dtype='int64', name='b')
BTW,
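The printed values are consistent with the wrapper aligning the per-row cumsum output to the group labels by label: result holds exactly the rows of output whose row labels (1, 3, 4, 5, 6) coincide with the keys in result_index, and rows 0, 2, 7 and 8 are dropped. A minimal pure-Python model of that alignment, using the numbers from the session above; the helper align_by_label is hypothetical and is not pandas' implementation:

```python
# Values copied from the (Pdb) session: the cumsum output, labelled by row position.
output = {0: 3, 1: 4, 2: 5, 3: 3, 4: 5, 5: 7, 6: 6, 7: 8, 8: 11}
# self.grouper.result_index: one label per group of column b.
result_index = [1, 3, 4, 5, 6]


def align_by_label(series, new_index):
    """Hypothetical model of label-based reindexing: look up each new label
    in the old series (None if absent); labels not in new_index are dropped."""
    return {label: series.get(label) for label in new_index}


result = align_by_label(output, result_index)
print(result)                    # {1: 4, 3: 3, 4: 5, 5: 7, 6: 6}
print(len(output), len(result))  # 9 5
```

The model reproduces the result printed in the session, and the shrinkage from 9 rows to 5 happens without any error, because only a label lookup is involved.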
@simonjayhawkins, should the expected output be the same as it was in 1.0.5? TBH I find it strange that groupby.agg returns something with an index different from self.grouper.result_index. Does that happen with other aggregations? My (unqualified) opinion is that:
I've not delved much into the groupby code. @WillAyd, wdyt?
I think we should revert to the 1.0.5 behavior.
I agree in principle, but it's probably out of scope for this issue. Can you see if there's one already out there, or if not, open one?
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.
Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.
Code Sample, a copy-pastable example
Cumulative functions should generate the DataFrame with the same length.
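The expected behaviour can be stated as: each cumulative function produces one column, and every column has one value per input row, so the result keeps the length and row order of the input. A pure-Python sketch of that contract, with hypothetical data and a hypothetical helper groupwise_cum; it models the expected shape only, not pandas' groupby internals:

```python
import operator


def groupwise_cum(keys, values, funcs):
    """Hypothetical helper: apply each cumulative function within the group of each row.

    Returns {function name: list}; every list has len(values) entries,
    in the original row order.
    """
    result = {}
    for name, op in funcs.items():
        running = {}  # running value per group key
        column = []
        for key, value in zip(keys, values):
            # The first row of a group starts the accumulation; later rows combine with it.
            running[key] = op(running[key], value) if key in running else value
            column.append(running[key])
        result[name] = column
    return result


# Hypothetical input: two interleaved groups, "x" and "y".
keys = ["x", "y", "x", "y", "x"]
values = [1, 2, 3, 4, 5]
out = groupwise_cum(
    keys, values, {"cumsum": operator.add, "cumprod": operator.mul, "cummax": max}
)
for name, column in out.items():
    print(name, column)
# cumsum [1, 2, 4, 6, 9]
# cumprod [1, 2, 3, 8, 15]
# cummax [1, 2, 3, 4, 5]
```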
Problem description
In pandas 1.0.5, the result is
Expected Output
Output of pd.show_versions()
In [5]: pd.show_versions()
/Users/qinxuye/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
"Distutils was imported before Setuptools. This usage is discouraged "
ImportError Traceback (most recent call last)
in
----> 1 pd.show_versions()
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in show_versions(as_json)
104 """
105 sys_info = _get_sys_info()
--> 106 deps = _get_dependency_info()
107
108 if as_json:
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/util/_print_versions.py in _get_dependency_info()
82 for modname in deps:
83 mod = import_optional_dependency(
---> 84 modname, raise_on_missing=False, on_version="ignore"
85 )
86 result[modname] = _get_version(mod) if mod else None
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in import_optional_dependency(name, extra, raise_on_missing, on_version)
97 minimum_version = VERSIONS.get(name)
98 if minimum_version:
---> 99 version = _get_version(module)
100 if distutils.version.LooseVersion(version) < minimum_version:
101 assert on_version in {"warn", "raise", "ignore"}
~/miniconda3/envs/test_pandas_1.1/lib/python3.7/site-packages/pandas/compat/_optional.py in _get_version(module)
42
43 if version is None:
---> 44 raise ImportError(f"Can't determine version for {module.__name__}")
45 return version
46
ImportError: Can't determine version for numba
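The last frame of the traceback raises when no version string can be found on the imported module. A simplified sketch of that check, assuming the version is read from a __version__ attribute (the source lines shown above only include the None check and the raise); get_version and the stand-in modules are hypothetical:

```python
import types


def get_version(module: types.ModuleType) -> str:
    """Hypothetical simplified check: read __version__ (assumed attribute)
    and raise ImportError when it is missing."""
    version = getattr(module, "__version__", None)
    if version is None:
        raise ImportError(f"Can't determine version for {module.__name__}")
    return version


ok = types.ModuleType("example_pkg")  # hypothetical module with a version string
ok.__version__ = "1.2.3"
print(get_version(ok))  # 1.2.3

broken = types.ModuleType("numba")  # stand-in module with no __version__ attribute
try:
    get_version(broken)
except ImportError as exc:
    print(exc)  # Can't determine version for numba
```

The sketch only reproduces the error message; the thread does not say why numba had no detectable version in the reporter's environment.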