BUG: groupby _cython_agg_blocks implicitly assumes unique columns #31735

Closed
jbrockmendel opened this issue Feb 6, 2020 · 4 comments · Fixed by #36389
Labels
Bug Groupby Internals Related to non-user accessible pandas implementation Needs Tests Unit test(s) needed to prevent regressions
Comments

@jbrockmendel
Member

xref #31616: the two test cases it adds both have unique columns. Editing test_agg_split_object_part_datetime to make the columns non-unique breaks it:

import pandas as pd

df = pd.DataFrame(
    {
        "A": pd.date_range("2000", periods=4),
        "B": ["a", "b", "c", "d"],
        "C": [1, 2, 3, 4],
        "D": ["b", "c", "d", "e"],
        "E": pd.date_range("2000", periods=4),
        "F": [1, 2, 3, 4],
    }
).astype(object)
# Rename "C" to "B" so that the column labels are no longer unique
df.columns = ["A", "B", "B", "D", "E", "F"]

>>> result = df.groupby([0, 0, 0, 0]).min()
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects

cc @WillAyd @TomAugspurger

@WillAyd
Member

WillAyd commented Feb 7, 2020

Is this getting thrown from _wrap_agged_blocks? There was a lot of work outside of the block code to use OutputKey in groupby.base as a key and to have the wrap_* functions work with that.

@WillAyd WillAyd added Groupby Internals Related to non-user accessible pandas implementation labels Feb 7, 2020
@jbrockmendel
Member Author

> Is this getting thrown from _wrap_agged_blocks?

No, it is raised in the _cython_agg_blocks call. Full traceback:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pandas/core/groupby/groupby.py", line 1392, in func
    min_count=min_count,
  File "pandas/core/groupby/generic.py", line 1007, in _cython_agg_general
    how, alt=alt, numeric_only=numeric_only, min_count=min_count
  File "pandas/core/groupby/generic.py", line 1059, in _cython_agg_blocks
    result = s.aggregate(lambda x: alt(x, axis=self.axis))
  File "pandas/core/groupby/generic.py", line 963, in aggregate
    result = self._aggregate_multiple_funcs([func], _axis=self.axis)
  File "pandas/core/base.py", line 555, in _aggregate_multiple_funcs
    return concat(results, keys=keys, axis=1, sort=False)
  File "pandas/core/reshape/concat.py", line 283, in concat
    sort=sort,
  File "pandas/core/reshape/concat.py", line 452, in __init__
    self.new_axes = self._get_new_axes()
  File "pandas/core/reshape/concat.py", line 515, in _get_new_axes
    for i in range(ndim)
  File "pandas/core/reshape/concat.py", line 515, in <listcomp>
    for i in range(ndim)
  File "pandas/core/reshape/concat.py", line 571, in _get_concat_axis
    indexes, self.keys, self.levels, self.names
  File "pandas/core/reshape/concat.py", line 671, in _make_concat_multiindex
    mapped = level.get_indexer(hlevel)
  File "pandas/core/indexes/base.py", line 2990, in get_indexer
    "Reindexing only valid with uniquely valued Index objects"
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
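
The last frames of the traceback show `get_indexer` being called on an Index that contains duplicates. A minimal sketch of that failure mode in isolation, assuming a recent pandas where `InvalidIndexError` is importable from `pandas.errors` (the `level` variable here is illustrative, not the object built inside `concat`):

```python
import pandas as pd
from pandas.errors import InvalidIndexError

# An Index with a repeated label, like the non-unique columns in the report
level = pd.Index(["A", "B", "B"])

try:
    # get_indexer needs a one-to-one label -> position mapping
    level.get_indexer(["A", "B"])
    raised = False
except InvalidIndexError:
    raised = True

# get_indexer_non_unique is the variant that tolerates duplicates:
# it returns every matching position plus the positions of missing targets
indexer, missing = level.get_indexer_non_unique(["B"])
```

Here `indexer` holds both positions of `"B"`, which is why any code path that funnels duplicate column labels through `get_indexer` needs separate handling.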

@mroeschke mroeschke added the Bug label Apr 27, 2020
@phofl phofl added the Needs Tests Unit test(s) needed to prevent regressions label Sep 13, 2020
@phofl
Member

phofl commented Sep 13, 2020

Seems to work now. I get the following output on master:

           A  B  B  D          E  F
0 2000-01-01  a  1  b 2000-01-01  1

@jbrockmendel
Member Author

This was likely fixed by the recent changes to operate via BlockManager.apply; IIRC those didn't add any tests specific to this issue.
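
Since the issue is labeled "Needs Tests", a regression test for it could look roughly like the sketch below. It reuses the frame from the original report. The function name is hypothetical and this is not necessarily the test that was eventually merged. It assumes a pandas version in which the behavior reported above as working on master holds.

```python
import pandas as pd


def test_groupby_min_duplicate_columns():
    # Hypothetical test name; frame copied from the original report (GH 31735)
    df = pd.DataFrame(
        {
            "A": pd.date_range("2000", periods=4),
            "B": ["a", "b", "c", "d"],
            "C": [1, 2, 3, 4],
            "D": ["b", "c", "d", "e"],
            "E": pd.date_range("2000", periods=4),
            "F": [1, 2, 3, 4],
        }
    ).astype(object)
    df.columns = ["A", "B", "B", "D", "E", "F"]

    # All four rows fall into a single group, so the result has one row
    result = df.groupby([0, 0, 0, 0]).min()

    # Duplicate labels must survive the aggregation in their original order
    assert list(result.columns) == ["A", "B", "B", "D", "E", "F"]
    assert result.shape == (1, 6)
    # Check values by position, because label lookup is ambiguous for "B"
    assert result.iloc[0, 0] == pd.Timestamp("2000-01-01")
    assert result.iloc[0, 1] == "a"
    assert result.iloc[0, 2] == 1


test_groupby_min_duplicate_columns()
```

The assertions use `iloc` rather than label-based access because `result["B"]` returns both duplicate columns at once.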
