
BUG: DataFrame.groupby(., dropna=True, axis=0) incorrectly throws ShapeError #35751


Merged: 88 commits, merged Dec 19, 2020
Changes from 2 commits
Commits
0484244
handle dropna=False in _selected_obj, _set_result_index_ordered
arw2019 Aug 15, 2020
a335744
add whatsnew
arw2019 Aug 16, 2020
640ec38
rewrote _set_result_index_ordered
arw2019 Aug 16, 2020
099e30c
added dropna=False to tests reliant on that
arw2019 Aug 16, 2020
8a13d06
delete second reindexing in _transform_general
arw2019 Aug 16, 2020
394feb6
restore + change test_evaluate_with_empty_groups
arw2019 Aug 16, 2020
e1cafd4
remove calls to obj.dropna()
arw2019 Aug 17, 2020
0df329c
separate tests for dataframe/series slices
arw2019 Aug 17, 2020
77e7fc7
fix series indexing
arw2019 Aug 21, 2020
16544ea
fix DataFrameGroupBy._transform_general
arw2019 Aug 21, 2020
46e5f66
move DataFrameGroupBy._transform_general logic to _set_result_index_o…
arw2019 Aug 21, 2020
6fca785
handle edge case (datetime index, no NaNs)
arw2019 Aug 21, 2020
6341d97
merge with upstream/master
arw2019 Sep 2, 2020
269516b
move logic to BaseGrouper and _set_index_ordered
arw2019 Sep 4, 2020
6819ac6
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 4, 2020
2ec491d
add dropna args to tests
arw2019 Sep 4, 2020
249fc2a
add dropna=False arg to test
arw2019 Sep 4, 2020
b1cafad
feedback
arw2019 Sep 11, 2020
bef1437
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 11, 2020
9abc8c4
revert change to test
arw2019 Sep 11, 2020
c63a24c
feedback
arw2019 Sep 11, 2020
9791e1e
revert changes to test_transform
arw2019 Sep 11, 2020
21a6fbb
revert changes to test
arw2019 Sep 11, 2020
8afb6e2
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 12, 2020
239e16a
feedback
arw2019 Sep 12, 2020
0cdea22
add non-RangeIndex test cases
arw2019 Sep 12, 2020
ee73640
merge with master
arw2019 Sep 14, 2020
c07df76
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 18, 2020
ca2f898
add comments/clean up _set_result_index_ordered
arw2019 Sep 18, 2020
e15df1a
revert accidental change to _transform_general signature
arw2019 Sep 19, 2020
342540f
rewrite _set_result_index_ordered using auxiliary method
arw2019 Sep 19, 2020
90e687b
rewrite whatsnew note
arw2019 Sep 19, 2020
a10a933
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 19, 2020
bddfa81
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 19, 2020
2adba09
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 19, 2020
531414f
feedback
arw2019 Sep 21, 2020
10ee18a
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 22, 2020
2fcfda0
rename variables (possibly) for clarity
arw2019 Sep 22, 2020
1969bc4
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 22, 2020
2972ee4
merge with upstream/master
arw2019 Sep 26, 2020
62caeb6
simplify _set_result_index_ordered
arw2019 Sep 26, 2020
557903f
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 26, 2020
deb1b09
minimize diff
arw2019 Sep 26, 2020
3d579c5
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 26, 2020
ec85d7f
merge with master
arw2019 Sep 29, 2020
f6a9724
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Sep 30, 2020
bfe6cde
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 5, 2020
b6fd41c
add type hints to _set_result_index_ordered
arw2019 Oct 5, 2020
983bb8e
revert type hints
arw2019 Oct 9, 2020
1884133
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 9, 2020
f207709
merge master
arw2019 Oct 11, 2020
4422a21
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 17, 2020
daa60a6
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 18, 2020
5658c12
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 18, 2020
4326b79
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 19, 2020
e12e8d9
merge with master
arw2019 Oct 23, 2020
9e6a130
BUG: fix merge mistake
arw2019 Oct 30, 2020
1770cc2
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 30, 2020
6a005dc
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 30, 2020
74dbe4f
Merge branch 'GH35612' of https://github.com/arw2019/pandas into GH35612
arw2019 Oct 30, 2020
0e9db9c
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 31, 2020
8be535c
feedback: rewrite _set_result_index_ordered
arw2019 Oct 31, 2020
91940c7
Merge branch 'GH35612' of https://github.com/arw2019/pandas into GH35612
arw2019 Oct 31, 2020
85d2165
CI: fix pd namespace usage
arw2019 Oct 31, 2020
c31b49f
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Oct 31, 2020
21bfc82
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Nov 1, 2020
7f67086
merge with master
arw2019 Nov 2, 2020
5555585
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Nov 4, 2020
5ac7fbf
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Nov 5, 2020
1e7ab91
DOC: fix use of pd namespace in whatsnew
arw2019 Nov 5, 2020
96b5af4
DOC: fix typo in whatsnew entry
arw2019 Nov 5, 2020
9cf9e05
Merge remote-tracking branch 'upstream/master' into GH35612
arw2019 Nov 7, 2020
f5a1635
REF (feedback): _set_result_index_ordered
arw2019 Nov 8, 2020
bd1abf9
DOC: @rhshadrach fix
arw2019 Nov 10, 2020
a789b6a
REF: restore casing on rows_dropped
arw2019 Nov 10, 2020
2ba8b44
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Nov 11, 2020
95b86ba
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Nov 23, 2020
7881134
merge with master
arw2019 Dec 3, 2020
15aa56e
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 5, 2020
08f0abd
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 5, 2020
8ab9baa
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 10, 2020
faf6570
move whatsnew to 1.3
arw2019 Dec 13, 2020
7bd2a9a
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 13, 2020
24bb112
minimize diff
arw2019 Dec 13, 2020
4377b63
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 16, 2020
de86144
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 17, 2020
9bc9ce4
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 18, 2020
1ea9d29
Merge branch 'master' of https://github.com/pandas-dev/pandas into GH…
arw2019 Dec 19, 2020
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
@@ -221,7 +221,7 @@ Indexing
 Missing
 ^^^^^^^
 
-- Bug in :meth:`SeriesGroupBy.transform` now correctly handles missing values for `dropna=False` (:issue:`35014`)
+- Bug in :meth:`SeriesGroupBy.transform` and :meth:`DataFrameGroupBy.transform` now correctly handle missing values (:issue:`35014` and :issue:`35612`)
Contributor

missing values in the grouper? with dropna?

Member Author

rewrote, mentioned grouper. Also separated this entry from #35014

 -
 
 MultiIndex
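The whatsnew entry concerns missing values in the grouping key. A minimal pandas-free sketch of the semantics, assuming the documented `dropna` behaviour of `groupby` (rows whose key is NaN belong to no group when `dropna=True` and form one shared group when `dropna=False`). `is_missing` and `group_indices` are hypothetical helpers, not pandas functions, and using `None` as the label of the missing-key group is an assumption of this sketch:

```python
import math
from typing import Dict, Hashable, List, Sequence


def is_missing(key: object) -> bool:
    """Hypothetical helper: treat None and float NaN as a missing key."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def group_indices(
    keys: Sequence[Hashable], dropna: bool = True
) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: map each group key to the row positions in that group.

    With dropna=True, rows with a missing key are skipped entirely.
    With dropna=False, every missing key is collapsed into a single None group.
    """
    groups: Dict[Hashable, List[int]] = {}
    for pos, key in enumerate(keys):
        if is_missing(key):
            if dropna:
                continue  # this row belongs to no group
            key = None  # one shared group for every NaN/None key
        groups.setdefault(key, []).append(pos)
    return groups


# Mirrors column "A" of the test frame: [0, 0, 1, None]
keys = [0, 0, 1, float("nan")]
print(group_indices(keys, dropna=True))   # {0: [0, 1], 1: [2]}
print(group_indices(keys, dropna=False))  # {0: [0, 1], 1: [2], None: [3]}
```

With `dropna=True` the last row is in no group at all, so any per-row transform result has one row fewer than the input; that row-count difference is what the rest of this diff has to account for.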
7 changes: 6 additions & 1 deletion pandas/core/groupby/generic.py
@@ -1677,12 +1677,17 @@ def _gotitem(self, key, ndim: int, subset=None):
                 exclusions=self.exclusions,
                 as_index=self.as_index,
                 observed=self.observed,
+                dropna=self.dropna,
             )
         elif ndim == 1:
             if subset is None:
                 subset = self.obj[key]
             return SeriesGroupBy(
-                subset, selection=key, grouper=self.grouper, observed=self.observed
+                subset,
+                selection=key,
+                grouper=self.grouper,
+                observed=self.observed,
+                dropna=self.dropna,
             )
 
         raise AssertionError("invalid ndim for _gotitem")
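The `_gotitem` change matters because selecting columns from a GroupBy (`gb["B"]` or `gb[["B"]]`) builds a new GroupBy object, and any option that is not passed along silently falls back to its default. A hypothetical sketch of that failure mode; the `MiniGroupBy` class and its two selection methods are illustrative only and are not pandas API:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MiniGroupBy:
    """Hypothetical stand-in for a GroupBy object; not the pandas class."""

    by: str
    dropna: bool = True  # same default as DataFrame.groupby
    observed: bool = False
    selection: Optional[str] = None

    def select_without_dropna(self, key: str) -> "MiniGroupBy":
        # Like the old SeriesGroupBy(...) call: dropna is not forwarded,
        # so the child silently gets the default dropna=True.
        return MiniGroupBy(by=self.by, observed=self.observed, selection=key)

    def select_with_dropna(self, key: str) -> "MiniGroupBy":
        # Like the patched call: every grouping option is forwarded.
        return MiniGroupBy(
            by=self.by, observed=self.observed, selection=key, dropna=self.dropna
        )


parent = MiniGroupBy(by="A", dropna=False)
print(parent.select_without_dropna("B").dropna)  # True  (option lost)
print(parent.select_with_dropna("B").dropna)     # False (option kept)
```

In this toy class, `dataclasses.replace(self, selection=key)` would forward every field automatically; the explicit keyword style is used here only because it matches the shape of the calls in the diff.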
15 changes: 11 additions & 4 deletions pandas/core/groupby/groupby.py
@@ -637,10 +637,12 @@ def _selected_obj(self):
 
         if self._selection is None or isinstance(self.obj, Series):
             if self._group_selection is not None:
-                return self.obj[self._group_selection]
-            return self.obj
+                result = self.obj[self._group_selection]
+            result = self.obj
         else:
-            return self.obj[self._selection]
+            result = self.obj[self._selection]
+
+        return result.dropna() if self.dropna else result
 
     def _reset_group_selection(self):
         """
@@ -690,7 +692,12 @@ def _set_result_index_ordered(self, result):
             result.set_axis(index, axis=self.axis, inplace=True)
             result = result.sort_index(axis=self.axis)
 
-        result.set_axis(self.obj._get_axis(self.axis), axis=self.axis, inplace=True)
+        if hasattr(self, "_selected_obj"):
+            labels = self._selected_obj._get_axis(self.axis)
+        else:
+            labels = self.obj._get_axis(self.axis)
+
+        result.set_axis(labels, axis=self.axis, inplace=True)
         return result
 
     def _dir_additions(self):
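The two `groupby.py` hunks work together. Once `_selected_obj` can return fewer rows than `self.obj`, `_set_result_index_ordered` has to relabel the result with the labels of the surviving rows rather than the full input axis. Otherwise `set_axis` receives more labels than there are rows, and a length mismatch of that kind raises `ValueError`, which is consistent with the old `xfail(raises=ValueError)` marker in the test. Below is a pandas-free sketch under those assumptions. `group_indices`, `set_axis` and `transform_len` are hypothetical helpers, not pandas internals. `len` is used as the transform because the test's expected values (`[2, 2, 1]` and `[2, 2, 1, 1]`) are consistent with per-group row counts:

```python
import math
from typing import Dict, Hashable, List, Sequence, Tuple


def group_indices(
    keys: Sequence[Hashable], dropna: bool = True
) -> Dict[Hashable, List[int]]:
    """Hypothetical helper: row positions per group; missing keys skipped if dropna."""
    groups: Dict[Hashable, List[int]] = {}
    for pos, key in enumerate(keys):
        if key is None or (isinstance(key, float) and math.isnan(key)):
            if dropna:
                continue  # row belongs to no group
            key = None  # single shared group for missing keys
        groups.setdefault(key, []).append(pos)
    return groups


def set_axis(values: List[int], labels: Sequence[Hashable]) -> List[Tuple[Hashable, int]]:
    """Hypothetical helper: attach labels to values, refusing a length mismatch."""
    if len(values) != len(labels):
        raise ValueError(f"Length mismatch: {len(values)} values but {len(labels)} labels")
    return list(zip(labels, values))


def transform_len(
    keys: Sequence[Hashable], labels: Sequence[Hashable], dropna: bool = True
) -> List[Tuple[Hashable, int]]:
    """Broadcast each group's size back to its rows, in original row order."""
    positions: List[int] = []
    values: List[int] = []
    # Results arrive group by group, so they are not in row order yet.
    for members in group_indices(keys, dropna).values():
        positions.extend(members)
        values.extend([len(members)] * len(members))
    # Restore original row order (the role of sort_index in the diff).
    order = sorted(range(len(positions)), key=positions.__getitem__)
    ordered_values = [values[i] for i in order]
    # Label with the rows that survived, not with the full input axis.
    surviving_labels = [labels[positions[i]] for i in order]
    return set_axis(ordered_values, surviving_labels)


keys = [0, 0, 1, float("nan")]  # column "A" of the test frame
labels = [0, 1, 2, 3]           # its default row labels
print(transform_len(keys, labels, dropna=True))   # [(0, 2), (1, 2), (2, 1)]
print(transform_len(keys, labels, dropna=False))  # [(0, 2), (1, 2), (2, 1), (3, 1)]

# Relabelling the 3-row result with the full 4-label axis reproduces the mismatch.
try:
    set_axis([2, 2, 1], labels)
except ValueError as err:
    print(err)  # Length mismatch: 3 values but 4 labels
```

The reordering step only does real work when groups are interleaved. For keys `["b", "a", "b"]`, the group-order results `[2, 2, 1]` come back in row order as `[2, 1, 2]`.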
9 changes: 2 additions & 7 deletions pandas/tests/groupby/test_groupby_dropna.py
@@ -165,12 +165,7 @@ def test_groupby_dropna_series_by(dropna, expected):
 @pytest.mark.parametrize(
     "dropna,df_expected,s_expected",
     [
-        pytest.param(
-            True,
-            pd.DataFrame({"B": [2, 2, 1]}),
-            pd.Series(data=[2, 2, 1], name="B"),
-            marks=pytest.mark.xfail(raises=ValueError),
-        ),
+        (True, pd.DataFrame({"B": [2, 2, 1]}), pd.Series(data=[2, 2, 1], name="B"),),
         (
             False,
             pd.DataFrame({"B": [2, 2, 1, 1]}),
@@ -179,7 +174,7 @@ def test_groupby_dropna_series_by(dropna, expected):
     ],
 )
 def test_slice_groupby_then_transform(dropna, df_expected, s_expected):
-    # GH35014
+    # GH35014 & GH35612
 
     df = pd.DataFrame({"A": [0, 0, 1, None], "B": [1, 2, 3, None]})
     gb = df.groupby("A", dropna=dropna)