BUG: value_counts can handle the case even with empty groups (#28479) #28634

Merged on Nov 7, 2019 (31 commits). The diff below shows the changes from 2 of those commits.

Commits:
- b28456a (dongho-jung, Sep 25, 2019): BUG: value_counts can handle the case even with empty groups (#28479)
- 3ef5e8a (dongho-jung, Sep 26, 2019): .
- 40475e8 (dongho-jung, Sep 27, 2019): removing consecutive duplicates was the same as just unique
- f1f104a (dongho-jung, Sep 28, 2019): get the performance while handling the exception explicitly
- 6fbaaa6 (dongho-jung, Oct 3, 2019): get rid of try-except
- be018a0 (dongho-jung, Oct 3, 2019): Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
- 3a7f71e (dongho-jung, Oct 3, 2019): make test more idiomatic
- e045290 (dongho-jung, Oct 14, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- d101a73 (dongho-jung, Oct 14, 2019): Merge origin/master into fix-GH28479-1
- 31461d2 (dongho-jung, Oct 14, 2019): solve merge confilcts
- 11df012 (andrew-buzzni, Oct 18, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- 8005603 (dongho-jung, Oct 19, 2019): move the logic into the BinGrouper.recons_labels
- b507259 (dongho-jung, Oct 19, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- 67847ff (dongho-jung, Oct 19, 2019): checkout unrelated files to master
- 66a96e3 (dongho-jung, Oct 19, 2019): checkout sorting.py to upstream/master not origin/master
- 9390d21 (dongho-jung, Oct 24, 2019): accept TomAugspurger requests
- a75843f (dongho-jung, Oct 24, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- d8da75a (dongho-jung, Oct 24, 2019): use deterministic values instead of random values
- 6f6371c (dongho-jung, Oct 25, 2019): undo adding recons_labels method
- 82e5153 (dongho-jung, Oct 25, 2019): move a change returning unique comp ids
- 3cf850e (dongho-jung, Nov 1, 2019): Merge branch 'master' into fix-GH28479-1
- 97adfa6 (andrew-buzzni, Nov 4, 2019): Merge remote-tracking branch 'origin/fix-GH28479-1' into fix-GH28479-1
- 49514aa (andrew-buzzni, Nov 4, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- df6454b (andrew-buzzni, Nov 4, 2019): add recons_labels under the BinGrouper
- 28ee287 (andrew-buzzni, Nov 4, 2019): fix indexing in recons_labels
- a1049fd (andrew-buzzni, Nov 4, 2019): add an exact reproduction of the original issue
- 3436dda (dongho-jung, Nov 5, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- 961a72c (andrew-buzzni, Nov 6, 2019): add a comment under recons_labels
- b45527c (dongho-jung, Nov 7, 2019): Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
- a364ac6 (dongho-jung, Nov 7, 2019): Merge branch 'fix-GH28479-1' of https://github.com/0xF4D3C0D3/pandas …
- 368311c (dongho-jung, Nov 7, 2019): rename from recons_labels to recons_codes
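Commit 40475e8 notes that removing consecutive duplicates was the same as taking unique values. A minimal NumPy sketch of why, assuming the group ids are already sorted so that equal ids sit next to each other: on a sorted array the two operations agree, while on an unsorted array they do not. `drop_consecutive_duplicates` is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np


def drop_consecutive_duplicates(a):
    """Keep the first element of each run of equal adjacent values
    (hypothetical helper mirroring the mask built in the diff)."""
    a = np.asarray(a)
    keep = np.ones(len(a), dtype=bool)
    keep[1:] = a[1:] != a[:-1]
    return a[keep]


# Sorted ids: run de-duplication and np.unique give the same result.
sorted_ids = np.array([0, 0, 0, 2, 2, 5])
print(drop_consecutive_duplicates(sorted_ids))  # [0 2 5]
print(np.unique(sorted_ids))                    # [0 2 5]

# Unsorted ids: non-adjacent repeats survive run de-duplication.
unsorted_ids = np.array([2, 0, 2])
print(drop_consecutive_duplicates(unsorted_ids))  # [2 0 2]
print(np.unique(unsorted_ids))                    # [0 2]
```

The equivalence therefore depends entirely on the ids being sorted beforehand; if that assumption did not hold, `np.unique` would silently reorder and merge groups.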
doc/source/whatsnew/v1.0.0.rst (1 change: 1 addition, 0 deletions)

@@ -299,6 +299,7 @@ Other
 - Using :meth:`DataFrame.replace` with overlapping keys in a nested dictionary will no longer raise, now matching the behavior of a flat dictionary (:issue:`27660`)
 - :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now support dicts as ``compression`` argument with key ``'method'`` being the compression method and others as additional compression options when the compression method is ``'zip'``. (:issue:`26023`)
 - :meth:`Series.append` will no longer raise a ``TypeError`` when passed a tuple of ``Series`` (:issue:`28410`)
+- :meth:`SeriesGroupBy.value_counts` now handles the case where the :class:`Grouper` produces empty groups (:issue:`28479`)
 
 .. _whatsnew_1000.contributors:
 
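The whatsnew entry describes the user-facing change. A minimal reproduction sketch, assuming a pandas version that includes this fix (1.0.0 or later, per the whatsnew file it is recorded in). The data are hypothetical, chosen so that a daily `Grouper` leaves 2019-09-26 as an empty bin, and the result is cross-checked by counting each bin separately.

```python
import pandas as pd

# Hypothetical data: no row falls on 2019-09-26, so a daily Grouper
# produces an empty bin between the first and last dates.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2019-09-25", "2019-09-25", "2019-09-27"]),
        "name": ["a", "b", "a"],
    }
)
gr = df.groupby(pd.Grouper(key="date", freq="1D"))["name"]

# Per the whatsnew entry, this call should no longer raise with the fix in place.
result = gr.value_counts()

# Cross-check: count values bin by bin; an empty bin contributes nothing.
expected = {
    (date, name): count
    for date, names in gr
    for name, count in names.value_counts().items()
}
assert result.to_dict() == expected
print(result)
```

Comparing plain dicts sidesteps differences in index and Series names between `value_counts` and a manual per-group count, which the PR's own test works around by copying `index.names`.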
pandas/core/groupby/generic.py (9 changes: 8 additions, 1 deletion)

@@ -1259,7 +1259,14 @@ def value_counts(
         rep = partial(np.repeat, repeats=np.add.reduceat(inc, idx))
 
         # multi-index components
-        labels = list(map(rep, self.grouper.recons_labels)) + [llab(lab, inc)]
+        try:
+            labels = list(map(rep, self.grouper.recons_labels)) + [llab(lab, inc)]
+        except ValueError:
+            # If applying rep to recons_labels fails, use ids with
+            # consecutive duplicates removed instead.
+            _ids_idx = np.ones(len(ids), dtype=bool)
+            _ids_idx[1:] = ids[1:] != ids[:-1]
+            labels = list(map(rep, [ids[_ids_idx]])) + [llab(lab, inc)]
         levels = [ping.group_index for ping in self.grouper.groupings] + [lev]
         names = self.grouper.names + [self._selection_name]
 
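The fallback hinges on a length mismatch inside `np.repeat`. A NumPy-only sketch of that arithmetic, assuming (as the `except ValueError` branch suggests) that the grouper's reconstructed labels carry one code per bin, empty bins included, while the repeat counts cover only the non-empty groups. The toy `ids`/`lab` arrays and the names `n_bins`, `repeats`, and `keep` are hypothetical stand-ins; this simplifies rather than reproduces the pandas internals.

```python
from functools import partial

import numpy as np

# Hypothetical toy inputs: a group id and a factorized value code per row.
# Three bins exist, but bin 1 received no rows.
n_bins = 3
ids = np.array([0, 2, 0, 0, 2])
lab = np.array([1, 0, 0, 1, 0])

# Sort rows by (group id, value code); np.lexsort treats the last key as primary.
sorter = np.lexsort((lab, ids))
ids, lab = ids[sorter], lab[sorter]

# Offset where each non-empty group starts, and a mask of rows that open
# a new (group, value) pair.
idx = np.concatenate(([0], 1 + np.flatnonzero(ids[1:] != ids[:-1])))
inc = np.concatenate(([True], lab[1:] != lab[:-1]))
inc[idx] = True

# Cast to integers so the per-group sums are counts of distinct values.
repeats = np.add.reduceat(inc.astype(np.intp), idx)
rep = partial(np.repeat, repeats=repeats)

# One code per bin (3 codes) against 2 repeat counts: np.repeat raises.
mismatch_raised = False
try:
    rep(np.arange(n_bins))
except ValueError:
    mismatch_raised = True

# Fallback as in the diff: keep the first id of each run, i.e. only the
# groups that actually occur, then repeat those.
keep = np.ones(len(ids), dtype=bool)
keep[1:] = ids[1:] != ids[:-1]
group_codes = rep(ids[keep])
value_codes = lab[inc]
print(mismatch_raised, group_codes, value_codes)
```

Here `repeats` is `[2, 1]`, so repeating the de-duplicated ids `[0, 2]` gives group codes `[0, 0, 2]` that line up with the value codes `[0, 1, 0]`, one entry per (group, value) pair.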
pandas/tests/groupby/test_value_counts.py (27 changes: 26 additions, 1 deletion)

@@ -9,7 +9,7 @@
 import numpy as np
 import pytest
 
-from pandas import DataFrame, MultiIndex, Series, date_range
+from pandas import DataFrame, Grouper, MultiIndex, Series, date_range
 from pandas.util import testing as tm
 
 
@@ -79,3 +79,28 @@ def rebuild_index(df):
     # have to sort on index because of unstable sort on values
     left, right = map(rebuild_index, (left, right))  # xref GH9212
     tm.assert_series_equal(left.sort_index(), right.sort_index())
+
+
+@pytest.mark.parametrize(
+    "freq, size, frac", product(["1D", "2D", "1W", "1Y"], [100, 1000], [0.1, 0.5, 1])
+)
+def test_series_groupby_value_counts_with_grouper(freq, size, frac):
+    np.random.seed(42)
+
+    df = DataFrame.from_dict(
+        {
+            "date": date_range("2019-09-25", periods=size),
+            "name": np.random.choice(list("abcd"), size),
+        }
+    ).sample(frac=frac)
+
+    gr = df.groupby(Grouper(key="date", freq=freq))["name"]
+
+    # have to sort on index because of unstable sort on values xref GH9212
+    result = gr.value_counts().sort_index()
+    expected = gr.apply(Series.value_counts).sort_index()
+    expected.index.names = (
+        result.index.names
+    )  # .apply(Series.value_counts) can't create all names
+
+    tm.assert_series_equal(result, expected)