Commit 8efc717

dongho-jung authored and jreback committed
BUG: value_counts can handle the case even with empty groups (#28479) (#28634)
1 parent 7adc14a commit 8efc717

3 files changed, +35 -1 lines changed

doc/source/whatsnew/v1.0.0.rst

+1
@@ -455,6 +455,7 @@ Other
 - :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now support dicts as ``compression`` argument with key ``'method'`` being the compression method and others as additional compression options when the compression method is ``'zip'``. (:issue:`26023`)
 - Bug in :meth:`Series.diff` where a boolean series would incorrectly raise a ``TypeError`` (:issue:`17294`)
 - :meth:`Series.append` will no longer raise a ``TypeError`` when passed a tuple of ``Series`` (:issue:`28410`)
+- :meth:`SeriesGroupBy.value_counts` will be able to handle the case even when the :class:`Grouper` makes empty groups (:issue:`28479`)
 - Fix corrupted error message when calling ``pandas.libs._json.encode()`` on a 0d array (:issue:`18878`)
 - Fix :class:`AbstractHolidayCalendar` to return correct results for
   years after 2030 (now goes up to 2200) (:issue:`27790`)

pandas/core/groupby/ops.py

+5
@@ -767,6 +767,11 @@ def group_info(self):
             ngroups,
         )

+    @cache_readonly
+    def recons_codes(self):
+        # get unique result indices, and prepend 0 as groupby starts from the first
+        return [np.r_[0, np.flatnonzero(self.bins[1:] != self.bins[:-1]) + 1]]
+
     @cache_readonly
     def result_index(self):
         if len(self.binlabels) != 0 and isna(self.binlabels[0]):
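
The index arithmetic in `recons_codes` can be checked in isolation. The sketch below is not pandas code: it uses a made-up `bins` array and assumes, as the diff suggests, that `bins` holds non-decreasing row offsets marking where each bin ends, so that a repeated value corresponds to an empty bin.

```python
import numpy as np

# Hypothetical bin edges (an assumption for illustration, not real pandas state):
# bin 0 ends at row 1, bin 1 at row 3, bin 2 also at row 3 (so it is empty),
# bin 3 at row 5, bin 4 at row 6.
bins = np.array([1, 3, 3, 5, 6])

# Positions where the edge differs from the previous edge, shifted by one
# because the comparison starts at the second element.
changed = np.flatnonzero(bins[1:] != bins[:-1]) + 1

# Prepend 0, mirroring the expression in the diff, which always keeps the first bin.
codes = np.r_[0, changed]

print(codes.tolist())  # [0, 1, 3, 4]
```

Index 2, the bin whose edge repeats the previous one, is the only position left out. Note that the expression always includes position 0, whether or not the first bin contains rows.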

pandas/tests/groupby/test_value_counts.py

+29 -1
@@ -9,7 +9,7 @@
 import numpy as np
 import pytest

-from pandas import DataFrame, MultiIndex, Series, date_range
+from pandas import DataFrame, Grouper, MultiIndex, Series, date_range, to_datetime
 import pandas.util.testing as tm



@@ -79,3 +79,31 @@ def rebuild_index(df):
     # have to sort on index because of unstable sort on values
     left, right = map(rebuild_index, (left, right))  # xref GH9212
     tm.assert_series_equal(left.sort_index(), right.sort_index())
+
+
+def test_series_groupby_value_counts_with_grouper():
+    # GH28479
+    df = DataFrame(
+        {
+            "Timestamp": [
+                1565083561,
+                1565083561 + 86400,
+                1565083561 + 86500,
+                1565083561 + 86400 * 2,
+                1565083561 + 86400 * 3,
+                1565083561 + 86500 * 3,
+                1565083561 + 86400 * 4,
+            ],
+            "Food": ["apple", "apple", "banana", "banana", "orange", "orange", "pear"],
+        }
+    ).drop([3])
+
+    df["Datetime"] = to_datetime(df["Timestamp"].apply(lambda t: str(t)), unit="s")
+    dfg = df.groupby(Grouper(freq="1D", key="Datetime"))
+
+    # have to sort on index because of unstable sort on values xref GH9212
+    result = dfg["Food"].value_counts().sort_index()
+    expected = dfg["Food"].apply(Series.value_counts).sort_index()
+    expected.index.names = result.index.names
+
+    tm.assert_series_equal(result, expected)
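
For readers who want to try the scenario outside the test suite, here is a smaller stand-alone sketch of the same situation. The data is invented for illustration and is not taken from the issue; it assumes a pandas release that already contains this fix. The daily `Grouper` produces a group for 2019-08-08 even though no row falls on that day, which is the empty-group case the commit title refers to.

```python
import pandas as pd

# Illustrative data (an assumption, not the data from the test above):
# no row falls on 2019-08-08, so the daily Grouper yields an empty group there.
df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            ["2019-08-06", "2019-08-07", "2019-08-07", "2019-08-09"]
        ),
        "Food": ["apple", "apple", "banana", "pear"],
    }
)

grouped = df.groupby(pd.Grouper(freq="1D", key="Datetime"))["Food"]

# With the fix in place this call should complete instead of raising.
result = grouped.value_counts()

# Every input row is counted exactly once across all (day, food) pairs.
total = int(result.sum())
banana_count = int(result.loc[(pd.Timestamp("2019-08-07"), "banana")])
print(total, banana_count)
```

The test in the diff takes a stricter approach and compares the result against `apply(Series.value_counts)`; the sketch only checks totals so that it stays independent of index naming details.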
