BUG: value_counts can handle the case even with empty groups (#28479) #28634

Merged
merged 31 commits into from
Nov 7, 2019
Changes from 28 commits
Commits
b28456a
BUG: value_counts can handle the case even with empty groups (#28479)
dongho-jung Sep 25, 2019
3ef5e8a
.
dongho-jung Sep 26, 2019
40475e8
removing consecutive duplicates was the same as just unique
dongho-jung Sep 27, 2019
f1f104a
get the performance while handling the exception explicitly
dongho-jung Sep 28, 2019
6fbaaa6
get rid of try-except
dongho-jung Oct 3, 2019
be018a0
Merge branch 'master' of https://github.com/pandas-dev/pandas into fi…
dongho-jung Oct 3, 2019
3a7f71e
make test more idiomatic
dongho-jung Oct 3, 2019
e045290
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
dongho-jung Oct 14, 2019
d101a73
Merge origin/master into fix-GH28479-1
dongho-jung Oct 14, 2019
31461d2
solve merge confilcts
dongho-jung Oct 14, 2019
11df012
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
andrew-buzzni Oct 18, 2019
8005603
move the logic into the BinGrouper.recons_labels
dongho-jung Oct 19, 2019
b507259
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
dongho-jung Oct 19, 2019
67847ff
checkout unrelated files to master
dongho-jung Oct 19, 2019
66a96e3
checkout sorting.py to upstream/master not origin/master
dongho-jung Oct 19, 2019
9390d21
accept TomAugspurger requests
dongho-jung Oct 24, 2019
a75843f
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
dongho-jung Oct 24, 2019
d8da75a
use deterministic values instead of random values
dongho-jung Oct 24, 2019
6f6371c
undo adding recons_labels method
dongho-jung Oct 25, 2019
82e5153
move a change returning unique comp ids
dongho-jung Oct 25, 2019
3cf850e
Merge branch 'master' into fix-GH28479-1
dongho-jung Nov 1, 2019
97adfa6
Merge remote-tracking branch 'origin/fix-GH28479-1' into fix-GH28479-1
andrew-buzzni Nov 4, 2019
49514aa
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
andrew-buzzni Nov 4, 2019
df6454b
add recons_labels under the BinGrouper
andrew-buzzni Nov 4, 2019
28ee287
fix indexing in recons_labels
andrew-buzzni Nov 4, 2019
a1049fd
add an exact reproduction of the original issue
andrew-buzzni Nov 4, 2019
3436dda
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
dongho-jung Nov 5, 2019
961a72c
add a comment under recons_labels
andrew-buzzni Nov 6, 2019
b45527c
Merge remote-tracking branch 'upstream/master' into fix-GH28479-1
dongho-jung Nov 7, 2019
a364ac6
Merge branch 'fix-GH28479-1' of https://github.com/0xF4D3C0D3/pandas …
dongho-jung Nov 7, 2019
368311c
rename from recons_labels to recons_codes
dongho-jung Nov 7, 2019
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.0.0.rst
@@ -453,6 +453,7 @@ Other
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` now support dicts as ``compression`` argument with key ``'method'`` being the compression method and others as additional compression options when the compression method is ``'zip'``. (:issue:`26023`)
- Bug in :meth:`Series.diff` where a boolean series would incorrectly raise a ``TypeError`` (:issue:`17294`)
- :meth:`Series.append` will no longer raise a ``TypeError`` when passed a tuple of ``Series`` (:issue:`28410`)
- :meth:`SeriesGroupBy.value_counts` now handles the case where the :class:`Grouper` produces empty groups (:issue:`28479`)
- Fix corrupted error message when calling ``pandas.libs._json.encode()`` on a 0d array (:issue:`18878`)
- Fix :class:`AbstractHolidayCalendar` to return correct results for
years after 2030 (now goes up to 2200) (:issue:`27790`)
5 changes: 5 additions & 0 deletions pandas/core/groupby/ops.py
@@ -750,6 +750,11 @@ def group_info(self):
            ngroups,
        )

    @cache_readonly
    def recons_labels(self):
        # get unique result indices, and prepend 0 as groupby starts from the first
        return [np.r_[0, np.flatnonzero(self.bins[1:] != self.bins[:-1]) + 1]]

    @cache_readonly
    def result_index(self):
        if len(self.binlabels) != 0 and isna(self.binlabels[0]):
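
The expression in `recons_labels` can be tried out on its own. The following is a minimal sketch, not the pandas implementation: it assumes `bins` holds the cumulative end position of each bin, so that two equal consecutive entries mean no rows fell into the later bin. The function name `nonempty_bin_positions` and the sample edges are hypothetical, chosen only for illustration.

```python
import numpy as np


def nonempty_bin_positions(bins):
    """Positions of bins whose edge differs from the previous edge, with 0 prepended.

    Hypothetical helper mirroring the expression in the diff above.
    """
    bins = np.asarray(bins)
    # An edge equal to its predecessor means the bin received no rows (assumption
    # about what `bins` stores; see the lead-in).
    changed = np.flatnonzero(bins[1:] != bins[:-1]) + 1
    # Position 0 is always kept, as in the original one-liner.
    return np.r_[0, changed]


# Hypothetical edges: 1 row in bin 0, 2 rows in bin 1, none in bin 2,
# 2 rows in bin 3, 1 row in bin 4.
bins = [1, 3, 3, 5, 6]
positions = nonempty_bin_positions(bins)
print(positions)  # [0 1 3 4]
```

Position 2 is skipped because its edge repeats the previous one. Note that position 0 is kept unconditionally, so this sketch, like the one-liner it mirrors, does not detect an empty first bin.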
30 changes: 29 additions & 1 deletion pandas/tests/groupby/test_value_counts.py
@@ -9,7 +9,7 @@
import numpy as np
import pytest

-from pandas import DataFrame, MultiIndex, Series, date_range
+from pandas import DataFrame, Grouper, MultiIndex, Series, date_range, to_datetime
import pandas.util.testing as tm


@@ -79,3 +79,31 @@ def rebuild_index(df):
    # have to sort on index because of unstable sort on values
    left, right = map(rebuild_index, (left, right))  # xref GH9212
    tm.assert_series_equal(left.sort_index(), right.sort_index())


def test_series_groupby_value_counts_with_grouper():
    # GH28479
    df = DataFrame(
        {
            "Timestamp": [
                1565083561,
                1565083561 + 86400,
                1565083561 + 86500,
                1565083561 + 86400 * 2,
                1565083561 + 86400 * 3,
                1565083561 + 86500 * 3,
                1565083561 + 86400 * 4,
            ],
            "Food": ["apple", "apple", "banana", "banana", "orange", "orange", "pear"],
        }
    ).drop([3])

    df["Datetime"] = to_datetime(df["Timestamp"].apply(lambda t: str(t)), unit="s")
    dfg = df.groupby(Grouper(freq="1D", key="Datetime"))

    # have to sort on index because of unstable sort on values xref GH9212
    result = dfg["Food"].value_counts().sort_index()
    expected = dfg["Food"].apply(Series.value_counts).sort_index()
    expected.index.names = result.index.names

    tm.assert_series_equal(result, expected)
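
For readers who want to see the scenario outside the test suite, here is a rough standalone sketch of the same data. It assumes a pandas version that already includes this fix, and it builds the datetimes directly from integer seconds instead of going through strings as the test does. The row for the third day is left out from the start, so the daily `Grouper` yields one empty group.

```python
import pandas as pd

base = 1565083561  # 2019-08-06 09:26:01 UTC, the first timestamp in the test above

df = pd.DataFrame(
    {
        "Datetime": pd.to_datetime(
            [
                base,
                base + 86400,
                base + 86500,
                base + 86400 * 3,
                base + 86500 * 3,
                base + 86400 * 4,
            ],
            unit="s",
        ),
        "Food": ["apple", "apple", "banana", "orange", "orange", "pear"],
    }
)

# No row falls on 2019-08-08, so grouping by day produces one empty group.
grouped = df.groupby(pd.Grouper(freq="1D", key="Datetime"))
counts = grouped["Food"].value_counts()
print(counts)
```

The result is indexed by the pair (day, food), so a single count can be looked up with `counts.loc[(pd.Timestamp("2019-08-09"), "orange")]`.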