BUG: DataFrameGroupBy.value_counts fails with a TimeGrouper #50507

Merged (5 commits) on Jan 3, 2023

rhshadrach
Member

Unlike other Groupings, BinGrouper was creating its Grouping based on the labels that were to be in the result. Thus the length of Grouping.grouping_vector did not, in general, match the input object, which then failed when combined with other Groupings that do match the input object.
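As a rough illustration of the mismatch described above (a sketch using only public pandas API, not the internal Grouping code), the number of time bins in the result can differ from the number of rows in the input. The frame and the names n_rows and n_bins are made up for this example.

```python
import pandas as pd

# Two input rows that are three days apart.
df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2022-01-01", "2022-01-04"]),
        "values": [2, 3],
    }
)
gb = df.groupby(pd.Grouper(freq="1D", key="datetime"))

n_rows = len(df)        # one entry per input row
n_bins = len(gb.sum())  # one entry per daily bin, including the empty ones

print(n_rows, n_bins)
```

A vector built from the result labels has n_bins entries, while a Grouping derived from a regular column has n_rows entries, so the two cannot be aligned row by row.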

In order to include empty groups, e.g. the 2nd group below

import pandas as pd

df = pd.DataFrame(
    {
        "datetime": pd.to_datetime(["2022-01-01", "2022-01-03"]),
        "values": [2, 3],
    }
)
gb = df.groupby(pd.Grouper(freq="1D", key="datetime"))
print(gb.sum())
#             values
# datetime          
# 2022-01-01       2
# 2022-01-02       0
# 2022-01-03       3

I added a uniques argument to Grouping and now rely on Categorical to encode the groups, including those that are missing.
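The idea of encoding row-length labels against a fixed set of uniques can be sketched with the public pd.Categorical constructor. This is only an analogy for what the description says, not the code inside Grouping; the variable names are chosen for illustration.

```python
import pandas as pd

# All bin labels expected in the result, including the empty 2022-01-02 bin.
uniques = pd.date_range("2022-01-01", "2022-01-03", freq="D")

# One label per input row, so this has the same length as the input object.
labels = pd.to_datetime(["2022-01-01", "2022-01-03"])

# The categories carry every group; the codes only reference observed ones.
cat = pd.Categorical(labels, categories=uniques, ordered=True)
codes = cat.codes

# Grouping by the categorical keeps the unobserved category as an empty group.
values = pd.Series([2, 3])
totals = values.groupby(cat, observed=False).sum()

print(list(codes))
print(totals.tolist())
```

Here the codes have one entry per row, while the categories still list all three days, so the empty group survives in the aggregated result.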

codes = self.group_info[0]
labels = lev.take(codes)
ping = grouper.Grouping(
labels, labels, in_axis=False, level=None, uniques=lev.values
Member

Is there any danger that lev.values makes a copy? E.g., I think a tz-aware DatetimeIndex might.

Member Author

Thanks - this indeed did not work for tzaware. Changed to ._values and added tests.
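For context on the exchange above, here is a small sketch of how .values and ._values differ on a tz-aware DatetimeIndex. The index lev below is a stand-in built for this example. Note that ._values is a private attribute; the public .array accessor returns the same kind of object for an Index.

```python
import pandas as pd

# A hypothetical tz-aware level, standing in for lev in the diff above.
lev = pd.date_range("2022-01-01", periods=3, freq="D", tz="US/Eastern")

as_numpy = lev.values    # plain NumPy datetime64 array; the timezone is dropped
as_ext = lev._values     # pandas DatetimeArray; the timezone is preserved
as_public = lev.array    # public accessor, same extension array type

print(as_numpy.dtype)
print(as_ext.dtype)
print(type(as_ext) is type(as_public))
```

Because .values has to convert to a timezone-naive NumPy array, it no longer matches the tz-aware labels, which is consistent with the failure reported for the tz-aware case.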

@mroeschke added the Groupby and Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels on Jan 3, 2023
@mroeschke added this to the 2.0 milestone on Jan 3, 2023
@mroeschke merged commit 3ea04c3 into pandas-dev:main on Jan 3, 2023
@mroeschke
Member

Thanks @rhshadrach

@rhshadrach deleted the value_counts_time_grouper branch on January 4, 2023 00:47
Successfully merging this pull request may close these issues.

BUG: DataFrameGroupBy.value_counts fails with a TimeGrouper
3 participants