DataFrameGroupBy.filter returns empty dataframe when grouping by multiple columns and one column is type datetime64[ns] #11029

Closed
jesaerys opened this issue Sep 8, 2015 · 2 comments
Labels
Datetime (Datetime data dtype), Groupby

Comments

jesaerys commented Sep 8, 2015

There seems to be a bug in the filter method when grouping by multiple columns where one of the grouping columns has dtype datetime64[ns]: the result is an empty DataFrame. Here is an example:

import pandas as pd

data = [
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-02'), 1),
    ('b', pd.Timestamp('2015-08-01'), 2),
    ('b', pd.Timestamp('2015-08-02'), 3),
    ('c', pd.Timestamp('2015-08-01'), 4),
    ('c', pd.Timestamp('2015-08-02'), 5),
    ('c', pd.Timestamp('2015-08-02'), 5),
]
data = pd.DataFrame(data, columns=('A', 'B', 'group_number'))
groups = data.groupby(['A', 'B'])
assert len(groups) == 6  # OK

filtered = groups.filter(lambda x: len(x) > 1)
assert len(filtered) == 4  # AssertionError: filtered is empty

Now, if I change the timestamps to strings, filter works as expected:

data2 = data.copy()
data2['B'] = data2['B'].astype(str)
groups2 = data2.groupby(['A', 'B'])
filtered2 = groups2.filter(lambda x: len(x) > 1)
assert len(filtered2) == 4  # OK
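
Since grouping on string keys gives the expected result, one possible workaround on an affected release is to group on a string copy of B and then use the surviving row labels to select from the original frame, so that B keeps its datetime dtype. This is only a sketch: the names `rows`, `keys`, and `kept` are illustrative, and it assumes the default integer index.

```python
import pandas as pd

rows = [
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-02'), 1),
    ('b', pd.Timestamp('2015-08-01'), 2),
    ('b', pd.Timestamp('2015-08-02'), 3),
    ('c', pd.Timestamp('2015-08-01'), 4),
    ('c', pd.Timestamp('2015-08-02'), 5),
    ('c', pd.Timestamp('2015-08-02'), 5),
]
data = pd.DataFrame(rows, columns=['A', 'B', 'group_number'])

# Group on a copy whose B column holds strings instead of timestamps.
keys = data.assign(B=data['B'].astype(str))
kept = keys.groupby(['A', 'B']).filter(lambda g: len(g) > 1)

# filter keeps the original row labels, so they can index the original frame.
filtered = data.loc[kept.index]
print(filtered)
```

The selected rows come from `data` itself, so nothing needs to be converted back from strings afterwards.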

Also, filter works as expected when grouping by only column B, regardless of whether it contains timestamps or strings.
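
For reference, the single-column case can be checked with the same data. In this sketch (the names `by_b` and `by_b_str` are illustrative) each date appears in four rows, so the filter is expected to keep all eight rows whether B holds timestamps or strings.

```python
import pandas as pd

rows = [
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-01'), 0),
    ('a', pd.Timestamp('2015-08-02'), 1),
    ('b', pd.Timestamp('2015-08-01'), 2),
    ('b', pd.Timestamp('2015-08-02'), 3),
    ('c', pd.Timestamp('2015-08-01'), 4),
    ('c', pd.Timestamp('2015-08-02'), 5),
    ('c', pd.Timestamp('2015-08-02'), 5),
]
data = pd.DataFrame(rows, columns=['A', 'B', 'group_number'])

# Group by B only, with B as timestamps.
by_b = data.groupby('B').filter(lambda g: len(g) > 1)

# Group by B only, with B cast to strings.
data_str = data.assign(B=data['B'].astype(str))
by_b_str = data_str.groupby('B').filter(lambda g: len(g) > 1)

print(len(by_b), len(by_b_str))
```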

What's going on here? Am I missing something or is this actually a bug? Thanks!

Contributor

jreback commented Sep 8, 2015

Please always include the output of pd.show_versions().

This is a regression in 0.16.2 relative to 0.16.0, and it has been fixed for the upcoming 0.17.0 in #10114.
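
A minimal way to collect the version details for a report like this one is sketched below. `pd.__version__` gives the pandas version on its own, and `pd.show_versions()` prints the fuller environment summary (Python, OS, and dependency versions) that can be pasted into the issue.

```python
import pandas as pd

# Short form: just the installed pandas version string.
print(pd.__version__)

# Full form: prints Python, OS, and dependency versions to stdout.
pd.show_versions()
```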

jreback closed this as completed on Sep 8, 2015
jreback added the Datetime and Groupby labels on Sep 8, 2015
Author

jesaerys commented Sep 8, 2015

I searched for "filter" prior to creating this, but I must have only searched the open issues. Sorry about that! Thanks for the info, though.
