BUG: add reset logic for Grouper if new obj is passed in (#26564) #29800

alichaudry · 2019-11-22T20:41:55Z

closes #26564 (pd.groupby seems to mutate my pd.Grouper in-place)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pandas/core/groupby/grouper.py

jbrockmendel · 2019-11-22T20:50:22Z

needs tests that the relevant bug is fixed

jreback · 2019-11-22T23:12:28Z

this PR is likely duplicating this one: #29131

alichaudry · 2019-11-22T23:26:29Z

> this PR is likely duplicating this one: #29131

Maybe I'll add my test here and see if it passes with #29131 .

pandas/tests/resample/test_resampler_grouper.py

alichaudry · 2019-11-25T18:27:01Z

@jreback I wrote a test here and ran it against #29131 which doesn't pass the test. So this is still an outstanding issue:

=========================================================== test session starts ============================================================
platform linux -- Python 3.7.3, pytest-5.3.0, py-1.8.0, pluggy-0.13.0
rootdir: /home/ali/repo/pydatascience/pandas-dev2, inifile: setup.cfg
plugins: cov-2.8.1, xdist-1.30.0, hypothesis-4.47.1, forked-1.1.2
collected 1 item                                                                                                                           

pandas/tests/resample/test_resampler_grouper.py F                                                                                    [100%]

================================================================= FAILURES =================================================================
__________________________________________________ test_same_grouper_on_different_frames ___________________________________________________

    def test_same_grouper_on_different_frames():
    
        df1 = pd.DataFrame(
            [["a", 1, 2], ["a", 4, 5], ["b", 2, 3]], columns=["type", "num1", "num2"],
        )
        df1["date"] = pd.to_datetime(["05/29/2019", "05/28/2019", "05/27/2019"])
    
        df2 = pd.DataFrame([["c", 6, 7], ["d", 8, 9]], columns=["type", "num1", "num2"],)
        df2["date"] = pd.to_datetime(["02/12/2018", "03/13/2018"])
    
        groupbys = ["type", pd.Grouper(key="date", freq="1D")]
    
        df1.groupby(groupbys).sum()
>       df2.groupby(groupbys).count()

pandas/tests/resample/test_resampler_grouper.py:294: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/core/groupby/generic.py:1674: in count
    ids, _, ngroups = self.grouper.group_info
pandas/_libs/properties.pyx:34: in pandas._libs.properties.CachedProperty.__get__
    val = self.func(obj)
pandas/core/groupby/ops.py:299: in group_info
    comp_ids, obs_group_ids = self._get_compressed_labels()
pandas/core/groupby/ops.py:315: in _get_compressed_labels
    all_labels = [ping.labels for ping in self.groupings]
pandas/core/groupby/ops.py:315: in <listcomp>
    all_labels = [ping.labels for ping in self.groupings]
pandas/core/groupby/grouper.py:405: in labels
    self._make_labels()
pandas/core/groupby/grouper.py:424: in _make_labels
    labels = self.grouper.label_info
pandas/_libs/properties.pyx:34: in pandas._libs.properties.CachedProperty.__get__
    val = self.func(obj)
pandas/core/groupby/ops.py:310: in label_info
    sorter = np.lexsort((labels, self.indexer))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

args = ((array([ 0, 29]), array([2, 1, 0])),), kwargs = {}, relevant_args = (array([ 0, 29]), array([2, 1, 0]))

>   ???
E   ValueError: all keys need to be the same shape

<__array_function__ internals>:6: ValueError
============================================================ 1 failed in 0.17s =============================================================
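
The final frame of the traceback can be reproduced in isolation. This is a minimal sketch, assuming (from the `args` shown above) that the two-element array holds the date-bin labels for `df2` and that the three-element array is an indexer presumably left over from grouping `df1`. The variable names are illustrative, not pandas internals.

```python
import numpy as np

# Two bin labels, as reported in the traceback for df2's two rows.
labels = np.array([0, 29])
# Three-element indexer; presumably cached from the earlier groupby on df1.
indexer = np.array([2, 1, 0])

try:
    # np.lexsort requires every key to have the same shape.
    np.lexsort((labels, indexer))
except ValueError as err:
    message = str(err)
else:
    message = None

print(message)
```

The length mismatch between the two keys is what triggers the `ValueError`, which is consistent with stale per-frame state surviving on the shared grouper.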

jreback · 2019-11-25T22:09:34Z

> @jreback I wrote a test here and ran it against #29131 which doesn't pass the test. So this is still an outstanding issue:

@alichaudry that PR is not merged, so it may or may not pass the test you created; it's not clear whether this is a different issue. Happy for you to investigate.

alichaudry · 2019-11-25T23:55:29Z

> @alichaudry that PR is not merged, it may or may not pass the test you created; its not clear if this is a different issue or not. happy for you to investigate.

@jreback So what I meant is I took the following steps:

1. cloned #29131 ([BUG] Fixed behavior of DataFrameGroupBy.apply to respect _group_selection_context) locally (the PR you mentioned)
2. built pandas from scratch in a separate conda env (using that PR author's code)
3. put my test into the local checkout of #29131 and ran it; it failed

My point is, from what I can tell my test doesn't work on the code/changes in #29131, which means those changes don't address the bug that I found.

jreback · 2019-11-26T00:47:50Z

@alichaudry well, I suspect that PR solves some of the problem

happy to take a partial or full patch

pandas/core/groupby/grouper.py

jreback · 2020-01-01T18:21:35Z

@alichaudry pls merge master

WillAyd · 2020-02-02T01:20:20Z

@alichaudry looks like CI is red - can you investigate and try to get passing?

pep8speaks · 2020-02-06T19:24:09Z

Hello @alichaudry! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-02-07 17:41:07 UTC

alichaudry · 2020-02-06T19:55:34Z

@WillAyd At this point I'm not sure how to proceed with this PR. I know what the issue is, and I've written a test in this PR which has the correct pass/fail behavior. I also put in a hacky solution for pd.Grouper in this PR which passes my test but fails a number of other tests. I don't have the bandwidth right now to re-work the pd.Grouper class to fix the root cause so I'm not sure where to go with this.

This issue as I described it in #26564 comes from the fact that I use pd.Grouper in a stateless way -- but when I use the grouper in a group-by, say df_first.groupby(my_grouper), and then use the same grouper in another group-by: df_second.groupby(my_grouper) it fails as my_grouper was mutated by the first group-by and the internal state of the grouper object (and its attributes) makes it fail on the latter one.
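
One possible workaround, sketched here under the assumption that the problematic state lives on the `pd.Grouper` instance as described above, is to build a fresh grouper for every `groupby` call instead of sharing one object. The helper name `make_groupers` is hypothetical and not part of pandas; the frames are the ones from the test in this PR.

```python
import pandas as pd


def make_groupers():
    """Return new grouping keys; a new pd.Grouper is created on every call."""
    return ["type", pd.Grouper(key="date", freq="1D")]


df1 = pd.DataFrame(
    [["a", 1, 2], ["a", 4, 5], ["b", 2, 3]], columns=["type", "num1", "num2"]
)
df1["date"] = pd.to_datetime(["05/29/2019", "05/28/2019", "05/27/2019"])

df2 = pd.DataFrame([["c", 6, 7], ["d", 8, 9]], columns=["type", "num1", "num2"])
df2["date"] = pd.to_datetime(["02/12/2018", "03/13/2018"])

# Each call receives its own Grouper, so nothing is shared between the frames.
sums = df1.groupby(make_groupers()).sum()
counts = df2.groupby(make_groupers()).count()
```

This sidesteps the mutation rather than fixing it, and it may be unnecessary on pandas versions where the underlying issue has been resolved.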

jbrockmendel · 2020-02-06T20:20:09Z

@alichaudry Will may have some more specific thoughts, but worst-case scenario you can make a PR with just the test and xfail it, so we'll know if this gets fixed down the road.

alichaudry · 2020-02-07T18:36:51Z

@jbrockmendel followed your advice; updated the PR to include only the test and marked it as xfail. I am seeing some unrelated test errors (they seem to have to do with pyarrow/feather functionality) so I was wondering if you have any insight into this. I will still wait for feedback from @WillAyd.

jbrockmendel · 2020-02-07T23:21:34Z

The pyarrow/feather failures are unrelated, you can ignore them for now.

alichaudry · 2020-02-20T20:56:40Z

@WillAyd If you get a chance I'd love to hear your thoughts on this PR. It's passed all tests as expected (only failing unrelated pyarrow/feather tests).

jbrockmendel · 2020-02-20T22:03:56Z

pandas/tests/resample/test_resampler_grouper.py

@@ -295,3 +296,35 @@ def test_median_duplicate_columns():
    result = df.resample("5s").median()
    expected.columns = result.columns
    tm.assert_frame_equal(result, expected)
+
+
+@pytest.mark.xfail(reason="marked as xfail for: #26564")


giving a GH reference is good, but the rest of this doesn't give much information. is there a single-line description that a reader would find informative?

can you add a "GH" in front of the "#" pls
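
A sketch of what such a marker could look like, assuming the wording of the reason string is up to the author. The description text and the `strict=False` setting below are illustrative choices, not what was committed in this PR; the test body is the one shown earlier in the thread.

```python
import pandas as pd
import pytest


@pytest.mark.xfail(
    # GH reference plus a one-line description of the failure (wording is illustrative).
    reason="GH#26564: reusing one pd.Grouper across DataFrames raises ValueError",
    strict=False,
)
def test_same_grouper_on_different_frames():
    df1 = pd.DataFrame(
        [["a", 1, 2], ["a", 4, 5], ["b", 2, 3]], columns=["type", "num1", "num2"]
    )
    df1["date"] = pd.to_datetime(["05/29/2019", "05/28/2019", "05/27/2019"])

    df2 = pd.DataFrame([["c", 6, 7], ["d", 8, 9]], columns=["type", "num1", "num2"])
    df2["date"] = pd.to_datetime(["02/12/2018", "03/13/2018"])

    # The same Grouper object is deliberately shared by both groupby calls.
    groupbys = ["type", pd.Grouper(key="date", freq="1D")]

    df1.groupby(groupbys).sum()
    df2.groupby(groupbys).count()
```

With `strict=False`, the test is reported as XPASS rather than as a failure once the bug is fixed, which signals that the marker can be removed.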

jbrockmendel · 2020-02-20T22:04:30Z

The OP says that this closes #26564, is that still accurate?

WillAyd · 2020-02-20T22:11:05Z

@alichaudry if you can merge master to get CI green can take another look

jreback · 2020-06-14T15:41:33Z

closing as stale; if you want to continue, please open a new PR.

BUG: add reset logic for Grouper if new obj is passed in (pandas-dev#…

d03142e

…26564)

alichaudry mentioned this pull request Nov 22, 2019

pd.groupby seems to mutate my pd.Grouper in-place #26564

Open

jbrockmendel reviewed Nov 22, 2019

BUG: adding test and addressing a comment on shortened URL (pandas-de…

05f649e

…v#26564)

jbrockmendel reviewed Nov 23, 2019

jbrockmendel reviewed Nov 23, 2019

gfyoung added Bug Groupby labels Nov 24, 2019

BUG: make test df simpler and change to agg funcs (pandas-dev#29800)

0043739

jbrockmendel reviewed Nov 28, 2019

Merge remote-tracking branch 'upstream/master' into pd-grouper-no-in-…

173f24a

…place

alichaudry added 2 commits February 6, 2020 13:00

Merge branch 'master' into pd-grouper-no-in-place

8f473d2

BUG: conform to standard test format (pandas-dev#29800)

8dea17d

BUG: blacken test (pandas-dev#29800)

5a1889b

alichaudry added 2 commits February 7, 2020 10:38

BUG: remove groupby fix and xfail test (pandas-dev#29800)

12e3b4f

BUG: fix import sort order (pandas-dev#29800)

079c847

jbrockmendel reviewed Feb 20, 2020

jreback closed this Jun 14, 2020
