Fix 'observed' kwarg not doing anything on SeriesGroupBy #26463

krsnik93 · 2019-05-19T21:13:24Z

closes #24880 (observed keyword for SeriesGroupBy Ignored)
tests added / passed
passes `git diff upstream/master -u -- "*.py" | flake8 --diff`
whatsnew entry
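
For context, the behaviour this PR targets can be sketched roughly as follows. This is a minimal illustration rather than the PR's own test code: the frame, its column names and its values are made up, and it assumes a pandas version that already includes the fix.

```python
import pandas as pd

# Hypothetical frame: two categorical keys where the ("y", "q") pair never occurs.
df = pd.DataFrame({
    "a": pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
    "b": pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
    "c": [1, 2, 3],
})

# Selecting a column yields a SeriesGroupBy; `observed` should still be honoured.
observed_only = df.groupby(["a", "b"], observed=True)["c"].agg("sum")
all_combos = df.groupby(["a", "b"], observed=False)["c"].agg("sum")

print(len(observed_only))  # only the key pairs present in the data
print(len(all_combos))     # every combination of the categories
```

With `observed=False` the result also carries a row for the unobserved `("y", "q")` pair; with `observed=True` it does not.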

jreback · 2019-05-19T21:29:22Z

When you make changes, do it in the original PR; otherwise this is very confusing.

krsnik93 · 2019-05-19T21:31:04Z

Sorry, I did not make any changes between the two PRs, except changing the branch. I wanted to switch because the first one was pushed from master.

jreback · 2019-05-19T21:32:26Z

ok fair enough - pls close one of them

krsnik93 · 2019-05-19T21:38:39Z

ok, closed the first one

codecov · 2019-05-19T21:53:41Z

Codecov Report

Merging #26463 into master will decrease coverage by <.01%.
The diff coverage is 96.77%.

@@            Coverage Diff             @@
##           master   #26463      +/-   ##
==========================================
- Coverage   91.77%   91.76%   -0.01%     
==========================================
  Files         174      174              
  Lines       50649    50652       +3     
==========================================
- Hits        46483    46482       -1     
- Misses       4166     4170       +4

Flag	Coverage Δ
#multiple	`90.3% <96.77%> (ø)`	⬆️
#single	`41.69% <12.9%> (-0.07%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/groupby/generic.py	`88.81% <100%> (-0.16%)`	⬇️
pandas/core/groupby/groupby.py	`97.17% <95.65%> (-0.06%)`	⬇️
pandas/io/gbq.py	`78.94% <0%> (-10.53%)`	⬇️
pandas/core/frame.py	`97% <0%> (-0.12%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7ad884...e6bca5e. Read the comment docs.

pandas/core/groupby/generic.py

pandas/tests/groupby/test_groupby.py

WillAyd

Looking good just a few questions and comments on tests

pandas/core/groupby/generic.py

WillAyd · 2019-05-20T00:35:39Z

pandas/tests/groupby/test_categorical.py

+    df['a'] = df['a'].astype('category')
+    df['b'] = df['b'].astype('category')
+
+    # test .agg and .apply when observed == False


Rather than duplicating things in individual sections here, you should try to parametrize on agg/apply and True/False (for observed).

WillAyd · 2019-05-20T00:36:36Z

pandas/tests/groupby/test_categorical.py

+    df['a'] = df['a'].astype('category')
+    df['b'] = df['b'].astype('category')
+
+    # observed == False


Parametrize on True/False here as well. There should already be an observed fixture defined in the top level conftest.py that you can use

I couldn't make use of the existing fixture due to other parameters. I also included the cases for None, in case the default changes. The tests look much cleaner now, but there are a couple of fixtures that probably won't be used often.

WillAyd

This looks great - just a few more things from my perspective

WillAyd · 2019-05-20T16:19:48Z

pandas/tests/groupby/test_categorical.py

+
+
+@pytest.mark.parametrize("observed, index, op, data", [
+    (True, 'multi_index_cat_partial', 'agg', [3, 3, 4]),


You can add a separate decorator for op parametrized on agg/apply rather than having to duplicate each time here

Could also use the observed fixture separately

I think different values for index prevent me from doing that: I don't want apply to run with index multi_index_cat_partial (line 970), and I don't want to run agg on multi_index_non_cat_partial (line 971). BaseGrouper.apply changes the index when observed=True and I did not find a simple way to keep it, hence the differences in indices. Looking at values only would make this simpler, but that does not feel sufficient.

The same goes for observed, as observed=True and observed=False return different indices, I can't run all values of observed against all values of expected indices.

Oh OK I misread those values. You can also split those out into a separate test if they should exhibit different behavior
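
One way to read the outcome of this exchange is sketched below: a separate `op` decorator is used where both operations share an expectation, and the case that behaves differently gets its own test. The frame, the helper `make_df_cat` and the test names are hypothetical, and only values (not indices) are compared to keep the sketch short.

```python
import pandas as pd
import pytest


def make_df_cat():
    # Hypothetical data: the ("y", "q") combination is never observed.
    return pd.DataFrame({
        "a": pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
        "b": pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
        "c": [1, 2, 3],
    })


@pytest.mark.parametrize("op", ["agg", "apply"])
def test_observed_true_values(op):
    # Both operations are expected to produce the same values here,
    # so `op` can be parametrized on its own.
    grouped = make_df_cat().groupby(["a", "b"], observed=True)["c"]
    result = getattr(grouped, op)(lambda s: s.sum())
    assert result.tolist() == [1, 2, 3]


def test_observed_false_includes_unobserved():
    # Different expected shape, so it lives in a separate test.
    grouped = make_df_cat().groupby(["a", "b"], observed=False)["c"]
    result = grouped.agg("sum")
    assert len(result) == 4
```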

pandas/tests/groupby/test_categorical.py

WillAyd · 2019-05-20T16:22:36Z

pandas/tests/groupby/test_categorical.py

+    expected = pd.Series(data=data, index=index, name='c')
+    grouped = df_cat.groupby(['a', 'b'], observed=observed).c
+    actual = getattr(grouped, op)(sum)
+    assert_series_equal(expected, actual)


Can you change actual to result and do assert_series_equal(result, expected)?

WillAyd · 2019-05-20T16:26:36Z

pandas/tests/groupby/test_categorical.py

+    # GH 24880
+    index = request.getfixturevalue(index)
+    expected = pd.Series(data=data, index=index, name='c')
+    grouped = df_cat.groupby(['a', 'b'], observed=observed).c


Stylistic nit but can you use bracket notation instead of dot notation?

WillAyd · 2019-05-20T16:27:13Z

pandas/tests/groupby/test_categorical.py

+    # GH 24880
+    index = request.getfixturevalue(index)
+    expected = pd.Series(data=data, index=index, name='c')
+    actual = df_cat.groupby(['a', 'b'], observed=observed).c.\


Also stylistic, but we only use implicit line continuations in the code base, so you could just break after the opening parenthesis here if it's not too long.
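
Taken together, the three style points above (the `result`/`expected` naming and argument order, bracket notation for column selection, and implicit line continuation inside parentheses) might look like the following. The data is invented for illustration and is not taken from the PR.

```python
import pandas as pd

# Hypothetical categorical frame, standing in for the test fixture.
df_cat = pd.DataFrame({
    "a": pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
    "b": pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
    "c": [1, 2, 3],
})

expected = pd.Series(
    [1, 2, 3],
    index=pd.MultiIndex.from_arrays(
        [
            pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
            pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
        ],
        names=["a", "b"],
    ),
    name="c",
)

# Bracket notation (["c"], not .c) and a parenthesised expression
# instead of a trailing backslash.
result = (
    df_cat.groupby(["a", "b"], observed=True)["c"]
    .agg("sum")
)

# `result` first, `expected` second.
pd.testing.assert_series_equal(result, expected)
```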

WillAyd · 2019-05-20T16:27:58Z

pandas/tests/groupby/conftest.py

+
+
+@pytest.fixture
+def multi_index_cat_partial(df_cat):


Question on naming - what does partial mean here?

By partial I mean that it's not a full product of index level values. For example, if the first level has ['x', 'y'] and the second level ['a', 'b'], then a complete index is [('x', 'a'), ('x', 'b'), ('y', 'a'), ('y', 'b')], which I construct with .from_product. If not all combinations are present, then it's partial (for example, [('x', 'a'), ('x', 'b'), ('y', 'a')], where ('y', 'b') is missing).
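
The distinction described here can be sketched with the same level values; the variable names are only for illustration.

```python
import pandas as pd

# "Complete": the full Cartesian product of the level values.
complete = pd.MultiIndex.from_product([["x", "y"], ["a", "b"]])

# "Partial": one combination, ("y", "b"), is absent.
partial = pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "a")])

# The combinations that make the difference between the two.
missing = complete.difference(partial)

print(list(complete))
print(list(missing))
```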

Hmm OK. I find this rather confusing though that the determination as to whether or not this is "partial" is done entirely outside the scope of the fixture (i.e. it is up to the injected test to only partially align).

Is there not a way to make the fixture self contained to provide a Series or Frame instead? May require a rewrite of your tests but I think this is just confusing

WillAyd · 2019-05-20T16:32:59Z

@krsnik93 note that you also have an isort failure


WillAyd · 2019-05-20T19:46:39Z

pandas/tests/groupby/conftest.py

+def multi_index_cat_complete():
+    lvls = [CategoricalIndex(['x', 'y'], categories=['x', 'y'], ordered=False),
+            CategoricalIndex(['a', 'b'], categories=['a', 'b'], ordered=False)]
+    index = MultiIndex.from_product(lvls, names=['a', 'b'])


Since 'a' and 'b' are categories can you use distinct names? 'foo' and 'bar' are fine

WillAyd

Same comment in all places; I think these can be made more readable if the code gets reduced

Otherwise lgtm so over to @jreback

WillAyd · 2019-05-21T01:07:55Z

pandas/tests/groupby/test_categorical.py

+def test_groupby_series_observed_true(df_cat, operation):
+    # GH 24880
+    index = {
+        'agg': MultiIndex.from_frame(df_cat[['a', 'b']].drop_duplicates()),


I think it would be better to just use literal values here instead of deriving via these expressions; it would be easier to read and could be passed in via parametrization instead of using this local dict

pandas/tests/groupby/test_categorical.py

WillAyd · 2019-05-21T01:08:43Z

pandas/tests/groupby/test_categorical.py

+    index_names = df_cat.select_dtypes(
+        'category').columns.values.tolist() + [None]
+    index = {
+        True: MultiIndex.from_tuples(


Same comment

WillAyd · 2019-05-22T05:40:52Z

pandas/tests/groupby/test_categorical.py

+@pytest.mark.parametrize('operation', ['agg', 'apply'])
+def test_groupby_series_observed_true(df_cat, operation):
+    # GH 24880
+    index = {


Can you just parametrize this instead of using the dict this way?

Also using MultiIndex.from_* methods would be preferable for readability
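
A rough sketch of what this suggestion could look like: the expected index is built with `MultiIndex.from_*` constructors and passed in through the parametrization instead of being looked up in a local dict. The frame, the helper `make_df_cat` and the test name are hypothetical, and only `agg` is exercised here.

```python
import pandas as pd
import pytest


def make_df_cat():
    # Hypothetical data: the ("y", "q") combination is never observed.
    return pd.DataFrame({
        "a": pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
        "b": pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
        "c": [1, 2, 3],
    })


@pytest.mark.parametrize(
    "observed, index, data",
    [
        (
            True,
            pd.MultiIndex.from_arrays(
                [
                    pd.Categorical(["x", "x", "y"], categories=["x", "y"]),
                    pd.Categorical(["p", "q", "p"], categories=["p", "q"]),
                ],
                names=["a", "b"],
            ),
            [1, 2, 3],
        ),
        (
            False,
            pd.MultiIndex.from_product(
                [pd.CategoricalIndex(["x", "y"]), pd.CategoricalIndex(["p", "q"])],
                names=["a", "b"],
            ),
            [1, 2, 3, 0],
        ),
    ],
)
def test_seriesgroupby_observed_index(observed, index, data):
    expected = pd.Series(data, index=index, name="c")
    result = make_df_cat().groupby(["a", "b"], observed=observed)["c"].agg("sum")
    pd.testing.assert_series_equal(result, expected)
```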

pep8speaks · 2019-05-22T08:49:22Z

Hello @krsnik93! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2019-05-29 09:02:24 UTC

WillAyd

lgtm!

jreback · 2019-05-26T16:11:33Z

pandas/core/groupby/groupby.py

@@ -2301,6 +2302,69 @@ def tail(self, n=5):
        mask = self._cumcount_array(ascending=False) < n
        return self._selected_obj[mask]

+    def _reindex_output(self, result):
+        """
+        If we have categorical groupers, then we want to make sure that


can you update the doc-string with Parameters / Returns sections; add types if you can
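
For readers unfamiliar with the layout being asked for, a numpydoc-style docstring with typed Parameters and Returns sections looks roughly like this. The function below is a toy stand-in written for this illustration (plain dicts instead of pandas objects); it is not pandas' `_reindex_output`, and its name and arguments are hypothetical.

```python
from itertools import product


def reindex_to_full_product(result, levels, fill_value=0):
    """
    Expand a grouped result so that every combination of levels is present.

    Parameters
    ----------
    result : dict of tuple to int
        Aggregated values keyed by the observed group keys.
    levels : list of list
        Possible values for each grouping level, in order.
    fill_value : int, default 0
        Value used for combinations missing from ``result``.

    Returns
    -------
    dict of tuple to int
        One entry per element of the Cartesian product of ``levels``.
    """
    return {key: result.get(key, fill_value) for key in product(*levels)}


full = reindex_to_full_product({("x", "a"): 1}, [["x"], ["a", "b"]])
print(full)
```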

pandas/core/groupby/generic.py

jreback · 2019-05-26T16:16:44Z

does this have an associated issue?

jreback

needs a whatsnew note as well

jreback · 2019-05-26T16:17:18Z

pandas/tests/groupby/conftest.py

@@ -76,3 +76,13 @@ def three_group():
                      'D': np.random.randn(11),
                      'E': np.random.randn(11),
                      'F': np.random.randn(11)})
+
+
+@pytest.fixture


can this be more generally used in groupby/test_categorical.py?

I would have to keep doing .astype('category') in each of my tests for this. Either that, or derive another fixture from three_group, so it would not decrease the number of fixtures.

I also preferred literal values to random ones for easier equality checks.

it’s not about decreasing the number of fixtures
but rather reusing them across tests as much as possible

literal values are fine (u could just replace the random with fixed values)
changing existing to accommodate new tests is better than rolling new

jreback

change looks fine, can handle the cleaning up of the _wrap* at a later date, just a test comment

jreback · 2019-05-27T14:58:45Z

pandas/tests/groupby/test_categorical.py

+class TestSeriesGroupByObservedKwarg:
+    # GH 24880
+
+    @pytest.fixture(autouse=True)


don't use setup_method, this creates a rather opaque path. I don't think the class adds anything here (or rather it might but we don't use any classes in this file, so rather refactor in a later change).

I used a class not to have to do this bit:

    df = df.copy()[:4]  # leave out some groups
    df['A'] = df['A'].astype('category')
    df['B'] = df['B'].astype('category')
    df['C'] = pd.Series([1, 2, 3, 4])

in each of the 3 test functions. An alternative is to put the above in another fixture derived from the first one, but that would probably not be reused much.

So do I just rename setup_method to something like setup_df or do something else?

just define the fixture as a function outside of the class (and remove the class as indicated)

simonjayhawkins

a few minor points regarding the tests.

simonjayhawkins · 2019-05-27T21:49:31Z

pandas/tests/groupby/test_categorical.py

+
+
+@pytest.fixture
+def df_cat(df):


can you add a docstring, xref #19159

simonjayhawkins · 2019-05-27T21:52:19Z

pandas/tests/groupby/test_categorical.py

+    df_cat['A'] = df_cat['A'].astype('category')
+    df_cat['B'] = df_cat['B'].astype('category')
+    df_cat['C'] = pd.Series([1, 2, 3, 4])
+    yield df_cat


yield is perfectly valid, but for consistency with the rest of the fixtures can you use a return. There's no context, teardown or finalization here.

simonjayhawkins · 2019-05-27T21:56:36Z

pandas/tests/groupby/test_categorical.py

+                      'B': ['one', 'two', 'one', 'three']
+                      }, dtype='category'))),
+    ('apply', MultiIndex.from_frame(
+        pd.DataFrame({'A': ['foo', 'foo', 'bar', 'bar'],


if the index is the same as above with just the dtype being different, might be clearer if you parametrize over just dtype, something like...

@pytest.mark.parametrize(...., 'kwargs', [(..., None), (..., dict(dtype='category'))])

and then

MultiIndex.from_frame(...., **kwargs)
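
One possible reading of this suggestion is sketched below. It makes two assumptions that are not spelled out in the comment: the keyword arguments are forwarded to the `DataFrame` constructor (where `dtype` is accepted) before calling `MultiIndex.from_frame`, and an empty dict is used for the "no dtype" case, since unpacking `None` with `**` raises a `TypeError`. The test name is hypothetical.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize("kwargs", [{}, {"dtype": "category"}])
def test_index_from_frame(kwargs):
    # Same literal values in both cases; only the dtype differs.
    frame = pd.DataFrame(
        {"A": ["foo", "foo", "bar", "bar"],
         "B": ["one", "two", "one", "three"]},
        **kwargs,
    )
    index = pd.MultiIndex.from_frame(frame)

    is_categorical = isinstance(index.levels[0], pd.CategoricalIndex)
    assert is_categorical == (kwargs.get("dtype") == "category")
    assert list(index.names) == ["A", "B"]
    assert len(index) == 4
```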

simonjayhawkins · 2019-05-27T22:01:30Z

pandas/tests/groupby/test_categorical.py

+                      })))])
+def test_seriesgroupby_observed_true(df_cat, operation, index):
+    # GH 24880
+    expected = pd.Series(data=[1, 3, 2, 4], index=index, name='C')


Series and DataFrame are already imported, so you could remove the pd prefixes, here and elsewhere in the tests.


krsnik93 · 2019-05-28T15:27:44Z

Am I seeing this correctly? I don't see how test_constructor_list_frames could be failing from my changes...

WillAyd · 2019-05-28T15:31:37Z

It is unrelated; see #26546

krsnik93 · 2019-05-28T15:34:19Z

I see, thanks.

simonjayhawkins · 2019-05-28T22:15:31Z

@krsnik93: the CI failures are fixed. Can you merge master?

jreback · 2019-05-30T01:34:26Z

very nice @krsnik93

would love to have followups as discussed above (or other issues) if you can.

krsnik93 added 3 commits May 19, 2019 21:38

Fix 'observed' kwarg not doing anything on SeriesGroupBy

a5d6d1a

Merge branch 'GH24880'

41f49f4

Wrap long lines

2575c41

WillAyd requested changes May 19, 2019

View reviewed changes

pandas/core/groupby/generic.py Show resolved Hide resolved

pandas/tests/groupby/test_groupby.py Outdated Show resolved Hide resolved

WillAyd added the Groupby label May 19, 2019

WillAyd mentioned this pull request May 19, 2019

Fix GroupBy nth Handling with Observed=False #26419

Merged

3 tasks

Move tests to test_categorical.py

1c02d9f

WillAyd requested changes May 20, 2019

View reviewed changes

krsnik93 added 3 commits May 20, 2019 13:37

Merge remote-tracking branch 'upstream/master'

7350472

Merge branch 'master' into GH24880

0a949d5

Parameterized tests for 'observed' kwarg on SeriesGroupBy

0e9f473

WillAyd requested changes May 20, 2019

View reviewed changes

krsnik93 added 3 commits May 20, 2019 18:07

Merge remote-tracking branch 'upstream/master' into GH24880

1ef54f4

Split test_groupby_series_observed to utilize fixtures better;Sort im…

cd481ad

…ports

Sort imports in core/groupby/groupby.py

a515caf

WillAyd requested changes May 20, 2019

View reviewed changes

Remove too specific fixtures and adjust tests

ff42dd7

WillAyd requested changes May 21, 2019

View reviewed changes

WillAyd added this to the 0.25.0 milestone May 21, 2019

krsnik93 added 2 commits May 21, 2019 09:31

Merge remote-tracking branch 'upstream/master' into GH24880

c22875c

Use literal values for indices in tests

cc0b725

WillAyd requested changes May 22, 2019

View reviewed changes

krsnik93 added 2 commits May 22, 2019 09:47

Merge remote-tracking branch 'upstream/master' into GH24880

629a144

Use MultiIndex.from_* to construct indices in tests

e4fda22

Wrap long lines

8cfa4a1

WillAyd approved these changes May 22, 2019

View reviewed changes

jreback requested changes May 26, 2019

View reviewed changes

krsnik93 added 3 commits May 26, 2019 23:43

Merge remote-tracking branch 'upstream/master' into GH24880

db176de

Enhance docstring for _reindex_output

d520952

Modify tests to reuse existing fixture

3591dbc

jreback requested changes May 27, 2019

View reviewed changes

krsnik93 added 2 commits May 27, 2019 21:56

Merge remote-tracking branch 'upstream/master' into GH24880

f97c8a1

Refactor tests from a class to stand-alone functions

d5c9c40

simonjayhawkins reviewed May 27, 2019

View reviewed changes

krsnik93 added 2 commits May 28, 2019 01:48

Simplify a test, add a docstring for the fixture and drop pd.* prefix…

ad16db8

… for already imported

Merge remote-tracking branch 'upstream/master' into GH24880

7c525a1

Merge remote-tracking branch 'upstream/master' into GH24880

e6bca5e

jreback approved these changes May 30, 2019

View reviewed changes

jreback merged commit 9e76f4a into pandas-dev:master May 30, 2019

krsnik93 deleted the GH24880 branch June 17, 2019 14:10


