pct change bug issue 21200 #21235
Conversation
Hello @SimonAlecks! Thanks for updating the PR.
Comment last updated on November 27, 2018 at 06:03 Hours UTC
Just eyeballing this, I do not like that the implementation is going to vary depending on the monotonicity of the group, as that has relatively large performance implications that the standard user will not understand, nor should they have to care about. Is there any way to preserve the vectorized solution and come to the same result?
@WillAyd It's hard for me to think of a scenario where the vectorized solution would work if the data is not in order with respect to groups, although perhaps the monotonicity requirement is overkill? I'm new to Pandas source code, so I'll investigate the existing helper functions that may preserve the intended logic but lessen the requirements. I'm also a little concerned though that the vectorized solution may be getting the first element of each group wrong, if it's calculating its percentage change with respect to the last element of the previous group. I'll check this when I get home.
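For context, a minimal reproduction of the cross-group contamination being discussed (values invented for illustration):

import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b'],
                   'vals': [1.0, 2.0, 3.0, 6.0]})

# Correct per-group result: [NaN, 1.0, NaN, 1.0]
# Buggy vectorized path: the first 'b' row is divided by the last 'a'
# row, giving [NaN, 1.0, 0.5, 1.0]
df.groupby('grp')['vals'].pct_change()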
Codecov Report
@@ Coverage Diff @@
## master #21235 +/- ##
==========================================
+ Coverage 92.21% 92.21% +<.01%
==========================================
Files 162 162
Lines 51763 51768 +5
==========================================
+ Hits 47733 47740 +7
+ Misses 4030 4028 -2
Continue to review full report at Codecov.
After changing the UT to have a value in the 0th mock array element, the UT begins to fail. The non-apply-based vectorized solution doesn't seem to work, as it calculates a pct change between the first element of a group and the last element of the preceding group. It seems to me the simplest solution to this issue is to remove the potential to use a vectorized solution. In addition to solving the original issue, this also solves the other issue I discovered above, with the additional bonus of letting the user expect the same index structure whether or not the vectorized path was used (granted, this could easily be fixed). I suppose we could try preserving the vectorized solution, then writing a little algo to find the first element of each group and setting it to NaN (see the sketch below). I have no prior knowledge as to whether this would improve performance vs. calling apply; my intuition says at this point you'd be better off just using apply in the first place, but I could very well be wrong. I also left my updated UT in, as it would catch these cases if they occurred again, or if a vectorized solution was reimplemented. Any thoughts @WillAyd? PS: Does anyone have a link they can share with me on how to troubleshoot/debug the continuous-integration/travis-ci/pr failure?
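A rough sketch of that "set the first element of each group to NaN" idea for periods=1, assuming groups are contiguous (illustrative only, not the PR's code):

import numpy as np
import pandas as pd

df = pd.DataFrame({'grp': ['a', 'a', 'b', 'b'],
                   'vals': [1.0, 2.0, 3.0, 6.0]})

filled = df['vals'].ffill()
pct = filled / filled.shift(1) - 1
# Rows where the group label changes from the previous row would compare
# across group boundaries, so invalidate them:
pct[df['grp'] != df['grp'].shift(1)] = np.nan  # -> [NaN, 1.0, NaN, 1.0]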
you are killing performance here, this is why it was changed originally. The real fix is to do this in a cythonized version, similar to the way shift/fillna are done.
df.insert(0, 'A', result)
result = df.sort_values(by='key')
result = result.loc[:, 'A']
result = result.reset_index(drop=True)
don't use inplace in testing
def test_pct_change(test_series, shuffle, periods, fill_method, limit):
    vals = [3, np.nan, 1, 2, 4, 10, np.nan, np.nan]
    keys = ['a', 'b']
    key_v = [k for j in list(map(lambda x: [x] * len(vals), keys)) for k in j]
if you really want to do this, use np.repeat
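For illustration, the comprehension above collapses to a single call (with keys = ['a', 'b'] and len(vals) == 8, as in the test):

import numpy as np

key_v = np.repeat(['a', 'b'], 8)  # ['a', 'a', ..., 'b', 'b']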
df = df.reindex(order).reset_index(drop=True)

manual_apply = [] | ||
for k in keys: |
why aren't you just using this? df.loc[df.key == k, 'vals'], which is a Series
                                   fill_method=fill_method,
                                   limit=limit))
exp_vals = pd.concat(manual_apply).reset_index(drop=True)
exp = pd.DataFrame(exp_vals, columns=['A'])
concat already returns a frame, use ignore_index=True in the concat
Just looking at this, can't you just change the following line: pandas/pandas/core/groupby/groupby.py Line 2081 in c85ab08
To be self.shift instead of filled.shift?
Hi William, in my most recent commit I tried to preserve vectorization, while also not making costly calls to is_monotonic, and preventing group contamination within the vectorization (e.g. the first element of group B being divided by an element of group A). I did this by only allowing the code to attempt vectorization if it can confirm that the series is monotonic, and if the truth value of whether or not it is monotonic already exists in the cache. If these conditions are met, it will do a vectorized pct change operation, and then a vectorized operation to set any values that shouldn't exist to np.nan (e.g. for shift=1 the first element of a group would be divided by the last element of the prior group, but this is an invalid value). I expect this is too fancy given the problem and my knowledge of the pandas source, so I'm prepared to nix it if that's the case, and just keep it simple and/or try to implement it in Cython. Please just let me know; otherwise I will extend it to the dataframe method as well.
This isn't the correct approach and is way too convoluted. My previous suggestion worked fine for your tests, save the tests that introduce allowing None as an argument.
If however you can find a scenario where what I suggested doesn't work then this needs to be implemented in Cython.
df = DataFrame({'key': ['a'] * len(vals) + ['b'] * len(vals),
                'vals': vals * 2})
(-1, 'bfill', None), (-1, 'bfill', 1),
(-1, None, None), (-1, None, 1),
None is not a valid fill_method on master so this is changing the behavior
pandas/core/groupby/groupby.py
Outdated
shifted = filled.shift(periods=periods, freq=freq)

return (filled / shifted) - 1
return self.apply(lambda x: x.pct_change(periods=periods,
Isn't this essentially eliminating a vectorized solution across the board? How are the ASVs comparing?
pct_change is blacklisted in benchmarks/groupby.py.
…method, improve test. pandas-dev#21235
For the reasons I mentioned above, a vectorized solution will return incorrect values for pct_change under a set of conditions. I attempted to solve these conditions in the CR above, which William pointed out was too convoluted (I agree). The additional change you suggested (self.shift instead of filled.shift) does not work either, for the reasons I mentioned above; the new test I wrote would fail under that change. My suggestion is to update the test, remove the vectorized solution, and create a new issue to implement this in Cython, which I'm interested in doing.
Not sure what you mean by the benchmarks being blacklisted, but for illustration purposes here is how your latest compares to master:

       before           after         ratio
[415012f4] [eaede342]
+ 571±1μs 975±4ms 1708.33 groupby.GroupByMethods.time_dtype_as_group('float', 'pct_change', 'transformation')
+ 572±2μs 974±0.6ms 1702.23 groupby.GroupByMethods.time_dtype_as_group('float', 'pct_change', 'direct')
+ 592±3μs 635±8ms 1072.93 groupby.GroupByMethods.time_dtype_as_group('int', 'pct_change', 'transformation')
+ 606±8μs 628±3ms 1036.87 groupby.GroupByMethods.time_dtype_as_group('int', 'pct_change', 'direct')
+ 559±10μs 514±10ms 920.59 groupby.GroupByMethods.time_dtype_as_field('float', 'pct_change', 'direct')
+ 529±0.9μs 481±5ms 908.79 groupby.GroupByMethods.time_dtype_as_field('float', 'pct_change', 'transformation')
+ 603±8μs 460±20ms 762.64 groupby.GroupByMethods.time_dtype_as_field('int', 'pct_change', 'direct')
+ 593±1μs 445±3ms 750.02 groupby.GroupByMethods.time_dtype_as_field('int', 'pct_change', 'transformation')
SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.

As you can see (and I think you have already recognized) this isn't a great solution and kills performance.
I'm not entirely clear on what your reasons are. It's not a requirement that grouped elements be monotonic in any of the Cythonized code. If you are getting an error on the test, can you be explicit about what exactly that error is? IIRC, with the last set of tests, what I suggested worked fine. I have a hunch that there may be confusion about the test as it was before (i.e. explicitly passing …). Final note: thanks for helping on this. I hope you are taking this feedback constructively and not in any negative fashion (I believe you are, but just want to be explicit). Understanding how / when to interface with the Cythonized code is one of the more complicated aspects of this package.
…method. Incorporate CR suggestion.
I'm very grateful for the feedback, and I definitely appreciate your time. I apologize for the delay since the last commit; I've been working on understanding the code more clearly. In this commit I've attempted to solve the problem by incorporating your prior feedback. My previous posts were incorrect, and you correctly pointed out to me how the shift operator works. In this latest CR I've attempted to fix the problem in line with the spirit of the original implementation. The first step was to add an explicit check for fill_method, which did not exist before. I initially attempted to implement your one-line suggestion, but I noticed that I couldn't simply call self.shift(), as the data first needed to have the proper fill method applied; however, the fill method returns the obj attribute, not the groupby object. To remedy this without creating a function that changed the state of the code, I used the strategy shown below. Once this is done, the result is relatively self-explanatory and pretty similar to the original code. Lastly, I ran an ASV benchmark locally, and the performance seemed similar, but I need to spend some more time teaching myself how the benchmarking system and code works, as it's possible I made an error.
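The strategy in question is the three-line block quoted in the review below; reproduced here with comments for readability:

# Build a second GroupBy over the same data, run the fill method on it,
# and assign the filled frame back to .obj so that groupby helpers such
# as .shift() subsequently operate on the filled values:
new = DataFrameGroupBy(self._obj_with_exclusions,
                       grouper=self.grouper)
new.obj = getattr(new, fill_method)(limit=limit)
new._reset_cache()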
Getting closer
pandas/core/groupby/groupby.py
Outdated
new = DataFrameGroupBy(self._obj_with_exclusions,
                       grouper=self.grouper)
new.obj = getattr(new, fill_method)(limit=limit)
new._reset_cache()
What's the point of these calls?
If a fill method is chosen, I want a groupby instance that has the properly filled data as its obj. When I assign new.obj = getattr(new, fill_method)(limit=limit), the new._cache is not updated; as a result, it does not reflect the latest change.
By creating a new groupby object whose data object has the correct filled values, I am able to call new.shift(periods=periods, freq=freq) later, where the shift method is called on data that has already been properly filled.
The 'trick' is that by assigning the filled data back to the groupby instance, I still have access to the helper methods like shift.
Can you give me an example of where these three lines differ from just doing self.ffill() (I'm assuming fill_method == 'ffill' and limit == 1)?
Haven't seen any other aggregation function have to make these types of calls so I'm still confused as to why this is necessary. A concrete example would be immensely helpful
I can simplify those three lines, but I don't believe I can simply use self.ffill(). Let me try to explain:
The first line can be replaced with new = copy(self). However, the reason I call the fill method and reassign it to the .obj attribute is that later, on line 2087, I want to call the groupby implementation of the shift method on new, whereas self.ffill() returns a DataFrame. I used this 'trick' of doing the fill and reassigning it back to the .obj attribute in order to still have access to the groupby helper methods. If there is a more idiomatic way to do this, please let me know.
In this case, if I were to remove those three lines and replace them with self.ffill(), the code would immediately fail when I tried to call a groupby method on the returned DataFrame object.
Is this sufficiently clear/concrete?
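A small illustration of the type mismatch being described (example data invented):

import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b'], 'vals': [1.0, None, 2.0]})
gb = df.groupby('key')

# gb.ffill() hands back a plain DataFrame, so groupby-aware methods
# such as the groupby .shift() are no longer available on the result:
print(type(gb.ffill()))  # <class 'pandas.core.frame.DataFrame'>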
Probably just easier if I share code myself at this point :-). Building off of what you have here's my stab at implementation:
AFAICT it passes all of the tests you have and therefore provides the same behavior. However, it has the advantage of being more terse and not requiring the construction of a secondary GroupBy object.
Not saying you need to exactly mimic this, but I'm sharing to highlight how I think this can be done more succinctly, again without the need of constructing entirely new objects. If you know of where this would fail then we probably want to add a test case to handle whatever that difference is. Otherwise, if they are the same, I'd prefer if you adopt something a little more in line with what I've suggested.
No worries on timing - this is open source so any and all contributions are welcome at any time.
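The snippet itself was not captured in this thread; a rough sketch of the terser fill-then-shift-within-groups idea being described, written as a free-standing function (my reconstruction, not the original code):

import pandas as pd

def grouped_pct_change(df, by, periods=1, fill_method='ffill', limit=None):
    # Fill within groups first, then regroup the *filled* values so the
    # shift also stays inside each group; no secondary GroupBy over the
    # original object is needed.
    gb = df.groupby(by)
    filled = getattr(gb, fill_method)(limit=limit)
    shifted = filled.groupby(df[by]).shift(periods=periods)
    return filled / shifted - 1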
Thanks for sharing some code, that was very helpful. I updated my test to highlight a case where your code would fail but mine wouldn't. Specifically, (1) I added a test case for no fill method, and (2) I added more np.nan values to the test data. This causes your code to fail, as it does not apply the fill_method to the shifted data.
I agree that making a second object seems wrong. However, it's not clear to me how to avoid this, unless there are additional helper methods that I'm just not aware of or not finding.
df = df.reindex(order).reset_index(drop=True)

manual_apply = [] | ||
for k in keys: |
This loop / concat mechanism is rather non-idiomatic - what's the reasoning for this instead of just doing a comparison to the anonymous function as was done in the original issue?
I chose to do this as it means the test will ultimately assert that the result of a groupby apply is the same as looping over the keys and doing a normal apply, as the two should be identical. Or, more specifically, the groupby apply ought to replicate the results of a normal apply looped over the groups.
I'm experimenting now with what an anonymous comparison would look like, but my naive thought is that a comparison to a normal apply is more to-the-point on what we want to test?
Edit: Fixed it in the newest CR. I like your suggestion more now that I see it in code.
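A hypothetical, simplified form of the suggested comparison, reusing the test data quoted above: rather than looping over keys and concatenating, compare the groupby result directly against a per-group apply of an anonymous function.

import numpy as np
import pandas as pd

vals = [3, np.nan, 1, 2, 4, 10, np.nan, np.nan]
df = pd.DataFrame({'key': ['a'] * len(vals) + ['b'] * len(vals),
                   'vals': vals * 2})

result = df.groupby('key')['vals'].pct_change()
expected = df.groupby('key')['vals'].apply(lambda s: s.pct_change())
pd.testing.assert_series_equal(result, expected)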
@@ -743,14 +754,21 @@ def get_result(grp_obj):
                          limit=limit)

if test_series:
    exp = pd.Series(exp_vals * 2)
    exp.name = 'vals'
    exp = exp.loc[:, 'A']
    grp = grp['vals']
result = get_result(grp)
Not that your change introduced it, but can we get rid of the get_result function? Not sure why this was set up this way; it adds unnecessary fluff.
Removed
grp = grp['vals']
result = get_result(grp)
df.insert(0, 'A', result)
What is the point of all of the insert / sort calls? Would rather just construct the expected value and simply compare that to the result of a pct_change call
Agree. Fixed.
…method. Incorporate CR suggestion.
…s/pandas into groupby_pctchange_bug
…method. Incorporate CR suggestion.
Incorporated most requested changes, with clarifying questions asked on a few of them. Thanks for your patience!
@SimonAlecks do you believe this change would also solve #21621? If so, could you add a test case for that issue?
return grp_obj.pct_change(periods=periods,
                          fill_method=fill_method,
                          limit=limit)
exp = grp['vals'].shift(0) / grp['vals'].shift(periods) - 1
Do you mind changing exp to expected while you are touching this? The latter is a quasi-standard we've been applying to tests.
I'll change exp to expected, and wait for a response on the other comment before updating. I'm still active on this, apologies for the delay in responding to your helpful comments. Thanks!
OK I think this is in a good enough state with appropriate follow ups documented. @jreback over to you
looks ok. this needs a whatsnew entry. I think it's important enough to warrant a sub-section, with a small example of what it was doing before and the correct result. you can put it in right below the grouping section in Bug Fixes I think is ok.
Added the whatsnew. I checked for previous examples in the docs to follow style guidelines, but please let me know if it's too verbose or I made a stylistic mistake, and I'll quickly fix it. Thanks.
doc/source/whatsnew/v0.23.5.txt
Outdated
@@ -42,7 +42,33 @@ Bug Fixes
**Groupby/Resample/Rolling**

- Bug in :meth:`DataFrame.resample` when resampling ``NaT`` in ``TimeDeltaIndex`` (:issue:`13223`).
-
- Bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`)
move this entire note to a sub-section in Backwards incompatible API changes.
doc/source/whatsnew/v0.23.5.txt
Outdated
.. ipython:: python

    df = pd.DataFrame({'a': [1.0, 1.1, 2.2], 'grp': ['a', 'a', 'b']})
show the df, with 'df'
Added
doc/source/whatsnew/v0.23.5.txt
Outdated
Grouping respected

.. code-block:: ipython
use an ipython block (it will run the code)
Fixed. Didn't properly read contributor documentation on how to do documentation -- sorry for the churn.
We actually aren't doing a v0.23.5 release anymore. If you can merge master and move the whatsnew to v0.24, that would be great.
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -378,6 +378,29 @@ Backwards incompatible API changes
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- Bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`)
move this to a separate subsection
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -378,6 +378,29 @@ Backwards incompatible API changes
- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`)
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
- Bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`)

**New behavior**: |
follow the formatting of other subsections
doc/source/whatsnew/v0.24.0.rst
Outdated
.. ipython:: python

    import pandas as pd
no import
doc/source/whatsnew/v0.24.0.rst
Outdated
df.groupby('grp').pct_change()

**Previous behavior**: |
previous first
pls merge master as well. small changes. ping on green.
doc/source/whatsnew/v0.24.0.rst
Outdated
@@ -379,6 +379,33 @@ Backwards incompatible API changes
- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
- :meth:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)

Percentage change called on groupby now returns correct values |
call this 'Percentage change on groupby changes'
doc/source/whatsnew/v0.24.0.rst
Outdated
Percentage change called on groupby now returns correct values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fixed a bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`).
Can you add a tiny bit more here. A user reading this will want to know what is changing, IOW the groupby percentage worked across groups, now it correctly works per group.
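For illustration, using the df from the whatsnew snippet quoted earlier, the requested before/after contrast would read roughly as follows (outputs sketched by hand, not captured from the build):

import pandas as pd

df = pd.DataFrame({'a': [1.0, 1.1, 2.2], 'grp': ['a', 'a', 'b']})

# Previous behavior: the shift ran across group boundaries, so the single
# row of group 'b' was compared against the last row of group 'a':
#     a = [NaN, 0.1, 1.0]
# New behavior: each group is computed independently, so the first row of
# every group is NaN:
#     a = [NaN, 0.1, NaN]
df.groupby('grp').pct_change()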
# GH 21200, 21621
if freq == 'D':
    pytest.xfail("'freq' test not necessary until #23918 completed and"
                 "freq is used in the vectorized approach")
you could put the xfail in the parametrize, something like this format

@pytest.mark.parametrize("engine,ext", [
    pytest.param('openpyxl', '.xlsx', marks=pytest.mark.skipif(
        not td.safe_import('openpyxl'), reason='No openpyxl')),

but of course it's an xfail and not a skip and this is not an excel test :<
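A hypothetical adaptation of that pattern to this test's freq parameter, swapping skipif for xfail (parameter values illustrative, reusing the reason string from the diff above):

import pytest

@pytest.mark.parametrize("freq", [
    None,
    pytest.param('D', marks=pytest.mark.xfail(
        reason="'freq' test not necessary until #23918 completed and "
               "freq is used in the vectorized approach")),
])
def test_pct_change(freq):
    ...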
Would this actually fail or be non-performant compared to the other argument combinations? I think the latter and if so probably OK to just remove xfail / skip altogether; we have the TODO in the actual code in regards to the perf
It would be non-performant, but it wouldn't fail (well it technically would fail, but because the index in the test is not a date-time). Personally, I think it makes sense to make the index date-time, and not xfail this. Let me know if you agree, and I'll do that.
Ah OK. No I think that would make the test here more complicated than it needs to be; if you could follow the approach as outlined by @jreback would be preferable
Sure!
I'm trying to understand this xfail and it seems like the behavior is correct bc, as @simonariddell mentioned, it should raise without a DatetimeIndex. what is the xfailed case supposed to be testing?
doc/source/whatsnew/v0.24.0.rst
Outdated
Percentage change called on groupby now returns correct values
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Fixed a bug where calling :func:`SeriesGroupBy.pct_change` or :func:`DataFrameGroupBy.pct_change` would ignore groups when calculating the percent change (:issue:`21235`).
can you also list both issues here
thanks @SimonAlecks
* upstream/master:
  pct change bug issue 21200 (pandas-dev#21235)
  DOC: Fix summaries not ending with a period (pandas-dev#24190)
  DOC: Add back currentmodule to api.rst (pandas-dev#24232)
  DOC: Fixed implicit imports for whatsnew (v >= version 20.0) (pandas-dev#24199)
  remove enum import for PY2 compat, xref pandas-dev#22802 (pandas-dev#24170)
  CI: Simplify Azure Pipelines configuration files (pandas-dev#23111)
  REF/TST: Add more pytest idiom to indexing/multiindex/test_getitem.py (pandas-dev#24053)
Since it came out of this PR, #23918 would be a logical follow-up.
closes #21200
closes #21621
git diff upstream/master -u -- "*.py" | flake8 --diff
This addresses #21200. When there are different groups in a dataframe, it is expected that using groupby applies the pct_change function to each group separately. However, combining groupby with pct_change does not produce the correct result.
Explanation:
Currently, the pct_change method on groupby objects (for both Series and DataFrame) can use a vectorized solution, rather than calling apply, if certain conditions are met. For the Series method, the vectorized solution is the only option.
This is inappropriate in cases where the groupby object is non-monotonic in its group order. To solve this I've added a check for monotonicity in both the Series and DataFrame implementations, as well as the option to fall back to apply for the Series method.
In addition, I have augmented the UT to accept a parameter that can shuffle the dataframe, in order to ensure that the correct calculation occurs; a sketch of this is shown below.
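A minimal sketch of what that shuffle parameter might look like (my illustration; the actual test parametrizes more dimensions, as seen in the review diffs above):

import numpy as np
import pandas as pd
import pytest

@pytest.mark.parametrize('shuffle', [True, False])
def test_pct_change_respects_groups(shuffle):
    # Shuffling interleaves the groups so that a naive vectorized shift
    # would cross group boundaries.
    df = pd.DataFrame({'key': ['a'] * 4 + ['b'] * 4,
                       'vals': [3.0, 1.0, 2.0, 4.0, 10.0, 5.0, 2.0, 1.0]})
    if shuffle:
        order = np.random.RandomState(0).permutation(len(df))
        df = df.reindex(order).reset_index(drop=True)

    result = df.groupby('key')['vals'].pct_change()
    expected = df.groupby('key')['vals'].apply(lambda s: s.pct_change())
    pd.testing.assert_series_equal(result, expected)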
Concern
One concern I have is that depending on whether the apply or vectorized solution is used within the pct_change method (e.g. depending on whether the groupby object is monotonic or not), the result returned to the user may have a different index structure. While this was the case prior to the PR, it's not clear to me whether (1) this is an acceptable design within the pandas infrastructure, and (2) this is within the scope of a single PR that was originally opened to address a very specific bug.
As this is my first pandas PR, I would certainly appreciate feedback, and will incorporate any constructive feedback into future issues.