Skip to content

BUG: Fixed bug in groupby.std changing target column when as_index=False #11085

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

terrytangyuan
Copy link
Contributor

Fixed #10355

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

this is not going to be oerformant

is there a reason you didn't follow the soln from the issue?

@terrytangyuan
Copy link
Contributor Author

I think the code structure changed somewhat. I couldn't figure out how to
use it.
On Sep 12, 2015 8:54 PM, "Jeff Reback" [email protected] wrote:

this is not going to be oerformant

is there a reason you didn't follow the soln from the issue?


Reply to this email directly or view it on GitHub
#11085 (comment).

@terrytangyuan
Copy link
Contributor Author

They are refactored to _cython_functions I guess. I tried to do similar thing using cython but failed. Maybe this is good for now and I'll figure out how to do it in a separate PERF PR some time later?

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

no that will just change it again

@terrytangyuan
Copy link
Contributor Author

@jreback any updated soln for the soln mentioned in issue? I tried similar way and failed, maybe I missed something. Thanks.

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

you can just do the .var, then the np.sqrt only applies to the action columns.

…lse (issue 10355) (+1 squashed commit)

Squashed commits:
[f222db1] BUG: Fixed bug in groupby.std changing target column when as_index=False (issue 10355)
@terrytangyuan
Copy link
Contributor Author

@jreback Could you take a look at the new push? Did I get what you meant?

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

no u can't use the lamba
that goes to a python path

compute var then apply the sqrt

@terrytangyuan
Copy link
Contributor Author

How do I apply it only to action columns without lambda?

@terrytangyuan
Copy link
Contributor Author

Since the result of applying .var would be a DataFrame and it lost its original groups, which is why the original implementation causes this bug.

@jreback
Copy link
Contributor

jreback commented Sep 13, 2015

this needs a similar approach to what I suggest (and like what .first does)

@terrytangyuan
Copy link
Contributor Author

So I removed the original definition of std() and the diff is as follows:

@@ -828,6 +828,8 @@ class GroupBy(PandasObject):
                               numeric_only=False, _convert=True)
     last = _groupby_function('last', 'last', _last_compat, numeric_only=False,
                              _convert=True)
+    def std(self):
+        return self._cython_agg_general('std')

     def ohlc(self):
         """
@@ -1485,6 +1487,8 @@ class BaseGrouper(object):
             'f': lambda func, a, b, c, d: func(a, b, c, d, 1)
         },
         'last': 'group_last',
+        'std': {'name' : 'group_var',
+                'f' : lambda func, a: np.sqrt(func(a))}
     }

Still couldn't get it working. (tried other similar things too) The code structure of groupby.py must have changed a lot since the issue has been posted. I think this PR at least fixed the bug. I won't be the best person to fix the perf issues.

@jreback
Copy link
Contributor

jreback commented Sep 14, 2015

then let's close this for now. this is non performant.

@xieyuheng
Copy link

#25315

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants