Adding lambda support inside of __getitem__ for DataFrame, Series, .. etc. #2560

spearsem · 2012-12-18T20:02:50Z

To avoid the verbose syntax currently needed to select on conditions across many columns of a DataFrame, here's a suggestion: add special-case logic to DataFrame's __getitem__ for when a lambda (or any callable) is passed in. If the key is callable, call it with the DataFrame itself and use the result as the key.

Here's an example of what I mean. Suppose I create a DataFrame named "dfrm" with columns A, B, C, D, and E. Currently, the following syntax works to select rows based on conditions on columns A and B:

dfrm[(lambda x: (x.A < 0) & (x.B > 0))(dfrm)]

By adding this handling to __getitem__, you can drop the last set of parentheses, where dfrm itself is passed as the argument to the lambda. __getitem__ can check whether the key is callable and, if so, pass the DataFrame to it, so the syntax would look like this:

dfrm[(lambda x: (x.A < 0) & (x.B > 0))]
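A minimal sketch of the proposed dispatch, written as a standalone helper rather than a patch to pandas internals. `getitem_with_callable` is a hypothetical name, not a pandas API, and the random data stand in for "dfrm": if the key is callable it is called with the object being indexed and its result becomes the key; otherwise the key goes to ordinary `[]` indexing unchanged.

```python
import numpy as np
import pandas as pd


def getitem_with_callable(obj, key):
    """Hypothetical helper illustrating the proposed __getitem__ behavior.

    If `key` is callable, call it with `obj` (a DataFrame or Series) and
    index with whatever it returns; otherwise index with `key` as-is.
    """
    if callable(key):
        key = key(obj)
    return obj[key]


# Made-up data with the column names used in the example above.
dfrm = pd.DataFrame(np.random.randn(20, 5), columns=list("ABCDE"))

# Proposed style: no trailing "(dfrm)" call; the helper supplies the frame.
via_lambda = getitem_with_callable(dfrm, lambda x: (x.A < 0) & (x.B > 0))

# Current style from the issue, for comparison.
via_mask = dfrm[(lambda x: (x.A < 0) & (x.B > 0))(dfrm)]
```

One consequence of this design: a label that is itself callable can no longer be looked up directly, because it would be invoked instead. Column labels are normally strings or numbers, so this is likely to matter only rarely.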

ghost · 2012-12-18T23:16:16Z

I'm surprised this doesn't work:

dfrm.ix[(dfrm.A < 0) and (dfrm.B > 0)]

It fails because and'ing two arrays together is not currently a vector operation;
perhaps it should be in this case.
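Python's `and` asks for a single truth value of its left operand, and a NumPy array (or pandas Series) with more than one element refuses to provide one, so the expression raises instead of combining the masks. The element-wise operator `&` is what combines two boolean arrays. A small sketch with plain NumPy arrays (the values are made up):

```python
import numpy as np

a = np.array([-1.0, 2.0, -3.0, 4.0])
b = np.array([5.0, -6.0, 7.0, 8.0])

# Element-wise AND of two boolean masks: True only where both conditions hold.
# The parentheses are required because `&` binds more tightly than `<` and `>`.
mask = (a < 0) & (b > 0)
print(mask)  # [ True False  True False]

# `and` calls bool() on the left operand, which is ambiguous for a
# multi-element array, so NumPy raises ValueError.
try:
    (a < 0) and (b > 0)
    and_failed = False
except ValueError as exc:
    and_failed = True
    print("ValueError:", exc)
```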

jreback · 2012-12-18T23:27:03Z

does the following do what you want?

In [3]: df = pd.DataFrame(np.random.randn(20,3),columns=['A','B','C'])

In [4]: df
Out[4]: 
           A         B         C
0   0.334712 -0.557606  0.344016
1   0.549630 -0.264684 -0.916011
2   1.655768 -0.908992 -0.063336
3  -0.142142  0.259900 -0.260913
4   2.160908  0.239873  0.321448
5   1.650202  0.077349 -0.068250
6   0.354457  0.530161 -1.758845
7  -0.803534  0.015683  0.424979
8  -1.436670 -1.168130  0.222747
9   1.525383  0.363306 -0.192263
10  0.069851  0.850365  1.741803
11 -0.515722  1.348962 -0.375264
12 -0.204887  1.114886 -0.928263
13  0.612595  1.547913  0.336282
14 -0.780298  0.926265 -0.006614
15  1.213962  1.618504  0.133741
16  0.870338  0.146988  2.189953
17 -2.041328  1.338305 -0.129272
18  0.014687  1.048986 -1.525997
19 -1.147067  0.379734 -1.331019

In [5]: df[(df.A < 0) & (df.B > 0)]
Out[5]: 
           A         B         C
3  -0.142142  0.259900 -0.260913
7  -0.803534  0.015683  0.424979
11 -0.515722  1.348962 -0.375264
12 -0.204887  1.114886 -0.928263
14 -0.780298  0.926265 -0.006614
17 -2.041328  1.338305 -0.129272
19 -1.147067  0.379734 -1.331019

spearsem · 2012-12-19T00:01:26Z

That does it, but my goal with the lambdas is specifically to avoid having to type out the DataFrame's name, a dot, and the column name for every column involved in the selection. The lambda at least reduces that to just "x".

As for "and" not working where "&" works, this is a known limitation of Python's built-in logical operators when applied to arrays: it raises the classic "The truth value of an array with more than one element is ambiguous" error.
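Later pandas releases (0.18.1 onward) accept a callable in `[]`, `.loc` and `.iloc`: the callable receives the object being indexed and its return value is used as the indexer, which is essentially the behavior requested in this issue. It is most useful in a method chain, where the intermediate frame has no name to repeat. A small sketch against a recent pandas; the data and the derived column `D` are made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 3), columns=["A", "B", "C"])

# The callable receives the frame being indexed, so `x` is just a short
# local name; the variable `df` never has to be repeated inside it.
subset = df[lambda x: (x.A < 0) & (x.B > 0)]

# In a chain, `x` refers to the intermediate result (here, the frame with
# the new column D), which has no variable name of its own.
chained = (
    df.assign(D=lambda x: x.A * x.B)
      .loc[lambda x: x.D > 0, ["A", "B", "D"]]
)
```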


cpcloud · 2013-07-29T05:01:53Z

this is addressed by #4164
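In current pandas, the string-expression API (`DataFrame.query` and `DataFrame.eval`) lets column names be used directly, so neither the frame's name nor a lambda parameter has to be typed. With the default "pandas" parser, the keywords `and`/`or` are applied element-wise inside the expression, which also sidesteps the `and` limitation discussed above. A minimal sketch against a recent pandas; the data are random:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(20, 3), columns=["A", "B", "C"])

# Column names are resolved against df, so no "df." prefixes are needed,
# and `and` combines the two conditions element-wise.
selected = df.query("A < 0 and B > 0")

# eval returns the boolean mask itself, which can be used with [] as usual.
mask = df.eval("A < 0 and B > 0")
also_selected = df[mask]
```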

cpcloud mentioned this issue Jul 29, 2013: ENH: add expression evaluation functionality via eval #4164 (merged)

cpcloud closed this as completed in #4164 Sep 16, 2013

jreback mentioned this issue Jan 17, 2015: ENH: Allow function in Series/DataFrame get syntax #9279 (closed)
