DataFrame apply change return type from Series to DataFrame if result is empty #3698

asmirnov69 · 2013-05-27T20:44:10Z

import pandas as pd
pd.version.version ## returns '0.11.0'
x = pd.DataFrame({'a': range(10), 'b': range(10)})
type(x.apply(lambda x: x['a'] + x['b'], 1)) # <class 'pandas.core.series.Series'>
x['c'] = x.apply(lambda x: x['a'] + x['b'], 1) ## works

x = x[x['a'] < 0]
type(x.apply(lambda x: x['a'] + x['b'], 1)) # <class 'pandas.core.frame.DataFrame'>
x['c'] = x.apply(lambda x: x['a'] + x['b'], 1) ## FAILS

The code above is a problem for a quite common use of apply: creating a new DataFrame column by applying a function to each row. It fails for an empty DataFrame and works for a non-empty one.

jreback · 2013-05-27T21:25:01Z

This is in conflict with #2476. The issue is how to determine whether a user-supplied lambda is a reduction or not. When you have an empty frame, this determination fails, so we return as if it's not a reduction.

e.g. this will do what you want in this case:

x.apply(lambda x: x.get('a',0) + x.get('b',0), 1)
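A minimal sketch of why this workaround helps, assuming a reasonably recent pandas: Series.get returns the supplied default instead of raising KeyError, so the function can be evaluated even when there is no real row to pass in. The variable names `empty` and `result` are illustrative only.

```python
import pandas as pd

x = pd.DataFrame({"a": range(10), "b": range(10)})
empty = x[x["a"] < 0]  # no rows survive the filter

# Series.get falls back to the default when the label is missing,
# so the lambda does not raise when pandas probes it on an empty frame.
result = empty.apply(lambda row: row.get("a", 0) + row.get("b", 0), axis=1)
print(type(result))
```

Note that the default also hides a misspelled column name, so this is a trade-off rather than a general fix.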

asmirnov69 · 2013-05-27T22:03:44Z

I didn't realize that x in the lambda expression passed to apply is a Series. I will use your way of accessing row values from now on.

My only suggestion would be to introduce a new type to represent the row instead of using Series.
In a row-handling context the [] operator is misleading. IMHO, without it, access to row elements would be less confusing for users not familiar with the pandas code base.

jreback · 2013-05-28T00:16:47Z

Actually, I think your suggestion is more confusing :). When you use axis=1 (that is the 1 in your expression), you are saying "give me a row of the frame as a Series", by definition. You are then free to do what you want in your function. However, sometimes it is best to forgo the custom function.

e.g. (In [17] is much faster)

In [16]: x.apply(lambda z: z['a'] + z['b'],1)
Out[16]: 
0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
8    16
9    18
dtype: int64

In [17]: x[['a','b']].sum(1) 
Out[17]: 
0     0
1     2
2     4
3     6
4     8
5    10
6    12
7    14
8    16
9    18
dtype: int64
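As a small illustration of forgoing the custom function, column arithmetic sidesteps the original problem because it yields a Series whether or not the frame has rows. This is a sketch, not something proposed in the thread; the `.copy()` call and the names `empty` and `total` are additions for the example.

```python
import pandas as pd

x = pd.DataFrame({"a": range(10), "b": range(10)})
full_total = x["a"] + x["b"]  # same values as the apply/sum examples above

# Copy the filtered frame so the column assignment targets an independent object.
empty = x[x["a"] < 0].copy()
total = empty["a"] + empty["b"]  # still a Series, just with zero rows
empty["c"] = total               # assignment works on the empty frame too
print(type(total), list(empty.columns))
```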

asmirnov69 · 2013-05-28T15:04:33Z

After reading more of the pandas method docs and examples, I retract my suggestion about introducing any changes to DataFrame.apply(). Also, in light of my newly acquired knowledge, your workaround for my original problem, using Series.get with a default of 0, will not work in general. This is because there are situations where there is no convenient default. It may also mask a typo in a column name, which would be hard to track down.

One more illustration of the same problem is below. As you can see, the return type of apply changes depending on the condition applied to the original DataFrame. Does it look like a bug? The workaround in my case is to use a wrapper around the apply call, which I am going to do in my code.

>>> import pandas as pd
>>> pd.version.version
'0.11.0'
>>> df = pd.DataFrame({'a': range(10), 'b': range(10)})
>>> type(df[df['a']>0].apply(lambda x: x['a'] + x['b'], 1))
<class 'pandas.core.series.Series'>
>>> type(df[df['a']<0].apply(lambda x: x['a'] + x['b'], 1))
<class 'pandas.core.frame.DataFrame'>
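The wrapper mentioned above is not shown in the thread. One possible shape for it, purely as a sketch: `apply_rows` is a hypothetical helper name, and the `dtype` parameter with its float64 default is an assumption about what an empty result should look like.

```python
import pandas as pd

def apply_rows(df, func, dtype="float64"):
    """Apply func to each row and always return a Series (hypothetical helper)."""
    if df.empty:
        # Skip apply entirely so the return type never depends on
        # how pandas guesses whether func is a reduction.
        return pd.Series(index=df.index, dtype=dtype)
    return df.apply(func, axis=1)

df = pd.DataFrame({"a": range(10), "b": range(10)})
non_empty = apply_rows(df[df["a"] > 0], lambda r: r["a"] + r["b"])
empty = apply_rows(df[df["a"] < 0], lambda r: r["a"] + r["b"])
print(type(non_empty), type(empty))
```

Unlike the Series.get approach, this keeps the strict `r["a"]` lookup, so a misspelled column name still raises on non-empty input.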

jreback · 2013-05-28T15:26:11Z

You can do a lot of things inside the apply, e.g. lambda x: x.get('a',np.nan) + x.get('b',np.nan), but logic applies (pun intended).

This is probably not the best way to do it for the reason you indicate. The example you gave is not a bug, but a defined behavior.

asmirnov69 · 2013-05-28T16:28:10Z

So it is a feature, not a bug :)
Well, I think it is fine provided it is documented, at least in the 'Caveats and Gotchas' section.
Some Stack Overflow readers might find it helpful as well: http://stackoverflow.com/questions/13331698/how-to-apply-a-function-to-two-columns-of-pandas-dataframe

Thanks for the discussion, I've learned a bit more about pandas. I hope never to go back to R again.

jreback · 2013-05-28T16:37:13Z

Yes, it's a bit undefined what you do with the empties.

Would always welcome a doc PR!

jreback · 2013-06-04T15:36:40Z

Going to move this to 0.12 for a doc update in groupby, as @asmirnov69 suggests above.

jreback · 2016-02-17T13:38:49Z

The reduce kw can be specified to deal with this.
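The comment above does not include an example. As far as I am aware, later pandas releases replaced the `reduce` keyword with `result_type`, so the sketch below assumes a pandas version where `result_type="reduce"` is available; on the 0.11-era versions discussed in this thread the spelling would differ.

```python
import pandas as pd

df = pd.DataFrame({"a": range(10), "b": range(10)})
empty = df[df["a"] < 0]

# Assumption: result_type="reduce" tells apply to treat the function as a
# reduction, so it does not have to guess from an empty frame.
result = empty.apply(lambda r: r["a"] + r["b"], axis=1, result_type="reduce")
print(type(result))
```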

jreback modified the milestones: 0.15.0, 0.14.0 Feb 26, 2014

jreback modified the milestones: 0.16.0, Next Major Release Mar 3, 2015

jreback closed this as completed Feb 17, 2016
