DataFrame apply change return type from Series to DataFrame if result is empty #3698
Comments
This conflicts with #2476. The issue is how to determine whether a user-supplied lambda is a reduction or not. With an empty frame that detection fails, so we return as if it's not a reduction. e.g. will do what you want in this case
I didn't realize that x in the lambda expression passed to apply is a Series. I will use your way of accessing row values from now on. My only suggestion would be to introduce a new type to represent the row instead of using Series.
Actually, I think your suggestion is more confusing :), when you use e.g. (and this is much faster)
After reading more of the pandas method docs and examples, I retract my suggestion about introducing any changes into DataFrame.apply(). Also, in light of my newly acquired knowledge, your workaround for my original problem, using Series.get with a default of 0, will not work. There are situations where no convenient default exists, and it may also mask a typo in a column name, which would be hard to track down.
One more illustration of the same problem is below. As you can see, the return type of apply changes depending on the condition applied to the original dataframe. Does it look like a bug? The workaround in my case is to use a wrapper around the apply call, which is what I am going to do in my code.
>>> import pandas as pd
>>> pd.version.version
'0.11.0'
>>> df = pd.DataFrame({'a': range(10), 'b': range(10)})
>>> type(df[df['a']>0].apply(lambda x: x['a'] + x['b'], 1))
<class 'pandas.core.series.Series'>
>>> type(df[df['a']<0].apply(lambda x: x['a'] + x['b'], 1))
<class 'pandas.core.frame.DataFrame'>
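The thread does not show the wrapper around the apply call, so the following is only a minimal sketch of what one might look like. The name `apply_rows` and its `dtype` parameter are hypothetical, not part of pandas, and the code assumes a reasonably recent pandas rather than 0.11.0. It skips apply entirely when the frame has no rows and returns an empty Series instead, so the caller always gets the same type back.

```python
import pandas as pd


def apply_rows(df, func, dtype="float64"):
    """Hypothetical helper: apply func to each row and always return a Series.

    When df has no rows, func is never called, so its return type cannot be
    inferred. In that case return an empty Series on df's (empty) index.
    The dtype of that empty Series is an assumption supplied by the caller.
    """
    if len(df.index) == 0:
        return pd.Series(index=df.index, dtype=dtype)
    return df.apply(func, axis=1)


df = pd.DataFrame({'a': range(10), 'b': range(10)})

# Non-empty selection: behaves like a plain row-wise apply.
non_empty_result = apply_rows(df[df['a'] > 0], lambda row: row['a'] + row['b'])

# Empty selection: still a Series, so column assignment keeps working.
empty_df = df[df['a'] < 0].copy()
empty_result = apply_rows(empty_df, lambda row: row['a'] + row['b'])
empty_df['c'] = empty_result
```

The trade-off is that the caller has to pick the dtype for the empty case, since there is no row from which to infer it.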
You can do a lot of things inside the apply, e.g. This is probably not the best way to do it, for the reason you indicate. The example you gave is not a bug, but defined behavior.
So it is a feature, not a bug :) Thanks for the discussion. I've learned a bit more about pandas and hope never to go back to R again.
Yes, it's a bit undefined what you do with the empties. Would always welcome a doc PR!
Going to move this to 0.12 for a doc update in groupby, as @asmirnov69 suggests above.
import pandas as pd
pd.version.version  # returns '0.11.0'
x = pd.DataFrame({'a': range(10), 'b': range(10)})
# Non-empty frame: the row-wise apply reduces to a Series
type(x.apply(lambda x: x['a'] + x['b'], 1))  # <class 'pandas.core.series.Series'>
x['c'] = x.apply(lambda x: x['a'] + x['b'], 1)  # works
# Keep only rows with a < 0, which leaves the frame empty
x = x[x['a'] < 0]
type(x.apply(lambda x: x['a'] + x['b'], 1))  # <class 'pandas.core.frame.DataFrame'>
x['c'] = x.apply(lambda x: x['a'] + x['b'], 1)  # FAILS
The code above is a problem for a quite common usage of apply: creating a new dataframe column from a function applied to each row. It fails for an empty dataframe and works for a non-empty one.
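For this particular row function, a sum of two columns, one way to sidestep the problem is plain column arithmetic, which involves no reduction inference. This is a sketch assuming a reasonably recent pandas, and it is not necessarily the alternative the maintainers had in mind. The names `empty_frame` and `total` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'a': range(10), 'b': range(10)})

# Select no rows at all; copy so the later column assignment targets a
# standalone frame rather than a view of df.
empty_frame = df[df['a'] < 0].copy()

# Column arithmetic yields a Series whether or not the frame has rows.
total = empty_frame['a'] + empty_frame['b']
empty_frame['c'] = total

# The same expression on the full frame, for comparison.
df['c'] = df['a'] + df['b']
```

This only helps when the row function can be written as operations on whole columns. Arbitrary per-row logic still needs apply or a guard for the empty case.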