Applying on empty DataFrame returns different types #16621

Open
gzcf opened this issue Jun 7, 2017 · 5 comments
Labels
Apply Apply, Aggregate, Transform, Map Bug

Comments

@gzcf

gzcf commented Jun 7, 2017

Code Sample

import pandas as pd

df = pd.DataFrame(columns = ['a', 'b'])

def foo(row):
    return True

def bar(row):
    row['a']
    return True

t = df.apply(bar, axis=1)
print(type(t))

t = df.apply(foo, axis=1)
print(type(t))

Problem description

When applying different functions to the same empty DataFrame, apply returns different types:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

That's weird, since the only difference between them is the row['a'] expression, which is never executed.

When df is not empty, both return pd.Series.
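
For comparison, here is a minimal sketch of the non-empty case. The one-row frame and its values are assumptions made up for illustration; foo and bar are the functions from the code sample.

```python
import pandas as pd

def foo(row):
    return True

def bar(row):
    row['a']  # the lookup succeeds because every row has an 'a' label
    return True

# Assumption: a one-row frame with the same columns as the empty one.
df = pd.DataFrame({'a': [1], 'b': [2]})

foo_type = type(df.apply(foo, axis=1))
bar_type = type(df.apply(bar, axis=1))
print(foo_type.__name__, bar_type.__name__)  # Series Series
```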

Tested on pandas 0.19.1 and the latest version, 0.20.2.

Expected Output

I expected it to always return a pd.Series.

Output of pd.show_versions()

2017-06-07 19:52:12 [pip.vcs] DEBUG: Registered VCS backend: git
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: hg
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: svn
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: bzr

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: None
pip: 7.1.0
setuptools: 18.0.1
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: None
sqlalchemy: 0.9.10
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None
@TomAugspurger
Contributor

I suspect that bar raises a KeyError, which is caught inside .apply and sends it down a different code path. You're welcome to take a look at what's going on.
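
The suspected KeyError can be checked outside of .apply. This is only a sketch: it assumes the function ends up being called with an empty Series that has no labels, and probe is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

def foo(row):
    return True

def bar(row):
    row['a']  # label lookup; fails when 'a' is not in the index
    return True

# Assumption: the argument is an empty Series with no index labels.
empty = pd.Series([], dtype=object)

def probe(func, arg):
    """Hypothetical helper: name of the exception func raises on arg, or None."""
    try:
        func(arg)
    except Exception as exc:
        return type(exc).__name__
    return None

foo_error = probe(foo, empty)
bar_error = probe(bar, empty)
print(foo_error, bar_error)  # None KeyError
```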

@TomAugspurger TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 7, 2017
@jreback
Contributor

jreback commented Jun 9, 2017

Agree with @TomAugspurger, it might be excepting out of the inner loop. @gzcf, if you want to investigate and can make a fix that passes the test suite, that would be fine.

@jreback jreback added Bug Difficulty Intermediate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 9, 2017
@jreback jreback added this to the Next Major Release milestone Jun 9, 2017
@gzcf
Author

gzcf commented Jun 9, 2017

I'm glad to help. Let me take some time to fix it.

@gzcf
Author

gzcf commented Jun 12, 2017

After reading the related code, I found that this is intended behavior. See issue #2476 and _apply_empty_result in 'pandas/core/frame.py'.

    def _apply_empty_result(self, func, axis, reduce, *args, **kwds):
        if reduce is None:
            reduce = False
            try:
                reduce = not isinstance(func(_EMPTY_SERIES, *args, **kwds),
                                        Series)
            except Exception:
                pass

        if reduce:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()

This code tries to guess the return type by calling func on an empty Series. I don't think this is a good implementation: catching Exception is too broad and swallows every error. In many cases, calling func with an empty Series will raise a KeyError. But I am new to the pandas source, so I am not sure what to do next.
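
To make the effect concrete, here is a standalone sketch that mirrors only the probing step of the snippet above. EMPTY_SERIES and guess_reduce are stand-ins written for this illustration (the real _EMPTY_SERIES is private to pandas and its exact construction is an assumption here).

```python
import pandas as pd

# Stand-in for pandas' private _EMPTY_SERIES (assumed to be an empty Series).
EMPTY_SERIES = pd.Series([], dtype=object)

def guess_reduce(func):
    """Mirror the probe: True means 'return a Series', False means 'return a copy'."""
    reduce = False
    try:
        reduce = not isinstance(func(EMPTY_SERIES), pd.Series)
    except Exception:
        # Any error, including the KeyError from row['a'], is silently dropped,
        # so reduce keeps its default of False.
        pass
    return reduce

def foo(row):
    return True

def bar(row):
    row['a']
    return True

foo_reduce = guess_reduce(foo)
bar_reduce = guess_reduce(bar)
print(foo_reduce, bar_reduce)  # True False
```

With reduce True the quoted code returns a Series, and with reduce False it returns self.copy(), which is a DataFrame. That matches the two types printed in the original report.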

There are some options:

  • Default to Series without guessing the type, maybe with a warning message
  • Default to DataFrame...
  • Don't change this behavior
  • Let it fail and raise the exception to the user. This was the original behavior.

Please give me some advice.

@Mega-Tom

Mega-Tom commented Jun 15, 2020

I think the correct resolution is to use a series with data types from the data frame, instead of _EMPTY_SERIES. Just using simple default values for each datatype should make it much less likely to raise an exception.
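
A rough sketch of what such a probe could look like for axis=1, where each row is a Series indexed by the column names. make_probe_row and its choice of defaults are hypothetical, written only to illustrate the idea; nothing like it is confirmed to exist in pandas.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype

def make_probe_row(df):
    """Hypothetical helper: one placeholder value per column, chosen by dtype."""
    defaults = {}
    for col, dtype in df.dtypes.items():
        if is_bool_dtype(dtype):       # check bool first, it also counts as numeric
            defaults[col] = False
        elif is_numeric_dtype(dtype):
            defaults[col] = 0
        else:
            defaults[col] = None       # object and any other dtype
    return pd.Series(defaults, dtype=object)

def bar(row):
    row['a']
    return True

df = pd.DataFrame(columns=['a', 'b'])
probe = make_probe_row(df)
bar_result = bar(probe)  # no KeyError, because the probe carries the column labels
print(list(probe.index), bar_result)
```

Because the probe has the frame's column labels, a lookup like row['a'] no longer fails, so the guess would no longer depend on whether func touches a column. Functions that depend on actual values could still raise.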

@mroeschke mroeschke added Apply Apply, Aggregate, Transform, Map and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Jun 12, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022