Applying on empty DataFrame returns different types #16621

gzcf · 2017-06-07T12:00:12Z

Code Sample

import pandas as pd

df = pd.DataFrame(columns = ['a', 'b'])

def foo(row):
    return True

def bar(row):
    row['a']
    return True

t = df.apply(bar, axis=1)
print(type(t))

t = df.apply(foo, axis=1)
print(type(t))

Problem description

When applying different functions to the same empty DataFrame, apply returns different types:

<class 'pandas.core.frame.DataFrame'>
<class 'pandas.core.series.Series'>

That's weird, since their only difference is the row['a'] expression, which is never executed.

When df is not empty, both return pd.Series.

Tested on pandas 0.19.1 and the latest version, 0.20.2.

Expected Output

I expected it to always return pd.Series.

Output of `pd.show_versions()`

2017-06-07 19:52:12 [pip.vcs] DEBUG: Registered VCS backend: git
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: hg
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: svn
2017-06-07 19:52:13 [pip.vcs] DEBUG: Registered VCS backend: bzr

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 16.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: zh_CN.UTF-8
LOCALE: None.None

pandas: 0.20.2
pytest: None
pip: 7.1.0
setuptools: 18.0.1
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: 3.6.0
bs4: None
html5lib: None
sqlalchemy: 0.9.10
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

TomAugspurger · 2017-06-07T12:03:54Z

I suspect that bar raises a KeyError, which is caught inside .apply and sends it down a different code path. You're welcome to take a look at what's going on.
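A minimal sketch for checking that suspicion outside of .apply. It assumes the function gets probed with an empty Series; the `empty` object and the `raises_key_error` helper below are illustrative stand-ins, not pandas internals.

```python
import pandas as pd

def foo(row):
    return True

def bar(row):
    row['a']  # label lookup; fails if the Series has no label 'a'
    return True

# Assumption: an empty Series stands in for whatever .apply probes with.
empty = pd.Series([], dtype=object)

def raises_key_error(func, row):
    """Return True if calling func(row) raises KeyError (hypothetical helper)."""
    try:
        func(row)
    except KeyError:
        return True
    return False

foo_fails = raises_key_error(foo, empty)
bar_fails = raises_key_error(bar, empty)
print(foo_fails, bar_fails)
```

If only bar fails here, the two functions really do behave differently on an empty probe even though neither body runs on actual rows.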

jreback · 2017-06-09T10:56:56Z

Agree with @TomAugspurger; it might be excepting out of the inner loop. @gzcf, if you want to investigate and see whether you can make a fix that passes the test suite, that would be fine.

gzcf · 2017-06-09T11:29:22Z

I'm glad to help. Let me take some time to fix it.

gzcf · 2017-06-12T16:21:20Z

After reading the related code, I found that this is intended behavior. See issue #2476 and _apply_empty_result in 'pandas/core/frame.py'.

    def _apply_empty_result(self, func, axis, reduce, *args, **kwds):
        if reduce is None:
            reduce = False
            try:
                reduce = not isinstance(func(_EMPTY_SERIES, *args, **kwds),
                                        Series)
            except Exception:
                pass

        if reduce:
            return Series(NA, index=self._get_agg_axis(axis))
        else:
            return self.copy()

Look: this code tries to guess the return type by calling func on an empty Series. I don't think this is a good implementation. It's bad to catch Exception, since that swallows all exceptions. In many cases, calling func with an empty Series will raise KeyError. But I am new to the pandas source, so I am not sure what to do next.
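The guessing step can be reproduced in isolation. This is a rough sketch that mirrors the logic quoted above, not the pandas implementation itself: `guess_reduce` is a hypothetical name, and `pd.Series([], dtype=object)` is assumed as a stand-in for _EMPTY_SERIES.

```python
import pandas as pd

def guess_reduce(func, *args, **kwds):
    """Mimic the probing step: call func on an empty Series and treat any
    non-Series result as a reduction. Any exception leaves reduce False."""
    reduce = False
    try:
        probe = pd.Series([], dtype=object)  # assumed stand-in for _EMPTY_SERIES
        reduce = not isinstance(func(probe, *args, **kwds), pd.Series)
    except Exception:
        pass  # the swallowed error: a KeyError from row['a'] ends up here
    return reduce

def foo(row):
    return True

def bar(row):
    row['a']
    return True

foo_reduce = guess_reduce(foo)  # foo returns a scalar, so reduce becomes True
bar_reduce = guess_reduce(bar)  # bar raises KeyError, so reduce stays False
print(foo_reduce, bar_reduce)
```

With reduce True the quoted method returns a Series, and with reduce False it returns a copy of the DataFrame, which matches the two types printed in the report.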

There are some choices:

1. Default to Series without guessing the type, maybe with a warning message.
2. Default to DataFrame...
3. Don't change this behavior.
4. Let it fail and raise the exception to the user. This is the original behavior.

Please give me some advice.

Mega-Tom · 2020-06-15T15:57:15Z

I think the correct resolution is to use a Series with data types from the DataFrame instead of _EMPTY_SERIES. Using simple default values for each data type should make it much less likely to raise an exception.
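One possible reading of this suggestion, as a sketch only: build the probe from the frame's own columns and dtypes so that label lookups succeed. The `_DEFAULTS` table and `make_probe_row` are hypothetical names chosen for illustration; nothing here is existing pandas API beyond DataFrame, Series, and dtypes.

```python
import pandas as pd

# Hypothetical placeholder values keyed by numpy dtype kind; not part of pandas.
_DEFAULTS = {"b": False, "i": 0, "u": 0, "f": 0.0, "O": ""}

def make_probe_row(df):
    """Build a stand-in row labeled by df's columns, with a simple
    placeholder value chosen from each column's dtype."""
    values = [_DEFAULTS.get(dtype.kind, None) for dtype in df.dtypes]
    return pd.Series(values, index=df.columns, dtype=object)

def bar(row):
    row['a']
    return True

df = pd.DataFrame(columns=['a', 'b'])
probe = make_probe_row(df)
result = bar(probe)  # no KeyError: the probe carries the label 'a'
print(list(probe.index), result)
```

A probe like this would still fail for functions that depend on real values (for example, dividing by a column), so it lowers the chance of an exception rather than removing it.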

TomAugspurger added the Dtype Conversions Unexpected or buggy dtype conversions label Jun 7, 2017

jreback added Bug Difficulty Intermediate Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 9, 2017

jreback added this to the Next Major Release milestone Jun 9, 2017

jbrockmendel removed Difficulty Intermediate labels Oct 21, 2019

fluggo mentioned this issue May 26, 2021

BUG: DataFrame.agg produces different types if the DataFrame is empty #41672

Open

3 tasks

mroeschke added Apply Apply, Aggregate, Transform, Map and removed Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Jun 12, 2021

mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
