
BUG: Inconsistent return type of df.apply on empty df #50588

Closed
2 of 3 tasks
adriantre opened this issue Jan 5, 2023 · 3 comments

Labels
  • Apply: Apply, Aggregate, Transform, Map
  • Bug
  • Closing Candidate: May be closeable, needs more eyeballs
  • Needs Triage: Issue that has not been reviewed by a pandas team member

Comments

@adriantre

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
from pandas.testing import assert_series_equal

# non-empty
df = pd.DataFrame([1], columns=["col1"])
res1 = df.apply(lambda r: r.col1, axis=1)  # series
res2 = df.apply(lambda r: int(r.col1), axis=1)  # series
assert_series_equal(res1, res2)
# -> OK!

# empty
df_empty = pd.DataFrame(columns=["col1"])
res3 = df_empty.apply(lambda r: r.col1, axis=1)  # empty series
res4 = df_empty.apply(lambda r: int(r.col1), axis=1)  # empty dataframe
assert_series_equal(res3, res4)
# -> AssertionError: Series Expected type <class 'pandas.core.series.Series'>,
#    found <class 'pandas.core.frame.DataFrame'> instead

# Some use-case where you need to create a new series
pd.Series(res3)  # Works as expected
pd.Series(res4)  # Raises
# -> ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Issue Description

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../python3.9/site-packages/pandas/core/series.py", line 386, in __init__
    if is_empty_data(data) and dtype is None:
  File "/../python3.9/site-packages/pandas/core/construction.py", line 877, in is_empty_data
    is_simple_empty = is_list_like_without_dtype and not data
  File "/.../python3.9/site-packages/pandas/core/generic.py", line 1527, in __nonzero__
    raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
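
The traceback ends in `not data`, which is a truth test on a DataFrame. A minimal sketch of that failure in isolation, assuming only that pandas is installed (the variable names are illustrative and not from the report):

```python
import pandas as pd

empty_frame = pd.DataFrame(columns=["col1"])

# Truth-testing a DataFrame is ambiguous, so pandas raises instead of guessing.
try:
    bool(empty_frame)
    truth_test_raised = False
except ValueError:
    truth_test_raised = True

# The explicit emptiness check is unambiguous and never raises.
is_empty = empty_frame.empty
```

This is why passing the unexpected DataFrame on to pd.Series fails late and with a confusing message, rather than at the apply call itself.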

Expected Behavior

I would expect the result to be consistent, i.e. always return a series.

Installed Versions

commit : 8dab54d
python : 3.9.6.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:17:10 PDT 2022; root:xnu-8020.140.49~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.UTF-8

pandas : 1.5.2
numpy : 1.23.4
pytz : 2022.7
dateutil : 2.8.1
setuptools : 56.0.0
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : 2.9.1
jinja2 : 3.1.2
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : 2022.11.0
gcsfs : None
matplotlib : 3.6.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.3
snappy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : 2022.6

@adriantre added the Bug and Needs Triage labels on Jan 5, 2023
@rhshadrach
Member

Thanks for the report; this particular code path attempts to discern what shape to return by calling the passed function (your lambda, in this case) and using the result. Here, int(r.col1) fails when the DataFrame is empty. With no information about what the function does to the input, we have no way of determining what shape (e.g. Series vs DataFrame) to return. In such a scenario we return the input.

I think this is a duplicate of #47959.
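
The shape inference described in this comment can be sketched roughly as follows. This is not pandas' actual implementation: `probe_result_kind` is a hypothetical helper, and the NaN-filled probe row is an assumption about the internals that the thread does not state.

```python
import pandas as pd


def probe_result_kind(func, columns):
    """Hypothetical helper, not pandas API: guess what a row-wise apply returns."""
    # Assumption: the probe row is a NaN-filled Series indexed by the column names.
    probe_row = pd.Series(index=list(columns), dtype="float64")
    try:
        result = func(probe_row)
    except Exception:
        # The function failed on the probe, so nothing is known about its output.
        return "unknown"
    # A Series per row implies a DataFrame overall; anything else is one value per row.
    return "frame" if isinstance(result, pd.Series) else "series"


kind_plain = probe_result_kind(lambda r: r.col1, ["col1"])
kind_int = probe_result_kind(lambda r: int(r.col1), ["col1"])
```

Under that assumption, `kind_plain` is "series" while `kind_int` is "unknown", because int() cannot convert NaN. The "unknown" case corresponds to the situation where the input is returned unchanged.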

@rhshadrach added the Apply and Closing Candidate labels on Jan 5, 2023
@adriantre
Author

I see, that makes sense. The easiest fix is then to always check for empty frames and return the desired value/type early. We do this a lot, but it may be too much to ask of the general user.

Could this be a kwarg to the apply function? df.apply(my_func, axis=1, result_when_empty=pd.Series(...))

The result_when_empty should allow for templating columns when it is a pd.DataFrame.
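
Until something like this exists, the early-return pattern can be wrapped in user code. A minimal sketch, assuming a hypothetical `apply_rows` helper (the `result_when_empty` keyword is only the proposal above, not an existing pandas parameter):

```python
import pandas as pd


def apply_rows(df, func, result_when_empty):
    """Hypothetical wrapper, not pandas API: row-wise apply with an explicit empty result."""
    if len(df.index) == 0:
        # Return a copy so callers cannot mutate the shared template.
        return result_when_empty.copy()
    return df.apply(func, axis=1)


def to_int(row):
    return int(row.col1)


# Series template: the empty result is always a Series with a known dtype.
empty_series = apply_rows(
    pd.DataFrame(columns=["col1"]), to_int, pd.Series(dtype="int64")
)

# DataFrame template: the empty result carries the expected output columns.
empty_frame = apply_rows(
    pd.DataFrame(columns=["col1"]),
    lambda r: pd.Series({"a": r.col1, "b": r.col1}),
    pd.DataFrame(columns=["a", "b"]),
)

# Non-empty input goes through the normal apply path.
non_empty = apply_rows(pd.DataFrame([1], columns=["col1"]), to_int, pd.Series(dtype="int64"))
```

The caller states the intended shape once, so downstream code such as pd.Series(...) sees the same type for empty and non-empty input.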

@rhshadrach
Member

I added some thoughts on this in #47959 (comment); closing here to keep the discussion consolidated. Please join the discussion there!

2 participants