Skip to content

ENH: add dict to return type of func argument of DataFrame#apply #470

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 19, 2022

Conversation

skatsuta
Copy link
Contributor

@skatsuta skatsuta commented Dec 14, 2022

  • Closes #xxxx (Replace xxxx with the Github issue number)
  • Tests added: Please use assert_type() to assert the type of any return value

Problem

It is perfectly valid to pass a function with a return type of dict as the func argument to DataFrame#apply when result_type="expand", but the current type stub does not support it.

For example, the following code

def returns_dict(x: pd.Series) -> dict[str, int]:
    return {'x': 1, 'y': 2}

df.apply(returns_dict, axis=1, result_type="expand")

results in the following errors when type-checked by Pyright:

error: Argument of type "(x: Series[Unknown]) -> dict[str, int]" cannot be assigned to parameter "f" of type "(...) -> Series[Unknown]" in function "apply"
    Type "(x: Series[Unknown]) -> dict[str, int]" cannot be assigned to type "(...) -> Series[Unknown]"
      Function return type "dict[str, int]" is incompatible with type "Series[Unknown]"
        "dict[str, int]" is incompatible with "Series[Unknown]" (reportGeneralTypeIssues)
error: Argument of type "Literal['expand']" cannot be assigned to parameter "result_type" of type "Literal['reduce']" in function "apply"
    "Literal['expand']" cannot be assigned to type "Literal['reduce']" (reportGeneralTypeIssues)

Solution

Add dict to the return type of func argument of DataFrame#apply when result_type="expand".

Also, add tests for passing a function that returns a dict to DataFrame#apply in the cases where axis is 0 and 1, respectively.

@skatsuta skatsuta force-pushed the dataframe-apply-func-return-dict branch from c929706 to 77060c1 Compare December 14, 2022 13:21
@skatsuta skatsuta changed the title ENH: add dict to return type of func in DataFrame#apply when result_type="expand" ENH: add dict to return type of func argument of DataFrame#apply when result_type="expand" Dec 14, 2022
@gandhis1
Copy link
Contributor

gandhis1 commented Dec 14, 2022

I recently modified these apply overloads and the number of permutations of overloads is a lot higher now. So any changes here need to be very careful to make sure they apply new data types everywhere they need to be present.

The changes you made look good to me, I only have one question. During the last round of changes to apply() annotations, we decided that even if a particular combination of function return and return_type did not make logical sense, if pandas allowed it without errors, then we would continue to permit it in the annotations.

I cannot test right now, but does dict work for any other result_type value (including the default of None)?

Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As suggested by @gandhis1 here: #470 (comment)

you should modify all of the overloads to return dict, and create tests for all of the various combinations of axis and result_type as is already present in the tests.

If some combination of axis and result_type doesn't work with dict as the return type, then you leave dict out as a possibility for the return type of the Callable in the f argument.

@skatsuta
Copy link
Contributor Author

skatsuta commented Dec 15, 2022

@gandhis1 You are the one who added those comprehensive overloads! Thanks for the hard work!
Will check other combinations and add support for them if they also work!

@Dr-Irv Thanks for your suggestion! Will add support for func returning dict to other overloads and also additional comprehensive test cases!

@skatsuta skatsuta changed the title ENH: add dict to return type of func argument of DataFrame#apply when result_type="expand" ENH: add dict to return type of func argument of DataFrame#apply Dec 18, 2022
Copy link
Contributor Author

@skatsuta skatsuta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dr-Irv I've added Mapping return type to all possible overloads, and comprehensive test cases for all axis and result_type parameter combinations, which are 2 x 4 = 8 assertions in total (2 axes and 4 result types).

I've also replaced Literal[None] with just None, and reformatted and reordered some existing assertions for readability and clarity, but none of them are removed.

Would appreciate if you could take a look when you have time, thanks!

Comment on lines -547 to -564
assert_type(
# Note that technically it does not make sense to pass a result_type of "broadcast" to a scalar return
df.apply(returns_scalar, result_type="broadcast"),
pd.DataFrame,
),
pd.DataFrame,
)
check(
assert_type(df.apply(returns_series, result_type="broadcast"), pd.DataFrame),
pd.DataFrame,
)
check(
assert_type(
# Can only broadcast a list-like of 2 elements, not 3, because there are 2 rows
df.apply(returns_listlike_of_2, result_type="broadcast"),
pd.DataFrame,
),
pd.DataFrame,
Copy link
Contributor Author

@skatsuta skatsuta Dec 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These assertions are moved to the bottom so that test cases for result_type="broadcast" are grouped together.

result_type: Literal[None] = ...,
result_type: None = ...,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not necessarily have to be Literal[None], just None is OK, and so are others.

@skatsuta skatsuta requested a review from Dr-Irv December 18, 2022 13:23
Copy link
Collaborator

@Dr-Irv Dr-Irv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @skatsuta

@Dr-Irv Dr-Irv merged commit b83f687 into pandas-dev:main Dec 19, 2022
@skatsuta skatsuta deleted the dataframe-apply-func-return-dict branch January 11, 2023 04:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants