DataFrame.apply stub doesn't reflect default value of axis parameter #393

Closed
debonte opened this issue Oct 18, 2022 · 6 comments

@debonte

debonte commented Oct 18, 2022

Describe the bug
If you call DataFrame.apply and pass arguments beyond f (such as args) but omit axis, Pyright reports the following error:

No overloads for "apply" match the provided arguments

To Reproduce

import pandas as pd

def gethead(s, y):
    return s.head(y)

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [11, 12, 13, 14, 15, 16]})

df2 = df.apply(gethead, args=tuple([4]))  # error

Pylance gives the following error on the apply call:

No overloads for "apply" match the provided arguments
  Argument types: ((s: Unknown, y: Unknown) -> Unknown, tuple[int, ...])
  Pylance (reportGeneralTypeIssues): https://github.com/microsoft/pylance-release/blob/main/DIAGNOSTIC_SEVERITY_RULES.md#diagnostic-severity-rules

Please complete the following information:

  • OS: Windows
  • OS Version: 10.0.25197 Build 25197
  • python version: 3.10.0
  • version of type checker: Pylance 2022.10.21
  • version of installed pandas-stubs: 1.5.0.220926

Additional context
Originally reported by @jonmooser at microsoft/pylance-release#3491

My first thought was that the fix was to indicate in the stubs that axis has a default value (axis: AxisType = ...), but then the two overloads of apply overlap with each other. The correct fix looks like it requires more knowledge of the innards of apply, and of pandas in general, than I have.

@Dr-Irv
Collaborator

Dr-Irv commented Oct 19, 2022

The trick here is that the function gethead() has to be properly typed to return a Series. If default arguments are used, then we should return a DataFrame when we know that the function returns a Series. See the description of the result_type argument which describes the default behavior.

So for your example to work, it would have to change, and we have to add an overload for apply().

@jonmooser

> See the description of the result_type argument which describes the default behavior.

Dr-Irv, could you clarify the significance of the result_type argument? The docs say that it only applies when axis=1. In my example, axis=0 (implicitly).

@Dr-Irv
Collaborator

Dr-Irv commented Oct 19, 2022

> Dr-Irv, could you clarify the significance of the result_type argument? The docs say that it only applies when axis=1. In my example, axis=0 (implicitly).

I think (but am not 100% sure) that the comment about axis=1 means that the values other than None only take effect when axis=1 and do nothing when axis=0. But the default value, result_type=None, does change the type of the result independent of the value of axis.

@jonmooser

> I think (but am not 100% sure) that the comment about axis=1 means that the values other than None only take effect when axis=1 and do nothing when axis=0. But the default value, result_type=None, does change the type of the result independent of the value of axis.

Based on some experimenting, that's correct. If the passed function returns a scalar, apply() will create a Series. If the function returns something list-like, apply() will create a DataFrame.

But I think the underlying problem here may be a bit different. None of the overloads allows for an args param with no axis param.
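
The runtime behavior described above can be checked with a short script. This is only a sketch: it reuses the DataFrame and gethead from the original report, while total_plus is a hypothetical scalar-returning function added purely for comparison. Both calls pass args without axis, which is the call shape the stubs rejected.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6], "B": [11, 12, 13, 14, 15, 16]})


def gethead(s: pd.Series, y: int) -> pd.Series:
    # Series-returning callable: apply() assembles the per-column results
    # into a DataFrame.
    return s.head(y)


def total_plus(s: pd.Series, y: int) -> int:
    # Hypothetical scalar-returning callable (not part of the original report):
    # apply() collects one scalar per column into a Series.
    return int(s.sum()) + y


# axis is omitted in both calls, so the default (axis=0) is used.
frame_result = df.apply(gethead, args=(4,))
series_result = df.apply(total_plus, args=(1,))

print(type(frame_result).__name__, frame_result.shape)
print(type(series_result).__name__, series_result.to_dict())
```

Both calls run fine at runtime, so the type checker error comes from the stubs rather than from pandas itself.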

@Dr-Irv
Collaborator

Dr-Irv commented Oct 19, 2022

> But I think the underlying problem here may be a bit different. None of the overloads allows for an args param with no axis param.

We need to have a new overload, where the return type is based on whether the function f is Callable[..., ScalarT] or Callable[..., list-like], where list-like is appropriately defined.
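
To illustrate the idea of choosing the return type from the callable's return type, here is a minimal sketch using toy classes. ToySeries, ToyFrame, and the Scalar alias are hypothetical stand-ins invented for this example; they are not the pandas-stubs definitions, and the signatures that actually landed in the stubs may well differ.

```python
from typing import Callable, Union, overload

# Hypothetical scalar alias for illustration only.
Scalar = Union[int, float, str]


class ToySeries:
    """Hypothetical stand-in for pandas.Series."""

    def __init__(self, values: list) -> None:
        self.values = list(values)


class ToyFrame:
    """Hypothetical stand-in for pandas.DataFrame: column name -> values."""

    def __init__(self, columns: dict) -> None:
        self.columns = {name: list(vals) for name, vals in columns.items()}

    # Overload 1: a Series-returning callable gives back a frame.
    @overload
    def apply(self, f: Callable[..., ToySeries], args: tuple = ...) -> "ToyFrame": ...

    # Overload 2: a scalar-returning callable gives back a series.
    @overload
    def apply(self, f: Callable[..., Scalar], args: tuple = ...) -> ToySeries: ...

    def apply(self, f, args=()):
        # Call f once per column, forwarding the extra positional args.
        results = {
            name: f(ToySeries(vals), *args) for name, vals in self.columns.items()
        }
        if all(isinstance(r, ToySeries) for r in results.values()):
            return ToyFrame({name: r.values for name, r in results.items()})
        return ToySeries(list(results.values()))


def head(s: ToySeries, n: int) -> ToySeries:
    return ToySeries(s.values[:n])


def total(s: ToySeries, offset: int) -> int:
    return sum(s.values) + offset


frame = ToyFrame({"A": [1, 2, 3], "B": [11, 12, 13]})
framed = frame.apply(head, args=(2,))   # matches overload 1
summed = frame.apply(total, args=(1,))  # matches overload 2
```

Because neither overload requires an axis argument, a call that passes only args can match, and the two overloads do not overlap since their callable return types are disjoint.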

danielroseman added a commit to danielroseman/pandas-stubs that referenced this issue Oct 27, 2022
Dr-Irv pushed a commit that referenced this issue Oct 27, 2022
* Series.value_counts returns Series[int].

* Series.apply callable result might not be hashable

It's possible for the callable in Series.apply to return something
non-hashable like a list, but the result of apply should still be a
Series.

* More detailed typing for DataFrame.apply.

Whether it returns a Series or a DataFrame depends on the return type of
the callable. In the case of the callable returning a scalar, the result
is a Series unless the result_type is "broadcast".

* Add test for #393.
@Dr-Irv
Collaborator

Dr-Irv commented Oct 27, 2022

fixed in #401

Dr-Irv closed this as completed Oct 27, 2022