Skip to content

API: Signature of UDF methods #40112

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
rhshadrach opened this issue Feb 27, 2021 · 6 comments
Open

API: Signature of UDF methods #40112

rhshadrach opened this issue Feb 27, 2021 · 6 comments
Labels
API - Consistency Internal Consistency of API/Behavior API Design Apply Apply, Aggregate, Transform, Map Enhancement Needs Discussion Requires discussion from core team before further action

Comments

@rhshadrach
Copy link
Member

There are many signatures of pandas methods that take a UDF. If I've missed methods, please let me know and will update.

Below, other_arg is a placeholder for 1 or more keyword arguments, and value is a placeholder for the default value.

Signature: func, other_arg=value, args=value, **kwargs

Signature: func=None, other_arg=value, *args, **kwargs

Signature: func, other_arg=value, *args, **kwargs

Signature: func, *args, other_arg=Value, **kwargs

Signature: func=None, *args, other_arg=Value, **kwargs

Signature: func, *args, **kwargs

Signature: arg, *args, **kwargs

@rhshadrach rhshadrach added Needs Discussion Requires discussion from core team before further action Apply Apply, Aggregate, Transform, Map API - Consistency Internal Consistency of API/Behavior labels Feb 27, 2021
@rhshadrach
Copy link
Member Author

rhshadrach commented Feb 27, 2021

Two thoughts

  • Renaming arg to func in Resampler.transform is noncontroversial.
  • IMO, the most preferred signature is func=None, *args, other_arg=value, **kwargs (where other_arg=value may not occur). With func=None, special handling will need to happen for methods that don't take a dictionary of UDFs as **kwargs (or that feature could be implemented).

As many of these methods are very heavily used, I'm worried about the impact of having a FutureWarning for so many at once. I am wondering if a gradual transition may be more user friendly.

@attack68
Copy link
Contributor

For your list:

Styler.apply(func, axis=0, subset=None, **kwargs)
Styler.applymap(func, subset=None, **kwargs)
Styler.where(cond, value, other=None, subset=None, **kwargs)
Styler.pipe(func, *args, **kwargs)

@rhshadrach
Copy link
Member Author

rhshadrach commented Feb 26, 2023

I'm now of the opinion these signatures should be:

func, args, kwargs, other_arg

Note the args / kwargs are not *args, **kwargs. While this means users will have to pass tuples / dicts to each of these respectively, it also ensures users can reliably pass arbitrary values without interfering with other args / kwargs.

@simonjayhawkins
Copy link
Member

Maybe a more radical simplification of the signature. Instead of using

func, args, kwargs, other_arg

what if we used just:

func, *, other_arg=value

By dropping explicit args and kwargs, we let users leverage functools.partial to bind any necessary positional or keyword arguments. This approach is especially appealing when func can be a list containing functions—or even a mix of functions and strings—because, in those cases, one would likely need to use functools.partial to handle diverse argument configurations anyway.

My thinking is that this more dramatic change would simplify the API and reduce the risk of subtle bugs. In the current design, there’s always the potential for positional and keyword arguments to collide, possibly resulting in hard-to-debug issues. With this proposal, if there’s any mismatch or if a required argument isn’t provided correctly, the code will simply break immediately rather than silently introducing errors.

That said, I remain cautious about whether this method will play nicely with JIT compilers or numpy ufuncs. It might require further investigation in those environments.

Also, moving away from the standard Python idiom of *args and **kwargs to a fixed tuple/dictionary format might be slightly awkward for users who are used to the established patterns. Especially when different functions in a list require distinct arguments, we'd likely need to manage this with collections of tuples and dictionaries.

How do you feel this over simplification will impact user experience?

@rhshadrach
Copy link
Member Author

@simonjayhawkins - I find requiring users to use functools.partial to be a viable way forward. I expect users to be annoyed with having to refactor their code, given how much apply et al may be used. This is especially the case when using pandas for ad-hoc data analysis where the code can become a bit more verbose. That said, I'm not opposed to the proposal.

@simonjayhawkins
Copy link
Member

in #61128 (comment) @datapythonista wrote

I think many cases considered here have no compatibility issues, like adding the missing args to methods which don't have them. This has actually been done recently in a method when a user needed it.

Now while not wanting to block any progress, I felt that this was overlapping with the proposal here and that we should have agreement on your proposal here before we can consider adding missing args to other methods.

My comment #40112 (comment) was more intended to restart this 4 year old discussion and not necessarily to not approve or agree other suggestions.

Signature: func=None, other_arg=value, *args, **kwargs

Signature: func, other_arg=value, *args, **kwargs

Now these two signatures are definitely clumsy IMO.

if i have a function, say,

def f(s, arg, kwarg=None):
    ...

then to pass the required arg as an positional arg I would need to do

s.agg(f,0,1)

as I also need to pass the (unused) axis parameter of .agg as a positional arg to avoid SyntaxError: positional argument follows keyword argument

whereas if i pass the arg for my function as a kwarg, I can simply write...

s.agg(f,arg=1)

and the required parameter for the function is passed and it is clearer and less error prone?

So it makes me think that having signatures with *args, especially after other_arg=value, should be avoided and forcing users to pass all arguments to the function using keyword arguments is preferable. i.e. removing *args from the signatures rather than adding to the ones that don't have it.

Now, you originally proposed #40112 (comment)

IMO, the most preferred signature is func=None, *args, other_arg=value, **kwargs

to avoid the positional arguments after the keyword arguments. If we look at this differently and do not allow positional args to be passed to the function, your original proposal becomes...

func=None, other_arg=value, **kwargs

which is IMO better than you latter proposal of

func, args, kwargs, other_arg

sweeping up kwargs at the end seems neater and less of a change to existing methods. Now I appreciate that using **kwargs is not typing friendly and hence my proposal to remove that too!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API - Consistency Internal Consistency of API/Behavior API Design Apply Apply, Aggregate, Transform, Map Enhancement Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

4 participants