Skip to content

CLN: use staticmethods where applicable. #34102

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
joooeey opened this issue May 10, 2020 · 18 comments
Closed

CLN: use staticmethods where applicable. #34102

joooeey opened this issue May 10, 2020 · 18 comments
Labels
Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action

Comments

@joooeey
Copy link
Contributor

joooeey commented May 10, 2020

Issue

In my arbitrary sample of three .py files in the code base, Two files had multiple class methods in which the cls attribute was never used. It's entirely incomprehensible to me why these are specifically decorated as class methods since they don't access the class. It would improve code readability if they were static methods. Technically, most of them could also be module-level functions but refactoring that appears to be way more effort.

An example is pandas/plotting/matplotlib/core.py.

Proposed fix

  • To prevent anyone from making this mistake again, we should make sure our Linter rejects methods with unused arguments such as PyLint does.
  • Write a script that converts all methods (including class methods) with an unused first argument (typically self and cls) to static methods. Run that script across the Pandas codebase.
  • Fix any resulting test failures or other issues. Though very likely there will be none because a static method is virtually indistinguishable from a class method or instance method except if you check the type directly.
  • Make the pull request.
@joooeey joooeey added Needs Triage Issue that has not been reviewed by a pandas team member Usage Question labels May 10, 2020
@MarcoGorelli MarcoGorelli added Needs Discussion Requires discussion from core team before further action and removed Usage Question Needs Triage Issue that has not been reviewed by a pandas team member labels May 10, 2020
@jbrockmendel
Copy link
Member

Might also be worth tracking down non-classmethods that dont use self.

@CloseChoice
Copy link
Member

I just ran the pycharm code inspection and found 330 "method may be static warnings". Before we can make an automated check, we first need to change all of them. It may make sense to split this into multiple merge requests.

@joooeey
Copy link
Contributor Author

joooeey commented May 10, 2020

Might also be worth tracking down non-classmethods that dont use self.

Absolutely. The proposed fix included that from the beginning. The only reason the issue description focuses on the class methods is that I've only seen this issue with class methods in Pandas. We could even expand to address all methods with unused arguments and if it's not self or cls we can del the unused argument at the start of the method as PyLint recommends. This should be simple to automate too.

@jreback
Copy link
Contributor

jreback commented May 10, 2020

-1 on any use of staticmethod at all
this is an anti pattern in python

you can refactor to module level or leave

@joooeey
Copy link
Contributor Author

joooeey commented May 10, 2020

I understand your distaste for static method although I don't see why it's that extreme. The current state of hundreds of methods that are functionally static methods but not declared as such is what I'm worried about. Refactoring to static method is pragmatic in this case because it can be automated and leaves interfaces intact. But you are definitely free to take on this PR and refactor the ~330 class methods to module level functions. Maybe that can be scripted somehow too, or do it by hand. I agree that that would be cleaner, I just can't see anyone investing the time to do it.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 11, 2020

An example is pandas/plotting/matplotlib/core.py.

For this specific example, there are actually some technical reasons for the use of classmethod. For example, the _plot class methods need to be methods on the class, because they can be overrriden by subclasses (so factoring out as module level free-standing function doesn't work here). And second, although many of them don't use cls at all, they still need to be classmethods, because a subclass can use cls (and this actually happens in LinePlot._plot).
I only looked at _plot, but similar arguments might apply for the other class methods in this file as well.

This is of course just about this specific file. There certainly can be other places where class methods are used less purposefully. But so although we can talk about the general principle of not using class methods if it is not needed (with which I agree), it might be useful/needed in the end to check case by case.

But for example I check some random other classmethod occurences, and they were almost all valid use cases AFAIK.

@joooeey
Copy link
Contributor Author

joooeey commented May 11, 2020

And second, although many of them don't use cls at all, they still need to be classmethods, because a subclass can use cls (and this actually happens in LinePlot._plot).

Why would they need to be classmethod? They should be staticmethod. Derived classes can still override it with a classmethod. No? Am I missing something?

I was aware that overriding might be a concern and that's a big reason why I suggested the specific proposed fix in the first post. Thanks for spelling it out! So why don't we focus on the fix I proposed (the low hanging fruit) and then consider refactoring static methods in the code base to module level where that makes sense?

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 11, 2020

Derived classes can still override it with a classmethod. No? Am I missing something?

I suppose so yes, but personally I wouldn't find that better practice to overide a staticmethod as a classmethod, than to have some classmethods that don't use cls (but that's just a personal opinion).

So why don't we focus on the fix I proposed (the low hanging fruit)

I am personally fine with converting some classmethods to staticmethods, but I am still wondering if there are really that much cases if excluding the cases like the matplotlib file (eg in pandas/core/generic.py, there are 8 classmethod, all of which use cls. But I suppose it might vary a lot by file? And I am by accident looking in the "wrong" files)

@joooeey
Copy link
Contributor Author

joooeey commented May 11, 2020

I don't know where they are either. But I guess pycharm would tell you. I maintain that all 330 cases that pycharm reports are bad, simply because unused method (or function) arguments are a code smell. That also applies to the cases where cls is unused but a subclass overrides the method.

@jorisvandenbossche
Copy link
Member

I maintain that all 330 cases that pycharm reports are bad, simply because unused method (or function) arguments are a code smell.

My simple search for "@classmethod" says there are only 285 occurrences, and a large part of the ones I randomly looked at are using cls

@joooeey
Copy link
Contributor Author

joooeey commented May 11, 2020

That would imply that of the 330 cases, most are instance methods (i. e. normal methods) where self is not used in the method body. I don't find that surprising.

By the way a text search of @classmethod in all files in pandas-dev/pandas gave me 400 matches. Any idea where that difference might come from?

@jbrockmendel
Copy link
Member

By the way a text search of @classmethod in all files in pandas-dev/pandas gave me 400 matches. Any idea where that difference might come from?

Best guess is that you are including results inside generated C files

@jorisvandenbossche
Copy link
Member

Yeah, I was only looking at classmethods, not normal methods that don't use self, so that's probably the reason I see less occurences.

For normal methods that don't use self, and only for pandas/core, I get for example:

$ pylint pandas/core --disable=all --enable=no-self-use
************* Module pandas.core.resample
pandas/core/resample.py:156:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.series
pandas/core/series.py:331:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.indexing
pandas/core/indexing.py:951:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.accessor
pandas/core/accessor.py:54:4: R0201: Method could be a function (no-self-use)
pandas/core/accessor.py:57:4: R0201: Method could be a function (no-self-use)
pandas/core/accessor.py:60:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.indexes.multi
pandas/core/indexes/multi.py:304:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/multi.py:2611:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.indexes.interval
pandas/core/indexes/interval.py:633:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.indexes.range
pandas/core/indexes/range.py:536:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.indexes.base
pandas/core/indexes/base.py:243:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:246:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:249:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:252:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:705:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:847:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:2423:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:2779:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:2954:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3049:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3160:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3176:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3191:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3945:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:3951:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:4433:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:4576:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:4590:4: R0201: Method could be a function (no-self-use)
pandas/core/indexes/base.py:4705:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.window.rolling
pandas/core/window/rolling.py:199:4: R0201: Method could be a function (no-self-use)
pandas/core/window/rolling.py:366:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.groupby.ops
pandas/core/groupby/ops.py:358:4: R0201: Method could be a function (no-self-use)
pandas/core/groupby/ops.py:590:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.groupby.generic
pandas/core/groupby/generic.py:332:4: R0201: Method could be a function (no-self-use)
pandas/core/groupby/generic.py:1502:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.internals.blocks
pandas/core/internals/blocks.py:1605:4: R0201: Method could be a function (no-self-use)
pandas/core/internals/blocks.py:2050:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.arrays.datetimes
pandas/core/arrays/datetimes.py:512:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.arrays.boolean
pandas/core/arrays/boolean.py:100:4: R0201: Method could be a function (no-self-use)
pandas/core/arrays/boolean.py:692:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.arrays.string_
pandas/core/arrays/string_.py:78:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.arrays.period
pandas/core/arrays/period.py:590:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.arrays.base
pandas/core/arrays/base.py:1003:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.computation.pytables
pandas/core/computation/pytables.py:138:4: R0201: Method could be a function (no-self-use)
************* Module pandas.core.computation.expr
pandas/core/computation/expr.py:524:4: R0201: Method could be a function (no-self-use)
pandas/core/computation/expr.py:688:4: R0201: Method could be a function (no-self-use)

Now, I personally don't feel very strongly about wether those should be cleaned up to be staticmethods or not.

@joooeey
Copy link
Contributor Author

joooeey commented May 12, 2020

I agree that this is a lot of work even if it can be automated. I just wanted to document this issue because I noticed it when working on #33821 and we briefly discussed it on that PR. Not sure if it is worth fixing for legacy code.

However, I do think that my first bullet point should be implemented, that is, new additions to the code shouldn't include any functions or methods with unused arguments. Lint is only applied on new code, is that right? This might be as easy as changing a single Linter option. I have no idea how our build process works though. (I have to say, it's possible that we're already preventing this. But I don't know how to check if we do.)

@jbrockmendel
Copy link
Member

Lint is only applied on new code, is that right?

No, the code checks apply to the whole code base.

with unused arguments

In many cases these are worth hunting down, but we would need to whitelist a number of cases where we include unused kwargs for e.g. numpy compat

@joooeey
Copy link
Contributor Author

joooeey commented May 12, 2020

I see. For cases where unused kwargs are required, Pylint's suggested fix of using del mykwarg at the top of the method is a neat solution. That makes it faster to comprehend the method.

@jbrockmendel jbrockmendel added the Closing Candidate May be closeable, needs more eyeballs label Sep 24, 2020
@MarcoGorelli
Copy link
Member

Rather than closing, could/should this be repurposed as "delete unused arguments"?

@MarcoGorelli
Copy link
Member

closing as there hasn't been much/any uptake here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Closing Candidate May be closeable, needs more eyeballs Needs Discussion Requires discussion from core team before further action
Projects
None yet
Development

No branches or pull requests

6 participants