Skip to content

CLN/INT: avoid getattr(obj, 'values', obj) idiom #27167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
jorisvandenbossche opened this issue Jul 1, 2019 · 4 comments · Fixed by #44224
Closed

CLN/INT: avoid getattr(obj, 'values', obj) idiom #27167

jorisvandenbossche opened this issue Jul 1, 2019 · 4 comments · Fixed by #44224
Labels
Clean Internals Related to non-user accessible pandas implementation
Milestone

Comments

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jul 1, 2019

We use this idiom quite a bit in the code base:

values = getattr(values, 'values', values)

One example:

values = getattr(values, 'values', values)

Personally, I find that this makes the code harder to reason about. What kind of object is values? What does the .values attribute return? (which depends on the object and dtype).

I think a more explicit coding can make things clearer. Sometimes I would find it clearer to just do:

if isinstance(values, ABCSeries):
    values = values.values

This is longer, yes, but then it is at least clear that it can potentially be a Series.

Although we should probably even try to avoid that algorithmic code is dealing with a mixture of Series, Index, EA, ndarray. I think, in general, we should nowadays be able to unpack containers in the beginning of / before algorithmic code so we know we only have either ndarray or EA.
Eg we could have an ensure_array that does such explicit checks and ensures the values are ndarray or EA.

cc @jreback @jbrockmendel @TomAugspurger

@jorisvandenbossche jorisvandenbossche added Internals Related to non-user accessible pandas implementation Clean labels Jul 1, 2019
@jorisvandenbossche jorisvandenbossche added this to the Contributions Welcome milestone Jul 1, 2019
@jreback
Copy link
Contributor

jreback commented Jul 1, 2019

yep this is a good idea

@TomAugspurger
Copy link
Contributor

Although we should probably even try to avoid that algorithmic code is dealing with a mixture of Series, Index, EA, ndarray. I think, in general, we should nowadays be able to unpack containers in the beginning of / before algorithmic code so we know we only have either ndarray or EA.
Eg we could have an ensure_array that does such explicit checks and ensures the values are ndarray or EA.

This is my preference. We have pandas.core.internals.construction.extract_array (should probably be moved).

@jreback
Copy link
Contributor

jreback commented Jul 1, 2019

ahh yes

if we rename to ensure_array and put in dtypes.common would be consistent

@jbrockmendel
Copy link
Member

Generally on board, question about exactly what you have in mind:

if isinstance(values, ABCSeries):
    values = values.values

Do you have a strong opinion on this versus values._values or values.array? e.g. I'm pretty sure there are places where we do values = getattr(values, '_values', getattr(values, 'values', values)) which is... yikes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants