Skip to content

update NDFrame __setattr__ to match behavior of __getattr__ #9004

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

jakevdp
Copy link
Contributor

@jakevdp jakevdp commented Dec 4, 2014

This addresses the issue raised in #8994.
Here is the behavior before this PR:

data = pd.DataFrame({'x':[1, 2, 3]})

data.y = 2
data['y'] = [2, 4, 6]
data.y = 5

print(data.y)
#2
print(data['y'].values)
# [5, 5, 5]

Here is the behavior after this PR:

data = pd.DataFrame({'x':[1, 2, 3]})

data.y = 2
data['y'] = [2, 4, 6]
data.y = 5

print(data.y)
#5
print(data['y'].values)
# [2, 4, 6]

The fix boils down to the fact that when data.y is called, Python first calls data.__getattribute__('y') before calling data.__getattr__('y'), but when data.y = 5 is called, Python goes directly to data.__setattr__('y', 5) without any prior search for defined attributes.

I don't think there are any unintended side-effects to this change, but I'd appreciate some other folks thinking through whether this is the right solution. Thanks!

@jakevdp jakevdp changed the title update NDFrame __setattr__ to match behavior of __getattr__ (Addresses #8994) update NDFrame __setattr__ to match behavior of __getattr__ Dec 4, 2014
@jreback
Copy link
Contributor

jreback commented Dec 4, 2014

@jakevdp can you add some tests to validate this (e.g. your new behavior).

also a release note (you can use this exact example, with a code-block for the before), in the API section pls (and reference the original issue).

I think this is ok and seems to fix the 'hiding' issue.

The fundamental issue is that allowing columnar convenience access via attributes collides with 'regular' attribute access (as well as possible method ovewriting).

@jorisvandenbossche
@shoyer ?

@jreback jreback added this to the 0.15.2 milestone Dec 4, 2014
@jakevdp
Copy link
Contributor Author

jakevdp commented Dec 5, 2014

It occurs to me that there's another option here: we could make it so that data.y = 5 results in the same thing as data['y'] = 5, i.e. creates a new column named 'y' if the column does not already exist. That may do even more to eliminate confusion between attributes and column names. Is there any reason this has not been done?

@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

the problem with that is people expect to be able to set arbitrary attributes on objects (even though the propagation does not happen) - eg except for certain attributes these are lost when combined

I think that would be a nice change - but that would have to wait for 0.16 for consideration

going to release 0.15.2 shortly (I think this fix is ok as its really a bug)

@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

@jakevdp awesome. can you add the example in the release notes as well.

We can merge this and discuss whether to change at a later date (e.g. setattr on a DataFrame to create a column rather than an attribute)

@jakevdp
Copy link
Contributor Author

jakevdp commented Dec 5, 2014

I updated the whatsnew in the most recent commit. Let me know if you want that to be modified in any way.

@jreback
Copy link
Contributor

jreback commented Dec 6, 2014

@jorisvandenbossche @shoyer ?

@shoyer
Copy link
Member

shoyer commented Dec 6, 2014

This looks fine to me as a fix. I think we can safely assume that users haven't been relying on the broken behavior.

In the long term, I do think we should consider just making data.y = 5 equivalent to data['y'] = 5, but that will definitely be a breaking change, especially for people who do things like subclassing DataFrame. @hugadams any thoughts?

@hughesadam87
Copy link

Thanks for cc'ing me.

One question I have is will this attribute persist? Jake, what is the return of:

data.y = 5 
d2 = data.y**2
d2.y

In regard to subclassing Frames, I actually don't do that per-se in my package; however, I know GeoPandas does. We use a composite class, so we're not storing attributes on the DataFrame. Of course, if it ever does come to the point that the DataFrame can store arbitrary attributes, it would require us to rethink the design I'm sure. In fact, I do plan to rework this in the future to subclass like GeoPandas. But in any case, I guess to summarize my thoughts is I'd say if this PR is generally good for Pandas, then +1 and we will work around any changes you guys make. It might be helpful to get someone from GeoPandas to check this out too.

@shoyer
Copy link
Member

shoyer commented Dec 6, 2014

@hugadams Nope, the attribute is not persisted through arithmetic -- this PR does not change that behavior.

@hughesadam87
Copy link

@shoyer Cool, thanks.

@jakevdp
Copy link
Contributor Author

jakevdp commented Dec 7, 2014

Hi @hugadams – I don't think this change will affect the example you bring up.

Note that, unless there's something I'm overlooking, the only thing this current fix changes is the case where:

  1. a custom attribute is added to a dataframe (e.g. df.val = 4)
  2. a column is then added with the same name as the attribute (e.g. df['val'] = [1, 2, 3])
  3. the attribute is modified (e.g. df.val = 5)

I think that unless someone is doing that exact sequence of events, this change will not affect anything.


try:
object.__getattribute__(self, name)
return object.__setattr__(self, name, value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to return here (not that it really matters, but you did remove the unnecessary return below!)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was either this, or this followed by return None. I decided to go for brevity 😄

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see why it's needed now :)

@jreback
Copy link
Contributor

jreback commented Dec 7, 2014

closed via d5963dc

@jakevdp thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants