update NDFrame setattr to match behavior of getattr #9004

jakevdp · 2014-12-04T22:29:27Z

This addresses the issue raised in #8994.
Here is the behavior before this PR:

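import pandas as pd

data = pd.DataFrame({'x': [1, 2, 3]})

data.y = 2              # sets a plain attribute (no column 'y' exists yet)
data['y'] = [2, 4, 6]   # creates a column named 'y'
data.y = 5              # overwrites the column, not the attribute

print(data.y)
# 2
print(data['y'].values)
# [5 5 5]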

Here is the behavior after this PR:

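data = pd.DataFrame({'x': [1, 2, 3]})

data.y = 2              # sets a plain attribute (no column 'y' exists yet)
data['y'] = [2, 4, 6]   # creates a column named 'y'
data.y = 5              # updates the attribute, leaving the column alone

print(data.y)
# 5
print(data['y'].values)
# [2 4 6]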

The fix boils down to the fact that when data.y is accessed, Python first calls data.__getattribute__('y') and falls back to data.__getattr__('y') only if that raises AttributeError, but when data.y = 5 is executed, Python goes directly to data.__setattr__('y', 5) without any prior search for defined attributes.

I don't think there are any unintended side-effects to this change, but I'd appreciate some other folks thinking through whether this is the right solution. Thanks!

jreback · 2014-12-04T23:43:22Z

@jakevdp can you add some tests to validate this (e.g. your new behavior).

also a release note (you can use this exact example, with a code-block for the before), in the API section pls (and reference the original issue).

I think this is ok and seems to fix the 'hiding' issue.

The fundamental issue is that allowing columnar convenience access via attributes collides with 'regular' attribute access (as well as possible method overwriting).

@jorisvandenbossche
@shoyer ?

jakevdp · 2014-12-05T00:34:16Z

It occurs to me that there's another option here: we could make it so that data.y = 5 results in the same thing as data['y'] = 5, i.e. creates a new column named 'y' if the column does not already exist. That may do even more to eliminate confusion between attributes and column names. Is there any reason this has not been done?

jreback · 2014-12-05T00:37:19Z

the problem with that is people expect to be able to set arbitrary attributes on objects (even though these are not propagated, e.g. except for certain attributes they are lost when objects are combined)

I think that would be a nice change - but that would have to wait for 0.16 for consideration

going to release 0.15.2 shortly (I think this fix is ok as it's really a bug)

jreback · 2014-12-05T13:49:18Z

@jakevdp awesome. can you add the example in the release notes as well.

We can merge this and discuss whether to change at a later date (e.g. setattr on a DataFrame to create a column rather than an attribute)

jakevdp · 2014-12-05T15:12:06Z

I updated the whatsnew in the most recent commit. Let me know if you want that to be modified in any way.

jreback · 2014-12-06T16:44:24Z

@jorisvandenbossche @shoyer ?

shoyer · 2014-12-06T20:08:09Z

This looks fine to me as a fix. I think we can safely assume that users haven't been relying on the broken behavior.

In the long term, I do think we should consider just making data.y = 5 equivalent to data['y'] = 5, but that will definitely be a breaking change, especially for people who do things like subclassing DataFrame. @hugadams any thoughts?

hughesadam87 · 2014-12-06T21:35:32Z

Thanks for cc'ing me.

One question I have is will this attribute persist? Jake, what is the return of:

data.y = 5 
d2 = data.y**2
d2.y

In regard to subclassing Frames, I actually don't do that per se in my package; however, I know GeoPandas does. We use a composite class, so we're not storing attributes on the DataFrame. Of course, if it ever does come to the point that the DataFrame can store arbitrary attributes, it would require us to rethink the design, I'm sure. In fact, I do plan to rework this in the future to subclass like GeoPandas. But in any case, to summarize my thoughts: if this PR is generally good for Pandas, then +1, and we will work around any changes you guys make. It might be helpful to get someone from GeoPandas to check this out too.

shoyer · 2014-12-06T21:50:11Z

@hugadams Nope, the attribute is not persisted through arithmetic -- this PR does not change that behavior.

hughesadam87 · 2014-12-06T22:03:17Z

@shoyer Cool, thanks.

jakevdp · 2014-12-07T07:40:39Z

Hi @hugadams – I don't think this change will affect the example you bring up.

Note that, unless there's something I'm overlooking, the only thing this current fix changes is the case where:

1. a custom attribute is added to a dataframe (e.g. df.val = 4)
2. a column is then added with the same name as the attribute (e.g. df['val'] = [1, 2, 3])
3. the attribute is modified (e.g. df.val = 5)

I think that unless someone is doing that exact sequence of events, this change will not affect anything.
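That three-step sequence can be modelled without pandas. The sketch below is a simplified stand-in, not the pandas source: `ToyFrame`, `OldFrame`, `NewFrame`, and `run` are hypothetical names, columns live in a plain dict, and a scalar assigned to a column is stored as-is rather than broadcast.

```python
class ToyFrame:
    """Simplified stand-in for a DataFrame: columns live in a dict."""

    def __init__(self):
        # Bypass the subclass __setattr__, which needs _columns to exist.
        object.__setattr__(self, "_columns", {})

    def __getitem__(self, key):
        return self._columns[key]

    def __setitem__(self, key, value):
        self._columns[key] = value

    def __getattr__(self, name):
        # Runs only after normal attribute lookup has failed.
        columns = object.__getattribute__(self, "_columns")
        if name in columns:
            return columns[name]
        raise AttributeError(name)


class OldFrame(ToyFrame):
    def __setattr__(self, name, value):
        # Pre-fix model: an existing column always wins on assignment.
        if name in self._columns:
            self[name] = value
        else:
            object.__setattr__(self, name, value)


class NewFrame(ToyFrame):
    def __setattr__(self, name, value):
        # Post-fix model: an existing real attribute wins, mirroring reads.
        try:
            object.__getattribute__(self, name)
            # The return stops us falling through to the column branch.
            return object.__setattr__(self, name, value)
        except AttributeError:
            pass
        if name in self._columns:
            self[name] = value
        else:
            object.__setattr__(self, name, value)


def run(frame):
    frame.y = 2              # step 1: custom attribute
    frame["y"] = [2, 4, 6]   # step 2: column with the same name
    frame.y = 5              # step 3: modify the attribute
    return frame.y, frame["y"]


print(run(OldFrame()))  # attribute stays stale, column is overwritten
print(run(NewFrame()))  # attribute is updated, column is untouched
```

In this model only step 3 behaves differently between the two classes, which matches the claim that other usage patterns should be unaffected.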

shoyer · 2014-12-07T08:43:56Z

pandas/core/generic.py

+
+        try:
+            object.__getattribute__(self, name)
+            return object.__setattr__(self, name, value)


you don't need to return here (not that it really matters, but you did remove the unnecessary return below!)

It was either this, or this followed by return None. I decided to go for brevity 😄

Oh, I see why it's needed now :)

jreback · 2014-12-07T15:15:07Z

closed via d5963dc

@jakevdp thanks!

update NDFrame __setattr__ to match behavior of __getattr__

2bf6d18

jakevdp changed the title ~~update NDFrame __setattr__ to match behavior of __getattr__ (Addresses #8994)~~ update NDFrame __setattr__ to match behavior of __getattr__ Dec 4, 2014

jreback added the API Design label Dec 4, 2014

jreback added this to the 0.15.2 milestone Dec 4, 2014

TST: test setattr when attr and column have same name

e732906

add whatsnew entry for PR pandas-dev#9004

7387a5d

shoyer reviewed Dec 7, 2014

jreback closed this Dec 7, 2014

jreback mentioned this pull request Dec 7, 2014

Asymmetry in corner case for DataFrame __getattr__ and __setattr__ #8994

Closed
