Skip to content

Prevent adding new attributes to the accessors .str, .dt and .cat #11575

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jankatins
Copy link
Contributor

assigning to Series.str, Series.dt, or Series.cat was not failing
although you couldn't get the value back:

In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
In[12]: a.cat.labels
Traceback (most recent call last):
  File "C:\portabel\miniconda\envs\pandas_dev\lib\site-packages\IPython\core\interactiveshell.py", line 2883, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-12-a68ee763e4e8>", line 1, in <module>
    a.cat.labels
AttributeError: 'CategoricalAccessor' object has no attribute 'labels'

Now we fail early:

In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
Traceback (most recent call last):
  File "C:\data\external\pandas\pandas\tests\test_categorical.py", line 3633, in test_cat_accessor_no_new_attributes
    c.cat.labels = [1,2]
  File "C:\data\external\pandas\pandas\core\base.py", line 121, in __setattr__
    raise AttributeError( "You cannot add any new attribute '{key}'".format(key=key))
AttributeError: You cannot add any new attribute 'labels'

Closes: #10673

@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from 9a72002 to 344cf2b Compare November 11, 2015 22:14
@jankatins jankatins changed the title Prevent adding new attributes to the accessors .str and .cat Prevent adding new attributes to the accessors .str, .dt and .cat Nov 11, 2015
@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from 344cf2b to b364a15 Compare November 11, 2015 22:17
@max-sixty
Copy link
Contributor

I've had some pickling issues with doing this in the past, as __setattr__ gets called during unpickling, without __init__ being called.
Another way - slightly less elegant but lower blast radius - is to set a private variable _str and then have a property def str(self): ; return self._str. That'll also prevent setting of str.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2015

I agree here with @MaximilianR
this is already done in NDFrame using _accessors as a white-list with __getattr__, so should be straightforward, so I think would be ok to emulate in PandasDelegate (not that these allow sub-classes to override, e.g. Series does for .dt,.str.,.cat themselves.

@jreback jreback added API Design Error Reporting Incorrect or improved errors from pandas labels Nov 12, 2015
@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from b364a15 to 985a957 Compare November 12, 2015 11:37
@jankatins
Copy link
Contributor Author

I'm not sure I can follow: as far as I can see, the accessors .str, .dt and .cat are not pickled (and they should not!). This is not trying to prevent s.str = something but s.str.someattr = something. The main benefit is that a user, who misspells an attribute, is not surprised. E.g. when using s.cat.labels instead of s.cat.categories.

Currently assigning (s.cat.labels = "aaa") works, but you can't get the value from this attribute back (print(s.cat.labels) raises an AttributeError). I've actually no idea where this behaviour comes from, the last should work, as the first (without this PR) uses the normal item assignment and the latter normal get (that's at least what the pycharm debugger lets me see).

@jankatins
Copy link
Contributor Author

This needs a rebase if pulled after #11582...

@@ -3106,6 +3106,11 @@ def _simple_new(cls, values, name=None, categories=None, ordered=None, **kwargs)
result._reset_identity()
return result

# PandasDelegate prevents adding any new attributes as that's what reasonable if used as
# accessor. But we want it again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this inherits from PandasObject so not necessary

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least during my tests it was? It inherits from PandasDelegate:

class CategoricalIndex(Index, PandasDelegate):

@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from 985a957 to 8cf43f1 Compare November 13, 2015 11:12
@jankatins
Copy link
Contributor Author

Ok, took me a while to find a place for the mixin: both core.common and core.base resulted in circular depencies due to the way the StringMethods are defined (str = ...(StringMethods,...) -> can't get the import to a function :-():

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 143.165\helpers\pycharm\utrunner.py", line 121, in <module>
    modules = [loadSource(a[0])]
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 143.165\helpers\pycharm\utrunner.py", line 41, in loadSource
    module = imp.load_source(moduleName, fileName)
  File "C:\data\external\pandas\pandas\tests\test_base.py", line 6, in <module>
    import pandas.compat as compat
  File "C:\data\external\pandas\pandas\__init__.py", line 42, in <module>
    import pandas.core.config_init
  File "C:\data\external\pandas\pandas\core\config_init.py", line 17, in <module>
    from pandas.core.format import detect_console_encoding
  File "C:\data\external\pandas\pandas\core\format.py", line 8, in <module>
    from pandas.core.base import PandasObject
  File "C:\data\external\pandas\pandas\core\base.py", line 10, in <module>
    from pandas.core.strings import StringMethods
  File "C:\data\external\pandas\pandas\core\strings.py", line 4, in <module>
    from pandas.core.base import NoNewAttributesMixin
ImportError: cannot import name NoNewAttributesMixin

@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch 2 times, most recently from fea2a1f to 2d25b16 Compare November 13, 2015 12:11
@@ -25,6 +25,7 @@ Enhancements
objects for the ``filepath_or_buffer`` argument. (:issue:`11033`)
- ``DataFrame`` now uses the fields of a ``namedtuple`` as columns, if columns are not supplied (:issue:`11181`)
- Improve the error message displayed in :func:`pandas.io.gbq.to_gbq` when the DataFrame does not match the schema of the destination table (:issue:`11359`)
- Prevent adding new attributes to the accessors .str, .dt and .cat (:issue:`10673`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API change section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was actually thinking it should go to bugfix: currently you can't get the value back, so the concept is broken and we only make the error more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok (put the .str etal with backticks)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done (to bug and added backticks)

@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from 2d25b16 to bfa6c9b Compare November 13, 2015 14:42
@jankatins
Copy link
Contributor Author

BTW: The Series and Indexes are kind of inconsistent:

  • Series only has accessors, so no datetime/string/categorical related functions directly exposed under Series but .dt, .cat, which are implemented as Delegate and .str, which is no Delegate
  • CategoricalIndex is a Delegate directly (-> CatIndex.codes works directly without CatIndex.cat.codes)
  • DatetimeIndex is not a Delegate, but exposes it's methods directly, too, via mixins
  • There is no StringIndex but Index.str.... exists (via IndexOpsMixin) for all Indexes?

@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@JanSchulz

those are not inconsistent, merely implementations. DatetimeIndex, actually defines things (IOW it is the 'target' of the delegation). CategoricalIndex uses the Delegate (so its like Series in this respect). Index/Series are also users of .str.

@jankatins
Copy link
Contributor Author

I disagree: Consistent would be either:

  • Series and Index would use accessors, which are delegates to the real data below (like Series.cat or to a wrapper like in Series.str / Series.dt (ok, preoblematic, because we currently don't have real methods on a string/datetime objects, only numpy/pandas functions. Or the DateTimeIndex, which is used like one...)
  • Series and Index would both have subclasses like DateTimeSeries (does not exist) or DateTimeIndex (does exist), which expose the API directly.

This commit is mostly for the benefit of users who misspell things
on the accessors.

Assigning to `Series.str`, `Series.dt`, or `Series.cat` was not failing
although you couldn't get the value back:

```
In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
In[12]: a.cat.labels
AttributeError: 'CategoricalAccessor' object has no attribute 'labels'
```

Now we fail early:

```
In[10]: a = pandas.Series(pandas.Categorical(list("abc")))
In[11]: a.cat.labels = [1,2]
AttributeError: You cannot add any new attribute 'labels'
```

Refactor/add a StringAccessorMixin to break a import cycle.
@jankatins jankatins force-pushed the prevent_new_accessor_attributes branch from bfa6c9b to 065415d Compare November 13, 2015 17:22
@jankatins
Copy link
Contributor Author

So, move the string part to a new mixin in string.py.. If this is green, I think this should be ready to merge...

@jreback jreback added this to the 0.17.1 milestone Nov 13, 2015
@jreback
Copy link
Contributor

jreback commented Nov 13, 2015

@JanSchulz looks good! ping on green.

@jankatins
Copy link
Contributor Author

Seems travis is dead?!

Ok, I will rebase #11582 on top of this PR, to shortcut that process a bit...

jreback added a commit that referenced this pull request Nov 14, 2015
Prevent adding new attributes to the accessors .str, .dt and .cat
@jreback jreback merged commit 9cbc179 into pandas-dev:master Nov 14, 2015
@jreback
Copy link
Contributor

jreback commented Nov 14, 2015

thanks @JanSchulz

go ahead and rebase #11582

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants