-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Added public accessor registrar #18827
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Added public accessor registrar #18827
Conversation
Adds new methods for registing custom accessors to pandas objects. This will be helpful for implementing pandas-dev#18767 outside of pandas. Closes pandas-dev#14781
Hello @TomAugspurger! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on January 15, 2018 at 12:34 Hours UTC |
@TomAugspurger : A couple of points:
|
doc/source/internals.rst
Outdated
>>> ds.geo.center | ||
(5.0, 10.0) | ||
>>> ds.geo.plot() | ||
# plots data on a map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make this less abstract, you probably will need plots and some more concrete code to actually plot these points.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm OK with this being abstract. The point is the accessor, not the functionality provided by it, if that makes sense.
Codecov Report
@@ Coverage Diff @@
## master #18827 +/- ##
==========================================
- Coverage 91.55% 91.53% -0.02%
==========================================
Files 147 148 +1
Lines 48812 48837 +25
==========================================
+ Hits 44690 44704 +14
- Misses 4122 4133 +11
Continue to review full report at Codecov.
|
|
||
|
||
# Ported with modifications from xarray | ||
# https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, please do let me know exactly what you needed to change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's the relevant diff between xarray and pandas:
@TomAugspurger : For better or worse, that's a much less significant change than I expected. Maybe should add a comment about that.
this should not be in the main public namespace. |
Here's the relevant diff between xarray and pandas: diff --git a/xarray.py b/pandas.py
index 48b2cba..fc3dd6b 100644
--- a/xarray.py
+++ b/pandas.py
@@ -11,9 +11,10 @@ class _CachedAccessor(object):
try:
accessor_obj = self._accessor(obj)
except AttributeError:
- # __getattr__ on data object will swallow any AttributeErrors raised
- # when initializing the accessor, so we need to raise as something
- # else (GH933):
+ # TODO
+ # __getattr__ on data object will swallow any AttributeErrors
+ # raised when initializing the accessor, so we need to raise
+ # as something else (GH933):
msg = 'error initializing %r accessor.' % self._name
if PY2:
msg += ' Full traceback:\n' + traceback.format_exc()
@@ -30,9 +31,9 @@ def _register_accessor(name, cls):
def decorator(accessor):
if hasattr(cls, name):
warnings.warn(
- 'registration of accessor %r under name %r for type %r is '
- 'overriding a preexisting attribute with the same name.'
- % (accessor, name, cls),
+ 'registration of accessor {!r} under name {!r} for type '
+ '{!r} is overriding a preexisting attribute with the same '
+ 'name.'.format(accessor, name, cls),
AccessorRegistrationWarning,
stacklevel=2)
setattr(cls, name, _CachedAccessor(name, accessor))
(my TODO is for whether or not that matters for pandas). Otherwise, we're trying to use new-style string formatting. |
pandas/core/accessor.py
Outdated
# https://github.com/pydata/xarray/blob/master/xarray/core/extensions.py | ||
|
||
|
||
class _CachedAccessor(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason you are not using this for the internal accessors?
pandas/extensions/__init__.py
Outdated
@@ -0,0 +1,4 @@ | |||
"""Public API for extending panadas objects.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is better, but still not a fan of extending the top-level namespaces (e.g. with extensions). api.extensions would be better
pandas/tests/api/test_api.py
Outdated
@@ -41,6 +41,13 @@ class TestPDApi(Base): | |||
# misc | |||
misc = ['IndexSlice', 'NaT'] | |||
|
|||
# extension points |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see above
from pandas.errors import AccessorRegistrationWarning | ||
|
||
|
||
@contextlib.contextmanager |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just use the pytest monkeypatch fixture for this
Updated.
|
I disagree. The best use of this is to actually implement the internals. If it works, great, if not then you have found a bug / feature lacking. |
Refactored to have our accessors use it.
In [9]: s = pd.Series(pd.TimedeltaIndex(['5S', '10S']))
In [10]: s.dt.values.isna()
Out[10]: array([False, False], dtype=bool)
In [11]: s[0] = float('nan')
# isna has been cached on the accessor's .values, but not invalidated
# when s was mutated
In [12]: s.dt.values.isna()
Out[12]: array([ False, False], dtype=bool) To work around this, I added a
|
This seems like an inadvertent detail of the implementation of So the other fix would be to change how the I don't think we need a |
Making it easy to extend functionality is great, but the new commit that changes where the current str/cat/dt attributes are defined I'm very much -1 on. Inheritance notwithstanding, the class definition should be WYSIWYG. Newbies shouldn't have to track down where e.g. |
That's my preference as well, we just won't be able to / have to use the |
1. Removed optional caching 2. Refactored `Properties` to create the indexes it uses on demand 3. Moved accessor definitions to classes for clarity
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -119,6 +119,56 @@ Current Behavior | |||
|
|||
s.rank(na_option='top') | |||
|
|||
|
|||
Extending Pandas Objects with New Accessors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on removing this section in the whatsnew? This may be too niche for most users to care about.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. some api consistency questions.
pandas/core/accessor.py
Outdated
# 1. We don't need to catch and re-raise AttributeErrors as RuntimeErrors | ||
|
||
|
||
class _CachedAccssor(object): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
name this: CachedAccessor
we use this internally so private is not appropriate
# NDFrame | ||
object.__setattr__(obj, self._name, accessor_obj) | ||
return accessor_obj | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it might make sense to have a few methods here which are AbstractMethodError to guide the api, eg. validate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to leave that open to the person implementing the accessor. Perhaps for some accessors, there's no need for validation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is too general we need to be opionated, if there is no validation then that should be explicit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure what you mean be "too general". The purpose of this is to give an officially blessed way to do "DataFrame.foo = MyAccessor"; what you do past that point is up to the library. It's not like CachedAccessor
ever has a chance to see the data so that it can validate things.
The only restriction on the accessor is that its init method be __init__(self, pandas_obj)
. I'll document that, but I'm not sure what else there is to be done.
self.index = index | ||
self.name = name | ||
def __init__(self, data): | ||
self._validate(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see my commet above, these can be non-private I think as these are not user exposed
data : Series | ||
copy : boolean, default False | ||
copy the input data | ||
def _get_values(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally would like these to be non-private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's best for methods and attributes on the accessor to be private. Imagine if we wanted to delegate a method like .get_values
, so foo.bar.get_values
. This would break the accessor, since its get_values
would be broken.
|
||
@classmethod | ||
def _make_accessor(cls, data): | ||
def __new__(cls, data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we be consistent for validation, IOW either do it in __new__
or in __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had to put this in __new__
since we don't instantiate a CombinedDatetimelikeProperties
, just one of its parents. I think this is unavoidable, but it would be nice to make it more consistent with the others. I'll see what I can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One option is to write a Properties._validate
do most of the work here. But it feels weird to have a base class decide which child class should be instantiated. I think having it in __new__
(with a comment for why it's there) is clearest.
@jreback @jbrockmendel thoughts on the refactored version? Anything else you'd like to see changed? |
Completely addresses my WYSIWYG complaint, thanks. |
@jreback could you clarify your thoughts here? I've documented that the only requirement on the user-defined accessor is that it's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, looks fine. one question on the warning.
@@ -65,3 +65,7 @@ class MergeError(ValueError): | |||
Error raised when problems arise during merging due to problems | |||
with input data. Subclass of `ValueError`. | |||
""" | |||
|
|||
|
|||
class AccessorRegistrationWarning(Warning): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a special warning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Switched to a UserWarning, which matches what we do with df.x = [1, 2, 3]
.
Adds new methods for registing custom accessors to pandas objects.
This will be helpful for implementing #18767
outside of pandas.
If we accept this I'm not sure it belongs in the top-level namespace where I have it currently. What would a good home be for this be?
pd.api.extensions
?Closes #14781