REF: Dispatch string methods to ExtensionArray #36357

Merged: 35 commits, Sep 30, 2020
Changes from 2 commits
Commits
9e90d4e
Implement BaseDtypeTests for ArrowStringDtype
xhochy Jul 10, 2020
92f1d26
Refactor to use parametrized StringDtype
TomAugspurger Sep 3, 2020
00096f0
wip
TomAugspurger Sep 8, 2020
5a89dbf
Merge remote-tracking branch 'upstream/master' into arrow-string-arra…
TomAugspurger Sep 11, 2020
89f8e6a
annoyed
TomAugspurger Sep 11, 2020
3f82225
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 14, 2020
fabc01e
wip
TomAugspurger Sep 14, 2020
a4d4ad5
remove old
TomAugspurger Sep 14, 2020
e76a3c1
fixup
TomAugspurger Sep 14, 2020
49dff8a
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 17, 2020
75831b3
fixup
TomAugspurger Sep 17, 2020
1cf54cc
doctest
TomAugspurger Sep 17, 2020
fc81ebe
docstrings
TomAugspurger Sep 17, 2020
6be1af6
typing
TomAugspurger Sep 17, 2020
95b3310
typing
TomAugspurger Sep 17, 2020
20a8705
wip
TomAugspurger Sep 21, 2020
136831a
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 21, 2020
38c1611
wip
TomAugspurger Sep 22, 2020
ea27e57
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 22, 2020
8d3aecd
Move to arrays
TomAugspurger Sep 22, 2020
d11c2ba
Fixup types
TomAugspurger Sep 22, 2020
349e281
test coverage
TomAugspurger Sep 22, 2020
c6b99cb
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 22, 2020
b7ab130
fixup
TomAugspurger Sep 22, 2020
3b837d1
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 22, 2020
28cf7e6
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 23, 2020
6dcd44e
update docstring
TomAugspurger Sep 23, 2020
efb3e3d
document current implementation
TomAugspurger Sep 24, 2020
0da7031
typo
TomAugspurger Sep 24, 2020
35a97ab
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 25, 2020
d681f99
fixup
TomAugspurger Sep 25, 2020
cc5ceed
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 29, 2020
457c112
fixup
TomAugspurger Sep 29, 2020
58e1bb9
Merge remote-tracking branch 'upstream/master' into dispatch-string-m…
TomAugspurger Sep 29, 2020
cb2fb24
simplify inheritance
TomAugspurger Sep 29, 2020
3 changes: 3 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -2316,6 +2316,9 @@ def replace(self, to_replace, value, inplace: bool = False):
# ------------------------------------------------------------------------
# String methods interface
def _str_map(self, f, na_value=np.nan, dtype=np.dtype(object)):
# Optimization to apply the callable `f` to the categories once
# and rebuild the result by `take`ing from the result with the codes.
# Returns the same type as the object-dtype impelmentation though.
Member

Suggested change
- # Returns the same type as the object-dtype impelmentation though.
+ # Returns the same type as the object-dtype implementation though.

from pandas.core.arrays import PandasArray

categories = self.categories
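
The optimization described in the `_str_map` comment can be sketched without pandas. This is a minimal illustration, not the PR's code: `map_via_categories` and its arguments are hypothetical names, plain lists stand in for the categories and codes, and a code of `-1` is assumed to mark a missing value, as it does for `Categorical`.

```python
def map_via_categories(categories, codes, f, na_value=None):
    """Apply `f` once per category, then rebuild the full result from the codes."""
    # One call to `f` per distinct category, not one per element.
    mapped = [f(category) for category in categories]
    # "take" from the mapped categories; a code of -1 means missing.
    return [mapped[code] if code != -1 else na_value for code in codes]


calls = []


def upper(value):
    # Record each call so the saving is visible.
    calls.append(value)
    return value.upper()


categories = ["apple", "banana"]   # distinct values
codes = [0, 1, 0, -1, 0]           # positions into `categories`
result = map_via_categories(categories, codes, upper)
print(result)       # five results
print(len(calls))   # but `upper` ran only once per category
```

The saving grows with the ratio of elements to distinct categories, which is why the override lives on `Categorical` rather than in the generic object-dtype path.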
28 changes: 28 additions & 0 deletions pandas/core/strings/__init__.py
@@ -1,3 +1,31 @@
"""
Implementation of pandas.Series.str and its interface.

* strings.accessor.StringMethods : Accessor for Series.str
* strings.base.BaseStringArrayMethods: Mixin ABC for EAs to implement str methods

Most methods on the StringMethods accessor follow the pattern:

1. extract the array from the series (or index)
2. Call that array's impelmentation of the string method
Member

Suggested change
- 2. Call that array's impelmentation of the string method
+ 2. Call that array's implementation of the string method

3. Wrap the result (in a Series, index, or DataFrame)

Pandas extension arrays implementing string methods should inherit from
pandas.core.strings.base.BaseStringArrayMethods. This is an ABC defining
the various string methods. To avoid namespace clashes and pollution,
these are prefixed with `_str_`. So ``Series.str.upper()`` calls
``Series.array._str_upper()``. The interface isn't currently public
to other string extension arrays.
"""
Contributor

much more clear, thanks
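
The three-step pattern from the docstring (extract the array, call its `_str_`-prefixed method, wrap the result) might look roughly like this. All three classes are hypothetical toys written for illustration; they are not the pandas classes and skip everything the real accessor does around validation and result typing.

```python
class ToyStringArray:
    """Hypothetical array type standing in for an extension array."""

    def __init__(self, values):
        self.values = list(values)

    def _str_upper(self):
        # The array owns the implementation; None marks a missing value here.
        return ToyStringArray(None if v is None else v.upper() for v in self.values)


class ToySeries:
    """Hypothetical container standing in for a Series."""

    def __init__(self, values, name=None):
        self.array = values if isinstance(values, ToyStringArray) else ToyStringArray(values)
        self.name = name

    @property
    def str(self):
        return ToyStringAccessor(self)


class ToyStringAccessor:
    """Hypothetical accessor standing in for StringMethods."""

    def __init__(self, data):
        self._data = data

    def upper(self):
        array = self._data.array                         # 1. extract the array
        result = array._str_upper()                      # 2. call the array's implementation
        return ToySeries(result, name=self._data.name)   # 3. wrap the result


s = ToySeries(["a", None, "bc"], name="letters")
out = s.str.upper()
print(out.array.values, out.name)
```

Because the accessor only knows the `_str_upper` name, a different array type can change how the work is done without the accessor changing at all.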

# Pandas current implementation is in ObjectStringArrayMixin. This is designed
# to work on object-dtype ndarrays.
#
# BaseStringArrayMethods
#  - ObjectStringArrayMixin
#     - StringArray
#     - PandasArray
#     - Categorical

from .accessor import StringMethods
from .base import BaseStringArrayMethods
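
The hierarchy in the comment above (an ABC, an object-dtype mixin, then concrete arrays) can be sketched as follows. The class names are hypothetical stand-ins, only two methods are declared, and routing every method through `_str_map` is an assumption of this sketch, not a statement about the full pandas implementation.

```python
import abc


class BaseStringMethodsSketch(abc.ABC):
    """Stand-in for the ABC: declares the `_str_`-prefixed interface."""

    @abc.abstractmethod
    def _str_upper(self):
        ...

    @abc.abstractmethod
    def _str_len(self):
        ...


class ObjectMixinSketch(BaseStringMethodsSketch):
    """Stand-in for the object-dtype mixin; in this sketch each method uses _str_map."""

    def _str_map(self, f, na_value=None):
        # Element-wise fallback; missing values (None here) are passed through.
        return [na_value if x is None else f(x) for x in self]

    def _str_upper(self):
        return self._str_map(lambda x: x.upper())

    def _str_len(self):
        return self._str_map(len)


class ListArraySketch(ObjectMixinSketch):
    """Hypothetical concrete array; it only needs to be iterable."""

    def __init__(self, values):
        self._values = list(values)

    def __iter__(self):
        return iter(self._values)


arr = ListArraySketch(["ab", None, "cde"])
upper = arr._str_upper()
lengths = arr._str_len()
print(upper, lengths)
```

A subclass that overrides only `_str_map` would change the behavior of every inherited method at once in this sketch.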

2 changes: 2 additions & 0 deletions pandas/core/strings/accessor.py
@@ -140,6 +140,8 @@ class StringMethods(NoNewAttributesMixin):
dtype: object
"""

# Note: see the docstring in pandas.core.strings.__init__
# for an explanation of the implementation.
# TODO: Dispatch all the methods
# Currently the following are not dispatched to the array
# * cat
4 changes: 2 additions & 2 deletions pandas/core/strings/base.py
@@ -10,8 +10,8 @@ class BaseStringArrayMethods(abc.ABC):
"""
Base class for extension arrays implementing string methods.

- This is our ExtensionArrays can override the implementation of
- Series.str.<method>. We don't currenlty expect this to work with
+ This is where our ExtensionArrays can override the implementation of
+ Series.str.<method>. We don't expect this to work with
3rd-party extension arrays.

* User calls Series.str.<method>