Commit 030f613

Merge pull request #4092 from jtratner/refactor_string_special_methods
CLN: Refactor string special methods
2 parents a16f243 + a558314 commit 030f613

24 files changed (+201, -310 lines changed)

doc/source/release.rst

+11-1
Original file line numberDiff line numberDiff line change
@@ -175,8 +175,18 @@ pandas 0.12
175175
``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
176176
until success is also valid
177177
- more consistency in the to_datetime return types (give string/array of string inputs) (:issue:`3888`)
178+
- The internal ``pandas`` class hierarchy has changed (slightly). The
179+
previous ``PandasObject`` now is called ``PandasContainer`` and a new
180+
``PandasObject`` has become the baseclass for ``PandasContainer`` as well
181+
as ``Index``, ``Categorical``, ``GroupBy``, ``SparseList``, and
182+
``SparseArray`` (+ their base classes). Currently, ``PandasObject``
183+
provides string methods (from ``StringMixin``). (:issue:`4090`, :issue:`4092`)
184+
- New ``StringMixin`` that, given a ``__unicode__`` method, gets python 2 and
185+
python 3 compatible string methods (``__str__``, ``__bytes__``, and
186+
``__repr__``). Plus string safety throughout. Now employed in many places
187+
throughout the pandas library. (:issue:`4090`, :issue:`4092`)
178188

179-
**Experimental Feautres**
189+
**Experimental Features**
180190

181191
- Added experimental ``CustomBusinessDay`` class to support ``DateOffsets``
182192
with custom holiday calendars and custom weekmasks. (:issue:`2301`)

doc/source/v0.12.0.txt

+33-21
@@ -8,13 +8,13 @@ enhancements along with a large number of bug fixes.
88

99
Highlites include a consistent I/O API naming scheme, routines to read html,
1010
write multi-indexes to csv files, read & write STATA data files, read & write JSON format
11-
files, Python 3 support for ``HDFStore``, filtering of groupby expressions via ``filter``, and a
11+
files, Python 3 support for ``HDFStore``, filtering of groupby expressions via ``filter``, and a
1212
revamped ``replace`` routine that accepts regular expressions.
1313

1414
API changes
1515
~~~~~~~~~~~
1616

17-
- The I/O API is now much more consistent with a set of top level ``reader`` functions
17+
- The I/O API is now much more consistent with a set of top level ``reader`` functions
1818
accessed like ``pd.read_csv()`` that generally return a ``pandas`` object.
1919

2020
* ``read_csv``
@@ -38,7 +38,7 @@ API changes
3838
* ``to_clipboard``
3939

4040

41-
- Fix modulo and integer division on Series,DataFrames to act similary to ``float`` dtypes to return
41+
- Fix modulo and integer division on Series,DataFrames to act similary to ``float`` dtypes to return
4242
``np.nan`` or ``np.inf`` as appropriate (:issue:`3590`). This correct a numpy bug that treats ``integer``
4343
and ``float`` dtypes differently.
4444

@@ -50,15 +50,15 @@ API changes
5050
p / p
5151
p / 0
5252

53-
- Add ``squeeze`` keyword to ``groupby`` to allow reduction from
53+
- Add ``squeeze`` keyword to ``groupby`` to allow reduction from
5454
DataFrame -> Series if groups are unique. This is a Regression from 0.10.1.
55-
We are reverting back to the prior behavior. This means groupby will return the
56-
same shaped objects whether the groups are unique or not. Revert this issue (:issue:`2893`)
55+
We are reverting back to the prior behavior. This means groupby will return the
56+
same shaped objects whether the groups are unique or not. Revert this issue (:issue:`2893`)
5757
with (:issue:`3596`).
5858

5959
.. ipython:: python
6060

61-
df2 = DataFrame([{"val1": 1, "val2" : 20}, {"val1":1, "val2": 19},
61+
df2 = DataFrame([{"val1": 1, "val2" : 20}, {"val1":1, "val2": 19},
6262
{"val1":1, "val2": 27}, {"val1":1, "val2": 12}])
6363
def func(dataf):
6464
return dataf["val2"] - dataf["val2"].mean()
@@ -96,9 +96,9 @@ API changes
9696
and thus you should cast to an appropriate numeric dtype if you need to
9797
plot something.
9898

99-
- Add ``colormap`` keyword to DataFrame plotting methods. Accepts either a
100-
matplotlib colormap object (ie, matplotlib.cm.jet) or a string name of such
101-
an object (ie, 'jet'). The colormap is sampled to select the color for each
99+
- Add ``colormap`` keyword to DataFrame plotting methods. Accepts either a
100+
matplotlib colormap object (ie, matplotlib.cm.jet) or a string name of such
101+
an object (ie, 'jet'). The colormap is sampled to select the color for each
102102
column. Please see :ref:`visualization.colormaps` for more information.
103103
(:issue:`3860`)
104104

@@ -159,6 +159,18 @@ API changes
159159
``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
160160
until success is also valid
161161

162+
- The internal ``pandas`` class hierarchy has changed (slightly). The
163+
previous ``PandasObject`` now is called ``PandasContainer`` and a new
164+
``PandasObject`` has become the baseclass for ``PandasContainer`` as well
165+
as ``Index``, ``Categorical``, ``GroupBy``, ``SparseList``, and
166+
``SparseArray`` (+ their base classes). Currently, ``PandasObject``
167+
provides string methods (from ``StringMixin``). (:issue:`4090`, :issue:`4092`)
168+
169+
- New ``StringMixin`` that, given a ``__unicode__`` method, gets python 2 and
170+
python 3 compatible string methods (``__str__``, ``__bytes__``, and
171+
``__repr__``). Plus string safety throughout. Now employed in many places
172+
throughout the pandas library. (:issue:`4090`, :issue:`4092`)
173+
162174
I/O Enhancements
163175
~~~~~~~~~~~~~~~~
164176

@@ -184,7 +196,7 @@ I/O Enhancements
184196

185197
.. warning::
186198

187-
You may have to install an older version of BeautifulSoup4,
199+
You may have to install an older version of BeautifulSoup4,
188200
:ref:`See the installation docs<install.optional_dependencies>`
189201

190202
- Added module for reading and writing Stata files: ``pandas.io.stata`` (:issue:`1512`)
@@ -203,15 +215,15 @@ I/O Enhancements
203215
- The option, ``tupleize_cols`` can now be specified in both ``to_csv`` and
204216
``read_csv``, to provide compatiblity for the pre 0.12 behavior of
205217
writing and reading multi-index columns via a list of tuples. The default in
206-
0.12 is to write lists of tuples and *not* interpret list of tuples as a
207-
multi-index column.
218+
0.12 is to write lists of tuples and *not* interpret list of tuples as a
219+
multi-index column.
208220

209221
Note: The default behavior in 0.12 remains unchanged, but starting with 0.13,
210-
the default *to* write and read multi-index columns will be in the new
222+
the default *to* write and read multi-index columns will be in the new
211223
format. (:issue:`3571`, :issue:`1651`, :issue:`3141`)
212224

213225
- If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
214-
with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will
226+
with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will
215227
be *lost*.
216228

217229
.. ipython:: python
@@ -296,8 +308,8 @@ Other Enhancements
296308
pd.get_option('a.b')
297309
pd.get_option('b.c')
298310

299-
- The ``filter`` method for group objects returns a subset of the original
300-
object. Suppose we want to take only elements that belong to groups with a
311+
- The ``filter`` method for group objects returns a subset of the original
312+
object. Suppose we want to take only elements that belong to groups with a
301313
group sum greater than 2.
302314

303315
.. ipython:: python
@@ -317,7 +329,7 @@ Other Enhancements
317329
dff.groupby('B').filter(lambda x: len(x) > 2)
318330

319331
Alternatively, instead of dropping the offending groups, we can return a
320-
like-indexed objects where the groups that do not pass the filter are
332+
like-indexed objects where the groups that do not pass the filter are
321333
filled with NaNs.
322334

323335
.. ipython:: python
@@ -333,9 +345,9 @@ Experimental Features
333345

334346
- Added experimental ``CustomBusinessDay`` class to support ``DateOffsets``
335347
with custom holiday calendars and custom weekmasks. (:issue:`2301`)
336-
348+
337349
.. note::
338-
350+
339351
This uses the ``numpy.busdaycalendar`` API introduced in Numpy 1.7 and
340352
therefore requires Numpy 1.7.0 or newer.
341353

@@ -416,7 +428,7 @@ Bug Fixes
416428
- Extend ``reindex`` to correctly deal with non-unique indices (:issue:`3679`)
417429
- ``DataFrame.itertuples()`` now works with frames with duplicate column
418430
names (:issue:`3873`)
419-
- Bug in non-unique indexing via ``iloc`` (:issue:`4017`); added ``takeable`` argument to
431+
- Bug in non-unique indexing via ``iloc`` (:issue:`4017`); added ``takeable`` argument to
420432
``reindex`` for location-based taking
421433

422434
- ``DataFrame.from_records`` did not accept empty recarrays (:issue:`3682`)

pandas/core/base.py

+58
@@ -0,0 +1,58 @@
1+
"""
2+
Base class(es) for all pandas objects.
3+
"""
4+
from pandas.util import py3compat
5+
6+
class StringMixin(object):
7+
"""implements string methods so long as object defines a `__unicode__` method.
8+
Handles Python2/3 compatibility transparently."""
9+
# side note - this could be made into a metaclass if more than one object nees
10+
def __str__(self):
11+
"""
12+
Return a string representation for a particular object.
13+
14+
Invoked by str(obj) in both py2/py3.
15+
Yields Bytestring in Py2, Unicode String in py3.
16+
"""
17+
18+
if py3compat.PY3:
19+
return self.__unicode__()
20+
return self.__bytes__()
21+
22+
def __bytes__(self):
23+
"""
24+
Return a string representation for a particular object.
25+
26+
Invoked by bytes(obj) in py3 only.
27+
Yields a bytestring in both py2/py3.
28+
"""
29+
from pandas.core.config import get_option
30+
31+
encoding = get_option("display.encoding")
32+
return self.__unicode__().encode(encoding, 'replace')
33+
34+
def __repr__(self):
35+
"""
36+
Return a string representation for a particular object.
37+
38+
Yields Bytestring in Py2, Unicode String in py3.
39+
"""
40+
return str(self)
41+
42+
class PandasObject(StringMixin):
43+
"""baseclass for various pandas objects"""
44+
45+
@property
46+
def _constructor(self):
47+
"""class constructor (for this class it's just `__class__`"""
48+
return self.__class__
49+
50+
def __unicode__(self):
51+
"""
52+
Return a string representation for a particular object.
53+
54+
Invoked by unicode(obj) in py2 only. Yields a Unicode String in both
55+
py2/py3.
56+
"""
57+
# Should be overwritten by base classes
58+
return object.__repr__(self)
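
The mixin pattern introduced in this file can be illustrated with a minimal, standalone sketch. It is not the pandas implementation: `PY3` and `ENCODING` are stand-ins assumed here for `py3compat.PY3` and `get_option("display.encoding")`, and `Point` is a hypothetical subclass that defines only `__unicode__`.

```python
import sys

# Assumption: stand-in for pandas.util.py3compat.PY3.
PY3 = sys.version_info[0] >= 3
# Assumption: stand-in for get_option("display.encoding").
ENCODING = "utf-8"


class StringMixin(object):
    """Derive __str__, __bytes__ and __repr__ from a single __unicode__."""

    def __str__(self):
        # Text on Python 3, an encoded bytestring on Python 2.
        if PY3:
            return self.__unicode__()
        return self.__bytes__()

    def __bytes__(self):
        # Characters the encoding cannot represent are replaced, not raised.
        return self.__unicode__().encode(ENCODING, "replace")

    def __repr__(self):
        return str(self)


class Point(StringMixin):
    """Hypothetical subclass: it only needs to define __unicode__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __unicode__(self):
        return u"Point(%s, %s)" % (self.x, self.y)


p = Point(1, 2)
print(str(p))
print(repr(p))
print(bytes(p))
```

Because every string method funnels through `__unicode__`, a class that previously defined its own `__str__`, `__bytes__`, and `__repr__` can delete all three and keep just the one method.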

pandas/core/categorical.py

+6-6
@@ -3,6 +3,7 @@
33
import numpy as np
44

55
from pandas.core.algorithms import factorize
6+
from pandas.core.base import PandasObject
67
from pandas.core.index import Index
78
import pandas.core.common as com
89
from pandas.core.frame import DataFrame
@@ -25,8 +26,7 @@ def f(self, other):
2526

2627
return f
2728

28-
29-
class Categorical(object):
29+
class Categorical(PandasObject):
3030
"""
3131
Represents a categorical variable in classic R / S-plus fashion
3232
@@ -134,9 +134,9 @@ def __array__(self, dtype=None):
134134
def __len__(self):
135135
return len(self.labels)
136136

137-
def __repr__(self):
137+
def __unicode__(self):
138138
temp = 'Categorical: %s\n%s\n%s'
139-
values = np.asarray(self)
139+
values = com.pprint_thing(np.asarray(self))
140140
levheader = 'Levels (%d): ' % len(self.levels)
141141
levstring = np.array_repr(self.levels,
142142
max_line_width=60)
@@ -145,9 +145,9 @@ def __repr__(self):
145145
lines = levstring.split('\n')
146146
levstring = '\n'.join([lines[0]] +
147147
[indent + x.lstrip() for x in lines[1:]])
148+
name = '' if self.name is None else self.name
149+
return temp % (name, values, levheader + levstring)
148150

149-
return temp % ('' if self.name is None else self.name,
150-
repr(values), levheader + levstring)
151151

152152
def __getitem__(self, key):
153153
if isinstance(key, (int, np.integer)):

pandas/core/common.py

+4-4
@@ -64,10 +64,10 @@ def _isnull_new(obj):
6464
if lib.isscalar(obj):
6565
return lib.checknull(obj)
6666

67-
from pandas.core.generic import PandasObject
67+
from pandas.core.generic import PandasContainer
6868
if isinstance(obj, np.ndarray):
6969
return _isnull_ndarraylike(obj)
70-
elif isinstance(obj, PandasObject):
70+
elif isinstance(obj, PandasContainer):
7171
# TODO: optimize for DataFrame, etc.
7272
return obj.apply(isnull)
7373
elif isinstance(obj, list) or hasattr(obj, '__array__'):
@@ -91,10 +91,10 @@ def _isnull_old(obj):
9191
if lib.isscalar(obj):
9292
return lib.checknull_old(obj)
9393

94-
from pandas.core.generic import PandasObject
94+
from pandas.core.generic import PandasContainer
9595
if isinstance(obj, np.ndarray):
9696
return _isnull_ndarraylike_old(obj)
97-
elif isinstance(obj, PandasObject):
97+
elif isinstance(obj, PandasContainer):
9898
# TODO: optimize for DataFrame, etc.
9999
return obj.apply(_isnull_old)
100100
elif isinstance(obj, list) or hasattr(obj, '__array__'):
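
The switch from `PandasObject` to `PandasContainer` in these `isinstance` checks follows from the new hierarchy: `PandasObject` is now a broader base that also covers classes such as `Categorical`, so a check against it would presumably match more types than the `obj.apply(...)` branch was written for. A rough sketch with stand-in classes (none of these are the real pandas classes, and `describe` is a hypothetical helper):

```python
class PandasObject(object):
    """Stand-in for the new, broader base class."""


class PandasContainer(PandasObject):
    """Stand-in for the renamed class (previously called PandasObject)."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, func):
        # Element-wise application, loosely mirroring the container API.
        return [func(v) for v in self.values]


class Categorical(PandasObject):
    """Stand-in for a non-container class that now shares the base."""


def describe(obj):
    # Hypothetical helper: check the narrower class first, as the diff does.
    if isinstance(obj, PandasContainer):
        return "container"
    elif isinstance(obj, PandasObject):
        return "pandas object"
    return "other"


print(describe(PandasContainer([1, 2])))
print(describe(Categorical()))
print(describe([1, 2]))
```

Under this layout, code that relies on container behaviour checks against `PandasContainer`, while `PandasObject` remains available for anything that only needs the shared string methods.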

pandas/core/frame.py

-34
@@ -584,10 +584,6 @@ def _verbose_info(self, value):
584584
def axes(self):
585585
return [self.index, self.columns]
586586

587-
@property
588-
def _constructor(self):
589-
return self.__class__
590-
591587
@property
592588
def shape(self):
593589
return (len(self.index), len(self.columns))
@@ -653,28 +649,6 @@ def _repr_fits_horizontal_(self,ignore_width=False):
653649

654650
return repr_width < width
655651

656-
def __str__(self):
657-
"""
658-
Return a string representation for a particular DataFrame
659-
660-
Invoked by str(df) in both py2/py3.
661-
Yields Bytestring in Py2, Unicode String in py3.
662-
"""
663-
664-
if py3compat.PY3:
665-
return self.__unicode__()
666-
return self.__bytes__()
667-
668-
def __bytes__(self):
669-
"""
670-
Return a string representation for a particular DataFrame
671-
672-
Invoked by bytes(df) in py3 only.
673-
Yields a bytestring in both py2/py3.
674-
"""
675-
encoding = com.get_option("display.encoding")
676-
return self.__unicode__().encode(encoding, 'replace')
677-
678652
def __unicode__(self):
679653
"""
680654
Return a string representation for a particular DataFrame
@@ -714,14 +688,6 @@ def __unicode__(self):
714688

715689
return value
716690

717-
def __repr__(self):
718-
"""
719-
Return a string representation for a particular DataFrame
720-
721-
Yields Bytestring in Py2, Unicode String in py3.
722-
"""
723-
return str(self)
724-
725691
def _repr_html_(self):
726692
"""
727693
Return a html representation for a particular DataFrame.
