Skip to content

CLN: Refactor string special methods #4092

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jul 2, 2013
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 11 additions & 1 deletion doc/source/release.rst
Original file line number Diff line number Diff line change
Expand Up @@ -175,8 +175,18 @@ pandas 0.12
``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
until success is also valid
- more consistency in the to_datetime return types (give string/array of string inputs) (:issue:`3888`)
- The internal ``pandas`` class hierarchy has changed (slightly). The
previous ``PandasObject`` now is called ``PandasContainer`` and a new
``PandasObject`` has become the baseclass for ``PandasContainer`` as well
as ``Index``, ``Categorical``, ``GroupBy``, ``SparseList``, and
``SparseArray`` (+ their base classes). Currently, ``PandasObject``
provides string methods (from ``StringMixin``). (:issue:`4090`, :issue:`4092`)
- New ``StringMixin`` that, given a ``__unicode__`` method, gets python 2 and
python 3 compatible string methods (``__str__``, ``__bytes__``, and
``__repr__``). Plus string safety throughout. Now employed in many places
throughout the pandas library. (:issue:`4090`, :issue:`4092`)

**Experimental Feautres**
**Experimental Features**

- Added experimental ``CustomBusinessDay`` class to support ``DateOffsets``
with custom holiday calendars and custom weekmasks. (:issue:`2301`)
Expand Down
54 changes: 33 additions & 21 deletions doc/source/v0.12.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,13 @@ enhancements along with a large number of bug fixes.

Highlites include a consistent I/O API naming scheme, routines to read html,
write multi-indexes to csv files, read & write STATA data files, read & write JSON format
files, Python 3 support for ``HDFStore``, filtering of groupby expressions via ``filter``, and a
files, Python 3 support for ``HDFStore``, filtering of groupby expressions via ``filter``, and a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud is it worth it to fix the trailing whitespace? Easy to remove the fix, but it was bothering me a little.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure why not? i suppose it makes it more difficult to review...but for docs i don't think it's a big deal

revamped ``replace`` routine that accepts regular expressions.

API changes
~~~~~~~~~~~

- The I/O API is now much more consistent with a set of top level ``reader`` functions
- The I/O API is now much more consistent with a set of top level ``reader`` functions
accessed like ``pd.read_csv()`` that generally return a ``pandas`` object.

* ``read_csv``
Expand All @@ -38,7 +38,7 @@ API changes
* ``to_clipboard``


- Fix modulo and integer division on Series,DataFrames to act similary to ``float`` dtypes to return
- Fix modulo and integer division on Series,DataFrames to act similary to ``float`` dtypes to return
``np.nan`` or ``np.inf`` as appropriate (:issue:`3590`). This correct a numpy bug that treats ``integer``
and ``float`` dtypes differently.

Expand All @@ -50,15 +50,15 @@ API changes
p / p
p / 0

- Add ``squeeze`` keyword to ``groupby`` to allow reduction from
- Add ``squeeze`` keyword to ``groupby`` to allow reduction from
DataFrame -> Series if groups are unique. This is a Regression from 0.10.1.
We are reverting back to the prior behavior. This means groupby will return the
same shaped objects whether the groups are unique or not. Revert this issue (:issue:`2893`)
We are reverting back to the prior behavior. This means groupby will return the
same shaped objects whether the groups are unique or not. Revert this issue (:issue:`2893`)
with (:issue:`3596`).

.. ipython:: python

df2 = DataFrame([{"val1": 1, "val2" : 20}, {"val1":1, "val2": 19},
df2 = DataFrame([{"val1": 1, "val2" : 20}, {"val1":1, "val2": 19},
{"val1":1, "val2": 27}, {"val1":1, "val2": 12}])
def func(dataf):
return dataf["val2"] - dataf["val2"].mean()
Expand Down Expand Up @@ -96,9 +96,9 @@ API changes
and thus you should cast to an appropriate numeric dtype if you need to
plot something.

- Add ``colormap`` keyword to DataFrame plotting methods. Accepts either a
matplotlib colormap object (ie, matplotlib.cm.jet) or a string name of such
an object (ie, 'jet'). The colormap is sampled to select the color for each
- Add ``colormap`` keyword to DataFrame plotting methods. Accepts either a
matplotlib colormap object (ie, matplotlib.cm.jet) or a string name of such
an object (ie, 'jet'). The colormap is sampled to select the color for each
column. Please see :ref:`visualization.colormaps` for more information.
(:issue:`3860`)

Expand Down Expand Up @@ -159,6 +159,18 @@ API changes
``bs4`` + ``html5lib`` when lxml fails to parse. a list of parsers to try
until success is also valid

- The internal ``pandas`` class hierarchy has changed (slightly). The
previous ``PandasObject`` now is called ``PandasContainer`` and a new
``PandasObject`` has become the baseclass for ``PandasContainer`` as well
as ``Index``, ``Categorical``, ``GroupBy``, ``SparseList``, and
``SparseArray`` (+ their base classes). Currently, ``PandasObject``
provides string methods (from ``StringMixin``). (:issue:`4090`, :issue:`4092`)

- New ``StringMixin`` that, given a ``__unicode__`` method, gets python 2 and
python 3 compatible string methods (``__str__``, ``__bytes__``, and
``__repr__``). Plus string safety throughout. Now employed in many places
throughout the pandas library. (:issue:`4090`, :issue:`4092`)

I/O Enhancements
~~~~~~~~~~~~~~~~

Expand All @@ -184,7 +196,7 @@ I/O Enhancements

.. warning::

You may have to install an older version of BeautifulSoup4,
You may have to install an older version of BeautifulSoup4,
:ref:`See the installation docs<install.optional_dependencies>`

- Added module for reading and writing Stata files: ``pandas.io.stata`` (:issue:`1512`)
Expand All @@ -203,15 +215,15 @@ I/O Enhancements
- The option, ``tupleize_cols`` can now be specified in both ``to_csv`` and
``read_csv``, to provide compatiblity for the pre 0.12 behavior of
writing and reading multi-index columns via a list of tuples. The default in
0.12 is to write lists of tuples and *not* interpret list of tuples as a
multi-index column.
0.12 is to write lists of tuples and *not* interpret list of tuples as a
multi-index column.

Note: The default behavior in 0.12 remains unchanged, but starting with 0.13,
the default *to* write and read multi-index columns will be in the new
the default *to* write and read multi-index columns will be in the new
format. (:issue:`3571`, :issue:`1651`, :issue:`3141`)

- If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will
with ``df.to_csv(..., index=False``), then any ``names`` on the columns index will
be *lost*.

.. ipython:: python
Expand Down Expand Up @@ -296,8 +308,8 @@ Other Enhancements
pd.get_option('a.b')
pd.get_option('b.c')

- The ``filter`` method for group objects returns a subset of the original
object. Suppose we want to take only elements that belong to groups with a
- The ``filter`` method for group objects returns a subset of the original
object. Suppose we want to take only elements that belong to groups with a
group sum greater than 2.

.. ipython:: python
Expand All @@ -317,7 +329,7 @@ Other Enhancements
dff.groupby('B').filter(lambda x: len(x) > 2)

Alternatively, instead of dropping the offending groups, we can return a
like-indexed objects where the groups that do not pass the filter are
like-indexed objects where the groups that do not pass the filter are
filled with NaNs.

.. ipython:: python
Expand All @@ -333,9 +345,9 @@ Experimental Features

- Added experimental ``CustomBusinessDay`` class to support ``DateOffsets``
with custom holiday calendars and custom weekmasks. (:issue:`2301`)

.. note::

This uses the ``numpy.busdaycalendar`` API introduced in Numpy 1.7 and
therefore requires Numpy 1.7.0 or newer.

Expand Down Expand Up @@ -416,7 +428,7 @@ Bug Fixes
- Extend ``reindex`` to correctly deal with non-unique indices (:issue:`3679`)
- ``DataFrame.itertuples()`` now works with frames with duplicate column
names (:issue:`3873`)
- Bug in non-unique indexing via ``iloc`` (:issue:`4017`); added ``takeable`` argument to
- Bug in non-unique indexing via ``iloc`` (:issue:`4017`); added ``takeable`` argument to
``reindex`` for location-based taking

- ``DataFrame.from_records`` did not accept empty recarrays (:issue:`3682`)
Expand Down
58 changes: 58 additions & 0 deletions pandas/core/base.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,58 @@
"""
Base class(es) for all pandas objects.
"""
from pandas.util import py3compat

class StringMixin(object):
"""implements string methods so long as object defines a `__unicode__` method.
Handles Python2/3 compatibility transparently."""
# side note - this could be made into a metaclass if more than one object nees
def __str__(self):
"""
Return a string representation for a particular object.

Invoked by str(obj) in both py2/py3.
Yields Bytestring in Py2, Unicode String in py3.
"""

if py3compat.PY3:
return self.__unicode__()
return self.__bytes__()

def __bytes__(self):
"""
Return a string representation for a particular object.

Invoked by bytes(obj) in py3 only.
Yields a bytestring in both py2/py3.
"""
from pandas.core.config import get_option

encoding = get_option("display.encoding")
return self.__unicode__().encode(encoding, 'replace')

def __repr__(self):
"""
Return a string representation for a particular object.

Yields Bytestring in Py2, Unicode String in py3.
"""
return str(self)

class PandasObject(StringMixin):
"""baseclass for various pandas objects"""

@property
def _constructor(self):
"""class constructor (for this class it's just `__class__`"""
return self.__class__

def __unicode__(self):
"""
Return a string representation for a particular object.

Invoked by unicode(obj) in py2 only. Yields a Unicode String in both
py2/py3.
"""
# Should be overwritten by base classes
return object.__repr__(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be super(PandasObject, self).__repr__()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. That would be an infinite loop. (because StringMixin calls __str__, etc.) That's the only awkward part of this setup...but for most PandasObjects, it works out okay.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh whoops blah sorry

12 changes: 6 additions & 6 deletions pandas/core/categorical.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
import numpy as np

from pandas.core.algorithms import factorize
from pandas.core.base import PandasObject
from pandas.core.index import Index
import pandas.core.common as com
from pandas.core.frame import DataFrame
Expand All @@ -25,8 +26,7 @@ def f(self, other):

return f


class Categorical(object):
class Categorical(PandasObject):
"""
Represents a categorical variable in classic R / S-plus fashion

Expand Down Expand Up @@ -134,9 +134,9 @@ def __array__(self, dtype=None):
def __len__(self):
return len(self.labels)

def __repr__(self):
def __unicode__(self):
temp = 'Categorical: %s\n%s\n%s'
values = np.asarray(self)
values = com.pprint_thing(np.asarray(self))
levheader = 'Levels (%d): ' % len(self.levels)
levstring = np.array_repr(self.levels,
max_line_width=60)
Expand All @@ -145,9 +145,9 @@ def __repr__(self):
lines = levstring.split('\n')
levstring = '\n'.join([lines[0]] +
[indent + x.lstrip() for x in lines[1:]])
name = '' if self.name is None else self.name
return temp % (name, values, levheader + levstring)

return temp % ('' if self.name is None else self.name,
repr(values), levheader + levstring)

def __getitem__(self, key):
if isinstance(key, (int, np.integer)):
Expand Down
8 changes: 4 additions & 4 deletions pandas/core/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -64,10 +64,10 @@ def _isnull_new(obj):
if lib.isscalar(obj):
return lib.checknull(obj)

from pandas.core.generic import PandasObject
from pandas.core.generic import PandasContainer
if isinstance(obj, np.ndarray):
return _isnull_ndarraylike(obj)
elif isinstance(obj, PandasObject):
elif isinstance(obj, PandasContainer):
# TODO: optimize for DataFrame, etc.
return obj.apply(isnull)
elif isinstance(obj, list) or hasattr(obj, '__array__'):
Expand All @@ -91,10 +91,10 @@ def _isnull_old(obj):
if lib.isscalar(obj):
return lib.checknull_old(obj)

from pandas.core.generic import PandasObject
from pandas.core.generic import PandasContainer
if isinstance(obj, np.ndarray):
return _isnull_ndarraylike_old(obj)
elif isinstance(obj, PandasObject):
elif isinstance(obj, PandasContainer):
# TODO: optimize for DataFrame, etc.
return obj.apply(_isnull_old)
elif isinstance(obj, list) or hasattr(obj, '__array__'):
Expand Down
34 changes: 0 additions & 34 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -584,10 +584,6 @@ def _verbose_info(self, value):
def axes(self):
return [self.index, self.columns]

@property
def _constructor(self):
return self.__class__

@property
def shape(self):
return (len(self.index), len(self.columns))
Expand Down Expand Up @@ -653,28 +649,6 @@ def _repr_fits_horizontal_(self,ignore_width=False):

return repr_width < width

def __str__(self):
"""
Return a string representation for a particular DataFrame

Invoked by str(df) in both py2/py3.
Yields Bytestring in Py2, Unicode String in py3.
"""

if py3compat.PY3:
return self.__unicode__()
return self.__bytes__()

def __bytes__(self):
"""
Return a string representation for a particular DataFrame

Invoked by bytes(df) in py3 only.
Yields a bytestring in both py2/py3.
"""
encoding = com.get_option("display.encoding")
return self.__unicode__().encode(encoding, 'replace')

def __unicode__(self):
"""
Return a string representation for a particular DataFrame
Expand Down Expand Up @@ -714,14 +688,6 @@ def __unicode__(self):

return value

def __repr__(self):
"""
Return a string representation for a particular DataFrame

Yields Bytestring in Py2, Unicode String in py3.
"""
return str(self)

def _repr_html_(self):
"""
Return a html representation for a particular DataFrame.
Expand Down
Loading