BUG/CLN: clarified timedelta inferences #5995

Merged: 4 commits, Jan 20, 2014
6 changes: 5 additions & 1 deletion doc/source/release.rst
@@ -62,6 +62,8 @@ API Changes
when detecting chained assignment, related (:issue:`5938`)
- DataFrame.head(0) returns self instead of empty frame (:issue:`5846`)
- ``autocorrelation_plot`` now accepts ``**kwargs``. (:issue:`5623`)
- ``convert_objects`` now accepts a ``convert_timedeltas='coerce'`` argument to allow forced dtype conversion of
timedeltas (:issue:`5458`,:issue:`5689`)

Experimental Features
~~~~~~~~~~~~~~~~~~~~~
@@ -78,12 +80,13 @@ Improvements to existing features
- support ``dtypes`` property on ``Series/Panel/Panel4D``
- extend ``Panel.apply`` to allow arbitrary functions (rather than only ufuncs) (:issue:`1148`)
allow multiple axes to be used to operate on slabs of a ``Panel``
- The ``ArrayFormatter``s for ``datetime`` and ``timedelta64`` now intelligently
- The ``ArrayFormatter`` for ``datetime`` and ``timedelta64`` now intelligently
limit precision based on the values in the array (:issue:`3401`)
- pd.show_versions() is now available for convenience when reporting issues.
- perf improvements to Series.str.extract (:issue:`5944`)
- perf improvements in ``dtypes/ftypes`` methods (:issue:`5968`)
- perf improvements in indexing with object dtypes (:issue:`5968`)
- improved dtype inference for ``timedelta``-like objects passed to constructors (:issue:`5458`, :issue:`5689`)

.. _release.bug_fixes-0.13.1:

@@ -122,6 +125,7 @@ Bug Fixes
- Recent changes in IPython cause warnings to be emitted when using previous versions
of pandas in QTConsole, now fixed. If you're using an older version and
need to suppress the warnings, see (:issue:`5922`).
- Bug in merging ``timedelta`` dtypes (:issue:`5695`)

pandas 0.13.0
-------------
44 changes: 22 additions & 22 deletions doc/source/v0.13.1.txt
@@ -3,7 +3,7 @@
v0.13.1 (???)
-------------

This is a major release from 0.13.0 and includes a number of API changes, several new features and
This is a minor release from 0.13.0 and includes a number of API changes, several new features and
enhancements along with a large number of bug fixes.

Highlights include:
@@ -29,6 +29,27 @@ Deprecations
Enhancements
~~~~~~~~~~~~

- The ``ArrayFormatter`` for ``datetime`` and ``timedelta64`` now intelligently
limits precision based on the values in the array (:issue:`3401`)

Previously output might look like:

.. code-block:: python

age today diff
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00

Now the output looks like:

.. ipython:: python

df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ], columns=['age'])
df['today'] = Timestamp('20130419')
df['diff'] = df['today']-df['age']
df

- ``Panel.apply`` will work on non-ufuncs. See :ref:`the docs<basics.apply_panel>`.

.. ipython:: python
@@ -83,27 +104,6 @@ Enhancements
result
result.loc[:,:,'ItemA']

- The ``ArrayFormatter``s for ``datetime`` and ``timedelta64`` now intelligently
limit precision based on the values in the array (:issue:`3401`)

Previously output might look like:

.. code-block:: python

age today diff
0 2001-01-01 00:00:00 2013-04-19 00:00:00 4491 days, 00:00:00
1 2004-06-01 00:00:00 2013-04-19 00:00:00 3244 days, 00:00:00

Now the output looks like:

.. ipython:: python

df = DataFrame([ Timestamp('20010101'),
Timestamp('20040601') ], columns=['age'])
df['today'] = Timestamp('20130419')
df['diff'] = df['today']-df['age']
df

Experimental
~~~~~~~~~~~~

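The formatter enhancement noted in this file (limiting precision based on the values in the array) can be illustrated without pandas. What follows is only a minimal sketch, assuming a hypothetical helper `format_datetimes` that is not part of pandas: it drops the time component when every value falls exactly on midnight. The actual ``ArrayFormatter`` logic may well differ.

```python
from datetime import datetime


def format_datetimes(values):
    """Format datetimes, omitting the time part when all values are at midnight.

    Hypothetical helper for illustration only; not the pandas implementation.
    """
    all_midnight = all(
        (v.hour, v.minute, v.second, v.microsecond) == (0, 0, 0, 0)
        for v in values
    )
    # Choose one format for the whole column so the output stays aligned.
    fmt = "%Y-%m-%d" if all_midnight else "%Y-%m-%d %H:%M:%S"
    return [v.strftime(fmt) for v in values]


dates_only = format_datetimes([datetime(2001, 1, 1), datetime(2004, 6, 1)])
with_time = format_datetimes([datetime(2001, 1, 1), datetime(2004, 6, 1, 12, 30)])
```

The format is chosen per column rather than per value, so a single non-midnight entry switches the whole column back to full precision.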
23 changes: 20 additions & 3 deletions pandas/core/common.py
@@ -1514,7 +1514,8 @@ def _values_from_object(o):


def _possibly_convert_objects(values, convert_dates=True,
convert_numeric=True):
convert_numeric=True,
convert_timedeltas=True):
""" if we have an object dtype, try to coerce dates and/or numbers """

# if we have passed in a list or scalar
@@ -1539,6 +1540,22 @@ def _possibly_convert_objects(values, convert_dates=True,
values = lib.maybe_convert_objects(
values, convert_datetime=convert_dates)

# convert timedeltas
if convert_timedeltas and values.dtype == np.object_:

if convert_timedeltas == 'coerce':
from pandas.tseries.timedeltas import \
_possibly_cast_to_timedelta
new_values = _possibly_cast_to_timedelta(values, coerce=True)

# if we are all nans then leave me alone
if not isnull(new_values).all():
values = new_values

else:
values = lib.maybe_convert_objects(
values, convert_timedelta=convert_timedeltas)

# convert to numeric
if values.dtype == np.object_:
if convert_numeric:
@@ -1624,7 +1641,7 @@ def _possibly_cast_to_datetime(value, dtype, coerce=False):
elif is_timedelta64:
from pandas.tseries.timedeltas import \
_possibly_cast_to_timedelta
value = _possibly_cast_to_timedelta(value)
value = _possibly_cast_to_timedelta(value, coerce='compat')
except:
pass

Expand Down Expand Up @@ -1655,7 +1672,7 @@ def _possibly_cast_to_datetime(value, dtype, coerce=False):
elif inferred_type in ['timedelta', 'timedelta64']:
from pandas.tseries.timedeltas import \
_possibly_cast_to_timedelta
value = _possibly_cast_to_timedelta(value)
value = _possibly_cast_to_timedelta(value, coerce='compat')

return value

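The timedelta branch added to `_possibly_convert_objects` distinguishes a soft conversion from a forced `'coerce'` conversion. A minimal stdlib-only sketch of that distinction follows. It assumes a hypothetical helper `convert_timedeltas` (not a pandas function) in which `None` stands in for `NaT` and the returned label stands in for the resulting dtype. The real pandas code paths accept more input types than this sketch does.

```python
from datetime import timedelta


def convert_timedeltas(values, mode=True):
    """Sketch of soft vs. forced ('coerce') timedelta conversion.

    Hypothetical helper for illustration; None plays the role of NaT and the
    second return value plays the role of the resulting dtype.
    """
    is_td = [isinstance(v, timedelta) for v in values]

    if mode == "coerce":
        # Forced conversion: anything that is not a timedelta becomes None.
        coerced = [v if ok else None for v, ok in zip(values, is_td)]
        # Mirror the "all nans then leave me alone" guard: if nothing
        # converted, return the input unchanged.
        if any(is_td):
            return coerced, "timedelta"
        return list(values), "object"

    # Soft conversion: succeed only when every element already qualifies.
    if mode and values and all(is_td):
        return list(values), "timedelta"
    return list(values), "object"


mixed = [timedelta(days=1), "foo"]
soft_result = convert_timedeltas(mixed)
forced_result = convert_timedeltas(mixed, "coerce")
```

In this sketch the soft mode leaves a mixed list as objects, while `'coerce'` converts what it can and replaces the rest with the missing-value placeholder.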
2 changes: 1 addition & 1 deletion pandas/core/frame.py
@@ -3626,7 +3626,7 @@ def append(self, other, ignore_index=False, verify_integrity=False):
index = None if other.name is None else [other.name]
other = other.reindex(self.columns, copy=False)
other = DataFrame(other.values.reshape((1, len(other))),
index=index, columns=self.columns)
index=index, columns=self.columns).convert_objects()
elif isinstance(other, list) and not isinstance(other[0], DataFrame):
other = DataFrame(other)
if (self.columns.get_indexer(other.columns) >= 0).all():
28 changes: 15 additions & 13 deletions pandas/core/generic.py
@@ -1844,16 +1844,18 @@ def copy(self, deep=True):
return self._constructor(data).__finalize__(self)

def convert_objects(self, convert_dates=True, convert_numeric=False,
copy=True):
convert_timedeltas=True, copy=True):
"""
Attempt to infer better dtype for object columns
Parameters
----------
convert_dates : if True, attempt to soft convert_dates, if 'coerce',
convert_dates : if True, attempt to soft convert dates, if 'coerce',
force conversion (and non-convertibles get NaT)
convert_numeric : if True attempt to coerce to numbers (including
strings), non-convertibles get NaN
convert_timedeltas : if True, attempt to soft convert timedeltas, if 'coerce',
force conversion (and non-convertibles get NaT)
copy : Boolean, if True, return copy, default is True
Returns
@@ -1863,6 +1865,7 @@ def convert_objects(self, convert_dates=True, convert_numeric=False,
return self._constructor(
self._data.convert(convert_dates=convert_dates,
convert_numeric=convert_numeric,
convert_timedeltas=convert_timedeltas,
copy=copy)).__finalize__(self)

#----------------------------------------------------------------------
@@ -3174,23 +3177,22 @@ def abs(self):
-------
abs: type of caller
"""
obj = np.abs(self)

# suprimo numpy 1.6 hacking
# for timedeltas
if _np_version_under1p7:

def _convert_timedeltas(x):
if x.dtype.kind == 'm':
return np.abs(x.view('i8')).astype(x.dtype)
return np.abs(x)

if self.ndim == 1:
if obj.dtype == 'm8[us]':
obj = obj.astype('m8[ns]')
return _convert_timedeltas(self)
elif self.ndim == 2:
def f(x):
if x.dtype == 'm8[us]':
x = x.astype('m8[ns]')
return x

if 'm8[us]' in obj.dtypes.values:
obj = obj.apply(f)
return self.apply(_convert_timedeltas)

return obj
return np.abs(self)

def pct_change(self, periods=1, fill_method='pad', limit=None, freq=None,
**kwds):
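The reworked `abs` hunk handles timedeltas under older numpy by taking the absolute value of the underlying integer representation and casting back to the original dtype. The same idea can be sketched with the standard library alone. The helper `abs_via_integer` below is hypothetical and works on `datetime.timedelta` with microsecond resolution, rather than on numpy `m8[ns]` arrays as the pandas code does.

```python
from datetime import timedelta

_ONE_US = timedelta(microseconds=1)


def abs_via_integer(td):
    """Absolute value of a timedelta, computed on its integer microsecond count.

    Hypothetical helper; an analogue of viewing timedelta64 data as int64,
    taking the absolute value, and casting back.
    """
    as_int = td // _ONE_US        # exact integer number of microseconds
    return abs(as_int) * _ONE_US  # rebuild a timedelta from the magnitude


result = abs_via_integer(timedelta(days=-2, hours=3))
```

For `datetime.timedelta` the built-in `abs()` already gives the same answer, so the round trip through integers is shown only to mirror the technique in the diff.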
7 changes: 4 additions & 3 deletions pandas/core/internals.py
@@ -1315,8 +1315,8 @@ def is_bool(self):
"""
return lib.is_bool_array(self.values.ravel())

def convert(self, convert_dates=True, convert_numeric=True, copy=True,
by_item=True):
def convert(self, convert_dates=True, convert_numeric=True, convert_timedeltas=True,
copy=True, by_item=True):
""" attempt to coerce any object types to better types
return a copy of the block (if copy = True)
by definition we ARE an ObjectBlock!!!!!
@@ -1334,7 +1334,8 @@ def convert(self, convert_dates=True, convert_numeric=True, copy=True,

values = com._possibly_convert_objects(
values.ravel(), convert_dates=convert_dates,
convert_numeric=convert_numeric
convert_numeric=convert_numeric,
convert_timedeltas=convert_timedeltas,
).reshape(values.shape)
values = _block_shape(values, ndim=self.ndim)
items = self.items.take([i])
4 changes: 2 additions & 2 deletions pandas/lib.pyx
@@ -14,7 +14,7 @@ from cpython cimport (PyDict_New, PyDict_GetItem, PyDict_SetItem,
Py_INCREF, PyTuple_SET_ITEM,
PyList_Check, PyFloat_Check,
PyString_Check,
PyBytes_Check,
PyBytes_Check,
PyTuple_SetItem,
PyTuple_New,
PyObject_SetAttrString)
@@ -31,7 +31,7 @@ from datetime import datetime as pydatetime
# this is our tseries.pxd
from datetime cimport *

from tslib cimport convert_to_tsobject
from tslib cimport convert_to_tsobject, convert_to_timedelta64
import tslib
from tslib import NaT, Timestamp, repr_timedelta64

1 change: 1 addition & 0 deletions pandas/src/datetime.pxd
@@ -37,6 +37,7 @@ cdef extern from "datetime.h":
bint PyDateTime_Check(object o)
bint PyDate_Check(object o)
bint PyTime_Check(object o)
bint PyDelta_Check(object o)
object PyDateTime_FromDateAndTime(int year, int month, int day, int hour,
int minute, int second, int us)
