Skip to content

BUG: various bug fixes for DataFrame/Series construction #2752

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 10, 2013
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions RELEASE.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,20 @@ Where to get it
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
* Documentation: http://pandas.pydata.org

**API Changes**

- Series now automatically will try to set the correct dtype based on passed datetimelike objects (datetime/Timestamp)
- timedelta64 are returned in appropriate cases (e.g. Series - Series, when both are datetime64)
- mixed datetimes and objects (GH2751_) in a constructor witll be casted correctly
- astype on datetimes to object are now handled (as well as NaT conversions to np.nan)

**Bug fixes**

- Single element ndarrays of datetimelike objects are handled (e.g. np.array(datetime(2001,1,1,0,0))), w/o dtype being passed
- 0-dim ndarrays with a passed dtype are handled correctly (e.g. np.array(0.,dtype='float32'))

.. _GH2751: https://github.com/pydata/pandas/issues/2751

pandas 0.10.1
=============

Expand Down
17 changes: 17 additions & 0 deletions doc/source/missing_data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -80,6 +80,23 @@ pandas provides the :func:`~pandas.core.common.isnull` and
missing by the ``isnull`` and ``notnull`` functions. ``inf`` and
``-inf`` are no longer considered missing by default.

Datetimes
---------

For datetime64[ns] types, ``NaT`` represents missing values. This is a pseudo-native
sentinal value that can be represented by numpy in a singular dtype (datetime64[ns]).
Pandas objects provide intercompatibility between ``NaT`` and ``NaN``.

.. ipython:: python

df2 = df.copy()
df2['timestamp'] = Timestamp('20120101')
df2
df2.ix[['a','c','h'],['one','timestamp']] = np.nan
df2
df2.get_dtype_counts()


Calculations with missing data
------------------------------

Expand Down
54 changes: 54 additions & 0 deletions doc/source/v0.10.2.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
.. _whatsnew_0102:

v0.10.2 (February ??, 2013)
---------------------------

This is a minor release from 0.10.1 and includes many new features and
enhancements along with a large number of bug fixes. There are also a number of
important API changes that long-time pandas users should pay close attention
to.

API changes
~~~~~~~~~~~

Datetime64[ns] columns in a DataFrame (or a Series) allow the use of ``np.nan`` to indicate a nan value, in addition to the traditional ``NaT``, or not-a-time. This allows convenient nan setting in a generic way. Furthermore datetime64 columns are created by default, when passed datetimelike objects (*this change was introduced in 0.10.1*)

.. ipython:: python

df = DataFrame(randn(6,2),date_range('20010102',periods=6),columns=['A','B'])
df['timestamp'] = Timestamp('20010103')
df

# datetime64[ns] out of the box
df.get_dtype_counts()

# use the traditional nan, which is mapped to NaT internally
df.ix[2:4,['A','timestamp']] = np.nan
df

Astype conversion on datetime64[ns] to object, implicity converts ``NaT`` to ``np.nan``


.. ipython:: python

import datetime
s = Series([datetime.datetime(2001, 1, 2, 0, 0) for i in range(3)])
s.dtype
s[1] = np.nan
s
s.dtype
s = s.astype('O')
s
s.dtype

New features
~~~~~~~~~~~~

**Enhancements**

**Bug Fixes**

See the `full release notes
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
on GitHub for a complete list.

2 changes: 2 additions & 0 deletions doc/source/whatsnew.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,8 @@ What's New

These are new features and improvements of note in each release.

.. include:: v0.10.2.txt

.. include:: v0.10.1.txt

.. include:: v0.10.0.txt
Expand Down
14 changes: 14 additions & 0 deletions pandas/core/common.py
Original file line number Diff line number Diff line change
Expand Up @@ -654,6 +654,20 @@ def _possibly_cast_to_datetime(value, dtype):
except:
pass

elif dtype is None:
# we might have a array (or single object) that is datetime like, and no dtype is passed
# don't change the value unless we find a datetime set
v = value
if not (is_list_like(v) or hasattr(v,'len')):
v = [ v ]
if len(v):
inferred_type = lib.infer_dtype(v)
if inferred_type == 'datetime':
try:
value = tslib.array_to_datetime(np.array(v))
except:
pass

return value


Expand Down
6 changes: 3 additions & 3 deletions pandas/core/frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -4289,7 +4289,7 @@ def applymap(self, func):

# if we have a dtype == 'M8[ns]', provide boxed values
def infer(x):
if x.dtype == 'M8[ns]':
if com.is_datetime64_dtype(x):
x = lib.map_infer(x, lib.Timestamp)
return lib.map_infer(x, func)
return self.apply(infer)
Expand Down Expand Up @@ -4980,7 +4980,7 @@ def _get_agg_axis(self, axis_num):
def _get_numeric_data(self):
if self._is_mixed_type:
num_data = self._data.get_numeric_data()
return DataFrame(num_data, copy=False)
return DataFrame(num_data, index=self.index, copy=False)
else:
if (self.values.dtype != np.object_ and
not issubclass(self.values.dtype.type, np.datetime64)):
Expand All @@ -4991,7 +4991,7 @@ def _get_numeric_data(self):
def _get_bool_data(self):
if self._is_mixed_type:
bool_data = self._data.get_bool_data()
return DataFrame(bool_data, copy=False)
return DataFrame(bool_data, index=self.index, copy=False)
else: # pragma: no cover
if self.values.dtype == np.bool_:
return self
Expand Down
58 changes: 47 additions & 11 deletions pandas/core/series.py
Original file line number Diff line number Diff line change
Expand Up @@ -72,17 +72,28 @@ def na_op(x, y):

def wrapper(self, other):
from pandas.core.frame import DataFrame
dtype = None
wrap_results = lambda x: x

lvalues, rvalues = self, other

if (com.is_datetime64_dtype(self) and
com.is_datetime64_dtype(other)):
if com.is_datetime64_dtype(self):

if not isinstance(rvalues, np.ndarray):
rvalues = np.array([rvalues])

# rhs is either a timedelta or a series/ndarray
if lib.is_timedelta_array(rvalues):
rvalues = np.array([ np.timedelta64(v) for v in rvalues ],dtype='timedelta64[ns]')
dtype = 'M8[ns]'
elif com.is_datetime64_dtype(rvalues):
dtype = 'timedelta64[ns]'
else:
raise ValueError("cannot operate on a series with out a rhs of a series/ndarray of type datetime64[ns] or a timedelta")

lvalues = lvalues.view('i8')
rvalues = rvalues.view('i8')

wrap_results = lambda rs: rs.astype('timedelta64[ns]')

if isinstance(rvalues, Series):
lvalues = lvalues.values
rvalues = rvalues.values
Expand All @@ -91,7 +102,7 @@ def wrapper(self, other):
if self.index.equals(other.index):
name = _maybe_match_name(self, other)
return Series(wrap_results(na_op(lvalues, rvalues)),
index=self.index, name=name)
index=self.index, name=name, dtype=dtype)

join_idx, lidx, ridx = self.index.join(other.index, how='outer',
return_indexers=True)
Expand All @@ -105,13 +116,13 @@ def wrapper(self, other):
arr = na_op(lvalues, rvalues)

name = _maybe_match_name(self, other)
return Series(arr, index=join_idx, name=name)
return Series(arr, index=join_idx, name=name,dtype=dtype)
elif isinstance(other, DataFrame):
return NotImplemented
else:
# scalars
return Series(na_op(lvalues.values, rvalues),
index=self.index, name=self.name)
index=self.index, name=self.name, dtype=dtype)
return wrapper


Expand Down Expand Up @@ -777,7 +788,7 @@ def astype(self, dtype):
See numpy.ndarray.astype
"""
casted = com._astype_nansafe(self.values, dtype)
return self._constructor(casted, index=self.index, name=self.name)
return self._constructor(casted, index=self.index, name=self.name, dtype=casted.dtype)

def convert_objects(self, convert_dates=True):
"""
Expand Down Expand Up @@ -1195,7 +1206,7 @@ def tolist(self):
Overrides numpy.ndarray.tolist
"""
if com.is_datetime64_dtype(self):
return self.astype(object).values.tolist()
return list(self)
return self.values.tolist()

def to_dict(self):
Expand Down Expand Up @@ -3083,8 +3094,12 @@ def _try_cast(arr):
raise TypeError('Cannot cast datetime64 to %s' % dtype)
else:
subarr = _try_cast(data)
elif copy:
else:
subarr = _try_cast(data)

if copy:
subarr = data.copy()

elif isinstance(data, list) and len(data) > 0:
if dtype is not None:
try:
Expand All @@ -3094,12 +3109,15 @@ def _try_cast(arr):
raise
subarr = np.array(data, dtype=object, copy=copy)
subarr = lib.maybe_convert_objects(subarr)
subarr = com._possibly_cast_to_datetime(subarr, dtype)
else:
subarr = lib.list_to_object_array(data)
subarr = lib.maybe_convert_objects(subarr)
subarr = com._possibly_cast_to_datetime(subarr, dtype)
else:
subarr = _try_cast(data)

# scalar like
if subarr.ndim == 0:
if isinstance(data, list): # pragma: no cover
subarr = np.array(data, dtype=object)
Expand All @@ -3115,7 +3133,14 @@ def _try_cast(arr):
dtype = np.object_

if dtype is None:
value, dtype = _dtype_from_scalar(value)

# a 1-element ndarray
if isinstance(value, np.ndarray):
dtype = value.dtype
value = value.item()
else:
value, dtype = _dtype_from_scalar(value)

subarr = np.empty(len(index), dtype=dtype)
else:
# need to possibly convert the value here
Expand All @@ -3124,6 +3149,17 @@ def _try_cast(arr):
subarr.fill(value)
else:
return subarr.item()

# the result that we want
elif subarr.ndim == 1:
if index is not None:

# a 1-element ndarray
if len(subarr) != len(index) and len(subarr) == 1:
value = subarr[0]
subarr = np.empty(len(index), dtype=subarr.dtype)
subarr.fill(value)

elif subarr.ndim > 1:
if isinstance(data, np.ndarray):
raise Exception('Data must be 1-dimensional')
Expand Down
11 changes: 11 additions & 0 deletions pandas/src/inference.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -265,6 +265,17 @@ def is_datetime64_array(ndarray values):
return False
return True

def is_timedelta_array(ndarray values):
import datetime
cdef int i, n = len(values)
if n == 0:
return False
for i in range(n):
if not isinstance(values[i],datetime.timedelta):
return False
return True


def is_date_array(ndarray[object] values):
cdef int i, n = len(values)
if n == 0:
Expand Down
39 changes: 36 additions & 3 deletions pandas/tests/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,6 @@ def _skip_if_no_scipy():

JOIN_TYPES = ['inner', 'outer', 'left', 'right']


class CheckIndexing(object):

_multiprocess_can_split_ = True
Expand Down Expand Up @@ -6484,14 +6483,18 @@ def test_get_X_columns(self):
['a', 'e']))

def test_get_numeric_data(self):
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo'},

df = DataFrame({'a': 1., 'b': 2, 'c': 'foo', 'f' : Timestamp('20010102')},
index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 1, 'float64' : 1, 'datetime64[ns]': 1, 'object' : 1})
assert_series_equal(result, expected)

result = df._get_numeric_data()
expected = df.ix[:, ['a', 'b']]
assert_frame_equal(result, expected)

only_obj = df.ix[:, ['c']]
only_obj = df.ix[:, ['c','f']]
result = only_obj._get_numeric_data()
expected = df.ix[:, []]
assert_frame_equal(result, expected)
Expand Down Expand Up @@ -7367,6 +7370,36 @@ def test_as_matrix_numeric_cols(self):
values = self.frame.as_matrix(['A', 'B', 'C', 'D'])
self.assert_(values.dtype == np.float64)


def test_constructor_with_datetimes(self):

# single item
df = DataFrame({'A' : 1, 'B' : 'foo', 'C' : 'bar', 'D' : Timestamp("20010101"), 'E' : datetime(2001,1,2,0,0) },
index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 1, 'datetime64[ns]': 2, 'object' : 2})
assert_series_equal(result, expected)

# check with ndarray construction ndim==0 (e.g. we are passing a ndim 0 ndarray with a dtype specified)
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo', 'float64' : np.array(1.,dtype='float64'),
'int64' : np.array(1,dtype='int64')}, index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 2, 'float64' : 2, 'object' : 1})
assert_series_equal(result, expected)

# check with ndarray construction ndim>0
df = DataFrame({'a': 1., 'b': 2, 'c': 'foo', 'float64' : np.array([1.]*10,dtype='float64'),
'int64' : np.array([1]*10,dtype='int64')}, index=np.arange(10))
result = df.get_dtype_counts()
expected = Series({'int64': 2, 'float64' : 2, 'object' : 1})
assert_series_equal(result, expected)

# GH #2751 (construction with no index specified)
df = DataFrame({'a':[1,2,4,7], 'b':[1.2, 2.3, 5.1, 6.3], 'c':list('abcd'), 'd':[datetime(2000,1,1) for i in range(4)] })
result = df.get_dtype_counts()
expected = Series({'int64': 1, 'float64' : 1, 'datetime64[ns]': 1, 'object' : 1})
assert_series_equal(result, expected)

def test_constructor_frame_copy(self):
cop = DataFrame(self.frame, copy=True)
cop['A'] = 5
Expand Down
Loading