Commit 9faa8de

Merge branch 'dtypes_bug' of https://github.com/jreback/pandas into jreback-dtypes_bug
Conflicts: pandas/tests/test_frame.py
2 parents d44e9c7 + cb56c98 commit 9faa8de

13 files changed

+391
-205
lines changed

RELEASE.rst

+3
@@ -53,6 +53,9 @@ pandas 0.11.0
 
   - Do not automatically upcast numeric specified dtypes to ``int64`` or
     ``float64`` (GH622_ and GH797_)
+  - DataFrame construction of lists and scalars, with no dtype present, will
+    result in casting to ``int64`` or ``float64``, regardless of platform.
+    This is not an apparent change in the API, but noting it.
   - Guarantee that ``convert_objects()`` for Series/DataFrame always returns a
     copy
   - groupby operations will respect dtypes for numeric float operations
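
Note on the RELEASE.rst entry above: the following is a minimal sketch of the behaviour it describes, written against the present-day `pandas`/`numpy` import style rather than the 0.11.0 code in this commit. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Lists and scalars with no explicit dtype: construction falls back to
# 64-bit defaults, independent of the platform's native integer width.
from_list = pd.DataFrame([1, 2], columns=["a"])
from_floats = pd.DataFrame({"a": [1.5, 2.5]})
from_scalar = pd.DataFrame({"a": 1}, index=range(2))

# A dtype that is specified (here through the ndarray) is kept as given.
from_array = pd.DataFrame(np.array([1, 2], dtype=np.int32), columns=["a"])

for name, frame in [("list", from_list), ("floats", from_floats),
                    ("scalar", from_scalar), ("int32 array", from_array)]:
    print(name, frame.dtypes["a"])
```

The last frame is the case the release note is careful about: only inputs without a dtype are widened.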

doc/source/v0.11.0.txt

+57 -25
@@ -3,7 +3,7 @@
 v0.11.0 (March ??, 2013)
 ------------------------
 
-This is a minor release from 0.10.1 and includes many new features and
+This is a major release from 0.10.1 and includes many new features and
 enhancements along with a large number of bug fixes. There are also a number of
 important API changes that long-time pandas users should pay close attention
 to.
@@ -13,7 +13,8 @@ API changes
 
 Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``, then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
 
-**Dtype Specification**
+Dtype Specification
+~~~~~~~~~~~~~~~~~~~
 
 .. ipython:: python
 
@@ -29,7 +30,8 @@ Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passe
    df3
    df3.dtypes
 
-**Dtype conversion**
+Dtype Conversion
+~~~~~~~~~~~~~~~~
 
 .. ipython:: python
 
@@ -54,6 +56,26 @@ Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passe
                Timestamp('20010104'), '20010105'],dtype='O')
    s.convert_objects(convert_dates='coerce')
 
+Dtype Gotchas
+~~~~~~~~~~~~~
+
+**Platform Gotchas**
+
+Starting in 0.11.0, construction of DataFrame/Series will use default dtypes of ``int64`` and ``float64``,
+*regardless of platform*. This is not an apparent change from earlier versions of pandas. If you specify
+dtypes, they *WILL* be respected, however (GH2837_)
+
+The following will all result in ``int64`` dtypes
+
+.. ipython:: python
+
+   DataFrame([1,2],columns=['a']).dtypes
+   DataFrame({'a' : [1,2] }).dtypes
+   DataFrame({'a' : 1 }, index=range(2)).dtypes
+
+Keep in mind that ``DataFrame(np.array([1,2]))`` **WILL** result in ``int32`` on 32-bit platforms!
+
+
 **Upcasting Gotchas**
 
 Performing indexing operations on integer type data can easily upcast the data.
@@ -82,21 +104,13 @@ While float dtypes are unchanged.
    casted
    casted.dtypes
 
-New features
-~~~~~~~~~~~~
-
-**Enhancements**
+Datetimes Conversion
+~~~~~~~~~~~~~~~~~~~~
 
-  - In ``HDFStore``, provide dotted attribute access to ``get`` from stores (e.g. store.df == store['df'])
-
-**Bug Fixes**
-
-See the `full release notes
-<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
-on GitHub for a complete list.
-
-
-Datetime64[ns] columns in a DataFrame (or a Series) allow the use of ``np.nan`` to indicate a nan value, in addition to the traditional ``NaT``, or not-a-time. This allows convenient nan setting in a generic way. Furthermore datetime64 columns are created by default, when passed datetimelike objects (*this change was introduced in 0.10.1*)
+Datetime64[ns] columns in a DataFrame (or a Series) allow the use of ``np.nan`` to indicate a nan value,
+in addition to the traditional ``NaT``, or not-a-time. This allows convenient nan setting in a generic way.
+Furthermore ``datetime64[ns]`` columns are created by default, when passed datetimelike objects (*this change was introduced in 0.10.1*)
+(GH2809_, GH2810_)
 
 .. ipython:: python
 
@@ -111,8 +125,7 @@ Datetime64[ns] columns in a DataFrame (or a Series) allow the use of ``np.nan``
    df.ix[2:4,['A','timestamp']] = np.nan
    df
 
-Astype conversion on datetime64[ns] to object, implicity converts ``NaT`` to ``np.nan``
-
+Astype conversion on ``datetime64[ns]`` to ``object``, implicity converts ``NaT`` to ``np.nan``
 
 .. ipython:: python
 
@@ -127,13 +140,32 @@ Astype conversion on datetime64[ns] to object, implicity converts ``NaT`` to ``n
    s.dtype
 
 
-``Squeeze`` to possibly remove length 1 dimensions from an object.
+New features
+~~~~~~~~~~~~
 
-.. ipython:: python
+**Enhancements**
+
+  - In ``HDFStore``, provide dotted attribute access to ``get`` from stores
+    (e.g. store.df == store['df'])
+
+  - ``Squeeze`` to possibly remove length 1 dimensions from an object.
 
-   p = Panel(randn(3,4,4),items=['ItemA','ItemB','ItemC'],
+    .. ipython:: python
+
+       p = Panel(randn(3,4,4),items=['ItemA','ItemB','ItemC'],
              major_axis=date_range('20010102',periods=4),
              minor_axis=['A','B','C','D'])
-   p
-   p.reindex(items=['ItemA']).squeeze()
-   p.reindex(items=['ItemA'],minor=['B']).squeeze()
+       p
+       p.reindex(items=['ItemA']).squeeze()
+       p.reindex(items=['ItemA'],minor=['B']).squeeze()
+
+**Bug Fixes**
+
+See the `full release notes
+<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
+on GitHub for a complete list.
+
+.. _GH2809: https://github.com/pydata/pandas/issues/2809
+.. _GH2810: https://github.com/pydata/pandas/issues/2810
+.. _GH2837: https://github.com/pydata/pandas/issues/2837
+
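
Note on the "Datetimes Conversion" text above: a small sketch of assigning ``np.nan`` into a datetime column, assuming a recent pandas release (the `.ix` indexer used in the original doc example no longer exists, so `.iloc` is used here instead). It only shows that the assigned values are stored as missing timestamps and the column stays datetime-typed.

```python
import numpy as np
import pandas as pd

# A datetime-typed Series built from datetimelike input.
stamps = pd.Series(pd.date_range("2001-01-02", periods=4))

# Assign the generic missing-value marker; pandas stores it as NaT.
stamps.iloc[1:3] = np.nan

print(stamps)
print(stamps.isna().tolist())
```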

pandas/core/common.py

+104 -53
@@ -24,6 +24,7 @@
 from pandas.util.py3compat import StringIO, BytesIO
 
 from pandas.core.config import get_option
+from pandas.core import array as pa
 
 # XXX: HACK for NumPy 1.5.1 to suppress warnings
 try:
@@ -503,7 +504,7 @@ def take_1d(arr, indexer, out=None, fill_value=np.nan):
         dtype, fill_value = arr.dtype, arr.dtype.type()
     else:
         indexer = _ensure_int64(indexer)
-        dtype = _maybe_promote(arr.dtype, fill_value)
+        dtype = _maybe_promote(arr.dtype, fill_value)[0]
         if dtype != arr.dtype:
             mask = indexer == -1
             needs_masking = mask.any()
@@ -551,7 +552,7 @@ def take_2d_multi(arr, row_idx, col_idx, fill_value=np.nan, out=None):
     else:
         col_idx = _ensure_int64(col_idx)
 
-    dtype = _maybe_promote(arr.dtype, fill_value)
+    dtype = _maybe_promote(arr.dtype, fill_value)[0]
     if dtype != arr.dtype:
         row_mask = row_idx == -1
         col_mask = col_idx == -1
@@ -587,7 +588,7 @@ def diff(arr, n, axis=0):
     n = int(n)
     dtype = arr.dtype
     if issubclass(dtype.type, np.integer):
-        dtype = np.float_
+        dtype = np.float64
     elif issubclass(dtype.type, np.bool_):
         dtype = np.object_
 
@@ -628,7 +629,7 @@ def take_fast(arr, indexer, mask, needs_masking, axis=0, out=None,
     else:
         indexer = _ensure_int64(indexer)
         if needs_masking:
-            dtype = _maybe_promote(arr.dtype, fill_value)
+            dtype = _maybe_promote(arr.dtype, fill_value)[0]
             if dtype != arr.dtype and out is not None and out.dtype != dtype:
                 raise Exception('Incompatible type for fill_value')
         else:
@@ -644,49 +645,110 @@ def take_fast(arr, indexer, mask, needs_masking, axis=0, out=None,
     return out
 
 
+def _infer_dtype_from_scalar(val):
+    """ interpret the dtype from a scalar, upcast floats and ints
+        return the new value and the dtype """
+
+    dtype = np.object_
+
+    # a 1-element ndarray
+    if isinstance(val, pa.Array):
+        if val.ndim != 0:
+            raise ValueError("invalid ndarray passed to _infer_dtype_from_scalar")
+
+        dtype = val.dtype
+        val = val.item()
+
+    elif isinstance(val, basestring):
+
+        # If we create an empty array using a string to infer
+        # the dtype, NumPy will only allocate one character per entry
+        # so this is kind of bad. Alternately we could use np.repeat
+        # instead of np.empty (but then you still don't want things
+        # coming out as np.str_!
+
+        dtype = np.object_
+
+    elif isinstance(val, np.datetime64):
+        # ugly hacklet
+        val = lib.Timestamp(val).value
+        dtype = np.dtype('M8[ns]')
+
+    elif is_bool(val):
+        dtype = np.bool_
+
+    # provide implicity upcast on scalars
+    elif is_integer(val):
+        dtype = np.int64
+
+    elif is_float(val):
+        dtype = np.float64
+
+    elif is_complex(val):
+        dtype = np.complex_
+
+    return dtype, val
+
 def _maybe_promote(dtype, fill_value=np.nan):
+    # returns tuple of (dtype, fill_value)
     if issubclass(dtype.type, np.datetime64):
-        # for now: refuse to upcast
+        # for now: refuse to upcast datetime64
         # (this is because datetime64 will not implicitly upconvert
         #  to object correctly as of numpy 1.6.1)
-        return dtype
+        if isnull(fill_value):
+            fill_value = tslib.iNaT
+        else:
+            try:
+                fill_value = lib.Timestamp(fill_value).value
+            except:
+                # the proper thing to do here would probably be to upcast to
+                # object (but numpy 1.6.1 doesn't do this properly)
+                fill_value = tslib.iNaT
     elif is_float(fill_value):
         if issubclass(dtype.type, np.bool_):
-            return np.object_
+            dtype = np.object_
         elif issubclass(dtype.type, np.integer):
-            return np.float_
-        return dtype
+            dtype = np.float64
     elif is_bool(fill_value):
-        if issubclass(dtype.type, np.bool_):
-            return dtype
-        return np.object_
+        if not issubclass(dtype.type, np.bool_):
+            dtype = np.object_
     elif is_integer(fill_value):
         if issubclass(dtype.type, np.bool_):
-            return np.object_
+            dtype = np.object_
         elif issubclass(dtype.type, np.integer):
             # upcast to prevent overflow
             arr = np.asarray(fill_value)
             if arr != arr.astype(dtype):
-                return arr.dtype
-            return dtype
-        return dtype
+                dtype = arr.dtype
     elif is_complex(fill_value):
         if issubclass(dtype.type, np.bool_):
-            return np.object_
+            dtype = np.object_
         elif issubclass(dtype.type, (np.integer, np.floating)):
-            return np.complex_
-        return dtype
-    return np.object_
+            dtype = np.complex128
+    else:
+        dtype = np.object_
+    return dtype, fill_value
 
+def _maybe_upcast(values, fill_value=np.nan, copy=False):
+    """ provide explicty type promotion and coercion
+        if copy == True, then a copy is created even if no upcast is required """
+
+    new_dtype, fill_value = _maybe_promote(values.dtype, fill_value)
+    if new_dtype != values.dtype:
+        values = values.astype(new_dtype)
+    elif copy:
+        values = values.copy()
+    return values, fill_value
+
+def _possibly_cast_item(obj, item, dtype):
+    chunk = obj[item]
+
+    if chunk.values.dtype != dtype:
+        if dtype in (np.object_, np.bool_):
+            obj[item] = chunk.astype(np.object_)
+        elif not issubclass(dtype, (np.integer, np.bool_)):  # pragma: no cover
+            raise ValueError("Unexpected dtype encountered: %s" % dtype)
 
-def _maybe_upcast(values):
-    # TODO: convert remaining usage of _maybe_upcast to _maybe_promote
-    if issubclass(values.dtype.type, np.integer):
-        values = values.astype(np.float_)
-    elif issubclass(values.dtype.type, np.bool_):
-        values = values.astype(np.object_)
-    return values
-
 
 def _interp_wrapper(f, wrap_dtype, na_override=None):
     def wrapper(arr, mask, limit=None):
@@ -808,7 +870,8 @@ def _consensus_name_attr(objs):
 def _possibly_convert_objects(values, convert_dates=True, convert_numeric=True):
     """ if we have an object dtype, try to coerce dates and/or numers """
 
-    if values.dtype == np.object_ and convert_dates:
+    # convert dates
+    if convert_dates and values.dtype == np.object_:
 
         # we take an aggressive stance and convert to datetime64[ns]
         if convert_dates == 'coerce':
821884
else:
822885
values = lib.maybe_convert_objects(values, convert_datetime=convert_dates)
823886

824-
if values.dtype == np.object_ and convert_numeric:
887+
# convert to numeric
888+
if convert_numeric and values.dtype == np.object_:
825889
try:
826890
new_values = lib.maybe_convert_numeric(values,set(),coerce_numeric=True)
827891

@@ -834,6 +898,16 @@ def _possibly_convert_objects(values, convert_dates=True, convert_numeric=True):
 
     return values
 
+def _possibly_convert_platform(values):
+    """ try to do platform conversion, allow ndarray or list here """
+
+    if isinstance(values, (list,tuple)):
+        values = lib.list_to_object_array(values)
+    if values.dtype == np.object_:
+        values = lib.maybe_convert_objects(values)
+
+    return values
+
 
 def _possibly_cast_to_datetime(value, dtype, coerce = False):
     """ try to cast the array/value to a datetimelike dtype, converting float nan to iNaT """
@@ -876,29 +950,6 @@ def _possibly_cast_to_datetime(value, dtype, coerce = False):
     return value
 
 
-def _infer_dtype(value):
-    if isinstance(value, (float, np.floating)):
-        return np.float_
-    elif isinstance(value, (bool, np.bool_)):
-        return np.bool_
-    elif isinstance(value, (int, long, np.integer)):
-        return np.int_
-    elif isinstance(value, (complex, np.complexfloating)):
-        return np.complex_
-    else:
-        return np.object_
-
-
-def _possibly_cast_item(obj, item, dtype):
-    chunk = obj[item]
-
-    if chunk.values.dtype != dtype:
-        if dtype in (np.object_, np.bool_):
-            obj[item] = chunk.astype(np.object_)
-        elif not issubclass(dtype, (np.integer, np.bool_)):  # pragma: no cover
-            raise ValueError("Unexpected dtype encountered: %s" % dtype)
-
-
 def _is_bool_indexer(key):
     if isinstance(key, np.ndarray) and key.dtype == np.object_:
         key = np.asarray(key)
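
Note on the `_maybe_promote` / `_maybe_upcast` change above: the central API change in this file is that promotion now returns a `(dtype, fill_value)` tuple, which is why the `take_*` callers index it with `[0]`. The following standalone sketch illustrates that contract using only NumPy. The names `maybe_promote_sketch` and `maybe_upcast_sketch` are hypothetical, the datetime64 branch is omitted, and integer overflow is simplified to "widen to int64", so this is not the pandas implementation.

```python
import numpy as np


def maybe_promote_sketch(dtype, fill_value=np.nan):
    """Return (dtype, fill_value), widening dtype so it can hold fill_value."""
    dtype = np.dtype(dtype)
    if isinstance(fill_value, (float, np.floating)):
        if dtype.kind == "b":
            dtype = np.dtype(object)
        elif dtype.kind in "iu":
            dtype = np.dtype(np.float64)
    elif isinstance(fill_value, (bool, np.bool_)):
        # bool is checked before int because bool is a subclass of int
        if dtype.kind != "b":
            dtype = np.dtype(object)
    elif isinstance(fill_value, (int, np.integer)):
        if dtype.kind == "b":
            dtype = np.dtype(object)
        elif dtype.kind in "iu":
            # Simplification (assumption): widen straight to int64 on overflow.
            info = np.iinfo(dtype)
            if not info.min <= fill_value <= info.max:
                dtype = np.dtype(np.int64)
    elif isinstance(fill_value, (complex, np.complexfloating)):
        if dtype.kind == "b":
            dtype = np.dtype(object)
        elif dtype.kind in "iuf":
            dtype = np.dtype(np.complex128)
    else:
        dtype = np.dtype(object)
    return dtype, fill_value


def maybe_upcast_sketch(values, fill_value=np.nan, copy=False):
    """Cast values to the promoted dtype; copy only when asked or needed."""
    new_dtype, fill_value = maybe_promote_sketch(values.dtype, fill_value)
    if new_dtype != values.dtype:
        values = values.astype(new_dtype)
    elif copy:
        values = values.copy()
    return values, fill_value


values, fill = maybe_upcast_sketch(np.array([1, 2, 3], dtype=np.int32))
print(values.dtype, fill)  # float64 nan
```

Returning the fill value alongside the dtype lets a caller that needs both get them from one call (in the real code, the datetime64 branch rewrites the fill value to `iNaT`), while callers that only need the dtype take element `[0]`.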
