Commit 166a80d

ENH: allow propagation and coexistence of numeric dtypes (closes GH #622)
construction of multi numeric dtypes with other types in a dict
validated get_numeric_data returns correct dtypes
added blocks attribute (and as_blocks()) method that returns a dict of dtype -> homogeneous Frame to DataFrame
added keyword 'raise_on_error' to astype, which can be set to false to exclude non-numeric columns
fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
changed implementation of get_dtype_counts() to use .blocks
revised DataFrame.convert_objects to use blocks to be more efficient
added Dtype printing to show on default with a Series
added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
where can upcast integer to float as needed (on inplace ops #2793)
added fully cythonized support for int8/int16
no support for float16 (it can exist, but no cython methods for it)

TST:
fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
test updates for merging (multi-dtypes)
added tests for replace (but skipped for now, algos not set for float32/16)
tests for astype and convert in internals
fixes for test_excel on 32-bit
fixed test_resample_median_bug_1688 I believe
separated out test_from_records_dictlike
testing of panel constructors (GH #797)
where ops now have a full test suite
allow slightly less sensitive decimal tests for less precise dtypes

BUG:
fixed GH #2778, fillna on empty frame causes seg fault
fixed bug in groupby where types were not being casted to original dtype
respect the dtype of non-natural numeric (Decimal)
don't upcast ints/bools to floats (if you say we're agging on len, you can get an int)

DOC:
added astype conversion examples to whatsnew and docs (dsintro)
updated RELEASE notes
whatsnew for 0.10.2
added upcasting gotchas docs

CLN:
updated convert_objects to be more consistent across frame/series
moved most groupby functions out of algos.pyx to generated.pyx
fully support cython functions for pad/bfill/take/diff/groupby for float32
moved more block-like conversion loops from frame.py to internals.py (created apply method) (e.g. diff,fillna,where,shift,replace,interpolate,combining), to top-level methods in BlockManager
1 parent 3ba3119 commit 166a80d

37 files changed, +9634 -3178 lines changed

RELEASE.rst

+38-1
Original file line number | Diff line number | Diff line change
@@ -22,6 +22,42 @@ Where to get it
2222
* Binary installers on PyPI: http://pypi.python.org/pypi/pandas
2323
* Documentation: http://pandas.pydata.org
2424

25+
pandas 0.10.2
26+
=============
27+
28+
**Release date:** 2013-??-??
29+
30+
**New features**
31+
32+
- Allow mixed dtypes (e.g. ``float32/float64/int32/int16/int8``) to coexist in DataFrames and propagate in operations
33+
34+
**Improvements to existing features**
35+
36+
- added ``blocks`` attribute to DataFrames, to return a dict of dtypes to homogeneously dtyped DataFrames
37+
- added keyword ``convert_numeric`` to ``convert_objects()`` to try to convert object dtypes to numeric types
38+
- ``convert_dates`` in ``convert_objects`` can now be ``coerce`` which will return a datetime64[ns] dtype
39+
with non-convertibles set as ``NaT``; will preserve an all-nan object (e.g. strings)
40+
- Series print output now includes the dtype by default
41+
42+
**API Changes**
43+
44+
- Do not automatically upcast numeric specified dtypes to ``int64`` or ``float64`` (GH622_ and GH797_)
45+
- Guarantee that ``convert_objects()`` for Series/DataFrame always returns a copy
46+
- groupby operations will respect dtypes for numeric float operations (float32/float64); other types will be operated on,
47+
and will try to cast back to the input dtype (e.g. if an int is passed, as long as the output doesn't have nans,
48+
then an int will be returned)
49+
- backfill/pad/take/diff/ohlc will now support ``float32/int16/int8`` operations
50+
- Integer block types will upcast as needed in where operations (GH2793_)
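
The cast-back rule in the groupby entry above can be sketched outside pandas. This is only an illustration with plain NumPy: the helper name ``cast_back`` is hypothetical, and the extra check that the cast loses no information is an assumption of this sketch, not something the release notes state.

```python
import numpy as np


def cast_back(result, original_dtype):
    """Hypothetical helper: return `result` as `original_dtype` when that is safe."""
    result = np.asarray(result)
    # NaN cannot be stored in an integer array, so keep the float result.
    if np.isnan(result).any():
        return result
    candidate = result.astype(original_dtype)
    # Assumption of this sketch: only cast back when no value changes.
    if np.array_equal(candidate, result):
        return candidate
    return result


sums = cast_back(np.array([3.0, 7.0]), np.int64)       # whole numbers, no NaN
means = cast_back(np.array([1.5, 2.0]), np.int64)      # fractional values
gaps = cast_back(np.array([1.0, np.nan]), np.int64)    # contains NaN
print(sums.dtype, means.dtype, gaps.dtype)
```

Only the first result returns to the integer dtype; the other two stay floating point.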
51+
52+
**Bug Fixes**
53+
54+
- Fix seg fault on empty data frame when fillna with ``pad`` or ``backfill`` (GH2778_)
55+
56+
.. _GH622: https://github.com/pydata/pandas/issues/622
57+
.. _GH797: https://github.com/pydata/pandas/issues/797
58+
.. _GH2778: https://github.com/pydata/pandas/issues/2778
59+
.. _GH2793: https://github.com/pydata/pandas/issues/2793
60+
2561
pandas 0.10.1
2662
=============
2763

@@ -36,6 +72,7 @@ pandas 0.10.1
3672
- Restored inplace=True behavior returning self (same object) with
3773
deprecation warning until 0.11 (GH1893_)
3874
- ``HDFStore``
75+
3976
- refactored HFDStore to deal with non-table stores as objects, will allow future enhancements
4077
- removed keyword ``compression`` from ``put`` (replaced by keyword
4178
``complib`` to be consistent across library)
@@ -49,7 +86,7 @@ pandas 0.10.1
4986
- support data column indexing and selection, via ``data_columns`` keyword in append
5087
- support write chunking to reduce memory footprint, via ``chunksize``
5188
keyword to append
52-
- support automagic indexing via ``index`` keywork to append
89+
- support automagic indexing via ``index`` keyword to append
5390
- support ``expectedrows`` keyword in append to inform ``PyTables`` about
5491
the expected tablesize
5592
- support ``start`` and ``stop`` keywords in select to limit the row

doc/source/dsintro.rst

+90-24
@@ -450,15 +450,101 @@ DataFrame:
450450
df.xs('b')
451451
df.ix[2]
452452
453-
Note if a DataFrame contains columns of multiple dtypes, the dtype of the row
454-
will be chosen to accommodate all of the data types (dtype=object is the most
455-
general).
456-
457453
For a more exhaustive treatment of more sophisticated label-based indexing and
458454
slicing, see the :ref:`section on indexing <indexing>`. We will address the
459455
fundamentals of reindexing / conforming to new sets of lables in the
460456
:ref:`section on reindexing <basics.reindexing>`.
461457

458+
DataTypes
459+
~~~~~~~~~
460+
461+
.. _dsintro.column_types:
462+
463+
The main types stored in pandas objects are float, int, boolean, datetime64[ns],
464+
and object. A convenient ``dtypes`` attribute returns a Series with the data type of
465+
each column.
466+
467+
.. ipython:: python
468+
469+
df['integer'] = 1
470+
df['int32'] = df['integer'].astype('int32')
471+
df['float32'] = Series([1.0]*len(df),dtype='float32')
472+
df['timestamp'] = Timestamp('20010102')
473+
df.dtypes
474+
475+
If a DataFrame contains columns of multiple dtypes, the dtype of the column
476+
will be chosen to accommodate all of the data types (dtype=object is the most
477+
general).
478+
479+
The related method ``get_dtype_counts`` will return the number of columns of
480+
each type:
481+
482+
.. ipython:: python
483+
484+
df.get_dtype_counts()
485+
486+
Numeric dtypes will propagate and can coexist in DataFrames (starting in v0.10.2).
487+
If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
488+
or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
489+
490+
.. ipython:: python
491+
492+
df1 = DataFrame(randn(8, 1), columns = ['A'], dtype = 'float32')
493+
df1
494+
df1.dtypes
495+
df2 = DataFrame(dict( A = Series(randn(8),dtype='float16'),
496+
B = Series(randn(8)),
497+
C = Series(np.array(randn(8),dtype='uint8')) ))
498+
df2
499+
df2.dtypes
500+
501+
# here you get some upcasting
502+
df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
503+
df3
504+
df3.dtypes
505+
506+
# this is lowest-common-denominator upcasting (meaning you get the dtype which can accommodate all of the types)
507+
df3.values.dtype
508+
509+
Upcasting is always according to the **numpy** rules. If two different dtypes are involved in an operation, then the more *general* one will be used as the result of the operation.
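
As a rough illustration of the NumPy promotion rules mentioned above, the sketch below asks NumPy which dtype results when two dtypes meet. It relies only on ``np.promote_types``; the helper name ``promoted`` is made up for this example and is not part of pandas.

```python
import numpy as np


def promoted(left, right):
    """Name of the smallest dtype that can safely hold both inputs."""
    return np.promote_types(left, right).name


# Pairs loosely modelled on the float16/float32/float64/uint8 columns above.
for left, right in [("float16", "float32"),
                    ("float32", "float64"),
                    ("uint8", "float32"),
                    ("int32", "float32")]:
    print(left, "+", right, "->", promoted(left, right))

# The same rule applies to array arithmetic.
mixed = np.ones(3, dtype="float32") + np.ones(3, dtype="float64")
print(mixed.dtype)
```

Note that ``int32`` combined with ``float32`` promotes to ``float64``, because ``float32`` cannot represent every ``int32`` value exactly.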
510+
511+
DataType Conversion
512+
~~~~~~~~~~~~~~~~~~~
513+
514+
You can use the ``astype`` method to convert dtypes from one to another. These *always* return a copy.
515+
In addition, ``convert_objects`` will attempt a *soft* conversion of any *object* dtypes, meaning that if all the objects in a Series are of the same type, the Series
516+
will have that dtype.
517+
518+
.. ipython:: python
519+
520+
df3
521+
df3.dtypes
522+
523+
# conversion of dtypes
524+
df3.astype('float32').dtypes
525+
526+
To force conversion to numeric types, pass ``convert_numeric = True``.
527+
This will force strings and numbers alike to be numbers if possible, otherwise they will be set to ``np.nan``.
528+
To force conversion to ``datetime64[ns]``, pass ``convert_dates = 'coerce'``.
529+
This will convert any datetimelike object to dates, forcing other values to ``NaT``.
530+
531+
.. ipython:: python
532+
533+
# mixed type conversions
534+
df3['D'] = '1.'
535+
df3['E'] = '1'
536+
df3.convert_objects(convert_numeric=True).dtypes
537+
538+
# same, but specific dtype conversion
539+
df3['D'] = df3['D'].astype('float16')
540+
df3['E'] = df3['E'].astype('int32')
541+
df3.dtypes
542+
543+
# forcing date coercion
544+
s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'],dtype='O')
545+
s
546+
s.convert_objects(convert_dates='coerce')
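
The coercion semantics described for ``convert_numeric=True`` (convert what parses as a number, set the rest to NaN) can be modelled in a few lines of plain Python. This is a sketch of the idea only, not the pandas implementation; ``coerce_numeric`` is a hypothetical name.

```python
import math


def coerce_numeric(values):
    """Hypothetical model: floats where conversion works, NaN elsewhere."""
    out = []
    for value in values:
        try:
            out.append(float(value))
        except (TypeError, ValueError):
            # Not interpretable as a number, so mark it as missing.
            out.append(math.nan)
    return out


converted = coerce_numeric(["1.", "1", "foo", 2, None])
print(converted)
```

The strings ``'1.'`` and ``'1'`` both become ``1.0``, matching the ``D`` and ``E`` columns in the example above, while ``'foo'`` and ``None`` become NaN.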
547+
462548
Data alignment and arithmetic
463549
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
464550

@@ -633,26 +719,6 @@ You can also disable this feature via the ``expand_frame_repr`` option:
633719
reset_option('expand_frame_repr')
634720
635721
636-
DataFrame column types
637-
~~~~~~~~~~~~~~~~~~~~~~
638-
639-
.. _dsintro.column_types:
640-
641-
The four main types stored in pandas objects are float, int, boolean, and
642-
object. A convenient ``dtypes`` attribute return a Series with the data type of
643-
each column:
644-
645-
.. ipython:: python
646-
647-
baseball.dtypes
648-
649-
The related method ``get_dtype_counts`` will return the number of columns of
650-
each type:
651-
652-
.. ipython:: python
653-
654-
baseball.get_dtype_counts()
655-
656722
DataFrame column attribute access and IPython completion
657723
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
658724

doc/source/indexing.rst

+28
@@ -304,6 +304,34 @@ so that the original data can be modified without creating a copy:
304304
305305
df.mask(df >= 0)
306306
307+
Upcasting Gotchas
308+
~~~~~~~~~~~~~~~~~
309+
310+
Performing indexing operations on ``integer`` type data can easily upcast the data to ``floating``.
311+
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (coming soon).
312+
313+
.. ipython:: python
314+
315+
dfi = df.astype('int32')
316+
dfi['E'] = 1
317+
dfi
318+
dfi.dtypes
319+
320+
casted = dfi[dfi>0]
321+
casted
322+
casted.dtypes
323+
324+
Float dtypes, by contrast, are unchanged.
325+
326+
.. ipython:: python
327+
328+
df2 = df.copy()
329+
df2['A'] = df2['A'].astype('float32')
330+
df2.dtypes
331+
332+
casted = df2[df2>0]
333+
casted
334+
casted.dtypes
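
The reason integer data is upcast here while float data is not comes down to NaN being a floating-point value. The NumPy-only sketch below mimics the masking step; ``mask_nonpositive`` is a hypothetical helper, and the choice of ``float64`` as the upcast target is an assumption of this sketch.

```python
import numpy as np


def mask_nonpositive(arr):
    """Replace entries that are not > 0 with NaN, upcasting integers first."""
    if np.issubdtype(arr.dtype, np.integer):
        # Integer arrays cannot hold NaN, so a float copy is required.
        out = arr.astype("float64")
    else:
        # Float arrays already have a NaN representation; keep the dtype.
        out = arr.copy()
    out[~(out > 0)] = np.nan
    return out


from_int = mask_nonpositive(np.array([-1, 2, -3, 4], dtype="int32"))
from_float = mask_nonpositive(np.array([-1.0, 2.0], dtype="float32"))
print(from_int.dtype, from_float.dtype)
```

The ``int32`` input comes back as ``float64``, while the ``float32`` input keeps its dtype.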
307335
308336
Take Methods
309337
~~~~~~~~~~~~

doc/source/v0.10.2.txt

+95
@@ -0,0 +1,95 @@
1+
.. _whatsnew_0102:
2+
3+
v0.10.2 (February ??, 2013)
4+
---------------------------
5+
6+
This is a minor release from 0.10.1 and includes many new features and
7+
enhancements along with a large number of bug fixes. There are also a number of
8+
important API changes that long-time pandas users should pay close attention
9+
to.
10+
11+
API changes
12+
~~~~~~~~~~~
13+
14+
Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
15+
16+
**Dtype Specification**
17+
18+
.. ipython:: python
19+
20+
df1 = DataFrame(randn(8, 1), columns = ['A'], dtype = 'float32')
21+
df1
22+
df1.dtypes
23+
df2 = DataFrame(dict( A = Series(randn(8),dtype='float16'), B = Series(randn(8)), C = Series(randn(8),dtype='uint8') ))
24+
df2
25+
df2.dtypes
26+
27+
# here you get some upcasting
28+
df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
29+
df3
30+
df3.dtypes
31+
32+
**Dtype conversion**
33+
34+
.. ipython:: python
35+
36+
# this is lowest-common-denominator upcasting (meaning you get the dtype which can accommodate all of the types)
37+
df3.values.dtype
38+
39+
# conversion of dtypes
40+
df3.astype('float32').dtypes
41+
42+
# mixed type conversions
43+
df3['D'] = '1.'
44+
df3['E'] = '1'
45+
df3.convert_objects(convert_numeric=True).dtypes
46+
47+
# same, but specific dtype conversion
48+
df3['D'] = df3['D'].astype('float16')
49+
df3['E'] = df3['E'].astype('int32')
50+
df3.dtypes
51+
52+
# forcing date coercion
53+
s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1,
54+
Timestamp('20010104'), '20010105'],dtype='O')
55+
s.convert_objects(convert_dates='coerce')
56+
57+
**Upcasting Gotchas**
58+
59+
Performing indexing operations on integer type data can easily upcast the data.
60+
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (coming soon).
61+
62+
.. ipython:: python
63+
64+
dfi = df3.astype('int32')
65+
dfi['D'] = dfi['D'].astype('int64')
66+
dfi
67+
dfi.dtypes
68+
69+
casted = dfi[dfi>0]
70+
casted
71+
casted.dtypes
72+
73+
Float dtypes, by contrast, are unchanged.
74+
75+
.. ipython:: python
76+
77+
df4 = df3.copy()
78+
df4['A'] = df4['A'].astype('float32')
79+
df4.dtypes
80+
81+
casted = df4[df4>0]
82+
casted
83+
casted.dtypes
84+
85+
New features
86+
~~~~~~~~~~~~
87+
88+
**Enhancements**
89+
90+
**Bug Fixes**
91+
92+
See the `full release notes
93+
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
94+
on GitHub for a complete list.
95+

doc/source/whatsnew.rst

+2
@@ -16,6 +16,8 @@ What's New
1616

1717
These are new features and improvements of note in each release.
1818

19+
.. include:: v0.10.2.txt
20+
1921
.. include:: v0.10.1.txt
2022

2123
.. include:: v0.10.0.txt
