ENH: allow propagation and coexistence of numeric dtypes (closes GH #622)
construction of multi numeric dtypes with other types in a dict
validated get_numeric_data returns correct dtypes
added blocks attribute (and as_blocks() method) that returns a dict of dtype -> homogeneous Frame to DataFrame
added keyword 'raise_on_error' to astype, which can be set to False to exclude non-numeric columns
fixed merging to correctly merge on multiple dtypes with blocks (e.g. float64 and float32 in other merger)
changed implementation of get_dtype_counts() to use .blocks
revised DataFrame.convert_objects to use blocks to be more efficient
added Dtype printing to show by default with a Series
added convert_dates='coerce' option to convert_objects, to force conversions to datetime64[ns]
where can upcast integer to float as needed (on inplace ops #2793)
added fully cythonized support for int8/int16
no support for float16 (it can exist, but no cython methods for it)
TST: fixed test in test_from_records_sequencelike (dict orders can be different on different arch!)
NOTE: using tuples will remove dtype info from the input stream (using a record array is ok though!)
test updates for merging (multi-dtypes)
added tests for replace (but skipped for now, algos not set for float32/16)
tests for astype and convert in internals
fixes for test_excel on 32-bit
fixed test_resample_median_bug_1688, I believe
separated out test_from_records_dictlike
testing of panel constructors (GH #797)
where ops now have a full test suite
allow slightly less sensitive decimal tests for less precise dtypes
BUG: fixed GH #2778, fillna on empty frame causes seg fault
fixed bug in groupby where types were not being cast to original dtype
respect the dtype of non-natural numeric (Decimal)
don't upcast ints/bools to floats (e.g. if you are aggregating on len, you can get an int)
DOC: added astype conversion examples to whatsnew and docs (dsintro)
updated RELEASE notes
whatsnew for 0.10.2
added upcasting gotchas docs
CLN: updated convert_objects to be more consistent across frame/series
moved most groupby functions out of algos.pyx to generated.pyx
fully support cython functions for pad/bfill/take/diff/groupby for float32
moved more block-like conversion loops from frame.py to internals.py (created apply method)
(e.g. diff,fillna,where,shift,replace,interpolate,combining), to top-level methods in BlockManager
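The ``blocks`` idea from the message above (a dict of dtype -> homogeneous frame) can be sketched roughly as follows. This is only an illustration, not the implementation in this commit: ``blocks_by_dtype`` is a hypothetical helper that relies on nothing but ``DataFrame.dtypes``, and it does not call ``blocks`` or ``as_blocks()`` themselves.

```python
import numpy as np
import pandas as pd


def blocks_by_dtype(df):
    """Hypothetical helper: map dtype name -> sub-frame holding the columns of that dtype."""
    columns_by_dtype = {}
    for column, dtype in df.dtypes.items():
        columns_by_dtype.setdefault(str(dtype), []).append(column)
    return {name: df[columns] for name, columns in columns_by_dtype.items()}


# A frame with three distinct numeric dtypes spread over four columns.
df = pd.DataFrame({
    "A": np.array([1.0, 2.0], dtype="float32"),
    "B": np.array([1, 2], dtype="int8"),
    "C": np.array([1.0, 2.0], dtype="float64"),
    "D": np.array([3.0, 4.0], dtype="float64"),
})

blocks = blocks_by_dtype(df)
for name, frame in sorted(blocks.items()):
    print(name, list(frame.columns))
```

Each value in the returned dict is an ordinary DataFrame whose columns all share one dtype, which is the property the commit message describes.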
+If a DataFrame contains columns of multiple dtypes, the dtype of the column
+will be chosen to accommodate all of the data types (dtype=object is the most
+general).
+
+The related method ``get_dtype_counts`` will return the number of columns of
+each type:
+
+.. ipython:: python
+
+   df.get_dtype_counts()
+
+Numeric dtypes will propagate and can coexist in DataFrames (starting in v0.10.2).
+If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``,
+or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
+   # this is lower-common-denominator upcasting (meaning you get the dtype which can accommodate all of the types)
+   df3.values.dtype
+
+Upcasting is always according to the **numpy** rules. If two different dtypes are involved in an operation, then the more *general* one will be used as the result of the operation.
+
+DataType Conversion
+~~~~~~~~~~~~~~~~~~~
+
+You can use the ``astype`` method to convert dtypes from one to another. This *always* returns a copy.
+In addition, ``convert_objects`` will attempt a *soft* conversion of any *object* dtypes, meaning that if all the objects in a Series are of the same type, the Series
+will have that dtype.
+
+.. ipython:: python
+
+   df3
+   df3.dtypes
+
+   # conversion of dtypes
+   df3.astype('float32').dtypes
+
+To force numeric conversion, pass ``convert_numeric = True``.
+This will force strings and numbers alike to be numbers if possible; otherwise they will be set to ``np.nan``.
+To force conversion to ``datetime64[ns]``, pass ``convert_dates = 'coerce'``.
+This will convert any datetimelike object to dates, forcing other values to ``NaT``.
+
+.. ipython:: python
+
+   # mixed type conversions
+   df3['D'] = '1.'
+   df3['E'] = '1'
+   df3.convert_objects(convert_numeric=True).dtypes
+
+   # same, but specific dtype conversion
+   df3['D'] = df3['D'].astype('float16')
+   df3['E'] = df3['E'].astype('int32')
+   df3.dtypes
+
+   # forcing date coercion
+   s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'],dtype='O')
+   s
+   s.convert_objects(convert_dates='coerce')
+
 Data alignment and arithmetic
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
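A note on the hunk above: the upcasting and conversion rules it documents can be checked with a short script. This is a sketch under stated assumptions, not part of the commit. It targets a recent pandas release, where ``convert_objects`` no longer exists, so ``pd.to_numeric(..., errors='coerce')`` stands in for ``convert_numeric=True``. The frame and variable names are made up for illustration.

```python
import numpy as np
import pandas as pd

# NumPy promotion: the result dtype must be able to represent both inputs.
promoted_floats = np.result_type(np.float32, np.float64)
promoted_mixed = np.result_type(np.int32, np.float32)

# A frame with float32 and float64 columns; .values needs one common dtype.
df = pd.DataFrame({
    "A": np.array([1.0, 2.0], dtype="float32"),
    "B": np.array([3.0, 4.0], dtype="float64"),
})
combined_dtype = df.values.dtype

# astype returns a new object; modifying it leaves the original untouched.
converted = df.astype("float32")
converted.loc[0, "A"] = 99.0

# Stand-in for convert_numeric=True (assumption: modern pandas API).
# Strings that parse become numbers, everything else becomes NaN.
raw = pd.Series(["1.", "1", "foo"], dtype="object")
numeric = pd.to_numeric(raw, errors="coerce")

print(promoted_floats, promoted_mixed, combined_dtype)
print(converted.dtypes)
print(numeric)
```

The date coercion case is left out on purpose, since the handling of bare numbers in datetime parsing differs between the old ``convert_objects`` and later parsing functions.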
@@ -633,26 +719,6 @@ You can also disable this feature via the ``expand_frame_repr`` option:
    reset_option('expand_frame_repr')
 
 
-DataFrame column types
-~~~~~~~~~~~~~~~~~~~~~~
-
-.. _dsintro.column_types:
-
-The four main types stored in pandas objects are float, int, boolean, and
-object. A convenient ``dtypes`` attribute return a Series with the data type of
-each column:
-
-.. ipython:: python
-
-   baseball.dtypes
-
-The related method ``get_dtype_counts`` will return the number of columns of
-each type:
-
-.. ipython:: python
-
-   baseball.get_dtype_counts()
-
 DataFrame column attribute access and IPython completion
+This is a minor release from 0.10.1 and includes many new features and
+enhancements along with a large number of bug fixes. There are also a number of
+important API changes that long-time pandas users should pay close attention
+to.
+
+API changes
+~~~~~~~~~~~
+
+Numeric dtypes will propagate and can coexist in DataFrames. If a dtype is passed (either directly via the ``dtype`` keyword, a passed ``ndarray``, or a passed ``Series``), then it will be preserved in DataFrame operations. Furthermore, different numeric dtypes will **NOT** be combined. The following example will give you a taste.
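The example that the whatsnew entry refers to is not visible in this view. As a rough substitute, the sketch below shows the three ways of passing a dtype that the paragraph lists (the ``dtype`` keyword, an ``ndarray``, and a ``Series``) and checks that each is kept. The frames ``df1`` and ``df2`` and their contents are assumptions chosen for illustration, and the code is written against a recent pandas release rather than 0.10.2.

```python
import numpy as np
import pandas as pd

# dtype passed directly via the ``dtype`` keyword
df1 = pd.DataFrame(np.ones((4, 1)), columns=["A"], dtype="float32")

# dtypes carried in by a Series and by ndarrays
df2 = pd.DataFrame({
    "A": pd.Series(np.ones(4), dtype="float32"),
    "B": np.ones(4, dtype="float64"),
    "C": np.arange(4, dtype="uint8"),
})

# The columns keep their own dtypes instead of being merged into one.
print(df1.dtypes)
print(df2.dtypes)

# An operation with a plain Python scalar keeps the float32 dtype.
doubled = df1 * 2
print(doubled.dtypes)
```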