Commit 643e1cb

DOC: revamped dtypes section in basics.rst;
fixed removal of foo temp files in 10min;
DOC: added to time series in 10min.rst

1 parent fbf1977 commit 643e1cb

File tree

3 files changed: +105 −17 lines changed

doc/source/10min.rst (+45 −2)
@@ -7,6 +7,7 @@
    import numpy as np
    import random
+   import os
    np.random.seed(123456)
    from pandas import *
    import pandas as pd
@@ -466,6 +467,45 @@ limited to, financial applications. See the :ref:`Time Series section <timeserie
    ts = pd.Series(randint(0, 500, len(rng)), index=rng)
    ts.resample('5Min', how='sum')
+
+Time zone representation
+
+.. ipython:: python
+
+   rng = pd.date_range('3/6/2012 00:00', periods=5, freq='D')
+   ts = pd.Series(randn(len(rng)), rng)
+   ts_utc = ts.tz_localize('UTC')
+   ts_utc
+
+Convert to another time zone
+
+.. ipython:: python
+
+   ts_utc.tz_convert('US/Eastern')
+
+Converting between time span representations
+
+.. ipython:: python
+
+   rng = pd.date_range('1/1/2012', periods=5, freq='M')
+   ts = pd.Series(randn(len(rng)), index=rng)
+   ts
+   ps = ts.to_period()
+   ps
+   ps.to_timestamp()
+
+Converting between period and timestamp enables some convenient arithmetic
+functions to be used. In the following example, we convert a quarterly
+frequency with year ending in November to 9am of the end of the month following
+the quarter end:
+
+.. ipython:: python
+
+   prng = period_range('1990Q1', '2000Q4', freq='Q-NOV')
+   ts = Series(randn(len(prng)), prng)
+   ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
+   ts.head()
+
 Plotting
 --------
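The localize/convert round trip and the period arithmetic added above can be exercised outside the docs build. A minimal runnable sketch against the current pandas API (`randn` spelled out as `np.random.randn`; the name `ts_eastern` is mine):

```python
import numpy as np
import pandas as pd

# Localize a naive DatetimeIndex to UTC, then view it in another zone.
rng = pd.date_range("2012-03-06", periods=5, freq="D")
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts_utc = ts.tz_localize("UTC")
ts_eastern = ts_utc.tz_convert("US/Eastern")
# Both series represent the same instants; only the display zone differs.
assert (ts_utc.index == ts_eastern.index).all()

# Quarterly periods (fiscal year ending November) shifted to 9am of the
# first day of the month following each quarter end.
prng = pd.period_range("1990Q1", "1990Q4", freq="Q-NOV")
hours = (prng.asfreq("M", "e") + 1).asfreq("H", "s") + 9
first = hours[0]
assert (first.month, first.day, first.hour) == (3, 1, 9)
```

The assertion on the indexes holds because comparing two tz-aware indexes compares the underlying UTC instants, not the wall-clock labels.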

@@ -512,6 +552,11 @@ CSV
    pd.read_csv('foo.csv')
+
+.. ipython:: python
+   :suppress:
+
+   os.remove('foo.csv')
 
 HDF5
 ~~~~
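The `:suppress:` block added above just deletes the scratch file the CSV example writes. The same write/read/clean-up cycle as a plain script sketch (file name `foo.csv` as in the docs):

```python
import os
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_csv("foo.csv")                       # write the example file
out = pd.read_csv("foo.csv", index_col=0)  # read it back
os.remove("foo.csv")                       # clean up, as the suppressed block does
assert out["a"].tolist() == [1, 2, 3]
```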

@@ -532,9 +577,7 @@ Reading from a HDF5 Store
 
 .. ipython:: python
    :suppress:
-   :okexcept:
 
    store.close()
    os.remove('foo.h5')
-   os.remove('foo.csv')

doc/source/basics.rst (+59 −14)
@@ -976,14 +976,14 @@ To be clear, no pandas methods have the side effect of modifying your data;
 almost all methods return new objects, leaving the original object
 untouched. If data is modified, it is because you did so explicitly.
 
-DTypes
+dtypes
 ------
 
 .. _basics.dtypes:
 
-The main types stored in pandas objects are float, int, boolean, datetime64[ns],
-and object. A convenient ``dtypes`` attribute for DataFrames returns a Series with
-the data type of each column.
+The main types stored in pandas objects are ``float``, ``int``, ``bool``, ``datetime64[ns]``, ``timedelta[ns]``,
+and ``object``. In addition, these dtypes have item sizes, e.g. ``int64`` and ``int32``. A convenient ``dtypes``
+attribute for DataFrames returns a Series with the data type of each column.
 
 .. ipython:: python
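As a quick check of the per-column behaviour the rewritten paragraph describes (the frame below is a trimmed stand-in for the doc's `dft`):

```python
import pandas as pd

dft = pd.DataFrame(dict(A=[1.0, 2.0, 3.0],
                        C="foo",
                        G=pd.Series([1] * 3, dtype="int8")))
# .dtypes returns a Series with one entry per column, keeping declared item sizes
assert str(dft.dtypes["A"]) == "float64"
assert str(dft["G"].dtype) == "int8"   # the int8 item size survives construction
```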
@@ -992,11 +992,26 @@ the data type of each column.
                          F = False,
                          G = Series([1]*3, dtype='int8')))
    dft
+   dft.dtypes
 
-If a DataFrame contains columns of multiple dtypes, the dtype of the column
-will be chosen to accommodate all of the data types (dtype=object is the most
+On a ``Series``, use the ``dtype`` attribute.
+
+.. ipython:: python
+
+   dft['A'].dtype
+
+If a pandas object contains data of multiple dtypes *in a single column*, the dtype of the
+column will be chosen to accommodate all of the data types (``object`` is the most
 general).
 
+.. ipython:: python
+
+   # these ints are coerced to floats
+   Series([1, 2, 3, 4, 5, 6.])
+
+   # string data forces an ``object`` dtype
+   Series([1, 2, 3, 6., 'foo'])
+
 The related method ``get_dtype_counts`` will return the number of columns of
 each type:
@@ -1019,15 +1034,42 @@ or a passed ``Series``, then it will be preserved in DataFrame operations. Furth
    df2
    df2.dtypes
 
-   # here you get some upcasting
+defaults
+~~~~~~~~
+
+By default, integer types are ``int64`` and float types are ``float64``, *regardless* of platform (32-bit or 64-bit).
+
+The following will all result in ``int64`` dtypes.
+
+.. ipython:: python
+
+   DataFrame([1, 2], columns=['a']).dtypes
+   DataFrame({'a': [1, 2]}).dtypes
+   DataFrame({'a': 1}, index=range(2)).dtypes
+
+NumPy, however, will choose *platform-dependent* types when creating arrays.
+Thus, ``DataFrame(np.array([1,2]))`` **will** result in ``int32`` on a 32-bit platform.
+
+upcasting
+~~~~~~~~~
+
+Types can potentially be *upcast* when combined with other types, meaning they are promoted from the current type (say ``int`` to ``float``).
+
+.. ipython:: python
+
    df3 = df1.reindex_like(df2).fillna(value=0.0) + df2
    df3
    df3.dtypes
 
-   # this is lower-common-denomicator upcasting (meaning you get the dtype which can accomodate all of the types)
+The ``values`` attribute on a DataFrame returns the *lower-common-denominator* of the dtypes, meaning the dtype that can accommodate **all** of the types in the resulting homogeneous numpy array. This can
+force some *upcasting*.
+
+.. ipython:: python
+
    df3.values.dtype
 
-Astype
+astype
 ~~~~~~
 
 .. _basics.cast:
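A sketch verifying both claims in the new ``defaults`` and ``upcasting`` text: construction from Python ints defaults to ``int64``, and ``.values`` takes the lowest common dtype (run on a 64-bit platform, as CI machines typically are):

```python
import numpy as np
import pandas as pd

# Construction from Python ints defaults to int64 regardless of platform.
assert str(pd.DataFrame({"a": [1, 2]})["a"].dtype) == "int64"

# Mixing an int64 column with a float64 column: the 2-D .values array
# is upcast to the lowest common dtype that can hold both.
df = pd.DataFrame({"i": [1, 2], "f": [0.5, 1.5]})
assert df["i"].dtype == np.int64 and df["f"].dtype == np.float64
assert df.values.dtype == np.float64
```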
@@ -1044,7 +1086,7 @@ then the more *general* one will be used as the result of the operation.
    # conversion of dtypes
    df3.astype('float32').dtypes
 
-Object Conversion
+object conversion
 ~~~~~~~~~~~~~~~~~
 
 To force conversion of specific types of numbers, pass ``convert_numeric=True``.
@@ -1067,16 +1109,19 @@ the objects in a Series are of the same type, the Series will have that dtype.
    df3['E'] = df3['E'].astype('int32')
    df3.dtypes
 
-   # forcing date coercion
+This is a *forced coercion* on datelike types. This might be useful if you are reading in data which is mostly dates, but occasionally has non-dates intermixed and you want to make those values ``nan``.
+
+.. ipython:: python
+
    s = Series([datetime(2001,1,1,0,0), 'foo', 1.0, 1, Timestamp('20010104'), '20010105'], dtype='O')
    s
    s.convert_objects(convert_dates='coerce')
 
-Upcasting Gotchas
-~~~~~~~~~~~~~~~~~
+gotchas
+~~~~~~~
 
-Performing indexing operations on ``integer`` type data can easily upcast the data to ``floating``.
+Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``.
 The dtype of the input data will be preserved in cases where ``nans`` are not introduced (starting in 0.11.0).
 See also :ref:`integer na gotchas <gotchas.intna>`
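`convert_objects` was removed from pandas long after this commit; in a current pandas the same forced date coercion can be sketched with `pd.to_datetime(errors='coerce')` (my substitution, not the API this doc describes):

```python
import pandas as pd

s = pd.Series(["2001-01-01", "foo", "2001-01-05"], dtype="object")
coerced = pd.to_datetime(s, errors="coerce")
# Non-dates become NaT instead of raising, matching the doc's intent
# of turning stray non-date values into missing values.
assert coerced.isna().tolist() == [False, True, False]
```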

doc/source/dsintro.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -447,7 +447,7 @@ DataFrame:
 
 .. ipython:: python
 
-   df.loc('b')
+   df.loc['b']
    df.iloc[2]
 
 For a more exhaustive treatment of more sophisticated label-based indexing and
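The one-character fix above matters because `.loc` is an indexer, not a method; a sketch of the corrected usage (toy frame, mine):

```python
import pandas as pd

df = pd.DataFrame({"x": [10, 20, 30]}, index=["a", "b", "c"])
row = df.loc["b"]   # label-based selection uses square brackets
pos = df.iloc[2]    # positional selection, also brackets
assert row["x"] == 20
assert pos["x"] == 30
```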
