Commit 602bd19

adamklein authored and wesm committed
continued on 0.6.0 docs
1 parent 6d26eea commit 602bd19

6 files changed (+119, -48 lines)

doc/source/basics.rst

Lines changed: 34 additions & 5 deletions
@@ -27,6 +27,21 @@ the previous section:
2727
major_axis=DateRange('1/1/2000', periods=5),
2828
minor_axis=['A', 'B', 'C', 'D'])
2929
30+
.. _basics.head_tail:
31+
32+
Head and Tail
33+
-------------
34+
35+
To view a small sample of a Series or DataFrame object, use the ``head`` and
36+
``tail`` methods. The default number of elements to display is five, but you
37+
may pass a custom number.
38+
39+
.. ipython:: python
40+
41+
long_series = Series(randn(1000))
42+
long_series.head()
43+
long_series.tail(3)
44+
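As a rough illustration of the ``head``/``tail`` semantics described above (first or last *n* elements, five by default), here is a minimal pure-Python sketch on a plain list. The helpers ``head`` and ``tail`` are hypothetical stand-ins, not the pandas methods; the real Series methods also carry the index labels along with the values.

```python
def head(values, n=5):
    """Return the first n elements (five by default)."""
    return values[:n]


def tail(values, n=5):
    """Return the last n elements (five by default)."""
    # values[-0:] would return the whole list, so n == 0 is handled separately
    return values[-n:] if n > 0 else values[:0]


long_list = list(range(1000))
print(head(long_list))     # [0, 1, 2, 3, 4]
print(tail(long_list, 3))  # [997, 998, 999]
```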
3045
.. _basics.attrs:
3146

3247
Attributes and the raw ndarray(s)
@@ -76,15 +91,15 @@ unlike the axis labels, cannot be assigned to.
7691
Flexible binary operations
7792
--------------------------
7893

79-
With binary operations between pandas data structures, we have a couple items
94+
With binary operations between pandas data structures, there are two key points
8095
of interest:
8196

82-
* How to describe broadcasting behavior between higher- (e.g. DataFrame) and
97+
* Broadcasting behavior between higher- (e.g. DataFrame) and
8398
lower-dimensional (e.g. Series) objects.
84-
* Behavior of missing data in computations
99+
* Missing data in computations
85100

86-
We will demonstrate the currently-available functions to illustrate these
87-
issues independently, though they can be performed simultaneously.
101+
We will demonstrate how to manage these issues independently, though they can
102+
be handled simultaneously.
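A minimal sketch of the two points above, assuming Series-like objects modelled as plain ``dict`` mappings from label to value. ``aligned_add`` and its ``fill_value`` argument are hypothetical, chosen only for illustration: the result is indexed by the union of both label sets, and a label present on only one side yields NaN unless a fill value is substituted for the missing side.

```python
import math


def aligned_add(left, right, fill_value=None):
    """Add two label -> value mappings with label alignment.

    The result covers the union of both label sets (sorted, so labels must be
    mutually comparable). A label found on only one side gives NaN, unless
    fill_value is given, in which case it stands in for the missing side.
    """
    result = {}
    for label in sorted(set(left) | set(right)):
        if label in left and label in right:
            result[label] = left[label] + right[label]
        elif fill_value is not None:
            result[label] = left.get(label, fill_value) + right.get(label, fill_value)
        else:
            result[label] = math.nan
    return result


s1 = {'a': 1.0, 'b': 2.0, 'c': 3.0}
s2 = {'b': 10.0, 'c': 20.0, 'd': 30.0}
print(aligned_add(s1, s2))                # {'a': nan, 'b': 12.0, 'c': 23.0, 'd': nan}
print(aligned_add(s1, s2, fill_value=0))  # {'a': 1.0, 'b': 12.0, 'c': 23.0, 'd': 30.0}
```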
88103

89104
Matching / broadcasting behavior
90105
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -179,6 +194,20 @@ function implementing this operation is ``combine_first``, which we illustrate:
179194
df2
180195
df1.combine_first(df2)
181196
197+
General DataFrame Combine
198+
~~~~~~~~~~~~~~~~~~~~~~~~~
199+
200+
The ``combine_first`` method above calls the more general DataFrame method
201+
``combine``. This method takes another DataFrame and a combiner function,
202+
aligns the input DataFrame and then passes the combiner function pairs of
203+
Series (i.e., columns whose names are the same).
204+
205+
So, for instance, to reproduce ``combine_first`` as above:
206+
207+
.. ipython:: python
208+
209+
combiner = lambda x, y: np.where(isnull(x), y, x)
210+
df1.combine(df2, combiner)
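To make the alignment step of ``combine`` concrete, the following pure-Python sketch models a DataFrame as a ``dict`` of columns, each a ``dict`` from row label to value. The helpers ``combine``, ``first_non_missing`` and ``is_missing`` are hypothetical and only approximate the pandas behavior: both frames are reindexed to the union of row and column labels (absent cells become NaN), and the combiner receives one aligned column pair at a time. ``first_non_missing`` plays the role of ``np.where(isnull(x), y, x)``, so the result mimics ``combine_first``.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing, roughly like isnull."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def combine(frame1, frame2, combiner):
    """Column-wise combine of two frames stored as {column: {row: value}}.

    Both frames are aligned on the union of their row and column labels
    (absent cells become NaN), then combiner(x, y) is called once per column
    with two aligned lists and must return a list of the same length.
    """
    rows = sorted({r for f in (frame1, frame2) for col in f.values() for r in col})
    result = {}
    for name in sorted(set(frame1) | set(frame2)):
        x = [frame1.get(name, {}).get(r, math.nan) for r in rows]
        y = [frame2.get(name, {}).get(r, math.nan) for r in rows]
        result[name] = dict(zip(rows, combiner(x, y)))
    return result


def first_non_missing(x, y):
    """Elementwise analogue of np.where(isnull(x), y, x)."""
    return [b if is_missing(a) else a for a, b in zip(x, y)]


nan = math.nan
df1 = {'A': {0: 1.0, 1: nan, 2: 3.0}, 'B': {0: nan, 1: 2.0, 2: nan}}
df2 = {'A': {0: 5.0, 1: 6.0, 2: 7.0, 3: 8.0},
       'B': {0: nan, 1: 9.0, 2: 4.0, 3: 1.0}}
print(combine(df1, df2, first_non_missing))
```

Values from ``df1`` win wherever they are present; ``df2`` only fills the holes, including row ``3``, which exists in ``df2`` alone.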
182211
183212
.. _basics.stats:
184213

doc/source/dsintro.rst

Lines changed: 14 additions & 1 deletion
@@ -16,7 +16,7 @@ objects. To get started, import numpy and load pandas into your namespace:
1616
import numpy as np
1717
from pandas import *
1818
randn = np.random.randn
19-
np.set_printoptions(precision=4, suppress=True)
19+
np.set_printoptions(precision=4, suppress=True, max_columns=10)
2020
2121
.. ipython:: python
2222
@@ -455,6 +455,19 @@ Operations with scalars are just as you would expect:
455455
1 / df
456456
df ** 4
457457
458+
.. _dsintro.boolean:
459+
460+
As of v0.6, boolean operators work elementwise on DataFrame objects:
461+
462+
.. ipython:: python
463+
464+
df1 = DataFrame({'a' : [1, 0, 1], 'b' : [0, 1, 1] }, dtype=bool)
465+
df2 = DataFrame({'a' : [0, 1, 1], 'b' : [1, 1, 0] }, dtype=bool)
466+
df1 & df2
467+
df1 | df2
468+
df1 ^ df2
469+
-df1
470+
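The boolean operators above act cell by cell. A minimal sketch of that behavior, assuming both frames already share the same columns and row order, uses the standard ``operator`` module; ``elementwise`` and ``invert`` are hypothetical helpers, and frames are plain dicts of lists. Note that later NumPy releases reject unary ``-`` on boolean arrays; ``~`` is the spelling they support for logical NOT.

```python
import operator


def elementwise(op, left, right):
    """Apply a binary operator cell by cell to two frames ({column: list})."""
    if set(left) != set(right):
        raise ValueError("this sketch assumes both frames have the same columns")
    return {col: [op(a, b) for a, b in zip(left[col], right[col])] for col in left}


def invert(frame):
    """Cell-by-cell logical NOT of a boolean frame."""
    return {col: [not v for v in values] for col, values in frame.items()}


df1 = {'a': [True, False, True], 'b': [False, True, True]}
df2 = {'a': [False, True, True], 'b': [True, True, False]}
print(elementwise(operator.and_, df1, df2))  # &
print(elementwise(operator.or_, df1, df2))   # |
print(elementwise(operator.xor, df1, df2))   # ^
print(invert(df1))                           # logical NOT
```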
458471
Transposing
459472
~~~~~~~~~~~
460473

doc/source/groupby.rst

Lines changed: 7 additions & 0 deletions
@@ -177,6 +177,13 @@ number:
177177
178178
s.groupby(level='second').sum()
179179
180+
As of v0.6, the aggregation functions such as ``sum`` will take the level
181+
parameter directly:
182+
183+
.. ipython:: python
184+
185+
s.sum(level='second')
186+
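Both spellings above compute a per-level total. The sketch below shows the idea on a plain ``dict`` keyed by two-element tuples, standing in for a Series with a two-level index. ``sum_by_level`` is a hypothetical helper that takes the level by position (``1`` corresponds to the level named ``'second'`` in the example above), and the sample values are made up. Later pandas releases removed the ``level`` keyword from reductions such as ``sum``, leaving the ``groupby(level=...)`` form.

```python
from collections import defaultdict


def sum_by_level(series, level):
    """Total the values of a {tuple key: value} mapping over one key position."""
    totals = defaultdict(float)
    for key, value in series.items():
        totals[key[level]] += value
    return dict(totals)


# made-up values on a two-level index (first level, second level)
s = {('bar', 'one'): 1.0, ('bar', 'two'): 2.0,
     ('baz', 'one'): 3.0, ('baz', 'two'): 4.0,
     ('foo', 'one'): 5.0, ('foo', 'two'): 6.0}
print(sum_by_level(s, 1))  # {'one': 9.0, 'two': 12.0}
```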
180187
More on the ``sum`` function and aggregation later. Grouping with multiple
181188
levels (as opposed to a single level) is not yet supported, though implementing
182189
it is not difficult.

doc/source/indexing.rst

Lines changed: 53 additions & 34 deletions
@@ -22,12 +22,12 @@ The axis labeling information in pandas objects serves many purposes:
2222
- Enables automatic and explicit data alignment
2323
- Allows intuitive getting and setting of subsets of the data set
2424

25-
In this section / chapter, we will focus on the latter set of functionality,
26-
namely how to slice, dice, and generally get and set subsets of pandas
27-
objects. The primary focus will be on Series and DataFrame as they have
28-
received more development attention in this area. More work will be invested in
29-
Panel and future higher-dimensional data structures in the future, especially
30-
in label-based advanced indexing.
25+
In this section / chapter, we will focus on the final point: namely, how to
26+
slice, dice, and generally get and set subsets of pandas objects. The primary
27+
focus will be on Series and DataFrame as they have received more development
28+
attention in this area. Expect more work to be invested in higher-dimensional data
29+
structures (including Panel) in the future, especially in label-based advanced
30+
indexing.
3131

3232
.. _indexing.basics:
3333

@@ -115,19 +115,16 @@ label, respectively.
115115
panel.major_xs(date)
116116
panel.minor_xs('A')
117117
118-
.. note::
119-
120-
See :ref:`advanced indexing <indexing.advanced>` below for an alternate and
121-
more concise way of doing the same thing.
122-
123118
Slicing ranges
124119
~~~~~~~~~~~~~~
125120

126-
:ref:`Advanced indexing <indexing.advanced>` detailed below is the most robust
127-
and consistent way of slicing ranges, e.g. ``obj[5:10]``, across all of the data
128-
structures and their axes (except in the case of integer labels, more on that
129-
later). On Series, this syntax works exactly as expected as with an ndarray,
130-
returning a slice of the values and the corresponding labels:
121+
The most robust and consistent way of slicing ranges along arbitrary axes is
122+
described in the :ref:`Advanced indexing <indexing.advanced>` section detailing
123+
the ``.ix`` method. For now, we explain the semantics of slicing using the
124+
``[]`` operator.
125+
126+
With Series, the syntax works exactly as with an ndarray, returning a slice of
127+
the values and the corresponding labels:
131128

132129
.. ipython:: python
133130
@@ -154,28 +151,37 @@ largely as a convenience since it is such a common operation.
154151
Boolean indexing
155152
~~~~~~~~~~~~~~~~
156153

157-
Using a boolean vector to index a Series works exactly like an ndarray:
154+
.. _indexing.boolean:
155+
156+
Using a boolean vector to index a Series works exactly as in a numpy ndarray:
158157

159158
.. ipython:: python
160159
161160
s[s > 0]
162161
s[(s < 0) & (s > -0.5)]
163162
164-
Again as a convenience, selecting rows from a DataFrame using a boolean vector
165-
the same length as the DataFrame's index (for example, something derived from
166-
one of the columns of the DataFrame) is supported:
163+
You may select rows from a DataFrame using a boolean vector the same length as
164+
the DataFrame's index (for example, something derived from one of the columns
165+
of the DataFrame):
167166

168167
.. ipython:: python
169168
170169
df[df['A'] > 0]
171170
172-
As we will see later on, the same operation could be accomplished by
173-
reindexing. However, the syntax would be more verbose; hence, the inclusion of
174-
this indexing method.
171+
Consider the ``isin`` method of Series, which returns a boolean vector that is
172+
true wherever the Series elements exist in the passed list. This allows you to
173+
select out rows where one or more columns have values you want:
174+
175+
.. ipython:: python
176+
177+
df2 = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
178+
'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
179+
'c' : np.random.randn(7)})
180+
df2[df2['a'].isin(['one', 'two'])]
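A pure-Python sketch of the same row selection, assuming a column-oriented frame stored as a ``dict`` of equal-length lists. ``isin`` and ``select_rows`` here are hypothetical helpers, and the random column ``c`` is left out. The mask is computed first and then applied to every column, which is what indexing with a boolean vector does; combining two masks cell by cell mirrors the ``&`` example earlier in this section.

```python
def isin(values, allowed):
    """Boolean vector: True wherever an element appears in `allowed`."""
    allowed = set(allowed)  # elements must be hashable in this sketch
    return [v in allowed for v in values]


def select_rows(frame, mask):
    """Keep the rows of a {column: list} frame where mask is True."""
    return {col: [v for v, keep in zip(vals, mask) if keep]
            for col, vals in frame.items()}


df2 = {'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
       'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x']}
mask = isin(df2['a'], ['one', 'two'])
print(mask)  # [True, True, True, False, True, True, False]
print(select_rows(df2, mask))

# combining two conditions cell by cell, like a & b on boolean vectors
both = [m and b == 'x' for m, b in zip(mask, df2['b'])]
print(select_rows(df2, both))
```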
175181
176-
With the advanced indexing capabilities discussed later, you are able to do
177-
boolean indexing in any of axes or combine a boolean vector with an indexing
178-
expression on one of the other axes
182+
Note that with the :ref:`advanced indexing <indexing.advanced>` ``ix`` method, you
183+
may select along more than one axis using boolean vectors combined with other
184+
indexing expressions.
179185

180186
Indexing a DataFrame with a boolean DataFrame
181187
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -202,19 +208,32 @@ Take Methods
202208

203209
TODO: Fill Me In
204210

205-
206-
Slicing ranges
211+
Duplicate Data
207212
~~~~~~~~~~~~~~
208213

209-
Similar to Python lists and ndarrays, for convenience DataFrame
210-
supports slicing:
214+
.. _indexing.duplicate:
211215

212-
.. ipython:: python
216+
If you want to identify and remove duplicate rows in a DataFrame, there are
217+
two methods that will help: ``duplicated`` and ``drop_duplicates``. Each
218+
takes as an argument the columns to use to identify duplicated rows.
219+
220+
``duplicated`` returns a boolean vector whose length is the number of rows, and
221+
which indicates whether a row is duplicated.
213222

214-
df[:2]
215-
df[::-1]
216-
df[-3:].T
223+
``drop_duplicates`` removes duplicate rows.
224+
225+
By default, the first observed row of a duplicate set is considered unique, but
226+
each method has a ``take_last`` parameter that indicates the last observed row
227+
should be taken instead.
228+
229+
.. ipython:: python
217230
231+
df2 = DataFrame({'a' : ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
232+
'b' : ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
233+
'c' : np.random.randn(7)})
234+
df2.duplicated(['a','b'])
235+
df2.drop_duplicates(['a','b'])
236+
df2.drop_duplicates(['a','b'], take_last=True)
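The first/last rule can be seen in a small sketch. Rows are modelled as ``dict`` objects, and ``duplicated`` / ``drop_duplicates`` below are hypothetical pure-Python helpers (not the pandas methods) that mirror the ``take_last`` behavior described above by scanning from the end when ``take_last=True``. Later pandas releases replaced ``take_last=True`` with ``keep='last'``.

```python
def duplicated(rows, cols, take_last=False):
    """Flag each row whose values in `cols` already occurred.

    The first occurrence counts as unique; with take_last=True the rows are
    scanned from the end, so the last occurrence counts as unique instead.
    """
    order = range(len(rows) - 1, -1, -1) if take_last else range(len(rows))
    seen = set()
    flags = [False] * len(rows)
    for i in order:
        key = tuple(rows[i][c] for c in cols)
        if key in seen:
            flags[i] = True
        else:
            seen.add(key)
    return flags


def drop_duplicates(rows, cols, take_last=False):
    """Return only the rows not flagged by duplicated(), in original order."""
    flags = duplicated(rows, cols, take_last)
    return [row for row, dup in zip(rows, flags) if not dup]


a = ['one', 'one', 'two', 'three', 'two', 'one', 'six']
b = ['x', 'y', 'y', 'x', 'y', 'x', 'x']
rows = [{'a': x, 'b': y} for x, y in zip(a, b)]
print(duplicated(rows, ['a', 'b']))
print(drop_duplicates(rows, ['a', 'b']))
print(drop_duplicates(rows, ['a', 'b'], take_last=True))
```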
218237
219238
.. _indexing.advanced:
220239

doc/source/io.rst

Lines changed: 3 additions & 0 deletions
@@ -57,6 +57,9 @@ data into a DataFrame object. They can take a number of arguments:
5757
below in the section on :ref:`iterating and chunking <io.chunking>`
5858
- ``iterator``: If True, return a ``TextParser`` to enable reading a file
5959
into memory piece by piece
60+
- ``skip_footer``: number of lines to skip at the bottom of the file (default 0)
61+
- ``converters``: a dictionary of functions for converting values in certain
62+
columns, where keys are either integers or column labels
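A minimal sketch of what ``skip_footer`` and ``converters`` do, built on the standard ``csv`` module. ``read_table_sketch`` is a hypothetical helper and the sample data is made up; as in the option description, converter keys may be column labels or integer positions. Later pandas releases spell the first option ``skipfooter``.

```python
import csv


def read_table_sketch(text, skip_footer=0, converters=None):
    """Parse delimited text (header on the first line) into a list of dicts.

    skip_footer: number of lines to drop from the bottom of the text.
    converters: maps a column label or an integer column position to a
    function applied to each raw string in that column (labels win if both
    are given for the same column).
    """
    converters = converters or {}
    lines = text.splitlines()
    if skip_footer > 0:
        lines = lines[:-skip_footer]
    reader = csv.reader(lines)
    header = next(reader)
    records = []
    for raw in reader:
        record = {}
        for pos, (name, value) in enumerate(zip(header, raw)):
            func = converters.get(name, converters.get(pos))
            record[name] = func(value) if func is not None else value
        records.append(record)
    return records


# made-up sample with a summary line at the bottom that should be skipped
data = "date,price,qty\n2011-11-01,10.5,3\n2011-11-02,11.0,4\nTotal,21.5,7\n"
rows = read_table_sketch(data, skip_footer=1, converters={'price': float, 2: int})
print(rows)
```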
6063

6164
.. ipython:: python
6265
:suppress:

doc/source/whatsnew/v0.6.0.rst

Lines changed: 8 additions & 8 deletions
@@ -6,18 +6,18 @@ v.0.6.0 (November 25, 2011)
66
New Features
77
~~~~~~~~~~~~
88
- Add ``melt`` function to ``pandas.core.reshape``
9-
- Add ``level`` parameter to group by level in Series and DataFrame descriptive statistics (PR313_)
10-
- Add ``head`` and ``tail`` methods to Series, analogous to to DataFrame (PR296_)
11-
- Add ``Series.isin`` function which checks if each value is contained in a passed sequence (GH289_)
9+
- :ref:`Added <groupby.multindex>` ``level`` parameter to group by level in Series and DataFrame descriptive statistics (PR313_)
10+
- :ref:`Added <basics.head_tail>` ``head`` and ``tail`` methods to Series, analogous to DataFrame (PR296_)
11+
- :ref:`Added <indexing.boolean>` ``Series.isin`` function which checks if each value is contained in a passed sequence (GH289_)
1212
- Add ``float_format`` option to ``Series.to_string``
13-
- MAYBE DOCUMENTED? Add ``skip_footer`` (GH291_) and ``converters`` (GH343_) options to ``read_csv`` and ``read_table``
14-
- Add proper, tested weighted least squares to standard and panel OLS (GH303_)
15-
- Add ``drop_duplicates`` and ``duplicated`` functions for removing duplicate DataFrame rows and checking for duplicate rows, respectively (GH319_)
16-
- Implement logical (boolean) operators '&', '|', '^', '~' on DataFrame (GH347_)
13+
- :ref:`Added <io.parse_dates>` ``skip_footer`` (GH291_) and ``converters`` (GH343_) options to ``read_csv`` and ``read_table``
14+
- Added proper, tested weighted least squares to standard and panel OLS (GH303_)
15+
- :ref:`Added <indexing.duplicate>` ``drop_duplicates`` and ``duplicated`` functions for removing duplicate DataFrame rows and checking for duplicate rows, respectively (GH319_)
16+
- :ref:`Implemented <dsintro.boolean>` logical (boolean) operators '&', '|', '^', '-' on DataFrame (GH347_)
1717
- MAYBE ? Add ``Series.mad``, mean absolute deviation, matching DataFrame
1818
- MAYBE? Add ``QuarterEnd`` DateOffset (PR321_)
1919
- Add matrix multiplication function ``dot`` to DataFrame (GH65_)
20-
- Add ``orient``5 option to ``Panel.from_dict`` to ease creation of mixed-type Panels (GH359_, GH301_)
20+
- Add ``orient`` option to ``Panel.from_dict`` to ease creation of mixed-type Panels (GH359_, GH301_)
2121
- Add ``DataFrame.from_dict`` with similar ``orient`` option
2222
- Can now pass list of tuples or list of lists to ``DataFrame.from_records`` for fast conversion to DataFrame (GH357_)
2323
- Can pass multiple levels to groupby, e.g. ``df.groupby(level=[0, 1])`` (GH103_)
