 .. ipython:: python
    :suppress:

-   from pandas import *
-   options.display.max_rows=15
+   import pandas as pd
+   import numpy as np
+   pd.options.display.max_rows = 15


 Comparison with R / R libraries
@@ -38,25 +39,25 @@ The :meth:`~pandas.DataFrame.query` method is similar to the base R ``subset``
 function. In R you might want to get the rows of a ``data.frame`` where one
 column's values are less than another column's values:

-.. code-block:: r
+.. code-block:: r

-   df <- data.frame(a=rnorm(10), b=rnorm(10))
-   subset(df, a <= b)
-   df[df$a <= df$b,]  # note the comma
+   df <- data.frame(a=rnorm(10), b=rnorm(10))
+   subset(df, a <= b)
+   df[df$a <= df$b,]  # note the comma

 In ``pandas``, there are a few ways to perform subsetting. You can use
 :meth:`~pandas.DataFrame.query` or pass an expression as if it were an
 index/slice as well as standard boolean indexing:

-.. ipython:: python
+.. ipython:: python

-   from pandas import DataFrame
-   from numpy.random import randn
+   from pandas import DataFrame
+   from numpy import random

-   df = DataFrame({'a': randn(10), 'b': randn(10)})
-   df.query('a <= b')
-   df[df.a <= df.b]
-   df.loc[df.a <= df.b]
+   df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+   df.query('a <= b')
+   df[df.a <= df.b]
+   df.loc[df.a <= df.b]

 For more details and examples see :ref:`the query documentation
 <indexing.query>`.
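The three pandas forms above should select exactly the same rows. The sketch below checks that directly. It assumes a recent pandas and NumPy and uses ``np.random.default_rng`` with an arbitrary seed instead of ``randn``, only so the data are reproducible. The ``threshold`` variable is a hypothetical example value.

```python
import numpy as np
import pandas as pd

# Reproducible random data; the seed is arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

via_query = df.query('a <= b')   # expression string, like subset(df, a <= b)
via_mask = df[df.a <= df.b]      # boolean mask, like df[df$a <= df$b, ]
via_loc = df.loc[df.a <= df.b]   # the same mask through the .loc indexer

# All three keep the same rows and the original index labels.
print(via_query.equals(via_mask) and via_mask.equals(via_loc))

# Hypothetical threshold: '@' makes query() read a Python variable, not a column.
threshold = 0.0
positive_a = df.query('a <= b and a > @threshold')
print(positive_a)
```

The ``@`` prefix is how a ``query()`` string distinguishes a local Python variable from a column name.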
@@ -70,20 +71,20 @@ For more details and examples see :ref:`the query documentation
 An expression using a data.frame called ``df`` in R with the columns ``a`` and
 ``b`` would be evaluated using ``with`` like so:

-.. code-block:: r
+.. code-block:: r

-   df <- data.frame(a=rnorm(10), b=rnorm(10))
-   with(df, a + b)
-   df$a + df$b  # same as the previous expression
+   df <- data.frame(a=rnorm(10), b=rnorm(10))
+   with(df, a + b)
+   df$a + df$b  # same as the previous expression

 In ``pandas`` the equivalent expression, using the
 :meth:`~pandas.DataFrame.eval` method, would be:

-.. ipython:: python
+.. ipython:: python

-   df = DataFrame({'a': randn(10), 'b': randn(10)})
-   df.eval('a + b')
-   df.a + df.b  # same as the previous expression
+   df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+   df.eval('a + b')
+   df.a + df.b  # same as the previous expression

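The following sketch checks that ``eval()`` gives the same values as ordinary column arithmetic. It also shows that an assignment inside the expression string adds a column. It assumes a recent pandas, where ``eval`` returns a new frame by default (``inplace=False``), and it uses ``default_rng`` only for reproducible data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({'a': rng.standard_normal(10), 'b': rng.standard_normal(10)})

# 'a' and 'b' inside the string are resolved as column names,
# much like with(df, a + b) in R.
expr_result = df.eval('a + b')
plain_result = df.a + df.b
print(np.allclose(expr_result, plain_result))

# An assignment returns a new frame with an extra column 'c';
# df itself is left unchanged.
df2 = df.eval('c = a + b')
print(df2.columns.tolist())
```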
 In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
 evaluation in pure Python. For more details and examples see :ref:`the eval
 plyr
 ----

+``plyr`` is an R library for the split-apply-combine strategy for data
+analysis. The functions revolve around three data structures in R: ``a``
+for ``arrays``, ``l`` for ``lists``, and ``d`` for ``data.frame``. The
+table below shows how these data structures could be mapped in Python.
+
++------------+-------------------------------+
+| R          | Python                        |
++============+===============================+
+| array      | list                          |
++------------+-------------------------------+
+| lists      | dictionary or list of objects |
++------------+-------------------------------+
+| data.frame | DataFrame                     |
++------------+-------------------------------+
+
+|ddply|_
+~~~~~~~~
+
+An expression using a data.frame called ``df`` in R where you want to
+summarize ``x`` by ``month`` and ``week``:
+
+.. code-block:: r
+
+   require(plyr)
+   df <- data.frame(
+     x = runif(120, 1, 168),
+     y = runif(120, 7, 334),
+     z = runif(120, 1.7, 20.7),
+     month = rep(c(5,6,7,8),30),
+     week = sample(1:4, 120, TRUE)
+   )
+
+   ddply(df, .(month, week), summarize,
+         mean = round(mean(x), 2),
+         sd = round(sd(x), 2))
+
+In ``pandas`` the equivalent expression, using the
+:meth:`~pandas.DataFrame.groupby` method, would be:
+
+.. ipython:: python
+
+   df = DataFrame({
+       'x': random.uniform(1., 168., 120),
+       'y': random.uniform(7., 334., 120),
+       'z': random.uniform(1.7, 20.7, 120),
+       'month': [5, 6, 7, 8] * 30,
+       'week': random.randint(1, 5, 120)  # upper bound is exclusive: 1 to 4, as in R
+   })
+
+   grouped = df.groupby(['month', 'week'])
+   grouped['x'].agg(['mean', 'std'])
+
+For more details and examples see :ref:`the groupby documentation
+<groupby.aggregate>`.
+
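You may want the pandas output to line up more closely with the ``ddply`` result: columns named ``mean`` and ``sd``, values rounded to two decimals, and group keys as ordinary columns. Named aggregation is one way to get there. This minimal sketch assumes pandas 0.25 or later and uses NumPy's ``default_rng`` for reproducible data. Like R's ``sd``, pandas' ``std`` uses the sample (n - 1) denominator by default.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 120
df = pd.DataFrame({
    'x': rng.uniform(1., 168., n),
    'y': rng.uniform(7., 334., n),
    'z': rng.uniform(1.7, 20.7, n),
    'month': [5, 6, 7, 8] * 30,
    # integers() excludes the upper bound, so this draws from 1..4 like sample(1:4, ...)
    'week': rng.integers(1, 5, n),
})

# Named aggregation: output column names mirror the summarize() arguments,
# round(2) mirrors round(..., 2), reset_index() turns the keys into columns.
summary = (
    df.groupby(['month', 'week'])
      .agg(mean=('x', 'mean'), sd=('x', 'std'))
      .round(2)
      .reset_index()
)
print(summary.head())
```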
 reshape / reshape2
 ------------------

+|meltarray|_
+~~~~~~~~~~~~~
+
+An expression using a 3-dimensional array called ``a`` in R where you want to
+melt it into a data.frame:
+
+.. code-block:: r
+
+   a <- array(c(1:23, NA), c(2,3,4))
+   data.frame(melt(a))
+
+In Python, ``a`` is a NumPy array, so you can use a list comprehension over
+``np.ndenumerate``:
+
+.. ipython:: python
+
+   a = np.array(list(range(1, 24)) + [np.nan]).reshape(2, 3, 4)
+   DataFrame([tuple(list(x) + [val]) for x, val in np.ndenumerate(a)])
+
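R and NumPy differ in two ways that matter here. R fills arrays column-major (the first index varies fastest) and numbers positions from 1. NumPy reshapes in row-major order and numbers from 0. The sketch below accounts for both, so the values land where R would put them. The column names ``Var1``/``Var2``/``Var3``/``value`` are an illustrative choice modeled on ``melt`` output, not something pandas produces on its own.

```python
import numpy as np
import pandas as pd

# order='F' fills the 2 x 3 x 4 array column-major, as R's array() does.
values = np.append(np.arange(1, 24, dtype=float), np.nan)
a = values.reshape(2, 3, 4, order='F')

# One row per cell; "+ 1" turns NumPy's 0-based positions into R's 1-based ones.
rows = [tuple(i + 1 for i in idx) + (val,) for idx, val in np.ndenumerate(a)]
melted = pd.DataFrame(rows, columns=['Var1', 'Var2', 'Var3', 'value'])

# ndenumerate walks the array in C order; sorting puts Var1 fastest, like R.
melted = melted.sort_values(['Var3', 'Var2', 'Var1'], ignore_index=True)
print(melted.head())
```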
+|meltlist|_
+~~~~~~~~~~~~
+
+An expression using a list called ``a`` in R where you want to melt it
+into a data.frame:
+
+.. code-block:: r
+
+   a <- as.list(c(1:4, NA))
+   data.frame(melt(a))
+
+In Python, this list would be a list of ``(index, value)`` tuples, which the
+:class:`~pandas.DataFrame` constructor converts into a DataFrame as required.
+
+.. ipython:: python
+
+   a = list(enumerate(list(range(1, 5)) + [np.nan]))
+   DataFrame(a)
+
+For more details and examples see :ref:`the Intro to Data Structures
+documentation <basics.dataframe.from_items>`.
+
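``enumerate`` counts from 0 by default, whereas R list positions start at 1. Passing ``start=1`` and naming the columns brings the result closer to the R output. This is a minimal sketch. The ``L1`` column name is an assumption modeled on the position column that ``reshape2``'s ``melt`` produces for lists.

```python
import numpy as np
import pandas as pd

# Python stand-in for as.list(c(1:4, NA)): NaN plays the role of NA.
a = list(range(1, 5)) + [np.nan]

# start=1 numbers the elements the way R indexes list positions.
melted = pd.DataFrame(list(enumerate(a, start=1)), columns=['L1', 'value'])
print(melted)
```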
+|meltdf|_
+~~~~~~~~~
+
+An expression using a data.frame called ``cheese`` in R where you want to
+reshape the data.frame:
+
+.. code-block:: r
+
+   cheese <- data.frame(
+     first = c('John', 'Mary'),
+     last = c('Doe', 'Bo'),
+     height = c(5.5, 6.0),
+     weight = c(130, 150)
+   )
+   melt(cheese, id=c("first", "last"))
+
+In Python, the :func:`~pandas.melt` function is the R equivalent:
+
+.. ipython:: python
+
+   cheese = DataFrame({'first': ['John', 'Mary'],
+                       'last': ['Doe', 'Bo'],
+                       'height': [5.5, 6.0],
+                       'weight': [130, 150]})
+   pd.melt(cheese, id_vars=['first', 'last'])
+   cheese.set_index(['first', 'last']).stack()  # alternative way
+
+For more details and examples see :ref:`the reshaping documentation
+<reshaping.melt>`.
+
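``pd.melt`` also accepts ``var_name`` and ``value_name`` to rename the two generated columns, whose defaults are ``variable`` and ``value``. ``DataFrame.pivot`` reverses the reshape. This is a minimal sketch that assumes pandas 1.1 or later, where ``pivot`` accepts a list for ``index``. The names ``measure`` and ``amount`` are illustrative.

```python
import pandas as pd

cheese = pd.DataFrame({'first': ['John', 'Mary'],
                       'last': ['Doe', 'Bo'],
                       'height': [5.5, 6.0],
                       'weight': [130, 150]})

# Wide to long, with custom names for the generated columns.
long_df = pd.melt(cheese, id_vars=['first', 'last'],
                  var_name='measure', value_name='amount')
print(long_df)

# Long back to wide: one row per (first, last), one column per measure.
wide = long_df.pivot(index=['first', 'last'], columns='measure', values='amount')
print(wide)
```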
+|cast|_
+~~~~~~~
+
+An expression using a data.frame called ``df`` in R to cast into a higher
+dimensional array:
+
+.. code-block:: r
+
+   df <- data.frame(
+     x = runif(12, 1, 168),
+     y = runif(12, 7, 334),
+     z = runif(12, 1.7, 20.7),
+     month = rep(c(5,6,7),4),
+     week = rep(c(1,2), 6)
+   )
+
+   mdf <- melt(df, id=c("month", "week"))
+   acast(mdf, week ~ month ~ variable, mean)
+
+In Python, the best way is to use :func:`~pandas.pivot_table`:
+
+.. ipython:: python
+
+   df = DataFrame({
+       'x': random.uniform(1., 168., 12),
+       'y': random.uniform(7., 334., 12),
+       'z': random.uniform(1.7, 20.7, 12),
+       'month': [5, 6, 7] * 4,
+       'week': [1, 2] * 6
+   })
+   mdf = pd.melt(df, id_vars=['month', 'week'])
+   pd.pivot_table(mdf, values='value', index=['variable', 'week'],
+                  columns=['month'], aggfunc='mean')
+
+For more details and examples see :ref:`the reshaping documentation
+<reshaping.pivot>`.
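R's ``acast`` returns a true three-dimensional array. ``pivot_table`` returns a two-dimensional table with a MultiIndex on the rows. If you need the 3-D shape, one option is to reshape the table's values with NumPy. This is a minimal sketch that assumes a recent pandas, where ``pivot_table`` takes ``index=`` and ``columns=``. It also assumes every (variable, week, month) combination is present, which holds for this data. The hard-coded axis sizes are specific to this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    'x': rng.uniform(1., 168., 12),
    'y': rng.uniform(7., 334., 12),
    'z': rng.uniform(1.7, 20.7, 12),
    'month': [5, 6, 7] * 4,
    'week': [1, 2] * 6,
})
mdf = pd.melt(df, id_vars=['month', 'week'])

# 2-D table: rows are (variable, week), columns are months, cells are means.
table = pd.pivot_table(mdf, values='value', index=['variable', 'week'],
                       columns='month', aggfunc='mean')

# Rows come out sorted with 'variable' outermost, so a row-major reshape gives
# (variable, week, month); transpose to acast's week x month x variable order.
n_var, n_week, n_month = 3, 2, 3
cube = table.to_numpy().reshape(n_var, n_week, n_month).transpose(1, 2, 0)
print(cube.shape)
```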

 .. |with| replace:: ``with``
 .. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html

 .. |subset| replace:: ``subset``
 .. _subset: http://finzi.psych.upenn.edu/R/library/base/html/subset.html
+
+.. |ddply| replace:: ``ddply``
+.. _ddply: http://www.inside-r.org/packages/cran/plyr/docs/ddply
+
+.. |meltarray| replace:: ``melt.array``
+.. _meltarray: http://www.inside-r.org/packages/cran/reshape2/docs/melt.array
+
+.. |meltlist| replace:: ``melt.list``
+.. _meltlist: http://www.inside-r.org/packages/cran/reshape2/docs/melt.list
+
+.. |meltdf| replace:: ``melt.data.frame``
+.. _meltdf: http://www.inside-r.org/packages/cran/reshape2/docs/melt.data.frame
+
+.. |cast| replace:: ``cast``
+.. _cast: http://www.inside-r.org/packages/cran/reshape2/docs/cast
+