  .. ipython:: python
     :suppress:

-    from pandas import *
-    import numpy.random as random
-    from numpy import *
+    import pandas as pd
+    import numpy as np
     options.display.max_rows=15

  Comparison with R / R libraries
@@ -40,25 +39,25 @@ The :meth:`~pandas.DataFrame.query` method is similar to the base R ``subset``
  function. In R you might want to get the rows of a ``data.frame`` where one
  column's values are less than another column's values:

- .. code-block:: r
+ .. code-block:: r

-    df <- data.frame(a=rnorm(10), b=rnorm(10))
-    subset(df, a <= b)
-    df[df$a <= df$b,]  # note the comma
+    df <- data.frame(a=rnorm(10), b=rnorm(10))
+    subset(df, a <= b)
+    df[df$a <= df$b,]  # note the comma

  In ``pandas``, there are a few ways to perform subsetting. You can use
  :meth:`~pandas.DataFrame.query` or pass an expression as if it were an
  index/slice as well as standard boolean indexing:

- .. ipython:: python
+ .. ipython:: python

-    from pandas import DataFrame
-    from numpy.random import randn
+    from pandas import DataFrame
+    from numpy import random

-    df = DataFrame({'a': randn(10), 'b': randn(10)})
-    df.query('a <= b')
-    df[df.a <= df.b]
-    df.loc[df.a <= df.b]
+    df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+    df.query('a <= b')
+    df[df.a <= df.b]
+    df.loc[df.a <= df.b]

  For more details and examples see :ref:`the query documentation
  <indexing.query>`.
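The three pandas spellings in this hunk are meant to select the same rows. A minimal, self-contained sketch of that equivalence, written with the ``pd``/``np`` aliases this change introduces; the seeded generator and the variable names are illustrative assumptions, not part of the documentation being edited:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(10), "b": rng.standard_normal(10)})

# Three ways to keep the rows where column a is <= column b.
by_query = df.query("a <= b")          # expression given as a string
by_mask = df[df["a"] <= df["b"]]       # boolean mask passed to []
by_loc = df.loc[df["a"] <= df["b"]]    # boolean mask passed to .loc
```

All three results keep the original row labels, so they can be compared directly with ``DataFrame.equals``.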
@@ -72,20 +71,20 @@ For more details and examples see :ref:`the query documentation
  An expression using a data.frame called ``df`` in R with the columns ``a`` and
  ``b`` would be evaluated using ``with`` like so:

- .. code-block:: r
+ .. code-block:: r

-    df <- data.frame(a=rnorm(10), b=rnorm(10))
-    with(df, a + b)
-    df$a + df$b  # same as the previous expression
+    df <- data.frame(a=rnorm(10), b=rnorm(10))
+    with(df, a + b)
+    df$a + df$b  # same as the previous expression

  In ``pandas`` the equivalent expression, using the
  :meth:`~pandas.DataFrame.eval` method, would be:

- .. ipython:: python
+ .. ipython:: python

-    df = DataFrame({'a': randn(10), 'b': randn(10)})
-    df.eval('a + b')
-    df.a + df.b  # same as the previous expression
+    df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
+    df.eval('a + b')
+    df.a + df.b  # same as the previous expression

  In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
  evaluation in pure Python. For more details and examples see :ref:`the eval
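As a rough check of the ``eval`` hunk above, the string expression and the plain column arithmetic should produce the same values. The sketch below assumes a seeded generator and illustrative variable names; it makes no claim about relative speed, which depends on the data size and on whether an accelerated engine is installed:

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(1)
df = pd.DataFrame({"a": rng.standard_normal(10), "b": rng.standard_normal(10)})

via_eval = df.eval("a + b")      # column names are resolved inside the string
direct = df["a"] + df["b"]       # ordinary element-wise Series addition
```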
@@ -123,38 +122,38 @@ summarize ``x`` by ``month``:



- .. code-block:: r
+ .. code-block:: r

-    require(plyr)
-    df <- data.frame(
-      x = runif(120, 1, 168),
-      y = runif(120, 7, 334),
-      z = runif(120, 1.7, 20.7),
-      month = rep(c(5,6,7,8),30),
-      week = sample(1:4, 120, TRUE)
-    )
+    require(plyr)
+    df <- data.frame(
+      x = runif(120, 1, 168),
+      y = runif(120, 7, 334),
+      z = runif(120, 1.7, 20.7),
+      month = rep(c(5,6,7,8),30),
+      week = sample(1:4, 120, TRUE)
+    )

-    ddply(df, .(month, week), summarize,
-          mean = round(mean(x), 2),
-          sd = round(sd(x), 2))
+    ddply(df, .(month, week), summarize,
+          mean = round(mean(x), 2),
+          sd = round(sd(x), 2))

  In ``pandas`` the equivalent expression, using the
  :meth:`~pandas.DataFrame.groupby` method, would be:



- .. ipython:: python
+ .. ipython:: python

-    df = DataFrame({
-        'x': random.uniform(1., 168., 120),
-        'y': random.uniform(7., 334., 120),
-        'z': random.uniform(1.7, 20.7, 120),
-        'month': [5,6,7,8]*30,
-        'week': random.randint(1,4, 120)
-    })
+    df = DataFrame({
+        'x': random.uniform(1., 168., 120),
+        'y': random.uniform(7., 334., 120),
+        'z': random.uniform(1.7, 20.7, 120),
+        'month': [5,6,7,8]*30,
+        'week': random.randint(1,4, 120)
+    })

-    grouped = df.groupby(['month','week'])
-    print grouped['x'].agg([mean, std])
+    grouped = df.groupby(['month','week'])
+    print grouped['x'].agg([np.mean, np.std])



  For more details and examples see :ref:`the groupby documentation
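The ``groupby`` example in this hunk uses Python 2 ``print`` syntax. A possible Python 3 rendering is sketched below; the seeded generator, the string aggregation names, and the ``summary`` variable are assumptions of this sketch. One detail worth noting: ``random.randint(1, 4, 120)`` excludes its upper bound and so draws only 1 to 3, whereas R's ``sample(1:4, ...)`` draws 1 to 4; the sketch uses ``integers(1, 5, ...)`` to match the R side.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(2)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 120),
    "y": rng.uniform(7.0, 334.0, 120),
    "z": rng.uniform(1.7, 20.7, 120),
    "month": [5, 6, 7, 8] * 30,
    "week": rng.integers(1, 5, 120),  # upper bound is exclusive, so weeks are 1..4
})

# Mean and sample standard deviation of x per (month, week), rounded as in the R call.
summary = df.groupby(["month", "week"])["x"].agg(["mean", "std"]).round(2)
print(summary)
```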
@@ -169,35 +168,36 @@ reshape / reshape2
  An expression using a 3 dimensional array called ``a`` in R where you want to
  melt it into a data.frame:

- .. code-block:: r
+ .. code-block:: r

-    a <- array(c(1:23, NA), c(2,3,4))
-    data.frame(melt(a))
+    a <- array(c(1:23, NA), c(2,3,4))
+    data.frame(melt(a))

  In Python, since ``a`` is a list, you can simply use list comprehension.

- .. ipython:: python
-    a = array(range(1,24)+[NAN]).reshape(2,3,4)
-    DataFrame([tuple(list(x)+[val]) for x, val in ndenumerate(a)])
+ .. ipython:: python
+
+    a = np.array(range(1,24)+[np.NAN]).reshape(2,3,4)
+    DataFrame([tuple(list(x)+[val]) for x, val in np.ndenumerate(a)])
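The added lines concatenate ``range(1,24)`` with a list, which only works on Python 2. A sketch of the same comprehension for Python 3 follows; the ``melted`` name is an assumption. Unlike R, the indices produced here are zero-based and NumPy fills the array in row-major order, so the rows will not line up one-to-one with the output of ``melt(a)`` in R.

```python
import numpy as np
import pandas as pd

# range() is lazy on Python 3, so convert it to a list before concatenating.
a = np.array(list(range(1, 24)) + [np.nan]).reshape(2, 3, 4)

# np.ndenumerate yields (index_tuple, value) pairs; each pair becomes one row
# holding the three zero-based indices followed by the value.
melted = pd.DataFrame([tuple(list(idx) + [val]) for idx, val in np.ndenumerate(a)])
```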

  |meltlist|_
  ~~~~~~~~~~~~

  An expression using a list called ``a`` in R where you want to melt it
  into a data.frame:

- .. code-block:: r
+ .. code-block:: r

-    a <- as.list(c(1:4, NA))
-    data.frame(melt(a))
+    a <- as.list(c(1:4, NA))
+    data.frame(melt(a))

  In Python, this list would be a list of tuples, so
  :meth:`~pandas.DataFrame` method would convert it to a dataframe as required.

- .. ipython:: python
+ .. ipython:: python

-    a = list(enumerate(range(1,5)+[NAN]))
-    DataFrame(a)
+    a = list(enumerate(range(1,5)+[np.NAN]))
+    DataFrame(a)

  For more details and examples see :ref:`the Into to Data Structures
  documentation <basics.dataframe.from_items>`.
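For the list case, a Python 3 sketch of the same idea is shown below. The column labels ``index`` and ``value`` and the ``frame`` name are hypothetical additions for readability; the hunk above leaves the columns with their default integer labels.

```python
import numpy as np
import pandas as pd

# enumerate() pairs each element with its zero-based position, giving a list of
# (position, value) tuples that the DataFrame constructor turns into two columns.
a = list(enumerate(list(range(1, 5)) + [np.nan]))
frame = pd.DataFrame(a, columns=["index", "value"])  # labels are illustrative
```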
@@ -208,26 +208,26 @@ documentation <basics.dataframe.from_items>`.
  An expression using a data.frame called ``cheese`` in R where you want to
  reshape the data.frame:

- .. code-block:: r
+ .. code-block:: r

-    cheese <- data.frame(
-      first = c('John, Mary'),
-      last = c('Doe', 'Bo'),
-      height = c(5.5, 6.0),
-      weight = c(130, 150)
-    )
-    melt(cheese, id=c("first", "last"))
+    cheese <- data.frame(
+      first = c('John, Mary'),
+      last = c('Doe', 'Bo'),
+      height = c(5.5, 6.0),
+      weight = c(130, 150)
+    )
+    melt(cheese, id=c("first", "last"))

  In Python, the :meth:`~pandas.melt` method is the R equivalent:

- .. ipython:: python
+ .. ipython:: python

-    cheese = DataFrame({'first': ['John', 'Mary'],
-                        'last': ['Doe', 'Bo'],
-                        'height': [5.5, 6.0],
-                        'weight': [130, 150]})
-    melt(cheese, id_vars=['first', 'last'])
-    cheese.set_index(['first', 'last']).stack()  # alternative way
+    cheese = DataFrame({'first': ['John', 'Mary'],
+                        'last': ['Doe', 'Bo'],
+                        'height': [5.5, 6.0],
+                        'weight': [130, 150]})
+    pd.melt(cheese, id_vars=['first', 'last'])
+    cheese.set_index(['first', 'last']).stack()  # alternative way

  For more details and examples see :ref:`the reshaping documentation
  <reshaping.melt>`.
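To make the difference between the two pandas approaches concrete, the sketch below runs both on the ``cheese`` data; the names ``long_form`` and ``stacked`` are assumptions. ``pd.melt`` returns a flat frame with ``variable`` and ``value`` columns, while the ``set_index(...).stack()`` alternative returns a Series whose index carries the identifiers and the former column name.

```python
import pandas as pd

cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Long format: one row per (first, last, measured variable).
long_form = pd.melt(cheese, id_vars=["first", "last"])

# Alternative: identifiers move into the index and the remaining columns are
# stacked into an innermost index level.
stacked = cheese.set_index(["first", "last"]).stack()
```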
@@ -238,33 +238,33 @@ For more details and examples see :ref:`the reshaping documentation
  An expression using a data.frame called ``df`` in R to cast into a higher
  dimensional array:

- .. code-block:: r
+ .. code-block:: r

-    df <- data.frame(
-      x = runif(12, 1, 168),
-      y = runif(12, 7, 334),
-      z = runif(12, 1.7, 20.7),
-      month = rep(c(5,6,7),4),
-      week = rep(c(1,2), 6)
-    )
+    df <- data.frame(
+      x = runif(12, 1, 168),
+      y = runif(12, 7, 334),
+      z = runif(12, 1.7, 20.7),
+      month = rep(c(5,6,7),4),
+      week = rep(c(1,2), 6)
+    )

-    mdf <- melt(df, id=c("month", "week"))
-    acast(mdf, week ~ month ~ variable, mean)
+    mdf <- melt(df, id=c("month", "week"))
+    acast(mdf, week ~ month ~ variable, mean)

  In Python the best way is to make use of :meth:`~pandas.pivot_table`:

- .. ipython:: python
-
-    df = DataFrame({
-        'x': random.uniform(1., 168., 12),
-        'y': random.uniform(7., 334., 12),
-        'z': random.uniform(1.7, 20.7, 12),
-        'month': [5,6,7]*4,
-        'week': [1,2]*6
-    })
-    mdf = melt(df, id_vars=['month', 'week'])
-    pivot_table(mdf, values='value', rows=['variable','week'],
-                cols=['month'], aggfunc=mean)
+ .. ipython:: python
+
+    df = DataFrame({
+        'x': random.uniform(1., 168., 12),
+        'y': random.uniform(7., 334., 12),
+        'z': random.uniform(1.7, 20.7, 12),
+        'month': [5,6,7]*4,
+        'week': [1,2]*6
+    })
+    mdf = pd.melt(df, id_vars=['month', 'week'])
+    pd.pivot_table(mdf, values='value', rows=['variable','week'],
+                   cols=['month'], aggfunc=np.mean)

  For more details and examples see :ref:`the reshaping documentation
  <reshaping.pivot>`.
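The ``rows`` and ``cols`` keywords used in this hunk were later renamed to ``index`` and ``columns`` in ``pivot_table``. A sketch of the same reshaping with the newer keyword names follows; the seeded generator, the string ``"mean"`` aggregator, and the ``table`` name are assumptions of this sketch.

```python
import numpy as np
import pandas as pd

# Seeded generator so the example is reproducible (an assumption of this sketch).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 12),
    "y": rng.uniform(7.0, 334.0, 12),
    "z": rng.uniform(1.7, 20.7, 12),
    "month": [5, 6, 7] * 4,
    "week": [1, 2] * 6,
})

# Melt to long format, then average the values for each
# (variable, week) row and month column.
mdf = pd.melt(df, id_vars=["month", "week"])
table = pd.pivot_table(
    mdf,
    values="value",
    index=["variable", "week"],   # formerly rows=
    columns=["month"],            # formerly cols=
    aggfunc="mean",
)
```

Each of the six ``(variable, week)`` rows has an entry for every month because the repeating ``month`` and ``week`` patterns cover all six month/week combinations.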