Commit 12eb775

Merge pull request #5761 from chappers/r-compare
DOC: Flesh out the R comparison section of docs (GH3980)
2 parents 8552b6a + 75d010e commit 12eb775

1 file changed

+203
-20
lines changed

doc/source/comparison_with_r.rst

+203-20
@@ -4,7 +4,8 @@
44
.. ipython:: python
55
:suppress:
66
7-
from pandas import *
7+
import pandas as pd
8+
import numpy as np
89
pd.options.display.max_rows=15
910
1011
Comparison with R / R libraries
@@ -38,25 +39,25 @@ The :meth:`~pandas.DataFrame.query` method is similar to the base R ``subset``
3839
function. In R you might want to get the rows of a ``data.frame`` where one
3940
column's values are less than another column's values:
4041

41-
.. code-block:: r
42+
.. code-block:: r
4243
43-
df <- data.frame(a=rnorm(10), b=rnorm(10))
44-
subset(df, a <= b)
45-
df[df$a <= df$b,] # note the comma
44+
df <- data.frame(a=rnorm(10), b=rnorm(10))
45+
subset(df, a <= b)
46+
df[df$a <= df$b,] # note the comma
4647
4748
In ``pandas``, there are a few ways to perform subsetting. You can use
4849
:meth:`~pandas.DataFrame.query` or pass an expression as if it were an
4950
index/slice as well as standard boolean indexing:
5051

51-
.. ipython:: python
52+
.. ipython:: python
5253
53-
from pandas import DataFrame
54-
from numpy.random import randn
54+
from pandas import DataFrame
55+
from numpy import random
5556
56-
df = DataFrame({'a': randn(10), 'b': randn(10)})
57-
df.query('a <= b')
58-
df[df.a <= df.b]
59-
df.loc[df.a <= df.b]
57+
df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
58+
df.query('a <= b')
59+
df[df.a <= df.b]
60+
df.loc[df.a <= df.b]
6061
6162
For more details and examples see :ref:`the query documentation
6263
<indexing.query>`.
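As a minimal sketch of the three subsetting styles above, the snippet below checks that they select the same rows. The seeded generator and the variable names ``by_query``, ``by_mask`` and ``by_loc`` are illustrative assumptions, not part of the original docs.

```python
import numpy as np
import pandas as pd

# Seeded only so that the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.standard_normal(10),
                   "b": rng.standard_normal(10)})

by_query = df.query("a <= b")        # expression as a string
by_mask = df[df["a"] <= df["b"]]     # boolean mask passed like an index
by_loc = df.loc[df["a"] <= df["b"]]  # the same mask through .loc

# All three approaches return identical frames.
same = by_query.equals(by_mask) and by_mask.equals(by_loc)
print(same)
```

Bracket access (``df["a"]``) is used here instead of attribute access; either form works for these column names.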
@@ -70,20 +71,20 @@ For more details and examples see :ref:`the query documentation
7071
An expression using a data.frame called ``df`` in R with the columns ``a`` and
7172
``b`` would be evaluated using ``with`` like so:
7273

73-
.. code-block:: r
74+
.. code-block:: r
7475
75-
df <- data.frame(a=rnorm(10), b=rnorm(10))
76-
with(df, a + b)
77-
df$a + df$b # same as the previous expression
76+
df <- data.frame(a=rnorm(10), b=rnorm(10))
77+
with(df, a + b)
78+
df$a + df$b # same as the previous expression
7879
7980
In ``pandas`` the equivalent expression, using the
8081
:meth:`~pandas.DataFrame.eval` method, would be:
8182

82-
.. ipython:: python
83+
.. ipython:: python
8384
84-
df = DataFrame({'a': randn(10), 'b': randn(10)})
85-
df.eval('a + b')
86-
df.a + df.b # same as the previous expression
85+
df = DataFrame({'a': random.randn(10), 'b': random.randn(10)})
86+
df.eval('a + b')
87+
df.a + df.b # same as the previous expression
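To make the equivalence concrete, here is a small sketch with fixed values instead of random ones, so the result can be checked by eye. The names ``via_eval`` and ``via_columns`` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})

via_eval = df.eval("a + b")        # expression evaluated against the columns
via_columns = df["a"] + df["b"]    # plain column arithmetic

print(via_eval.equals(via_columns))
```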
8788
8889
In certain cases :meth:`~pandas.DataFrame.eval` will be much faster than
8990
evaluation in pure Python. For more details and examples see :ref:`the eval
@@ -98,12 +99,194 @@ xts
9899
plyr
99100
----
100101

102+
``plyr`` is an R library for the split-apply-combine strategy for data
103+
analysis. The functions revolve around three data structures in R, ``a``
104+
for ``arrays``, ``l`` for ``lists``, and ``d`` for ``data.frame``. The
105+
table below shows how these data structures could be mapped in Python.
106+
107+
+------------+-------------------------------+
108+
| R          | Python                        |
109+
+============+===============================+
110+
| array      | list                          |
111+
+------------+-------------------------------+
112+
| lists      | dictionary or list of objects |
113+
+------------+-------------------------------+
114+
| data.frame | dataframe                     |
115+
+------------+-------------------------------+
116+
117+
|ddply|_
118+
~~~~~~~~
119+
120+
An expression using a data.frame called ``df`` in R where you want to
121+
summarize ``x`` by ``month`` and ``week``:
122+
123+
124+
125+
.. code-block:: r
126+
127+
require(plyr)
128+
df <- data.frame(
129+
x = runif(120, 1, 168),
130+
y = runif(120, 7, 334),
131+
z = runif(120, 1.7, 20.7),
132+
month = rep(c(5,6,7,8),30),
133+
week = sample(1:4, 120, TRUE)
134+
)
135+
136+
ddply(df, .(month, week), summarize,
137+
mean = round(mean(x), 2),
138+
sd = round(sd(x), 2))
139+
140+
In ``pandas`` the equivalent expression, using the
141+
:meth:`~pandas.DataFrame.groupby` method, would be:
142+
143+
144+
145+
.. ipython:: python
146+
147+
df = DataFrame({
148+
'x': random.uniform(1., 168., 120),
149+
'y': random.uniform(7., 334., 120),
150+
'z': random.uniform(1.7, 20.7, 120),
151+
'month': [5,6,7,8]*30,
152+
'week': random.randint(1, 5, 120)  # upper bound is exclusive, so this draws 1..4
153+
})
154+
155+
grouped = df.groupby(['month','week'])
156+
print(grouped['x'].agg([np.mean, np.std]))
157+
158+
159+
For more details and examples see :ref:`the groupby documentation
160+
<groupby.aggregate>`.
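The ``ddply`` call above also rounds each statistic to two decimals. A sketch of one way to mirror that in pandas follows; the seeded generator and the ``summary`` name are assumptions made for illustration. pandas' ``std`` uses the sample standard deviation (``ddof=1``) by default, which matches R's ``sd``.

```python
import numpy as np
import pandas as pd

# Seeded only so that the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 120),
    "month": [5, 6, 7, 8] * 30,
    "week": rng.integers(1, 5, 120),  # upper bound exclusive: weeks 1..4
})

summary = (
    df.groupby(["month", "week"])["x"]
      .agg(["mean", "std"])   # one column per statistic
      .round(2)               # mirrors round(..., 2) in the R call
      .reset_index()          # group keys back as ordinary columns
)
print(summary.head())
```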
161+
101162
reshape / reshape2
102163
------------------
103164

165+
|meltarray|_
166+
~~~~~~~~~~~~~
167+
168+
An expression using a 3 dimensional array called ``a`` in R where you want to
169+
melt it into a data.frame:
170+
171+
.. code-block:: r
172+
173+
a <- array(c(1:23, NA), c(2,3,4))
174+
data.frame(melt(a))
175+
176+
In Python, ``a`` is a NumPy array, so you can build the rows with a list comprehension over ``np.ndenumerate``.
177+
178+
.. ipython:: python
179+
180+
a = np.array(list(range(1,24))+[np.nan]).reshape(2,3,4)
181+
DataFrame([tuple(list(x)+[val]) for x, val in np.ndenumerate(a)])
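A sketch of the same idea with labeled columns is shown below. The column names ``dim0``, ``dim1``, ``dim2`` and ``value`` are my own illustrative choices. NumPy indices are 0-based and iterated in row-major order, so the row order and index values will not match R's 1-based, column-major output.

```python
import numpy as np
import pandas as pd

values = list(range(1, 24)) + [np.nan]   # 23 integers plus one missing value
a = np.array(values).reshape(2, 3, 4)

# np.ndenumerate yields ((i, j, k), value) pairs for every cell.
melted = pd.DataFrame(
    [(*idx, val) for idx, val in np.ndenumerate(a)],
    columns=["dim0", "dim1", "dim2", "value"],
)
print(melted.head())
```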
182+
183+
|meltlist|_
184+
~~~~~~~~~~~~
185+
186+
An expression using a list called ``a`` in R where you want to melt it
187+
into a data.frame:
188+
189+
.. code-block:: r
190+
191+
a <- as.list(c(1:4, NA))
192+
data.frame(melt(a))
193+
194+
In Python, this list would be a list of tuples, so the
195+
:class:`~pandas.DataFrame` constructor converts it to a dataframe as required.
196+
197+
.. ipython:: python
198+
199+
a = list(enumerate(list(range(1,5))+[np.nan]))
200+
DataFrame(a)
201+
202+
For more details and examples see :ref:`the Intro to Data Structures
203+
documentation <basics.dataframe.from_items>`.
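For illustration, the sketch below builds the list of ``(position, value)`` tuples and names the resulting columns. The labels ``position`` and ``value`` are assumptions and do not reproduce the column names that R's ``melt`` generates.

```python
import numpy as np
import pandas as pd

# enumerate pairs each element with its 0-based position in the list.
a = list(enumerate(list(range(1, 5)) + [np.nan]))

frame = pd.DataFrame(a, columns=["position", "value"])
print(frame)
```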
204+
205+
|meltdf|_
206+
~~~~~~~~~~~~~~~~
207+
208+
An expression using a data.frame called ``cheese`` in R where you want to
209+
reshape the data.frame:
210+
211+
.. code-block:: r
212+
213+
cheese <- data.frame(
214+
first = c('John', 'Mary'),
215+
last = c('Doe', 'Bo'),
216+
height = c(5.5, 6.0),
217+
weight = c(130, 150)
218+
)
219+
melt(cheese, id=c("first", "last"))
220+
221+
In pandas, the :func:`~pandas.melt` function is the equivalent of R's ``melt``:
222+
223+
.. ipython:: python
224+
225+
cheese = DataFrame({'first' : ['John', 'Mary'],
226+
'last' : ['Doe', 'Bo'],
227+
'height' : [5.5, 6.0],
228+
'weight' : [130, 150]})
229+
pd.melt(cheese, id_vars=['first', 'last'])
230+
cheese.set_index(['first', 'last']).stack() # alternative way
231+
232+
For more details and examples see :ref:`the reshaping documentation
233+
<reshaping.melt>`.
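The two approaches return differently shaped objects, which the following sketch makes explicit: ``melt`` gives a flat frame with ``variable`` and ``value`` columns, while ``stack`` gives a Series whose index gains one extra level. The names ``long_form`` and ``stacked`` are illustrative.

```python
import pandas as pd

cheese = pd.DataFrame({
    "first": ["John", "Mary"],
    "last": ["Doe", "Bo"],
    "height": [5.5, 6.0],
    "weight": [130, 150],
})

# Flat result: id columns plus 'variable' and 'value'.
long_form = pd.melt(cheese, id_vars=["first", "last"])

# Series result: (first, last, column name) as a three-level index.
stacked = cheese.set_index(["first", "last"]).stack()

print(long_form)
print(stacked)
```

Both hold the same four values, although the row order differs.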
234+
235+
|cast|_
236+
~~~~~~~
237+
238+
An expression using a data.frame called ``df`` in R to cast into a higher
239+
dimensional array:
240+
241+
.. code-block:: r
242+
243+
df <- data.frame(
244+
x = runif(12, 1, 168),
245+
y = runif(12, 7, 334),
246+
z = runif(12, 1.7, 20.7),
247+
month = rep(c(5,6,7),4),
248+
week = rep(c(1,2), 6)
249+
)
250+
251+
mdf <- melt(df, id=c("month", "week"))
252+
acast(mdf, week ~ month ~ variable, mean)
253+
254+
In pandas, one way to do this is with :func:`~pandas.pivot_table`:
255+
256+
.. ipython:: python
257+
258+
df = DataFrame({
259+
'x': random.uniform(1., 168., 12),
260+
'y': random.uniform(7., 334., 12),
261+
'z': random.uniform(1.7, 20.7, 12),
262+
'month': [5,6,7]*4,
263+
'week': [1,2]*6
264+
})
265+
mdf = pd.melt(df, id_vars=['month', 'week'])
266+
pd.pivot_table(mdf, values='value', index=['variable','week'],
267+
columns=['month'], aggfunc=np.mean)
268+
269+
For more details and examples see :ref:`the reshaping documentation
270+
<reshaping.pivot>`.
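A self-contained sketch of the melt-then-pivot step follows. It uses the ``index`` and ``columns`` keywords, which later pandas releases use in place of ``rows`` and ``cols``. The seeded generator and the ``table`` name are assumptions for illustration. The result is a two-dimensional table with ``variable`` and ``week`` on the rows, not a true three-dimensional array like the one ``acast`` returns.

```python
import numpy as np
import pandas as pd

# Seeded only so that the sketch is reproducible.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x": rng.uniform(1.0, 168.0, 12),
    "y": rng.uniform(7.0, 334.0, 12),
    "z": rng.uniform(1.7, 20.7, 12),
    "month": [5, 6, 7] * 4,
    "week": [1, 2] * 6,
})

mdf = pd.melt(df, id_vars=["month", "week"])
table = pd.pivot_table(
    mdf,
    values="value",
    index=["variable", "week"],  # row labels
    columns=["month"],           # column labels
    aggfunc="mean",
)
print(table)
```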
104271

105272
.. |with| replace:: ``with``
106273
.. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html
107274

108275
.. |subset| replace:: ``subset``
109276
.. _subset: http://finzi.psych.upenn.edu/R/library/base/html/subset.html
277+
278+
.. |ddply| replace:: ``ddply``
279+
.. _ddply: http://www.inside-r.org/packages/cran/plyr/docs/ddply
280+
281+
.. |meltarray| replace:: ``melt.array``
282+
.. _meltarray: http://www.inside-r.org/packages/cran/reshape2/docs/melt.array
283+
284+
.. |meltlist| replace:: ``melt.list``
285+
.. _meltlist: http://www.inside-r.org/packages/cran/reshape2/docs/melt.list
286+
287+
.. |meltdf| replace:: ``melt.data.frame``
288+
.. _meltdf: http://www.inside-r.org/packages/cran/reshape2/docs/melt.data.frame
289+
290+
.. |cast| replace:: ``cast``
291+
.. _cast: http://www.inside-r.org/packages/cran/reshape2/docs/cast
292+
