Commit 1e0b228

leifwalsh authored and jreback committed
DOC: expanding comparison with R section
closes #12472 closes #9815
1 parent d444ffa commit 1e0b228

1 file changed: +73 -0 lines changed

doc/source/comparison_with_r.rst

@@ -31,6 +31,79 @@ For transfer of ``DataFrame`` objects from ``pandas`` to R, one option is to
use HDF5 files, see :ref:`io.external_compatibility` for an
example.


Quick Reference
---------------

We'll start off with a quick reference guide pairing some common R
operations using `dplyr
<http://cran.r-project.org/web/packages/dplyr/index.html>`__ with
pandas equivalents.


Querying, Filtering, Sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``dim(df)``                                 ``df.shape``
``head(df)``                                ``df.head()``
``slice(df, 1:10)``                         ``df.iloc[:10]``
``filter(df, col1 == 1, col2 == 1)``        ``df.query('col1 == 1 & col2 == 1')``
``df[df$col1 == 1 & df$col2 == 1,]``        ``df[(df.col1 == 1) & (df.col2 == 1)]``
``select(df, col1, col2)``                  ``df[['col1', 'col2']]``
``select(df, col1:col3)``                   ``df.loc[:, 'col1':'col3']``
``select(df, -(col1:col3))``                ``df.drop(cols_to_drop, axis=1)`` but see [#select_range]_
``distinct(select(df, col1))``              ``df[['col1']].drop_duplicates()``
``distinct(select(df, col1, col2))``        ``df[['col1', 'col2']].drop_duplicates()``
``sample_n(df, 10)``                        ``df.sample(n=10)``
``sample_frac(df, 0.01)``                   ``df.sample(frac=0.01)``
=========================================== ===========================================

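As a rough illustration of the rows above, the sketch below runs several of the
pandas equivalents on a small made-up frame. The data is hypothetical; only the
column names follow the table. Note that R's ``1:10`` is 1-based and inclusive,
while ``iloc`` slices are 0-based and end-exclusive, so the first *n* rows are
``df.iloc[:n]``.

```python
import pandas as pd

# Hypothetical toy data; only the column names come from the table.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1, 2, 1],
    "col2": [1, 2, 1, 1, 2, 1],
    "col3": [10, 20, 30, 40, 50, 60],
})

n_rows, n_cols = df.shape        # dim(df)
first_rows = df.iloc[:3]         # slice(df, 1:3): positions 0, 1 and 2

# filter(df, col1 == 1, col2 == 1), written two equivalent ways
by_query = df.query("col1 == 1 & col2 == 1")
by_mask = df[(df.col1 == 1) & (df.col2 == 1)]

# distinct(select(df, col1, col2))
pairs = df[["col1", "col2"]].drop_duplicates()

# sample_n(df, 2); random_state is set only to make the draw repeatable
picked = df.sample(n=2, random_state=0)
```

The parentheses in the boolean-mask form are required, because ``&`` binds
more tightly than ``==`` in plain Python expressions.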
.. [#select_range] R's shorthand for a subrange of columns
                   (``select(df, col1:col3)``) can be approached
                   cleanly in pandas, if you have the list of columns,
                   for example ``df[cols[1:3]]`` or
                   ``df.drop(cols[1:3], axis=1)``, but doing this by column
                   name is a bit messy.


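To make the footnote concrete, here is a minimal sketch of selecting and
dropping a contiguous block of columns, first by position and then by name.
The frame and the ``cols`` list are hypothetical, and the ``get_loc`` approach
is just one possible way to drop a named range, assuming unique column labels.

```python
import pandas as pd

# Hypothetical one-row frame with five columns.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["col0", "col1", "col2", "col3", "col4"])
cols = list(df.columns)

# By position: cols[1:4] is col1..col3 (0-based, end-exclusive).
kept = df[cols[1:4]]
dropped = df.drop(cols[1:4], axis=1)   # axis=1 drops columns, not rows

# By name: label slices in .loc include both endpoints.
kept_by_name = df.loc[:, "col1":"col3"]

# Dropping a named range: look up the positions, then slice the column index.
start = df.columns.get_loc("col1")
stop = df.columns.get_loc("col3") + 1
dropped_by_name = df.drop(df.columns[start:stop], axis=1)
```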
Sorting
~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``arrange(df, col1, col2)``                 ``df.sort_values(['col1', 'col2'])``
``arrange(df, desc(col1))``                 ``df.sort_values('col1', ascending=False)``
=========================================== ===========================================

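The two sorting rows can be tried on a small made-up frame, as sketched below.
The last call, which mixes ascending and descending keys, is not in the table;
it is included on the assumption that it corresponds to
``arrange(df, col1, desc(col2))``.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [1, 2, 0, 1]})

# arrange(df, col1, col2)
asc = df.sort_values(["col1", "col2"])

# arrange(df, desc(col1))
desc = df.sort_values("col1", ascending=False)

# Assumed extension: one direction per key, given as a list.
mixed = df.sort_values(["col1", "col2"], ascending=[True, False])
```

``sort_values`` returns a new frame and keeps the original index labels, so
``reset_index(drop=True)`` may be wanted afterwards if positional labels matter.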
Transforming
~~~~~~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``select(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})['col_one']``
``rename(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})``
``mutate(df, c=a-b)``                       ``df.assign(c=df.a-df.b)``
=========================================== ===========================================


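A short sketch of the transforming rows follows, using hypothetical data. One
detail worth noting: indexing with a single label, as in the table, returns a
``Series``, whereas a one-element list keeps a ``DataFrame``, which is arguably
closer to what ``select`` returns in R.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"col1": [1, 2], "a": [5, 7], "b": [2, 3]})

# rename(df, col_one = col1); df itself is left unchanged
renamed = df.rename(columns={"col1": "col_one"})

# select(df, col_one = col1)
one_series = renamed["col_one"]     # form used in the table: a Series
one_frame = renamed[["col_one"]]    # list of labels: a one-column DataFrame

# mutate(df, c = a - b)
with_c = df.assign(c=df.a - df.b)

# assign also accepts a callable, which is handy in method chains
with_c_chained = df.assign(c=lambda d: d.a - d.b)
```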
Grouping and Summarizing
~~~~~~~~~~~~~~~~~~~~~~~~

============================================== ===========================================
R                                              pandas
============================================== ===========================================
``summary(df)``                                ``df.describe()``
``gdf <- group_by(df, col1)``                  ``gdf = df.groupby('col1')``
``summarise(gdf, avg=mean(col1, na.rm=TRUE))`` ``df.groupby('col1').agg({'col1': 'mean'})``
``summarise(gdf, total=sum(col1))``            ``df.groupby('col1').sum()``
============================================== ===========================================


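The grouping rows can be sketched as follows. To keep the result easy to read,
this example aggregates a separate, hypothetical value column ``col2`` rather
than the grouping column itself. The named-aggregation call at the end is an
addition, not part of the table.

```python
import pandas as pd

# Hypothetical toy data: col1 is the grouping key, col2 an assumed value column.
df = pd.DataFrame({"col1": ["x", "x", "y"], "col2": [1.0, 3.0, 5.0]})

summary = df.describe()              # summary(df), numeric columns by default

gdf = df.groupby("col1")             # gdf <- group_by(df, col1)

# summarise(gdf, avg = mean(col2, na.rm = TRUE)); pandas skips NaN by default
avg = gdf.agg({"col2": "mean"})

# summarise(gdf, total = sum(col2)); .sum() totals every non-key column
total = gdf.sum()

# Named aggregation gives the output columns dplyr-style names.
named = gdf.agg(avg=("col2", "mean"), total=("col2", "sum"))
```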
Base R
------
