@@ -31,6 +31,79 @@ For transfer of ``DataFrame`` objects from ``pandas`` to R, one option is to
use HDF5 files, see :ref:`io.external_compatibility` for an
example.
+
+ Quick Reference
+ ---------------
+
+ We'll start off with a quick reference guide pairing some common R
+ operations using `dplyr
+ <http://cran.r-project.org/web/packages/dplyr/index.html>`__ with
+ pandas equivalents.
+
+
+ Querying, Filtering, Sampling
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ =========================================== ===========================================
+ R                                           pandas
+ =========================================== ===========================================
+ ``dim(df)``                                 ``df.shape``
+ ``head(df)``                                ``df.head()``
+ ``slice(df, 1:10)``                         ``df.iloc[:10]``
+ ``filter(df, col1 == 1, col2 == 1)``        ``df.query('col1 == 1 & col2 == 1')``
+ ``df[df$col1 == 1 & df$col2 == 1,]``        ``df[(df.col1 == 1) & (df.col2 == 1)]``
+ ``select(df, col1, col2)``                  ``df[['col1', 'col2']]``
+ ``select(df, col1:col3)``                   ``df.loc[:, 'col1':'col3']``
+ ``select(df, -(col1:col3))``                ``df.drop(cols_to_drop, axis=1)`` but see [#select_range]_
+ ``distinct(select(df, col1))``              ``df[['col1']].drop_duplicates()``
+ ``distinct(select(df, col1, col2))``        ``df[['col1', 'col2']].drop_duplicates()``
+ ``sample_n(df, 10)``                        ``df.sample(n=10)``
+ ``sample_frac(df, 0.01)``                   ``df.sample(frac=0.01)``
+ =========================================== ===========================================
+
+ .. [#select_range] R's shorthand for a subrange of columns
+    (``select(df, col1:col3)``) can be approached
+    cleanly in pandas if you have the list of columns,
+    for example ``df[cols[1:3]]`` or
+    ``df.drop(cols[1:3], axis=1)``, but doing this by column
+    name is a bit messy.
+
+
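As a rough illustration of the filtering and selection rows above, the sketch below runs the pandas spellings on a small made-up frame. The data and the ``col4`` column are assumptions introduced only for this example; the column names otherwise mirror the table.

```python
import pandas as pd

# Hypothetical toy frame; the values and col4 are illustrative only.
df = pd.DataFrame({
    "col1": [1, 1, 2, 1],
    "col2": [1, 2, 1, 1],
    "col3": [10, 20, 30, 40],
    "col4": [0.1, 0.2, 0.3, 0.4],
})

# filter(df, col1 == 1, col2 == 1): two equivalent spellings
by_query = df.query("col1 == 1 & col2 == 1")
by_mask = df[(df.col1 == 1) & (df.col2 == 1)]

# select(df, col1:col3): label slices with .loc include both endpoints
kept = df.loc[:, "col1":"col3"]

# select(df, -(col1:col3)): build the list of labels first, then drop it
cols_to_drop = list(df.loc[:, "col1":"col3"].columns)
rest = df.drop(cols_to_drop, axis=1)

# distinct(select(df, col1, col2))
unique_pairs = df[["col1", "col2"]].drop_duplicates()
```

Building ``cols_to_drop`` from a ``.loc`` label slice is one possible way to get a by-name range without hard-coding positions; it is a workaround, not a dedicated pandas feature.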
+ Sorting
+ ~~~~~~~
+
+ =========================================== ===========================================
+ R                                           pandas
+ =========================================== ===========================================
+ ``arrange(df, col1, col2)``                 ``df.sort_values(['col1', 'col2'])``
+ ``arrange(df, desc(col1))``                 ``df.sort_values('col1', ascending=False)``
+ =========================================== ===========================================
+
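The sorting rows can be tried on a toy frame as sketched below. The data is made up for illustration, and the mixed-direction case is an extra not listed in the table: it assumes the dplyr call ``arrange(df, col1, desc(col2))`` and uses the ``ascending`` list form of ``sort_values``.

```python
import pandas as pd

# Hypothetical data; no (col1, col2) pair repeats, so the order is unambiguous.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [4, 3, 1, 2]})

# arrange(df, col1, col2)
asc = df.sort_values(["col1", "col2"])

# arrange(df, desc(col1))
desc = df.sort_values("col1", ascending=False)

# arrange(df, col1, desc(col2)): one direction flag per sort key
mixed = df.sort_values(["col1", "col2"], ascending=[True, False])
```

Note that ``sort_values`` returns a new frame and keeps the original index labels, so ``reset_index(drop=True)`` may be wanted afterwards if positional labels matter.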
+ Transforming
+ ~~~~~~~~~~~~
+
+ =========================================== ===========================================
+ R                                           pandas
+ =========================================== ===========================================
+ ``select(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})[['col_one']]``
+ ``rename(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})``
+ ``mutate(df, c=a-b)``                       ``df.assign(c=df.a-df.b)``
+ =========================================== ===========================================
+
+
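A minimal sketch of the transforming rows, on invented data. The double-bracket selection and the ``lambda`` form of ``assign`` are choices made for this example rather than something the table prescribes.

```python
import pandas as pd

# Hypothetical frame with the column names used in the table.
df = pd.DataFrame({"col1": [1, 2], "a": [5, 7], "b": [2, 3]})

# rename(df, col_one = col1): returns a new frame, df is left unchanged
renamed = df.rename(columns={"col1": "col_one"})

# select(df, col_one = col1): double brackets keep a one-column DataFrame,
# whereas single brackets would return a Series
selected = renamed[["col_one"]]

# mutate(df, c=a-b)
with_c = df.assign(c=df.a - df.b)

# Same result with a callable, which is handy inside method chains
with_c_lazy = df.assign(c=lambda d: d.a - d.b)
```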
+ Grouping and Summarizing
+ ~~~~~~~~~~~~~~~~~~~~~~~~
+
+ ============================================== ===========================================
+ R                                              pandas
+ ============================================== ===========================================
+ ``summary(df)``                                ``df.describe()``
+ ``gdf <- group_by(df, col1)``                  ``gdf = df.groupby('col1')``
+ ``summarise(gdf, avg=mean(col1, na.rm=TRUE))`` ``df.groupby('col1').agg({'col1': 'mean'})``
+ ``summarise(gdf, total=sum(col1))``            ``df.groupby('col1').sum()``
+ ============================================== ===========================================
+
+
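The grouping rows might look like this in practice. This sketch departs from the table in one respect: it summarises a separate value column, ``col2`` (an assumption made for the example), instead of the grouping column itself, and it uses named aggregation so the output columns carry the same names as the ``summarise`` arguments.

```python
import pandas as pd

# Hypothetical data with one missing value in the summarised column.
df = pd.DataFrame({
    "col1": ["x", "x", "x", "y"],
    "col2": [1.0, 3.0, None, 4.0],
})

# gdf <- group_by(df, col1)
gdf = df.groupby("col1")

# summarise(gdf, avg=mean(col2, na.rm=TRUE), total=sum(col2, na.rm=TRUE))
# pandas reductions skip missing values by default.
summary = gdf.agg(avg=("col2", "mean"), total=("col2", "sum"))

# summary(df): describe() covers the numeric columns by default
overview = df.describe()
```

The result is indexed by ``col1``; calling ``summary.reset_index()`` gives a flat frame closer to what ``summarise`` returns.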
Base R
------