Comparison with SQL
********************
- Since many potential pandas users have some familiarity with
- `SQL <http://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
+ Since many potential pandas users have some familiarity with
+ `SQL <http://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
various SQL operations would be performed using pandas.

- If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>`
+ If you're new to pandas, you might want to first read through :ref:`10 Minutes to Pandas<10min>`
to familiarize yourself with the library.

As is customary, we import pandas and numpy as follows:
@@ -17,8 +17,8 @@ As is customary, we import pandas and numpy as follows:
    import pandas as pd
    import numpy as np

- Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
- the data into a DataFrame called `tips` and assume we have a database table of the same name and
+ Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
+ the data into a DataFrame called `tips` and assume we have a database table of the same name and
structure.

.. ipython:: python
@@ -44,7 +44,7 @@ With pandas, column selection is done by passing a list of column names to your

    tips[['total_bill', 'tip', 'smoker', 'time']].head(5)

- Calling the DataFrame without the list of column names would display all columns (akin to SQL's
+ Calling the DataFrame without the list of column names would display all columns (akin to SQL's
``*``).

WHERE
@@ -58,14 +58,14 @@ Filtering in SQL is done via a WHERE clause.
    WHERE time = 'Dinner'
    LIMIT 5;

- DataFrames can be filtered in multiple ways; the most intuitive of which is using
+ DataFrames can be filtered in multiple ways; the most intuitive of which is using
`boolean indexing <http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing>`_.

.. ipython:: python

    tips[tips['time'] == 'Dinner'].head(5)

- The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
+ The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
returning all rows with True.

.. ipython:: python
@@ -74,7 +74,7 @@ returning all rows with True.
    is_dinner.value_counts()
    tips[is_dinner].head(5)

- Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
+ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
(AND).

.. code-block:: sql
@@ -101,16 +101,16 @@ Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame usi
    # tips by parties of at least 5 diners OR bill total was more than $45
    tips[(tips['size'] >= 5) | (tips['total_bill'] > 45)]

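The parentheses in the filter above matter: in Python, ``|`` and ``&`` bind more tightly than comparison operators, so each comparison has to be grouped explicitly. The following is a minimal sketch of that pattern; the ``demo`` frame and its values are hypothetical and only stand in for ``tips``.

```python
import pandas as pd

# Hypothetical stand-in for the tips table (values invented for illustration).
demo = pd.DataFrame({'size': [2, 6, 3, 5],
                     'total_bill': [20.0, 30.0, 50.0, 48.0]})

# Each comparison yields a boolean Series; | combines them element-wise (OR).
big_party = demo['size'] >= 5
big_bill = demo['total_bill'] > 45
either = demo[big_party | big_bill]

# & keeps only the rows where both conditions hold (AND).
both = demo[(demo['size'] >= 5) & (demo['total_bill'] > 45)]
```

Naming the intermediate masks, as with ``big_party`` and ``big_bill``, is one way to keep longer filters readable.
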
- NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull`
+ NULL checking is done using the :meth:`~pandas.Series.notnull` and :meth:`~pandas.Series.isnull`
methods.

.. ipython:: python
-
+
    frame = pd.DataFrame({'col1': ['A', 'B', np.nan, 'C', 'D'],
                          'col2': ['F', np.nan, 'G', 'H', 'I']})
    frame

- Assume we have a table of the same structure as our DataFrame above. We can see only the records
+ Assume we have a table of the same structure as our DataFrame above. We can see only the records
where ``col2`` IS NULL with the following query:

.. code-block:: sql
@@ -138,12 +138,12 @@ Getting items where ``col1`` IS NOT NULL can be done with :meth:`~pandas.Series.

GROUP BY
--------
- In pandas, SQL's GROUP BY operations are performed using the similarly named
- :meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a
+ In pandas, SQL's GROUP BY operations are performed using the similarly named
+ :meth:`~pandas.DataFrame.groupby` method. :meth:`~pandas.DataFrame.groupby` typically refers to a
process where we'd like to split a dataset into groups, apply some function (typically aggregation),
and then combine the groups together.

- A common SQL operation would be getting the count of records in each group throughout a dataset.
+ A common SQL operation would be getting the count of records in each group throughout a dataset.
For instance, a query getting us the number of tips left by sex:

.. code-block:: sql
@@ -163,23 +163,23 @@ The pandas equivalent would be:

    tips.groupby('sex').size()

- Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not
- :meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count`
+ Notice that in the pandas code we used :meth:`~pandas.DataFrameGroupBy.size` and not
+ :meth:`~pandas.DataFrameGroupBy.count`. This is because :meth:`~pandas.DataFrameGroupBy.count`
applies the function to each column, returning the number of ``not null`` records within each.

.. ipython:: python

    tips.groupby('sex').count()

- Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an
+ Alternatively, we could have applied the :meth:`~pandas.DataFrameGroupBy.count` method to an
individual column:

.. ipython:: python

    tips.groupby('sex')['total_bill'].count()

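The difference between ``size`` and ``count`` only shows up when a group contains missing values. The following is a minimal sketch of that difference; the ``demo`` frame is hypothetical and deliberately includes one ``NaN`` tip.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing tip in the 'Male' group.
demo = pd.DataFrame({'sex': ['Female', 'Male', 'Male', 'Female'],
                     'tip': [1.0, np.nan, 3.0, 2.5]})

# size() counts rows per group, whether or not values are missing.
row_counts = demo.groupby('sex').size()

# count() on a column counts only the non-null values per group.
non_null = demo.groupby('sex')['tip'].count()
```

Here ``row_counts`` reports two rows for each group, while ``non_null`` reports only one value for ``'Male'``.
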
- Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount
- differs by day of the week - :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary
+ Multiple functions can also be applied at once. For instance, say we'd like to see how tip amount
+ differs by day of the week - :meth:`~pandas.DataFrameGroupBy.agg` allows you to pass a dictionary
to your grouped DataFrame, indicating which functions to apply to specific columns.

.. code-block:: sql
@@ -198,7 +198,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum

    tips.groupby('day').agg({'tip': np.mean, 'day': np.size})

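As a variation on the call above, the same kind of per-column aggregation can be sketched with named aggregation and string function names. This assumes a pandas release that supports named aggregation (0.25 or later); the ``demo`` frame and the output column names ``avg_tip`` and ``n_tips`` are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the tips table.
demo = pd.DataFrame({'day': ['Fri', 'Fri', 'Sat', 'Sat', 'Sat'],
                     'tip': [2.0, 4.0, 1.0, 3.0, 5.0]})

# Each keyword names an output column; the tuple is (input column, function).
summary = demo.groupby('day').agg(avg_tip=('tip', 'mean'),
                                  n_tips=('tip', 'size'))
```

Passing ``'mean'`` and ``'size'`` as strings avoids depending on how a given pandas version treats NumPy callables such as ``np.mean``.
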
- Grouping by more than one column is done by passing a list of columns to the
+ Grouping by more than one column is done by passing a list of columns to the
:meth:`~pandas.DataFrame.groupby` method.

.. code-block:: sql
@@ -207,7 +207,7 @@ Grouping by more than one column is done by passing a list of columns to the
    FROM tips
    GROUP BY smoker, day;
    /*
-     smoker day
+     smoker day
    No     Fri      4  2.812500
           Sat     45  3.102889
           Sun     57  3.167895
@@ -226,16 +226,16 @@ Grouping by more than one column is done by passing a list of columns to the

JOIN
----
- JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
- :meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
- parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
+ JOINs can be performed with :meth:`~pandas.DataFrame.join` or :meth:`~pandas.merge`. By default,
+ :meth:`~pandas.DataFrame.join` will join the DataFrames on their indices. Each method has
+ parameters allowing you to specify the type of join to perform (LEFT, RIGHT, INNER, FULL) or the
columns to join on (column names or indices).

.. ipython:: python

    df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'],
                        'value': np.random.randn(4)})
-     df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
+     df2 = pd.DataFrame({'key': ['B', 'D', 'D', 'E'],
                        'value': np.random.randn(4)})

Assume we have two database tables of the same name and structure as our DataFrames.
@@ -256,7 +256,7 @@ INNER JOIN
    # merge performs an INNER JOIN by default
    pd.merge(df1, df2, on='key')

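Because ``df1`` and ``df2`` hold random values, the effect of the join is easier to see with fixed numbers. The following is a minimal sketch using hypothetical ``left`` and ``right`` frames with the same keys as above; ``how='outer'`` is shown only for contrast with the default.

```python
import pandas as pd

# Hypothetical frames with the same keys as df1 and df2, but fixed values.
left = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
right = pd.DataFrame({'key': ['B', 'D', 'D', 'E'], 'value': [5, 6, 7, 8]})

# Default inner join: only keys present in both frames survive, and the
# duplicated 'D' on the right produces two output rows.
inner = pd.merge(left, right, on='key')

# how='outer' keeps unmatched keys from both sides, filling gaps with NaN.
outer = pd.merge(left, right, on='key', how='outer')
```

The overlapping ``value`` column is disambiguated with the default ``_x`` and ``_y`` suffixes in the result.
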
- :meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's
+ :meth:`~pandas.merge` also offers parameters for cases when you'd like to join one DataFrame's
column with another DataFrame's index.

.. ipython:: python
@@ -296,7 +296,7 @@ RIGHT JOIN

FULL JOIN
~~~~~~~~~
- pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the
+ pandas also allows for FULL JOINs, which display both sides of the dataset, whether or not the
joined columns find a match. As of writing, FULL JOINs are not supported in all RDBMS (for example, MySQL).

.. code-block:: sql
@@ -364,7 +364,7 @@ SQL's UNION is similar to UNION ALL, however UNION will remove duplicate rows.
     Los Angeles     5
    */

- In pandas, you can use :meth:`~pandas.concat` in conjunction with
+ In pandas, you can use :meth:`~pandas.concat` in conjunction with
:meth:`~pandas.DataFrame.drop_duplicates`.

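The combination described above can be sketched as follows. The frames ``df_a`` and ``df_b`` are hypothetical and share one identical row, so the difference between UNION ALL and UNION is visible in the row counts.

```python
import pandas as pd

# Hypothetical city rankings; the ('Chicago', 1) row appears in both frames.
df_a = pd.DataFrame({'city': ['Chicago', 'San Francisco', 'New York City'],
                     'rank': [1, 2, 3]})
df_b = pd.DataFrame({'city': ['Chicago', 'Boston', 'Los Angeles'],
                     'rank': [1, 4, 5]})

# Like UNION ALL: stack the rows and keep duplicates.
union_all = pd.concat([df_a, df_b])

# Like UNION: drop rows that are identical across all columns.
union = union_all.drop_duplicates()
```

Note that ``concat`` keeps each frame's original index labels, so the result may contain repeated labels unless ``ignore_index=True`` is passed.
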
.. ipython:: python
@@ -377,4 +377,4 @@ UPDATE


DELETE
- ------
+ ------