@@ -10,16 +10,6 @@ performed in pandas.

.. include:: includes/introduction.rst

- .. note::
-
-    Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
-    ``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
-    This is often used in interactive work (e.g. `Jupyter notebook
-    <https://jupyter.org/>`_ or terminal) -- the equivalent in Stata would be:
-
-    .. code-block:: stata
-
-       list in 1/5

Data structures
---------------
@@ -116,7 +106,7 @@ the data set if presented with a url.
       "/pandas/master/pandas/tests/io/data/csv/tips.csv"
   )
   tips = pd.read_csv(url)
-    tips.head()
+    tips

Like ``import delimited``, :func:`read_csv` can take a number of parameters to specify
how the data should be parsed. For example, if the data were instead tab delimited,
@@ -141,6 +131,18 @@ such as Excel, SAS, HDF5, Parquet, and SQL databases. These are all read via a
function. See the :ref:`IO documentation<io>` for more details.


+ Limiting output
+ ~~~~~~~~~~~~~~~
+
+ .. include:: includes/limit.rst
+
+ The equivalent in Stata would be:
+
+ .. code-block:: stata
+
+    list in 1/5
+
+
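The Stata ``list in 1/5`` line roughly corresponds to :meth:`DataFrame.head`, which returns the first ``n`` rows (``n=5`` by default). The sketch below is minimal and uses a small hand-built frame in place of the downloaded ``tips`` data. It assumes the same column names, but the rows are typed in for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for ``tips``: same column names, hand-typed values.
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77],
        "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00],
        "sex": ["Female", "Male", "Male", "Male", "Female", "Male", "Male"],
    }
)

# Counterpart of Stata's ``list in 1/5``: head() returns the first n rows, n=5 by default.
first_five = tips.head()
print(first_five)

# Counterpart of ``list in 1/3``: pass n explicitly.
first_three = tips.head(3)
print(first_three)
```

Unlike ``list``, ``head()`` returns a new ``DataFrame`` rather than only printing, so the result can be stored and reused.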
Exporting data
~~~~~~~~~~~~~~

@@ -179,18 +181,8 @@ the column from the data set.
   generate new_bill = total_bill / 2
   drop new_bill

- pandas provides similar vectorized operations by
- specifying the individual ``Series`` in the ``DataFrame``.
- New columns can be assigned in the same way. The :meth:`DataFrame.drop` method
- drops a column from the ``DataFrame``.
+ .. include:: includes/column_operations.rst

- .. ipython:: python
-
-    tips["total_bill"] = tips["total_bill"] - 2
-    tips["new_bill"] = tips["total_bill"] / 2
-    tips.head()
-
-    tips = tips.drop("new_bill", axis=1)
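The vectorized column operations discussed in this hunk can be sketched as follows. The frame is a hypothetical three-row stand-in for ``tips`` with made-up values. ``drop(columns=...)`` is used here as an equivalent spelling of ``drop(..., axis=1)``.

```python
import pandas as pd

# Hypothetical three-row frame with two of the ``tips`` columns (made-up values).
tips = pd.DataFrame({"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]})

# Vectorized update: subtracting 2 from the Series applies to every row at once.
tips["total_bill"] = tips["total_bill"] - 2

# Assigning to a new key creates a column, like Stata's ``generate new_bill = total_bill / 2``.
tips["new_bill"] = tips["total_bill"] / 2
halves = tips["new_bill"].tolist()

# Like ``drop new_bill``; drop() returns a new DataFrame, so the result is assigned back.
tips = tips.drop(columns="new_bill")
print(tips)
```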

Filtering
~~~~~~~~~
@@ -256,20 +248,7 @@ Stata provides keywords to select, drop, and rename columns.

   rename total_bill total_bill_2

- The same operations are expressed in pandas below. Note that in contrast to Stata, these
- operations do not happen in place. To make these changes persist, assign the operation back
- to a variable.
-
- .. ipython:: python
-
-    # keep
-    tips[["sex", "total_bill", "tip"]].head()
-
-    # drop
-    tips.drop("sex", axis=1).head()
-
-    # rename
-    tips.rename(columns={"total_bill": "total_bill_2"}).head()
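This sketch illustrates the point that keep, drop and rename do not happen in place. It uses a hypothetical two-row frame with ``tips``-style column names, and the values are made up. Each call returns a new object, and ``tips`` only changes once a result is assigned back to it.

```python
import pandas as pd

# Hypothetical two-row frame using ``tips``-style column names (made-up values).
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34],
        "tip": [1.01, 1.66],
        "sex": ["Female", "Male"],
        "smoker": ["No", "No"],
    }
)

# keep: index with a list of column names
kept = tips[["sex", "total_bill", "tip"]]

# drop: drop the column along axis=1
dropped = tips.drop("sex", axis=1)

# rename: map old names to new ones
renamed = tips.rename(columns={"total_bill": "total_bill_2"})

# None of the calls above changed ``tips``; its columns are still the original ones.
columns_before = list(tips.columns)

# Assign back to make a change persist.
tips = renamed
print(tips)
```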
+ .. include:: includes/column_selection.rst


Sorting by values
@@ -428,12 +407,14 @@ or the intersection of the two by using the values created in the
   restore
   merge 1:n key using df2.dta

- .. include:: includes/merge_setup.rst
+ .. include:: includes/merge.rst


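The Stata ``merge 1:n key using df2.dta`` line has a rough pandas counterpart in :func:`pandas.merge`. This is a minimal sketch, assuming two small in-memory frames (hypothetical ``df1`` and ``df2`` with made-up values) that share a ``key`` column. ``df1`` has unique keys and ``df2`` repeats one of them, mirroring the 1:n relationship. Passing ``indicator=True`` adds a ``_merge`` column whose role is similar in spirit to Stata's ``_merge`` variable.

```python
import pandas as pd

# Two small illustrative frames sharing a ``key`` column (made-up values).
df1 = pd.DataFrame({"key": ["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
df2 = pd.DataFrame({"key": ["B", "D", "D", "E"], "value": [5, 6, 7, 8]})

# An outer merge keeps keys from both sides; indicator=True adds a ``_merge``
# column labelled "left_only", "right_only" or "both".
# The overlapping ``value`` columns get the default suffixes _x and _y.
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)
print(outer)

# Keeping only matched rows gives the intersection, the same rows as how="inner".
matched = outer[outer["_merge"] == "both"]
inner = pd.merge(df1, df2, on="key", how="inner")
```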
Missing data
------------

+ Both pandas and Stata have a representation for missing data.
+
.. include:: includes/missing_intro.rst

One difference is that missing data cannot be compared to its sentinel value.
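A minimal sketch of what this means on the pandas side, assuming a float ``Series`` whose missing entries are stored as ``NaN``. An equality test against ``np.nan`` is never true, so missing values have to be found with ``isna()`` or ``notna()`` instead.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# NaN never compares equal to anything, including itself, so this matches nothing.
eq_nan = s == np.nan

# isna()/notna() are the supported way to test for missing values.
missing = s.isna()
missing_rows = s[s.isna()]
print(missing_rows)
```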