@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
this page is meant to demonstrate how different Stata operations would be
performed in pandas.

- .. include:: comparison_boilerplate.rst
+ .. include:: includes/introduction.rst

.. note::

@@ -89,16 +89,7 @@ specifying the column names.
   5 6
   end

- A pandas ``DataFrame`` can be constructed in many different ways,
- but for a small number of values, it is often convenient to specify it as
- a Python dictionary, where the keys are the column names
- and the values are the data.
-
- .. ipython:: python
-
-    df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-    df
-
+ .. include:: includes/construct_dataframe.rst

Reading external data
~~~~~~~~~~~~~~~~~~~~~
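The removed ``DataFrame`` construction, now pulled in through ``includes/construct_dataframe.rst``, relies on ``pd`` being imported elsewhere on the page. The following is a self-contained sketch of the same idea that assumes only an installed pandas. The list-of-rows constructor is an extra illustration of the "many different ways" the removed text mentions and does not come from the page itself.

```python
import pandas as pd

# Keys become column names and each list becomes a column, mirroring
# Stata's ``input x y ... end`` block.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
print(df)
#    x  y
# 0  1  2
# 1  3  4
# 2  5  6

# Another constructor form: a list of rows plus explicit column names.
df_rows = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["x", "y"])
print(df.equals(df_rows))  # True
```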
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.

   list if total_bill > 10

- DataFrames can be filtered in multiple ways; the most intuitive of which is using
- :ref:`boolean indexing <indexing.boolean>`.
-
- .. ipython:: python
-
-    tips[tips["total_bill"] > 10].head()
+ .. include:: includes/filtering.rst

If/then logic
~~~~~~~~~~~~~
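The removed filtering snippet uses the ``tips`` dataset that the full page loads earlier. In the sketch below, ``tips`` is a small hypothetical stand-in with made-up values so that the example runs on its own. The combined condition and the ``DataFrame.query`` call go beyond the removed text. They are included only as the usual pandas counterparts of a Stata ``if`` clause with several terms.

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset (values chosen for illustration).
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 8.77, 24.59],
        "tip": [1.01, 1.66, 3.50, 2.00, 3.61],
    }
)

# The comparison returns a boolean Series aligned on the row index;
# indexing with it keeps only the rows where it is True.
mask = tips["total_bill"] > 10
over_10 = tips[mask]  # like Stata's ``list if total_bill > 10``

# Several conditions combine with & and |; each needs its own parentheses.
big_tips = tips[(tips["total_bill"] > 10) & (tips["tip"] > 2)]

# DataFrame.query accepts the same condition as a string.
over_10_query = tips.query("total_bill > 10")
print(over_10)
```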
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
   generate bucket = "low" if total_bill < 10
   replace bucket = "high" if total_bill >= 10

- The same operation in pandas can be accomplished using
- the ``where`` method from ``numpy``.
-
- .. ipython:: python
-
-    tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-    tips.head()
-
- .. ipython:: python
-    :suppress:
-
-    tips = tips.drop("bucket", axis=1)
+ .. include:: includes/if_then.rst

Date functionality
~~~~~~~~~~~~~~~~~~
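The removed ``np.where`` example can be reproduced in isolation with a hypothetical stand-in for ``tips``. The ``pd.cut`` variant at the end is an alternative added here for comparison and is not something the page itself uses. It produces a categorical column rather than plain strings.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset.
tips = pd.DataFrame({"total_bill": [16.99, 8.77, 10.00, 7.25]})

# np.where(cond, a, b) picks a where cond is True and b elsewhere,
# so one vectorized call replaces Stata's generate/replace pair.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")

# Alternative sketch: pd.cut with right=False uses the bins
# [-inf, 10) -> "low" and [10, inf) -> "high" (a categorical column).
tips["bucket_cut"] = pd.cut(
    tips["total_bill"], bins=[-np.inf, 10, np.inf], labels=["low", "high"], right=False
)
print(tips)
```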
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
not available in Stata (such as time zone handling and custom offsets) --
see the :ref:`timeseries documentation<timeseries>` for more details.

- .. ipython:: python
-
-    tips["date1"] = pd.Timestamp("2013-01-15")
-    tips["date2"] = pd.Timestamp("2015-02-15")
-    tips["date1_year"] = tips["date1"].dt.year
-    tips["date2_month"] = tips["date2"].dt.month
-    tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-    tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-        "date1"
-    ].dt.to_period("M")
-
-    tips[
-        ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-    ].head()
-
- .. ipython:: python
-    :suppress:
-
-    tips = tips.drop(
-        ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-        axis=1,
-    )
+ .. include:: includes/time_date.rst

Selection of columns
~~~~~~~~~~~~~~~~~~~~
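The date snippet removed in this hunk can be run on its own with a hypothetical two-row stand-in for ``tips``. This sketch differs from the removed code in one deliberate way: the month difference is converted to a plain integer with the offset's ``.n`` attribute. That step assumes a recent pandas release, where subtracting two monthly ``Period`` values returns a month offset rather than an integer.

```python
import pandas as pd

# Hypothetical two-row stand-in for the ``tips`` dataset.
tips = pd.DataFrame({"total_bill": [16.99, 10.34]})

# Assigning a scalar Timestamp broadcasts it down a datetime64 column.
tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")

# The .dt accessor extracts components (compare Stata's year() and month()).
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month

# MonthBegin() rolls each date forward to the first day of the next month.
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()

# Subtracting monthly Periods yields month-offset objects; .n is the
# integer number of months between the two dates.
tips["months_between"] = (
    tips["date2"].dt.to_period("M") - tips["date1"].dt.to_period("M")
).map(lambda offset: offset.n)

print(tips)
```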
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``

   sort sex total_bill

- pandas objects have a :meth:`DataFrame.sort_values` method, which
- takes a list of columns to sort by.
-
- .. ipython:: python
-
-    tips = tips.sort_values(["sex", "total_bill"])
-    tips.head()
-
+ .. include:: includes/sorting.rst

String processing
-----------------
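Here is a self-contained version of the removed sorting example, again using a hypothetical stand-in for ``tips``. The per-key ``ascending`` list is an addition for comparison with Stata's ``gsort``.

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Female"],
        "total_bill": [16.99, 10.34, 21.01, 8.77],
    }
)

# Sort by sex, then by total_bill within each sex, both ascending, like
# Stata's ``sort sex total_bill``. sort_values returns a new DataFrame,
# so the result is reassigned; the original row labels travel with the rows.
tips = tips.sort_values(["sex", "total_bill"])
print(tips)

# ``ascending`` takes one flag per key; in Stata this needs ``gsort sex -total_bill``.
tips_desc = tips.sort_values(["sex", "total_bill"], ascending=[True, False])
print(tips_desc)
```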
@@ -350,14 +297,7 @@ Stata determines the length of a character string with the :func:`strlen` and
   generate strlen_time = strlen(time)
   generate ustrlen_time = ustrlen(time)

- Python determines the length of a character string with the ``len`` function.
- In Python 3, all strings are Unicode strings. ``len`` includes trailing blanks.
- Use ``len`` and ``rstrip`` to exclude trailing blanks.
-
- .. ipython:: python
-
-    tips["time"].str.len().head()
-    tips["time"].str.rstrip().str.len().head()
+ .. include:: includes/length.rst


Finding position of substring
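The removed string-length example can be checked in isolation using hypothetical ``time`` values, some of which carry trailing blanks. The byte count computed with ``str.encode`` is an added illustration of the difference between ``strlen`` (bytes) and ``ustrlen`` (characters) in the Stata code above. It is not part of the removed pandas text.

```python
import pandas as pd

# Hypothetical ``time`` values; two of them carry trailing blanks.
tips = pd.DataFrame({"time": ["Dinner", "Lunch  ", "Dinner ", "Lunch"]})

# .str.len() counts characters, trailing blanks included.
len_all = tips["time"].str.len()

# Strip trailing whitespace first to leave it out of the count.
len_trimmed = tips["time"].str.rstrip().str.len()

# Python 3 strings are Unicode, so len() counts characters (like ustrlen);
# encoding to UTF-8 first counts bytes instead (like strlen).
words = pd.Series(["Lunch", "Caf\u00e9"])
n_chars = words.str.len()
n_bytes = words.map(lambda s: len(s.encode("utf-8")))
print(len_all.tolist(), len_trimmed.tolist())
print(n_chars.tolist(), n_bytes.tolist())
```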