@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
 this page is meant to demonstrate how different SAS operations would be
 performed in pandas.
 
-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst
 
 .. note::
 
@@ -93,16 +93,7 @@ specifying the column names.
 ;
 run;
 
-A pandas ``DataFrame`` can be constructed in many different ways,
-but for a small number of values, it is often convenient to specify it as
-a Python dictionary, where the keys are the column names
-and the values are the data.
-
-.. ipython:: python
-
-   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-   df
-
+.. include:: includes/construct_dataframe.rst
 
 Reading external data
 ~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
 DATA step begins and can also be used in PROC statements */
 run;
 
-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`
-
-.. ipython:: python
-
-   tips[tips["total_bill"] > 10].head()
+.. include:: includes/filtering.rst
 
 If/then logic
 ~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
 else bucket = 'high';
 run;
 
-The same operation in pandas can be accomplished using
-the ``where`` method from ``numpy``.
-
-.. ipython:: python
-
-   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop("bucket", axis=1)
+.. include:: includes/if_then.rst
 
 Date functionality
 ~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
 not available in Base SAS (such as resampling and custom offsets) -
 see the :ref:`timeseries documentation<timeseries>` for more details.
 
-.. ipython:: python
-
-   tips["date1"] = pd.Timestamp("2013-01-15")
-   tips["date2"] = pd.Timestamp("2015-02-15")
-   tips["date1_year"] = tips["date1"].dt.year
-   tips["date2_month"] = tips["date2"].dt.month
-   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-       "date1"
-   ].dt.to_period("M")
-
-   tips[
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop(
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-       axis=1,
-   )
+.. include:: includes/time_date.rst
 
 Selection of columns
 ~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
 by sex total_bill;
 run;
 
-pandas objects have a :meth:`~DataFrame.sort_values` method, which
-takes a list of columns to sort by.
-
-.. ipython:: python
-
-   tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
-
+.. include:: includes/sorting.rst
 
 String processing
 -----------------
@@ -377,14 +324,7 @@ functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailin
 put(LENGTHC(time));
 run;
 
-Python determines the length of a character string with the ``len`` function.
-``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
-trailing blanks.
-
-.. ipython:: python
-
-   tips["time"].str.len().head()
-   tips["time"].str.rstrip().str.len().head()
+.. include:: includes/length.rst
 
 
 Find