Commit cce169a

DOC: create shared includes for content shared by comparison docs
This will help ensure consistency between the examples.
1 parent fb35344 commit cce169a

10 files changed, 94 additions and 137 deletions

doc/source/getting_started/comparison/comparison_with_sas.rst

+6 -59
@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
 this page is meant to demonstrate how different SAS operations would be
 performed in pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 .. note::

@@ -93,16 +93,7 @@ specifying the column names.
        ;
    run;

-A pandas ``DataFrame`` can be constructed in many different ways,
-but for a small number of values, it is often convenient to specify it as
-a Python dictionary, where the keys are the column names
-and the values are the data.
-
-.. ipython:: python
-
-   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-   df
-
+.. include:: includes/construct_dataframe.rst

 Reading external data
 ~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
           DATA step begins and can also be used in PROC statements */
    run;

-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`
-
-.. ipython:: python
-
-   tips[tips["total_bill"] > 10].head()
+.. include:: includes/filtering.rst

 If/then logic
 ~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
        else bucket = 'high';
    run;

-The same operation in pandas can be accomplished using
-the ``where`` method from ``numpy``.
-
-.. ipython:: python
-
-   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop("bucket", axis=1)
+.. include:: includes/if_then.rst

 Date functionality
 ~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
 not available in Base SAS (such as resampling and custom offsets) -
 see the :ref:`timeseries documentation<timeseries>` for more details.

-.. ipython:: python
-
-   tips["date1"] = pd.Timestamp("2013-01-15")
-   tips["date2"] = pd.Timestamp("2015-02-15")
-   tips["date1_year"] = tips["date1"].dt.year
-   tips["date2_month"] = tips["date2"].dt.month
-   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-       "date1"
-   ].dt.to_period("M")
-
-   tips[
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop(
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-       axis=1,
-   )
+.. include:: includes/time_date.rst

 Selection of columns
 ~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
        by sex total_bill;
    run;

-pandas objects have a :meth:`~DataFrame.sort_values` method, which
-takes a list of columns to sort by.
-
-.. ipython:: python
-
-   tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
-
+.. include:: includes/sorting.rst

 String processing
 -----------------

doc/source/getting_started/comparison/comparison_with_spreadsheets.rst

+1 -1
@@ -14,7 +14,7 @@ terminology and link to documentation for Excel, but much will be the same/simil
 `Apple Numbers <https://www.apple.com/mac/numbers/compatibility/functions.html>`_, and other
 Excel-compatible spreadsheet software.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 Data structures
 ---------------

doc/source/getting_started/comparison/comparison_with_sql.rst

+3 -18
@@ -8,7 +8,7 @@ Since many potential pandas users have some familiarity with
 `SQL <https://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
 various SQL operations would be performed using pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
 the data into a DataFrame called ``tips`` and assume we have a database table of the same name and
@@ -65,24 +65,9 @@ Filtering in SQL is done via a WHERE clause.

     SELECT *
     FROM tips
-    WHERE time = 'Dinner'
-    LIMIT 5;
-
-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`
-
-.. ipython:: python
-
-   tips[tips["time"] == "Dinner"].head(5)
-
-The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
-returning all rows with True.
-
-.. ipython:: python
+    WHERE time = 'Dinner';

-   is_dinner = tips["time"] == "Dinner"
-   is_dinner.value_counts()
-   tips[is_dinner].head(5)
+.. include:: includes/filtering.rst

 Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
 (AND).

doc/source/getting_started/comparison/comparison_with_stata.rst

+6 -59
@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
 this page is meant to demonstrate how different Stata operations would be
 performed in pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 .. note::

@@ -89,16 +89,7 @@ specifying the column names.
    5 6
    end

-A pandas ``DataFrame`` can be constructed in many different ways,
-but for a small number of values, it is often convenient to specify it as
-a Python dictionary, where the keys are the column names
-and the values are the data.
-
-.. ipython:: python
-
-   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-   df
-
+.. include:: includes/construct_dataframe.rst

 Reading external data
 ~~~~~~~~~~~~~~~~~~~~~
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.

    list if total_bill > 10

-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`.
-
-.. ipython:: python
-
-   tips[tips["total_bill"] > 10].head()
+.. include:: includes/filtering.rst

 If/then logic
 ~~~~~~~~~~~~~
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
    generate bucket = "low" if total_bill < 10
    replace bucket = "high" if total_bill >= 10

-The same operation in pandas can be accomplished using
-the ``where`` method from ``numpy``.
-
-.. ipython:: python
-
-   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop("bucket", axis=1)
+.. include:: includes/if_then.rst

 Date functionality
 ~~~~~~~~~~~~~~~~~~
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
 not available in Stata (such as time zone handling and custom offsets) --
 see the :ref:`timeseries documentation<timeseries>` for more details.

-.. ipython:: python
-
-   tips["date1"] = pd.Timestamp("2013-01-15")
-   tips["date2"] = pd.Timestamp("2015-02-15")
-   tips["date1_year"] = tips["date1"].dt.year
-   tips["date2_month"] = tips["date2"].dt.month
-   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-       "date1"
-   ].dt.to_period("M")
-
-   tips[
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop(
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-       axis=1,
-   )
+.. include:: includes/time_date.rst

 Selection of columns
 ~~~~~~~~~~~~~~~~~~~~
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``

    sort sex total_bill

-pandas objects have a :meth:`DataFrame.sort_values` method, which
-takes a list of columns to sort by.
-
-.. ipython:: python
-
-   tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
-
+.. include:: includes/sorting.rst

 String processing
 -----------------

doc/source/getting_started/comparison/includes/construct_dataframe.rst

+11
@@ -0,0 +1,11 @@
+:orphan:
+
+A pandas ``DataFrame`` can be constructed in many different ways,
+but for a small number of values, it is often convenient to specify it as
+a Python dictionary, where the keys are the column names
+and the values are the data.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
+   df
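
For readers who want to try the construction from this include outside the docs build, here is a minimal standalone sketch. It repeats the dictionary-based call from the include and compares it with a frame built from a list of row dictionaries. The name ``from_rows`` is illustrative and does not appear in the commit.

```python
import pandas as pd

# Dictionary form from the include: keys are column names, values are the data.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# Illustrative alternative (not part of the commit): one dictionary per row.
from_rows = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}])

# Both calls produce the same columns, values, and default integer index.
print(df.equals(from_rows))
```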

doc/source/getting_started/comparison/includes/filtering.rst

+18
@@ -0,0 +1,18 @@
+:orphan:
+
+DataFrames can be filtered in multiple ways; the most intuitive of which is using
+:ref:`boolean indexing <indexing.boolean>`
+
+.. ipython:: python
+
+   tips[tips["total_bill"] > 10]
+
+The above statement is simply passing a ``Series`` of ``True``/``False`` objects to the DataFrame,
+returning all rows with ``True``.
+
+.. ipython:: python
+
+   is_dinner = tips["time"] == "Dinner"
+   is_dinner
+   is_dinner.value_counts()
+   tips[is_dinner]
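
The boolean-indexing steps in this include can be tried on their own with the sketch below. It assumes a small hand-made frame in place of the real ``tips`` dataset, so the values are invented for illustration. The last step, combining two masks, is not part of the include.

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset; the rows are made up.
tips = pd.DataFrame(
    {
        "total_bill": [8.50, 12.00, 25.30, 9.99],
        "time": ["Lunch", "Dinner", "Dinner", "Lunch"],
    }
)

# The comparison returns a boolean Series aligned with the frame's index.
is_dinner = tips["time"] == "Dinner"

# Indexing with the mask keeps only the rows where the mask is True.
dinner_rows = tips[is_dinner]

# Masks combine with & (AND) and | (OR); each condition needs its own parentheses.
large_dinners = tips[(tips["time"] == "Dinner") & (tips["total_bill"] > 20)]

print(is_dinner.value_counts())
print(dinner_rows)
print(large_dinners)
```

Naming the mask first, as the include does with ``is_dinner``, lets you check it with ``value_counts()`` before using it to filter.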

doc/source/getting_started/comparison/includes/if_then.rst

+14
@@ -0,0 +1,14 @@
+:orphan:
+
+The same operation in pandas can be accomplished using
+the ``where`` method from ``numpy``.
+
+.. ipython:: python
+
+   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
+   tips.head()
+
+.. ipython:: python
+   :suppress:
+
+   tips = tips.drop("bucket", axis=1)
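
The bucketing step can be run in isolation with the sketch below, again on a made-up frame rather than the real ``tips`` data. Note that ``np.where`` is a NumPy function called with a condition and two alternatives, not a method on the frame. The final ``drop`` mirrors the suppressed cleanup block in the include.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset; the values are made up.
tips = pd.DataFrame({"total_bill": [8.50, 12.00, 25.30, 9.99]})

# np.where(condition, value_if_true, value_if_false) is evaluated element-wise.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
buckets = tips["bucket"].tolist()
print(tips)

# Drop the helper column again so later examples see the original columns.
tips = tips.drop("bucket", axis=1)
```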

doc/source/getting_started/comparison/comparison_boilerplate.rst renamed to doc/source/getting_started/comparison/includes/introduction.rst

+2
@@ -1,3 +1,5 @@
+:orphan:
+
 If you're new to pandas, you might want to first read through :ref:`10 Minutes to pandas<10min>`
 to familiarize yourself with the library.

doc/source/getting_started/comparison/includes/sorting.rst

+9
@@ -0,0 +1,9 @@
+:orphan:
+
+pandas objects have a :meth:`DataFrame.sort_values` method, which
+takes a list of columns to sort by.
+
+.. ipython:: python
+
+   tips = tips.sort_values(["sex", "total_bill"])
+   tips.head()
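
Here is a small standalone sketch of the multi-column sort, assuming a made-up four-row frame in place of ``tips``. The second call uses the ``ascending`` parameter of ``sort_values`` to give each column its own direction. That variation is an illustration and is not part of the include.

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` dataset; the values are made up.
tips = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Female"],
        "total_bill": [20.0, 15.0, 10.0, 30.0],
    }
)

# Sort by ``sex`` first, then by ``total_bill`` within each group.
by_sex_then_bill = tips.sort_values(["sex", "total_bill"])

# A list passed to ``ascending`` sets the direction column by column.
largest_bill_first = tips.sort_values(["sex", "total_bill"], ascending=[True, False])

print(by_sex_then_bill)
print(largest_bill_first)
```

``sort_values`` returns a new frame and keeps the original index labels, which is why the include assigns the result back to ``tips``.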

doc/source/getting_started/comparison/includes/time_date.rst

+24
@@ -0,0 +1,24 @@
+:orphan:
+
+.. ipython:: python
+
+   tips["date1"] = pd.Timestamp("2013-01-15")
+   tips["date2"] = pd.Timestamp("2015-02-15")
+   tips["date1_year"] = tips["date1"].dt.year
+   tips["date2_month"] = tips["date2"].dt.month
+   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
+   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
+       "date1"
+   ].dt.to_period("M")
+
+   tips[
+       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
+   ].head()
+
+.. ipython:: python
+   :suppress:
+
+   tips = tips.drop(
+       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
+       axis=1,
+   )
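
The date operations in this include can be checked on a tiny frame with the sketch below. The frame ``df`` and its ``id`` column are placeholders for ``tips``. The last step, which reads the integer month count from each offset through its ``n`` attribute, is an addition for illustration and is not in the commit.

```python
import pandas as pd

# Placeholder frame standing in for ``tips``; only the row count matters here.
df = pd.DataFrame({"id": [1, 2]})

# A scalar Timestamp is broadcast to every row of the new column.
df["date1"] = pd.Timestamp("2013-01-15")
df["date2"] = pd.Timestamp("2015-02-15")

# The ``.dt`` accessor exposes datetime components element-wise.
df["date1_year"] = df["date1"].dt.year
df["date2_month"] = df["date2"].dt.month

# Adding MonthBegin rolls each date forward to the start of the next month.
df["date1_next"] = df["date1"] + pd.offsets.MonthBegin()

# Subtracting monthly periods yields offset objects counting whole months.
months_between = df["date2"].dt.to_period("M") - df["date1"].dt.to_period("M")

# Illustrative extra step: extract the integer month count from each offset.
month_counts = months_between.apply(lambda offset: offset.n)

print(df)
print(month_counts)
```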
