Commit 28b938d

DOC: create shared includes for content shared by comparison docs (#38887)
1 parent 1e4c9df commit 28b938d

12 files changed, +96 -154 lines changed

doc/source/getting_started/comparison/comparison_with_sas.rst

+7 -67
@@ -8,7 +8,7 @@ For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(softwar
 this page is meant to demonstrate how different SAS operations would be
 performed in pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 .. note::
@@ -93,16 +93,7 @@ specifying the column names.
    ;
    run;

-A pandas ``DataFrame`` can be constructed in many different ways,
-but for a small number of values, it is often convenient to specify it as
-a Python dictionary, where the keys are the column names
-and the values are the data.
-
-.. ipython:: python
-
-   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-   df
-
+.. include:: includes/construct_dataframe.rst

 Reading external data
 ~~~~~~~~~~~~~~~~~~~~~
@@ -217,12 +208,7 @@ or more columns.
    DATA step begins and can also be used in PROC statements */
    run;

-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`
-
-.. ipython:: python
-
-   tips[tips["total_bill"] > 10].head()
+.. include:: includes/filtering.rst

 If/then logic
 ~~~~~~~~~~~~~
@@ -239,18 +225,7 @@ In SAS, if/then logic can be used to create new columns.
    else bucket = 'high';
    run;

-The same operation in pandas can be accomplished using
-the ``where`` method from ``numpy``.
-
-.. ipython:: python
-
-   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop("bucket", axis=1)
+.. include:: includes/if_then.rst

 Date functionality
 ~~~~~~~~~~~~~~~~~~
@@ -278,28 +253,7 @@ functions pandas supports other Time Series features
 not available in Base SAS (such as resampling and custom offsets) -
 see the :ref:`timeseries documentation<timeseries>` for more details.

-.. ipython:: python
-
-   tips["date1"] = pd.Timestamp("2013-01-15")
-   tips["date2"] = pd.Timestamp("2015-02-15")
-   tips["date1_year"] = tips["date1"].dt.year
-   tips["date2_month"] = tips["date2"].dt.month
-   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-       "date1"
-   ].dt.to_period("M")
-
-   tips[
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop(
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-       axis=1,
-   )
+.. include:: includes/time_date.rst

 Selection of columns
 ~~~~~~~~~~~~~~~~~~~~
@@ -349,14 +303,7 @@ Sorting in SAS is accomplished via ``PROC SORT``
    by sex total_bill;
    run;

-pandas objects have a :meth:`~DataFrame.sort_values` method, which
-takes a list of columns to sort by.
-
-.. ipython:: python
-
-   tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
-
+.. include:: includes/sorting.rst

 String processing
 -----------------
@@ -377,14 +324,7 @@ functions. ``LENGTHN`` excludes trailing blanks and ``LENGTHC`` includes trailin
    put(LENGTHC(time));
    run;

-Python determines the length of a character string with the ``len`` function.
-``len`` includes trailing blanks. Use ``len`` and ``rstrip`` to exclude
-trailing blanks.
-
-.. ipython:: python
-
-   tips["time"].str.len().head()
-   tips["time"].str.rstrip().str.len().head()
+.. include:: includes/length.rst


 Find

doc/source/getting_started/comparison/comparison_with_spreadsheets.rst

+1 -1
@@ -14,7 +14,7 @@ terminology and link to documentation for Excel, but much will be the same/simil
 `Apple Numbers <https://www.apple.com/mac/numbers/compatibility/functions.html>`_, and other
 Excel-compatible spreadsheet software.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 Data structures
 ---------------

doc/source/getting_started/comparison/comparison_with_sql.rst

+3 -18
@@ -8,7 +8,7 @@ Since many potential pandas users have some familiarity with
 `SQL <https://en.wikipedia.org/wiki/SQL>`_, this page is meant to provide some examples of how
 various SQL operations would be performed using pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 Most of the examples will utilize the ``tips`` dataset found within pandas tests. We'll read
 the data into a DataFrame called ``tips`` and assume we have a database table of the same name and
@@ -65,24 +65,9 @@ Filtering in SQL is done via a WHERE clause.
     SELECT *
     FROM tips
-    WHERE time = 'Dinner'
-    LIMIT 5;
-
-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`
-
-.. ipython:: python
-
-    tips[tips["time"] == "Dinner"].head(5)
-
-The above statement is simply passing a ``Series`` of True/False objects to the DataFrame,
-returning all rows with True.
-
-.. ipython:: python
+    WHERE time = 'Dinner';

-    is_dinner = tips["time"] == "Dinner"
-    is_dinner.value_counts()
-    tips[is_dinner].head(5)
+.. include:: includes/filtering.rst

 Just like SQL's OR and AND, multiple conditions can be passed to a DataFrame using | (OR) and &
 (AND).
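
As a hedged illustration of the ``|`` and ``&`` operators mentioned in the context lines above, the sketch below uses a small hypothetical stand-in for the ``tips`` table. The rows are invented for illustration and are not the real dataset. Each condition is wrapped in parentheses because ``&`` and ``|`` bind more tightly than comparison operators in Python.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 48.27, 7.25],
        "tip": [1.01, 1.66, 6.73, 5.15],
        "time": ["Dinner", "Lunch", "Dinner", "Lunch"],
    }
)

# AND: dinner rows whose tip is above 5.00.
dinner_big_tip = tips[(tips["time"] == "Dinner") & (tips["tip"] > 5.00)]

# OR: rows with a bill of at least 45 or a tip above 5.00.
big_bill_or_tip = tips[(tips["total_bill"] >= 45) | (tips["tip"] > 5.00)]

print(dinner_big_tip)
print(big_bill_or_tip)
```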

doc/source/getting_started/comparison/comparison_with_stata.rst

+7 -67
@@ -8,7 +8,7 @@ For potential users coming from `Stata <https://en.wikipedia.org/wiki/Stata>`__
 this page is meant to demonstrate how different Stata operations would be
 performed in pandas.

-.. include:: comparison_boilerplate.rst
+.. include:: includes/introduction.rst

 .. note::
@@ -89,16 +89,7 @@ specifying the column names.
    5 6
    end

-A pandas ``DataFrame`` can be constructed in many different ways,
-but for a small number of values, it is often convenient to specify it as
-a Python dictionary, where the keys are the column names
-and the values are the data.
-
-.. ipython:: python
-
-   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
-   df
-
+.. include:: includes/construct_dataframe.rst

 Reading external data
 ~~~~~~~~~~~~~~~~~~~~~
@@ -210,12 +201,7 @@ Filtering in Stata is done with an ``if`` clause on one or more columns.
    list if total_bill > 10

-DataFrames can be filtered in multiple ways; the most intuitive of which is using
-:ref:`boolean indexing <indexing.boolean>`.
-
-.. ipython:: python
-
-   tips[tips["total_bill"] > 10].head()
+.. include:: includes/filtering.rst

 If/then logic
 ~~~~~~~~~~~~~
@@ -227,18 +213,7 @@ In Stata, an ``if`` clause can also be used to create new columns.
    generate bucket = "low" if total_bill < 10
    replace bucket = "high" if total_bill >= 10

-The same operation in pandas can be accomplished using
-the ``where`` method from ``numpy``.
-
-.. ipython:: python
-
-   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop("bucket", axis=1)
+.. include:: includes/if_then.rst

 Date functionality
 ~~~~~~~~~~~~~~~~~~
@@ -266,28 +241,7 @@ functions, pandas supports other Time Series features
 not available in Stata (such as time zone handling and custom offsets) --
 see the :ref:`timeseries documentation<timeseries>` for more details.

-.. ipython:: python
-
-   tips["date1"] = pd.Timestamp("2013-01-15")
-   tips["date2"] = pd.Timestamp("2015-02-15")
-   tips["date1_year"] = tips["date1"].dt.year
-   tips["date2_month"] = tips["date2"].dt.month
-   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
-   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
-       "date1"
-   ].dt.to_period("M")
-
-   tips[
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
-
-.. ipython:: python
-   :suppress:
-
-   tips = tips.drop(
-       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
-       axis=1,
-   )
+.. include:: includes/time_date.rst

 Selection of columns
 ~~~~~~~~~~~~~~~~~~~~
@@ -327,14 +281,7 @@ Sorting in Stata is accomplished via ``sort``
    sort sex total_bill

-pandas objects have a :meth:`DataFrame.sort_values` method, which
-takes a list of columns to sort by.
-
-.. ipython:: python
-
-   tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
-
+.. include:: includes/sorting.rst

 String processing
 -----------------
@@ -350,14 +297,7 @@ Stata determines the length of a character string with the :func:`strlen` and
    generate strlen_time = strlen(time)
    generate ustrlen_time = ustrlen(time)

-Python determines the length of a character string with the ``len`` function.
-In Python 3, all strings are Unicode strings. ``len`` includes trailing blanks.
-Use ``len`` and ``rstrip`` to exclude trailing blanks.
-
-.. ipython:: python
-
-   tips["time"].str.len().head()
-   tips["time"].str.rstrip().str.len().head()
+.. include:: includes/length.rst


 Finding position of substring
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
+A pandas ``DataFrame`` can be constructed in many different ways,
+but for a small number of values, it is often convenient to specify it as
+a Python dictionary, where the keys are the column names
+and the values are the data.
+
+.. ipython:: python
+
+   df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})
+   df
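
A minimal standalone sketch of the dictionary construction described in the added text, runnable outside the docs build. The second frame, ``df_rows``, is a hypothetical addition for comparison: it builds the same data from a list of row records, which is one of the "many different ways" the text alludes to.

```python
import pandas as pd

# Keys become column names; each list holds that column's values.
df = pd.DataFrame({"x": [1, 3, 5], "y": [2, 4, 6]})

# The same data built row by row from a list of records
# (df_rows is an illustrative name, not part of the commit).
df_rows = pd.DataFrame([{"x": 1, "y": 2}, {"x": 3, "y": 4}, {"x": 5, "y": 6}])

print(df)
print(df.equals(df_rows))
```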
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
+DataFrames can be filtered in multiple ways; the most intuitive of which is using
+:ref:`boolean indexing <indexing.boolean>`
+
+.. ipython:: python
+
+   tips[tips["total_bill"] > 10]
+
+The above statement is simply passing a ``Series`` of ``True``/``False`` objects to the DataFrame,
+returning all rows with ``True``.
+
+.. ipython:: python
+
+   is_dinner = tips["time"] == "Dinner"
+   is_dinner
+   is_dinner.value_counts()
+   tips[is_dinner]
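
The include above assumes a ``tips`` DataFrame already exists in the including page. As a rough standalone sketch of the same boolean-indexing idea, the example below substitutes a small hypothetical frame (invented rows, not the real dataset) and names the intermediate mask explicitly.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 8.50, 23.68],
        "time": ["Dinner", "Lunch", "Dinner"],
    }
)

# A comparison on a column yields a boolean Series, one entry per row.
mask = tips["total_bill"] > 10

# Indexing with that Series keeps only the rows where it is True.
filtered = tips[mask]

# The same pattern with an equality test on a string column.
is_dinner = tips["time"] == "Dinner"
dinner_rows = tips[is_dinner]

print(filtered)
print(is_dinner.value_counts())
```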
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
+The same operation in pandas can be accomplished using
+the ``where`` method from ``numpy``.
+
+.. ipython:: python
+
+   tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
+   tips.head()
+
+.. ipython:: python
+   :suppress:
+
+   tips = tips.drop("bucket", axis=1)
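
A self-contained sketch of the same ``np.where`` pattern, assuming a small hypothetical ``tips`` frame with invented values. ``np.where(condition, a, b)`` picks ``a`` where the condition holds and ``b`` elsewhere, so a bill of exactly 10 lands in the "high" bucket. The final ``drop`` mirrors the suppressed cleanup step in the include, which restores ``tips`` for later examples on the same page.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame({"total_bill": [16.99, 8.50, 10.00]})

# Element-wise if/else: "low" where the bill is under 10, "high" otherwise.
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
buckets = tips["bucket"].tolist()
print(buckets)

# Remove the helper column again so later examples see the original frame.
tips = tips.drop("bucket", axis=1)
```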
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
+Python determines the length of a character string with the ``len`` function.
+In Python 3, all strings are Unicode strings. ``len`` includes trailing blanks.
+Use ``len`` and ``rstrip`` to exclude trailing blanks.
+
+.. ipython:: python
+
+   tips["time"].str.len().head()
+   tips["time"].str.rstrip().str.len().head()
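
To make the effect of trailing blanks visible, the sketch below uses a hypothetical Series in which one value is padded with two spaces. The real ``time`` column may contain no padding at all, in which case both calls return the same lengths.

```python
import pandas as pd

# Hypothetical values; the first entry has two trailing blanks.
time = pd.Series(["Dinner  ", "Lunch"])

# .str.len() counts every character, including trailing blanks.
with_blanks = time.str.len()

# Stripping on the right first excludes the trailing blanks.
without_blanks = time.str.rstrip().str.len()

print(with_blanks.tolist())
print(without_blanks.tolist())
```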
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
+pandas objects have a :meth:`DataFrame.sort_values` method, which
+takes a list of columns to sort by.
+
+.. ipython:: python
+
+   tips = tips.sort_values(["sex", "total_bill"])
+   tips.head()
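
A small sketch of multi-column sorting, assuming a hypothetical four-row frame with invented values. Rows are ordered by the first column in the list, and ties are broken by the second. The original index labels are kept, which makes the reordering easy to see.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame(
    {
        "sex": ["Male", "Female", "Male", "Female"],
        "total_bill": [20.0, 15.0, 10.0, 5.0],
    }
)

# Sort by sex first, then by total_bill within each sex (ascending by default).
tips_sorted = tips.sort_values(["sex", "total_bill"])

print(tips_sorted)
```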
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,22 @@
+.. ipython:: python
+
+   tips["date1"] = pd.Timestamp("2013-01-15")
+   tips["date2"] = pd.Timestamp("2015-02-15")
+   tips["date1_year"] = tips["date1"].dt.year
+   tips["date2_month"] = tips["date2"].dt.month
+   tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()
+   tips["months_between"] = tips["date2"].dt.to_period("M") - tips[
+       "date1"
+   ].dt.to_period("M")
+
+   tips[
+       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
+   ].head()
+
+.. ipython:: python
+   :suppress:
+
+   tips = tips.drop(
+       ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"],
+       axis=1,
+   )
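
A standalone sketch of the date operations in this include, run against a hypothetical two-row frame. One deliberate difference, labeled as an assumption: the include subtracts monthly periods, which yields offset objects, whereas this sketch computes ``months_between`` as a plain integer from the year and month components so the result is easy to inspect.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame({"total_bill": [16.99, 10.34]})

# A scalar Timestamp is broadcast to every row.
tips["date1"] = pd.Timestamp("2013-01-15")
tips["date2"] = pd.Timestamp("2015-02-15")

# The .dt accessor exposes datetime components of a datetime column.
tips["date1_year"] = tips["date1"].dt.year
tips["date2_month"] = tips["date2"].dt.month

# MonthBegin rolls each date forward to the start of the next month.
tips["date1_next"] = tips["date1"] + pd.offsets.MonthBegin()

# Assumption: integer month difference instead of the Period subtraction
# used in the include.
tips["months_between"] = (tips["date2"].dt.year - tips["date1"].dt.year) * 12 + (
    tips["date2"].dt.month - tips["date1"].dt.month
)

print(tips)
```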

setup.cfg

+4 -1
@@ -49,7 +49,10 @@ ignore = E203, # space before : (needed for how black formats slicing)
     E711, # comparison to none should be 'if cond is none:'

 exclude =
-    doc/source/development/contributing_docstring.rst
+    doc/source/development/contributing_docstring.rst,
+    # work around issue of undefined variable warnings
+    # https://github.com/pandas-dev/pandas/pull/38837#issuecomment-752884156
+    doc/source/getting_started/comparison/includes/*.rst

 [tool:pytest]
 # sync minversion with setup.cfg & install.rst
