Commit 8b2ebdb

DOC: remove use of head() in the comparison docs (#38935)
1 parent 3526a71 commit 8b2ebdb

16 files changed (+128, -105 lines changed)

doc/source/getting_started/comparison/comparison_with_sas.rst

+19-37
Original file line numberDiff line numberDiff line change
@@ -4,23 +4,13 @@
44

55
Comparison with SAS
66
********************
7+
78
For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__
89
this page is meant to demonstrate how different SAS operations would be
910
performed in pandas.
1011

1112
.. include:: includes/introduction.rst
1213

13-
.. note::
14-
15-
Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
16-
``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
17-
This is often used in interactive work (e.g. `Jupyter notebook
18-
<https://jupyter.org/>`_ or terminal) - the equivalent in SAS would be:
19-
20-
.. code-block:: sas
21-
22-
proc print data=df(obs=5);
23-
run;
2414

2515
Data structures
2616
---------------
@@ -120,7 +110,7 @@ The pandas method is :func:`read_csv`, which works similarly.
120110
"pandas/master/pandas/tests/io/data/csv/tips.csv"
121111
)
122112
tips = pd.read_csv(url)
123-
tips.head()
113+
tips
124114
125115
126116
Like ``PROC IMPORT``, ``read_csv`` can take a number of parameters to specify
@@ -138,6 +128,19 @@ In addition to text/csv, pandas supports a variety of other data formats
138128
such as Excel, HDF5, and SQL databases. These are all read via a ``pd.read_*``
139129
function. See the :ref:`IO documentation<io>` for more details.
140130

131+
Limiting output
132+
~~~~~~~~~~~~~~~
133+
134+
.. include:: includes/limit.rst
135+
136+
The equivalent in SAS would be:
137+
138+
.. code-block:: sas
139+
140+
proc print data=df(obs=5);
141+
run;
142+
143+
141144
Exporting data
142145
~~~~~~~~~~~~~~
143146

@@ -173,20 +176,8 @@ be used on new or existing columns.
173176
new_bill = total_bill / 2;
174177
run;
175178
176-
pandas provides similar vectorized operations by
177-
specifying the individual ``Series`` in the ``DataFrame``.
178-
New columns can be assigned in the same way.
179+
.. include:: includes/column_operations.rst
179180

180-
.. ipython:: python
181-
182-
tips["total_bill"] = tips["total_bill"] - 2
183-
tips["new_bill"] = tips["total_bill"] / 2.0
184-
tips.head()
185-
186-
.. ipython:: python
187-
:suppress:
188-
189-
tips = tips.drop("new_bill", axis=1)
190181

191182
Filtering
192183
~~~~~~~~~
@@ -278,18 +269,7 @@ drop, and rename columns.
278269
rename total_bill=total_bill_2;
279270
run;
280271
281-
The same operations are expressed in pandas below.
282-
283-
.. ipython:: python
284-
285-
# keep
286-
tips[["sex", "total_bill", "tip"]].head()
287-
288-
# drop
289-
tips.drop("sex", axis=1).head()
290-
291-
# rename
292-
tips.rename(columns={"total_bill": "total_bill_2"}).head()
272+
.. include:: includes/column_selection.rst
293273

294274

295275
Sorting by values
@@ -442,6 +422,8 @@ input frames.
442422
Missing data
443423
------------
444424

425+
Both pandas and SAS have a representation for missing data.
426+
445427
.. include:: includes/missing_intro.rst
446428

447429
One difference is that missing data cannot be compared to its sentinel value.
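
The sentinel-comparison caveat in the context line above can be illustrated with a minimal sketch. The small `outer_join` frame here is a hypothetical stand-in for the one built in the docs' merge include, not the docs' actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the docs' ``outer_join`` frame.
outer_join = pd.DataFrame(
    {"key": ["A", "B", "C"], "value_x": [1.0, np.nan, 3.0]}
)

# Comparing against the sentinel is False for every row,
# including the row that really is missing.
eq_mask = outer_join["value_x"] == np.nan

# isna() is the reliable way to detect the missing row.
na_mask = outer_join["value_x"].isna()

print(eq_mask.tolist())
print(na_mask.tolist())
```

Because `NaN` never compares equal to anything, the equality mask selects no rows, which is why the docs point readers to `isna` and `notna` instead.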

doc/source/getting_started/comparison/comparison_with_sql.rst

+19-7
@@ -21,7 +21,7 @@ structure.
2121
"/pandas/master/pandas/tests/io/data/csv/tips.csv"
2222
)
2323
tips = pd.read_csv(url)
24-
tips.head()
24+
tips
2525
2626
SELECT
2727
------
@@ -31,14 +31,13 @@ to select all columns):
3131
.. code-block:: sql
3232
3333
SELECT total_bill, tip, smoker, time
34-
FROM tips
35-
LIMIT 5;
34+
FROM tips;
3635
3736
With pandas, column selection is done by passing a list of column names to your DataFrame:
3837

3938
.. ipython:: python
4039
41-
tips[["total_bill", "tip", "smoker", "time"]].head(5)
40+
tips[["total_bill", "tip", "smoker", "time"]]
4241
4342
Calling the DataFrame without the list of column names would display all columns (akin to SQL's
4443
``*``).
@@ -48,14 +47,13 @@ In SQL, you can add a calculated column:
4847
.. code-block:: sql
4948
5049
SELECT *, tip/total_bill as tip_rate
51-
FROM tips
52-
LIMIT 5;
50+
FROM tips;
5351
5452
With pandas, you can use the :meth:`DataFrame.assign` method of a DataFrame to append a new column:
5553

5654
.. ipython:: python
5755
58-
tips.assign(tip_rate=tips["tip"] / tips["total_bill"]).head(5)
56+
tips.assign(tip_rate=tips["tip"] / tips["total_bill"])
5957
6058
WHERE
6159
-----
@@ -368,6 +366,20 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with
368366
369367
pd.concat([df1, df2]).drop_duplicates()
370368
369+
370+
LIMIT
371+
-----
372+
373+
.. code-block:: sql
374+
375+
SELECT * FROM tips
376+
LIMIT 10;
377+
378+
.. ipython:: python
379+
380+
tips.head(10)
381+
382+
371383
pandas equivalents for some SQL analytic and aggregate functions
372384
----------------------------------------------------------------
373385
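
The SQL hunks above move the row limit out of each example and into a dedicated LIMIT section. A minimal sketch of the three pandas idioms involved follows; it uses a small hand-built `tips` frame (an assumption, so that nothing is downloaded) rather than the CSV the docs load.

```python
import pandas as pd

# Hypothetical miniature of the tips data set used in the docs.
tips = pd.DataFrame(
    {
        "total_bill": [16.99, 10.34, 21.01, 23.68],
        "tip": [1.01, 1.66, 3.50, 3.31],
        "smoker": ["No", "No", "No", "No"],
        "time": ["Dinner", "Dinner", "Dinner", "Dinner"],
    }
)

# SELECT total_bill, tip, smoker, time FROM tips;
selected = tips[["total_bill", "tip", "smoker", "time"]]

# SELECT *, tip/total_bill AS tip_rate FROM tips;
# assign() returns a new DataFrame and leaves ``tips`` untouched.
with_rate = tips.assign(tip_rate=tips["tip"] / tips["total_bill"])

# SELECT * FROM tips LIMIT 2;
limited = tips.head(2)

print(with_rate.columns.tolist())
print(len(limited))
```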

doc/source/getting_started/comparison/comparison_with_stata.rst

+18-37
@@ -10,16 +10,6 @@ performed in pandas.
1010

1111
.. include:: includes/introduction.rst
1212

13-
.. note::
14-
15-
Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
16-
``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
17-
This is often used in interactive work (e.g. `Jupyter notebook
18-
<https://jupyter.org/>`_ or terminal) -- the equivalent in Stata would be:
19-
20-
.. code-block:: stata
21-
22-
list in 1/5
2313

2414
Data structures
2515
---------------
@@ -116,7 +106,7 @@ the data set if presented with a url.
116106
"/pandas/master/pandas/tests/io/data/csv/tips.csv"
117107
)
118108
tips = pd.read_csv(url)
119-
tips.head()
109+
tips
120110
121111
Like ``import delimited``, :func:`read_csv` can take a number of parameters to specify
122112
how the data should be parsed. For example, if the data were instead tab delimited,
@@ -141,6 +131,18 @@ such as Excel, SAS, HDF5, Parquet, and SQL databases. These are all read via a
141131
function. See the :ref:`IO documentation<io>` for more details.
142132

143133

134+
Limiting output
135+
~~~~~~~~~~~~~~~
136+
137+
.. include:: includes/limit.rst
138+
139+
The equivalent in Stata would be:
140+
141+
.. code-block:: stata
142+
143+
list in 1/5
144+
145+
144146
Exporting data
145147
~~~~~~~~~~~~~~
146148

@@ -179,18 +181,8 @@ the column from the data set.
179181
generate new_bill = total_bill / 2
180182
drop new_bill
181183
182-
pandas provides similar vectorized operations by
183-
specifying the individual ``Series`` in the ``DataFrame``.
184-
New columns can be assigned in the same way. The :meth:`DataFrame.drop` method
185-
drops a column from the ``DataFrame``.
184+
.. include:: includes/column_operations.rst
186185

187-
.. ipython:: python
188-
189-
tips["total_bill"] = tips["total_bill"] - 2
190-
tips["new_bill"] = tips["total_bill"] / 2
191-
tips.head()
192-
193-
tips = tips.drop("new_bill", axis=1)
194186

195187
Filtering
196188
~~~~~~~~~
@@ -256,20 +248,7 @@ Stata provides keywords to select, drop, and rename columns.
256248
257249
rename total_bill total_bill_2
258250
259-
The same operations are expressed in pandas below. Note that in contrast to Stata, these
260-
operations do not happen in place. To make these changes persist, assign the operation back
261-
to a variable.
262-
263-
.. ipython:: python
264-
265-
# keep
266-
tips[["sex", "total_bill", "tip"]].head()
267-
268-
# drop
269-
tips.drop("sex", axis=1).head()
270-
271-
# rename
272-
tips.rename(columns={"total_bill": "total_bill_2"}).head()
251+
.. include:: includes/column_selection.rst
273252

274253

275254
Sorting by values
@@ -428,12 +407,14 @@ or the intersection of the two by using the values created in the
428407
restore
429408
merge 1:n key using df2.dta
430409
431-
.. include:: includes/merge_setup.rst
410+
.. include:: includes/merge.rst
432411

433412

434413
Missing data
435414
------------
436415

416+
Both pandas and Stata have a representation for missing data.
417+
437418
.. include:: includes/missing_intro.rst
438419

439420
One difference is that missing data cannot be compared to its sentinel value.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
1+
pandas provides similar vectorized operations by specifying the individual ``Series`` in the
2+
``DataFrame``. New columns can be assigned in the same way. The :meth:`DataFrame.drop` method drops
3+
a column from the ``DataFrame``.
4+
5+
.. ipython:: python
6+
7+
tips["total_bill"] = tips["total_bill"] - 2
8+
tips["new_bill"] = tips["total_bill"] / 2
9+
tips
10+
11+
tips = tips.drop("new_bill", axis=1)
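
For readers who want to try the new include without downloading the CSV, here is a self-contained sketch of the same column operations. The three-row `tips` frame is a hypothetical sample, and `drop(columns=...)` is shown only as an equivalent spelling of `drop(..., axis=1)`.

```python
import pandas as pd

# Hypothetical three-row sample standing in for the tips data set.
tips = pd.DataFrame(
    {"total_bill": [16.99, 10.34, 21.01], "tip": [1.01, 1.66, 3.50]}
)

# Vectorized arithmetic on a whole column (a Series) at once.
tips["total_bill"] = tips["total_bill"] - 2

# Assigning to a new label creates a new column.
tips["new_bill"] = tips["total_bill"] / 2
new_bill_values = tips["new_bill"].tolist()

# drop() returns a new frame, so the result is assigned back.
# ``columns="new_bill"`` is equivalent to ``"new_bill", axis=1``.
tips = tips.drop(columns="new_bill")

print(tips.columns.tolist())
```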
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
1+
The same operations are expressed in pandas below. Note that these operations do not happen in
2+
place. To make these changes persist, assign the operation back to a variable.
3+
4+
Keep certain columns
5+
''''''''''''''''''''
6+
7+
.. ipython:: python
8+
9+
tips[["sex", "total_bill", "tip"]]
10+
11+
Drop a column
12+
'''''''''''''
13+
14+
.. ipython:: python
15+
16+
tips.drop("sex", axis=1)
17+
18+
Rename a column
19+
'''''''''''''''
20+
21+
.. ipython:: python
22+
23+
tips.rename(columns={"total_bill": "total_bill_2"})
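
The "do not happen in place" note in this include can be checked with a short sketch. The sample frame below is hypothetical; the point is only that each call returns a new object and `tips` keeps its original columns until a result is assigned back.

```python
import pandas as pd

# Hypothetical sample standing in for the tips data set.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male"],
        "total_bill": [16.99, 10.34, 21.01],
        "tip": [1.01, 1.66, 3.50],
        "size": [2, 3, 3],
    }
)

kept = tips[["sex", "total_bill", "tip"]]                      # keep
dropped = tips.drop("sex", axis=1)                             # drop
renamed = tips.rename(columns={"total_bill": "total_bill_2"})  # rename

# None of the calls above modified ``tips`` itself.
print(tips.columns.tolist())

# To make a change persist, assign the result back to a variable.
tips = tips.rename(columns={"total_bill": "total_bill_2"})
print(tips.columns.tolist())
```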

doc/source/getting_started/comparison/includes/extract_substring.rst

+1-1
@@ -4,4 +4,4 @@ indexes are zero-based.
44

55
.. ipython:: python
66
7-
tips["sex"].str[0:1].head()
7+
tips["sex"].str[0:1]

doc/source/getting_started/comparison/includes/find_substring.rst

+1-1
@@ -5,4 +5,4 @@ zero-based.
55

66
.. ipython:: python
77
8-
tips["sex"].str.find("ale").head()
8+
tips["sex"].str.find("ale")
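
The two string includes above both rely on zero-based positions. A minimal sketch on a hypothetical three-value `sex` column shows what the slice and the `find` call return once `.head()` is removed.

```python
import pandas as pd

# Hypothetical sample of the ``sex`` column; "N/A" is added only to
# show what find() returns when the substring is absent.
tips = pd.DataFrame({"sex": ["Female", "Male", "N/A"]})

# Slice [0:1] takes the first character of every value.
first_char = tips["sex"].str[0:1]

# find() gives the zero-based start of the first match, or -1 if absent.
position = tips["sex"].str.find("ale")

print(first_char.tolist())
print(position.tolist())
```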

doc/source/getting_started/comparison/includes/groupby.rst

+1-1
@@ -4,4 +4,4 @@ pandas provides a flexible ``groupby`` mechanism that allows similar aggregation
44
.. ipython:: python
55
66
tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()
7-
tips_summed.head()
7+
tips_summed
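
Displaying `tips_summed` without `.head()` is reasonable because a grouped sum has one row per group rather than one per input row. A sketch with a hypothetical four-row frame:

```python
import pandas as pd

# Hypothetical sample standing in for the tips data set.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Female"],
        "smoker": ["No", "No", "Yes", "No"],
        "total_bill": [10.0, 20.0, 30.0, 5.0],
        "tip": [1.0, 2.0, 3.0, 0.5],
    }
)

# One output row per distinct (sex, smoker) pair present in the data.
tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()

print(tips_summed)
```

The result is indexed by the grouping keys, so a single group can be read back with `tips_summed.loc[("Female", "No")]`.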

doc/source/getting_started/comparison/includes/if_then.rst

+1-1
@@ -4,7 +4,7 @@ the ``where`` method from ``numpy``.
44
.. ipython:: python
55
66
tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
7-
tips.head()
7+
tips
88
99
.. ipython:: python
1010
:suppress:

doc/source/getting_started/comparison/includes/length.rst

+2-2
@@ -4,5 +4,5 @@ Use ``len`` and ``rstrip`` to exclude trailing blanks.
44

55
.. ipython:: python
66
7-
tips["time"].str.len().head()
8-
tips["time"].str.rstrip().str.len().head()
7+
tips["time"].str.len()
8+
tips["time"].str.rstrip().str.len()
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,7 @@
1+
By default, pandas will truncate output of large ``DataFrame``\s to show the first and last rows.
2+
This can be overridden by :ref:`changing the pandas options <options>`, or using
3+
:meth:`DataFrame.head` or :meth:`DataFrame.tail`.
4+
5+
.. ipython:: python
6+
7+
tips.head(5)
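
The new include names three ways to control how much of a frame is shown. The sketch below exercises each on a hypothetical 100-row frame; `display.max_rows` is the pandas option assumed to be the relevant one here, set temporarily through `pd.option_context`.

```python
import pandas as pd

# Hypothetical 100-row frame, large enough to trigger truncation.
tips = pd.DataFrame({"total_bill": range(100)})

# Explicitly take rows from either end.
first_five = tips.head(5)
last_three = tips.tail(3)

# Temporarily lower the row limit; the repr then shows only the
# first and last rows with an ellipsis row in between.
with pd.option_context("display.max_rows", 10):
    truncated_text = repr(tips)

print(truncated_text)
```

Using `option_context` keeps the change local to the `with` block, so the global display settings are restored afterwards.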
Original file line numberDiff line numberDiff line change
@@ -1,24 +1,31 @@
1-
This doesn't work in pandas. Instead, the :func:`pd.isna` or :func:`pd.notna` functions
2-
should be used for comparisons.
1+
In pandas, :meth:`Series.isna` and :meth:`Series.notna` can be used to filter the rows.
32

43
.. ipython:: python
54
6-
outer_join[pd.isna(outer_join["value_x"])]
7-
outer_join[pd.notna(outer_join["value_x"])]
5+
outer_join[outer_join["value_x"].isna()]
6+
outer_join[outer_join["value_x"].notna()]
87
9-
pandas also provides a variety of methods to work with missing data -- some of
10-
which would be challenging to express in Stata. For example, there are methods to
11-
drop all rows with any missing values, replacing missing values with a specified
12-
value, like the mean, or forward filling from previous rows. See the
13-
:ref:`missing data documentation<missing_data>` for more.
8+
pandas provides :ref:`a variety of methods to work with missing data <missing_data>`. Here are some examples:
9+
10+
Drop rows with missing values
11+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1412

1513
.. ipython:: python
1614
17-
# Drop rows with any missing value
1815
outer_join.dropna()
1916
20-
# Fill forwards
17+
Forward fill from previous rows
18+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
19+
20+
.. ipython:: python
21+
2122
outer_join.fillna(method="ffill")
2223
23-
# Impute missing values with the mean
24+
Replace missing values with a specified value
25+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
26+
27+
Using the mean:
28+
29+
.. ipython:: python
30+
2431
outer_join["value_x"].fillna(outer_join["value_x"].mean())
