Commit 0507b1e

DOC: remove use of head() in the comparison docs
This helps to clarify the examples by removing code that isn't relevant. Added a dedicated section to the SAS, SQL, and Stata pages.
1 parent 531d5cf commit 0507b1e

12 files changed (+71 -47 lines)

doc/source/getting_started/comparison/comparison_with_sas.rst

+19 -16
@@ -4,23 +4,13 @@

 Comparison with SAS
 ********************
+
 For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__
 this page is meant to demonstrate how different SAS operations would be
 performed in pandas.

 .. include:: includes/introduction.rst

-.. note::
-
-Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
-``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
-This is often used in interactive work (e.g. `Jupyter notebook
-<https://jupyter.org/>`_ or terminal) - the equivalent in SAS would be:
-
-.. code-block:: sas
-
-proc print data=df(obs=5);
-run;

 Data structures
 ---------------
@@ -120,7 +110,7 @@ The pandas method is :func:`read_csv`, which works similarly.
 "pandas/master/pandas/tests/io/data/csv/tips.csv"
 )
 tips = pd.read_csv(url)
-tips.head()
+tips


 Like ``PROC IMPORT``, ``read_csv`` can take a number of parameters to specify
@@ -138,6 +128,19 @@ In addition to text/csv, pandas supports a variety of other data formats
 such as Excel, HDF5, and SQL databases. These are all read via a ``pd.read_*``
 function. See the :ref:`IO documentation<io>` for more details.

+Limiting output
+~~~~~~~~~~~~~~~
+
+.. include:: includes/limit.rst
+
+The equivalent in SAS would be:
+
+.. code-block:: sas
+
+proc print data=df(obs=5);
+run;
+
+
 Exporting data
 ~~~~~~~~~~~~~~

@@ -181,7 +184,7 @@ New columns can be assigned in the same way.

 tips["total_bill"] = tips["total_bill"] - 2
 tips["new_bill"] = tips["total_bill"] / 2.0
-tips.head()
+tips

 .. ipython:: python
 :suppress:
@@ -283,13 +286,13 @@ The same operations are expressed in pandas below.
 .. ipython:: python

 # keep
-tips[["sex", "total_bill", "tip"]].head()
+tips[["sex", "total_bill", "tip"]]

 # drop
-tips.drop("sex", axis=1).head()
+tips.drop("sex", axis=1)

 # rename
-tips.rename(columns={"total_bill": "total_bill_2"}).head()
+tips.rename(columns={"total_bill": "total_bill_2"})


 Sorting by values

doc/source/getting_started/comparison/comparison_with_sql.rst

+19 -7
@@ -21,7 +21,7 @@ structure.
 "/pandas/master/pandas/tests/io/data/csv/tips.csv"
 )
 tips = pd.read_csv(url)
-tips.head()
+tips

 SELECT
 ------
@@ -31,14 +31,13 @@ to select all columns):
 .. code-block:: sql

 SELECT total_bill, tip, smoker, time
-FROM tips
-LIMIT 5;
+FROM tips;

 With pandas, column selection is done by passing a list of column names to your DataFrame:

 .. ipython:: python

-tips[["total_bill", "tip", "smoker", "time"]].head(5)
+tips[["total_bill", "tip", "smoker", "time"]]

 Calling the DataFrame without the list of column names would display all columns (akin to SQL's
 ``*``).
@@ -48,14 +47,13 @@ In SQL, you can add a calculated column:
 .. code-block:: sql

 SELECT *, tip/total_bill as tip_rate
-FROM tips
-LIMIT 5;
+FROM tips;

 With pandas, you can use the :meth:`DataFrame.assign` method of a DataFrame to append a new column:

 .. ipython:: python

-tips.assign(tip_rate=tips["tip"] / tips["total_bill"]).head(5)
+tips.assign(tip_rate=tips["tip"] / tips["total_bill"])

 WHERE
 -----
@@ -368,6 +366,20 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with

 pd.concat([df1, df2]).drop_duplicates()

+
+LIMIT
+-----
+
+.. code-block:: sql
+
+SELECT * FROM tips
+LIMIT 10;
+
+.. ipython:: python
+
+tips.head(10)
+
+
 pandas equivalents for some SQL analytic and aggregate functions
 ----------------------------------------------------------------

doc/source/getting_started/comparison/comparison_with_stata.rst

+17 -15
@@ -10,16 +10,6 @@ performed in pandas.

 .. include:: includes/introduction.rst

-.. note::
-
-Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
-``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
-This is often used in interactive work (e.g. `Jupyter notebook
-<https://jupyter.org/>`_ or terminal) -- the equivalent in Stata would be:
-
-.. code-block:: stata
-
-list in 1/5

 Data structures
 ---------------
@@ -116,7 +106,7 @@ the data set if presented with a url.
 "/pandas/master/pandas/tests/io/data/csv/tips.csv"
 )
 tips = pd.read_csv(url)
-tips.head()
+tips

 Like ``import delimited``, :func:`read_csv` can take a number of parameters to specify
 how the data should be parsed. For example, if the data were instead tab delimited,
@@ -141,6 +131,18 @@ such as Excel, SAS, HDF5, Parquet, and SQL databases. These are all read via a
 function. See the :ref:`IO documentation<io>` for more details.


+Limiting output
+~~~~~~~~~~~~~~~
+
+.. include:: includes/limit.rst
+
+The equivalent in Stata would be:
+
+.. code-block:: stata
+
+list in 1/5
+
+
 Exporting data
 ~~~~~~~~~~~~~~

@@ -188,7 +190,7 @@ drops a column from the ``DataFrame``.

 tips["total_bill"] = tips["total_bill"] - 2
 tips["new_bill"] = tips["total_bill"] / 2
-tips.head()
+tips

 tips = tips.drop("new_bill", axis=1)

@@ -263,13 +265,13 @@ to a variable.
 .. ipython:: python

 # keep
-tips[["sex", "total_bill", "tip"]].head()
+tips[["sex", "total_bill", "tip"]]

 # drop
-tips.drop("sex", axis=1).head()
+tips.drop("sex", axis=1)

 # rename
-tips.rename(columns={"total_bill": "total_bill_2"}).head()
+tips.rename(columns={"total_bill": "total_bill_2"})


 Sorting by values

doc/source/getting_started/comparison/includes/extract_substring.rst

+1 -1
@@ -4,4 +4,4 @@ indexes are zero-based.

 .. ipython:: python

-tips["sex"].str[0:1].head()
+tips["sex"].str[0:1]

doc/source/getting_started/comparison/includes/find_substring.rst

+1 -1
@@ -5,4 +5,4 @@ zero-based.

 .. ipython:: python

-tips["sex"].str.find("ale").head()
+tips["sex"].str.find("ale")

doc/source/getting_started/comparison/includes/groupby.rst

+1 -1
@@ -4,4 +4,4 @@ pandas provides a flexible ``groupby`` mechanism that allows similar aggregation
 .. ipython:: python

 tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()
-tips_summed.head()
+tips_summed

doc/source/getting_started/comparison/includes/if_then.rst

+1 -1
@@ -4,7 +4,7 @@ the ``where`` method from ``numpy``.
 .. ipython:: python

 tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-tips.head()
+tips

 .. ipython:: python
 :suppress:

doc/source/getting_started/comparison/includes/length.rst

+2 -2
@@ -4,5 +4,5 @@ Use ``len`` and ``rstrip`` to exclude trailing blanks.

 .. ipython:: python

-tips["time"].str.len().head()
-tips["time"].str.rstrip().str.len().head()
+tips["time"].str.len()
+tips["time"].str.rstrip().str.len()

doc/source/getting_started/comparison/includes/limit.rst

+7 -0
@@ -0,0 +1,7 @@
+By default, pandas will truncate output of large ``DataFrame``\s to show the first and last rows.
+This can be overridden by :ref:`changing the pandas options <options>`, or using
+:meth:`DataFrame.head` or :meth:`DataFrame.tail`.
+
+.. ipython:: python
+
+tips.head(5)
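
The truncation behaviour described in the added text can be sketched as follows. This is a minimal illustration and not part of the commit: ``df`` is a hypothetical 100-row stand-in for ``tips``, and ``display.max_rows`` is assumed to be the relevant setting among those covered by the options page.

```python
import pandas as pd

# Hypothetical stand-in for the ``tips`` data set (not the real data).
df = pd.DataFrame({"total_bill": range(100), "tip": range(100)})

# With display.max_rows below the row count, the repr keeps only the
# first and last rows and marks the gap with an ellipsis row.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

# Setting display.max_rows to None lifts the limit, so every row is printed.
with pd.option_context("display.max_rows", None):
    full = repr(df)

# head() and tail() select rows explicitly, independent of display options.
first_five = df.head(5)
last_five = df.tail(5)

print(truncated)
print(first_five)
print(last_five)
```

``pd.option_context`` restores the previous setting when the ``with`` block exits, which makes it a less intrusive way to experiment than a global ``pd.set_option`` call.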

doc/source/getting_started/comparison/includes/sorting.rst

+1 -1
@@ -3,4 +3,4 @@ pandas has a :meth:`DataFrame.sort_values` method, which takes a list of columns
 .. ipython:: python

 tips = tips.sort_values(["sex", "total_bill"])
-tips.head()
+tips

doc/source/getting_started/comparison/includes/time_date.rst

+1 -1
@@ -11,7 +11,7 @@

 tips[
 ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-].head()
+]

 .. ipython:: python
 :suppress:

doc/source/getting_started/comparison/includes/transform.rst

+1 -1
@@ -5,4 +5,4 @@ succinctly expressed in one operation.

 gb = tips.groupby("smoker")["total_bill"]
 tips["adj_total_bill"] = tips["total_bill"] - gb.transform("mean")
-tips.head()
+tips
