DOC: remove use of head() in the comparison docs

afeld · afeld · commit 0507b1e572f2 · 2021-01-03T21:00:08.000-05:00
This helps to clarify the examples by removing code that isn't relevant.
Added a dedicated section to the SAS, SQL, and Stata pages.
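
For context, here is a minimal sketch of the behaviour the new "Limiting output" sections describe. It uses a synthetic 100-row frame in place of the tips dataset, so the name ``df`` and its column values are illustrative assumptions and not part of this commit.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset (the docs load the real one from a URL).
df = pd.DataFrame({"total_bill": range(100), "tip": range(100)})

# With default options, the repr of a large frame is truncated to the
# first and last rows, so the examples no longer need a trailing .head().
full_repr = repr(df)

# head() and tail() limit the rows explicitly.
first_rows = df.head(5)
last_rows = df.tail(3)

# The truncation threshold can also be changed through the pandas options;
# option_context applies the setting only inside the with-block.
with pd.option_context("display.max_rows", 10):
    short_repr = repr(df)

print(first_rows)
```

Dropping ``.head()`` from the examples relies on this default truncation to keep rendered output short, while ``head()`` and ``tail()`` remain available when an exact row count is wanted.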
diff --git a/doc/source/getting_started/comparison/comparison_with_sas.rst b/doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -4,23 +4,13 @@
 
 Comparison with SAS
 ********************
+
 For potential users coming from `SAS <https://en.wikipedia.org/wiki/SAS_(software)>`__
 this page is meant to demonstrate how different SAS operations would be
 performed in pandas.
 
 .. include:: includes/introduction.rst
 
-.. note::
-
-   Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
-   ``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
-   This is often used in interactive work (e.g. `Jupyter notebook
-   <https://jupyter.org/>`_ or terminal) - the equivalent in SAS would be:
-
-   .. code-block:: sas
-
-      proc print data=df(obs=5);
-      run;
 
 Data structures
 ---------------
@@ -120,7 +110,7 @@ The pandas method is :func:`read_csv`, which works similarly.
        "pandas/master/pandas/tests/io/data/csv/tips.csv"
    )
    tips = pd.read_csv(url)
-   tips.head()
+   tips
 
 
 Like ``PROC IMPORT``, ``read_csv`` can take a number of parameters to specify
@@ -138,6 +128,19 @@ In addition to text/csv, pandas supports a variety of other data formats
 such as Excel, HDF5, and SQL databases.  These are all read via a ``pd.read_*``
 function.  See the :ref:`IO documentation<io>` for more details.
 
+Limiting output
+~~~~~~~~~~~~~~~
+
+.. include:: includes/limit.rst
+
+The equivalent in SAS would be:
+
+.. code-block:: sas
+
+   proc print data=df(obs=5);
+   run;
+
+
 Exporting data
 ~~~~~~~~~~~~~~
 
@@ -181,7 +184,7 @@ New columns can be assigned in the same way.
 
    tips["total_bill"] = tips["total_bill"] - 2
    tips["new_bill"] = tips["total_bill"] / 2.0
-   tips.head()
+   tips
 
 .. ipython:: python
    :suppress:
@@ -283,13 +286,13 @@ The same operations are expressed in pandas below.
 .. ipython:: python
 
    # keep
-   tips[["sex", "total_bill", "tip"]].head()
+   tips[["sex", "total_bill", "tip"]]
 
    # drop
-   tips.drop("sex", axis=1).head()
+   tips.drop("sex", axis=1)
 
    # rename
-   tips.rename(columns={"total_bill": "total_bill_2"}).head()
+   tips.rename(columns={"total_bill": "total_bill_2"})
 
 
 Sorting by values
diff --git a/doc/source/getting_started/comparison/comparison_with_sql.rst b/doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -21,7 +21,7 @@ structure.
         "/pandas/master/pandas/tests/io/data/csv/tips.csv"
     )
     tips = pd.read_csv(url)
-    tips.head()
+    tips
 
 SELECT
 ------
@@ -31,14 +31,13 @@ to select all columns):
 .. code-block:: sql
 
     SELECT total_bill, tip, smoker, time
-    FROM tips
-    LIMIT 5;
+    FROM tips;
 
 With pandas, column selection is done by passing a list of column names to your DataFrame:
 
 .. ipython:: python
 
-    tips[["total_bill", "tip", "smoker", "time"]].head(5)
+    tips[["total_bill", "tip", "smoker", "time"]]
 
 Calling the DataFrame without the list of column names would display all columns (akin to SQL's
 ``*``).
@@ -48,14 +47,13 @@ In SQL, you can add a calculated column:
 .. code-block:: sql
 
     SELECT *, tip/total_bill as tip_rate
-    FROM tips
-    LIMIT 5;
+    FROM tips;
 
 With pandas, you can use the :meth:`DataFrame.assign` method of a DataFrame to append a new column:
 
 .. ipython:: python
 
-    tips.assign(tip_rate=tips["tip"] / tips["total_bill"]).head(5)
+    tips.assign(tip_rate=tips["tip"] / tips["total_bill"])
 
 WHERE
 -----
@@ -368,6 +366,20 @@ In pandas, you can use :meth:`~pandas.concat` in conjunction with
 
     pd.concat([df1, df2]).drop_duplicates()
 
+
+LIMIT
+-----
+
+.. code-block:: sql
+
+    SELECT * FROM tips
+    LIMIT 10;
+
+.. ipython:: python
+
+    tips.head(10)
+
+
 pandas equivalents for some SQL analytic and aggregate functions
 ----------------------------------------------------------------
 
diff --git a/doc/source/getting_started/comparison/comparison_with_stata.rst b/doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -10,16 +10,6 @@ performed in pandas.
 
 .. include:: includes/introduction.rst
 
-.. note::
-
-   Throughout this tutorial, the pandas ``DataFrame`` will be displayed by calling
-   ``df.head()``, which displays the first N (default 5) rows of the ``DataFrame``.
-   This is often used in interactive work (e.g. `Jupyter notebook
-   <https://jupyter.org/>`_ or terminal) -- the equivalent in Stata would be:
-
-   .. code-block:: stata
-
-      list in 1/5
 
 Data structures
 ---------------
@@ -116,7 +106,7 @@ the data set if presented with a url.
        "/pandas/master/pandas/tests/io/data/csv/tips.csv"
    )
    tips = pd.read_csv(url)
-   tips.head()
+   tips
 
 Like ``import delimited``, :func:`read_csv` can take a number of parameters to specify
 how the data should be parsed.  For example, if the data were instead tab delimited,
@@ -141,6 +131,18 @@ such as Excel, SAS, HDF5, Parquet, and SQL databases.  These are all read via a
 function.  See the :ref:`IO documentation<io>` for more details.
 
 
+Limiting output
+~~~~~~~~~~~~~~~
+
+.. include:: includes/limit.rst
+
+The equivalent in Stata would be:
+
+.. code-block:: stata
+
+   list in 1/5
+
+
 Exporting data
 ~~~~~~~~~~~~~~
 
@@ -188,7 +190,7 @@ drops a column from the ``DataFrame``.
 
    tips["total_bill"] = tips["total_bill"] - 2
    tips["new_bill"] = tips["total_bill"] / 2
-   tips.head()
+   tips
 
    tips = tips.drop("new_bill", axis=1)
 
@@ -263,13 +265,13 @@ to a variable.
 .. ipython:: python
 
    # keep
-   tips[["sex", "total_bill", "tip"]].head()
+   tips[["sex", "total_bill", "tip"]]
 
    # drop
-   tips.drop("sex", axis=1).head()
+   tips.drop("sex", axis=1)
 
    # rename
-   tips.rename(columns={"total_bill": "total_bill_2"}).head()
+   tips.rename(columns={"total_bill": "total_bill_2"})
 
 
 Sorting by values
diff --git a/doc/source/getting_started/comparison/includes/extract_substring.rst b/doc/source/getting_started/comparison/includes/extract_substring.rst
@@ -4,4 +4,4 @@ indexes are zero-based.
 
 .. ipython:: python
 
-   tips["sex"].str[0:1].head()
+   tips["sex"].str[0:1]
diff --git a/doc/source/getting_started/comparison/includes/find_substring.rst b/doc/source/getting_started/comparison/includes/find_substring.rst
@@ -5,4 +5,4 @@ zero-based.
 
 .. ipython:: python
 
-   tips["sex"].str.find("ale").head()
+   tips["sex"].str.find("ale")
diff --git a/doc/source/getting_started/comparison/includes/groupby.rst b/doc/source/getting_started/comparison/includes/groupby.rst
@@ -4,4 +4,4 @@ pandas provides a flexible ``groupby`` mechanism that allows similar aggregation
 .. ipython:: python
 
    tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()
-   tips_summed.head()
+   tips_summed
diff --git a/doc/source/getting_started/comparison/includes/if_then.rst b/doc/source/getting_started/comparison/includes/if_then.rst
@@ -4,7 +4,7 @@ the ``where`` method from ``numpy``.
 .. ipython:: python
 
    tips["bucket"] = np.where(tips["total_bill"] < 10, "low", "high")
-   tips.head()
+   tips
 
 .. ipython:: python
    :suppress:
diff --git a/doc/source/getting_started/comparison/includes/length.rst b/doc/source/getting_started/comparison/includes/length.rst
@@ -4,5 +4,5 @@ Use ``len`` and ``rstrip`` to exclude trailing blanks.
 
 .. ipython:: python
 
-   tips["time"].str.len().head()
-   tips["time"].str.rstrip().str.len().head()
+   tips["time"].str.len()
+   tips["time"].str.rstrip().str.len()
diff --git a/doc/source/getting_started/comparison/includes/limit.rst b/doc/source/getting_started/comparison/includes/limit.rst
@@ -0,0 +1,7 @@
+By default, pandas will truncate output of large ``DataFrame``\s to show the first and last rows.
+This can be overridden by :ref:`changing the pandas options <options>`, or using
+:meth:`DataFrame.head` or :meth:`DataFrame.tail`.
+
+.. ipython:: python
+
+   tips.head(5)
diff --git a/doc/source/getting_started/comparison/includes/sorting.rst b/doc/source/getting_started/comparison/includes/sorting.rst
@@ -3,4 +3,4 @@ pandas has a :meth:`DataFrame.sort_values` method, which takes a list of columns
 .. ipython:: python
 
    tips = tips.sort_values(["sex", "total_bill"])
-   tips.head()
+   tips
diff --git a/doc/source/getting_started/comparison/includes/time_date.rst b/doc/source/getting_started/comparison/includes/time_date.rst
@@ -11,7 +11,7 @@
 
    tips[
        ["date1", "date2", "date1_year", "date2_month", "date1_next", "months_between"]
-   ].head()
+   ]
 
 .. ipython:: python
    :suppress:
diff --git a/doc/source/getting_started/comparison/includes/transform.rst b/doc/source/getting_started/comparison/includes/transform.rst
@@ -5,4 +5,4 @@ succinctly expressed in one operation.
 
    gb = tips.groupby("smoker")["total_bill"]
    tips["adj_total_bill"] = tips["total_bill"] - gb.transform("mean")
-   tips.head()
+   tips
