DOC: update tutorial docs (#46249)

fabiangabel · web-flow · commit 98298aaf33af · 2022-03-17T20:53:03.000-04:00
diff --git a/doc/source/getting_started/intro_tutorials/03_subset_data.rst b/doc/source/getting_started/intro_tutorials/03_subset_data.rst
@@ -358,9 +358,9 @@ See the user guide section on :ref:`different choices for indexing <indexing.cho
    of column/row labels, a slice of labels, a conditional expression or
    a colon.
 -  Select specific rows and/or columns using ``loc`` when using the row
-   and column names
+   and column names.
 -  Select specific rows and/or columns using ``iloc`` when using the
-   positions in the table
+   positions in the table.
 -  You can assign new values to a selection based on ``loc``/``iloc``.
 
 .. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/04_plotting.rst b/doc/source/getting_started/intro_tutorials/04_plotting.rst
@@ -88,7 +88,7 @@ method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and
     <ul class="task-bullet">
         <li>
 
-I want to visually compare the :math:`N0_2` values measured in London versus Paris.
+I want to visually compare the :math:`NO_2` values measured in London versus Paris.
 
 .. ipython:: python
 
@@ -197,26 +197,26 @@ I want to further customize, extend or save the resulting plot.
     </ul>
 
 Each of the plot objects created by pandas is a
-`matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
+`Matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
 plenty of options to customize plots, making the link between pandas and
-Matplotlib explicit enables all the power of matplotlib to the plot.
+Matplotlib explicit brings all the power of Matplotlib to the plot.
 This strategy is applied in the previous example:
 
 ::
 
-   fig, axs = plt.subplots(figsize=(12, 4))        # Create an empty matplotlib Figure and Axes
+   fig, axs = plt.subplots(figsize=(12, 4))        # Create an empty Matplotlib Figure and Axes
    air_quality.plot.area(ax=axs)                   # Use pandas to put the area plot on the prepared Figure/Axes
-   axs.set_ylabel("NO$_2$ concentration")          # Do any matplotlib customization you like
-   fig.savefig("no2_concentrations.png")           # Save the Figure/Axes using the existing matplotlib method.
+   axs.set_ylabel("NO$_2$ concentration")          # Do any Matplotlib customization you like
+   fig.savefig("no2_concentrations.png")           # Save the Figure/Axes using the existing Matplotlib method.
 
 .. raw:: html
 
     <div class="shadow gs-callout gs-callout-remember">
         <h4>REMEMBER</h4>
 
--  The ``.plot.*`` methods are applicable on both Series and DataFrames
+-  The ``.plot.*`` methods are applicable on both Series and DataFrames.
 -  By default, each of the columns is plotted as a different element
-   (line, boxplot,…)
+   (line, boxplot,…).
 -  Any plot created by pandas is a Matplotlib object.
 
 .. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/05_add_columns.rst b/doc/source/getting_started/intro_tutorials/05_add_columns.rst
@@ -41,7 +41,7 @@ How to create new columns derived from existing columns?
     <ul class="task-bullet">
         <li>
 
-I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`
+I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`.
 
 (*If we assume temperature of 25 degrees Celsius and pressure of 1013
 hPa, the conversion factor is 1.882*)
@@ -60,7 +60,7 @@ at the left side of the assignment.
     </ul>
 
 .. note::
-    The calculation of the values is done **element_wise**. This
+    The calculation of the values is done **element-wise**. This
     means all values in the given column are multiplied by the value 1.882
     at once. You do not need to use a loop to iterate each of the rows!
 
@@ -72,7 +72,7 @@ at the left side of the assignment.
     <ul class="task-bullet">
         <li>
 
-I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column
+I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.
 
 .. ipython:: python
 
@@ -89,8 +89,8 @@ values in each row*.
         </li>
     </ul>
 
-Also other mathematical operators (``+``, ``-``, ``\*``, ``/``) or
-logical operators (``<``, ``>``, ``=``,…) work element wise. The latter was already
+Also other mathematical operators (``+``, ``-``, ``*``, ``/``,…) or
+logical operators (``<``, ``>``, ``==``,…) work element-wise. The latter was already
 used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter
 rows of a table using a conditional expression.
 
@@ -101,7 +101,7 @@ If you need more advanced logic, you can use arbitrary Python code via :meth:`~D
     <ul class="task-bullet">
         <li>
 
-I want to rename the data columns to the corresponding station identifiers used by openAQ
+I want to rename the data columns to the corresponding station identifiers used by `OpenAQ <https://openaq.org/>`__.
 
 .. ipython:: python
 
diff --git a/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst b/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
@@ -74,15 +74,15 @@ What is the median age and ticket fare price of the Titanic passengers?
     titanic[["Age", "Fare"]].median()
 
 The statistic applied to multiple columns of a ``DataFrame`` (the selection of two columns
-return a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
+returns a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
 
 .. raw:: html
 
         </li>
     </ul>
 
 The aggregating statistic can be calculated for multiple columns at the
-same time. Remember the ``describe`` function from :ref:`first tutorial <10min_tut_01_tableoriented>`?
+same time. Remember the ``describe`` function from the :ref:`first tutorial <10min_tut_01_tableoriented>`?
 
 .. ipython:: python
 
@@ -161,7 +161,7 @@ columns:
     titanic.groupby("Sex").mean()
 
 It does not make much sense to get the average value of the ``Pclass``.
-if we are only interested in the average age for each gender, the
+If we are only interested in the average age for each gender, the
 selection of columns (rectangular brackets ``[]`` as usual) is supported
 on the grouped data as well:
 
@@ -254,7 +254,7 @@ within each group:
     <div class="d-flex flex-row gs-torefguide">
         <span class="badge badge-info">To user guide</span>
 
-The user guide has a dedicated section on ``value_counts`` , see page on :ref:`discretization <basics.discretization>`.
+The user guide has a dedicated section on ``value_counts``, see the page on :ref:`discretization <basics.discretization>`.
 
 .. raw:: html
 
@@ -265,10 +265,10 @@ The user guide has a dedicated section on ``value_counts`` , see page on :ref:`d
     <div class="shadow gs-callout gs-callout-remember">
         <h4>REMEMBER</h4>
 
--  Aggregation statistics can be calculated on entire columns or rows
--  ``groupby`` provides the power of the *split-apply-combine* pattern
+-  Aggregation statistics can be calculated on entire columns or rows.
+-  ``groupby`` provides the power of the *split-apply-combine* pattern.
 -  ``value_counts`` is a convenient shortcut to count the number of
-   entries in each category of a variable
+   entries in each category of a variable.
 
 .. raw:: html
 
diff --git a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst
@@ -37,7 +37,7 @@
 
 This tutorial uses air quality data about :math:`NO_2` and Particulate matter less than 2.5
 micrometers, made available by
-`openaq <https://openaq.org>`__ and using the
+`OpenAQ <https://openaq.org>`__ and using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_long.csv`` data set provides :math:`NO_2` and
 :math:`PM_{25}` values for the measurement stations *FR04014*, *BETR801*
@@ -117,7 +117,7 @@ I want to sort the Titanic data according to the cabin class and age in descendi
 
     titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()
 
-With :meth:`Series.sort_values`, the rows in the table are sorted according to the
+With :meth:`DataFrame.sort_values`, the rows in the table are sorted according to the
 defined column(s). The index will follow the row order.
 
 .. raw:: html
@@ -130,7 +130,7 @@ defined column(s). The index will follow the row order.
     <div class="d-flex flex-row gs-torefguide">
         <span class="badge badge-info">To user guide</span>
 
-More details about sorting of tables is provided in the using guide section on :ref:`sorting data <basics.sorting>`.
+More details about sorting of tables are provided in the user guide section on :ref:`sorting data <basics.sorting>`.
 
 .. raw:: html
 
@@ -142,7 +142,7 @@ Long to wide table format
 Let’s use a small subset of the air quality data set. We focus on
 :math:`NO_2` data and only use the first two measurements of each
 location (i.e. the head of each group). The subset of data will be
-called ``no2_subset``
+called ``no2_subset``.
 
 .. ipython:: python
 
@@ -163,7 +163,7 @@ called ``no2_subset``
     <ul class="task-bullet">
         <li>
 
-I want the values for the three stations as separate columns next to each other
+I want the values for the three stations as separate columns next to each other.
 
 .. ipython:: python
 
@@ -177,7 +177,7 @@ for each index/column combination is required.
         </li>
     </ul>
 
-As pandas support plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
+As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
 *long* to *wide* table format enables the plotting of the different time
 series at the same time:
 
@@ -216,7 +216,7 @@ Pivot table
     <ul class="task-bullet">
         <li>
 
-I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form
+I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form.
 
 .. ipython:: python
 
@@ -226,7 +226,7 @@ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of
 
 In the case of :meth:`~DataFrame.pivot`, the data is only rearranged. When multiple
 values need to be aggregated (in this specific case, the values on
-different time steps) :meth:`~DataFrame.pivot_table` can be used, providing an
+different time steps), :meth:`~DataFrame.pivot_table` can be used, providing an
 aggregation function (e.g. mean) on how to combine these values.
 
 .. raw:: html
@@ -235,8 +235,8 @@ aggregation function (e.g. mean) on how to combine these values.
     </ul>
 
 Pivot table is a well known concept in spreadsheet software. When
-interested in summary columns for each variable separately as well, put
-the ``margin`` parameter to ``True``:
+interested in the row/column margins (subtotals) for each variable, set
+the ``margins`` parameter to ``True``:
 
 .. ipython:: python
 
@@ -283,7 +283,7 @@ Wide to long format
 ~~~~~~~~~~~~~~~~~~~
 
 Starting again from the wide format table created in the previous
-section:
+section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`.
 
 .. ipython:: python
 
@@ -298,7 +298,7 @@ section:
     <ul class="task-bullet">
         <li>
 
-I want to collect all air quality :math:`NO_2` measurements in a single column (long format)
+I want to collect all air quality :math:`NO_2` measurements in a single column (long format).
 
 .. ipython:: python
 
@@ -319,7 +319,7 @@ will *melt* all columns NOT mentioned in ``id_vars`` together into two
 columns: A column with the column header names and a column with the
 values itself. The latter column gets by default the name ``value``.
 
-The :func:`pandas.melt` method can be defined in more detail:
+The parameters passed to :func:`pandas.melt` can be defined in more detail:
 
 .. ipython:: python
 
@@ -331,9 +331,9 @@ The :func:`pandas.melt` method can be defined in more detail:
     )
     no_2.head()
 
-The result in the same, but in more detail defined:
+The additional parameters have the following effects:
 
--  ``value_vars`` defines explicitly which columns to *melt* together
+-  ``value_vars`` defines which columns to *melt* together
 -  ``value_name`` provides a custom column name for the values column
    instead of the default column name ``value``
 -  ``var_name`` provides a custom column name for the column collecting
@@ -360,11 +360,11 @@ Conversion from wide to long format with :func:`pandas.melt` is explained in the
     <div class="shadow gs-callout gs-callout-remember">
         <h4>REMEMBER</h4>
 
--  Sorting by one or more columns is supported by ``sort_values``
+-  Sorting by one or more columns is supported by ``sort_values``.
 -  The ``pivot`` function is purely restructuring of the data,
-   ``pivot_table`` supports aggregations
+   ``pivot_table`` supports aggregations.
 -  The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to
-   long format)
+   long format).
 
 .. raw:: html
 
diff --git a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst
@@ -24,7 +24,7 @@
                         <p class="card-text">
 
 For this tutorial, air quality data about :math:`NO_2` is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 
 The ``air_quality_no2_long.csv`` data set provides :math:`NO_2`
@@ -59,7 +59,7 @@ Westminster* in respectively Paris, Antwerp and London.
 
 For this tutorial, air quality data about Particulate
 matter less than 2.5 micrometers is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 
 The ``air_quality_pm25_long.csv`` data set provides :math:`PM_{25}`
@@ -102,15 +102,15 @@ Concatenating objects
     <ul class="task-bullet">
         <li>
 
-I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table
+I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table.
 
 .. ipython:: python
 
     air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0)
     air_quality.head()
 
 The :func:`~pandas.concat` function performs concatenation operations of multiple
-tables along one of the axis (row-wise or column-wise).
+tables along one of the axes (row-wise or column-wise).
 
 .. raw:: html
 
@@ -149,16 +149,13 @@ origin of the table (either ``no2`` from table ``air_quality_no2`` or
 
 In this specific example, the ``parameter`` column provided by the data
 ensures that each of the original tables can be identified. This is not
-always the case. the ``concat`` function provides a convenient solution
+always the case. The ``concat`` function provides a convenient solution
 with the ``keys`` argument, adding an additional (hierarchical) row
 index. For example:
 
 .. ipython:: python
 
     air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"])
-
-.. ipython:: python
-
     air_quality_.head()
 
 .. note::
@@ -254,7 +251,7 @@ supports multiple join options similar to database-style operations.
     <ul class="task-bullet">
         <li>
 
-Add the parameter full description and name, provided by the parameters metadata table, to the measurements table
+Add the parameters' full description and name, provided by the parameters metadata table, to the measurements table.
 
 .. warning::
     The air quality parameters metadata are stored in a data file
diff --git a/doc/source/getting_started/intro_tutorials/09_timeseries.rst b/doc/source/getting_started/intro_tutorials/09_timeseries.rst
@@ -26,7 +26,7 @@
 
 For this tutorial, air quality data about :math:`NO_2` and Particulate
 matter less than 2.5 micrometers is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_no2_long.csv"`` data set provides :math:`NO_2` values
 for the measurement stations *FR04014*, *BETR801* and *London
@@ -209,7 +209,7 @@ Plot the typical :math:`NO_2` pattern during the day of our time series of all s
     air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
         kind='bar', rot=0, ax=axs
     )
-    plt.xlabel("Hour of the day");  # custom x label using matplotlib
+    plt.xlabel("Hour of the day");  # custom x label using Matplotlib
     @savefig 09_bar_chart.png
     plt.ylabel("$NO_2 (µg/m^3)$");
 
diff --git a/doc/source/getting_started/intro_tutorials/includes/air_quality_no2.rst b/doc/source/getting_started/intro_tutorials/includes/air_quality_no2.rst
@@ -8,7 +8,7 @@
             <p class="card-text">
 
 For this tutorial, air quality data about :math:`NO_2` is used, made
-available by `openaq <https://openaq.org>`__ and using the
+available by `OpenAQ <https://openaq.org>`__ and using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_no2.csv`` data set provides :math:`NO_2` values for
 the measurement stations *FR04014*, *BETR801* and *London Westminster*
diff --git a/doc/source/getting_started/intro_tutorials/includes/titanic.rst b/doc/source/getting_started/intro_tutorials/includes/titanic.rst