From 112a9cd2020b3c0d14bc143ff72a9d89e0aa2032 Mon Sep 17 00:00:00 2001 From: Fabian Gabel <3385756+eltenedor@users.noreply.github.com> Date: Mon, 7 Mar 2022 08:27:29 +0100 Subject: [PATCH 01/16] N0_2 -> NO_2 **N**itrogen Di**o**xide --- doc/source/getting_started/intro_tutorials/04_plotting.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/getting_started/intro_tutorials/04_plotting.rst b/doc/source/getting_started/intro_tutorials/04_plotting.rst index a6d8142e68073..9c3aade7ca294 100644 --- a/doc/source/getting_started/intro_tutorials/04_plotting.rst +++ b/doc/source/getting_started/intro_tutorials/04_plotting.rst @@ -88,7 +88,7 @@ method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and .. note:: - The calculation of the values is done **element_wise**. This + The calculation of the values is done **element-wise**. This means all values in the given column are multiplied by the value 1.882 at once. You do not need to use a loop to iterate each of the rows! @@ -90,7 +90,7 @@ values in each row*. Also other mathematical operators (``+``, ``-``, ``\*``, ``/``) or -logical operators (``<``, ``>``, ``=``,…) work element wise. The latter was already +logical operators (``<``, ``>``, ``=``,…) work element-wise. The latter was already used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter rows of a table using a conditional expression. 
From e3c41fbfbda1807022fe2cd9fee29daecdfbff8d Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Mon, 7 Mar 2022 22:05:26 +0100 Subject: [PATCH 04/16] DOC: add missing dot at end of sentences --- .../getting_started/intro_tutorials/03_subset_data.rst | 4 ++-- doc/source/getting_started/intro_tutorials/04_plotting.rst | 4 ++-- .../getting_started/intro_tutorials/05_add_columns.rst | 6 +++--- .../intro_tutorials/06_calculate_statistics.rst | 6 +++--- .../intro_tutorials/07_reshape_table_layout.rst | 6 +++--- 5 files changed, 13 insertions(+), 13 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/03_subset_data.rst b/doc/source/getting_started/intro_tutorials/03_subset_data.rst index 79df2b4325091..291cbddff58eb 100644 --- a/doc/source/getting_started/intro_tutorials/03_subset_data.rst +++ b/doc/source/getting_started/intro_tutorials/03_subset_data.rst @@ -358,9 +358,9 @@ See the user guide section on :ref:`different choices for indexing

REMEMBER

-- The ``.plot.*`` methods are applicable on both Series and DataFrames +- The ``.plot.*`` methods are applicable on both Series and DataFrames. - By default, each of the columns is plotted as a different element - (line, boxplot,…) + (line, boxplot,…). - Any plot created by pandas is a Matplotlib object. .. raw:: html diff --git a/doc/source/getting_started/intro_tutorials/05_add_columns.rst b/doc/source/getting_started/intro_tutorials/05_add_columns.rst index 62e9606542af9..70720415ecb2d 100644 --- a/doc/source/getting_started/intro_tutorials/05_add_columns.rst +++ b/doc/source/getting_started/intro_tutorials/05_add_columns.rst @@ -41,7 +41,7 @@ How to create new columns derived from existing columns? The aggregating statistic can be calculated for multiple columns at the -same time. Remember the ``describe`` function from :ref:`first tutorial <10min_tut_01_tableoriented>`? +same time. Remember the ``describe`` function from the :ref:`first tutorial <10min_tut_01_tableoriented>`? .. ipython:: python @@ -161,7 +161,7 @@ columns: titanic.groupby("Sex").mean() It does not make much sense to get the average value of the ``Pclass``. -if we are only interested in the average age for each gender, the +If we are only interested in the average age for each gender, the selection of columns (rectangular brackets ``[]`` as usual) is supported on the grouped data as well: @@ -254,7 +254,7 @@ within each group:
To user guide -The user guide has a dedicated section on ``value_counts`` , see page on :ref:`discretization `. +The user guide has a dedicated section on ``value_counts`` , see the page on :ref:`discretization `. .. raw:: html From 6c9103f0d3a50c686b01e92c8f62ebbc2d434cf4 Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Tue, 8 Mar 2022 09:34:40 +0100 Subject: [PATCH 10/16] DOC: clarify titanic dataset columns --- .../intro_tutorials/includes/titanic.rst | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/includes/titanic.rst b/doc/source/getting_started/intro_tutorials/includes/titanic.rst index 1267a33d605ed..312ca48b45dd1 100644 --- a/doc/source/getting_started/intro_tutorials/includes/titanic.rst +++ b/doc/source/getting_started/intro_tutorials/includes/titanic.rst @@ -11,18 +11,17 @@ This tutorial uses the Titanic data set, stored as CSV. The data consists of the following data columns: - PassengerId: Id of every passenger. -- Survived: This feature have value 0 and 1. 0 for not survived and 1 - for survived. -- Pclass: There are 3 classes: Class 1, Class 2 and Class 3. +- Survived: Indication whether passenger survived. ``0`` for no and ``1`` for yes. +- Pclass: One out of the 3 ticket classes: Class ``1``, Class ``2`` and Class ``3``. - Name: Name of passenger. - Sex: Gender of passenger. -- Age: Age of passenger. -- SibSp: Indication that passenger have siblings and spouse. -- Parch: Whether a passenger is alone or have family. +- Age: Age of passenger in years. +- SibSp: Number of siblings or spouses aboard. +- Parch: Number of parents or children aboard. - Ticket: Ticket number of passenger. - Fare: Indicating the fare. -- Cabin: The cabin of passenger. -- Embarked: The embarked category. +- Cabin: Cabin number of passenger. +- Embarked: Port of embarkation. .. 
raw:: html From e01a671d4aefc03ea390632791ec1f3005273a8e Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Wed, 9 Mar 2022 07:56:23 +0100 Subject: [PATCH 11/16] DOC: typos in tutorial 07 --- .../intro_tutorials/07_reshape_table_layout.rst | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst index 018f8e7577988..8cf984a924317 100644 --- a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst +++ b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst @@ -130,7 +130,7 @@ defined column(s). The index will follow the row order.
To user guide -More details about sorting of tables is provided in the using guide section on :ref:`sorting data `. +More details about sorting of tables are provided in the user guide section on :ref:`sorting data `. .. raw:: html @@ -142,7 +142,7 @@ Long to wide table format Let’s use a small subset of the air quality data set. We focus on :math:`NO_2` data and only use the first two measurements of each location (i.e. the head of each group). The subset of data will be -called ``no2_subset`` +called ``no2_subset``. .. ipython:: python @@ -163,7 +163,7 @@ called ``no2_subset``
  • -I want the values for the three stations as separate columns next to each other +I want the values for the three stations as separate columns next to each other. .. ipython:: python @@ -177,7 +177,7 @@ for each index/column combination is required.
-As pandas support plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from +As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from *long* to *wide* table format enables the plotting of the different time series at the same time: @@ -216,7 +216,7 @@ Pivot table
  • -I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form +I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form. .. ipython:: python @@ -226,7 +226,7 @@ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of In the case of :meth:`~DataFrame.pivot`, the data is only rearranged. When multiple values need to be aggregated (in this specific case, the values on -different time steps) :meth:`~DataFrame.pivot_table` can be used, providing an +different time steps), :meth:`~DataFrame.pivot_table` can be used, providing an aggregation function (e.g. mean) on how to combine these values. .. raw:: html @@ -298,7 +298,7 @@ section:
    • -I want to collect all air quality :math:`NO_2` measurements in a single column (long format) +I want to collect all air quality :math:`NO_2` measurements in a single column (long format). .. ipython:: python From 7a1572c3ac53f189ce6a461d6e7f30eaec1f8afa Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Wed, 9 Mar 2022 16:43:20 +0100 Subject: [PATCH 12/16] DOC: adapt explanation of margins --- .../intro_tutorials/07_reshape_table_layout.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst index 8cf984a924317..8727fc9a74ffa 100644 --- a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst +++ b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst @@ -117,7 +117,7 @@ I want to sort the Titanic data according to the cabin class and age in descendi titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head() -With :meth:`Series.sort_values`, the rows in the table are sorted according to the +With :meth:`DataFrame.sort_values`, the rows in the table are sorted according to the defined column(s). The index will follow the row order. .. raw:: html @@ -235,8 +235,8 @@ aggregation function (e.g. mean) on how to combine these values.
    Pivot table is a well known concept in spreadsheet software. When -interested in summary columns for each variable separately as well, put -the ``margin`` parameter to ``True``: +interested in the row/column margins (subtotals) for each variable, set +the ``margins`` parameter to ``True``: .. ipython:: python From 68a0481899a891527f761548ed164409882bf9e1 Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Fri, 11 Mar 2022 07:00:02 +0100 Subject: [PATCH 13/16] DOC: consistent spelling of OpenAQ --- .../intro_tutorials/07_reshape_table_layout.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst index 8727fc9a74ffa..0b05a53110308 100644 --- a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst +++ b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst @@ -37,7 +37,7 @@ This tutorial uses air quality data about :math:`NO_2` and Particulate matter less than 2.5 micrometers, made available by -`openaq `__ and using the +`OpenAQ `__ and using the `py-openaq `__ package. The ``air_quality_long.csv`` data set provides :math:`NO_2` and :math:`PM_{25}` values for the measurement stations *FR04014*, *BETR801* @@ -283,7 +283,7 @@ Wide to long format ~~~~~~~~~~~~~~~~~~~ Starting again from the wide format table created in the previous -section: +section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`. .. 
ipython:: python From ed31d8246c9aeeabad1e4ee68e0c39f875cfd3b6 Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Fri, 11 Mar 2022 07:07:58 +0100 Subject: [PATCH 14/16] DOC fix typo and adapt formulation for melt with parameters --- .../intro_tutorials/07_reshape_table_layout.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst index 0b05a53110308..27d6f95923ed0 100644 --- a/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst +++ b/doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst @@ -319,7 +319,7 @@ will *melt* all columns NOT mentioned in ``id_vars`` together into two columns: A column with the column header names and a column with the values itself. The latter column gets by default the name ``value``. -The :func:`pandas.melt` method can be defined in more detail: +The parameters passed to :func:`pandas.melt` can be defined in more detail: .. ipython:: python @@ -331,9 +331,9 @@ The :func:`pandas.melt` method can be defined in more detail: ) no_2.head() -The result in the same, but in more detail defined: +The additional parameters have the following effects: -- ``value_vars`` defines explicitly which columns to *melt* together +- ``value_vars`` defines which columns to *melt* together - ``value_name`` provides a custom column name for the values column instead of the default column name ``value`` - ``var_name`` provides a custom column name for the column collecting From 3281a676b5881f474f005a9923ed8b6eed94e578 Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Fri, 11 Mar 2022 07:15:40 +0100 Subject: [PATCH 15/16] DOC axis vs. 
axes --- .../getting_started/intro_tutorials/08_combine_dataframes.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst index 9203e18ac5bad..8cc9263626414 100644 --- a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst +++ b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst @@ -110,7 +110,7 @@ I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tabl air_quality.head() The :func:`~pandas.concat` function performs concatenation operations of multiple -tables along one of the axis (row-wise or column-wise). +tables along one of the axes (row-wise or column-wise). .. raw:: html From 5a9a78beaf55d3452bf0b4a3351d7479a2d50a10 Mon Sep 17 00:00:00 2001 From: Fabian Gabel Date: Fri, 11 Mar 2022 20:39:12 +0100 Subject: [PATCH 16/16] DOC: add trailing dot --- .../intro_tutorials/08_combine_dataframes.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst index 8cc9263626414..b2b3891056017 100644 --- a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst +++ b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst @@ -102,7 +102,7 @@ Concatenating objects
    • -I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table +I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table. .. ipython:: python @@ -149,16 +149,13 @@ origin of the table (either ``no2`` from table ``air_quality_no2`` or In this specific example, the ``parameter`` column provided by the data ensures that each of the original tables can be identified. This is not -always the case. the ``concat`` function provides a convenient solution +always the case. The ``concat`` function provides a convenient solution with the ``keys`` argument, adding an additional (hierarchical) row index. For example: .. ipython:: python air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"]) - -.. ipython:: python - air_quality_.head() .. note:: @@ -254,7 +251,7 @@ supports multiple join options similar to database-style operations.
      • -Add the parameter full description and name, provided by the parameters metadata table, to the measurements table +Add the parameters' full description and name, provided by the parameters metadata table, to the measurements table. .. warning:: The air quality parameters metadata are stored in a data file