
DOC: update tutorial docs #46249

Merged: 19 commits merged on Mar 18, 2022.
4 changes: 2 additions & 2 deletions doc/source/getting_started/intro_tutorials/03_subset_data.rst
@@ -358,9 +358,9 @@ See the user guide section on :ref:`different choices for indexing <indexing.cho
of column/row labels, a slice of labels, a conditional expression or
a colon.
- Select specific rows and/or columns using ``loc`` when using the row
and column names
and column names.
- Select specific rows and/or columns using ``iloc`` when using the
positions in the table
positions in the table.
- You can assign new values to a selection based on ``loc``/``iloc``.

.. raw:: html
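A minimal sketch of the ``loc``/``iloc`` rules summarized in the REMEMBER list above. It uses a small hypothetical ``DataFrame``: the names, ages and index labels are invented and are not taken from the tutorial's Titanic data. ``loc`` selects with labels and conditions, ``iloc`` selects with integer positions, and both can appear on the left side of an assignment:

```python
import pandas as pd

# Hypothetical toy table (invented values, not the tutorial's titanic.csv),
# with an integer index that deliberately does not start at 0.
df = pd.DataFrame(
    {"Name": ["Ann", "Bob", "Cid", "Dee"], "Age": [22, 38, 26, 35]},
    index=[10, 11, 12, 13],
)

# loc selects by labels: a boolean condition for the rows, a name for the column.
older_names = df.loc[df["Age"] > 25, "Name"].tolist()

# A loc slice of labels includes the end label (rows 10, 11 and 12).
label_slice = df.loc[10:12, "Age"].tolist()
# An iloc slice of positions excludes the end (positions 0 and 1 only).
position_slice = df.iloc[0:2, 1].tolist()

# New values can be assigned to a selection made with loc or iloc.
df.loc[df["Age"] > 35, "Name"] = "senior"
df.iloc[0, 1] = 23
```

The label slice ``10:12`` returns three rows while the position slice ``0:2`` returns two. Label slices include their end point; position slices follow the usual Python convention.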
16 changes: 8 additions & 8 deletions doc/source/getting_started/intro_tutorials/04_plotting.rst
@@ -88,7 +88,7 @@ method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and
<ul class="task-bullet">
<li>

I want to visually compare the :math:`N0_2` values measured in London versus Paris.
I want to visually compare the :math:`NO_2` values measured in London versus Paris.

.. ipython:: python

@@ -197,26 +197,26 @@ I want to further customize, extend or save the resulting plot.
</ul>

Each of the plot objects created by pandas is a
`matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
`Matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
plenty of options to customize plots, making the link between pandas and
Matplotlib explicit enables all the power of matplotlib to the plot.
Matplotlib explicit enables all the power of Matplotlib to the plot.
This strategy is applied in the previous example:

::

fig, axs = plt.subplots(figsize=(12, 4)) # Create an empty matplotlib Figure and Axes
fig, axs = plt.subplots(figsize=(12, 4)) # Create an empty Matplotlib Figure and Axes
air_quality.plot.area(ax=axs) # Use pandas to put the area plot on the prepared Figure/Axes
axs.set_ylabel("NO$_2$ concentration") # Do any matplotlib customization you like
fig.savefig("no2_concentrations.png") # Save the Figure/Axes using the existing matplotlib method.
axs.set_ylabel("NO$_2$ concentration") # Do any Matplotlib customization you like
fig.savefig("no2_concentrations.png") # Save the Figure/Axes using the existing Matplotlib method.

.. raw:: html

<div class="shadow gs-callout gs-callout-remember">
<h4>REMEMBER</h4>

- The ``.plot.*`` methods are applicable on both Series and DataFrames
- The ``.plot.*`` methods are applicable on both Series and DataFrames.
- By default, each of the columns is plotted as a different element
(line, boxplot,…)
(line, boxplot,…).
- Any plot created by pandas is a Matplotlib object.

.. raw:: html
12 changes: 6 additions & 6 deletions doc/source/getting_started/intro_tutorials/05_add_columns.rst
@@ -41,7 +41,7 @@ How to create new columns derived from existing columns?
<ul class="task-bullet">
<li>

I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`
I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`.

(*If we assume temperature of 25 degrees Celsius and pressure of 1013
hPa, the conversion factor is 1.882*)
@@ -60,7 +60,7 @@ at the left side of the assignment.
</ul>

.. note::
The calculation of the values is done **element_wise**. This
The calculation of the values is done **element-wise**. This
means all values in the given column are multiplied by the value 1.882
at once. You do not need to use a loop to iterate each of the rows!

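The element-wise behaviour described in the note above can be sketched with a small hypothetical table. The column names mirror the tutorial's station columns, but the values are invented:

```python
import pandas as pd

# Hypothetical NO2 values; column names follow the tutorial, numbers are invented.
air_quality = pd.DataFrame(
    {
        "station_london": [10.0, 20.0, 30.0],
        "station_paris": [24.0, 28.0, 30.0],
        "station_antwerp": [12.0, 28.0, 15.0],
    }
)

# Every value of the column is multiplied by 1.882 in a single vectorized
# operation; no Python loop over the rows is needed.
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882

# An operation between two columns pairs up the values row by row.
air_quality["ratio_paris_antwerp"] = (
    air_quality["station_paris"] / air_quality["station_antwerp"]
)
```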
@@ -72,7 +72,7 @@ at the left side of the assignment.
<ul class="task-bullet">
<li>

I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column
I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.

.. ipython:: python

@@ -89,8 +89,8 @@ values in each row*.
</li>
</ul>

Also other mathematical operators (``+``, ``-``, ``\*``, ``/``) or
logical operators (``<``, ``>``, ``=``,…) work element wise. The latter was already
Also other mathematical operators (``+``, ``-``, ``*``, ``/``,…) or
logical operators (``<``, ``>``, ``==``,…) work element-wise. The latter was already
used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter
rows of a table using a conditional expression.

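A short sketch of element-wise comparisons on a hypothetical ``Series`` of :math:`NO_2` values. Each comparison returns a boolean ``Series`` with one entry per value, and that ``Series`` can filter the data directly, as in the subset data tutorial:

```python
import pandas as pd

# Hypothetical values; only the element-wise behaviour matters here.
no2 = pd.Series([18.0, 42.5, 31.0, 55.2], name="station_paris")

# Each comparison is evaluated value by value and returns booleans.
above_limit = no2 > 40
# Equality is tested with ``==``; a single ``=`` is assignment in Python.
exactly_31 = no2 == 31.0

# The boolean Series selects the matching values.
high_values = no2[above_limit]
```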
@@ -101,7 +101,7 @@ If you need more advanced logic, you can use arbitrary Python code via :meth:`~D
<ul class="task-bullet">
<li>

I want to rename the data columns to the corresponding station identifiers used by openAQ
I want to rename the data columns to the corresponding station identifiers used by `OpenAQ <https://openaq.org/>`__.

.. ipython:: python

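A minimal sketch of :meth:`~DataFrame.rename` with a mapping, using a one-row hypothetical table with the tutorial's ``station_*`` column names. The mapping pairs each station column with the identifier listed in the data description: ``FR04014`` for Paris, ``BETR801`` for Antwerp and ``London Westminster`` for London.

```python
import pandas as pd

# Hypothetical one-row table; the values are invented.
air_quality = pd.DataFrame(
    {"station_antwerp": [24.0], "station_paris": [27.4], "station_london": [19.0]}
)

# A dict maps old column names to new ones; rename returns a new DataFrame.
air_quality_renamed = air_quality.rename(
    columns={
        "station_antwerp": "BETR801",
        "station_paris": "FR04014",
        "station_london": "London Westminster",
    }
)

# A function can be passed instead of a dict; it is applied to every column name.
air_quality_lower = air_quality_renamed.rename(columns=str.lower)
```

``rename`` leaves the original ``air_quality`` untouched, so the result has to be stored in a variable (or assigned back) to be kept.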
@@ -74,15 +74,15 @@ What is the median age and ticket fare price of the Titanic passengers?
titanic[["Age", "Fare"]].median()

The statistic applied to multiple columns of a ``DataFrame`` (the selection of two columns
return a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
returns a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.

.. raw:: html

</li>
</ul>

The aggregating statistic can be calculated for multiple columns at the
same time. Remember the ``describe`` function from :ref:`first tutorial <10min_tut_01_tableoriented>`?
same time. Remember the ``describe`` function from the :ref:`first tutorial <10min_tut_01_tableoriented>`?

.. ipython:: python

@@ -161,7 +161,7 @@ columns:
titanic.groupby("Sex").mean()

It does not make much sense to get the average value of the ``Pclass``.
if we are only interested in the average age for each gender, the
If we are only interested in the average age for each gender, the
selection of columns (rectangular brackets ``[]`` as usual) is supported
on the grouped data as well:

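A small sketch of selecting a column on grouped data, using a hypothetical four-row Titanic-like table with invented values. Selecting ``Age`` before aggregating avoids computing the meaningless average of ``Pclass``:

```python
import pandas as pd

# Hypothetical Titanic-like table; the values are invented.
titanic = pd.DataFrame(
    {
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, 26.0, 35.0],
        "Pclass": [3, 1, 3, 1],
    }
)

# Split by Sex, keep only the Age column, then combine with the mean.
mean_age = titanic.groupby("Sex")["Age"].mean()
```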
@@ -254,7 +254,7 @@ within each group:
<div class="d-flex flex-row gs-torefguide">
<span class="badge badge-info">To user guide</span>

The user guide has a dedicated section on ``value_counts`` , see page on :ref:`discretization <basics.discretization>`.
The user guide has a dedicated section on ``value_counts`` , see the page on :ref:`discretization <basics.discretization>`.

.. raw:: html

@@ -265,10 +265,10 @@ The user guide has a dedicated section on ``value_counts`` , see page on :ref:`d
<div class="shadow gs-callout gs-callout-remember">
<h4>REMEMBER</h4>

- Aggregation statistics can be calculated on entire columns or rows
- ``groupby`` provides the power of the *split-apply-combine* pattern
- Aggregation statistics can be calculated on entire columns or rows.
- ``groupby`` provides the power of the *split-apply-combine* pattern.
- ``value_counts`` is a convenient shortcut to count the number of
entries in each category of a variable
entries in each category of a variable.

.. raw:: html

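The statistics in the REMEMBER list above can be tried on a hypothetical five-row table with invented values. The sketch computes a median per selected column, then compares ``value_counts`` with the ``groupby``-based count that it is a shortcut for:

```python
import pandas as pd

# Hypothetical Titanic-like table; the values are invented.
titanic = pd.DataFrame(
    {
        "Pclass": [3, 1, 3, 2, 3],
        "Age": [22.0, 38.0, 26.0, 35.0, 54.0],
        "Fare": [7.25, 71.28, 7.93, 53.10, 8.05],
    }
)

# Selecting two columns returns a DataFrame; the median is computed per column.
medians = titanic[["Age", "Fare"]].median()

# Number of entries in each category of Pclass (ordered by count, descending).
class_counts = titanic["Pclass"].value_counts()

# The same counts via groupby (ordered by the group key instead).
class_counts_groupby = titanic.groupby("Pclass")["Pclass"].count()
```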
@@ -37,7 +37,7 @@

This tutorial uses air quality data about :math:`NO_2` and Particulate matter less than 2.5
micrometers, made available by
`openaq <https://openaq.org>`__ and using the
`OpenAQ <https://openaq.org>`__ and using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
The ``air_quality_long.csv`` data set provides :math:`NO_2` and
:math:`PM_{25}` values for the measurement stations *FR04014*, *BETR801*
@@ -117,7 +117,7 @@ I want to sort the Titanic data according to the cabin class and age in descendi

titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()

With :meth:`Series.sort_values`, the rows in the table are sorted according to the
With :meth:`DataFrame.sort_values`, the rows in the table are sorted according to the
defined column(s). The index will follow the row order.

.. raw:: html
@@ -130,7 +130,7 @@ defined column(s). The index will follow the row order.
<div class="d-flex flex-row gs-torefguide">
<span class="badge badge-info">To user guide</span>

More details about sorting of tables is provided in the using guide section on :ref:`sorting data <basics.sorting>`.
More details about sorting of tables is provided in the user guide section on :ref:`sorting data <basics.sorting>`.

.. raw:: html

@@ -142,7 +142,7 @@ Long to wide table format
Let’s use a small subset of the air quality data set. We focus on
:math:`NO_2` data and only use the first two measurements of each
location (i.e. the head of each group). The subset of data will be
called ``no2_subset``
called ``no2_subset``.

.. ipython:: python

@@ -163,7 +163,7 @@ called ``no2_subset``
<ul class="task-bullet">
<li>

I want the values for the three stations as separate columns next to each other
I want the values for the three stations as separate columns next to each other.

.. ipython:: python

@@ -177,7 +177,7 @@ for each index/column combination is required.
</li>
</ul>

As pandas support plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
*long* to *wide* table format enables the plotting of the different time
series at the same time:

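A minimal sketch of the long-to-wide conversion with :meth:`~DataFrame.pivot`. It assumes a hypothetical ``no2_subset`` with two measurements for each of two stations; the values are invented:

```python
import pandas as pd

# Hypothetical long-format subset: one row per (time, location) measurement.
no2_subset = pd.DataFrame(
    {
        "date.utc": pd.to_datetime(
            ["2019-06-21 00:00", "2019-06-21 00:00", "2019-06-21 01:00", "2019-06-21 01:00"]
        ),
        "location": ["FR04014", "BETR801", "FR04014", "BETR801"],
        "value": [20.0, 26.5, 21.8, 23.0],
    }
)

# pivot only rearranges the data: each location becomes its own column,
# indexed by the measurement time.
wide = no2_subset.pivot(index="date.utc", columns="location", values="value")
```

Because ``pivot`` does not aggregate, each index/column combination may occur only once. When there are duplicates, ``pivot_table`` with an aggregation function is the tool to use.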
@@ -216,7 +216,7 @@ Pivot table
<ul class="task-bullet">
<li>

I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form
I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form.

.. ipython:: python

@@ -226,7 +226,7 @@ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of

In the case of :meth:`~DataFrame.pivot`, the data is only rearranged. When multiple
values need to be aggregated (in this specific case, the values on
different time steps) :meth:`~DataFrame.pivot_table` can be used, providing an
different time steps), :meth:`~DataFrame.pivot_table` can be used, providing an
aggregation function (e.g. mean) on how to combine these values.

.. raw:: html
@@ -235,8 +235,8 @@ aggregation function (e.g. mean) on how to combine these values.
</ul>

Pivot table is a well known concept in spreadsheet software. When
interested in summary columns for each variable separately as well, put
the ``margin`` parameter to ``True``:
interested in the row/column margins (subtotals) for each variable, set
the ``margins`` parameter to ``True``:

.. ipython:: python

@@ -283,7 +283,7 @@ Wide to long format
~~~~~~~~~~~~~~~~~~~

Starting again from the wide format table created in the previous
section:
section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`.

.. ipython:: python

@@ -298,7 +298,7 @@ section:
<ul class="task-bullet">
<li>

I want to collect all air quality :math:`NO_2` measurements in a single column (long format)
I want to collect all air quality :math:`NO_2` measurements in a single column (long format).

.. ipython:: python

@@ -319,7 +319,7 @@ will *melt* all columns NOT mentioned in ``id_vars`` together into two
columns: A column with the column header names and a column with the
values itself. The latter column gets by default the name ``value``.

The :func:`pandas.melt` method can be defined in more detail:
The parameters passed to :func:`pandas.melt` can be defined in more detail:

.. ipython:: python

@@ -331,9 +331,9 @@ The :func:`pandas.melt` method can be defined in more detail:
)
no_2.head()

The result in the same, but in more detail defined:
The additional parameters have the following effects:

- ``value_vars`` defines explicitly which columns to *melt* together
- ``value_vars`` defines which columns to *melt* together
- ``value_name`` provides a custom column name for the values column
instead of the default column name ``value``
- ``var_name`` provides a custom column name for the column collecting
@@ -360,11 +360,11 @@ Conversion from wide to long format with :func:`pandas.melt` is explained in the
<div class="shadow gs-callout gs-callout-remember">
<h4>REMEMBER</h4>

- Sorting by one or more columns is supported by ``sort_values``
- Sorting by one or more columns is supported by ``sort_values``.
- The ``pivot`` function is purely restructuring of the data,
``pivot_table`` supports aggregations
``pivot_table`` supports aggregations.
- The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to
long format)
long format).

.. raw:: html

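The REMEMBER items above can be exercised together on a hypothetical long table with invented numbers. The sketch covers three steps:

- sorting on several columns
- a ``pivot_table`` with ``margins=True``
- a ``melt`` back to long format using ``value_vars``, ``var_name`` and ``value_name``

```python
import pandas as pd

# Hypothetical long-format measurements; the values are invented.
air_quality = pd.DataFrame(
    {
        "location": ["BETR801", "BETR801", "FR04014", "FR04014"],
        "parameter": ["no2", "pm25", "no2", "pm25"],
        "value": [26.0, 20.0, 30.0, 10.0],
    }
)

# Sort by several columns at once, all in descending order.
ordered = air_quality.sort_values(by=["location", "value"], ascending=False)

# pivot_table aggregates; margins=True adds an "All" row and column holding
# the mean over each row, each column and the whole table.
summary = air_quality.pivot_table(
    values="value", index="location", columns="parameter",
    aggfunc="mean", margins=True,
)

# Drop the margins and turn the index back into a column (wide format) ...
wide = summary.drop(index="All", columns="All").reset_index()
# ... then melt the station columns back into long format.
long = wide.melt(
    id_vars="location",
    value_vars=["no2", "pm25"],
    var_name="parameter",
    value_name="concentration",
)
```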
@@ -24,7 +24,7 @@
<p class="card-text">

For this tutorial, air quality data about :math:`NO_2` is used, made available by
`openaq <https://openaq.org>`__ and downloaded using the
`OpenAQ <https://openaq.org>`__ and downloaded using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.

The ``air_quality_no2_long.csv`` data set provides :math:`NO_2`
@@ -59,7 +59,7 @@ Westminster* in respectively Paris, Antwerp and London.

For this tutorial, air quality data about Particulate
matter less than 2.5 micrometers is used, made available by
`openaq <https://openaq.org>`__ and downloaded using the
`OpenAQ <https://openaq.org>`__ and downloaded using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.

The ``air_quality_pm25_long.csv`` data set provides :math:`PM_{25}`
@@ -102,15 +102,15 @@ Concatenating objects
<ul class="task-bullet">
<li>

I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table
I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table.

.. ipython:: python

air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0)
air_quality.head()

The :func:`~pandas.concat` function performs concatenation operations of multiple
tables along one of the axis (row-wise or column-wise).
tables along one of the axes (row-wise or column-wise).

.. raw:: html

@@ -149,16 +149,13 @@ origin of the table (either ``no2`` from table ``air_quality_no2`` or

In this specific example, the ``parameter`` column provided by the data
ensures that each of the original tables can be identified. This is not
always the case. the ``concat`` function provides a convenient solution
always the case. The ``concat`` function provides a convenient solution
with the ``keys`` argument, adding an additional (hierarchical) row
index. For example:

.. ipython:: python

air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"])

.. ipython:: python

air_quality_.head()

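A minimal sketch of :func:`~pandas.concat` with and without ``keys``. It uses two hypothetical tables that share the same columns; the values are invented:

```python
import pandas as pd

# Two hypothetical tables with the same structure.
air_quality_pm25 = pd.DataFrame(
    {"location": ["BETR801", "FR04014"], "parameter": ["pm25", "pm25"], "value": [18.0, 12.0]}
)
air_quality_no2 = pd.DataFrame(
    {"location": ["BETR801", "FR04014"], "parameter": ["no2", "no2"], "value": [26.0, 30.0]}
)

# axis=0 (the default) stacks the rows of both tables.
air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0)

# keys adds an outer index level recording which table each row came from.
air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"])
```

Without ``keys``, the original row labels are kept, so the combined index contains duplicates. Passing ``ignore_index=True`` would renumber the rows instead.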
.. note::
@@ -254,7 +251,7 @@ supports multiple join options similar to database-style operations.
<ul class="task-bullet">
<li>

Add the parameter full description and name, provided by the parameters metadata table, to the measurements table
Add the parameters' full description and name, provided by the parameters metadata table, to the measurements table.

.. warning::
The air quality parameters metadata are stored in a data file
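The join described in the task above can be sketched with :meth:`~DataFrame.merge`. The sketch assumes a hypothetical metadata table whose key column is called ``id``, so the key columns are passed separately through ``left_on`` and ``right_on``. The descriptions and values are illustrative:

```python
import pandas as pd

# Hypothetical measurements table; the values are invented.
measurements = pd.DataFrame(
    {
        "location": ["BETR801", "FR04014", "FR04014"],
        "parameter": ["no2", "no2", "pm25"],
        "value": [26.0, 30.0, 12.0],
    }
)

# Hypothetical parameters metadata table, keyed by an "id" column.
parameters = pd.DataFrame(
    {
        "id": ["no2", "pm25"],
        "description": ["Nitrogen Dioxide", "Particulate matter less than 2.5 micrometers"],
        "name": ["NO2", "PM2.5"],
    }
)

# how="left" keeps every measurement row and adds the matching metadata columns;
# the key columns have different names, so both are given explicitly.
merged = measurements.merge(parameters, how="left", left_on="parameter", right_on="id")
```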
4 changes: 2 additions & 2 deletions doc/source/getting_started/intro_tutorials/09_timeseries.rst
@@ -26,7 +26,7 @@

For this tutorial, air quality data about :math:`NO_2` and Particulate
matter less than 2.5 micrometers is used, made available by
`openaq <https://openaq.org>`__ and downloaded using the
`OpenAQ <https://openaq.org>`__ and downloaded using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
The ``air_quality_no2_long.csv"`` data set provides :math:`NO_2` values
for the measurement stations *FR04014*, *BETR801* and *London
@@ -209,7 +209,7 @@ Plot the typical :math:`NO_2` pattern during the day of our time series of all s
air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
kind='bar', rot=0, ax=axs
)
plt.xlabel("Hour of the day"); # custom x label using matplotlib
plt.xlabel("Hour of the day"); # custom x label using Matplotlib
@savefig 09_bar_chart.png
plt.ylabel("$NO_2 (µg/m^3)$");

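A minimal sketch of the grouping behind the bar chart above, on a hypothetical two-day series with invented values. The ``.dt`` accessor extracts the hour from each timestamp, and the group means give the average value per hour of the day:

```python
import pandas as pd

# Hypothetical long-format time series; the values are invented.
air_quality = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            ["2019-05-07 01:00", "2019-05-07 02:00", "2019-05-08 01:00", "2019-05-08 02:00"]
        ),
        "value": [20.0, 30.0, 40.0, 10.0],
    }
)

# Group the values by hour of the day (across all days) and average them.
hourly_mean = air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean()
```

Appending ``.plot(kind="bar")`` to ``hourly_mean`` would draw the bar chart with Matplotlib.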
@@ -8,7 +8,7 @@
<p class="card-text">

For this tutorial, air quality data about :math:`NO_2` is used, made
available by `openaq <https://openaq.org>`__ and using the
available by `OpenAQ <https://openaq.org>`__ and using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
The ``air_quality_no2.csv`` data set provides :math:`NO_2` values for
the measurement stations *FR04014*, *BETR801* and *London Westminster*