
Commit 98298aa

DOC: update tutorial docs (#46249)
1 parent 1400649 commit 98298aa

9 files changed, +57 -61 lines changed

doc/source/getting_started/intro_tutorials/03_subset_data.rst

+2 -2
@@ -358,9 +358,9 @@ See the user guide section on :ref:`different choices for indexing <indexing.cho
 of column/row labels, a slice of labels, a conditional expression or
 a colon.
 - Select specific rows and/or columns using ``loc`` when using the row
-and column names
+and column names.
 - Select specific rows and/or columns using ``iloc`` when using the
-positions in the table
+positions in the table.
 - You can assign new values to a selection based on ``loc``/``iloc``.

 .. raw:: html

doc/source/getting_started/intro_tutorials/04_plotting.rst

+8 -8
@@ -88,7 +88,7 @@ method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and
 <ul class="task-bullet">
 <li>

-I want to visually compare the :math:`N0_2` values measured in London versus Paris.
+I want to visually compare the :math:`NO_2` values measured in London versus Paris.

 .. ipython:: python

@@ -197,26 +197,26 @@ I want to further customize, extend or save the resulting plot.
 </ul>

 Each of the plot objects created by pandas is a
-`matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
+`Matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
 plenty of options to customize plots, making the link between pandas and
-Matplotlib explicit enables all the power of matplotlib to the plot.
+Matplotlib explicit enables all the power of Matplotlib to the plot.
 This strategy is applied in the previous example:

 ::

-fig, axs = plt.subplots(figsize=(12, 4)) # Create an empty matplotlib Figure and Axes
+fig, axs = plt.subplots(figsize=(12, 4)) # Create an empty Matplotlib Figure and Axes
 air_quality.plot.area(ax=axs) # Use pandas to put the area plot on the prepared Figure/Axes
-axs.set_ylabel("NO$_2$ concentration") # Do any matplotlib customization you like
-fig.savefig("no2_concentrations.png") # Save the Figure/Axes using the existing matplotlib method.
+axs.set_ylabel("NO$_2$ concentration") # Do any Matplotlib customization you like
+fig.savefig("no2_concentrations.png") # Save the Figure/Axes using the existing Matplotlib method.

 .. raw:: html

 <div class="shadow gs-callout gs-callout-remember">
 <h4>REMEMBER</h4>

-- The ``.plot.*`` methods are applicable on both Series and DataFrames
+- The ``.plot.*`` methods are applicable on both Series and DataFrames.
 - By default, each of the columns is plotted as a different element
-(line, boxplot,…)
+(line, boxplot,…).
 - Any plot created by pandas is a Matplotlib object.

 .. raw:: html

doc/source/getting_started/intro_tutorials/05_add_columns.rst

+6 -6
@@ -41,7 +41,7 @@ How to create new columns derived from existing columns?
 <ul class="task-bullet">
 <li>

-I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`
+I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`.

 (*If we assume temperature of 25 degrees Celsius and pressure of 1013
 hPa, the conversion factor is 1.882*)
@@ -60,7 +60,7 @@ at the left side of the assignment.
 </ul>

 .. note::
-The calculation of the values is done **element_wise**. This
+The calculation of the values is done **element-wise**. This
 means all values in the given column are multiplied by the value 1.882
 at once. You do not need to use a loop to iterate each of the rows!

@@ -72,7 +72,7 @@ at the left side of the assignment.
 <ul class="task-bullet">
 <li>

-I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column
+I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.

 .. ipython:: python

@@ -89,8 +89,8 @@ values in each row*.
 </li>
 </ul>

-Also other mathematical operators (``+``, ``-``, ``\*``, ``/``) or
-logical operators (``<``, ``>``, ``=``,…) work element wise. The latter was already
+Also other mathematical operators (``+``, ``-``, ``*``, ``/``,…) or
+logical operators (``<``, ``>``, ``==``,…) work element-wise. The latter was already
 used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter
 rows of a table using a conditional expression.

@@ -101,7 +101,7 @@ If you need more advanced logic, you can use arbitrary Python code via :meth:`~D
 <ul class="task-bullet">
 <li>

-I want to rename the data columns to the corresponding station identifiers used by openAQ
+I want to rename the data columns to the corresponding station identifiers used by `OpenAQ <https://openaq.org/>`__.

 .. ipython:: python

doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst

+7 -7
@@ -74,15 +74,15 @@ What is the median age and ticket fare price of the Titanic passengers?
 titanic[["Age", "Fare"]].median()

 The statistic applied to multiple columns of a ``DataFrame`` (the selection of two columns
-return a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
+returns a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.

 .. raw:: html

 </li>
 </ul>

 The aggregating statistic can be calculated for multiple columns at the
-same time. Remember the ``describe`` function from :ref:`first tutorial <10min_tut_01_tableoriented>`?
+same time. Remember the ``describe`` function from the :ref:`first tutorial <10min_tut_01_tableoriented>`?

 .. ipython:: python

@@ -161,7 +161,7 @@ columns:
 titanic.groupby("Sex").mean()

 It does not make much sense to get the average value of the ``Pclass``.
-if we are only interested in the average age for each gender, the
+If we are only interested in the average age for each gender, the
 selection of columns (rectangular brackets ``[]`` as usual) is supported
 on the grouped data as well:

@@ -254,7 +254,7 @@ within each group:
 <div class="d-flex flex-row gs-torefguide">
 <span class="badge badge-info">To user guide</span>

-The user guide has a dedicated section on ``value_counts`` , see page on :ref:`discretization <basics.discretization>`.
+The user guide has a dedicated section on ``value_counts`` , see the page on :ref:`discretization <basics.discretization>`.

 .. raw:: html

@@ -265,10 +265,10 @@ The user guide has a dedicated section on ``value_counts`` , see page on :ref:`d
 <div class="shadow gs-callout gs-callout-remember">
 <h4>REMEMBER</h4>

-- Aggregation statistics can be calculated on entire columns or rows
-- ``groupby`` provides the power of the *split-apply-combine* pattern
+- Aggregation statistics can be calculated on entire columns or rows.
+- ``groupby`` provides the power of the *split-apply-combine* pattern.
 - ``value_counts`` is a convenient shortcut to count the number of
-entries in each category of a variable
+entries in each category of a variable.

 .. raw:: html

doc/source/getting_started/intro_tutorials/07_reshape_table_layout.rst

+18 -18
@@ -37,7 +37,7 @@

 This tutorial uses air quality data about :math:`NO_2` and Particulate matter less than 2.5
 micrometers, made available by
-`openaq <https://openaq.org>`__ and using the
+`OpenAQ <https://openaq.org>`__ and using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_long.csv`` data set provides :math:`NO_2` and
 :math:`PM_{25}` values for the measurement stations *FR04014*, *BETR801*
@@ -117,7 +117,7 @@ I want to sort the Titanic data according to the cabin class and age in descendi

 titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()

-With :meth:`Series.sort_values`, the rows in the table are sorted according to the
+With :meth:`DataFrame.sort_values`, the rows in the table are sorted according to the
 defined column(s). The index will follow the row order.

 .. raw:: html
@@ -130,7 +130,7 @@ defined column(s). The index will follow the row order.
 <div class="d-flex flex-row gs-torefguide">
 <span class="badge badge-info">To user guide</span>

-More details about sorting of tables is provided in the using guide section on :ref:`sorting data <basics.sorting>`.
+More details about sorting of tables is provided in the user guide section on :ref:`sorting data <basics.sorting>`.

 .. raw:: html

@@ -142,7 +142,7 @@ Long to wide table format
 Let’s use a small subset of the air quality data set. We focus on
 :math:`NO_2` data and only use the first two measurements of each
 location (i.e. the head of each group). The subset of data will be
-called ``no2_subset``
+called ``no2_subset``.

 .. ipython:: python

@@ -163,7 +163,7 @@ called ``no2_subset``
 <ul class="task-bullet">
 <li>

-I want the values for the three stations as separate columns next to each other
+I want the values for the three stations as separate columns next to each other.

 .. ipython:: python

@@ -177,7 +177,7 @@ for each index/column combination is required.
 </li>
 </ul>

-As pandas support plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
+As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
 *long* to *wide* table format enables the plotting of the different time
 series at the same time:

@@ -216,7 +216,7 @@ Pivot table
 <ul class="task-bullet">
 <li>

-I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form
+I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form.

 .. ipython:: python

@@ -226,7 +226,7 @@ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of

 In the case of :meth:`~DataFrame.pivot`, the data is only rearranged. When multiple
 values need to be aggregated (in this specific case, the values on
-different time steps) :meth:`~DataFrame.pivot_table` can be used, providing an
+different time steps), :meth:`~DataFrame.pivot_table` can be used, providing an
 aggregation function (e.g. mean) on how to combine these values.

 .. raw:: html
@@ -235,8 +235,8 @@ aggregation function (e.g. mean) on how to combine these values.
 </ul>

 Pivot table is a well known concept in spreadsheet software. When
-interested in summary columns for each variable separately as well, put
-the ``margin`` parameter to ``True``:
+interested in the row/column margins (subtotals) for each variable, set
+the ``margins`` parameter to ``True``:

 .. ipython:: python

@@ -283,7 +283,7 @@ Wide to long format
 ~~~~~~~~~~~~~~~~~~~

 Starting again from the wide format table created in the previous
-section:
+section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`.

 .. ipython:: python

@@ -298,7 +298,7 @@ section:
 <ul class="task-bullet">
 <li>

-I want to collect all air quality :math:`NO_2` measurements in a single column (long format)
+I want to collect all air quality :math:`NO_2` measurements in a single column (long format).

 .. ipython:: python

@@ -319,7 +319,7 @@ will *melt* all columns NOT mentioned in ``id_vars`` together into two
 columns: A column with the column header names and a column with the
 values itself. The latter column gets by default the name ``value``.

-The :func:`pandas.melt` method can be defined in more detail:
+The parameters passed to :func:`pandas.melt` can be defined in more detail:

 .. ipython:: python

@@ -331,9 +331,9 @@ The :func:`pandas.melt` method can be defined in more detail:
 )
 no_2.head()

-The result in the same, but in more detail defined:
+The additional parameters have the following effects:

-- ``value_vars`` defines explicitly which columns to *melt* together
+- ``value_vars`` defines which columns to *melt* together
 - ``value_name`` provides a custom column name for the values column
 instead of the default column name ``value``
 - ``var_name`` provides a custom column name for the column collecting
@@ -360,11 +360,11 @@ Conversion from wide to long format with :func:`pandas.melt` is explained in the
 <div class="shadow gs-callout gs-callout-remember">
 <h4>REMEMBER</h4>

-- Sorting by one or more columns is supported by ``sort_values``
+- Sorting by one or more columns is supported by ``sort_values``.
 - The ``pivot`` function is purely restructuring of the data,
-``pivot_table`` supports aggregations
+``pivot_table`` supports aggregations.
 - The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to
-long format)
+long format).

 .. raw:: html

doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst

+6 -9
@@ -24,7 +24,7 @@
 <p class="card-text">

 For this tutorial, air quality data about :math:`NO_2` is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.

 The ``air_quality_no2_long.csv`` data set provides :math:`NO_2`
@@ -59,7 +59,7 @@ Westminster* in respectively Paris, Antwerp and London.

 For this tutorial, air quality data about Particulate
 matter less than 2.5 micrometers is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.

 The ``air_quality_pm25_long.csv`` data set provides :math:`PM_{25}`
@@ -102,15 +102,15 @@ Concatenating objects
 <ul class="task-bullet">
 <li>

-I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table
+I want to combine the measurements of :math:`NO_2` and :math:`PM_{25}`, two tables with a similar structure, in a single table.

 .. ipython:: python

 air_quality = pd.concat([air_quality_pm25, air_quality_no2], axis=0)
 air_quality.head()

 The :func:`~pandas.concat` function performs concatenation operations of multiple
-tables along one of the axis (row-wise or column-wise).
+tables along one of the axes (row-wise or column-wise).

 .. raw:: html

@@ -149,16 +149,13 @@ origin of the table (either ``no2`` from table ``air_quality_no2`` or

 In this specific example, the ``parameter`` column provided by the data
 ensures that each of the original tables can be identified. This is not
-always the case. the ``concat`` function provides a convenient solution
+always the case. The ``concat`` function provides a convenient solution
 with the ``keys`` argument, adding an additional (hierarchical) row
 index. For example:

 .. ipython:: python

 air_quality_ = pd.concat([air_quality_pm25, air_quality_no2], keys=["PM25", "NO2"])
-
-.. ipython:: python
-
 air_quality_.head()

 .. note::
@@ -254,7 +251,7 @@ supports multiple join options similar to database-style operations.
 <ul class="task-bullet">
 <li>

-Add the parameter full description and name, provided by the parameters metadata table, to the measurements table
+Add the parameters' full description and name, provided by the parameters metadata table, to the measurements table.

 .. warning::
 The air quality parameters metadata are stored in a data file

doc/source/getting_started/intro_tutorials/09_timeseries.rst

+2 -2
@@ -26,7 +26,7 @@

 For this tutorial, air quality data about :math:`NO_2` and Particulate
 matter less than 2.5 micrometers is used, made available by
-`openaq <https://openaq.org>`__ and downloaded using the
+`OpenAQ <https://openaq.org>`__ and downloaded using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_no2_long.csv"`` data set provides :math:`NO_2` values
 for the measurement stations *FR04014*, *BETR801* and *London
@@ -209,7 +209,7 @@ Plot the typical :math:`NO_2` pattern during the day of our time series of all s
 air_quality.groupby(air_quality["datetime"].dt.hour)["value"].mean().plot(
 kind='bar', rot=0, ax=axs
 )
-plt.xlabel("Hour of the day"); # custom x label using matplotlib
+plt.xlabel("Hour of the day"); # custom x label using Matplotlib
 @savefig 09_bar_chart.png
 plt.ylabel("$NO_2 (µg/m^3)$");

doc/source/getting_started/intro_tutorials/includes/air_quality_no2.rst

+1 -1
@@ -8,7 +8,7 @@
 <p class="card-text">

 For this tutorial, air quality data about :math:`NO_2` is used, made
-available by `openaq <https://openaq.org>`__ and using the
+available by `OpenAQ <https://openaq.org>`__ and using the
 `py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
 The ``air_quality_no2.csv`` data set provides :math:`NO_2` values for
 the measurement stations *FR04014*, *BETR801* and *London Westminster*
