This tutorial uses air quality data about :math:`NO_2` and Particulate matter less than 2.5
micrometers, made available by
- `openaq <https://openaq.org>`__ and using the
+ `OpenAQ <https://openaq.org>`__ and using the
`py-openaq <http://dhhagan.github.io/py-openaq/index.html>`__ package.
The ``air_quality_long.csv`` data set provides :math:`NO_2` and
:math:`PM_{25}` values for the measurement stations *FR04014*, *BETR801*
@@ -117,7 +117,7 @@ I want to sort the Titanic data according to the cabin class and age in descendi
titanic.sort_values(by=['Pclass', 'Age'], ascending=False).head()

- With :meth:`Series.sort_values`, the rows in the table are sorted according to the
+ With :meth:`DataFrame.sort_values`, the rows in the table are sorted according to the
defined column(s). The index will follow the row order.
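
As a minimal sketch of the behaviour described above, the following uses a small made-up table (``titanic_demo`` and its values are hypothetical, not the real Titanic data) to show that the original index labels travel with their rows after sorting:

```python
import pandas as pd

# Hypothetical miniature of the Titanic table; the values are invented.
titanic_demo = pd.DataFrame(
    {
        "Name": ["A", "B", "C", "D"],
        "Pclass": [3, 1, 3, 2],
        "Age": [22.0, 38.0, 26.0, 35.0],
    }
)

# Sort by cabin class first, then by age, both in descending order.
sorted_demo = titanic_demo.sort_values(by=["Pclass", "Age"], ascending=False)

# The index labels follow the rows, so they are no longer in 0..3 order.
print(sorted_demo)
print(list(sorted_demo.index))
```

The printed index shows the original row labels in their new order, which is what "the index will follow the row order" refers to.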

.. raw:: html
@@ -130,7 +130,7 @@ defined column(s). The index will follow the row order.
<div class="d-flex flex-row gs-torefguide">
<span class="badge badge-info">To user guide</span>

- More details about sorting of tables is provided in the using guide section on :ref:`sorting data <basics.sorting>`.
+ More details about sorting of tables is provided in the user guide section on :ref:`sorting data <basics.sorting>`.

.. raw:: html
@@ -142,7 +142,7 @@ Long to wide table format
Let’s use a small subset of the air quality data set. We focus on
:math:`NO_2` data and only use the first two measurements of each
location (i.e. the head of each group). The subset of data will be
- called ``no2_subset``
+ called ``no2_subset``.

.. ipython:: python
@@ -163,7 +163,7 @@ called ``no2_subset``
<ul class="task-bullet">
<li>

- I want the values for the three stations as separate columns next to each other
+ I want the values for the three stations as separate columns next to each other.

.. ipython:: python
@@ -177,7 +177,7 @@ for each index/column combination is required.
</li>
</ul>

- As pandas support plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
+ As pandas supports plotting of multiple columns (see :ref:`plotting tutorial <10min_tut_04_plotting>`) out of the box, the conversion from
*long* to *wide* table format enables the plotting of the different time
series at the same time:
@@ -216,7 +216,7 @@ Pivot table
<ul class="task-bullet">
<li>

- I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form
+ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of the stations in table form.

.. ipython:: python
@@ -226,7 +226,7 @@ I want the mean concentrations for :math:`NO_2` and :math:`PM_{2.5}` in each of
In the case of :meth:`~DataFrame.pivot`, the data is only rearranged. When multiple
values need to be aggregated (in this specific case, the values on
- different time steps) :meth:`~DataFrame.pivot_table` can be used, providing an
+ different time steps), :meth:`~DataFrame.pivot_table` can be used, providing an
aggregation function (e.g. mean) on how to combine these values.
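
The difference between the two methods can be sketched on a tiny made-up long-format table (``air_demo`` and its values are hypothetical). Because each station/parameter combination appears more than once, ``pivot`` cannot rearrange the data, whereas ``pivot_table`` combines the duplicates with the given aggregation function:

```python
import pandas as pd

# Hypothetical long-format measurements; the values are invented.
air_demo = pd.DataFrame(
    {
        "location": ["BETR801", "BETR801", "FR04014", "FR04014", "BETR801", "FR04014"],
        "parameter": ["no2", "no2", "no2", "no2", "pm25", "pm25"],
        "value": [20.0, 30.0, 10.0, 20.0, 8.0, 12.0],
    }
)

# pivot only rearranges, so duplicate index/column pairs raise an error.
try:
    air_demo.pivot(index="location", columns="parameter", values="value")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates, here with the mean.
mean_table = air_demo.pivot_table(
    values="value", index="location", columns="parameter", aggfunc="mean"
)
print(pivot_failed)
print(mean_table)
```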

.. raw:: html
@@ -235,8 +235,8 @@ aggregation function (e.g. mean) on how to combine these values.
</ul>

Pivot table is a well known concept in spreadsheet software. When
- interested in summary columns for each variable separately as well, put
- the ``margin`` parameter to ``True``:
+ interested in the row/column margins (subtotals) for each variable, set
+ the ``margins`` parameter to ``True``:

.. ipython:: python
@@ -283,7 +283,7 @@ Wide to long format
~~~~~~~~~~~~~~~~~~~

Starting again from the wide format table created in the previous
- section:
+ section, we add a new index to the ``DataFrame`` with :meth:`~DataFrame.reset_index`.

.. ipython:: python
@@ -298,7 +298,7 @@ section:
<ul class="task-bullet">
<li>

- I want to collect all air quality :math:`NO_2` measurements in a single column (long format)
+ I want to collect all air quality :math:`NO_2` measurements in a single column (long format).

.. ipython:: python
@@ -319,7 +319,7 @@ will *melt* all columns NOT mentioned in ``id_vars`` together into two
columns: A column with the column header names and a column with the
values itself. The latter column gets by default the name ``value``.
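
As a rough illustration of this default behaviour, the sketch below melts a small made-up wide table (``wide_demo`` and its values are hypothetical) and then repeats the call with explicit ``value_vars``, ``value_name`` and ``var_name`` arguments:

```python
import pandas as pd

# Hypothetical wide-format table: one column per station, invented values.
wide_demo = pd.DataFrame(
    {
        "date.utc": pd.to_datetime(["2019-04-09 01:00", "2019-04-09 02:00"]),
        "BETR801": [22.5, 53.5],
        "FR04014": [24.4, 27.4],
    }
)

# Default call: every column not listed in id_vars is melted, and the two
# new columns receive the default names "variable" and "value".
long_demo = wide_demo.melt(id_vars=["date.utc"])

# Explicit call: choose the columns to melt and name the new columns.
named_demo = wide_demo.melt(
    id_vars=["date.utc"],
    value_vars=["BETR801", "FR04014"],
    value_name="NO_2",
    var_name="id_location",
)
print(long_demo.columns.tolist())
print(named_demo.columns.tolist())
```

Both results hold the same four measurements; only the column names differ.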

- The :func:`pandas.melt` method can be defined in more detail:
+ The parameters passed to :func:`pandas.melt` can be defined in more detail:

.. ipython:: python
@@ -331,9 +331,9 @@ The :func:`pandas.melt` method can be defined in more detail:
)
no_2.head()

- The result in the same, but in more detail defined:
+ The additional parameters have the following effects:

- - ``value_vars`` defines explicitly which columns to *melt* together
+ - ``value_vars`` defines which columns to *melt* together
- ``value_name`` provides a custom column name for the values column
instead of the default column name ``value``
- ``var_name`` provides a custom column name for the column collecting
@@ -360,11 +360,11 @@ Conversion from wide to long format with :func:`pandas.melt` is explained in the
<div class="shadow gs-callout gs-callout-remember">
<h4>REMEMBER</h4>

- - Sorting by one or more columns is supported by ``sort_values``
+ - Sorting by one or more columns is supported by ``sort_values``.
- The ``pivot`` function is purely restructuring of the data,
- ``pivot_table`` supports aggregations
+ ``pivot_table`` supports aggregations.
- The reverse of ``pivot`` (long to wide format) is ``melt`` (wide to
- long format)
+ long format).

.. raw:: html