| 1 | +.. _10min_tut_01_tableoriented: |
| 2 | + |
| 3 | +{{ header }} |
| 4 | + |
| 5 | +What kind of data does pandas handle? |
| 6 | +===================================== |
| 7 | + |
| 8 | +.. raw:: html |
| 9 | + |
| 10 | + <ul class="task-bullet"> |
| 11 | + <li> |
| 12 | + |
| 13 | +I want to start using pandas |
| 14 | + |
| 15 | +.. ipython:: python |
| 16 | + |
| 17 | + import pandas as pd |
| 18 | + |
| 19 | +To load the pandas package and start working with it, import the |
| 20 | +package. The community-agreed alias for pandas is ``pd``, so loading |
| 21 | +pandas as ``pd`` is assumed standard practice for all of the pandas |
| 22 | +documentation. |
| 23 | + |
| 24 | +.. raw:: html |
| 25 | + |
| 26 | + </li> |
| 27 | + </ul> |
| 28 | + |
| 29 | +pandas data table representation |
| 30 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 31 | + |
| 32 | +.. image:: ../../_static/schemas/01_table_dataframe.svg |
| 33 | + :align: center |
| 34 | + |
| 35 | +.. raw:: html |
| 36 | + |
| 37 | + <ul class="task-bullet"> |
| 38 | + <li> |
| 39 | + |
| 40 | +I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data. |
| 41 | + |
| 42 | +.. ipython:: python |
| 43 | + |
| 44 | + df = pd.DataFrame({ |
| 45 | + "Name": ["Braund, Mr. Owen Harris", |
| 46 | + "Allen, Mr. William Henry", |
| 47 | + "Bonnell, Miss. Elizabeth"], |
| 48 | + "Age": [22, 35, 58], |
| 49 | + "Sex": ["male", "male", "female"]} |
| 50 | + ) |
| 51 | + df |
| 52 | + |
| 53 | +To manually store data in a table, create a ``DataFrame``. When using a Python dictionary of lists, the dictionary keys will be used as column headers and |
| 54 | +the values in each list as the columns of the ``DataFrame``. |
| 55 | + |
| 56 | +.. raw:: html |
| 57 | + |
| 58 | + </li> |
| 59 | + </ul> |
| 60 | + |
| 61 | +A :class:`DataFrame` is a 2-dimensional data structure that can store data of |
| 62 | +different types (including characters, integers, floating point values, |
| 63 | +categorical data and more) in columns. It is similar to a spreadsheet, a |
| 64 | +SQL table or the ``data.frame`` in R. |
| 65 | + |
| 66 | +- The table has 3 columns, each of them with a column label. The column |
| 67 | + labels are respectively ``Name``, ``Age`` and ``Sex``. |
| 68 | +- The column ``Name`` consists of textual data with each value a |
| 69 | + string, the column ``Age`` contains numbers and the column ``Sex`` is |
| 70 | + textual data. |
| 71 | + |
| 72 | +In spreadsheet software, the table representation of our data would look |
| 73 | +very similar: |
| 74 | + |
| 75 | +.. image:: ../../_static/schemas/01_table_spreadsheet.png |
| 76 | + :align: center |
| 77 | + |
| 78 | +Each column in a ``DataFrame`` is a ``Series`` |
| 79 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 80 | + |
| 81 | +.. image:: ../../_static/schemas/01_table_series.svg |
| 82 | + :align: center |
| 83 | + |
| 84 | +.. raw:: html |
| 85 | + |
| 86 | + <ul class="task-bullet"> |
| 87 | + <li> |
| 88 | + |
| 89 | +I’m just interested in working with the data in the column ``Age`` |
| 90 | + |
| 91 | +.. ipython:: python |
| 92 | + |
| 93 | + df["Age"] |
| 94 | + |
| 95 | +When selecting a single column of a pandas :class:`DataFrame`, the result is |
| 96 | +a pandas :class:`Series`. To select the column, use the column label in |
| 97 | +between square brackets ``[]``. |
| 98 | + |
| 99 | +.. raw:: html |
| 100 | + |
| 101 | + </li> |
| 102 | + </ul> |
| 103 | + |
| 104 | +.. note:: |
| 105 | + If you are familiar with Python |
| 106 | + :ref:`dictionaries <python:tut-dictionaries>`, the selection of a |
| 107 | + single column is very similar to selection of dictionary values based on |
| 108 | + the key. |
| 109 | + |
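The comparison with dictionaries can be sketched side by side. This is an illustrative example only; the names ``record`` and ``age_column`` are not part of the tutorial:

```python
import pandas as pd

record = {"Age": [22, 35, 58]}
df = pd.DataFrame(record)

# Dictionary lookup by key returns the stored list.
print(record["Age"])  # [22, 35, 58]

# DataFrame lookup by column label uses the same bracket syntax,
# but the result is a pandas Series that remembers its label.
age_column = df["Age"]
print(type(age_column).__name__)  # Series
print(age_column.name)            # Age
```
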
| 110 | +You can create a ``Series`` from scratch as well: |
| 111 | + |
| 112 | +.. ipython:: python |
| 113 | + |
| 114 | + ages = pd.Series([22, 35, 58], name="Age") |
| 115 | + ages |
| 116 | + |
| 117 | +A pandas ``Series`` has no column labels, as it is just a single column |
| 118 | +of a ``DataFrame``. A Series does have row labels. |
| 119 | + |
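To make the row labels visible, the following sketch inspects the index of a ``Series``. The labels ``"p1"`` to ``"p3"`` are hypothetical and chosen only for illustration:

```python
import pandas as pd

# Without explicit labels, pandas numbers the rows starting at 0.
ages = pd.Series([22, 35, 58], name="Age")
print(ages.index.tolist())  # [0, 1, 2]

# Row labels can also be supplied explicitly (hypothetical labels here).
labelled = pd.Series([22, 35, 58], index=["p1", "p2", "p3"], name="Age")
print(labelled.index.tolist())  # ['p1', 'p2', 'p3']
print(labelled["p2"])           # 35
```
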
| 120 | +Do something with a DataFrame or Series |
| 121 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 122 | + |
| 123 | +.. raw:: html |
| 124 | + |
| 125 | + <ul class="task-bullet"> |
| 126 | + <li> |
| 127 | + |
| 128 | +I want to know the maximum Age of the passengers |
| 129 | + |
| 130 | +We can do this on the ``DataFrame`` by selecting the ``Age`` column and |
| 131 | +applying ``max()``: |
| 132 | + |
| 133 | +.. ipython:: python |
| 134 | + |
| 135 | + df["Age"].max() |
| 136 | + |
| 137 | +Or on the ``Series``: |
| 138 | + |
| 139 | +.. ipython:: python |
| 140 | + |
| 141 | + ages.max() |
| 142 | + |
| 143 | +.. raw:: html |
| 144 | + |
| 145 | + </li> |
| 146 | + </ul> |
| 147 | + |
| 148 | +As illustrated by the ``max()`` method, you can *do* things with a |
| 149 | +``DataFrame`` or ``Series``. pandas provides a lot of functionality, |
| 150 | +much of it as *methods* you can apply to a ``DataFrame`` or ``Series``. |
| 151 | +As methods are functions, do not forget to use parentheses ``()``. |
| 152 | + |
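A short sketch of why the parentheses matter (the variable names are illustrative): without them you only refer to the method, with them you call it and get the result.

```python
import pandas as pd

ages = pd.Series([22, 35, 58], name="Age")

# Without parentheses: a reference to the method itself, nothing is computed.
method_reference = ages.max
print(callable(method_reference))  # True

# With parentheses: the method is called and returns the maximum value.
result = ages.max()
print(result)  # 58
```
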
| 153 | +.. raw:: html |
| 154 | + |
| 155 | + <ul class="task-bullet"> |
| 156 | + <li> |
| 157 | + |
| 158 | +I’m interested in some basic statistics of the numerical data of my data table |
| 159 | + |
| 160 | +.. ipython:: python |
| 161 | + |
| 162 | + df.describe() |
| 163 | + |
| 164 | +The :func:`~DataFrame.describe` method provides a quick overview of the numerical data in |
| 165 | +a ``DataFrame``. As the ``Name`` and ``Sex`` columns are textual data, |
| 166 | +these are by default not taken into account by the :func:`~DataFrame.describe` method. |
| 167 | + |
| 168 | +.. raw:: html |
| 169 | + |
| 170 | + </li> |
| 171 | + </ul> |
| 172 | + |
| 173 | +Many pandas operations return a ``DataFrame`` or a ``Series``. The |
| 174 | +:func:`~DataFrame.describe` method is an example of a pandas operation returning a |
| 175 | +pandas ``DataFrame`` (or a ``Series`` when called on a single column). |
| 176 | + |
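The return type of ``describe`` can be checked directly. In this small sketch (variable names are illustrative), calling it on the table gives a ``DataFrame``, while calling it on a single column gives a ``Series``:

```python
import pandas as pd

df = pd.DataFrame({"Age": [22, 35, 58]})

table_summary = df.describe()          # called on the whole table
column_summary = df["Age"].describe()  # called on one column

print(type(table_summary).__name__)   # DataFrame
print(type(column_summary).__name__)  # Series
```
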
| 177 | +.. raw:: html |
| 178 | + |
| 179 | + <div class="d-flex flex-row gs-torefguide"> |
| 180 | + <span class="badge badge-info">To user guide</span> |
| 181 | + |
| 182 | +Check more options on ``describe`` in the user guide section about :ref:`aggregations with describe <basics.describe>` |
| 183 | + |
| 184 | +.. raw:: html |
| 185 | + |
| 186 | + </div> |
| 187 | + |
| 188 | +.. note:: |
| 189 | + This is just a starting point. Similar to spreadsheet |
| 190 | + software, pandas represents data as a table with columns and rows. Beyond |
| 191 | + the representation, pandas also supports the data manipulations and |
| 192 | + calculations you would do in spreadsheet software. Continue |
| 193 | + reading the next tutorials to get started! |
| 194 | + |
| 195 | +.. raw:: html |
| 196 | + |
| 197 | + <div class="shadow gs-callout gs-callout-remember"> |
| 198 | + <h4>REMEMBER</h4> |
| 199 | + |
| 200 | +- Import the package, i.e. ``import pandas as pd`` |
| 201 | +- A table of data is stored as a pandas ``DataFrame`` |
| 202 | +- Each column in a ``DataFrame`` is a ``Series`` |
| 203 | +- You can do things by applying a method to a ``DataFrame`` or ``Series`` |
| 204 | + |
| 205 | +.. raw:: html |
| 206 | + |
| 207 | + </div> |
| 208 | + |
| 209 | +.. raw:: html |
| 210 | + |
| 211 | + <div class="d-flex flex-row gs-torefguide"> |
| 212 | + <span class="badge badge-info">To user guide</span> |
| 213 | + |
| 214 | +A more extended explanation of ``DataFrame`` and ``Series`` is provided in the :ref:`introduction to data structures <dsintro>`. |
| 215 | + |
| 216 | +.. raw:: html |
| 217 | + |
| 218 | + </div> |