
Commit 6be9787

added checks for standardized pandas
1 parent 0bfb8cb commit 6be9787

19 files changed (+3590 -0 lines changed)

ci/code_checks.sh (+7)

@@ -80,6 +80,13 @@ if [[ -z "$CHECK" || "$CHECK" == "lint" ]]; then
     flake8-rst doc/source --filename=*.rst --format="$FLAKE8_FORMAT"
     RET=$(($RET + $?)) ; echo $MSG "DONE"
 
+    # Check that pandas is referenced as pandas, not *pandas* or Pandas (any match fails)
+    MSG='Checking if pandas reference is standardized or not' ; echo $MSG
+    ! grep -nrE '\*pandas\*|Pandas' doc/ ; PANDAS_RET=$?
+    ! grep -nrE '\*pandas\*|Pandas' web/ ; PANDAS_RET=$(($PANDAS_RET + $?))
+    RET=$(($RET + $PANDAS_RET)) ; echo $MSG "DONE"
+
+
     # Check that cython casting is of the form `<type>obj` as opposed to `<type> obj`;
     # it doesn't make a difference, but we want to be internally consistent.
     # Note: this grep pattern is (intended to be) equivalent to the python

@@ -0,0 +1,218 @@
.. _10min_tut_01_tableoriented:

{{ header }}

What kind of data does pandas handle?
=====================================

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to start using pandas

.. ipython:: python

    import pandas as pd

To load the pandas package and start working with it, import the
package. The community-agreed alias for pandas is ``pd``, so loading
pandas as ``pd`` is assumed standard practice for all of the pandas
documentation.

.. raw:: html

        </li>
    </ul>

pandas data table representation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../../_static/schemas/01_table_dataframe.svg
    :align: center

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to store passenger data of the Titanic. For a number of passengers, I know the name (characters), age (integers) and sex (male/female) data.

.. ipython:: python

    df = pd.DataFrame(
        {
            "Name": [
                "Braund, Mr. Owen Harris",
                "Allen, Mr. William Henry",
                "Bonnell, Miss. Elizabeth",
            ],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    df

To manually store data in a table, create a ``DataFrame``. When using a Python dictionary of lists, the dictionary keys are used as column headers and
the values in each list become the values of the corresponding column of the ``DataFrame``.

.. raw:: html

        </li>
    </ul>
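
To see how the dictionary maps onto the table, the following minimal sketch rebuilds the same data (the name ``passengers`` is illustrative only). It checks that each key became a column label, that each list became the values of one column, and that pandas added the default row labels ``0``, ``1`` and ``2`` because none were given.

.. ipython:: python

    import pandas as pd
    # Illustrative dictionary of lists: one key per column
    passengers = {
        "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
        "Age": [22, 35, 58],
        "Sex": ["male", "male", "female"],
    }
    df = pd.DataFrame(passengers)
    # The dictionary keys became the column labels
    df.columns
    # Each list became the values of one column
    df["Age"].tolist()
    # No row labels were passed, so pandas numbers the rows 0, 1, 2
    df.index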

A :class:`DataFrame` is a 2-dimensional data structure that can store data of
different types (including characters, integers, floating point values,
categorical data and more) in columns. It is similar to a spreadsheet, a
SQL table or the ``data.frame`` in R.

- The table has 3 columns, each of them with a column label. The column
  labels are respectively ``Name``, ``Age`` and ``Sex``.
- The column ``Name`` consists of textual data with each value a
  string, the column ``Age`` consists of numbers and the column ``Sex`` is
  textual data.
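
The column types described above can be inspected with the ``dtypes`` attribute, and the size of the table with ``shape``. This is a minimal sketch that rebuilds the same table so it runs on its own; the exact dtype name shown for the text columns (``object``, or a dedicated string dtype) depends on the pandas version and its options.

.. ipython:: python

    import pandas as pd
    # Same table as above, rebuilt so this snippet runs on its own
    df = pd.DataFrame(
        {
            "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    # One data type per column
    df.dtypes
    # (number of rows, number of columns)
    df.shape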

In spreadsheet software, the table representation of our data would look
very similar:

.. image:: ../../_static/schemas/01_table_spreadsheet.png
    :align: center

Each column in a ``DataFrame`` is a ``Series``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: ../../_static/schemas/01_table_series.svg
    :align: center

.. raw:: html

    <ul class="task-bullet">
        <li>

I’m just interested in working with the data in the column ``Age``

.. ipython:: python

    df["Age"]

When selecting a single column of a pandas :class:`DataFrame`, the result is
a pandas :class:`Series`. To select the column, put the column label
between square brackets ``[]``.

.. raw:: html

        </li>
    </ul>

.. note::
    If you are familiar with Python
    :ref:`dictionaries <python:tut-dictionaries>`, the selection of a
    single column is very similar to the selection of dictionary values based on
    the key.
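
To check both points, compare the type of a single-column selection with a lookup in a plain Python dictionary. This is a minimal sketch; the dictionary ``record`` holding one passenger is illustrative only.

.. ipython:: python

    import pandas as pd
    # Same table as above, rebuilt so this snippet runs on its own
    df = pd.DataFrame(
        {
            "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    # Selecting one column label returns a Series, not a DataFrame
    age_column = df["Age"]
    type(age_column)
    # The Series keeps the column label as its name
    age_column.name
    # Same bracket syntax as looking up a dictionary value by its key
    record = {"Name": "Braund, Mr. Owen Harris", "Age": 22, "Sex": "male"}
    record["Age"]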

You can create a ``Series`` from scratch as well:

.. ipython:: python

    ages = pd.Series([22, 35, 58], name="Age")
    ages

A pandas ``Series`` has no column labels, as it is just a single column
of a ``DataFrame``. A ``Series`` does have row labels.
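
Those row labels are the ``index`` of the ``Series``. In the sketch below, ``df`` and ``ages`` are rebuilt so it runs on its own, and the labels ``"Braund"``, ``"Allen"`` and ``"Bonnell"`` are illustrative only. The hand-made ``ages`` and the ``Age`` column of the table share the default row labels ``0``, ``1``, ``2`` and the same values, so :meth:`Series.equals` reports them as equal. Row labels can also be chosen explicitly with the ``index`` argument of :class:`Series`.

.. ipython:: python

    import pandas as pd
    df = pd.DataFrame(
        {
            "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    ages = pd.Series([22, 35, 58], name="Age")
    # The row labels (the index) default to 0, 1, 2
    ages.index
    # Same values and same row labels: the column and the hand-made Series match
    df["Age"].equals(ages)
    # Explicit (illustrative) row labels, used to look up a single value
    labelled = pd.Series([22, 35, 58], index=["Braund", "Allen", "Bonnell"], name="Age")
    labelled["Allen"]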

Do something with a DataFrame or Series
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. raw:: html

    <ul class="task-bullet">
        <li>

I want to know the maximum age of the passengers

We can do this on the ``DataFrame`` by selecting the ``Age`` column and
applying ``max()``:

.. ipython:: python

    df["Age"].max()

Or directly on the ``Series``:

.. ipython:: python

    ages.max()

.. raw:: html

        </li>
    </ul>

As illustrated by the ``max()`` method, you can *do* things with a
``DataFrame`` or ``Series``. pandas provides many such operations,
each of them a *method* you can apply to a ``DataFrame`` or ``Series``.
As methods are functions, do not forget to use parentheses ``()``.
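
Other methods follow the same pattern. The following minimal sketch applies :meth:`Series.min` and :meth:`Series.mean` to the ages. Its last two lines show what happens when the parentheses are forgotten: you get the method itself, not its result.

.. ipython:: python

    import pandas as pd
    ages = pd.Series([22, 35, 58], name="Age")
    ages.min()
    # (22 + 35 + 58) / 3
    ages.mean()
    # Without parentheses the method is not called, so no number comes back
    ages.max
    callable(ages.max)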

.. raw:: html

    <ul class="task-bullet">
        <li>

I’m interested in some basic statistics of the numerical data of my data table

.. ipython:: python

    df.describe()

The :func:`~DataFrame.describe` method provides a quick overview of the numerical data in
a ``DataFrame``. As the ``Name`` and ``Sex`` columns are textual data,
these are by default not taken into account by the :func:`~DataFrame.describe` method.

.. raw:: html

        </li>
    </ul>

Many pandas operations return a ``DataFrame`` or a ``Series``. The
:func:`~DataFrame.describe` method is an example of a pandas operation returning a
pandas ``DataFrame``; applied to a single column, it returns a ``Series`` instead.
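
The return type can be checked directly. In this sketch the table is rebuilt so it runs on its own. :meth:`DataFrame.describe` returns a ``DataFrame`` with one column per numerical column, and calling ``describe()`` on the ``Age`` column returns a ``Series``. The existing ``include="all"`` option also summarizes the textual columns that are skipped by default.

.. ipython:: python

    import pandas as pd
    df = pd.DataFrame(
        {
            "Name": ["Braund, Mr. Owen Harris", "Allen, Mr. William Henry", "Bonnell, Miss. Elizabeth"],
            "Age": [22, 35, 58],
            "Sex": ["male", "male", "female"],
        }
    )
    # Called on a DataFrame: the result is a DataFrame (numerical columns only by default)
    summary = df.describe()
    type(summary)
    summary.columns
    # Called on a single column: the result is a Series
    type(df["Age"].describe())
    # include="all" also summarizes the textual columns
    df.describe(include="all")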

.. raw:: html

    <div class="d-flex flex-row gs-torefguide">
        <span class="badge badge-info">To user guide</span>

Check more options on ``describe`` in the user guide section about :ref:`aggregations with describe <basics.describe>`

.. raw:: html

    </div>

.. note::
    This is just a starting point. Similar to spreadsheet
    software, pandas represents data as a table with columns and rows. Beyond
    the representation, pandas also supports the data manipulations and
    calculations you would do in spreadsheet software. Continue
    reading the next tutorials to get started!

.. raw:: html

    <div class="shadow gs-callout gs-callout-remember">
        <h4>REMEMBER</h4>

- Import the package, aka ``import pandas as pd``
- A table of data is stored as a pandas ``DataFrame``
- Each column in a ``DataFrame`` is a ``Series``
- You can do things by applying a method to a ``DataFrame`` or ``Series``

.. raw:: html

    </div>

.. raw:: html

    <div class="d-flex flex-row gs-torefguide">
        <span class="badge badge-info">To user guide</span>

A more extended explanation of ``DataFrame`` and ``Series`` is provided in the :ref:`introduction to data structures <dsintro>`.

.. raw:: html

    </div>
