Commit 06f8568

committed
wip
1 parent 8ed92ef commit 06f8568

2 files changed

+83
-0
lines changed

doc/source/index.rst.template

+1
@@ -139,6 +139,7 @@ See the package overview for more detail about what's in the library.
139139
timeseries
140140
timedeltas
141141
categorical
142+
integer_na
142143
visualization
143144
style
144145
io

doc/source/integer_na.rst

+82
@@ -0,0 +1,82 @@
1+
.. currentmodule:: pandas
2+
3+
.. ipython:: python
4+
:suppress:
5+
6+
import numpy as np
7+
import pandas as pd
8+
9+
.. _integer_na:
10+
11+
********************************
12+
Integer Data with Missing Values
13+
********************************
14+
15+
.. versionadded:: 0.24.0
16+
17+
In :ref:`missing_data`, we say that pandas primarily uses ``NaN`` to represent
18+
missing data. Because ``NaN`` is a float, this forces an array of integers with
19+
any missing values to become floating point. In some cases, this may not matter
20+
much. But if your integer column is, say, an identifier, casting to float can
21+
silently change large values or render them with a trailing ``.0``.
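
To make the identifier concern concrete, here is a minimal sketch. The ID values are made up for illustration and are not taken from the pandas documentation. It shows the float upcast changing how an identifier prints, and why very large integers may not survive a round trip through ``float64``.

```python
import numpy as np
import pandas as pd

# Hypothetical identifiers; the second record has no ID.
ids_float = pd.Series([1001, np.nan])

# The missing value forces the whole column to floating point.
print(ids_float.dtype)          # float64
print(str(ids_float.iloc[0]))   # 1001.0

# float64 has a 53-bit significand, so integers above 2**53
# cannot all be represented exactly.
big_id = 2**53 + 1
round_tripped = int(float(big_id))
print(round_tripped == big_id)  # False
```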
22+
23+
Pandas can represent integer data with missing values using the
24+
:class:`arrays.IntegerArray` array. This is an :ref:`extension type <extending.extension-types>`
25+
implemented within pandas. It is not the default dtype and will not be inferred;
26+
you must explicitly create an :class:`api.extensions.IntegerArray` using :func:`integer_array`.
27+
28+
.. ipython:: python
29+
30+
arr = integer_array([1, 2, np.nan])
31+
arr
32+
33+
This array can be stored in a :class:`DataFrame` or :class:`Series` like any
34+
NumPy array.
35+
36+
.. ipython:: python
37+
38+
pd.Series(arr)
39+
40+
Alternatively, you can instruct pandas to treat an array-like as an
41+
:class:`api.extensions.IntegerArray` by specifying a dtype with a capital "I".
42+
43+
.. ipython:: python
44+
45+
s = pd.Series([1, 2, np.nan], dtype="Int64")
46+
s
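
The capitalization matters. As a hedged illustration, assuming a pandas release that ships the nullable ``"Int64"`` dtype: the capitalized alias selects the nullable extension dtype, while lowercase ``"int64"`` requests the plain NumPy dtype, which cannot hold a missing value.

```python
import numpy as np
import pandas as pd

# Capital "I": nullable extension dtype, the missing value is kept.
nullable = pd.Series([1, 2, np.nan], dtype="Int64")
print(nullable.dtype)  # Int64

# Lowercase "i": NumPy int64, which has no way to store NaN.
try:
    pd.Series([1, 2, np.nan], dtype="int64")
except ValueError as exc:
    numpy_error = exc
else:
    numpy_error = None

print(numpy_error is not None)  # True
```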
47+
48+
Note that by default (if you don't specify ``dtype``), NumPy is used, and you'll end
49+
up with a ``float64`` dtype Series:
50+
51+
.. ipython:: python
52+
53+
pd.Series([1, 2, np.nan])
54+
55+
56+
Operations involving an integer array behave similarly to NumPy arrays.
57+
Missing values will be propagated, and the data will be coerced to another
58+
dtype if needed.
59+
60+
.. ipython:: python
61+
62+
# arithmetic
63+
s + 1
64+
65+
# comparison
66+
s == 1
67+
68+
# indexing
69+
s.iloc[1:3]
70+
71+
# operate with other dtypes
72+
s + s.iloc[1:3].astype('Int8')
73+
74+
# coerce when needed
75+
s + 0.01
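
The result dtypes of these operations can be checked directly. The following is a sketch rather than a specification: the exact dtype produced by the float coercion is assumed to vary between pandas releases, so the example only checks that it is some floating-point dtype.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan], dtype="Int64")

# Integer arithmetic keeps the nullable integer dtype and
# propagates the missing value.
plus_one = s + 1
print(plus_one.dtype)            # Int64
print(plus_one.isna().tolist())  # [False, False, True]

# Adding a float coerces the result to a floating-point dtype.
shifted = s + 0.01
print(pd.api.types.is_float_dtype(shifted.dtype))  # True
```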
76+
77+
Reduction and groupby operations such as ``sum`` work as well.
78+
79+
.. ipython:: python
80+
81+
df.sum()
82+
df.groupby('B').A.sum()
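
The ``df`` used above is not defined in this file as committed. A minimal self-contained sketch follows, assuming a hypothetical frame with a nullable integer column ``A`` and a grouping column ``B``. The column values are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, np.nan], dtype="Int64")

# Hypothetical frame: "A" is the nullable integer column,
# "B" is a plain integer grouping key.
df = pd.DataFrame({"A": s, "B": [1, 1, 3]})

# Column-wise reduction; the missing value in "A" is skipped.
col_sums = df.sum()
print(col_sums["A"])   # 3

# Group-wise reduction on the nullable column.
grouped = df.groupby("B")["A"].sum()
print(grouped.loc[1])  # 3
```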
