|
| 1 | +.. currentmodule:: pandas |
| 2 | + |
| 3 | +.. ipython:: python |
| 4 | + :suppress: |
| 5 | +
|
| 6 | + import numpy as np |
| 7 | + import pandas as pd |
| 8 | +
|
| 9 | +.. _integer_na: |
| 10 | +
|
| 11 | +******************************** |
| 12 | +Integer Data with Missing Values |
| 13 | +******************************** |
| 14 | + |
| 15 | +.. versionadded:: 0.24.0 |
| 16 | + |
| 17 | +In :ref:`missing_data`, we saw that pandas primarily uses ``NaN`` to represent |
| 18 | +missing data. Because ``NaN`` is a float, this forces an array of integers with |
| 19 | +any missing values to become floating point. In some cases, this may not matter |
| 20 | +much. But if your integer column is, say, an identifier, casting to float can |
| 21 | +lead to bad outcomes. |
| 22 | + |
| 23 | +Pandas can represent integer data with missing values using the |
| 24 | +:class:`arrays.IntegerArray` array. This is one of the :ref:`extension types <extending.extension-types>` |
| 25 | +implemented within pandas. It is not the default dtype and will not be inferred; |
| 26 | +you must explicitly create an :class:`api.extensions.IntegerArray` using :func:`integer_array`. |
| 27 | + |
| 28 | +.. ipython:: python |
| 29 | +
|
| 30 | + arr = integer_array([1, 2, np.nan]) |
| 31 | + arr |
| 32 | +
|
| 33 | +This array can be stored in a :class:`DataFrame` or :class:`Series` like any |
| 34 | +NumPy array. |
| 35 | + |
| 36 | +.. ipython:: python |
| 37 | +
|
| 38 | + pd.Series(arr) |
| 39 | +
|
| 40 | +Alternatively, you can instruct pandas to treat an array-like as an |
| 41 | +:class:`api.extensions.IntegerArray` by specifying a dtype with a capital "I". |
| 42 | + |
| 43 | +.. ipython:: python |
| 44 | +
|
| 45 | + s = pd.Series([1, 2, np.nan], dtype="Int64") |
| 46 | + s |
| 47 | +
|
| 48 | +Note that by default (if you don't specify ``dtype``), NumPy is used, and you'll end |
| 49 | +up with a ``float64`` dtype Series: |
| 50 | + |
| 51 | +.. ipython:: python |
| 52 | +
|
| 53 | + pd.Series([1, 2, np.nan]) |
| 54 | +
|
| 55 | +
|
| 56 | +Operations involving an integer array will behave similarly to NumPy arrays. |
| 57 | +Missing values will be propagated, and the data will be coerced to another |
| 58 | +dtype if needed. |
| 59 | + |
| 60 | +.. ipython:: python |
| 61 | +
|
| 62 | + # arithmetic |
| 63 | + s + 1 |
| 64 | +
|
| 65 | + # comparison |
| 66 | + s == 1 |
| 67 | +
|
| 68 | + # indexing |
| 69 | + s.iloc[1:3] |
| 70 | +
|
| 71 | + # operate with other dtypes |
| 72 | + s + s.iloc[1:3].astype('Int8') |
| 73 | +
|
| 74 | + # coerce when needed |
| 75 | + s + 0.01 |
| 76 | +
|
| 77 | +Reduction and groupby operations such as ``sum`` work as well. |
| 78 | + |
| 79 | +.. ipython:: python |
| 80 | +
|
| 81 | + df = pd.DataFrame({'A': s, 'B': [1, 1, 3]}) |
| 82 | + df.sum() |
| 83 | + df.groupby('B').A.sum() |
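
The introduction warns that casting an identifier column to float "can lead to bad outcomes" but does not show one. The following is a minimal sketch of what that might look like, not part of the diff itself. The identifier values are hypothetical and chosen only because they sit just above 2**53, where ``float64`` can no longer represent every integer.

```python
import numpy as np
import pandas as pd

# Two distinct, hypothetical identifiers just above 2**53.
ids = [2**53, 2**53 + 1]

# A single missing value makes pandas infer float64 for the whole column.
as_float = pd.Series(ids + [np.nan])

# After the cast, both identifiers are stored as the same float,
# so rows that were distinct can no longer be told apart.
collided = as_float.iloc[0] == as_float.iloc[1]
print(as_float.dtype, collided)
```

Small identifiers survive the round trip unchanged, so the problem may go unnoticed until values grow large enough to exceed the float mantissa.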