@@ -12,10 +12,10 @@ pandas.
12
12
.. note::
13
13
14
14
The choice of using ``NaN`` internally to denote missing data was largely
15
- for simplicity and performance reasons. It differs from the MaskedArray
16
- approach of, for example, :mod:`scikits.timeseries`. We are hopeful that
17
- NumPy will soon be able to provide a native NA type solution (similar to R)
18
- performant enough to be used in pandas.
15
+ for simplicity and performance reasons.
16
+ Starting from pandas 1.0, some optional data types start experimenting
17
+ with a native ``NA`` scalar using a mask-based approach. See
18
+ :ref:`here <missing_data.NA>` for more.
19
19
20
20
See the :ref:`cookbook<cookbook.missing_data>` for some advanced strategies.
21
21
@@ -110,7 +110,7 @@ pandas objects provide compatibility between ``NaT`` and ``NaN``.
110
110
.. _missing.inserting:
111
111
112
112
Inserting missing data
113
- ----------------------
113
+ ~~~~~~~~~~~~~~~~~~~~~~
114
114
115
115
You can insert missing values by simply assigning to containers. The
116
116
actual missing value used will be chosen based on the dtype.
@@ -135,9 +135,10 @@ For object containers, pandas will use the value given:
135
135
s.loc[1] = np.nan
136
136
s
137
137
138
+ .. _missing_data.calculations:
138
139
139
140
Calculations with missing data
140
- ------------------------------
141
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
141
142
142
143
Missing values propagate naturally through arithmetic operations between pandas
143
144
objects.
@@ -771,3 +772,139 @@ the ``dtype="Int64"``.
771
772
s
772
773
773
774
See :ref:`integer_na` for more.
775
+
776
+
777
+ .. _missing_data.NA:
778
+
779
+ Experimental ``NA`` scalar to denote missing values
780
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
781
+
782
+ .. warning::
783
+
784
+ Experimental: the behaviour of ``pd.NA`` can still change without warning.
785
+
786
+ .. versionadded:: 1.0.0
787
+
788
+ Starting from pandas 1.0, an experimental ``pd.NA`` value (singleton) is
789
+ available to represent scalar missing values. At this moment, it is used in
790
+ the nullable :doc:`integer <integer_na>`, boolean and
791
+ :ref:`dedicated string <text.types>` data types as the missing value indicator.
792
+
793
+ The goal of ``pd.NA`` is to provide a "missing" indicator that can be used
794
+ consistently across data types (instead of ``np.nan``, ``None`` or ``pd.NaT``
795
+ depending on the data type).
796
+
797
+ For example, when having missing values in a Series with the nullable integer
798
+ dtype, it will use ``pd.NA``:
799
+
800
+ .. ipython:: python
801
+
802
+ s = pd.Series([1, 2, None], dtype="Int64")
803
+ s
804
+ s[2]
805
+ s[2] is pd.NA
806
+
807
+ Currently, pandas does not yet use those data types by default (when creating
808
+ a DataFrame or Series, or when reading in data), so you need to specify
809
+ the dtype explicitly.
810
+
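As a rough illustration of the paragraph above (a sketch that is not part of this changeset; the variable names are made up for the example), the default inference can be compared with an explicitly requested nullable dtype:

```python
import pandas as pd

# Default inference: the missing entry forces a float64 Series holding NaN.
default = pd.Series([1, 2, None])

# Explicit nullable integer dtype: the values stay integers and the gap
# is represented by pd.NA.
nullable = pd.Series([1, 2, None], dtype="Int64")

print(default.dtype)
print(nullable.dtype)
print(nullable[2] is pd.NA)
```

Without the explicit ``dtype`` argument, the Series falls back to the ``NaN``-based representation.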
811
+ Propagation in arithmetic and comparison operations
812
+ ---------------------------------------------------
813
+
814
+ In general, missing values *propagate* in operations involving ``pd.NA``. When
815
+ one of the operands is unknown, the outcome of the operation is also unknown.
816
+
817
+ For example, ``pd.NA`` propagates in arithmetic operations, similarly to
818
+ ``np.nan``:
819
+
820
+ .. ipython:: python
821
+
822
+ pd.NA + 1
823
+ "a" * pd.NA
824
+
825
+ In equality and comparison operations, ``pd.NA`` also propagates. This deviates
826
+ from the behaviour of ``np.nan``, where comparisons with ``np.nan`` always
827
+ return ``False``.
828
+
829
+ .. ipython:: python
830
+
831
+ pd.NA == 1
832
+ pd.NA == pd.NA
833
+ pd.NA < 2.5
834
+
835
+ To check if a value is equal to ``pd.NA``, the :func:`isna` function can be
836
+ used:
837
+
838
+ .. ipython:: python
839
+
840
+ pd.isna(pd.NA)
841
+
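The same distinction applies element-wise. The following is a minimal sketch, assuming a Series with the nullable ``Int64`` dtype (the names ``eq`` and ``mask`` are illustrative only): an equality comparison yields ``pd.NA`` for the missing entry, whereas ``isna`` gives a definite answer for every element.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Comparing propagates the missing value: the result has the nullable
# "boolean" dtype and its last element is pd.NA.
eq = s == 1

# isna() never returns a missing value, so it is safe to use as a mask.
mask = s.isna()

print(eq)
print(mask)
print(s[mask])
```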
842
+ Exceptions to this basic propagation rule are *reductions* (such as the
843
+ mean or the minimum), where pandas defaults to skipping missing values. See
844
+ :ref:`above <missing_data.calculations>` for more.
845
+
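To make the contrast between element-wise propagation and reductions concrete, here is a small sketch. It assumes the nullable ``Int64`` dtype and the standard ``skipna`` keyword of the reduction methods; the variable names are hypothetical.

```python
import pandas as pd

s = pd.Series([1, 2, None], dtype="Int64")

# Element-wise arithmetic propagates the missing value.
shifted = s + 1

# Reductions skip missing values by default ...
total_default = s.sum()

# ... unless skipping is switched off explicitly, in which case the
# result is itself missing.
total_strict = s.sum(skipna=False)

print(shifted)
print(total_default)
print(total_strict)
```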
846
+ Logical operations
847
+ ------------------
848
+
849
+ For logical operations, ``pd.NA`` follows the rules of the
850
+ `three-valued logic <https://en.wikipedia.org/wiki/Three-valued_logic>`__ (or
851
+ *Kleene logic*, similarly to R, SQL and Julia). Under this logic, missing values
852
+ propagate only when it is logically required.
853
+
854
+ For example, for the logical "or" operation (``|``), if one of the operands
855
+ is ``True``, we already know the result will be ``True``, regardless of the
856
+ other value (so regardless of whether the missing value would be ``True`` or ``False``).
857
+ In this case, ``pd.NA`` does not propagate:
858
+
859
+ .. ipython:: python
860
+
861
+ True | False
862
+ True | pd.NA
863
+ pd.NA | True
864
+
865
+ On the other hand, if one of the operands is ``False``, the result depends
866
+ on the value of the other operand. Therefore, in this case ``pd.NA``
867
+ propagates:
868
+
869
+ .. ipython:: python
870
+
871
+ False | True
872
+ False | False
873
+ False | pd.NA
874
+
875
+ The behaviour of the logical "and" operation (``&``) can be derived using
876
+ similar logic (where now ``pd.NA`` will not propagate if one of the operands
877
+ is already ``False``):
878
+
879
+ .. ipython:: python
880
+
881
+ False & True
882
+ False & False
883
+ False & pd.NA
884
+
885
+ .. ipython:: python
886
+
887
+ True & True
888
+ True & False
889
+ True & pd.NA
890
+
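The scalar rules above carry over to arrays. The sketch below assumes the nullable ``boolean`` dtype mentioned earlier in this section; the array and variable names are illustrative.

```python
import pandas as pd

a = pd.array([True, False, None], dtype="boolean")

# "or" with True is always True, so the missing value does not propagate.
or_true = a | True

# "or" with False depends on the other operand, so pd.NA propagates.
or_false = a | False

# "and" with False is always False, so the missing value does not propagate.
and_false = a & False

# "and" with True depends on the other operand, so pd.NA propagates.
and_true = a & True

for name, result in [("a | True", or_true), ("a | False", or_false),
                     ("a & False", and_false), ("a & True", and_true)]:
    print(name, list(result))
```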
891
+
892
+ ``NA`` in a boolean context
893
+ ---------------------------
894
+
895
+ Since the actual value of an NA is unknown, it is ambiguous to convert NA
896
+ to a boolean value. The following raises an error:
897
+
898
+ .. ipython:: python
899
+ :okexcept:
900
+
901
+ bool(pd.NA)
902
+
903
+ This also means that ``pd.NA`` cannot be used in a context where it is
904
+ evaluated to a boolean, such as ``if condition: ...`` where ``condition`` can
905
+ potentially be ``pd.NA``. In such cases, :func:`isna` can be used to check
906
+ for ``pd.NA``, or ``condition`` being ``pd.NA`` can be avoided, for example by
907
+ filling missing values beforehand.
908
+
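Both workarounds can be sketched as follows. The helper ``describe`` and the Series ``flags`` are hypothetical names introduced only for this example; they are not part of pandas.

```python
import pandas as pd


def describe(condition):
    """Return a label for a value that may be True, False or pd.NA."""
    # Check for a missing value first, so that `condition` is never
    # coerced to a boolean while it is pd.NA.
    if pd.isna(condition):
        return "unknown"
    return "yes" if condition else "no"


# Converting pd.NA to a boolean directly raises a TypeError.
try:
    bool(pd.NA)
    raised = False
except TypeError:
    raised = True

# Alternative: fill the missing values before branching on them.
flags = pd.Series([True, None, False], dtype="boolean")
filled = flags.fillna(False)

print(describe(pd.NA), describe(True), describe(False))
print(raised)
print([bool(v) for v in filled])
```

Which default to fill with (``False`` here) depends on the application. The explicit ``isna`` check is the safer choice when "unknown" needs to stay distinguishable from ``False``.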
909
+ A similar situation occurs when using Series or DataFrame objects in ``if``
910
+ statements, see :ref:`gotchas.truth`.