@@ -46,8 +46,8 @@ of elements to display is five, but you may pass a custom number.
46
46
47
47
.. _basics.attrs:
48
48
49
- Attributes and the raw ndarray(s)
50
- ---------------------------------
49
+ Attributes and Underlying Data
50
+ ------------------------------
51
51
52
52
pandas objects have a number of attributes enabling you to access the metadata
53
53
@@ -65,14 +65,43 @@ Note, **these attributes can be safely assigned to**!
65
65
df.columns = [x.lower() for x in df.columns]
66
66
df
67
67
68
- To get the actual data inside a data structure, one need only access the
69
- **values ** property:
68
+ Pandas objects (:class:`Index`, :class:`Series`, :class:`DataFrame`) can be
69
+ thought of as containers for arrays, which hold the actual data and do the
70
+ actual computation. For many types, the underlying array is a
71
+ :class:`numpy.ndarray`. However, pandas and third-party libraries may *extend*
72
+ NumPy's type system to add support for custom arrays
73
+ (see :ref:`basics.dtypes`).
74
+
75
+ To get the actual data inside an :class:`Index` or :class:`Series`, use
76
+ the **array** property:
77
+
78
+ .. ipython:: python
79
+
80
+    s.array
81
+    s.index.array
82
+
83
+ Depending on the data type (see :ref:`basics.dtypes`), :attr:`~Series.array`
84
+ will be either a NumPy array or an :ref:`ExtensionArray <extending.extension-types>`.
85
+ If you know you need a NumPy array, use :meth:`~Series.to_numpy`
86
+ or :func:`numpy.asarray`.
70
87
71
88
.. ipython:: python
72
89
73
-    s.values
74
-    df.values
75
-    wp.values
90
+    s.to_numpy()
91
+    np.asarray(s)
92
+
93
+ For Series and Indexes backed by NumPy arrays (like we have here), this will
94
+ be the same as :attr:`~Series.array`. When the Series or Index is backed by
95
+ a :class:`~pandas.api.extensions.ExtensionArray`, :meth:`~Series.to_numpy`
96
+ may involve copying data and coercing values.
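
The difference between the backing array and its NumPy form can be sketched as follows. This is an illustrative example, not part of the patch; the names ``s_numpy`` and ``s_cat`` are made up here, and a categorical Series stands in for any ExtensionArray-backed Series.

```python
import numpy as np
import pandas as pd

# Hypothetical example data; the variable names are illustrative only.
s_numpy = pd.Series([1.0, 2.0, 3.0])                   # backed by NumPy float64 data
s_cat = pd.Series(["a", "b", "a"], dtype="category")   # backed by an ExtensionArray

# .array returns the array that backs the Series (here a pandas Categorical).
cat_backing = s_cat.array

# .to_numpy() and np.asarray() give a NumPy ndarray in both cases.
numpy_values = s_numpy.to_numpy()
cat_values = s_cat.to_numpy()   # the categories are materialized into a new ndarray

print(type(cat_backing).__name__)
print(type(cat_values).__name__, list(cat_values))
```

For the categorical Series the conversion has to build a fresh array of the category values, which is the copying and coercion the paragraph above refers to.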
97
+
98
+ Getting the "raw data" inside a :class:`DataFrame` is possibly a bit more
99
+ complex. When your ``DataFrame`` only has a single data type for all the
100
+ columns, :meth:`DataFrame.to_numpy` will return the underlying data:
101
+
102
+ .. ipython:: python
103
+
104
+    df.to_numpy()
76
105
77
106
If a DataFrame or Panel contains homogeneously-typed data, the ndarray can
78
107
actually be modified in-place, and the changes will be reflected in the data
@@ -87,6 +116,21 @@ unlike the axis labels, cannot be assigned to.
87
116
strings are involved, the result will be of object dtype. If there are only
88
117
floats and integers, the resulting array will be of float dtype.
89
118
119
+ In the past, pandas recommended :attr:`Series.values` or :attr:`DataFrame.values`
120
+ for extracting the data from a Series or DataFrame. You'll still find references
121
+ to these in old code bases and online. Going forward, we recommend avoiding
122
+ ``.values`` and using ``.array`` or ``.to_numpy()``. ``.values`` has the following
123
+ drawbacks:
124
+
125
+ 1. When your Series contains an :ref:`extension type <extending.extension-types>`, it's
126
+    unclear whether :attr:`Series.values` returns a NumPy array or the extension array.
127
+    :attr:`Series.array` will always return the actual array backing the Series,
128
+    while :meth:`Series.to_numpy` will always return a NumPy array.
129
+ 2. When your DataFrame contains a mixture of data types, :attr:`DataFrame.values` may
130
+    involve copying data and coercing values to a common dtype, a relatively expensive
131
+    operation. :meth:`DataFrame.to_numpy`, being a method, makes it clearer that the
132
+    returned NumPy array may not be a view on the same data in the DataFrame.
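
The second drawback can be checked directly. The sketch below is illustrative only; the frame ``df_mixed`` and its column names are assumptions made for this example, not taken from the docs.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-dtype frame; the column names are illustrative only.
df_mixed = pd.DataFrame({"ints": [1, 2, 3], "floats": [0.5, 1.5, 2.5]})

arr = df_mixed.to_numpy()

# The integer column was coerced so that everything fits in one ndarray.
print(arr.dtype)

# The result is a newly built array rather than a view on the column data.
shares = np.shares_memory(arr, df_mixed["ints"].to_numpy())
print(shares)
```

Because the combined array does not share memory with the columns, writing to it would not be expected to change the DataFrame.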
133
+
90
134
.. _basics.accelerate:
91
135
92
136
Accelerated operations
@@ -541,7 +585,7 @@ will exclude NAs on Series input by default:
541
585
.. ipython:: python
542
586
543
587
   np.mean(df['one'])
544
-    np.mean(df['one'].values)
588
+    np.mean(df['one'].to_numpy())
545
589
546
590
:meth:`Series.nunique` will return the number of unique non-NA values in a
547
591
Series:
@@ -839,7 +883,7 @@ Series operation on each column or row:
839
883
840
884
   tsdf = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
841
885
                       index=pd.date_range('1/1/2000', periods=10))
842
-    tsdf.values[3:7] = np.nan
886
+    tsdf.iloc[3:7] = np.nan
843
887
844
888
.. ipython:: python
845
889
@@ -1875,17 +1919,29 @@ dtypes
1875
1919
------
1876
1920
1877
1921
For the most part, pandas uses NumPy arrays and dtypes for Series or individual
1878
- columns of a DataFrame. The main types allowed in pandas objects are ``float ``,
1879
- ``int ``, ``bool ``, and ``datetime64[ns] `` (note that NumPy does not support
1880
- timezone-aware datetimes).
1881
-
1882
- In addition to NumPy's types, pandas :ref: `extends <extending.extension-types >`
1883
- NumPy's type-system for a few cases.
1884
-
1885
- * :ref: `Categorical <categorical >`
1886
- * :ref: `Datetime with Timezone <timeseries.timezone_series >`
1887
- * :ref: `Period <timeseries.periods >`
1888
- * :ref: `Interval <indexing.intervallindex >`
1922
+ columns of a DataFrame. NumPy provides support for ``float``,
1923
+ ``int``, ``bool``, ``timedelta64[ns]`` and ``datetime64[ns]`` (note that NumPy
1924
+ does not support timezone-aware datetimes).
1925
+
1926
+ Pandas and third-party libraries *extend* NumPy's type system in a few places.
1927
+ This section describes the extensions pandas has made internally.
1928
+ See :ref:`extending.extension-types` for how to write your own extension that
1929
+ works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
1930
+ libraries that have implemented an extension.
1931
+
1932
+ The following table lists all of pandas' extension types. See the respective
1933
+ documentation sections for more on each type.
1934
+
1935
+ =================== ========================= ================== ============================= =============================
1936
+ Kind of Data        Data Type                 Scalar             Array                         Documentation
1937
+ =================== ========================= ================== ============================= =============================
1938
+ tz-aware datetime   :class:`DatetimeTZDtype`  :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
1939
+ Categorical         :class:`CategoricalDtype` (none)             :class:`Categorical`          :ref:`categorical`
1940
+ period (time spans) :class:`PeriodDtype`      :class:`Period`    :class:`arrays.PeriodArray`   :ref:`timeseries.periods`
1941
+ sparse              :class:`SparseDtype`      (none)             :class:`arrays.SparseArray`   :ref:`sparse`
1942
+ intervals           :class:`IntervalDtype`    :class:`Interval`  :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
1943
+ nullable integer    :class:`Int64Dtype`, ...  (none)             :class:`arrays.IntegerArray`  :ref:`integer_na`
1944
+ =================== ========================= ================== ============================= =============================
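
As a rough cross-check of the table, the dtype and array classes can be inspected on small sample Series. This is a sketch under the assumption that a recent pandas is installed; the ``examples`` dictionary and its contents are hypothetical and not part of the docs.

```python
import pandas as pd

# Hypothetical sample Series, one per row of the table above.
examples = {
    "tz-aware datetime": pd.Series(pd.date_range("2000-01-01", periods=2, tz="UTC")),
    "Categorical": pd.Series(["a", "b"], dtype="category"),
    "period": pd.Series(pd.period_range("2000-01", periods=2, freq="M")),
    "sparse": pd.Series(pd.arrays.SparseArray([0, 0, 1])),
    "intervals": pd.Series(pd.interval_range(start=0, end=2)),
    "nullable integer": pd.Series([1, None], dtype="Int64"),
}

for kind, ser in examples.items():
    # ser.dtype corresponds to the "Data Type" column, ser.array to "Array".
    print(f"{kind}: {type(ser.dtype).__name__}, {type(ser.array).__name__}")
```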
1889
1945
1890
1946
Pandas uses the ``object `` dtype for storing strings.
1891
1947
@@ -1983,13 +2039,13 @@ from the current type (e.g. ``int`` to ``float``).
1983
2039
df3
1984
2040
df3.dtypes
1985
2041
1986
- The `` values `` attribute on a DataFrame return the *lower-common-denominator * of the dtypes, meaning
2042
+ :meth:`DataFrame.to_numpy` will return the *lowest-common-denominator* of the dtypes, meaning
1987
2043
the dtype that can accommodate **ALL** of the types in the resulting homogeneous dtyped NumPy array. This can
1988
2044
force some *upcasting*.
1989
2045
1990
2046
.. ipython:: python
1991
2047
1992
-    df3.values.dtype
2048
+    df3.to_numpy().dtype
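
A few small cases show which common dtype is chosen. The frames below are hypothetical and use made-up names; ``df3`` from the docs is not reproduced here.

```python
import numpy as np
import pandas as pd

# Hypothetical frames; all names are illustrative only.
ints_and_floats = pd.DataFrame({"a": np.array([1, 2], dtype="int64"),
                                "b": [0.5, 1.5]})
small_and_large_ints = pd.DataFrame({"a": np.array([1, 2], dtype="int8"),
                                     "b": np.array([3, 4], dtype="int64")})
ints_and_strings = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Each result dtype is the one able to hold every column of the frame.
float_dtype = ints_and_floats.to_numpy().dtype
int_dtype = small_and_large_ints.to_numpy().dtype
object_dtype = ints_and_strings.to_numpy().dtype

print(float_dtype, int_dtype, object_dtype)
```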
1993
2049
1994
2050
astype
1995
2051
~~~~~~
@@ -2211,11 +2267,11 @@ dtypes:
2211
2267
                      'float64': np.arange(4.0, 7.0),
2212
2268
                      'bool1': [True, False, True],
2213
2269
                      'bool2': [False, True, False],
2214
-                       'dates': pd.date_range('now', periods=3).values,
2270
+                       'dates': pd.date_range('now', periods=3),
2215
2271
                      'category': pd.Series(list("ABC")).astype('category')})
2216
2272
   df['tdeltas'] = df.dates.diff()
2217
2273
   df['uint64'] = np.arange(3, 6).astype('u8')
2218
-    df['other_dates'] = pd.date_range('20130101', periods=3).values
2274
+    df['other_dates'] = pd.date_range('20130101', periods=3)
2219
2275
   df['tz_aware_dates'] = pd.date_range('20130101', periods=3, tz='US/Eastern')
2220
2276
df
2221
2277