@@ -10,13 +10,13 @@ Frequently Asked Questions (FAQ)
 
 DataFrame memory usage
 ----------------------
-The memory usage of a ``DataFrame`` (including the index) is shown when calling
+The memory usage of a :class:`DataFrame` (including the index) is shown when calling
 the :meth:`~DataFrame.info`. A configuration option, ``display.memory_usage``
 (see :ref:`the list of options <options.available>`), specifies if the
-``DataFrame``'s memory usage will be displayed when invoking the ``df.info()``
+:class:`DataFrame` memory usage will be displayed when invoking the ``df.info()``
 method.
 
-For example, the memory usage of the ``DataFrame`` below is shown
+For example, the memory usage of the :class:`DataFrame` below is shown
 when calling :meth:`~DataFrame.info`:
 
 .. ipython:: python
@@ -53,9 +53,9 @@ By default the display option is set to ``True`` but can be explicitly
 overridden by passing the ``memory_usage`` argument when invoking ``df.info()``.
 
 The memory usage of each column can be found by calling the
-:meth:`~DataFrame.memory_usage` method. This returns a ``Series`` with an index
+:meth:`~DataFrame.memory_usage` method. This returns a :class:`Series` with an index
 represented by column names and memory usage of each column shown in bytes. For
-the ``DataFrame`` above, the memory usage of each column and the total memory
+the :class:`DataFrame` above, the memory usage of each column and the total memory
 usage can be found with the ``memory_usage`` method:
 
 .. ipython:: python
@@ -65,8 +65,8 @@ usage can be found with the ``memory_usage`` method:
    # total memory usage of dataframe
    df.memory_usage().sum()
 
-By default the memory usage of the ``DataFrame``'s index is shown in the
-returned ``Series``, the memory usage of the index can be suppressed by passing
+By default the memory usage of the :class:`DataFrame` index is shown in the
+returned :class:`Series`; the memory usage of the index can be suppressed by passing
 the ``index=False`` argument:
 
 .. ipython:: python
@@ -75,7 +75,7 @@ the ``index=False`` argument:
 
 The memory usage displayed by the :meth:`~DataFrame.info` method utilizes the
 :meth:`~DataFrame.memory_usage` method to determine the memory usage of a
-``DataFrame`` while also formatting the output in human-readable units (base-2
+:class:`DataFrame` while also formatting the output in human-readable units (base-2
 representation; i.e. 1KB = 1024 bytes).
 
 See also :ref:`Categorical Memory Usage <categorical.memory>`.
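The relationship between ``memory_usage()`` and its ``index`` argument can be checked with a small sketch. The frame below is hypothetical (the ``ints`` and ``floats`` columns exist only for illustration): each column holds 1,000 eight-byte values, so each should report 8000 bytes, and the extra ``Index`` entry appears only when ``index=False`` is not passed.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two 8-byte columns with 1,000 rows each.
df = pd.DataFrame(
    {
        "ints": np.arange(1_000, dtype="int64"),
        "floats": np.zeros(1_000, dtype="float64"),
    }
)

with_index = df.memory_usage()                # first entry is labeled "Index"
without_index = df.memory_usage(index=False)  # one entry per column only

print(without_index)                            # 8000 bytes per column
print(with_index.sum() - without_index.sum())   # bytes attributed to the index
print(with_index.sum() / 1024)                  # total in base-2 kilobytes
```

The last line mirrors the base-2 convention that ``info()`` uses when it formats the same total.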
@@ -98,32 +98,28 @@ of the following code should be:
 Should it be ``True`` because it's not zero-length, or ``False`` because there
 are ``False`` values? It is unclear, so instead, pandas raises a ``ValueError``:
 
-.. code-block:: python
+.. ipython:: python
+   :okexcept:
 
-   >>> if pd.Series([False, True, False]):
-   ...     print("I was true")
-   Traceback
-       ...
-   ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
+   if pd.Series([False, True, False]):
+       print("I was true")
 
-You need to explicitly choose what you want to do with the ``DataFrame``, e.g.
+You need to explicitly choose what you want to do with the :class:`DataFrame`, e.g.
 use :meth:`~DataFrame.any`, :meth:`~DataFrame.all` or :meth:`~DataFrame.empty`.
 Alternatively, you might want to compare if the pandas object is ``None``:
 
-.. code-block:: python
+.. ipython:: python
 
-   >>> if pd.Series([False, True, False]) is not None:
-   ...     print("I was not None")
-   I was not None
+   if pd.Series([False, True, False]) is not None:
+       print("I was not None")
 
 
 Below is how to check if any of the values are ``True``:
 
-.. code-block:: python
+.. ipython:: python
 
-   >>> if pd.Series([False, True, False]).any():
-   ...     print("I am any")
-   I am any
+   if pd.Series([False, True, False]).any():
+       print("I am any")
 
 To evaluate single-element pandas objects in a boolean context, use the method
 :meth:`~DataFrame.bool`:
@@ -138,27 +134,21 @@ To evaluate single-element pandas objects in a boolean context, use the method
 Bitwise boolean
 ~~~~~~~~~~~~~~~
 
-Bitwise boolean operators like ``==`` and ``!=`` return a boolean ``Series``,
-which is almost always what you want anyways.
+Bitwise boolean operators like ``==`` and ``!=`` return a boolean :class:`Series`
+holding the element-wise result when the other operand is a scalar.
 
-.. code-block:: python
+.. ipython:: python
 
-   >>> s = pd.Series(range(5))
-   >>> s == 4
-   0    False
-   1    False
-   2    False
-   3    False
-   4     True
-   dtype: bool
+   s = pd.Series(range(5))
+   s == 4
 
 See :ref:`boolean comparisons<basics.compare>` for more examples.
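A minimal sketch of element-wise comparison, reusing ``s = pd.Series(range(5))`` from the diff. Each comparison yields one boolean per element, masks are combined with ``&`` and ``|`` (Python's ``and``/``or`` would hit the ambiguous truth-value error discussed earlier), and an explicit reduction such as ``any()`` turns a mask into a single value.

```python
import pandas as pd

s = pd.Series(range(5))

mask = s == 4          # element-wise: one bool per element
print(mask.tolist())   # [False, False, False, False, True]

# Combine masks element-wise with & (not `and`), parenthesizing each side.
both = (s > 1) & (s != 4)
print(both.tolist())   # [False, False, True, True, False]

# Reduce explicitly when a single truth value is needed.
print(mask.any())      # True
```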
 
 Using the ``in`` operator
 ~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Using the Python ``in`` operator on a ``Series`` tests for membership in the
-index, not membership among the values.
+Using the Python ``in`` operator on a :class:`Series` tests for membership in the
+**index**, not membership among the values.
 
 .. ipython:: python
 
@@ -167,15 +157,15 @@ index, not membership among the values.
    'b' in s
 
 If this behavior is surprising, keep in mind that using ``in`` on a Python
-dictionary tests keys, not values, and ``Series`` are dict-like.
+dictionary tests keys, not values, and :class:`Series` objects are dict-like.
 To test for membership in the values, use the method :meth:`~pandas.Series.isin`:
 
 .. ipython:: python
 
    s.isin([2])
    s.isin([2]).any()
 
-For ``DataFrames``, likewise, ``in`` applies to the column axis,
+For a :class:`DataFrame`, likewise, ``in`` applies to the column axis,
 testing for membership in the list of column names.
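The two kinds of membership test can be contrasted in one sketch. The Series is an assumption, chosen so that its labels (``a`` to ``e``) differ from its values (0 to 4); the two-column DataFrame is hypothetical.

```python
import pandas as pd

s = pd.Series(range(5), index=list("abcde"))

print("b" in s)           # True: "b" is an index label
print(2 in s)             # False: 2 is a value, not a label
print(s.isin([2]).any())  # True: 2 is among the values

df = pd.DataFrame({"x": [1, 2], "y": [3, 4]})

print("x" in df)                      # True: "x" is a column name
print(1 in df)                        # False: 1 is a value, not a column name
print(df.isin([1]).to_numpy().any())  # True: 1 is among the values
```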
 
 .. _gotchas.udf-mutation:
@@ -206,8 +196,8 @@ causing unexpected behavior. Consider the example:
 One probably would have expected that the result would be ``[1, 3, 5]``.
 When using a pandas method that takes a UDF, internally pandas is often
 iterating over the
-``DataFrame`` or other pandas object. Therefore, if the UDF mutates (changes)
-the ``DataFrame``, unexpected behavior can arise.
+:class:`DataFrame` or other pandas object. Therefore, if the UDF mutates (changes)
+the :class:`DataFrame`, unexpected behavior can arise.
 
 Here is a similar example with :meth:`DataFrame.apply`:
 
@@ -267,7 +257,7 @@ For many reasons we chose the latter. After years of production use it has
 proven, at least in my opinion, to be the best decision given the state of
 affairs in NumPy and Python in general. The special value ``NaN``
 (Not-A-Number) is used everywhere as the ``NA`` value, and there are API
-functions ``isna`` and ``notna`` which can be used across the dtypes to
+functions :meth:`DataFrame.isna` and :meth:`DataFrame.notna` which can be used across the dtypes to
 detect NA values.
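As a sketch of the point that ``isna`` and ``notna`` work across dtypes, the hypothetical frame below mixes a float column (missing value ``NaN``), a string column (``None``) and a datetime column (``NaT``); a single call flags all three.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column marks "missing" with a different sentinel.
df = pd.DataFrame(
    {
        "floats": [1.0, np.nan],                            # NaN
        "strings": ["a", None],                             # None
        "dates": pd.to_datetime(["2024-01-01", None]),      # NaT
    }
)

print(df.isna())                  # second row is True in every column
print(df.notna().sum().tolist())  # [1, 1, 1]: one valid value per column
```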
 
 However, it comes with it a couple of trade-offs which I most certainly have
@@ -293,7 +283,7 @@ arrays. For example:
    s2.dtype
 
 This trade-off is made largely for memory and performance reasons, and also so
-that the resulting ``Series`` continues to be "numeric".
+that the resulting :class:`Series` continues to be "numeric".
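The promotion can be seen by reindexing to a label that does not exist, which introduces a missing value. This sketch assumes the default NumPy-backed dtypes rather than the nullable extension dtypes: the integer Series becomes ``float64`` and the boolean one falls back to ``object``.

```python
import pandas as pd

ints = pd.Series([1, 2, 3])             # int64
flags = pd.Series([True, False, True])  # bool

# Label 3 does not exist, so reindexing inserts a missing value there.
ints_na = ints.reindex([0, 1, 2, 3])
flags_na = flags.reindex([0, 1, 2, 3])

print(ints_na.dtype)   # float64: NaN needs a floating-point dtype
print(flags_na.dtype)  # object: a NumPy bool array cannot hold NaN
```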
 
 If you need to represent integers with possibly missing values, use one of
 the nullable-integer extension dtypes provided by pandas
@@ -318,7 +308,7 @@ See :ref:`integer_na` for more.
 ``NA`` type promotions
 ~~~~~~~~~~~~~~~~~~~~~~
 
-When introducing NAs into an existing ``Series`` or ``DataFrame`` via
+When introducing NAs into an existing :class:`Series` or :class:`DataFrame` via
 :meth:`~Series.reindex` or some other means, boolean and integer types will be
 promoted to a different dtype in order to store the NAs. The promotions are
 summarized in this table:
@@ -376,18 +366,19 @@ integer arrays to floating when NAs must be introduced.
 
 Differences with NumPy
 ----------------------
-For ``Series`` and ``DataFrame`` objects, :meth:`~DataFrame.var` normalizes by
+For :class:`Series` and :class:`DataFrame` objects, :meth:`~DataFrame.var` normalizes by
 ``N-1`` to produce unbiased estimates of the sample variance, while NumPy's
-``var`` normalizes by N, which measures the variance of the sample. Note that
+:meth:`numpy.var` normalizes by N, which measures the variance of the sample. Note that
 :meth:`~DataFrame.cov` normalizes by ``N-1`` in both pandas and NumPy.
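A worked example of the normalization difference, using the values 1 to 4: their mean is 2.5 and the squared deviations sum to 5, so dividing by ``N-1 = 3`` gives about 1.667 while dividing by ``N = 4`` gives 1.25. The ``ddof`` argument, accepted by both ``Series.var`` and ``numpy.var``, switches between the two.

```python
import numpy as np
import pandas as pd

values = [1.0, 2.0, 3.0, 4.0]
s = pd.Series(values)

# Sum of squared deviations from the mean (2.5) is 5.0, and N = 4.
print(s.var())                 # 5 / 3, about 1.667 (pandas default ddof=1)
print(np.var(values))          # 5 / 4 = 1.25 (NumPy default ddof=0)
print(s.var(ddof=0))           # 1.25: pandas matching NumPy
print(np.var(values, ddof=1))  # about 1.667: NumPy matching pandas

# Covariance divides by N - 1 in both libraries by default.
print(s.cov(s), np.cov(values))  # both about 1.667
```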
 
+.. _gotchas.thread-safety:
 
 Thread-safety
 -------------
 
-As of pandas 0.11, pandas is not 100% thread safe. The known issues relate to
+pandas is not 100% thread safe. The known issues relate to
 the :meth:`~DataFrame.copy` method. If you are doing a lot of copying of
-``DataFrame`` objects shared among threads, we recommend holding locks inside
+:class:`DataFrame` objects shared among threads, we recommend holding locks inside
 the threads where the data copying occurs.
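A minimal sketch of the locking advice, using only the standard ``threading`` module. The ``shared`` frame, the ``copy_lock`` name and the ``worker`` function are hypothetical; the idea is to hold the lock only around the ``copy()`` call and then work on the private copy without it.

```python
import threading

import pandas as pd

shared = pd.DataFrame({"a": range(1_000)})  # hypothetical shared frame
copy_lock = threading.Lock()                # hypothetical lock guarding copies
results = []


def worker():
    # Serialize only the copy; everything after it uses a private frame.
    with copy_lock:
        local = shared.copy()
    results.append(int(local["a"].sum()))


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # [499500, 499500, 499500, 499500]
```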
 
 See `this link <https://stackoverflow.com/questions/13592618/python-pandas-dataframe-thread-safe>`__
@@ -406,7 +397,7 @@ symptom of this issue is an error like::
 
 To deal
 with this issue you should convert the underlying NumPy array to the native
-system byte order *before* passing it to ``Series`` or ``DataFrame``
+system byte order *before* passing it to :class:`Series` or :class:`DataFrame`
 constructors using something similar to the following:
 
 .. ipython:: python