    :suppress:
 
    import numpy as np
-   from pandas import *
+   import pandas as pd
    randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
    from pandas.compat import lrange
@@ -25,14 +25,14 @@ the equivalent (scalar) built-in string methods:
 
 .. ipython:: python
 
-   s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+   s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
    s.str.lower()
    s.str.upper()
    s.str.len()
 
 .. ipython:: python
 
-   idx = Index([' jack', 'jill ', ' jesse ', 'frank'])
+   idx = pd.Index([' jack', 'jill ', ' jesse ', 'frank'])
    idx.str.strip()
    idx.str.lstrip()
    idx.str.rstrip()
@@ -43,8 +43,8 @@ leading or trailing whitespace:
 
 .. ipython:: python
 
-   df = DataFrame(randn(3, 2), columns=[' Column A ', ' Column B '],
-                  index=range(3))
+   df = pd.DataFrame(randn(3, 2), columns=[' Column A ', ' Column B '],
+                     index=range(3))
    df
 
 Since ``df.columns`` is an Index object, we can use the ``.str`` accessor
@@ -72,7 +72,7 @@ Methods like ``split`` return a Series of lists:
 
 .. ipython:: python
 
-   s2 = Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
+   s2 = pd.Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
    s2.str.split('_')
 
 Elements in the split lists can be accessed using ``get`` or ``[]`` notation:
@@ -106,8 +106,8 @@ Methods like ``replace`` and ``findall`` take `regular expressions
 
 .. ipython:: python
 
-   s3 = Series(['A', 'B', 'C', 'Aaba', 'Baca',
-                '', np.nan, 'CABA', 'dog', 'cat'])
+   s3 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca',
+                   '', np.nan, 'CABA', 'dog', 'cat'])
    s3
    s3.str.replace('^.a|dog', 'XX-XX ', case=False)
 
@@ -118,7 +118,7 @@ following code will cause trouble because of the regular expression meaning of
 .. ipython:: python
 
    # Consider the following badly formatted financial data
-   dollars = Series(['12', '-$10', '$10,000'])
+   dollars = pd.Series(['12', '-$10', '$10,000'])
 
    # This does what you'd naively expect:
    dollars.str.replace('$', '')
@@ -140,8 +140,8 @@ of the string, the result will be a ``NaN``.
 
 .. ipython:: python
 
-   s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,
-               'CABA', 'dog', 'cat'])
+   s = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan,
+                  'CABA', 'dog', 'cat'])
 
    s.str[0]
    s.str[1]
@@ -157,14 +157,14 @@ regular expression with one group returns a Series of strings.
 
 .. ipython:: python
 
-   Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')
+   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)')
 
 Elements that do not match return ``NaN``. Extracting a regular expression
 with more than one group returns a DataFrame with one column per group.
 
 .. ipython:: python
 
-   Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
+   pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)')
 
 Elements that do not match return a row filled with ``NaN``.
 Thus, a Series of messy strings can be "converted" into a
@@ -178,13 +178,13 @@ Named groups like
 
 .. ipython:: python
 
-   Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
+   pd.Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
 
 and optional groups like
 
 .. ipython:: python
 
-   Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')
+   pd.Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')
 
 can also be used.
 
@@ -196,14 +196,14 @@ You can check whether elements contain a pattern:
 .. ipython:: python
 
    pattern = r'[a-z][0-9]'
-   Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
+   pd.Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
 
 or match a pattern:
 
 
 .. ipython:: python
 
-   Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
+   pd.Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
 
 The distinction between ``match`` and ``contains`` is strictness: ``match``
 relies on strict ``re.match``, while ``contains`` relies on ``re.search``.
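
Note on the hunk above: the ``re.match`` versus ``re.search`` distinction can be seen with the standard library alone. The following is a minimal sketch, not part of the commit. It reuses the ``pattern`` from the diff, while the ``samples`` list is made up for illustration because none of the strings in the documented Series contain a lowercase letter followed by a digit.

```python
import re

pattern = r'[a-z][0-9]'
samples = ['a1', 'xa1', '1a']  # illustrative strings, not taken from the docs

# re.match succeeds only if the pattern matches at the start of the string.
anchored = [bool(re.match(pattern, s)) for s in samples]

# re.search succeeds if the pattern matches anywhere in the string.
anywhere = [bool(re.search(pattern, s)) for s in samples]

print(anchored)  # [True, False, False]
print(anywhere)  # [True, True, False]
```

The string ``'xa1'`` is the interesting case: it contains ``a1`` but does not start with it, so only the search-style check reports a hit.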
@@ -225,7 +225,7 @@ Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
 
 .. ipython:: python
 
-   s4 = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
+   s4 = pd.Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
    s4.str.contains('A', na=False)
 
 Creating Indicator Variables
@@ -236,7 +236,7 @@ For example if they are separated by a ``'|'``:
 
 .. ipython:: python
 
-   s = Series(['a', 'a|b', np.nan, 'a|c'])
+   s = pd.Series(['a', 'a|b', np.nan, 'a|c'])
    s.str.get_dummies(sep='|')
 
 See also :func:`~pandas.get_dummies`.
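
Note on the hunk above: for readers unfamiliar with indicator variables, the idea behind splitting on ``'|'`` can be sketched in plain Python. This is only an illustration of the concept and makes no claim about how pandas implements ``str.get_dummies``. The helper name ``indicator_rows`` is hypothetical, ``None`` stands in for ``np.nan``, and giving a missing value an all-zero row is an assumption made for this sketch.

```python
def indicator_rows(values, sep='|'):
    """Return (sorted labels, list of 0/1 rows); None marks a missing value."""
    # Split each entry into its set of labels; a missing entry has no labels.
    split = [set(v.split(sep)) if v is not None else set() for v in values]
    # One output column per distinct label, in sorted order.
    labels = sorted(set().union(*split))
    # Each row flags which labels are present in the corresponding entry.
    rows = [[int(label in parts) for label in labels] for parts in split]
    return labels, rows

labels, rows = indicator_rows(['a', 'a|b', None, 'a|c'])
print(labels)  # ['a', 'b', 'c']
print(rows)    # [[1, 0, 0], [1, 1, 0], [0, 0, 0], [1, 0, 1]]
```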