 
    import numpy as np
    import pandas as pd
-   randn = np.random.randn
+
    np.set_printoptions(precision=4, suppress=True)
-   from pandas.compat import lrange
-   pd.options.display.max_rows=15
+   pd.options.display.max_rows = 15
 
 ======================
 Working with Text Data
@@ -43,8 +42,8 @@ leading or trailing whitespace:
 
 .. ipython:: python
 
-   df = pd.DataFrame(randn(3, 2), columns=[' Column A ', ' Column B '],
-                     index=range(3))
+   df = pd.DataFrame(np.random.randn(3, 2),
+                     columns=[' Column A ', ' Column B '], index=range(3))
    df
 
 Since ``df.columns`` is an Index object, we can use the ``.str`` accessor
@@ -169,12 +168,18 @@ positional argument (a regex object) and return a string.
 
    # Reverse every lowercase alphabetic word
    pat = r'[a-z]+'
-   repl = lambda m: m.group(0)[::-1]
+
+   def repl(m):
+       return m.group(0)[::-1]
+
    pd.Series(['foo 123', 'bar baz', np.nan]).str.replace(pat, repl)
 
    # Using regex groups
    pat = r"(?P<one>\w+) (?P<two>\w+) (?P<three>\w+)"
-   repl = lambda m: m.group('two').swapcase()
+
+   def repl(m):
+       return m.group('two').swapcase()
+
    pd.Series(['Foo Bar Baz', np.nan]).str.replace(pat, repl)
 
 .. versionadded:: 0.20.0
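
The hunk above swaps the lambda assignments for named functions without changing behavior. A minimal standalone sketch of the callable form of ``repl`` follows, assuming a pandas version in which ``str.replace`` accepts the ``regex`` keyword. The name ``reverse_match`` is illustrative and not part of the patch, and ``regex=True`` is passed explicitly here because newer pandas releases do not treat the pattern as a regex by default.

```python
import numpy as np
import pandas as pd


def reverse_match(m):
    # m is a regex match object; group(0) is the full matched text.
    return m.group(0)[::-1]


s = pd.Series(["foo 123", "bar baz", np.nan])

# Assumption: the installed pandas has the `regex` keyword on str.replace.
# Each run of lowercase letters is reversed; missing values stay missing.
reversed_words = s.str.replace(r"[a-z]+", reverse_match, regex=True)
print(reversed_words)
```

A named function also gives the replacement a docstring slot and a readable traceback, which a lambda bound to a variable does not.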
@@ -216,7 +221,7 @@ The content of a ``Series`` (or ``Index``) can be concatenated:
 
    s = pd.Series(['a', 'b', 'c', 'd'])
    s.str.cat(sep=',')
-
+
 If not specified, the keyword ``sep`` for the separator defaults to the empty string, ``sep=''``:
 
 .. ipython:: python
@@ -239,7 +244,7 @@ The first argument to :meth:`~Series.str.cat` can be a list-like object, provide
 .. ipython:: python
 
    s.str.cat(['A', 'B', 'C', 'D'])
-
+
 Missing values on either side will result in missing values in the result as well, *unless* ``na_rep`` is specified:
 
 .. ipython:: python
@@ -260,7 +265,7 @@ The parameter ``others`` can also be two-dimensional. In this case, the number o
    s
    d
    s.str.cat(d, na_rep='-')
-
+
 Concatenating a Series and an indexed object into a Series, with alignment
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
@@ -375,7 +380,7 @@ DataFrame with one column per group.
 
 .. ipython:: python
 
-   pd.Series(['a1', 'b2', 'c3']).str.extract('([ab])(\d)', expand=False)
+   pd.Series(['a1', 'b2', 'c3']).str.extract(r'([ab])(\d)', expand=False)
 
 Elements that do not match return a row filled with ``NaN``. Thus, a
 Series of messy strings can be "converted" into a like-indexed Series
@@ -388,13 +393,14 @@ Named groups like
 
 .. ipython:: python
 
-   pd.Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)', expand=False)
+   pd.Series(['a1', 'b2', 'c3']).str.extract(r'(?P<letter>[ab])(?P<digit>\d)',
+                                             expand=False)
 
 and optional groups like
 
 .. ipython:: python
 
-   pd.Series(['a1', 'b2', '3']).str.extract('([ab])?(\d)', expand=False)
+   pd.Series(['a1', 'b2', '3']).str.extract(r'([ab])?(\d)', expand=False)
 
 can also be used. Note that any capture group names in the regular
 expression will be used for column names; otherwise capture group
@@ -405,13 +411,13 @@ with one column if ``expand=True``.
 
 .. ipython:: python
 
-   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)', expand=True)
+   pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=True)
 
 It returns a Series if ``expand=False``.
 
 .. ipython:: python
 
-   pd.Series(['a1', 'b2', 'c3']).str.extract('[ab](\d)', expand=False)
+   pd.Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)', expand=False)
 
 Calling on an ``Index`` with a regex with exactly one capture group
 returns a ``DataFrame`` with one column if ``expand=True``.
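
The return types described in the last hunks can be checked side by side. The sketch below is illustrative only: the variable names are not from the patch, and it assumes a pandas version in which ``Index.str.extract`` supports ``expand``. It uses the raw-string pattern that the patch introduces, so ``\d`` is not read as a string escape sequence.

```python
import pandas as pd

s = pd.Series(["a1", "b2", "c3"])
idx = pd.Index(["a1", "b2", "c3"])

# Raw string, as in the patched examples; exactly one capture group.
pattern = r"[ab](\d)"

# On a Series: expand=True gives a one-column DataFrame,
# expand=False gives a Series.
frame_from_series = s.str.extract(pattern, expand=True)
series_from_series = s.str.extract(pattern, expand=False)

# On an Index: expand=True gives a one-column DataFrame,
# expand=False gives an Index.
frame_from_index = idx.str.extract(pattern, expand=True)
index_from_index = idx.str.extract(pattern, expand=False)

for result in (frame_from_series, series_from_series,
               frame_from_index, index_from_index):
    print(type(result).__name__)
```

In every case the element ``"c3"`` does not match ``[ab]``, so its extracted value is missing.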