@@ -1006,7 +1006,7 @@ first read it in as an object dtype and then apply :func:`to_datetime` to each e

 .. ipython:: python

-   data = io.StringIO("date\n12 Jan 2000\n2000-01-13\n")
+   data = StringIO("date\n12 Jan 2000\n2000-01-13\n")
    df = pd.read_csv(data)
    df['date'] = df['date'].apply(pd.to_datetime)
    df
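
For reference, a self-contained sketch of the element-wise parsing this hunk touches. It assumes ``StringIO`` comes from the standard-library ``io`` module and that pandas is imported as ``pd``, as elsewhere in these docs; the explicit ``dtype`` argument is added here for illustration and is not part of the diff.

```python
from io import StringIO

import pandas as pd

# Two rows whose dates are written in different formats.
data = StringIO("date\n12 Jan 2000\n2000-01-13\n")

# Read the column as plain Python strings first (illustrative assumption:
# the dtype is forced explicitly instead of relying on inference).
df = pd.read_csv(data, dtype={"date": "object"})

# Parse each element separately, so the format is inferred per value
# rather than once for the whole column.
df["date"] = df["date"].apply(pd.to_datetime)
```

Applying ``pd.to_datetime`` per element is slower than a single vectorized call, so it is mainly useful when one column really does mix formats.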
@@ -1373,8 +1373,7 @@ Files with fixed width columns

 While :func:`read_csv` reads delimited data, the :func:`read_fwf` function works
 with data files that have known and fixed column widths. The function parameters
-to ``read_fwf`` are largely the same as ``read_csv`` with two extra parameters, and
-a different usage of the ``delimiter`` parameter:
+to ``read_fwf`` are largely the same as ``read_csv`` with five extra parameters:

 * ``colspecs``: A list of pairs (tuples) giving the extents of the
   fixed-width fields of each line as half-open intervals (i.e., [from, to[ ).
@@ -1383,12 +1382,46 @@ a different usage of the ``delimiter`` parameter:
   behavior, if not specified, is to infer.
 * ``widths``: A list of field widths which can be used instead of 'colspecs'
   if the intervals are contiguous.
-* ``delimiter``: Characters to consider as filler characters in the fixed-width file.
-  Can be used to specify the filler character of the fields
-  if it is not spaces (e.g., '~').
+* ``keep_whitespace``: A boolean or a ``(bool, bool)`` tuple indicating whether
+  whitespace at the (start, end) of each field / column is kept.
+* ``whitespace_chars``: A string of characters to strip from the start and/or end
+  of fields / columns when ``keep_whitespace`` contains a ``False`` value.
+* ``delimiter``: Character(s) separating columns when inferring ``colspecs``.

 Consider a typical fixed-width data file:

+.. ipython:: python
+
+   data = (
+       "name1 VANBCCAN 107.51 46 B 8 E \n"
+       "name2 BBYBCCAN* 20.00 5 1 5 7 F E\n"
+       "fullname 3VICBCCAN 22.50 3 1 C 5\n"
+   )
+   df = pd.read_fwf(StringIO(data),
+                    header=None,
+                    widths=[10, 3, 2, 3, 1, 6, 3, 12],
+                    keep_whitespace=(True, False),
+                    names=["Name", "City", "Prov", "Country", "Deleted",
+                           "TransAvg", "TransCount", "CreditScores"],
+                    # Do not convert field data to NaN:
+                    na_filter=False,
+                    )
+   df
+   df.values
+
+Note that the name field had trailing whitespace removed, as
+did the other text fields. However, the *leading* whitespace in CreditScores was
+preserved.
+
+This is due to the ``keep_whitespace`` setting of ``(True, False)``, representing (start, end), and
+the ``whitespace_chars`` default of ``' '`` and ``'\t'`` ([space] and [tab]).
+
+The TransAvg and TransCount fields had automatic dtype conversion to
+float64 and int64 respectively.
+
+
+Parsing a table is possible (see also ``read_table``):
+
 .. ipython:: python

    data1 = (
@@ -1398,52 +1431,57 @@ Consider a typical fixed-width data file:
        "id1230 413.836124 184.375703 11916.8\n"
        "id1948 502.953953 173.237159 12468.3"
    )
-   with open("bar.csv", "w") as f:
-       f.write(data1)

-In order to parse this file into a ``DataFrame``, we simply need to supply the
-column specifications to the ``read_fwf`` function along with the file name:
+In order to parse this data set into a ``DataFrame``, we simply need to supply the
+column specifications to the ``read_fwf`` function:

 .. ipython:: python

    # Column specifications are a list of half-intervals
    colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
-   df = pd.read_fwf("bar.csv", colspecs=colspecs, header=None, index_col=0)
+   df = pd.read_fwf(StringIO(data1),
+                    colspecs=colspecs,
+                    header=None,
+                    index_col=0
+                    )
    df

 Note how the parser automatically picks column names X.<column number> when
-``header=None`` argument is specified. Alternatively, you can supply just the
-column widths for contiguous columns:
-
-.. ipython:: python
-
-   # Widths are a list of integers
-   widths = [6, 14, 13, 10]
-   df = pd.read_fwf("bar.csv", widths=widths, header=None)
-   df
+the ``header=None`` argument is specified.

-The parser will take care of extra white spaces around the columns
-so it's ok to have extra separation between the columns in the file.
+The parser takes care of extra whitespace around the numeric data columns, and
+trailing spaces on string data, so it's OK to have extra separation between the columns
+in the file.

 By default, ``read_fwf`` will try to infer the file's ``colspecs`` by using the
 first 100 rows of the file. It can do it only in cases when the columns are
 aligned and correctly separated by the provided ``delimiter`` (default delimiter
 is whitespace).

+
 .. ipython:: python

-   df = pd.read_fwf("bar.csv", header=None, index_col=0)
+   df = pd.read_fwf(StringIO(data1),
+                    header=None,
+                    index_col=0
+                    )
    df

 ``read_fwf`` supports the ``dtype`` parameter for specifying the types of
 parsed columns to be different from the inferred type.

 .. ipython:: python

-   pd.read_fwf("bar.csv", header=None, index_col=0).dtypes
-   pd.read_fwf("bar.csv", header=None, dtype={2: "object"}).dtypes
+   pd.read_fwf(StringIO(data1),
+               header=None,
+               index_col=0).dtypes
+
+   pd.read_fwf(StringIO(data1),
+               header=None,
+               dtype={2: "object"}).dtypes

 .. ipython:: python
+   :okexcept:
    :suppress:

    os.remove("bar.csv")
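
As a rough, self-contained sketch of the in-memory ``read_fwf`` usage described in the hunks above: the snippet below reads the same buffer with explicit ``colspecs``, with ``widths``, with inferred column boundaries, and with a ``dtype`` override. The exact spacing inside the two sample rows is an assumption (the listing above does not preserve column alignment); it was chosen to fit the ``colspecs`` and ``widths`` values that appear in the diff. The ``keep_whitespace`` and ``whitespace_chars`` parameters are left out on purpose, since they are introduced by this change and may not exist in an installed pandas.

```python
from io import StringIO

import pandas as pd

# Two fixed-width rows. The run lengths of spaces are an assumption made
# so that the fields line up with the colspecs and widths used below.
data1 = (
    "id8141    360.242940   149.910199   11950.7\n"
    "id1594    444.953632   166.985655   11788.4\n"
)

# Explicit half-open [from, to) character intervals for each field.
colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
by_colspecs = pd.read_fwf(StringIO(data1), colspecs=colspecs, header=None)

# Contiguous field widths covering the same 43 characters.
widths = [6, 14, 13, 10]
by_widths = pd.read_fwf(StringIO(data1), widths=widths, header=None)

# No specification: column boundaries are inferred from the whitespace.
inferred = pd.read_fwf(StringIO(data1), header=None)

# Keep column 2 as strings instead of letting it be parsed as float.
typed = pd.read_fwf(StringIO(data1), header=None, dtype={2: "object"})
```

Each call wraps the string in a fresh ``StringIO``, because a buffer that has already been read is exhausted. All three parsing variants should produce the same values here, since padding around each field is stripped by default.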