Commit 2bfa90a

Updated documentation indicating default behaviour is to strip whitespace, and how to override. Enhances GH-issue-16950 pandas-dev#16950
1 parent 025fbd0 commit 2bfa90a

File tree

2 files changed, +11 -4 lines changed

doc/source/user_guide/io.rst

+6-3
@@ -1366,8 +1366,10 @@ a different usage of the ``delimiter`` parameter:
  * ``widths``: A list of field widths which can be used instead of 'colspecs'
  if the intervals are contiguous.
  * ``delimiter``: Characters to consider as filler characters in the fixed-width file.
- Can be used to specify the filler character of the fields
- if it is not spaces (e.g., '~').
+ Default is "`` \t``" (space and tab).
+ Used to specify the character(s) to strip from start and end of every field.
+ To preserve whitespace, set to a character that does not exist in the data,
+ i.e. "\0".

  Consider a typical fixed-width data file:

@@ -1404,8 +1406,9 @@ column widths for contiguous columns:
  df = pd.read_fwf("bar.csv", widths=widths, header=None)
  df

- The parser will take care of extra white spaces around the columns
+ The parser will take care of extra whitespace around the columns,
  so it's ok to have extra separation between the columns in the file.
+ To preserve whitespace around the columns, see ``delimiter``.

  By default, ``read_fwf`` will try to infer the file's ``colspecs`` by using the
  first 100 rows of the file. It can do it only in cases when the columns are

pandas/io/parsers/readers.py

+5-1
@@ -1231,6 +1231,7 @@ def read_fwf(
  *,
  colspecs: Sequence[tuple[int, int]] | str | None = "infer",
  widths: Sequence[int] | None = None,
+ delimiter: str | None = " \t",
  infer_nrows: int = 100,
  **kwds,
  ) -> DataFrame | TextFileReader:
@@ -1251,7 +1252,7 @@ def read_fwf(
  Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is
  expected. A local file could be:
  ``file://localhost/path/to/table.csv``.
- colspecs : list of tuple (int, int) or 'infer'. optional
+ colspecs : list of tuple (int, int) or 'infer', optional
  A list of tuples giving the extents of the fixed-width
  fields of each line as half-open intervals (i.e., [from, to[ ).
  String value 'infer' can be used to instruct the parser to try
@@ -1260,6 +1261,9 @@ def read_fwf(
  widths : list of int, optional
  A list of field widths which can be used instead of 'colspecs' if
  the intervals are contiguous.
+ delimiter : str, default " \t" (space and tab), optional
+ Character(s) to strip from start and end of each field. To
+ preserve whitespace, must be non-default value (i.e. delimiter="\0").
  infer_nrows : int, default 100
  The number of rows to consider when letting the parser determine the
  `colspecs`.
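
The behaviour documented by this change can be illustrated with a minimal sketch. The sample data, the column widths, and the variable names `stripped` and `preserved` are made up for illustration and are not part of the commit. The sketch assumes a pandas version in which ``read_fwf`` forwards ``delimiter`` to the fixed-width reader, and it passes explicit ``widths`` so that column inference is not involved.

```python
import io

import pandas as pd

# Two rows, each made of two 3-character fields (illustrative data only).
data = " a bbb\n ccdd \n"

# Default behaviour: spaces and tabs are stripped from both ends of each field.
stripped = pd.read_fwf(io.StringIO(data), widths=[3, 3], header=None)

# Assumption: "\0" never occurs in the data, so nothing visible is stripped
# and the surrounding whitespace of every field is kept.
preserved = pd.read_fwf(
    io.StringIO(data), widths=[3, 3], header=None, delimiter="\0"
)

print(stripped.values.tolist())
print(preserved.values.tolist())
```

Comparing the two printed results side by side should show whether whitespace around each field was kept; any character that cannot appear in the data would serve the same purpose as ``"\0"`` here.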
