Commit f88b228

asishm authored and Kevin D Smith committed
BUG: Python Parser skipping over items if BOM present in first element of header (pandas-dev#36365)
1 parent 06036ab commit f88b228

3 files changed (+15 -6 lines)

doc/source/whatsnew/v1.2.0.rst (+1)
@@ -321,6 +321,7 @@ I/O
 - :meth:`to_csv` did not support zip compression for binary file object not having a filename (:issue:`35058`)
 - :meth:`to_csv` and :meth:`read_csv` did not honor `compression` and `encoding` for path-like objects that are internally converted to file-like objects (:issue:`35677`, :issue:`26124`, and :issue:`32392`)
 - :meth:`to_pickle` and :meth:`read_pickle` did not support compression for file-objects (:issue:`26237`, :issue:`29054`, and :issue:`29570`)
+- Bug in :meth:`read_csv` with `engine='python'` truncating data if multiple items were present in the first row and the first element started with a BOM (:issue:`36343`)

 Plotting
 ^^^^^^^^
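To make the changelog entry concrete, here is a minimal sketch that re-creates the pre-fix branch structure of `_check_for_bom` as a standalone function. The name `check_for_bom_before_fix`, the module-level `BOM` constant, and the `quotechar='"'` default are assumptions for illustration, not pandas API, and the guard clauses at the top only approximate checks that the parsers.py hunk in this commit does not show. With an unquoted first element, the `elif` branch returns only the stripped first element, so every other header item is lost.

```python
# Hypothetical standalone re-creation of the pre-fix branch structure
# of _check_for_bom; not the pandas API.
BOM = "\ufeff"


def check_for_bom_before_fix(first_row, quotechar='"'):
    # Assumed guard clauses: only a non-empty string that starts with
    # the BOM is touched; anything else is returned unchanged.
    if not first_row or not isinstance(first_row[0], str):
        return first_row
    if not first_row[0] or first_row[0][0] != BOM:
        return first_row

    first_row_bom = first_row[0]
    if len(first_row_bom) > 1 and first_row_bom[1] == quotechar:
        # Quoted first element: keep the text between the quotes and
        # any trailing text after the closing quote.
        end = first_row_bom[2:].index(quotechar) + 2
        new_row = first_row_bom[2:end]
        if len(first_row_bom) > end + 1:
            new_row += first_row_bom[end + 1 :]
        return [new_row] + first_row[1:]
    elif len(first_row_bom) > 1:
        # Unquoted first element: the BOM is stripped, but first_row[1:]
        # is never appended, so the other header items are dropped.
        return [first_row_bom[1:]]
    else:
        # The first element is only the BOM; first_row[1:] is dropped here too.
        return [""]


print(check_for_bom_before_fix(["\ufeffHead1", "Head2", "Head3"]))  # ['Head1']
print(check_for_bom_before_fix(['\ufeff"Head1"', "Head2"]))  # ['Head1', 'Head2']
```

Only the quoted branch carries `first_row[1:]` through, which is why the truncation shows up specifically for an unquoted, BOM-prefixed first header item.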

pandas/io/parsers.py (+4 -6)
@@ -2886,14 +2886,12 @@ def _check_for_bom(self, first_row):
             # quotation mark.
             if len(first_row_bom) > end + 1:
                 new_row += first_row_bom[end + 1 :]
-            return [new_row] + first_row[1:]

-        elif len(first_row_bom) > 1:
-            return [first_row_bom[1:]]
         else:
-            # First row is just the BOM, so we
-            # return an empty string.
-            return [""]
+
+            # No quotation so just remove BOM from first element
+            new_row = first_row_bom[1:]
+        return [new_row] + first_row[1:]

     def _is_line_empty(self, line):
         """

pandas/tests/io/parser/test_common.py (+10)
@@ -2128,6 +2128,16 @@ def test_first_row_bom(all_parsers):
     tm.assert_frame_equal(result, expected)


+def test_first_row_bom_unquoted(all_parsers):
+    # see gh-36343
+    parser = all_parsers
+    data = """\ufeffHead1\tHead2\tHead3"""
+
+    result = parser.read_csv(StringIO(data), delimiter="\t")
+    expected = DataFrame(columns=["Head1", "Head2", "Head3"])
+    tm.assert_frame_equal(result, expected)
+
+
 def test_integer_precision(all_parsers):
     # Gh 7072
     s = """1,1;0;0;0;1;1;3844;3844;3844;1;1;1;1;1;1;0;0;1;1;0;0,,,4321583677327450765