
Commit 9661766

phofl authored and jreback committed
BUG: read_csv interpreting NA value as comment when NA contains comment string (pandas-dev#38392)
* BUG: read_csv not converting to float for python engine with decimal sep, usecols and parse_dates
* Fix comment issues for python parser
* Add test
* Add whatsnew
* Revert "BUG: read_csv not converting to float for python engine with decimal sep, usecols and parse_dates"
  This reverts commit 8c2e1ca
* Commit merge conflict
* Improve test
* Remove import
* Add c tests
* Remove function input
* Improve note

Co-authored-by: Jeff Reback <[email protected]>
1 parent 29c51af commit 9661766


3 files changed (+33 -1 lines)


doc/source/whatsnew/v1.3.0.rst

+1
@@ -217,6 +217,7 @@ MultiIndex
 I/O
 ^^^
 
+- Bug in :func:`read_csv` interpreting ``NA`` value as comment, when ``NA`` does contain the comment string fixed for ``engine="python"`` (:issue:`34002`)
 - Bug in :func:`read_csv` raising ``IndexError`` with multiple header columns and ``index_col`` specified when file has no data rows (:issue:`38292`)
 - Bug in :func:`read_csv` not accepting ``usecols`` with different length than ``names`` for ``engine="python"`` (:issue:`16469`)
 - Bug in :func:`read_csv` raising ``TypeError`` when ``names`` and ``parse_dates`` is specified for ``engine="c"`` (:issue:`33699`)
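The entry limits the fix to engine="python", and the new test in test_comment.py marks the C engine as an expected failure. A rough way to see why ordering matters: if an engine acts on the comment character while scanning the raw line, the line is cut before "#N/A" ever exists as a field, so na_values never gets a chance to match it. The sketch below is a deliberately simplified, hypothetical model of that ordering (strip_then_split is an illustrative name, not pandas code, and it does not claim to reproduce the C tokenizer's internals).

```python
import csv


def strip_then_split(text, comment="#"):
    """Hypothetical model: cut each raw line at the comment char, then split fields.

    Not pandas' C tokenizer; it only mimics the ordering problem where the
    comment character is handled before any field is compared with na_values.
    """
    rows = []
    for raw in text.splitlines():
        cut = raw.split(comment, 1)[0]  # drop everything from the first comment char on
        if cut:  # a line that was entirely a comment produces no row
            rows.append(next(csv.reader([cut])))
    return rows


data = "col1,col2,col3,col4\n7,8,#N/A,11\n"
print(strip_then_split(data))
# [['col1', 'col2', 'col3', 'col4'], ['7', '8', '']]
```

In this model the "#N/A" field and the trailing "11" are gone before NA matching could happen, which is consistent with the C engine being xfailed for this case.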

pandas/io/parsers.py

+5 -1
@@ -2983,7 +2983,11 @@ def _check_comments(self, lines):
         for line in lines:
             rl = []
             for x in line:
-                if not isinstance(x, str) or self.comment not in x:
+                if (
+                    not isinstance(x, str)
+                    or self.comment not in x
+                    or x in self.na_values
+                ):
                     rl.append(x)
                 else:
                     x = x[: x.find(self.comment)]
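To see the patched condition in isolation, here is a minimal standalone sketch of the loop above, rewritten as a free function check_comments (a hypothetical name; in pandas the logic is a method that reads self.comment and self.na_values). The hunk stops at the truncation line, so the sketch assumes the method then keeps any non-empty prefix and stops reading the row; that is what the expected frame in the new test implies for the row "4,5#,6,10".

```python
def check_comments(lines, comment, na_values):
    """Standalone sketch of the patched comment handling (comment must be a str).

    Assumption: after truncating a commented field, the real method keeps the
    non-empty prefix and discards the rest of the row (outside the shown hunk).
    """
    ret = []
    for line in lines:
        rl = []
        for x in line:
            if (
                not isinstance(x, str)
                or comment not in x
                or x in na_values  # the fix: an exact NA string is kept verbatim
            ):
                rl.append(x)
            else:
                x = x[: x.find(comment)]  # keep only the text before the comment char
                if len(x) > 0:
                    rl.append(x)
                break  # everything after the comment char belongs to the comment
        ret.append(rl)
    return ret


rows = [["7", "8", "#N/A", "11"], ["4", "5#", "6", "10"]]
print(check_comments(rows, "#", {"#N/A"}))
# [['7', '8', '#N/A', '11'], ['4', '5']]
print(check_comments(rows, "#", set()))  # empty na_values reproduces the pre-fix result
# [['7', '8'], ['4', '5']]
```

Passing an empty set makes the new "x in na_values" clause always false, so the second call shows the old behaviour: "#N/A" is truncated to an empty string and the row ends there.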

pandas/tests/io/parser/test_comment.py

+27
@@ -134,3 +134,30 @@ def test_comment_first_line(all_parsers, header):
 
     result = parser.read_csv(StringIO(data), comment="#", header=header)
     tm.assert_frame_equal(result, expected)
+
+
+def test_comment_char_in_default_value(all_parsers, request):
+    # GH#34002
+    if all_parsers.engine == "c":
+        reason = "see gh-34002: works on the python engine but not the c engine"
+        # NA value containing comment char is interpreted as comment
+        request.node.add_marker(pytest.mark.xfail(reason=reason, raises=AssertionError))
+    parser = all_parsers
+
+    data = (
+        "# this is a comment\n"
+        "col1,col2,col3,col4\n"
+        "1,2,3,4#inline comment\n"
+        "4,5#,6,10\n"
+        "7,8,#N/A,11\n"
+    )
+    result = parser.read_csv(StringIO(data), comment="#", na_values="#N/A")
+    expected = DataFrame(
+        {
+            "col1": [1, 4, 7],
+            "col2": [2, 5, 8],
+            "col3": [3.0, np.nan, np.nan],
+            "col4": [4.0, np.nan, 11.0],
+        }
+    )
+    tm.assert_frame_equal(result, expected)
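As a worked check of where each expected value comes from, the sketch below pushes the test's data through a pure-Python approximation of the python engine's comment handling. The helpers strip_comments and parse are hypothetical and are not pandas code. Two simplifications are assumed: every value is cast to float (pandas infers int64 for col1 and col2), and rows left shorter than the header after comment stripping are padded with missing values, matching the NaNs the test expects for the "4,5#,6,10" row.

```python
import csv
import io
import math

DATA = (
    "# this is a comment\n"
    "col1,col2,col3,col4\n"
    "1,2,3,4#inline comment\n"
    "4,5#,6,10\n"
    "7,8,#N/A,11\n"
)
NA_VALUES = {"#N/A"}


def strip_comments(row, comment="#", na_values=NA_VALUES):
    # Field-level rule mirroring the patched python engine: exact NA strings are exempt.
    out = []
    for x in row:
        if comment not in x or x in na_values:
            out.append(x)
        else:
            prefix = x[: x.find(comment)]
            if prefix:
                out.append(prefix)
            break  # the rest of the row is commented out
    return out


def parse(text):
    rows = [strip_comments(r) for r in csv.reader(io.StringIO(text))]
    rows = [r for r in rows if r]  # fully commented lines disappear
    header, body = rows[0], rows[1:]
    columns = {name: [] for name in header}
    for r in body:
        padded = r + [""] * (len(header) - len(r))  # assumption: short rows become missing
        for name, value in zip(header, padded):
            missing = value == "" or value in NA_VALUES
            columns[name].append(math.nan if missing else float(value))
    return columns


print(parse(DATA))
# {'col1': [1.0, 4.0, 7.0], 'col2': [2.0, 5.0, 8.0], 'col3': [3.0, nan, nan], 'col4': [4.0, nan, 11.0]}
```

The third row is the case the fix targets: "#N/A" survives comment stripping, so it is later recognised as NA and "11" still lands in col4.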
