
Commit 629198d

Referenced issue in the test, rewrote the bugfix description

1 parent 834c851

2 files changed: +21 -16 lines changed


doc/source/whatsnew/v0.19.0.txt (+1 -1)

@@ -673,7 +673,7 @@ Performance Improvements
 Bug Fixes
 ~~~~~~~~~
 
-- Bug in ``pd.read_csv()`` causing a segfault when iterating over a large file in chunks (:issue:`13703`)
+- Bug in ``pd.read_csv()``, which may cause a segfault or corruption when iterating in large chunks over a stream/file under rare circumstances (:issue:`13703`)
 - Bug in ``io.json.json_normalize()``, where non-ascii keys raised an exception (:issue:`13213`)
 - Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing may raise ``IndexError`` (:issue:`13144`)
 - Bug in ``SparseSeries`` with ``MultiIndex`` ``[]`` indexing result may have normal ``Index`` (:issue:`13144`)
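For context, the affected code path is chunked CSV iteration. A minimal sketch of that pattern, assuming plain pandas (the column layout and sizes below are illustrative, not from the commit):

from io import StringIO

import pandas as pd

# Small in-memory CSV; the actual crash needed large, repeated chunked
# reads to make the allocator relocate the parser's stream buffer.
csv_data = "\n".join(["a,b,c"] * 1000) + "\n"

# Passing `chunksize` makes read_csv return an iterator of DataFrames;
# this is the path that could segfault or corrupt data before the fix.
for chunk in pd.read_csv(StringIO(csv_data), header=None, chunksize=57):
    first, last = chunk.iloc[0, 0], chunk.iloc[-1, 0]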

pandas/io/tests/parser/common.py (+20 -15)
@@ -1493,8 +1493,13 @@ def test_memory_map(self):
         tm.assert_frame_equal(out, expected)
 
     def test_parse_trim_buffers(self):
+        # This test is part of a bugfix for issue #13703. It attempts
+        # to stress the system memory allocator, to cause it to move the
+        # stream buffer and either let the OS reclaim the region, or let
+        # other memory requests of the parser otherwise modify the
+        # contents of the memory space where it was formerly located.
         # This test is designed to cause a `segfault` with unpatched
-        # `tokenizer.c`, Sometimes the test fails on `segfault`, other
+        # `tokenizer.c`. Sometimes the test fails on `segfault`, other
         # times it fails due to memory corruption, which causes the
         # loaded DataFrame to differ from the expected one.
         n_lines, chunksizes = 173, range(57, 90)
@@ -1543,20 +1548,20 @@ def test_parse_trim_buffers(self):
         csv_data = "\n".join([record_] * n_lines) + "\n"
 
         output_ = []
-        for chunksize_ in chunksizes:
-            try:
+        try:
+            for chunksize_ in chunksizes:
                 iterator_ = self.read_csv(StringIO(csv_data), header=None,
                                           dtype=object, chunksize=chunksize_,
                                           na_filter=True)
-            except ValueError:
-                # Ignore unsuported dtype=object by engine=python
-                pass
-
-            for chunk_ in iterator_:
-                output_.append((chunksize_,
-                                chunk_.iloc[0, 0],
-                                chunk_.iloc[-1, 0]))
-
-        df = pd.DataFrame(output_, columns=None, index=None)
-
-        tm.assert_frame_equal(df, expected)
+                for chunk_ in iterator_:
+                    output_.append((chunksize_,
+                                    chunk_.iloc[0, 0],
+                                    chunk_.iloc[-1, 0]))
+        except ValueError:
+            # Ignore unsupported dtype=object by engine=python;
+            # in this case the output_ list is empty
+            pass
+
+        if output_:
+            df = pd.DataFrame(output_, columns=None, index=None)
+            tm.assert_frame_equal(df, expected)
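The restructuring above also appears to fix a latent flaw in the test itself: the old except clause wrapped only the read_csv call, so after the ValueError was swallowed the loop body still ran `for chunk_ in iterator_:` with `iterator_` unbound (or stale from a previous chunksize) and raised NameError instead of being skipped. A minimal sketch of the hazard and the fixed pattern, using a hypothetical stand-in for the failing call:

def make_iterator():
    # Hypothetical stand-in for self.read_csv(..., dtype=object) on
    # engine='python', which rejects the option with a ValueError.
    raise ValueError("the 'dtype' option is not supported")

output_ = []

# Old layout (broken): the except wrapped only the call, so after the
# error was swallowed the loop body still ran, e.g.:
#
#     for chunksize_ in range(3):
#         try:
#             iterator_ = make_iterator()
#         except ValueError:
#             pass
#         for chunk_ in iterator_:   # NameError on the first pass
#             ...

# New layout: one try around the whole loop skips it cleanly, leaving
# output_ empty, so the frame comparison can be guarded with a check.
try:
    for chunksize_ in range(3):
        iterator_ = make_iterator()
        for chunk_ in iterator_:
            output_.append(chunk_)
except ValueError:
    pass

assert output_ == []  # empty, so `if output_:` skips the comparison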
