Commit 56dc719

SAS7BDAT parser: Faster string parsing (#47404)
1 parent 03af0ac commit 56dc719

2 files changed: +6 -2 lines changed
doc/source/whatsnew/v1.5.0.rst

+1
@@ -799,6 +799,7 @@ Performance improvements
 - Performance improvement in :class:`BusinessHour` ``str`` and ``repr`` (:issue:`44764`)
 - Performance improvement in datetime arrays string formatting when one of the default strftime formats ``"%Y-%m-%d %H:%M:%S"`` or ``"%Y-%m-%d %H:%M:%S.%f"`` is used. (:issue:`44764`)
 - Performance improvement in :meth:`Series.to_sql` and :meth:`DataFrame.to_sql` (:class:`SQLiteTable`) when processing time arrays. (:issue:`44764`)
+- Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47404`, :issue:`47405`)
 -
 
 .. ---------------------------------------------------------------------------

pandas/io/sas/sas.pyx

+5 -2
@@ -424,8 +424,11 @@ cdef class Parser:
                 jb += 1
             elif column_types[j] == column_type_string:
                 # string
-                string_chunk[js, current_row] = np.array(source[start:(
-                    start + lngt)]).tobytes().rstrip(b"\x00 ")
+                # Skip trailing whitespace. This is equivalent to calling
+                # .rstrip(b"\x00 ") but without Python call overhead.
+                while lngt > 0 and source[start+lngt-1] in b"\x00 ":
+                    lngt -= 1
+                string_chunk[js, current_row] = (&source[start])[:lngt]
                 js += 1
 
         self.current_row_on_page_index += 1
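
The new loop in the hunk above trims trailing NUL and space bytes by shrinking the length before slicing, instead of building an intermediate array and calling `.rstrip`. A minimal pure-Python sketch of the same logic follows; the function name `rstrip_nul_space` is hypothetical and not part of pandas, and the sketch only illustrates the equivalence claimed in the code comment, not the Cython performance characteristics.

```python
def rstrip_nul_space(source: bytes, start: int, lngt: int) -> bytes:
    """Return source[start:start + lngt] without trailing NUL or space bytes.

    Hypothetical helper mirroring the loop in the diff; not a pandas API.
    """
    # Indexing a bytes object yields an int, and `int in bytes` tests
    # whether that byte value occurs in the given bytes literal.
    while lngt > 0 and source[start + lngt - 1] in b"\x00 ":
        lngt -= 1
    # Slice only after the length has been shortened.
    return source[start:start + lngt]


if __name__ == "__main__":
    buf = b"xxab c\x00\x00yy"
    trimmed = rstrip_nul_space(buf, 2, 6)
    # Same result as slicing first and then calling bytes.rstrip.
    assert trimmed == buf[2:8].rstrip(b"\x00 ")
    print(trimmed)
```

Interior spaces and NUL bytes are left untouched, since the loop only moves the end of the field backwards, which matches the behavior of `bytes.rstrip(b"\x00 ")`.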
