Skip to content

CLN: writers.pyx cdef cleanup #23880

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Nov 25, 2018
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 11 additions & 15 deletions pandas/_libs/writers.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ ctypedef fused pandas_string:
@cython.boundscheck(False)
@cython.wraparound(False)
def write_csv_rows(list data, ndarray data_index,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we stand to gain anything by being more explicit about the type of the ndarrays here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find a type for these ndarrays that passes the test suite. Should be okay as is.

int nlevels, ndarray cols, object writer):
Py_ssize_t nlevels, ndarray cols, object writer):
"""
Write the given data to the writer object, pre-allocating where possible
for performance improvements.
Expand All @@ -36,21 +36,16 @@ def write_csv_rows(list data, ndarray data_index,
cols : ndarray
writer : object
"""
# In crude testing, N>100 yields little marginal improvement
cdef:
int N, j, i, ncols
Py_ssize_t i, j, k = len(data_index), N = 100, ncols = len(cols)
list rows
object val

# In crude testing, N>100 yields little marginal improvement
N = 100

# pre-allocate rows
ncols = len(cols)
rows = [[None] * (nlevels + ncols) for x in range(N)]
rows = [[None] * (nlevels + ncols) for _ in range(N)]

j = -1
if nlevels == 1:
for j in range(len(data_index)):
for j in range(k):
row = rows[j % N]
row[0] = data_index[j]
for i in range(ncols):
Expand All @@ -59,7 +54,7 @@ def write_csv_rows(list data, ndarray data_index,
if j >= N - 1 and j % N == N - 1:
writer.writerows(rows)
elif nlevels > 1:
for j in range(len(data_index)):
for j in range(k):
row = rows[j % N]
row[:nlevels] = list(data_index[j])
for i in range(ncols):
Expand All @@ -68,7 +63,7 @@ def write_csv_rows(list data, ndarray data_index,
if j >= N - 1 and j % N == N - 1:
writer.writerows(rows)
else:
for j in range(len(data_index)):
for j in range(k):
row = rows[j % N]
for i in range(ncols):
row[i] = data[i][j]
Expand All @@ -90,8 +85,9 @@ def convert_json_to_lines(object arr):
cdef:
Py_ssize_t i = 0, num_open_brackets_seen = 0, length
bint in_quotes = 0, is_escaping = 0
ndarray[uint8_t] narr
unsigned char v, comma, left_bracket, right_brack, newline
ndarray[uint8_t, ndim=1] narr
unsigned char v, newline, comma, left_bracket, right_bracket, quote
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Only nitpick while we're here: can you change v to val just to avoid 1-character names

unsigned char backslash

newline = ord('\n')
comma = ord(',')
Expand Down Expand Up @@ -159,7 +155,7 @@ def string_array_replace_from_nan_rep(
they are 'nan_rep'. Return the same array.
"""
cdef:
int length = arr.shape[0], i = 0
Py_ssize_t length = len(arr), i = 0

if replace is None:
replace = np.nan
Expand Down