Skip to content

PERF: Release GIL on some datetime ops #11263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 17, 2015
Merged

Conversation

chris-b1
Copy link
Contributor

@chris-b1 chris-b1 commented Oct 8, 2015

This is a WIP, but far enough along I thought I'd share and see if the approach was reasonable.

This releases the GIL on most vectorized field accessors (e.g. dt.year) and conversion to and from Period. May be places it could be done - obviously would be nice for parsing, but I'm not sure that's possible.

@jreback
Copy link
Contributor

jreback commented Oct 8, 2015

ohh nice!

can u share some timings?

continue
out[i] = period_ordinal_to_dt64(periodarr[i], freq)
with nogil:
if periodarr[i] == NPY_NAT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move nogil outside the loop

@chris-b1
Copy link
Contributor Author

chris-b1 commented Oct 8, 2015

Here are some timings - getting a pretty nice speedup. In single-threaded case things are looking about flat.

In [1]: from pandas.util.testing import test_parallel
In [2]: dti = pd.date_range('1900-1-1', periods=100000)

In [3]: def f():
   ...:     for i in range(4):
   ...:         dti.year
In [4]: @test_parallel(4)
   ...: def g():
   ...:     dti.year

In [8]: %timeit f()
10 loops, best of 3: 25.8 ms per loop

In [9]: %timeit g()
100 loops, best of 3: 7.71 ms per loop

pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts)
out[i] = monthrange(dts.year, dts.month)[1]
pandas_datetime_to_datetimestruct(dtindex[i], PANDAS_FR_ns, &dts)
out[i] = days_per_month_table[is_leapyear(dts.year)][dts.month-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prob makes sense to define this as a c-function and make it nogil (the days_per_month......)

@jreback jreback added Datetime Datetime data dtype Performance Memory or execution speed performance labels Oct 8, 2015
@@ -3849,6 +3849,7 @@ def get_time_micros(ndarray[int64_t] dtindex):


@cython.wraparound(False)
@cython.boundscheck(False)
def get_date_field(ndarray[int64_t] dtindex, object field):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you declared field as char[:] instead would you be able to nogil the whole thing until raise?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, tried that out, but cython doesn't seem to take a view of strings like that? http://stackoverflow.com/questions/28203670/how-to-use-cython-typed-memoryviews-to-accept-strings-from-python

@jreback jreback added this to the 0.17.1 milestone Oct 16, 2015
@jreback
Copy link
Contributor

jreback commented Oct 16, 2015

@chris-b1 loooks good. can you add a whatsnew note (perf) and squash.

@chris-b1 chris-b1 changed the title (WIP) PERF: Release GIL on some datetime ops PERF: Release GIL on some datetime ops Oct 16, 2015
@chris-b1
Copy link
Contributor Author

@jreback - updated

jreback added a commit that referenced this pull request Oct 17, 2015
PERF: Release GIL on some datetime ops
@jreback jreback merged commit 7e5b223 into pandas-dev:master Oct 17, 2015
@jreback
Copy link
Contributor

jreback commented Oct 17, 2015

thanks!

@jreback
Copy link
Contributor

jreback commented Oct 20, 2015

@chris-b1 can you add these (clean then make again to see them)

warning: pandas/src/period.pyx:144:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:145:23: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:147:55: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:148:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:169:24: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:170:19: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:15: Use boundscheck(False) for faster access
warning: pandas/src/period.pyx:172:53: Use boundscheck(False) for faster access
building 'pandas._period' extension

@chris-b1 chris-b1 deleted the tslib-gil branch October 21, 2015 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Datetime Datetime data dtype Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants