[JIT Variant] Non-streaming Pandas approach #1014

Draft
wants to merge 45 commits into
base: dev

dshemetov
Contributor

@dshemetov dshemetov commented Oct 31, 2022

A version of #646 without streaming.

Prerequisites:

  • Unless it is a documentation hotfix, it should be merged against the dev branch
  • Branch is up-to-date with the branch to be merged with, i.e. dev
  • Build is successful
  • Code is cleaned up and formatted

Summary

The Python iterator-based approach to JIT computation has proven to be 3-5x slower than reading the same signals from the database. Two well-known costs are the likely cause: (a) Python for-loops are slow, and (b) nested iterators add call-stack overhead for every item they pass along. This PR attempts to work around both by using Pandas to do the time series computations on a per-signal basis:

  • Pandas dataframe computations call fast C code instead of Python for-loops
  • loading the data into a dataframe incurs a memory cost, but removes the nested iterator stack overhead
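
To make the trade-off above concrete, here is a minimal sketch (not the code in this PR) of a per-signal rolling mean written twice: once as a pure-Python loop and once with Pandas. The column names, the sample data, and both function names are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (signal, geo_value, time_value).
dates = pd.date_range("2022-01-01", periods=10, freq="D")
df = pd.DataFrame({
    "signal": ["sig_a"] * 10 + ["sig_b"] * 10,
    "geo_value": ["ca"] * 20,
    "time_value": list(dates) * 2,
    "value": np.arange(20, dtype=float),
})


def rolling_mean_loop(values, window):
    """Pure-Python rolling mean: every element is handled by the interpreter."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(float("nan"))  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


def rolling_mean_pandas(frame, window):
    """Per-signal rolling mean: the inner loop over rows runs in compiled code."""
    frame = frame.sort_values(["signal", "geo_value", "time_value"])
    smoothed = frame.groupby(["signal", "geo_value"])["value"].transform(
        lambda s: s.rolling(window).mean()
    )
    return frame.assign(smoothed=smoothed)


result = rolling_mean_pandas(df, 7)
```

The Pandas version holds the whole frame in memory, which is the memory cost mentioned above, but Python-level work happens once per signal group rather than once per row.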

@dshemetov dshemetov requested a review from melange396 October 31, 2022 20:42
@dshemetov dshemetov changed the base branch from dev to jit_computations October 31, 2022 20:44
@dshemetov
Contributor Author

Docker image tags can't have forward slashes in them, so use the tag name jit-pandas for this one.
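
As a rough illustration of the tag constraint, the sketch below turns a branch name into a string that fits Docker's tag grammar (letters, digits, underscores, periods, and hyphens; at most 128 characters; no leading period or hyphen). The helper name `branch_to_docker_tag` and the example branch name are hypothetical and not part of this repository's tooling.

```python
import re

# Docker tag grammar: the first character is a letter, digit, or underscore,
# followed by up to 127 letters, digits, underscores, periods, or hyphens.
_VALID_TAG = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")


def branch_to_docker_tag(branch: str) -> str:
    """Map a git branch name to a valid Docker tag (hypothetical helper)."""
    # Replace every character that a tag cannot contain, including "/".
    tag = re.sub(r"[^A-Za-z0-9_.-]", "-", branch)
    # A tag may not start with a period or hyphen, and is capped at 128 chars.
    tag = tag.lstrip(".-")[:128]
    if not _VALID_TAG.fullmatch(tag):
        raise ValueError(f"cannot derive a Docker tag from {branch!r}")
    return tag


# "feature/jit-pandas" is a made-up branch name used only for this example.
print(branch_to_docker_tag("feature/jit-pandas"))
```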

@dshemetov dshemetov marked this pull request as draft November 11, 2022 23:39
@dshemetov dshemetov force-pushed the jit_computations branch 2 times, most recently from bd40689 to 04df084 Compare December 5, 2022 21:41
dshemetov and others added 20 commits December 5, 2022 15:37
- merge operations repo delphi_python Dockerfile into delphi_web_python
- copy Python requirements file to this directory
- copy setup.sh to this directory
* sorted requirements.txt files
* removed duplicated requirements from ./dev/docker/python/requirements.txt
* reduce runs of "pip install" when creating "delphi_web_python" docker image
* renamed requirements.txt to requirements.api.txt
* merge dev/docker/python/requirements.txt with requirements.dev.txt
* deduplicate packages in requirements.api.txt and requirements.dev.txt
* pinned packages in requirements.dev.txt and removed unused

Co-authored-by: Dmitry Shemetov <[email protected]>
dshemetov and others added 2 commits December 5, 2022 17:58
* add smooth_diff
* add model updates
* add /trend endpoint
* add /trendseries endpoint
* add /csv endpoint
* params with utility functions
* update date utility functions
- bypass run_query, use as_pandas
- bypass APrinter
- write hacky APrinter-like logic in the "/" handler and pass tests
- queries for only base signals bypass JIT
- queries for mixed base and derived signal queries split the two in a new handler
- avoid contiguous indexing by using TimeOffset in rolling
- accept time gaps for diff
- pull DataFrame construction into the groupby for loop
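
The two commit notes about time offsets in rolling and time gaps in diff can be illustrated with a small sketch. It assumes a single signal stored as a Series with a DatetimeIndex and one missing day; the data is invented and this is not the PR's actual code. A time-offset window such as "3D" is defined by calendar time, so the index does not need to be contiguous.

```python
import pandas as pd

# Hypothetical daily series for one signal; 2022-01-04 is missing.
s = pd.Series(
    [1.0, 2.0, 3.0, 5.0, 6.0],
    index=pd.to_datetime(
        ["2022-01-01", "2022-01-02", "2022-01-03", "2022-01-05", "2022-01-06"]
    ),
)

# Row-count window: always the last 3 rows, so it silently spans the gap.
row_window = s.rolling(3).mean()

# Time-offset window: only rows within the 3 calendar days ending at each date.
time_window = s.rolling("3D").mean()

# diff() subtracts the previous available row, whatever the gap between them.
delta = s.diff()

print(pd.DataFrame({"row_window": row_window, "time_window": time_window, "diff": delta}))
```

On 2022-01-05 the row-count window averages Jan 2, 3, and 5, while the "3D" window averages only Jan 3 and 5, because Jan 2 falls outside the three-day span.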
* remove an extra set_df_dtypes call at the end of JIT
* remove an extra df.copy() in set_df_dtypes
* use a single .astype() call in set_df_dtypes
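
A sketch of what a single `.astype()` call can look like follows. The real `set_df_dtypes` signature is not shown in this conversation, so `set_dtypes_sketch`, the column names, and the dtype mapping are all assumptions. `DataFrame.astype` accepts a column-to-dtype mapping and returns a new frame, so neither a per-column loop nor a separate `df.copy()` is needed.

```python
import pandas as pd


def set_dtypes_sketch(df: pd.DataFrame, dtypes: dict) -> pd.DataFrame:
    """Cast several columns in one astype call (hypothetical stand-in)."""
    # astype raises KeyError for unknown columns, so keep only those present.
    present = {col: dtype for col, dtype in dtypes.items() if col in df.columns}
    # astype returns a new DataFrame; the input frame is left unchanged.
    return df.astype(present)


# Made-up example frame and dtype mapping.
raw = pd.DataFrame({"geo_value": ["ca", "ny"], "value": ["1.5", "2"], "lag": [1, 2]})
typed = set_dtypes_sketch(raw, {"value": "float64", "lag": "Int64", "missing_col": "int64"})
print(typed.dtypes)
```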
[JIT Variant] Improve Pandas performance by loading fewer columns into the main DataFrame
* fix issue, lag handling
* handle base signals in the JIT code together
* load different data for base signals vs. derived signals
@dshemetov dshemetov changed the base branch from jit_computations to dev February 23, 2023 01:00
3 participants